    •   IR@KDU Home
    • INTERNATIONAL RESEARCH CONFERENCE ARTICLES (KDU IRC)
    • 2015 IRC Articles
    • Computing

    A Comparative Study on Web Scraping

    View/Open
    com-059.pdf (431.3Kb)
    Date
    2015
    Author
    De S Sirisuriya, SCM
    Abstract
    The World Wide Web contains information of many kinds and origins, including social, financial, security and academic information. Most people access information through the internet for educational purposes. Information on the web is available in different formats and through different access interfaces. Therefore, indexing or semantic processing of data across websites can be cumbersome. Web scraping is a technique that aims to address this issue: it transforms unstructured data on the web into structured data that can be stored and analysed in a central local database or spreadsheet. There are various web scraping techniques, including traditional copy-and-paste, text grepping and regular expression matching, HTTP programming, HTML parsing, DOM parsing, web scraping software, vertical aggregation platforms, semantic annotation recognition and computer vision web-page analysers. Traditional copy-and-paste is the most basic technique, and it becomes tiresome when people need to scrape many datasets. Web scraping software is the easiest technique to use, since all the others except traditional copy-and-paste require some form of technical expertise. Hundreds of web scraping software packages are available today, most of them built using Java, Python and Ruby; some are open source and others are commercial. Web scraping software such as Yahoo Pipes, Google Web Scrapers and Outwit Firefox extensions are the best tools for beginners in web scraping. This study gives a comparative account of web scraping techniques and well-known web scraping software. To accomplish this, we compare and contrast several web scraping techniques and several well-known web scraping tools. The outcome of this study is a review of web scraping techniques and software that can be used to extract data from educational websites.
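
    As a rough illustration of the HTML parsing technique named in the abstract, the sketch below turns an HTML table into spreadsheet-style CSV rows using only the Python standard library. It is not taken from the paper: the sample page, the `TableParser` class and the helper names `scrape_table` and `rows_to_csv` are all hypothetical, and a real scraper would fetch the page over HTTP rather than use an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page (an assumption for illustration only);
# a real scraper would download this HTML over HTTP.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Course</th><th>Credits</th></tr>
  <tr><td>Databases</td><td>3</td></tr>
  <tr><td>Networks</td><td>4</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None when outside <tr>
        self._cell = None  # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table cells in `html` as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialise rows as CSV text, ready for a spreadsheet."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = scrape_table(SAMPLE_HTML)
print(rows_to_csv(rows))
```

    The same structured rows could equally be inserted into a local database; CSV is used here only because it needs no extra setup.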
    URI
    http://ir.kdu.ac.lk/handle/345/1051
    Collections
    • Computing [32]

    Library copyright © 2017  General Sir John Kotelawala Defence University, Sri Lanka