- Title
Reflecting Design Considerations: An End-to-End Case Study on Preparing Cricket Data Available on Net Analysis Ready.
- Authors
Ray, Subhasis; Sengupta, Kalyan
- Abstract
The use of the Internet as a source of secondary data is becoming increasingly popular. Websites are made up of webpages that contain a huge volume of useful information in textual form. However, webpages are coded in text-based mark-up languages (e.g., HTML, XHTML, XML) to facilitate end-user viewing rather than automated use. This has led to a new science called web scraping, which fetches webpages and then extracts data for future use. Many organizations have seized this business opportunity and come up with efficient web scraping tools. The paper shows readers how data can be sourced from the Internet for scientific or commercial purposes. It elaborates on the available design options for fetching, extracting, validating and transforming data, either in the absence of an end-to-end tool or to supplement one. This is followed by a specific case study that deals with reactive analysis of structured data from multiple predetermined sources/pages. The paper concludes that design considerations for web scraping have to be dynamic: neither traditional copy-and-paste, nor trapping feeds through Application Programming Interfaces (APIs), nor programming in Java, Python or R, nor an available end-to-end tool is uniformly better than the rest.
- Subjects
CRICKET (Sport); HTML (Document markup language); ELECTRONIC data processing; DATA transformations (Statistics); PYTHON programming language
- Publication
IUP Journal of Information Technology, 2018, Vol 14, Issue 3, p7
- ISSN
0973-2896
- Publication type
Article