Data Scraping
Data scraping is a technique in which a computer program extracts data from a website and imports it into a file saved on your computer. It is used for various purposes, such as:
research for business intelligence,
competitor monitoring,
pricing negotiation,
product optimization,
investment decisions,
public opinion gathering,
online reputation improvement,
social media insights,
lead generation,
fake review detection
How Does Scraping Work?
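In essence, a scraper reads the HTML of a page, picks out the elements of interest, and writes them to a file. Here is a minimal sketch of that flow in Python, using only the standard library. The HTML fragment, the class names (hotel, name, price) and the HotelParser class are made up for illustration; this is not how Octoparse or any other tool works internally.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li class="hotel"><span class="name">Hotel A</span><span class="price">$100</span></li>
  <li class="hotel"><span class="name">Hotel B</span><span class="price">$120</span></li>
</ul>
"""


class HotelParser(HTMLParser):
    """Collects the text of the name and price spans, one row per <li>."""

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per hotel
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "hotel":
            self.rows.append({})  # start a new row
        elif tag == "span" and attrs.get("class") in ("name", "price"):
            self._field = attrs["class"]

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_csv(html):
    """Extract rows from the HTML and return them with their CSV text."""
    parser = HotelParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return parser.rows, out.getvalue()


rows, csv_text = scrape_to_csv(SAMPLE_HTML)
print(csv_text)
```

To save the result on your computer, the CSV text could be written to a file instead of being printed.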
Tools for Data Scraping
There are various tools available for data scraping. Some of the tools are listed below:
1. ParseHub
2. Scrapy
3. Octoparse
4. Scraper API
5. Mozenda
6. Webhose.io
7. Content Grabber
8. Common Crawl
In this blog, let us discuss the Octoparse tool in detail.
Octoparse 8
Octoparse is a widely used data scraping tool that lets you extract data without writing a single line of code. Both experienced and inexperienced users should find it easy to extract data with it and save the result as clearly structured data in a format of their choice.
Let us see a simple example of extracting data using Octoparse 8:
1. Copy the URL you want to scrape and paste it into the home page of the Octoparse tool.
2. Once you click Start, the web page is auto-detected.
3. Once auto-detection completes, you can see the scraped data at the bottom of the page. You can delete or rename the columns. Then click Save, and then Run.
4. Click Run on your device.
5. Once the process is completed, you can see an option to export the data; export it and save it in the desired format.
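Once the data is exported, it can be processed further with a few lines of code. The sketch below assumes the export was saved as CSV. The column names (hotel_name, price, rating) and the sample rows are hypothetical, and an in-memory string stands in for the exported file so that the example runs on its own.

```python
import csv
import io

# Stand-in for an exported CSV file; columns and rows are hypothetical.
EXPORTED_CSV = """hotel_name,price,rating
Hotel A,$100,4.0
Hotel B,$120,4.5
"""


def load_rows(csv_file):
    """Read exported rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def parse_price(text):
    """Turn a price string such as "$1,200" into a float."""
    return float(text.replace("$", "").replace(",", ""))


rows = load_rows(io.StringIO(EXPORTED_CSV))

# Example use: find the cheapest hotel in the exported data.
cheapest = min(rows, key=lambda row: parse_price(row["price"]))
print(cheapest["hotel_name"])
```

With a real export, io.StringIO(EXPORTED_CSV) would be replaced by an opened file, for example open("export.csv", newline="", encoding="utf-8"), where the file name is again only an assumption.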
Data-Driven Scenarios
Let us see some data-driven scenarios for the above example.
Feature: As a traveler, I would like to know the Outer Banks Hilton hotel details so that I can plan my vacation.
Scenario 1: Searching Hotel Name
Given I am on https://tinyurl.com/ve9wy632
When searching for hotel name
Then show "Hilton Garden Inn Outer Banks/Kitty Hawk"
Scenario 2: Searching Price
Given I am on https://tinyurl.com/ve9wy632
When searching for price
Then show "$342"
Scenario 3: Searching Site
Given I am on https://tinyurl.com/ve9wy632
When searching for site
Then show "official"
Scenario 4: Searching Cancellation Fee
Given I am on https://tinyurl.com/ve9wy632
When searching for cancellation fee
Then show "free cancellation"
Scenario 5: Searching Review Rating
Given I am on https://tinyurl.com/ve9wy632
When searching for review rating
Then show "4.5 star"
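The scenarios above can also be checked in a data-driven way: keep the expected values in a table and compare them against one scraped row. The following is a rough sketch under a few assumptions: the field names are hypothetical, and scraped_row is a hand-written stand-in for a row exported by the scraper, not real scraper output.

```python
# Expected values, copied from the five scenarios above.
# The field names on the left are hypothetical column names.
EXPECTED = [
    ("hotel_name", "Hilton Garden Inn Outer Banks/Kitty Hawk"),
    ("price", "$342"),
    ("site", "official"),
    ("cancellation_fee", "free cancellation"),
    ("review_rating", "4.5 star"),
]

# Hand-written stand-in for one exported row (not real scraper output).
scraped_row = {
    "hotel_name": "Hilton Garden Inn Outer Banks/Kitty Hawk",
    "price": "$342",
    "site": "official",
    "cancellation_fee": "free cancellation",
    "review_rating": "4.5 star",
}


def check_row(row, expected):
    """Return (field, expected, actual) for every field that does not match."""
    failures = []
    for field, want in expected:
        got = row.get(field)
        if got != want:
            failures.append((field, want, got))
    return failures


failures = check_row(scraped_row, EXPECTED)
print("all scenarios pass" if not failures else failures)
```

Keeping the expectations in one table means a new scenario is just one more line of data, with no change to the checking logic.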
Summary
Octoparse is an easy-to-use data scraping tool that suits both experienced and inexperienced users. Let’s scrape data with Octoparse!