What is Data Scraping?
Data scraping, also known as web scraping, is the process of importing information from a website into an Excel spreadsheet or a local file. It's one of the most efficient ways to get data from the web.
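To make the idea concrete, here is a minimal sketch in Python that pulls a table out of a page and turns it into CSV text, which could then be saved to a local file or opened in Excel. It uses only the standard library. The sample HTML and the names `SAMPLE_HTML`, `TableParser`, and `scrape_table_to_csv` are made up for illustration; a real scraper would first download the page.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Notebook</td><td>4.50</td></tr>
  <tr><td>Pen</td><td>1.20</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row being built, or None outside <tr>
        self._in_cell = False   # True while inside <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text between tags is kept only when it belongs to a cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table_to_csv(html):
    """Parse the HTML table and return its contents as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


csv_text = scrape_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Writing `csv_text` to a file such as `products.csv` would give a file that Excel can open directly.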
What Tools Are Used?
Many tools are available. Some of them are:
· Octoparse
· ScrapingBee
· Import.io
· ScrapeBox
Octoparse:
This tool is used by both coders and non-coders for web scraping. It can scrape large amounts of web data and store it in Excel, CSV, or JSON files.
The tool is user friendly, and its configuration is easy to learn. It has a free edition for trial use and plenty of resources for learning the tool. The main advantages are:
· Scraping can be scheduled so that the data is updated regularly
· Data can be scraped without coding
· Cloud-based scraping offers speed and productivity
Now let's look at a few Gherkin scenarios with Octoparse:
Scenario: Create a task from a prebuilt task template
Given the user is logged in to Octoparse
When the user clicks on the task template
Then the user should see the task template
Scenario: Use a URL in the task template
Given the user is in the task template
When the user pastes the URL manually
Then the user should be able to save and run the task
Scenario: Scrape data using the advanced mode template
Given the user has the URL they want to scrape in advanced mode
When the user clicks on advanced mode
Then the user is in the advanced mode template with the URL pasted
Scenario: Create a pagination loop for every page
Given the user is in workflow mode
When the user scrolls down and clicks the next button
Then the user clicks the loop click option to scrape the data on every page
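For readers who code, the pagination loop in the last scenario can be pictured with the rough Python sketch below. It is not how Octoparse works internally. The `PAGES` dictionary, `fetch_page`, and `scrape_all_pages` are hypothetical stand-ins, and the "site" is kept in memory so the example runs without a network connection.

```python
# Hypothetical site: each page lists items and names the next page
# (None on the last page).
PAGES = {
    "/products?page=1": {"items": ["Notebook", "Pen"], "next": "/products?page=2"},
    "/products?page=2": {"items": ["Eraser"], "next": "/products?page=3"},
    "/products?page=3": {"items": ["Ruler", "Stapler"], "next": None},
}


def fetch_page(url):
    """Stand-in for downloading and parsing one page."""
    return PAGES[url]


def scrape_all_pages(start_url, max_pages=100):
    """Follow 'next' links from start_url and collect items from every page."""
    items = []
    seen = set()  # guards against a site whose 'next' link loops back
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        items.extend(page["items"])
        url = page["next"]  # the equivalent of clicking "next"
    return items


all_items = scrape_all_pages("/products?page=1")
print(all_items)
```

The `max_pages` limit and the `seen` set are safety choices for this sketch, so that the loop always ends even if the next link never runs out.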
Scenario: Extract data from the web page
Given the user has entered the URL and is in workflow mode
When the user selects the item, price, and other required fields
Then the user clicks on extract data in Action Tips
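Selecting fields such as item and price has a rough code equivalent: picking elements by their class name and collecting them into one record per product. The sketch below assumes a made-up page where each product sits in a `product` block with flat `item` and `price` elements. These class names, `FieldParser`, and `extract_records` are illustrative only and do not come from Octoparse.

```python
import json
from html.parser import HTMLParser

# Hypothetical product listing; the class names are assumptions for this sketch.
SAMPLE_HTML = """
<div class="product"><span class="item">Notebook</span><span class="price">4.50</span></div>
<div class="product"><span class="item">Pen</span><span class="price">1.20</span></div>
"""

FIELDS = ("item", "price")  # the fields the user "selected"


class FieldParser(HTMLParser):
    """Builds one dict per product, keeping only the selected fields."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.records = []
        self._current = None  # record being filled
        self._field = None    # field whose text is being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "product" in classes:
            self._current = {}
            self.records.append(self._current)
        elif self._current is not None:
            for name in self.fields:
                if name in classes:
                    self._field = name

    def handle_endtag(self, tag):
        # Fields are assumed to be flat elements, so any end tag closes one.
        self._field = None

    def handle_data(self, data):
        if self._field is not None and self._current is not None:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()


def extract_records(html, fields=FIELDS):
    """Return a list of records containing the selected fields."""
    parser = FieldParser(fields)
    parser.feed(html)
    return parser.records


records = extract_records(SAMPLE_HTML)
json_text = json.dumps(records, indent=2)
print(json_text)
```

Changing `FIELDS` to `("item",)` would keep only the item names, much like deselecting the price column.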
Scenario: Save the task
Given the task with the URL is a new task
When the data to be scraped is selected
And the user clicks save
Then the task should be saved
Scenario: Run the task in advanced mode
Given the task is already saved
When the complete set of data to be extracted is selected in the fields
And the user clicks run
Then the run should start