Scraping The Data With Octoparse

What is Data Scraping?

Data scraping, also known as web scraping, is the process of importing information from a website into an Excel sheet or a local file. It is one of the most efficient ways to get data from the web.
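To make the idea concrete, here is a minimal sketch in Python of what "importing information from a website into a local file" can look like. It is not how Octoparse works internally. The sample HTML, the column names and the helper names (`TableParser`, `html_table_to_csv`) are assumptions made up for illustration, and the page is held in a string instead of being downloaded.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<table>
  <tr><td>Keyboard</td><td>49.99</td></tr>
  <tr><td>Mouse</td><td>19.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row currently being read
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def html_table_to_csv(html):
    """Parse the table rows out of the HTML and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item", "price"])  # assumed column names
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The CSV text could then be written to disk and opened in Excel. A tool like Octoparse lets you do the same kind of job by pointing and clicking instead of writing a parser.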

What Tools Are Used?

Many tools are available. Some of them are:

· Octoparse

· ScrapingBee

· Import.io

· ScrapeBox

Octoparse:

Both coders and non-coders use this tool for web scraping. It can scrape large amounts of web data and store the results in Excel, CSV, or JSON files.
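For readers who do code, the sketch below shows roughly what storing scraped records as CSV or JSON means. It uses only the Python standard library. The `records` list, its field names and the `to_csv` / `to_json` helpers are hypothetical and are not part of Octoparse.

```python
import csv
import io
import json

# Hypothetical scraped records; the field names are illustrative only.
records = [
    {"item": "Keyboard", "price": 49.99},
    {"item": "Mouse", "price": 19.5},
]


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["item", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def to_json(rows):
    """Serialize the same records as pretty-printed JSON text."""
    return json.dumps(rows, indent=2)


print(to_csv(records))
print(to_json(records))
```

CSV suits flat, table-like data that will be opened in a spreadsheet, while JSON keeps value types and nesting for use in other programs.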

The tool is user-friendly, and its configuration is easy to learn. It has a free edition for trial use and plenty of resources for learning the tool. The main advantages are:

· Data scraping can be scheduled to get regular updates

· Data can be scraped without coding

· Cloud-based scraping improves speed and productivity

Now let's look at a few Gherkin scenarios with Octoparse:

Scenario: Create a task from a prebuilt task template

Given the user is logged in to Octoparse

When the user clicks on a task template

Then the user should see the task template

Scenario: Use a URL in the task template

Given the user is in the task template

When the user pastes the URL manually

Then the user should be able to save and run the task

Scenario: Scrape data using the advanced mode template

Given the user has the URL they want to scrape in advanced mode

When the user clicks on advanced mode

Then the user should be in the advanced mode template with the URL pasted

Scenario: Create a pagination loop for every page

Given the user is in workflow mode

When the user scrolls down and clicks the Next button

Then the user clicks the loop click option to scrape the data on every page
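Octoparse builds this loop visually, but the underlying idea is easy to sketch in code. The Python example below is only a conceptual illustration, not Octoparse's implementation. It assumes a hypothetical in-memory `PAGES` dictionary standing in for a website, a made-up `fetch_page` helper, and a `max_pages` safety limit. It keeps following the "next" link and collecting rows until there is no next page.

```python
# Hypothetical site: each page has some rows and the URL of the next page.
PAGES = {
    "/products?page=1": {"rows": ["Keyboard", "Mouse"], "next": "/products?page=2"},
    "/products?page=2": {"rows": ["Monitor"], "next": "/products?page=3"},
    "/products?page=3": {"rows": ["Webcam"], "next": None},
}


def fetch_page(url):
    """Stand-in for downloading and parsing one page."""
    return PAGES[url]


def scrape_all_pages(start_url, max_pages=100):
    """Follow 'next' links from start_url and collect the rows of every page."""
    collected = []
    seen = set()  # guards against a 'next' link that loops back
    url = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        collected.extend(page["rows"])
        url = page["next"]
    return collected


all_rows = scrape_all_pages("/products?page=1")
print(all_rows)
```

The `seen` set and the `max_pages` limit are defensive choices so that a site whose last page links back to an earlier one cannot keep the loop running forever.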

Scenario: Extract data from the web page

Given the user has entered the URL and is in workflow mode

When the user selects the item, price, and other needed fields

Then the user clicks on extract data in Action Tips

Scenario: Save the task

Given the task with the URL is open as a new task

When the data to be scraped is selected

Then the user clicks save and the task should be saved

Scenario: Run the task in advanced mode

Given the task is already saved

When the complete set of data to be extracted is selected in the fields and the user clicks run

Then the run should start
