The Internet, which is used all around the world, is an ocean of data. Used wisely, it is a great resource for business, marketing, sales, networking and much more!
Having relevant, high-quality data matters more than having a large quantity of it!
How do we get that data?
The simple answer is web scraping.
Web scraping is simply the extraction of data from websites. A specific kind of data is gathered and copied from the web into a database or spreadsheet.
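To make the idea concrete, here is a minimal sketch of the "gather and copy into a spreadsheet" step in Python, using only the standard library. The sample markup, the class names "item", "title" and "price", and the function name scrape_to_csv are all made up for illustration; a real page would need its own selectors and would have to be downloaded first.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup standing in for a downloaded product page.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="title">MacBook Air 13</span><span class="price">450.00</span></li>
  <li class="item"><span class="title">MacBook Pro 15</span><span class="price">820.00</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects the text of <span class="title"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per scraped item
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "item":
            self.rows.append({})          # start a new item
        elif tag == "span" and css_class in ("title", "price"):
            self._field = css_class       # remember which field we are inside

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_csv(html):
    """Parse the HTML and return the extracted items as CSV text."""
    parser = ItemParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

The CSV text this produces can be saved to a file and opened in any spreadsheet program. Tools like the one described below do the same job without any code.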
Web scraping has many advantages for businesses, namely:
- It is easy to implement.
- It turns unstructured web data into data ready for research, marketing, growth, sales, etc.
- It makes price monitoring possible.
- It provides data about the kinds of products customers are interested in.
- It helps with networking with the right audience.
Many automated tools can perform web scraping; one of them is Octoparse.
Octoparse is an easy-to-use web crawler application for extracting data without writing any code. It is easy to use because it is UI-based. The data can be exported in Excel, CSV or JSON format.
A few advantages of Octoparse are:
- You can build a crawler without writing code
- It offers unlimited storage
- It provides the option to extract data in the cloud as well as on a local machine.
Let's write a few BDD scripts to test Octoparse below:
Feature 1: Login page
Scenario 1: Successful login to the app
Given you enter valid credentials
When you click on the "Login" button
Then you should be successfully logged in
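As a rough illustration of how a scenario like the login one can become an executable check, the sketch below binds each Given/When/Then line to a Python function. Everything in it is hypothetical: FakeApp is an in-memory stand-in and not Octoparse's API, the credentials are invented, and the small step registry only imitates what a real BDD framework such as behave or Cucumber provides.

```python
# Hypothetical stand-in for the application under test; not Octoparse's API.
class FakeApp:
    VALID = ("user@example.com", "s3cret")  # invented credentials

    def __init__(self):
        self.credentials = None
        self.logged_in = False

    def enter_credentials(self, email, password):
        self.credentials = (email, password)

    def click_login(self):
        self.logged_in = self.credentials == self.VALID


STEPS = {}  # maps the text of a scenario line to the function implementing it


def step(text):
    """Register a function as the implementation of one scenario line."""
    def register(func):
        STEPS[text] = func
        return func
    return register


@step("Given you enter valid credentials")
def enter_valid_credentials(app):
    app.enter_credentials(*FakeApp.VALID)


@step('When you click on the "Login" button')
def click_login(app):
    app.click_login()


@step("Then you should be successfully logged in")
def check_logged_in(app):
    assert app.logged_in, "expected a successful login"


def run_scenario(lines):
    """Run each scenario line in order against a fresh app instance."""
    app = FakeApp()
    for line in lines:
        STEPS[line](app)
    return app


run_scenario([
    "Given you enter valid credentials",
    'When you click on the "Login" button',
    "Then you should be successfully logged in",
])
```

If any step fails, the assertion in the "Then" function raises an error, which is how a failing scenario would be reported.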
Feature 2: Extracting the data using advanced mode
Scenario 1: Enter the URL for MacBook Air
Given you are logged into your account
When you enter the URL for MacBook Air https://www.ebay.co.uk/itm/macbook-air-13-inch-early-2015/353451506203?hash=item524b59fa1b:g:hHgAAOSwTdRgb0no
Then you should be taken to the data preview page with data fields like Title, title_URL, Image, price, Info, Info1
Scenario 2: Creating the workflow
Given you are on the data preview page
When you click on the "Create workflow" button
Then the workflow should be created on the left-hand side with loop_item, loop_item1
Scenario 3: Save and run the task
Given the workflow is created
When you click on "Save" and "Run"
Then the "Run on your device" option should be displayed.
Scenario 4: Run completed
Given the workflow is created
When you click on "Run on your device"
Then a "Run completed" pop-up should be displayed with the task name "macbook air 13 inch early 2015 | eBay"
Scenario 5: Export data
Given you click on "Export Data"
When you select the "Excel" option
Then the data should be exported to the local machine in "Excel" format
And the option to open the file should be displayed.
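Once the data has been exported, the file itself can be checked against the fields listed in the data preview scenario. The sketch below is one possible way to do that in Python. It assumes the export was saved as CSV rather than Excel, so that the standard library is enough, and it assumes the column headers match the field names used in the scenario; the sample content and the function name check_export are invented for illustration.

```python
import csv
import io

# Field names taken from the data preview scenario; the exact headers in a
# real export are an assumption.
EXPECTED_FIELDS = ["Title", "title_URL", "Image", "price", "Info", "Info1"]


def check_export(csv_file):
    """Return (missing_fields, row_count) for an exported CSV file object."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [field for field in EXPECTED_FIELDS if field not in header]
    row_count = sum(1 for _ in reader)  # counts data rows, not the header
    return missing, row_count


# Hypothetical export content, used so the sketch runs without a real file.
sample = io.StringIO(
    "Title,title_URL,Image,price,Info,Info1\r\n"
    "MacBook Air,https://example.com/item,https://example.com/a.jpg,450.00,Used,13 inch\r\n"
)
missing, rows = check_export(sample)
print("Missing fields:", missing)
print("Rows exported:", rows)
```

For a real export, the same function can be given a file opened with open(path, newline="") instead of the in-memory sample. An empty list of missing fields and a row count above zero would suggest that the export worked.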
Feature 3: Extracting data using Templates
Scenario 1: Click on task template option
Given you are logged into your account
When you hover on the "New" button
Then the "Task Template" option should be displayed and clickable
Scenario 2: Task template page
Given you hover on the "New" button
When you click on the "Task Template" option
Then it should take you to the "Task Template" page
Scenario 3: Categories of templates
Given you are logged into your account
When you are on the "Task Template" page
Then the page should display different categories of templates
Scenario 4: eBay template in the "Hot" category
Given you are on the "Task Template" page
When you click on the eBay template
Then the different options in the eBay category should be displayed
Scenario 5: Details page eBay template
Given you are on the eBay template page
When you click on the "Detail Pages eBay" option
Then the "Detail Pages eBay" template page should be displayed
Scenario 6: "Task info" page
Given you are on the "Detail Pages eBay" template page
When you click on the "Try it" button
Then it should display the "Task info" page
Scenario 7: Run task page
Given you are on the "Task info" page
When you enter valid data in all the fields
And click on "Save and Run"
Then the "Run on your device" and "Run in the cloud" options should be displayed on the Run task page.
Scenario 8: Data extracted page
Given you are on the Run task page
When you click on the "Run on your device" button
Then it should display the "Data Extracted" page
Scenario 9: Export data
Given you are on the "Data Extracted" page
When you click on the "Export data" button
Then the data should be exported.
Scenario 10: Export later
Given you are on the "Data Extracted" page
When you click on the "Export later" button
Then the data should not be exported.