Web Scraping

Updated: Apr 17


Web scraping is the process of extracting data from a website. It is generally automated by a bot or a web crawler that runs at scheduled times. Once the required data is extracted, it can be parsed, searched, filtered or reformatted and then exported to a database, spreadsheet, etc. Typical applications include price change monitoring, weather data monitoring and website change detection.
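
As a rough illustration of the extract, filter and reformat steps described above, here is a minimal Python sketch that uses only the standard library. The HTML snippet, the class names `name` and `price`, and the helper names are invented for this example; a real scraper would fetch the page over the network and deal with much messier markup.

```python
from html.parser import HTMLParser

# Invented sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Laptop A</span><span class="price">$299.99</span></li>
  <li><span class="name">Laptop B</span><span class="price">$549.00</span></li>
  <li><span class="name">Laptop C</span><span class="price">$1,199.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one dict per product
        self._field = None    # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class
            if css_class == "name":
                self.rows.append({})  # a name starts a new product row

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract(html):
    """Extraction step: turn markup into a list of row dictionaries."""
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


def to_number(price_text):
    """Reformat step: turn a price such as '$1,199.50' into a float."""
    return float(price_text.replace("$", "").replace(",", ""))


rows = extract(SAMPLE_HTML)
cheap = [r for r in rows if to_number(r["price"]) < 600]  # filter step
print(cheap)
```

Tools such as Octoparse hide these steps behind a point-and-click interface, which is what the test cases in this post exercise.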


Let’s examine the Gherkin test cases for one web scraping tool, Octoparse.



Feature: Loading and logging into the scraper tool (e.g. Octoparse)


Scenario: Opening the Octoparse app

Given: The app is downloaded

When: The user clicks on the Octoparse app

Then: The Octoparse app opens and loads


Scenario: Registering into the Octoparse app

Given: The Octoparse login page is open and the user is not already registered

When: The user clicks on ‘Sign up for Free’

Then: The Octoparse app opens a sign up page to collect the new login details


Scenario: Signing up with new login

Given: The Sign up pop up is open

When: The user enters correct details to sign up and clicks on the submit button

Then: The user is registered to the app and logged in


Scenario: Logging into the Octoparse app

Given: The Octoparse login page is open, and the user is already registered

When: The user enters the correct username and password and clicks on the Login button

Then: The user is logged into the Octoparse app and the home page is displayed


Scenario: Incorrect Login details

Given: The Octoparse login page is open and the user is already registered

When: The user enters incorrect username or password and clicks on the Login button

Then: An error message is displayed: ‘Invalid credentials, please try again’


Feature: ‘New’ button features


Scenario: ‘New’ button click options

Given: The user is on the home page

When: The user clicks on the ‘New’ button

Then: The user should be able to select from the options ‘Advanced Mode’, ‘Task Template’, ‘Import’, ‘Tasks’ and ‘Create a new group’



Feature: Creating a new custom task


Scenario: Starting a new custom task

Given: The user is on the home page

When: The user enters the URL (generated with the search keyword) and clicks on the ‘Start’ button

Then: Octoparse should start loading the page along with processing the data extraction


Scenario: Data extraction from a new custom task

Given: The URL is entered and the start button is clicked

When: The page is loaded completely

Then: Octoparse should display the extracted data with some preselected elements as a table


Scenario: Turn off or cancel auto detect

Given: The page is loading, and the data is being extracted through the auto detect feature

When: The user clicks on the ‘Turn off auto detect’ or ‘Cancel auto detect’ button

Then: The auto detection should stop and the data should not be extracted


Scenario: Creating a workflow with the Auto detection

Given: The extracted data is displayed as a table

When: All the data needed by the user is available in the displayed table

Then: The user should be able to proceed by creating the workflow (saving the settings)


Scenario: Add a page scroll

Given: The page is loaded and the table with auto detection results has been populated

When: The user wants to add a page scroll to the extraction

Then: The user should be able to select a checkbox ‘add a page scroll’


Scenario: Edit a page scroll setup

Given: The page is loaded and the table with auto detection results has been populated

When: The user wants to edit the page scroll setup

Then: The user should be able to edit the repeats, wait time, etc. of a page scroll



Scenario: Edit the pagination setup

Given: The page is loaded and the table with auto detection results has been populated

When: The user wants to edit the pagination setup

Then: The user should be able to edit the pagination setup



Scenario: Switch the auto-detect results at least 5 times

Given: The extracted data is displayed as a table

When: The user needs more elements to be extracted apart from the auto detection data

Then: The user should be able to switch auto-detect results at least 5 times


Scenario: Switch the auto-detect results button

Given: The user needs more elements to be extracted apart from the auto detection data

When: The user clicks on ‘switch auto-detect results’

Then: A new table with a different set of elements should be extracted and the auto detect result trial number should be increased by 1


Scenario: Switch the auto-detect results more than 5 times

Given: The user needs more elements to be extracted apart from the auto detection data

When: The user clicks on ‘switch auto-detect results’ more than 5 times

Then: An alert should be displayed stating that the user has already reached the maximum of 5 auto detect trials and can edit the extraction manually
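
The ‘switch auto-detect results’ scenarios above describe a counter with an upper limit. The Python sketch below models that behaviour so the expected outcomes are easy to check. `AutoDetectSession`, `MAX_TRIALS` and the method names are hypothetical and are not part of Octoparse, and the sketch assumes the counter starts at 0 and that the sixth switch is the one that triggers the alert.

```python
MAX_TRIALS = 5  # limit taken from the scenarios above


class TrialLimitReached(Exception):
    """Raised when the user tries to switch results beyond the limit."""


class AutoDetectSession:
    def __init__(self):
        self.trial = 0  # number of switches performed so far (assumption)

    def switch_results(self):
        """Move to the next auto-detect result and return the trial number."""
        if self.trial >= MAX_TRIALS:
            raise TrialLimitReached(
                "Maximum of 5 auto-detect trials reached; "
                "edit the extraction manually."
            )
        self.trial += 1
        return self.trial


session = AutoDetectSession()
for _ in range(MAX_TRIALS):
    session.switch_results()      # the first five switches succeed

try:
    session.switch_results()      # the sixth switch is rejected
except TrialLimitReached as alert:
    print(alert)
```

Writing the rule down this way also exposes the boundary cases worth testing: the first switch, the fifth switch and the first switch past the limit.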



Feature: Edit the task


Scenario: Edit the task manually

Given: The auto detect data is displayed as a table

When: The user clicks on + and ‘select an element on the page’ and clicks on the required element

Then: The selected element should get added to the extraction table


Scenario: Edit the layout

Given: Data has been extracted and displayed as a table

When: The user wants to edit the layout

Then: The user should be able to edit the created workflow with the help of an edit button and a more button


Scenario: Rearrange the layout

Given: Data has been extracted and displayed as a table

When: The user wants to sort the layout by rearranging the columns

Then: The user should be able to drag and drop the columns to rearrange the layout


Scenario: Edit the elements

Given: Data has been extracted and displayed as a table

When: The user clicks on the edit button on an element (column)

Then: The user should be able to edit the name of the element (the title of the column) in the workflow table


Scenario: Other edit features

Given: Data has been extracted and displayed as a table

When: The user clicks on the more button on an element (column)

Then: The user should be able to perform the following actions: customize the XPath, customize the fields, clean data, combine data, choose what happens when data cannot be found, delete, copy


Scenario: Delete individual data

Given: Data has been extracted and displayed as a table

When: The user clicks on the delete button of a particular dataset (row)

Then: The corresponding dataset (row) should be deleted





Now let’s see the test cases for extracting the results of a laptop search on the BestBuy website using Octoparse.


Feature: Extracting data for a laptop search on the BestBuy website


Scenario: Start the extraction for a laptop search on BestBuy

Given: The user is logged into the app and is on the home page

When: The user enters the URL (used for the laptop search on BestBuy) and clicks on the ‘Start’ button

Then: The page is loaded with all the available laptops and is ready for auto detect or a manual extraction


Scenario: Auto detect of the web page

Given: The web page of the laptop search has been loaded

When: The user clicks on the auto detect web page from the ‘Tips’ pop up

Then: A table should be populated with certain elements from the search, in this case the name of the laptop, the price of the laptop, the model of the laptop, etc.


Scenario: Add price element to the table

Given: The page has been fetched

When: The user clicks on the price element and then on ‘Extract the text of the selected element’ in the ‘Tips’ pop up.

Then: The price element should be added to the table.


Scenario: Add model number element to the table

Given: The page has been fetched

When: The user clicks on the model number element and then on ‘Extract the text of the selected element’ in the ‘Tips’ pop up.

Then: The model number element should be added to the table.


Scenario: Add SKU element to the table

Given: The page has been fetched

When: The user clicks on the SKU element and then on ‘Extract the text of the selected element’ in the ‘Tips’ pop up.

Then: The SKU element should be added to the table.


Scenario: Edit the layout

Given: A table is populated with certain elements of the search by auto detection

When: The user clicks on the edit button on a column of the extracted table

Then: The user should be able to edit the column name, e.g. ‘Model_Name’


Scenario: Delete from the layout

Given: A table is populated with certain elements of the search by auto detection

When: The user clicks on the delete button on a column of the extracted table

Then: The user should be able to delete the unwanted columns, e.g. delete the review number column


Scenario: Create the workflow

Given: The table is populated with the needed data

When: The user clicks on the ‘Create workflow’ button

Then: The workflow should be populated on the left navigation bar


Scenario: Save the workflow

Given: The workflow has been created

When: The user clicks on the ‘Save’ button

Then: The user should be able to save the workflow


Scenario: Run the workflow

Given: The workflow has been saved

When: The user clicks on the ‘Run’ button

Then: A pop up should be displayed with the options ‘Run on your device’, ‘Schedule (local)’, ‘Run in the cloud’ and ‘Schedule (cloud)’


Scenario: Run on your device

Given: The user has clicked on the ‘Run’ button and a pop up is displayed with the options ‘Run on your device’, ‘Schedule (local)’, ‘Run in the cloud’ and ‘Schedule (cloud)’

When: The user clicks on the ‘Run on your device’ button

Then: A pop up should be displayed with all the extracted data in the user defined layout


Scenario: Stop the run

Given: Data is being extracted and displayed in the pop up

When: The user clicks on the ‘Stop Run’ button and then on ‘Yes’ in the confirmation pop up

Then: The system should stop the run and display options to save or export the data


Scenario: Export the data

Given: The data has been extracted

When: The user clicks on the ‘Export the data’ button

Then: The data should be exported based on the option selected, e.g. ‘Export to Spreadsheet’ will save an Excel file with all the extracted data.
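
The layout and export scenarios above amount to renaming a column, dropping an unwanted one and writing the rows to a file. A minimal Python sketch of that idea follows, using CSV as a spreadsheet-friendly format. The sample rows, the original column names and the `apply_layout` and `export_csv` helpers are made up for illustration; the files Octoparse itself produces may look different.

```python
import csv
import io

# Made-up rows standing in for an extracted table.
rows = [
    {"Title": "Laptop A", "Price": "$299.99", "Reviews": "120"},
    {"Title": "Laptop B", "Price": "$549.00", "Reviews": "87"},
]


def apply_layout(rows, rename=None, drop=()):
    """Return new rows with columns renamed and unwanted columns removed."""
    rename = rename or {}
    return [
        {rename.get(key, key): value
         for key, value in row.items() if key not in drop}
        for row in rows
    ]


def export_csv(rows, stream):
    """Write the rows to a text stream as CSV with a header line."""
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


table = apply_layout(rows, rename={"Title": "Model_Name"}, drop=("Reviews",))
buffer = io.StringIO()   # swap for open("laptops.csv", "w", newline="")
export_csv(table, buffer)
print(buffer.getvalue())
```

A test for the export scenario can then compare the header line and the row count of the exported file with the table shown in the tool.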



Now let’s see the test cases for detecting the data of an HP laptop.


Feature: Detecting data from a specific URL


Scenario: Detect the title of the laptop

Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’

When: Detecting the title

Then: Show ‘HP - 14" Laptop - AMD Athlon - 4GB Memory - 128GB SSD - Jet Black’


Scenario: Detect the model of the laptop

Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’

When: Detecting the model of the laptop

Then: Show ‘14-dk1013dx’


Scenario: Detect the SKU of the laptop

Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’

When: Detecting the SKU of the laptop

Then: Show ‘6450167’


Scenario: Detect the price of the laptop

Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’

When: Detecting the price of the laptop

Then: Show ‘$299.99’


Scenario: Detect the rating of the laptop

Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’

When: Detecting the rating of the laptop

Then: Show ‘4.2’
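
The five detection scenarios above share one page and differ only in the field and the expected value, so they can be written as a single data-driven check. The Python sketch below shows one way to do that. The `detected` dictionary is a hand-written stand-in for whatever the scraping tool returns, and `normalize` and `mismatches` are hypothetical helpers. The expected values are the ones listed in the scenarios and may no longer match the live page.

```python
from decimal import Decimal

# Expected values copied from the scenarios above.
EXPECTED = {
    "title": 'HP - 14" Laptop - AMD Athlon - 4GB Memory - 128GB SSD - Jet Black',
    "model": "14-dk1013dx",
    "sku": "6450167",
    "price": "$299.99",
    "rating": "4.2",
}


def normalize(field, value):
    """Strip whitespace; compare prices and ratings as numbers."""
    value = value.strip()
    if field in ("price", "rating"):
        return Decimal(value.replace("$", "").replace(",", ""))
    return value


def mismatches(detected, expected=EXPECTED):
    """Return the fields whose detected value differs from the expected one."""
    return [
        field
        for field, want in expected.items()
        if field not in detected
        or normalize(field, detected[field]) != normalize(field, want)
    ]


# Stand-in for tool output: same data with slightly different formatting.
detected = dict(EXPECTED, price=" $299.990 ", rating="4.20")
print(mismatches(detected))                       # no differences expected
print(mismatches(dict(detected, sku="0000000")))  # only the SKU differs
```

Comparing prices and ratings as numbers keeps the check from failing on harmless formatting differences such as trailing zeros or stray spaces.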



