Web scraping is the process of extracting data from a website. It is generally automated by a bot or web crawler that runs at scheduled times. Once the required data is extracted, it can be parsed, searched, filtered or reformatted, and then exported to a database, spreadsheet, etc. Typical applications include price change monitoring, weather data monitoring and website change detection.
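As a rough illustration of the extract, filter and export steps described above, here is a minimal Python sketch that uses only the standard library. The HTML fragment, the class names "name" and "price", and the helper names (ItemParser, scrape, filter_rows, to_csv) are all made up for this example; a real page would need its own selectors and a network fetch.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment, invented for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="item"><span class="name">Widget A</span><span class="price">19.99</span></li>
  <li class="item"><span class="name">Widget B</span><span class="price">5.49</span></li>
</ul>
"""


class ItemParser(HTMLParser):
    """Collects the text of <span class="name"> / <span class="price"> pairs."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None    # field whose text we are waiting for
        self._current = {}    # row being assembled

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()
            self._field = None
            if len(self._current) == 2:   # both fields seen: row is complete
                self.rows.append(self._current)
                self._current = {}


def scrape(html):
    """Extract: turn raw HTML into a list of row dictionaries."""
    parser = ItemParser()
    parser.feed(html)
    return parser.rows


def filter_rows(rows, max_price):
    """Filter: keep only rows at or below the given price."""
    return [r for r in rows if float(r["price"]) <= max_price]


def to_csv(rows):
    """Export: serialise the rows as CSV text (spreadsheet-friendly)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = scrape(SAMPLE_HTML)
cheap = filter_rows(rows, max_price=10)
csv_text = to_csv(cheap)
```

Tools such as Octoparse wrap these same stages behind a point-and-click interface, which is what the scenarios below exercise.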
Let’s examine the Gherkin test cases for one such tool, Octoparse.
Feature: Loading and logging into the scraper tool (e.g., Octoparse)
Scenario: Opening the Octoparse app
Given: The app is downloaded
When: The user clicks on the Octoparse app
Then: The Octoparse app opens and loads
Scenario: Registering into the Octoparse app
Given: The Octoparse login page is open and the user is not already registered
When: The user clicks on ‘Sign up for Free’
Then: The Octoparse app opens a sign-up page to get new login details
Scenario: Signing up with new login
Given: The Sign up pop up is open
When: The user enters correct details to sign up and clicks on the submit button
Then: The user is registered to the app and logged in
Scenario: Logging into the Octoparse app
Given: The Octoparse login page is open, and the user is already registered
When: The user enters the correct username and password and clicks on the Login button
Then: The user is logged into the Octoparse app and the home page is displayed
Scenario: Incorrect Login details
Given: The Octoparse login page is open and the user is already registered
When: The user enters incorrect username or password and clicks on the Login button
Then: An error message is displayed: ‘Invalid credentials, please try again’
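The scenarios in this post use a plain-text convention with a colon after each keyword (Given:, When:, Then:). Standard Gherkin runners such as Cucumber or behave expect step keywords without a trailing colon, so the sketch below does not rely on them. Instead it shows one possible way to parse this colon style into structured records for review or reporting. The function name parse_scenarios and the dictionary layout are assumptions made for this example.

```python
def parse_scenarios(text):
    """Parse colon-style scenario text into a list of dictionaries.

    Each 'Scenario:' line starts a new record; the following
    'Given:', 'When:' and 'Then:' lines fill in its fields.
    """
    scenarios = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so colons inside the step text survive.
        keyword, _, rest = line.partition(":")
        keyword, rest = keyword.strip(), rest.strip()
        if keyword == "Scenario":
            scenarios.append({"name": rest, "Given": None, "When": None, "Then": None})
        elif keyword in ("Given", "When", "Then") and scenarios:
            scenarios[-1][keyword] = rest
    return scenarios


SAMPLE = """\
Scenario: Opening the Octoparse app
Given: The app is downloaded
When: The user clicks on the Octoparse app
Then: The Octoparse app opens and loads
"""

parsed = parse_scenarios(SAMPLE)
```

A record with a missing Given, When or Then (left as None) is an easy way to spot incomplete scenarios.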
Feature: ‘New’ button features
Scenario: ‘New’ button click options
Given: The user is on the home page
When: The user clicks on the ‘New’ button
Then: The user should be able to select from the options ‘Advanced Mode’, ‘Task Template’, ‘Import’, ‘Tasks’, ‘Create a new group’
Feature: Creating a new custom task
Scenario: Starting a new custom task
Given: The user is on the home page
When: The user enters the URL (generated with the search keyword) and clicks on the ‘Start’ button
Then: Octoparse should start loading the page along with processing the data extraction
Scenario: Data extraction from a new custom task
Given: The URL is entered and the start button is clicked
When: The page is loaded completely
Then: Octoparse should display the extracted data with some preselected elements as a table
Scenario: Turn off or cancel auto detect
Given: The page is loading, and the data is being extracted through the auto detect feature
When: The user clicks on the ‘Turn off auto detect’ or ‘Cancel auto detect’ button
Then: The auto detection should stop and the data should not be extracted
Scenario: Creating a workflow with the Auto detection
Given: The extracted data is displayed as a table
When: All the data needed by the user is available in the displayed table
Then: The user should be able to proceed by creating the workflow (saving the settings)
Scenario: Add a page scroll
Given: The page is loaded and the table with auto detection results has been populated
When: The user wants to add a page scroll to the extraction
Then: The user should be able to select a checkbox ‘add a page scroll’
Scenario: Edit a page scroll setup
Given: The page is loaded and the table with auto detection results has been populated
When: The user wants to edit the page scroll setup
Then: The user should be able to edit the repeats, wait time, etc. of a page scroll
Scenario: Edit the pagination setup
Given: The page is loaded and the table with auto detection results has been populated
When: The user wants to edit the pagination setup
Then: The user should be able to edit the pagination setup
Scenario: Switch the auto-detect results at least 5 times
Given: The extracted data is displayed as a table
When: The user needs more elements to be extracted apart from the auto detection data
Then: The user should be able to switch auto-detect results at least 5 times
Scenario: Switch the auto-detect results button
Given: The user needs more elements to be extracted apart from the auto detection data
When: The user clicks on ‘switch auto-detect results’
Then: A new table with a different set of elements should be extracted and the auto-detect result trial number should be increased by 1
Scenario: Exceeding the maximum number of auto-detect switches
Given: The user needs more elements to be extracted apart from the auto detection data
When: The user clicks on ‘switch auto-detect results’ more than 5 times
Then: An alert should be displayed informing the user that auto-detect has already been tried the maximum of 5 times and that the extraction can be edited manually
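The three switching scenarios above describe a counter with an upper limit. The following Python sketch models that behaviour so the expected outcomes are easy to see. It is a simplified stand-in, not Octoparse's implementation: the class name AutoDetectSession, the alert wording, and the choice to count switches from zero are assumptions.

```python
class AutoDetectSession:
    """Toy model of the 'switch auto-detect results' limit (assumed to be 5)."""

    def __init__(self, max_switches=5):
        self.max_switches = max_switches
        self.switches = 0     # number of successful switches so far
        self.alert = None     # set once the limit has been exceeded

    def switch_results(self):
        """Return True if a new result set is shown, False if the limit is hit."""
        if self.switches >= self.max_switches:
            # Hypothetical alert text; the real wording may differ.
            self.alert = (
                "Maximum number of auto-detect trials reached; "
                "edit the extraction manually."
            )
            return False
        self.switches += 1    # trial number increases by 1 per switch
        return True


session = AutoDetectSession()
outcomes = [session.switch_results() for _ in range(6)]
```

Here the first five clicks succeed and the sixth triggers the alert, which matches the "at least 5 times" and "more than 5 times" scenarios.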
Feature: Edit the task
Scenario: Edit the task manually
Given: The auto detect data is displayed as a table
When: The user clicks on + and ‘select an element on the page’ and clicks on the required element
Then: The selected element should get added to the extraction table
Scenario: Edit the layout
Given: Data has been extracted and displayed as a table
When: The user wants to edit the layout
Then: The user should be able to edit the created workflow with the help of an edit button and a more button
Scenario: Rearrange the layout
Given: Data has been extracted and displayed as a table
When: The user wants to sort the layout by rearranging the columns
Then: The user should be able to drag and drop the columns to rearrange the layout
Scenario: Edit the elements
Given: Data has been extracted and displayed as a table
When: The user clicks on the edit button on an element (column)
Then: The user should be able to edit the name of the element (the title of the column) in the workflow table
Scenario: Other edit features
Given: Data has been extracted and displayed as a table
When: The user clicks on the more button on an element (column)
Then: The user should be able to perform the following actions: customize the XPath, customize the fields, clean data, combine data, choose what happens when data cannot be found, delete and copy
Scenario: Delete individual data
Given: Data has been extracted and displayed as a table
When: The user clicks on the delete button of a particular dataset (row)
Then: The corresponding dataset (row) should be deleted
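The editing scenarios above (rename a column, rearrange columns, delete a row) can be pictured as operations on a small in-memory table. The Python sketch below is only a model for reasoning about expected results. The class ExtractionTable, its method names and the placeholder values are assumptions made for this example.

```python
class ExtractionTable:
    """Toy model of the extracted-data table: column names plus rows of values."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def rename_column(self, old, new):
        """Edit the element name (the column title)."""
        self.columns[self.columns.index(old)] = new

    def move_column(self, name, new_index):
        """Rearrange the layout, as a drag and drop would."""
        old_index = self.columns.index(name)
        self.columns.insert(new_index, self.columns.pop(old_index))
        for row in self.rows:
            row.insert(new_index, row.pop(old_index))

    def delete_row(self, index):
        """Delete an individual dataset (row)."""
        del self.rows[index]


# Placeholder column names and values, invented for illustration.
table = ExtractionTable(["Field1", "Field2"], [["a1", "b1"], ["a2", "b2"]])
table.rename_column("Field1", "Model_Name")
table.move_column("Field2", 0)
table.delete_row(1)
```

After these three operations the table has the columns Field2 and Model_Name, in that order, and a single remaining row.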
Now let's see the test cases for extracting the results of a laptop search on the BestBuy website using Octoparse.
Feature: Extracting data for a laptop search on the BestBuy website
Scenario: Start the extraction for a laptop search on BestBuy
Given: The user is logged into the app and is on the home page
When: The user enters the URL (used for the laptop search on BestBuy) and clicks on the ‘Start’ button
Then: The page is loaded with all the available laptops and is ready for auto detect or a manual extraction
Scenario: Auto detect of the web page
Given: The web page of the laptop search has been loaded
When: The user clicks on the auto detect web page from the ‘Tips’ pop up
Then: A table should be populated with certain elements from the search, in this case the name, price, model, etc. of each laptop
Scenario: Add price element to the table
Given: The page has been fetched
When: The user clicks on the price element and ‘Extract the text of the selected element’ from the ‘Tips’ pop up.
Then: The price element should be added to the table.
Scenario: Add model number element to the table
Given: The page has been fetched
When: The user clicks on the model number element and ‘Extract the text of the selected element’ from the ‘Tips’ pop up.
Then: The model number element should be added to the table.
Scenario: Add SKU element to the table
Given: The page has been fetched
When: The user clicks on the SKU element and ‘Extract the text of the selected element’ from the ‘Tips’ pop up.
Then: The SKU element should be added to the table.
Scenario: Edit the layout
Given: A table is populated with certain elements of the search by auto detection
When: The user clicks on the edit button on a column of the extracted table
Then: The user should be able to edit the column name, e.g., ‘Model_Name’
Scenario: Delete from the layout
Given: A table is populated with certain elements of the search by auto detection
When: The user clicks on the delete button on a column of the extracted table
Then: The user should be able to delete the unwanted columns, e.g., delete the review number column
Scenario: Create the workflow
Given: The table is populated with the needed data
When: The user clicks on the ‘Create workflow’ button
Then: The workflow should be populated on the left navigation bar
Scenario: Save the workflow
Given: The workflow has been created
When: The user clicks on the ‘Save’ button
Then: The user should be able to save the workflow
Scenario: Run the workflow
Given: The workflow has been saved
When: The user clicks on the ‘Run’ button
Then: A pop-up should be displayed with the options ‘Run on your device’, ‘Schedule (local)’, ‘Run in the cloud’ and ‘Schedule (cloud)’
Scenario: Run on your device
Given: The ‘Run’ button has been clicked and a pop-up is displayed with the options ‘Run on your device’, ‘Schedule (local)’, ‘Run in the cloud’ and ‘Schedule (cloud)’
When: The user clicks on ‘Run on your device’ button
Then: A pop-up should be displayed with all the extracted data in the user-defined layout
Scenario: Stop the run
Given: Data is being extracted and displayed in the pop-up
When: The user clicks on the ‘Stop Run’ button and ‘Yes’ on the confirmation pop-up
Then: The system should stop the run and display options to save or export the data
Scenario: Export the data
Given: The data has been extracted
When: The user clicks on the ‘Export the data’ button
Then: The data should be exported based on the option selected, e.g., ‘Export to Spreadsheet’ will save an Excel file with all the extracted data.
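The workflow scenarios above imply an order: create, save, run, stop, export. One way to check that a test sequence respects this order is a small state machine, sketched below in Python. The state and action names are assumptions chosen for this example, and the model is deliberately simplified: it covers only the ‘Run on your device’ path and treats export as possible only after a stopped run, whereas the real application may allow other paths (for example exporting after a run completes on its own).

```python
from enum import Enum, auto


class State(Enum):
    TABLE_READY = auto()   # table is populated with the needed data
    CREATED = auto()       # workflow created
    SAVED = auto()         # workflow saved
    RUNNING = auto()       # extraction running on the device
    STOPPED = auto()       # run stopped by the user
    EXPORTED = auto()      # data exported


# Allowed (state, action) pairs and the state each one leads to.
TRANSITIONS = {
    (State.TABLE_READY, "create_workflow"): State.CREATED,
    (State.CREATED, "save"): State.SAVED,
    (State.SAVED, "run_on_device"): State.RUNNING,
    (State.RUNNING, "stop_run"): State.STOPPED,
    (State.STOPPED, "export"): State.EXPORTED,
}


def apply_action(state, action):
    """Return the next state, or raise ValueError for a disallowed action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed in state {state.name}") from None


def run_sequence(actions, start=State.TABLE_READY):
    """Apply a list of actions in order and return the final state."""
    state = start
    for action in actions:
        state = apply_action(state, action)
    return state


final = run_sequence(["create_workflow", "save", "run_on_device", "stop_run", "export"])
```

Calling apply_action with an out-of-order step, such as "export" before the workflow exists, raises ValueError, which mirrors a scenario whose Given condition is not met.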
Now let's see the test cases for detecting the data of an HP laptop.
Feature: Detecting data from a specific URL
Scenario: Detect the title of the laptop
Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’
When: Detecting the title
Then: Show ‘HP - 14" Laptop - AMD Athlon - 4GB Memory - 128GB SSD - Jet Black’
Scenario: Detect the model of the laptop
Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’
When: Detecting the model of the laptop
Then: Show ‘14-dk1013dx’
Scenario: Detect the SKU of the laptop
Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’
When: Detecting the SKU of the laptop
Then: Show ‘6450167’
Scenario: Detect the price of the laptop
Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’
When: Detecting the price of the laptop
Then: Show ‘$299.99’
Scenario: Detect the rating of the laptop
Given: User is on ‘https://www.bestbuy.com/site/hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p?skuId=6450167’
When: Detecting the rating of the laptop
Then: Show ‘4.2’
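The detection scenarios above amount to a table of expected values for one product page. The Python sketch below collects those values (copied from the scenarios) and adds two small helpers: one compares a dictionary of detected values against the expectations, and one reads the SKU from the skuId query parameter of the URL as a cross-check. The helper names are assumptions, and no page is actually fetched; the detected values would come from the scraper under test. Note that price and rating change over time on a live site, so these expectations may need updating.

```python
from urllib.parse import urlparse, parse_qs

PRODUCT_URL = (
    "https://www.bestbuy.com/site/"
    "hp-14-laptop-amd-athlon-4gb-memory-128gb-ssd-jet-black/6450167.p"
    "?skuId=6450167"
)

# Expected values, taken from the scenarios above.
EXPECTED = {
    "title": 'HP - 14" Laptop - AMD Athlon - 4GB Memory - 128GB SSD - Jet Black',
    "model": "14-dk1013dx",
    "sku": "6450167",
    "price": "$299.99",
    "rating": "4.2",
}


def sku_from_url(url):
    """Read the SKU from the 'skuId' query parameter of a product URL."""
    return parse_qs(urlparse(url).query)["skuId"][0]


def mismatches(detected, expected=EXPECTED):
    """Return {field: (detected, expected)} for every field that differs."""
    return {
        field: (detected.get(field), value)
        for field, value in expected.items()
        if detected.get(field) != value
    }


url_sku = sku_from_url(PRODUCT_URL)
```

An empty result from mismatches means every Then step in the feature would pass for the detected data.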