
Web Scraping

Web scraping refers to the extraction of data from a website. The information is collected and then exported into a format that is more useful for the user, be it a spreadsheet or an API.

Although web scraping can be done manually, automated tools are preferred in most cases, as they are less costly and work faster.

Tools Used:

Python, Selenium, ChromeDriver

Quickstart:

Once you have downloaded both Chrome and ChromeDriver and installed the Selenium package, you should be ready to start the browser.
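A minimal launch sketch, assuming Selenium 4 and a ChromeDriver binary on your PATH (example.com is just a stand-in URL):

```python
from selenium import webdriver

# Launch Chrome (Selenium finds chromedriver on PATH, or via Selenium Manager in v4+)
driver = webdriver.Chrome()
driver.get("https://example.com")  # navigate to a page
driver.quit()                      # always close the browser when done
```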

This will launch Chrome in headful mode (a regular Chrome window, controlled by your Python code). You should see a message stating that the browser is controlled by automated software.

To run Chrome in headless mode (without any graphical user interface), for example on a server, see the following example:
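A sketch, assuming a recent Chrome that supports the new headless flag:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")          # plain "--headless" on older Chrome
options.add_argument("--window-size=1920,1080")  # give the page a realistic viewport
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
print(driver.title)  # the page loaded even though no window appeared
driver.quit()
```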

Here are two other interesting Web Driver properties:

  • driver.title gets the page's title

  • driver.current_url gets the current URL (this can be useful when there are redirections on the website and you need the final URL)
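Both properties in action (the URL here is a stand-in; example.com happens to redirect from http to https in some setups):

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://example.com")  # the site may redirect after this request
print(driver.title)               # the page's <title> text
print(driver.current_url)         # the URL after any redirects
driver.quit()
```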

Locating Elements

There are many methods available in the Selenium API to select elements on the page. You can use:

  • Tag name

  • Class name

  • IDs

  • XPath

  • CSS selectors


There are many ways to locate an element in Selenium. Let's say that we want to locate a tr tag in a page's HTML:
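Assuming a hypothetical row such as `<tr id="row-1" class="data-row">` (the page URL and attribute names below are illustrative, not from the original post), each locator strategy above can find it:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/table")  # hypothetical page containing a table

# Hypothetical target: <tr id="row-1" class="data-row"> ... </tr>
row = driver.find_element(By.TAG_NAME, "tr")               # first <tr> on the page
row = driver.find_element(By.CLASS_NAME, "data-row")       # by class name
row = driver.find_element(By.ID, "row-1")                  # by ID
row = driver.find_element(By.XPATH, "//tr[@id='row-1']")   # by XPath
row = driver.find_element(By.CSS_SELECTOR, "tr.data-row")  # by CSS selector
driver.quit()
```

`find_element` returns the first match; `find_elements` (plural) returns a list of all matches.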


A Web Element is a Selenium object representing an HTML element.

There are many actions that you can perform on those HTML elements; here are the most useful:

  • Accessing the text of the element with the property element.text

  • Clicking on the element with element.click()

  • Accessing an attribute with element.get_attribute('class')

  • Sending text to an input with element.send_keys('mypassword')

There are some other interesting methods like is_displayed(). This returns True if an element is visible to the user.
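Putting these actions together on a hypothetical login form (the URL and element IDs are illustrative assumptions):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")         # hypothetical login page

field = driver.find_element(By.ID, "password")  # hypothetical input element
if field.is_displayed():                        # True if visible to the user
    field.send_keys("mypassword")               # type into the input
print(field.get_attribute("class"))             # read an attribute

button = driver.find_element(By.ID, "submit")   # hypothetical button
print(button.text)                              # the element's text (a property)
button.click()                                  # click it
driver.quit()
```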

Executing JavaScript
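Selenium can run arbitrary JavaScript in the page with execute_script. A common use is scrolling, for example to trigger lazy-loaded content; a minimal sketch:

```python
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")

# execute_script returns whatever the script returns
height = driver.execute_script("return document.body.scrollHeight")

# Extra arguments are available inside the script as arguments[0], arguments[1], ...
driver.execute_script("window.scrollTo(0, arguments[0]);", height)  # scroll to bottom
driver.quit()
```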

Blocking images and JavaScript

With Selenium, by using the correct Chrome options, you can block some requests from being made.

This can be useful if you need to speed up your scrapers or reduce your bandwidth usage.

To do this, you need to launch Chrome with the options below:
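One way to do this is via Chrome content-settings preferences, where the value 2 means "block"; a sketch:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_experimental_option("prefs", {
    # Chrome profile preference keys; 2 = block
    "profile.managed_default_content_settings.images": 2,
    "profile.managed_default_content_settings.javascript": 2,
})
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")  # loads without images or JavaScript
driver.quit()
```

Note that blocking JavaScript will break sites that render their content client-side, so only use that option on pages you know are served as static HTML.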

Exporting to a Data Dictionary, JSON, and CSV

Observe the output in JSON and CSV format:
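A sketch of the export step using only the standard library; the records below are hypothetical stand-ins for whatever your scraper collects from the page elements:

```python
import csv
import json

# Hypothetical scraped records (your scraper would build these dictionaries
# from the elements it located)
rows = [
    {"title": "Example Domain", "url": "https://example.com"},
    {"title": "Python", "url": "https://www.python.org"},
]

# JSON: dump the list of dictionaries as-is
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(rows, f, indent=2)

# CSV: one header row, then one line per record
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
```

The CSV file opens directly in Excel, which covers the spreadsheet use case mentioned at the top of the post.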

The End
