What is Selenium and Architecture of Selenium

sirishasaripalli20

Selenium is an open-source tool for automating web browsers. Testers and developers use it to script browser interactions and run functional tests against web applications.

Selenium is a testing framework used to validate web applications across different browsers and platforms.

Selenium test scripts can be written in several languages, including Java, Python, and Ruby.

Testing done using the Selenium Testing tools is usually called Selenium Testing.


Who developed Selenium?

Selenium was first created by Jason Huggins in 2004.

Manual testing of his web application was repetitive and becoming increasingly inefficient, so he wrote a JavaScript program that would automatically control the browser's actions.

He named his program "JavaScriptTestRunner", which was later renamed Selenium Core.


Components of Selenium

Selenium is not a single tool but a suite of software, often called the Selenium Tool Suite, with each piece catering to a different testing need of an organization. The suite consists of the following tools:

Selenium Integrated Development Environment (IDE)

Selenium Remote Control (RC)

Selenium WebDriver

Selenium Grid



Selenium IDE: Shinya Kasatani of Japan created Selenium IDE, originally as a Firefox extension; the current version is available as an extension for both Firefox and Chrome. It automates the browser through a record-and-playback feature. It is the simplest tool in the Selenium suite and the easiest one to learn. Because of that simplicity, Selenium IDE is best suited to prototyping. If you want to create more advanced test cases, you will have to use WebDriver.


Selenium RC: Paul Hammant came up with the concept of Selenium RC to overcome the problems of Selenium Core. This system is also known as Selenium 1. Selenium RC was the main testing framework of the Selenium project for a long time, and it was the first tool that let users write tests in the programming language they preferred, such as Java, Python, or Ruby. After WebDriver was introduced in Selenium 2, people moved to WebDriver, and RC has been deprecated; it is no longer part of the core project from Selenium 3 onwards. Selenium 4 is the latest major version. It introduces new features and full W3C compliance.


Selenium WebDriver: Simon Stewart created WebDriver in 2006. It was the first cross-platform testing framework that could control the browser from the operating-system level. In 2008, Selenium RC and WebDriver were merged into a single framework to form Selenium 2, with WebDriver at its core. In the Java bindings, WebDriver is an interface: a type that declares a set of abstract methods without implementing them. Each supported browser has its own class, such as ChromeDriver, FirefoxDriver, and EdgeDriver, and each of those classes implements the WebDriver interface.

WebDriver is also an API: a set of classes and methods through which a test script communicates with the browser. It acts as a mediator between the client libraries and the browser.

WebDriver is the Selenium component that automates the browser. It provides commands for interacting with the elements on a web page and for performing actions (clicking buttons, entering text, navigating between pages) in a web application.


Selenium Grid: A tool that runs tests in parallel across different machines and browsers.


Architecture of Selenium WebDriver (Selenium 3 & Selenium 4)


Selenium WebDriver is a robust tool for automating web browsers. Its architecture consists of several key components:


Client Library:

The Client Library provides language-specific bindings (like Java, Python, C#, etc.) that enable developers to write Selenium scripts. These scripts interact with the WebDriver API to automate browser actions.


WebDriver API:

Acting as a mediator, the WebDriver API translates commands from Selenium scripts into a format that browser-specific drivers can understand. It defines a set of interfaces and methods for interacting with browsers.


Browser-Specific Drivers:

Browser-specific drivers (e.g., ChromeDriver, GeckoDriver) are executables provided by browser vendors. They receive commands from the WebDriver API, translate them into actions that browsers can execute, and relay back responses to the API.


Browser:

The actual web browser (e.g., Chrome, Firefox) where web pages are displayed and actions are performed during automation. It executes commands sent by the WebDriver API via the driver, interacts with web elements, executes JavaScript, and renders web pages.
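The chain described above (script, client library, driver, browser) can be made concrete by looking at what a client library sends to the driver. The following is a minimal sketch, not Selenium's own code: it only builds the HTTP requests that the W3C WebDriver protocol defines for three common commands and does not send them. The `WireRequest` class, the helper function names, and the session and element ids are hypothetical; `DRIVER_URL` assumes a driver listening on ChromeDriver's default port.

```python
import json
from dataclasses import dataclass

# Assumed address of a locally running browser driver (9515 is ChromeDriver's default port).
DRIVER_URL = "http://localhost:9515"


@dataclass
class WireRequest:
    """One HTTP request a client library would send to the browser driver."""
    method: str
    url: str
    body: str


def navigate(session_id: str, page_url: str) -> WireRequest:
    # W3C "Navigate To": POST /session/{session id}/url
    return WireRequest("POST", f"{DRIVER_URL}/session/{session_id}/url",
                       json.dumps({"url": page_url}))


def find_element(session_id: str, css_selector: str) -> WireRequest:
    # W3C "Find Element": POST /session/{session id}/element
    payload = {"using": "css selector", "value": css_selector}
    return WireRequest("POST", f"{DRIVER_URL}/session/{session_id}/element",
                       json.dumps(payload))


def click(session_id: str, element_id: str) -> WireRequest:
    # W3C "Element Click": POST /session/{session id}/element/{element id}/click
    return WireRequest(
        "POST",
        f"{DRIVER_URL}/session/{session_id}/element/{element_id}/click",
        json.dumps({}),
    )


if __name__ == "__main__":
    # Hypothetical ids; a real session id and element id come back from the driver.
    for request in (navigate("abc123", "https://example.com"),
                    find_element("abc123", "#login"),
                    click("abc123", "elem-1")):
        print(request.method, request.url, request.body)
```

In a real test you never build these requests yourself. A call such as navigating to a page or clicking an element in the client library produces them for you, and the driver turns each one into an action in the browser.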


The major difference between Selenium 3 and Selenium 4 Architecture


  • In Selenium 3, client libraries communicated with browser-specific drivers using the JSON Wire Protocol.

  • This was the architecture followed until Selenium 3.8.

  • Browser-specific drivers and their browsers were beginning to adhere to the W3C WebDriver standard while client libraries still followed the JSON Wire Protocol, so commands had to be encoded and decoded to bridge the gap between the two protocols.

  • This dual-protocol approach led to inconsistencies and instability in tests that used Selenium WebDriver.

  • Starting from Selenium 4, client libraries communicate with browser-specific drivers using the W3C WebDriver protocol.

  • This is the architecture followed from Selenium 4 onwards.

  • Since all components (browser-specific drivers, browsers, client libraries) adhere to the same WebDriver standard, no translation step is needed, and communication between them is more stable and consistent.
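To give a feel for the gap between the two protocols, here is a small sketch of two places where they differ: the body of the "new session" request and the key under which a found element's id is returned. The function names are hypothetical and the example is heavily simplified; real client libraries had to handle many more differences than these two.

```python
import json

# Key names used by each protocol when a driver returns a reference to an element.
LEGACY_ELEMENT_KEY = "ELEMENT"                           # JSON Wire Protocol
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"  # W3C WebDriver


def legacy_new_session_payload(browser_name: str) -> dict:
    # JSON Wire Protocol: capabilities are sent under "desiredCapabilities".
    return {"desiredCapabilities": {"browserName": browser_name}}


def w3c_new_session_payload(browser_name: str) -> dict:
    # W3C WebDriver: capabilities are sent under "capabilities",
    # split into "alwaysMatch" and "firstMatch".
    return {
        "capabilities": {
            "alwaysMatch": {"browserName": browser_name},
            "firstMatch": [{}],
        }
    }


def extract_element_id(response_value: dict) -> str:
    """Read an element id from either dialect, as a bridging layer would have to."""
    if W3C_ELEMENT_KEY in response_value:
        return response_value[W3C_ELEMENT_KEY]
    if LEGACY_ELEMENT_KEY in response_value:
        return response_value[LEGACY_ELEMENT_KEY]
    raise KeyError("response does not contain an element reference")


if __name__ == "__main__":
    print(json.dumps(legacy_new_session_payload("chrome")))
    print(json.dumps(w3c_new_session_payload("chrome")))
    print(extract_element_id({"ELEMENT": "7"}))
    print(extract_element_id({W3C_ELEMENT_KEY: "7"}))
```

A function like `extract_element_id` is the kind of translation that becomes unnecessary once the client library and the driver both speak only the W3C dialect.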

Conclusion:


In conclusion, Selenium WebDriver is a powerful tool for automating web browser interactions. Its architecture has evolved: Selenium 4 adopted the W3C WebDriver protocol, which simplifies communication between client libraries, browser drivers, and browsers and provides better compatibility, efficiency, and maintainability. The encoding and decoding step is gone, and the code base is simpler as a result. Selenium 4 also added to the Actions API, which covers keyboard actions such as zooming in and out and mouse operations such as drag and drop.


Happy Learning!!!




