
Unleashing the Power of XPath


Introduction:


XPath (XML Path Language) is a powerful query language that lets users identify and interact with specific elements, such as buttons, input fields, or links, within XML and HTML documents. It is most useful when other locators such as ID, Name, or Class name are not available or not effective.


When a web page loads, the browser builds a DOM (Document Object Model) tree from it. In Selenium, XPath is used to traverse the DOM and locate the desired elements based on their tag names, attributes, or text content.


Types of XPath Locators & Basic XPath:

Several types of locators can be used to locate elements on a web page accurately. Below are a few examples of XPath locators.


Basic XPath Expression: //tagname[@attribute='value'] or //*[@attribute='value']

XPath locator | Syntax | Example (from https://www.numpyninja.com)
ID | //tagname[@id='value'] | //div[@id="Containerc51lm"]
ClassName | //tagname[@class='value'] | //div[@class="wix-global-css"]
Name | //tagname[@name='value'] | //*[@name="format-detection"]
Link Text | //tagname[text()='value'] | //*[text()='Log In']

Here is a breakdown of the components used in the expressions above, plus a few more:

Component | Description
Single forward slash (/) | Selects from the root node when it starts an expression. Between two steps, it moves to the direct children of the current node.
Double forward slash (//) | Selects matching nodes at any depth, anywhere in the document when it starts an expression, or anywhere below the current node when it appears between two steps.
tagname | The HTML tag of the element to locate. For example, to find a <div> element, replace tagname with div.
@ | Selects an attribute of the given node.
[@attribute='value'] | Within the square brackets, specify the attribute and the value it must match. For example, to find a <div> element whose class attribute has the value "myClass", replace attribute with class and value with myClass.
Asterisk (*) | Serves as a wildcard for the element name. It matches any element, regardless of its tag name.
Dot (.) | Selects the current node.
Double dot (..) | Selects the parent of the current node.
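The components above can be tried out in a few lines. This is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and evaluates paths relative to a starting element (hence the leading dot). The markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up markup for illustration only.
doc = ET.fromstring(
    "<body>"
    "<div class='myClass'><span>first</span></div>"
    "<section class='myClass'><p>second</p></section>"
    "<div class='other'><span>third</span></div>"
    "</body>"
)

# Single slash: only direct <div> children of the current node.
direct_divs = doc.findall("./div")

# Double slash: <span> elements at any depth below the current node.
all_spans = doc.findall(".//span")

# Asterisk: any tag name, so both the <div> and the <section> qualify.
any_my_class = doc.findall(".//*[@class='myClass']")

# Double dot: step from each <span> up to its parent element.
span_parents = doc.findall(".//span/..")

print([el.get("class") for el in direct_divs])
print([el.text for el in all_spans])
print([el.tag for el in any_my_class])
print([el.get("class") for el in span_parents])
```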

Different ways to write XPath expressions:


1. Absolute XPath: This is the complete path from the root element to the target element. It starts with a single forward slash (/).

For example : /html[1]/body[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/a[1]


  • /html[1]: This selects the root element of the HTML document, <html>. The [1] indicates the first <html> element at that level.

  • /html[1]/body[1]: This moves to the <body> element, which is a child of the <html> element.

  • /html[1]/body[1]/div[1]: This selects the first <div> element within the <body> element.

  • /html[1]/body[1]/div[1]/div[1]: This further narrows down to the first <div> element within the previous <div> element.

  • This pattern continues with /div[1] for each subsequent level, indicating the first <div> element at that level, until the final /a[1] selects the first <a> element inside the innermost <div>.




This absolute XPath provides the complete path from the root element to the desired link (<a>) element on the web page. However, it is important to note that an absolute path breaks easily: if the structure of the page changes or elements are rearranged, the path no longer points to the same element.
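To see why positional paths are brittle, here is a small sketch using Python's standard xml.etree.ElementTree module. Both page snippets are invented for illustration. ElementTree evaluates paths relative to the root <html> element, so ./body[1]/... plays the role of /html[1]/body[1]/... here.

```python
import xml.etree.ElementTree as ET

ORIGINAL = "<html><body><div><div><a id='login'>Log In</a></div></div></body></html>"

# The same page after a hypothetical redesign adds a banner <div> at the top of <body>.
REDESIGNED = (
    "<html><body><div class='banner'/>"
    "<div><div><a id='login'>Log In</a></div></div></body></html>"
)

POSITIONAL = "./body[1]/div[1]/div[1]/a[1]"   # position-by-position path
BY_ATTRIBUTE = ".//a[@id='login']"            # attribute-based path


def locate(markup, path):
    """Return the matched element's text, or None when nothing matches."""
    match = ET.fromstring(markup).find(path)
    return None if match is None else match.text


print(locate(ORIGINAL, POSITIONAL))      # Log In
print(locate(REDESIGNED, POSITIONAL))    # None: div[1] is now the banner
print(locate(ORIGINAL, BY_ATTRIBUTE))    # Log In
print(locate(REDESIGNED, BY_ATTRIBUTE))  # Log In
```

The positional path stops matching as soon as one element is added above the target, while the attribute-based path keeps working.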


2. Relative XPath: This is the more flexible and more commonly used approach. It starts with a double forward slash (//) and searches the entire document for the desired element, wherever it sits in the tree.


For example: //input[@id='systemid'] selects an input element with the id "systemid" anywhere in the document. Attribute values are case-sensitive, so the value in the expression must match the page exactly.


Relative XPath is a good fit when the attributes or positions of elements vary between scenarios or when the page content is dynamic. Below are the XPath predicates, axes, and functions that can be used to locate dynamic web elements.


i. Single attribute

//img[@alt='Google']

ii. Double attribute

//img[@alt='Google'][@class='lnXdpd']

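iii. Parent : A parent is the element directly above an element in the document tree. Note the single slash before the axis name: a double slash (//parent::) would also collect the parents of the element's own descendants.

//img[@alt='Google']/parent::div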

iv. Ancestor : An ancestor is any element above the current element in the document tree: its parent, its grandparent, and so on. The [1] picks the nearest matching ancestor.

//a[@class='MV3Tnb']/ancestor::div[1]

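v. Child : A child is an element directly below an element in the document tree. With a double slash before the axis, the expression would match <div> elements at any depth instead of direct children only.

//div[@class='home-page']/child::div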

vi. Following : The following axis selects elements that come after a specific element in document order, which is useful for reaching related content or acting on subsequent elements.

//a[@class='MV3Tnb']/following::input

/following::input : Selects every <input> element that appears after the <a> in document order, at any level of the tree. Descendants of the <a> itself are not included.

//a[@class='MV3Tnb']/following-sibling::div

/following-sibling::div : Selects every <div> element that comes after the <a> with class 'MV3Tnb' and shares the same parent.

vii. Preceding : The preceding axis selects nodes that appear before a given context node. It is useful for selecting elements that come before a specific point in the document.

//a[@class='MV3Tnb']/preceding::input

/preceding::input : Selects every <input> element that appears before the matched <a> in document order. Ancestors of the <a> are not included.

//a[@class='MV3Tnb']/preceding-sibling::div

/preceding-sibling::div : Selects every <div> element that comes before the matched <a> and shares the same parent.

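viii. Index based : This is useful for selecting elements by their position, particularly when they have no unique identifiers or attributes. Positions start at 1 and are counted among the matching elements under each parent. To pick the second match in the whole document, wrap the expression in parentheses: (//a[@class='MV3Tnb'])[2].

//a[@class='MV3Tnb'][2] - Selects the second matching <a> under its parent.

//a[@class='MV3Tnb'][last()] - Selects the last matching <a> under its parent.

//a[@class='MV3Tnb'][last()-1] - Selects the second-to-last matching <a> under its parent.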
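Position predicates are easy to check on a small list. The sketch below uses Python's standard xml.etree.ElementTree module with an invented menu. ElementTree counts positions among same-tag siblings and does not combine them with attribute filters the way full XPath does, so the example uses plain tag names only.

```python
import xml.etree.ElementTree as ET

# Invented menu markup for illustration.
menu = ET.fromstring(
    "<ul><li>Home</li><li>About</li><li>Store</li><li>Contact</li></ul>"
)

second = menu.find("./li[2]").text              # positions start at 1
last = menu.find("./li[last()]").text           # final <li> under <ul>
second_last = menu.find("./li[last()-1]").text  # one before the final <li>

print(second, last, second_last)  # About Contact Store
```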

ix. Text value : Text value XPath expressions target elements based on their textual content.

text() : //a[text()='About'] - Selects elements whose text is exactly "About".

contains() : //a[contains(text(),'About')] - Selects elements whose text contains "About".

starts-with() : //a[starts-with(text(),'About')] - Selects elements whose text starts with "About".

//a[contains(.,'Google')] : Selects all <a> elements whose text content contains "Google". The dot covers all text inside the element, including text in nested tags, whereas contains(text(), ...) checks only the element's first direct text node.

//a[.='Store'] : Selects <a> elements whose text content is exactly "Store". It won't select elements with extra white space or additional text.
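The exact-match behaviour can be checked with Python's standard xml.etree.ElementTree module, which supports the [.='value'] form. It does not implement contains() or starts-with(), so the sketch below uses plain Python string checks as stand-ins for those two functions. The full_text helper and the markup are made up for this example.

```python
import xml.etree.ElementTree as ET

# Invented navigation markup for illustration.
nav = ET.fromstring(
    "<nav>"
    "<a>Store</a>"
    "<a> Store </a>"
    "<a>About us</a>"
    "<a>Read <b>About</b> Google</a>"
    "</nav>"
)

# Exact match on the full text content: the padded ' Store ' link is skipped.
exact = nav.findall("./a[.='Store']")


def full_text(el):
    """Join an element's own text with all nested text, like '.' in XPath."""
    return "".join(el.itertext())


links = nav.findall("./a")

# Python stand-ins for contains(., 'About') and starts-with(., 'About').
contains_about = [a for a in links if "About" in full_text(a)]
starts_with_about = [a for a in links if full_text(a).startswith("About")]

print(len(exact), len(contains_about), len(starts_with_about))  # 1 2 1
```

The link with the nested <b> tag is found by the contains check because the whole text content is inspected, not just the first text node.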

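x. And / Or : These operators combine multiple conditions in one predicate. In XPath they are written in lowercase.

//img[@alt='Google' or @class='lnXdpd'] - Selects <img> elements that satisfy at least one of the two conditions.

//img[@alt='Google' and @class='lnXdpd'] - Selects <img> elements that satisfy both conditions.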
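The difference between the two operators shows up clearly on a small set of images. Python's standard xml.etree.ElementTree module has no and/or keywords, so this sketch expresses "and" with two chained predicates (the double-attribute form shown earlier) and emulates "or" by merging two separate lookups. The markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented markup: each <img> satisfies a different mix of the two conditions.
page = ET.fromstring(
    "<div>"
    "<img alt='Google' class='lnXdpd'/>"
    "<img alt='Google' class='thumb'/>"
    "<img alt='Doodle' class='lnXdpd'/>"
    "<img alt='Other' class='thumb'/>"
    "</div>"
)

# 'and': chaining two predicates keeps only elements that satisfy both.
both = page.findall("./img[@alt='Google'][@class='lnXdpd']")

# 'or': run each condition separately, then merge in document order.
# An element matching both conditions is still listed only once.
alt_hits = page.findall("./img[@alt='Google']")
class_hits = page.findall("./img[@class='lnXdpd']")
either = [img for img in page.findall("./img")
          if img in alt_hits or img in class_hits]

print(len(both), len(either))  # 1 3
```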

Advantages of XPath:

  • It has robust and versatile selection options to target elements based on attributes, text content, hierarchy, and more.

  • It is platform- and language-independent, which makes it versatile.

  • It can navigate complex document structures, moving between an element's parent, children, siblings, and ancestors.

  • It traverses in both directions: from parent to child and from child to parent.

  • It provides functions and operators for advanced navigation.

Disadvantages of XPath:

  • Understanding the syntax and functions may require some time and practice.

  • XPath expressions can become fragile if the structure of the document changes.

  • XPath lookups can be slower than locating elements by ID or CSS selector.

Conclusion:

XPath is a powerful tool that provides flexibility and precision when interacting with web elements. It plays a vital role in automated testing and web scraping, making it possible to navigate web pages and extract information from them efficiently.






















