XPath Tester Guide: Axes, Predicates & Browser Tips
XPath (XML Path Language) is the standard query language for selecting nodes in XML and HTML documents. Whether you are writing web scrapers, processing XML feeds, or debugging DOM structure, understanding XPath will save you hours. Use the XPath Tester to experiment with every example in this guide interactively.
This guide covers the concepts that trip up developers most often: the axis system, predicate filtering, the performance pitfalls of //, and running live XPath queries inside the browser console.
Absolute vs Relative Paths
XPath expressions are either absolute or relative. An absolute path starts from the document root with a single /:
<!-- Absolute: starts from root -->
/html/body/div/ul/li
<!-- Relative: starts from the context node -->
.//li
<!-- Relative: selects child elements of current node -->
div/span
Absolute paths are brittle: adding one wrapper element in the HTML breaks the selector. Prefer relative paths anchored to a stable landmark element whenever possible.
The // Descendant-or-Self Pitfall
The // shorthand expands to /descendant-or-self::node()/. It is convenient but carries two risks:
- Unexpected matches. //div matches every <div> in the entire document, including deeply nested ones you did not intend to select.
- Performance cost. On large documents the engine may have to traverse every node in the tree. Using a more specific axis or anchoring to a parent element is often much faster.
<!-- Avoid: scans the whole document -->
//table//td
<!-- Better: scoped to a known parent -->
//div[@id="results"]//td
<!-- Best: explicit child steps from a known node -->
//div[@id="results"]/table/tbody/tr/td
XPath Axes
An axis defines the direction of traversal relative to the context node. Every XPath step has the form axis::nodetest[predicate]. The abbreviated syntax (child:: is implied, .. means parent::node()) hides the axis but it is always there.
| Axis | Selects | Example |
|---|---|---|
| child | Direct children of context node | child::li or li |
| descendant | All descendants (children, grandchildren, …) | descendant::span |
| descendant-or-self | Context node + all descendants | .//span (abbreviated) |
| parent | Direct parent of context node | parent::div or .. |
| ancestor | All ancestors up to root | ancestor::section |
| ancestor-or-self | Context node + all ancestors | ancestor-or-self::article |
| following-sibling | Siblings after context node | following-sibling::li |
| preceding-sibling | Siblings before context node | preceding-sibling::dt |
| following | All nodes after context node in document order, excluding its descendants | following::h2 |
| preceding | All nodes before context node in document order, excluding its ancestors | preceding::h2 |
| self | Context node itself | self::div or . |
| attribute | Attributes of context node | attribute::href or @href |
| namespace | Namespace nodes | namespace::xsl |
The following-sibling axis is particularly useful in HTML tables and definition lists where you need to select the cell next to a labelled header:
<!-- Select the <dd> that follows the <dt> containing "Price" -->
//dt[normalize-space(text())="Price"]/following-sibling::dd[1]
Predicates
Predicates appear in square brackets […] and filter the node-set produced by the axis. You can stack multiple predicates; they are evaluated left to right.
<!-- By position (1-based) -->
//ul/li[1] <!-- first item -->
//ul/li[last()] <!-- last item -->
//ul/li[position() <= 3] <!-- first three items -->
<!-- By attribute presence -->
//a[@href]
<!-- By attribute value -->
//a[@href="https://example.com"]
//input[@type="submit"]
<!-- Partial attribute match (contains) -->
//div[contains(@class, "card")]
<!-- Multiple predicates (AND) -->
//input[@type="text"][@required]
<!-- By child element existence -->
//li[a]
<!-- By text content -->
//button[text()="Submit"]
//h2[contains(text(), "XPath")]
When you need case-insensitive matching in XPath 1.0, combine translate() with contains() because there is no lower-case() function in the 1.0 spec:
//div[contains(
  translate(@class, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                    "abcdefghijklmnopqrstuvwxyz"),
  "highlight"
)]
Selecting Attributes and Text
Attributes are selected with the @ shorthand for the attribute:: axis. Text nodes are selected with text() or the string-value functions.
| Goal | XPath | Notes |
|---|---|---|
| Get href value | //a/@href | Returns attribute node |
| Get src of images | //img/@src | |
| Get raw text node | //p/text() | Only direct text, not descendants |
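| Get all text (including nested) | string(//p) | String value of the first matching <p>: all its descendant text nodes concatenated |
| Trim whitespace | normalize-space(//h1) | Trims both ends and collapses whitespace runs (first matching <h1> only) |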
| Substring extraction | substring(//span, 1, 5) | 1-based indexing |
A common mistake is using text() when the element contains mixed content (text interleaved with child elements like <strong> or <em>). text() selects only the element's direct text-node children, and in XPath 1.0 string functions such as contains() look at just the first of them. Use normalize-space(.) or string(.) to work with the full string value of the element.
XPath for HTML Scraping vs Strict XML
XPath was designed for well-formed XML, but most web scraping targets HTML which is frequently malformed. Differences to keep in mind:
- Case sensitivity. XML element names are case-sensitive. //DIV and //div are different in XML. HTML parsers typically normalise tag names to lowercase, so //div works reliably for HTML scraping.
- Namespaces. XHTML documents use the http://www.w3.org/1999/xhtml namespace. Scrapers that parse via a DOM (the browser, lxml with HTML mode) usually strip namespaces, but with a strict XML parser an XPath 1.0 expression needs a namespace prefix on every element step.
- Self-closing tags. <br/> and <img/> are legal in XML. HTML5 parsers handle them, but an XML parser will fail on an unclosed <br>.
- class attribute. In CSS you write .card to match a class. In XPath the usual shortcut is contains(@class, "card"), but be careful: it also matches class="postcard". A more precise match is: contains(concat(' ',@class,' '), ' card ').
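The class-token idiom from the last bullet is easy to mistype, so it can help to generate it. The sketch below is only an illustration: hasClass and classTokenMatches are hypothetical helper names, not part of any library. It also wraps @class in normalize-space(), an addition to the expression above, so that tabs or newlines between class names behave like single spaces.

```javascript
// Hypothetical helper: build an XPath 1.0 predicate that matches one whole
// class token, similar in spirit to the CSS selector .name.
function hasClass(name) {
  // Refuse whitespace and quote characters so the generated literal stays valid.
  if (!/^[^\s'"]+$/.test(name)) {
    throw new Error(`Unsupported class name: ${name}`);
  }
  return `contains(concat(' ', normalize-space(@class), ' '), ' ${name} ')`;
}

// Plain-JavaScript mirror of the same test, useful for seeing why the padding
// spaces matter. It applies the XPath normalize-space() rules by hand.
function classTokenMatches(classAttr, name) {
  const normalized = classAttr
    .replace(/[ \t\r\n]+/g, " ") // collapse whitespace runs
    .replace(/^ | $/g, "");      // trim one leading/trailing space
  return ` ${normalized} `.includes(` ${name} `);
}

console.log(`//div[${hasClass("card")}]`);
console.log("postcard".includes("card"));          // true: the naive substring test over-matches
console.log(classTokenMatches("postcard", "card")); // false
console.log(classTokenMatches("big card", "card")); // true
```

Padding both the attribute value and the wanted token with spaces turns a substring test into a whole-token test, which is why "postcard" no longer matches.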
If you are comparing XPath with CSS selectors, check the CSS Selector Tester for side-by-side experimentation. For pattern-matching within text values, the concepts in our Regex Cheatsheet complement XPath predicates well.
Running XPath in the Browser
Every modern browser exposes document.evaluate() which executes XPath 1.0 against the live DOM. This is ideal for debugging selectors before putting them in a scraper.
// Basic usage: get a single node
const result = document.evaluate(
'//h1', // XPath expression
document, // context node (root)
null, // namespace resolver
XPathResult.FIRST_ORDERED_NODE_TYPE,
null
);
console.log(result.singleNodeValue?.textContent);
// Get all matching nodes as a snapshot
const snapshot = document.evaluate(
'//a[@href]',
document,
null,
XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
null
);
for (let i = 0; i < snapshot.snapshotLength; i++) {
console.log(snapshot.snapshotItem(i).href);
}
// Evaluate a boolean (useful for existence checks)
const hasForm = document.evaluate(
'boolean(//form[@id="login"])',
document,
null,
XPathResult.BOOLEAN_TYPE,
null
).booleanValue;
// Shorthand via $x() in the DevTools console
$x('//h2[contains(text(),"XPath")]')
The $x() helper is a console utility, not part of any web standard. It is available in the DevTools consoles of Chrome, Edge, and Firefox, but not in page scripts. In your own code, call document.evaluate() directly.
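The snapshot loop shown earlier is repetitive enough that a small wrapper is often worthwhile. The following is a minimal sketch, not a standard API: xpathAll is a hypothetical name, and it assumes the context node is either a Document or a node attached to one. Passing an element as the context node together with a relative expression such as .//td keeps the query scoped to that element.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, spelled out so the
// helper itself does not depend on the XPathResult global.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: evaluate an XPath 1.0 expression relative to
// contextNode and return the matching nodes as a plain array.
function xpathAll(expression, contextNode) {
  // A Document has ownerDocument === null, so fall back to the node itself.
  const doc = contextNode.ownerDocument ?? contextNode;
  const snapshot = doc.evaluate(
    expression,
    contextNode,
    null, // no namespace resolver
    ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Browser-only usage; skipped when no DOM is available.
if (typeof document !== "undefined") {
  const hrefs = xpathAll("//a[@href]", document).map((a) => a.href);
  console.log(hrefs);
}
```

Returning an array means the usual map and filter methods work on the result, which a raw XPathResult does not offer.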
Common XPath Functions
| Function | Type | Description |
|---|---|---|
| text() | Node test | Selects text nodes |
| normalize-space(s) | String | Strips leading/trailing whitespace, collapses internal runs |
| contains(s, sub) | Boolean | True if string s contains sub |
| starts-with(s, pre) | Boolean | True if s starts with pre |
| string-length(s) | Number | Character count |
| substring(s, start, len) | String | Extracts substring; 1-based index |
| translate(s, from, to) | String | Character-by-character replacement (used for case folding) |
| count(nodeset) | Number | Number of nodes in the set |
| position() | Number | 1-based position within context |
| last() | Number | Size of context node-list |
| not(expr) | Boolean | Logical negation |
| boolean(obj) | Boolean | Converts to boolean (empty nodeset = false) |
| string(obj) | String | String value of a node or expression |
| number(obj) | Number | Numeric conversion |
| concat(s1, s2, …) | String | Concatenates two or more strings |
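The translate() case-folding idiom combines two functions from this table and is tedious to type by hand. As a rough sketch, a small builder can generate it; containsIgnoreCase is a hypothetical helper name, and like the idiom itself it only folds the ASCII letters A to Z.

```javascript
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: build a case-insensitive contains() test for XPath 1.0.
// target is an XPath expression such as "@class" or "."; needle is plain text.
function containsIgnoreCase(target, needle) {
  if (needle.includes('"')) {
    throw new Error("This sketch does not handle double quotes in the needle");
  }
  // Lowercase only A-Z so the needle is folded exactly the way translate() folds the target.
  const folded = needle.replace(/[A-Z]/g, (ch) => ch.toLowerCase());
  return `contains(translate(${target}, "${UPPER}", "${LOWER}"), "${folded}")`;
}

console.log(`//div[${containsIgnoreCase("@class", "Highlight")}]`);
```

Non-ASCII letters pass through unchanged on both sides, so accented characters are still compared case-sensitively.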
Real-World Examples
Here are common patterns you will encounter when scraping or processing documents:
<!-- Sample HTML fragment used in the examples below -->
<article id="post-42">
<h2 class="post-title">Getting Started with XPath</h2>
<p class="byline">By <a href="/authors/jane">Jane Smith</a> on <time datetime="2026-06-11">June 11</time></p>
<ul class="tags">
<li><a href="/tag/xml">XML</a></li>
<li><a href="/tag/xpath">XPath</a></li>
<li><a href="/tag/scraping">Scraping</a></li>
</ul>
</article>
<!-- Get the article title -->
//article[@id="post-42"]/h2/text()
<!-- Get the author link URL -->
//p[@class="byline"]/a/@href
<!-- Get the machine-readable date -->
//time/@datetime
<!-- Get all tag labels -->
//ul[@class="tags"]/li/a/text()
<!-- Get all tag hrefs -->
//ul[@class="tags"]/li/a/@href
<!-- Get the tag that comes after "XML" -->
//a[text()="XML"]/parent::li/following-sibling::li[1]/a/text()
<!-- Count tags -->
count(//ul[@class="tags"]/li)
<!-- Check if "XPath" tag exists -->
boolean(//ul[@class="tags"]//a[text()="XPath"])
Best Practices
- Anchor to stable IDs or roles. Prefer //section[@id='main']//p over a deeply nested absolute path. IDs change less often than DOM structure.
- Avoid positional predicates on dynamic lists. //ul/li[3] breaks if an item is inserted. Filter by content or attribute instead: //li[contains(@class,'selected')].
- Scope your // selectors. Write //nav//a instead of //a to avoid matching every link on the page.
- Use normalize-space() for text comparisons. Whitespace in HTML is unpredictable. normalize-space(.)='Submit' is more robust than text()='Submit'.
- Test incrementally. Build your expression one step at a time in the XPath Tester to verify each axis and predicate before adding the next.
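When the text to compare comes from a variable rather than a hard-coded label, the normalize-space() comparison has to be assembled as a string. The sketch below shows one way to do that; xpathLiteral and textEquals are hypothetical helper names. XPath 1.0 string literals have no escape sequences, so a value containing both quote characters is built with concat().

```javascript
// Hypothetical helper: quote an arbitrary string as an XPath 1.0 literal.
function xpathLiteral(value) {
  if (!value.includes('"')) return `"${value}"`;
  if (!value.includes("'")) return `'${value}'`;
  // Both quote kinds are present: split on double quotes and rejoin the
  // pieces with a single-quoted '"' literal between them.
  const separator = ", '\"', ";
  const parts = value.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(separator)})`;
}

// Hypothetical helper: whitespace-tolerant text comparison for use in a predicate.
function textEquals(label) {
  return `normalize-space(.)=${xpathLiteral(label)}`;
}

console.log(`//button[${textEquals("Submit")}]`); // //button[normalize-space(.)="Submit"]
console.log(xpathLiteral(`It's "ok"`));           // concat("It's ", '"', "ok", '"', "")
```

Building the literal in one place keeps quoting mistakes out of individual selectors.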
Ready to put these patterns into practice? Open the XPath Tester and paste in any XML or HTML document to run every expression from this guide against your own data in seconds — no browser console required.