XPath Tester Guide: Axes, Predicates & Browser Tips

June 11, 2026 • 10 min read

XPath (XML Path Language) is the standard query language for selecting nodes in XML and HTML documents. Whether you are writing web scrapers, processing XML feeds, or debugging DOM structure, understanding XPath will save you hours. Use the XPath Tester to experiment with every example in this guide interactively.

This guide covers the concepts that trip up developers most often: the axis system, predicate filtering, the performance pitfalls of //, and running live XPath queries inside the browser console.

Absolute vs Relative Paths

XPath expressions are either absolute or relative. An absolute path starts from the document root with a single /:

<!-- Absolute: starts from root -->
/html/body/div/ul/li

<!-- Relative: starts from the context node -->
.//li

<!-- Relative: <span> children of the <div> children of the context node -->
div/span

Absolute paths are brittle — adding one wrapper element in the HTML breaks the selector. Prefer relative paths anchored to a stable landmark element whenever possible.
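One way to apply this in the browser is to locate the landmark element once and pass it as the context node to document.evaluate(), so that a relative expression such as .//li is resolved against that element rather than the document root. The sketch below is a minimal illustration under stated assumptions: the helper name selectAll and the id "results" in the usage comment are made up for this example.

```javascript
// Hypothetical helper: evaluate an XPath expression relative to a context node
// and return the matching nodes as a plain array.
function selectAll(expression, contextNode) {
  const snapshot = document.evaluate(
    expression,
    contextNode,                          // relative paths start from this node
    null,                                 // no namespace resolver
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Usage in a browser console (the id "results" is an assumed landmark):
// const landmark = document.getElementById('results');
// const items = selectAll('.//li', landmark);
```

Because the expression is relative, wrapper elements added above the landmark do not affect it.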

The `//` Descendant-or-Self Pitfall

The // shorthand expands to /descendant-or-self::node()/. It is convenient but carries two risks:

Unexpected matches. //div matches every <div> in the entire document, including deeply nested ones you did not intend to select.
Performance cost. On large documents, // can force the engine to visit every node in the subtree it starts from. Anchoring to a known parent element or using a more specific axis narrows the search and is often noticeably faster.

<!-- Avoid: scans the whole document -->
//table//td

<!-- Better: scoped to a known parent -->
//div[@id="results"]//td

<!-- Most specific: explicit child steps below a known node (assumes a <tbody>, which browsers insert when parsing HTML tables) -->
//div[@id="results"]/table/tbody/tr/td

XPath Axes

An axis defines the direction of traversal relative to the context node. Every XPath step has the form axis::nodetest[predicate]. The abbreviated syntax (child:: is implied, .. means parent::node()) hides the axis but it is always there.

Axis	Selects	Example
`child`	Direct children of context node	`child::li` or `li`
`descendant`	All descendants (children, grandchildren, …)	`descendant::span`
`descendant-or-self`	Context node + all descendants	`descendant-or-self::div` (`//` abbreviates `/descendant-or-self::node()/`)
`parent`	Direct parent of context node	`parent::div` or `..`
`ancestor`	All ancestors up to root	`ancestor::section`
`ancestor-or-self`	Context node + all ancestors	`ancestor-or-self::article`
`following-sibling`	Siblings after context node	`following-sibling::li`
`preceding-sibling`	Siblings before context node	`preceding-sibling::dt`
`following`	All nodes after the context node in document order, excluding its descendants	`following::h2`
`preceding`	All nodes before the context node in document order, excluding its ancestors	`preceding::h2`
`self`	Context node itself	`self::div` or `.`
`attribute`	Attributes of context node	`attribute::href` or `@href`
`namespace`	Namespace nodes	`namespace::xsl`

The following-sibling axis is particularly useful in HTML tables and definition lists where you need to select the cell next to a labelled header:

<!-- Select the <dd> that follows the <dt> containing "Price" -->
//dt[normalize-space(text())="Price"]/following-sibling::dd[1]

Predicates

Predicates appear in square brackets […] and filter the node-set produced by the axis. You can stack multiple predicates; they are evaluated left to right, each one filtering the result of the one before it.

<!-- By position (1-based) -->
//ul/li[1]          <!-- first item of each <ul> -->
//ul/li[last()]     <!-- last item of each <ul> -->
//ul/li[position() <= 3]  <!-- first three items of each <ul> -->

<!-- By attribute presence -->
//a[@href]

<!-- By attribute value -->
//a[@href="https://example.com"]
//input[@type="submit"]

<!-- Partial attribute match (contains) -->
//div[contains(@class, "card")]

<!-- Multiple predicates (AND) -->
//input[@type="text"][@required]

<!-- By child element existence -->
//li[a]

<!-- By text content -->
//button[text()="Submit"]
//h2[contains(text(), "XPath")]
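When the text to match comes from a variable rather than being typed by hand, quoting needs care: XPath 1.0 string literals have no escape character, so a value that contains both quote styles has to be assembled with concat(). The following is a minimal sketch of that idea; the helper names xpathLiteral and buttonByText are hypothetical and not part of any library.

```javascript
// Hypothetical helper: turn an arbitrary JavaScript string into an XPath 1.0
// string literal. A value containing both quote styles is built with concat().
function xpathLiteral(value) {
  if (!value.includes('"')) return `"${value}"`;
  if (!value.includes("'")) return `'${value}'`;
  // Split on double quotes and re-join the pieces with a quoted '"' between them.
  const separator = ", '\"', ";
  const parts = value.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(separator)})`;
}

// Hypothetical builder for the "by text content" pattern shown above.
function buttonByText(label) {
  return `//button[text()=${xpathLiteral(label)}]`;
}

console.log(buttonByText('Submit'));
console.log(xpathLiteral(`a"b'c`));
```

The resulting string can be passed to any XPath 1.0 engine as-is.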

When you need case-insensitive matching in XPath 1.0, combine translate() with contains() because there is no lower-case() function in the 1.0 spec:

//div[contains(
  translate(@class, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                    "abcdefghijklmnopqrstuvwxyz"),
  "highlight"
)]
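Typing the two alphabets by hand is error-prone, so it can help to generate the predicate. The sketch below is a minimal example with a hypothetical function name (containsIgnoreCase); it assumes the search term contains no double quotes and that only ASCII letters need folding, since translate() maps just the characters it is given.

```javascript
const UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const LOWER = 'abcdefghijklmnopqrstuvwxyz';

// Hypothetical builder: case-insensitive contains() for XPath 1.0.
// `target` is an XPath operand such as "@class" or "."; `needle` is assumed
// to contain no double quotes.
function containsIgnoreCase(target, needle) {
  const folded = `translate(${target}, "${UPPER}", "${LOWER}")`;
  return `contains(${folded}, "${needle.toLowerCase()}")`;
}

// Builds the same predicate as the hand-written example above.
console.log(`//div[${containsIgnoreCase('@class', 'Highlight')}]`);
```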

Selecting Attributes and Text

Attributes are selected with the @ shorthand for the attribute:: axis. Text nodes are selected with text() or the string-value functions.

Goal	XPath	Notes
Get `href` value	`//a/@href`	Returns attribute node
Get `src` of images	`//img/@src`
Get raw text node	`//p/text()`	Only direct text, not descendants
Get all text (including nested)	`string(//p)`	String value of the first matching `<p>` (all of its descendant text, concatenated)
Trim whitespace	`normalize-space(//h1)`	Collapses whitespace runs
Substring extraction	`substring(//span, 1, 5)`	1-based indexing

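A common mistake is using text() when the element contains mixed content (text interleaved with child elements like <strong> or <em>). There, text() selects each direct text node separately and skips the text inside the child elements, and functions that expect a string, such as contains(text(), "XPath"), look only at the first of those nodes. Use normalize-space(.) or string(.) to work with the full string value of the element.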

XPath for HTML Scraping vs Strict XML

XPath was designed for well-formed XML, but most web scraping targets HTML, which is frequently malformed. Differences to keep in mind:

Case sensitivity. XML element names are case-sensitive. //DIV and //div are different in XML. HTML parsers typically normalise tag names to lowercase, so //div works reliably for HTML scraping.
Namespaces. XHTML documents use the http://www.w3.org/1999/xhtml namespace. HTML-mode parsers (the browser, lxml with HTML mode) usually let you write unprefixed element names, but in a namespace-aware XML parse XPath 1.0 treats an unprefixed name as belonging to no namespace, so every element step needs a prefix bound to that namespace.
Self-closing tags. <br/> and <img/> are legal in XML. HTML5 parsers handle them, but an XML parser will fail on an unclosed <br>.
class attribute. In CSS you write .card to match a class. XPath has no class selector, so the usual approximation is contains(@class, "card"), but be careful: that also matches class="postcard". A more precise match is contains(concat(' ', normalize-space(@class), ' '), ' card '), which compares whole class tokens.
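The whole-token class match is easy to mistype, so one option is to generate it. The sketch below pairs a hypothetical builder (hasClass) with a plain-JavaScript mirror of the same test (matchesClassToken) that shows why padding with spaces rules out "postcard". Both names are assumptions for this example, and the token is assumed to be a simple class name without quotes or spaces.

```javascript
// Hypothetical builder: XPath predicate that matches one whole class token.
function hasClass(token) {
  return `contains(concat(" ", normalize-space(@class), " "), " ${token} ")`;
}

// Plain-JavaScript mirror of the same logic: collapse whitespace, pad the
// attribute value with spaces, then look for the space-delimited token.
function matchesClassToken(classAttr, token) {
  const collapsed = classAttr.split(/[ \t\r\n]+/).filter(Boolean).join(' ');
  return ` ${collapsed} `.includes(` ${token} `);
}

console.log(`//div[${hasClass('card')}]`);
console.log(matchesClassToken('card featured', 'card')); // true
console.log(matchesClassToken('postcard', 'card'));      // false
```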

If you are comparing XPath with CSS selectors, check the CSS Selector Tester for side-by-side experimentation. For pattern-matching within text values, the concepts in our Regex Cheatsheet complement XPath predicates well.

Running XPath in the Browser

Every modern browser exposes document.evaluate(), which executes XPath 1.0 against the live DOM. This is ideal for debugging selectors before putting them in a scraper.

// Basic usage: get a single node
const result = document.evaluate(
  '//h1',                         // XPath expression
  document,                       // context node (root)
  null,                           // namespace resolver
  XPathResult.FIRST_ORDERED_NODE_TYPE,
  null
);
console.log(result.singleNodeValue?.textContent);

// Get all matching nodes as a snapshot
const snapshot = document.evaluate(
  '//a[@href]',
  document,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);
for (let i = 0; i < snapshot.snapshotLength; i++) {
  console.log(snapshot.snapshotItem(i).href);
}

// Evaluate a boolean (useful for existence checks)
const hasForm = document.evaluate(
  'boolean(//form[@id="login"])',
  document,
  null,
  XPathResult.BOOLEAN_TYPE,
  null
).booleanValue;

// Shorthand via $x() in the DevTools console
$x('//h2[contains(text(),"XPath")]')

The $x() helper is a console utility provided by browser DevTools (Chrome, Edge, Firefox, and Safari all include it), not part of any web standard. It works only when typed into the console; page scripts and scrapers have to call document.evaluate() directly.
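Expressions built around functions such as normalize-space() or count() return a string or a number rather than nodes, and document.evaluate() exposes those through XPathResult.STRING_TYPE and XPathResult.NUMBER_TYPE. A minimal sketch follows; the wrapper names xpathString and xpathNumber are hypothetical conveniences, not browser APIs.

```javascript
// Hypothetical wrappers around document.evaluate() for non-node results.
function xpathString(expression, contextNode = document) {
  return document.evaluate(
    expression, contextNode, null, XPathResult.STRING_TYPE, null
  ).stringValue;
}

function xpathNumber(expression, contextNode = document) {
  return document.evaluate(
    expression, contextNode, null, XPathResult.NUMBER_TYPE, null
  ).numberValue;
}

// Usage in a browser console:
// xpathString('normalize-space(//h1)');  // heading text with whitespace collapsed
// xpathNumber('count(//a[@href])');      // number of links that carry an href
```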

Common XPath Functions

Function	Type	Description
`text()`	Node test	Selects text nodes
`normalize-space(s)`	String	Strips leading/trailing whitespace, collapses internal runs
`contains(s, sub)`	Boolean	True if string `s` contains `sub`
`starts-with(s, pre)`	Boolean	True if `s` starts with `pre`
`string-length(s)`	Number	Character count
`substring(s, start, len)`	String	Extracts substring; 1-based index
`translate(s, from, to)`	String	Character-by-character replacement (used for case folding)
`count(nodeset)`	Number	Number of nodes in the set
`position()`	Number	1-based position within context
`last()`	Number	Size of context node-list
`not(expr)`	Boolean	Logical negation
`boolean(obj)`	Boolean	Converts to boolean (empty nodeset = false)
`string(obj)`	String	String value of a node or expression
`number(obj)`	Number	Numeric conversion
`concat(s1, s2, …)`	String	Concatenates two or more strings
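Of these functions, translate() is the one whose behaviour most often surprises people, because it maps characters one by one instead of replacing substrings. The following plain-JavaScript sketch mirrors that behaviour as an aid to understanding; xpathTranslate is a hypothetical name and the function is not part of any XPath engine.

```javascript
// Plain-JavaScript sketch of XPath 1.0 translate(s, from, to):
// each character of `s` found in `from` is replaced by the character at the
// same index in `to`, or removed when `to` has no character at that index.
function xpathTranslate(s, from, to) {
  const fromChars = Array.from(from);
  const toChars = Array.from(to);
  let out = '';
  for (const ch of s) {
    const index = fromChars.indexOf(ch);   // first occurrence in `from` wins
    if (index === -1) {
      out += ch;                           // not listed: keep unchanged
    } else if (index < toChars.length) {
      out += toChars[index];               // listed with a counterpart: replace
    }
    // listed without a counterpart: the character is dropped
  }
  return out;
}

console.log(xpathTranslate('bar', 'abc', 'ABC'));      // "BAr"
console.log(xpathTranslate('--aaa--', 'abc-', 'ABC')); // "AAA"
```

The second call shows the removal case: "-" appears in the from string with no counterpart, so every hyphen is dropped.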

Real-World Examples

Here are common patterns you will encounter when scraping or processing documents:

<!-- Sample HTML fragment used in the examples below -->
<article id="post-42">
  <h2 class="post-title">Getting Started with XPath</h2>
  <p class="byline">By <a href="/authors/jane">Jane Smith</a> on <time datetime="2026-06-11">June 11</time></p>
  <ul class="tags">
    <li><a href="/tag/xml">XML</a></li>
    <li><a href="/tag/xpath">XPath</a></li>
    <li><a href="/tag/scraping">Scraping</a></li>
  </ul>
</article>

<!-- Get the article title -->
//article[@id="post-42"]/h2/text()

<!-- Get the author link URL -->
//p[@class="byline"]/a/@href

<!-- Get the machine-readable date -->
//time/@datetime

<!-- Get all tag labels -->
//ul[@class="tags"]/li/a/text()

<!-- Get all tag hrefs -->
//ul[@class="tags"]/li/a/@href

<!-- Get the tag that comes after "XML" -->
//a[text()="XML"]/parent::li/following-sibling::li[1]/a/text()

<!-- Count tags -->
count(//ul[@class="tags"]/li)

<!-- Check if "XPath" tag exists -->
boolean(//ul[@class="tags"]//a[text()="XPath"])
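To see how these expressions fit together in a script, the sketch below collects several of them into one object using document.evaluate(). It is an illustrative example only: the function name extractPost and the field names are assumptions, and doc is assumed to be a Document that contains the sample fragment (for instance the page's own document).

```javascript
// Hypothetical extractor for the sample <article> above.
function extractPost(doc) {
  const str = (expr) =>
    doc.evaluate(expr, doc, null, XPathResult.STRING_TYPE, null).stringValue;
  const num = (expr) =>
    doc.evaluate(expr, doc, null, XPathResult.NUMBER_TYPE, null).numberValue;
  const texts = (expr) => {
    const snap = doc.evaluate(
      expr, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null
    );
    const values = [];
    for (let i = 0; i < snap.snapshotLength; i++) {
      values.push(snap.snapshotItem(i).textContent);
    }
    return values;
  };

  return {
    title: str('string(//article[@id="post-42"]/h2)'),
    authorUrl: str('string(//p[@class="byline"]/a/@href)'),
    date: str('string(//time/@datetime)'),
    tags: texts('//ul[@class="tags"]/li/a'),
    tagCount: num('count(//ul[@class="tags"]/li)'),
  };
}

// Usage in a browser console on a page containing the fragment:
// console.log(extractPost(document));
```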

Best Practices

Anchor to stable IDs or roles. Prefer //section[@id='main']//p over a deeply nested absolute path. IDs change less often than DOM structure.
Avoid positional predicates on dynamic lists. //ul/li[3] breaks if an item is inserted. Filter by content or attribute instead: //li[contains(@class,'selected')].
Scope your // selectors. Write //nav//a instead of //a to avoid matching every link on the page.
Use normalize-space() for text comparisons. Whitespace in HTML is unpredictable. normalize-space(.)='Submit' is more robust than text()='Submit'.
Test incrementally. Build your expression one step at a time in the XPath Tester to verify each axis and predicate before adding the next.
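To make the normalize-space() advice concrete, here is a plain-JavaScript sketch of what the function does to a string. The name xpathNormalizeSpace is hypothetical. The sketch treats only space, tab, carriage return and line feed as whitespace, which is how XPath 1.0 defines it, so a non-breaking space (from &nbsp;) is left untouched and can still cause a comparison to fail.

```javascript
// Plain-JavaScript sketch of XPath 1.0 normalize-space():
// collapse runs of space, tab, CR and LF into one space, then strip the ends.
function xpathNormalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

console.log(xpathNormalizeSpace('  Submit \n') === 'Submit');         // true
console.log(xpathNormalizeSpace('Save   and\tclose'));                // "Save and close"
console.log(xpathNormalizeSpace('\u00a0Submit') === 'Submit');        // false
```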

Ready to put these patterns into practice? Open the XPath Tester and paste in any XML or HTML document to run every expression from this guide against your own data in seconds — no browser console required.