XPath Tester Guide: Axes, Predicates & Browser Tips

XPath (XML Path Language) is the standard query language for selecting nodes in XML and HTML documents. Whether you are writing web scrapers, processing XML feeds, or debugging DOM structure, understanding XPath will save you hours. Use the XPath Tester to experiment with every example in this guide interactively.

This guide covers the concepts that trip up developers most often: the axis system, predicate filtering, the performance pitfalls of //, and running live XPath queries inside the browser console.

Absolute vs Relative Paths

XPath location paths are either absolute or relative. An absolute path starts at the document root with a leading /; a relative path starts at the context node:

<!-- Absolute: starts from root -->
/html/body/div/ul/li

<!-- Relative: starts from the context node -->
.//li

<!-- Relative: <span> children of <div> children of the context node -->
div/span

Absolute paths are brittle: adding one wrapper element to the HTML breaks the selector. Prefer relative paths anchored to a stable landmark element whenever possible.

The // Descendant-or-Self Pitfall

The // shorthand expands to /descendant-or-self::node()/. It is convenient but carries two risks:

  • Unexpected matches. //div matches every <div> in the entire document, including deeply nested ones you did not intend to select.
  • Performance cost. On large documents the engine may have to visit every node in the tree. Anchoring the search to a known parent element, or using a more specific axis, usually narrows the work considerably.

<!-- Avoid: searches the whole document for tables, then every table for cells -->
//table//td

<!-- Better: the cell search is limited to one container -->
//div[@id="results"]//td

<!-- Most specific: child steps only after the anchor (assumes a <tbody> is present, which browsers insert but raw markup may omit) -->
//div[@id="results"]/table/tbody/tr/td
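The same scoping idea can be applied from browser JavaScript. The following is a minimal sketch, assuming a container with id "results" as in the expressions above: the container is looked up once and a relative path is evaluated with that element as the context node, so the engine should only need to walk its subtree. The helper name cellsWithin is hypothetical.

```javascript
// Hypothetical helper: collect the trimmed text of every <td> under one container.
// `container` is any element node; its ownerDocument provides evaluate().
function cellsWithin(container) {
  const doc = container.ownerDocument;
  // Numeric value of XPathResult.ORDERED_NODE_ITERATOR_TYPE.
  const ORDERED_NODE_ITERATOR_TYPE = 5;
  const iterator = doc.evaluate(
    './/td',      // relative path: only descendants of `container`
    container,    // context node
    null,         // no namespace resolver needed for an HTML document
    ORDERED_NODE_ITERATOR_TYPE,
    null
  );
  const texts = [];
  for (let node = iterator.iterateNext(); node; node = iterator.iterateNext()) {
    texts.push(node.textContent.trim());
  }
  return texts;
}

// Browser usage (assumes the page contains <div id="results">):
// cellsWithin(document.getElementById('results'));
```

Note that the leading dot matters: without it, //td would be resolved from the document root again, regardless of the context node passed in.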

XPath Axes

An axis defines the direction of traversal relative to the context node. Every XPath step has the form axis::nodetest[predicate]. The abbreviated syntax (child:: is implied, .. means parent::node()) hides the axis but it is always there.

| Axis | Selects | Example |
| --- | --- | --- |
| child | Direct children of the context node | child::li or li |
| descendant | All descendants (children, grandchildren, …) | descendant::span |
| descendant-or-self | Context node + all descendants | descendant-or-self::div (// abbreviates /descendant-or-self::node()/) |
| parent | Direct parent of the context node | parent::div or .. |
| ancestor | All ancestors up to the root | ancestor::section |
| ancestor-or-self | Context node + all ancestors | ancestor-or-self::article |
| following-sibling | Siblings after the context node | following-sibling::li |
| preceding-sibling | Siblings before the context node | preceding-sibling::dt |
| following | All nodes after the context node in document order, excluding its descendants | following::h2 |
| preceding | All nodes before the context node in document order, excluding its ancestors | preceding::h2 |
| self | Context node itself | self::div or . |
| attribute | Attributes of the context node | attribute::href or @href |
| namespace | Namespace nodes of the context node (deprecated in XPath 2.0) | namespace::xsl |

The following-sibling axis is particularly useful in HTML tables and definition lists where you need to select the cell next to a labelled header:

<!-- Select the <dd> that follows the <dt> containing "Price" -->
//dt[normalize-space(text())="Price"]/following-sibling::dd[1]

Predicates

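Predicates appear in square brackets […] and filter the node-set selected by the axis and node test. You can stack multiple predicates; they are applied left to right, and each one filters the output of the one before it, so position() is recounted at every stage.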

<!-- By position (1-based, counted separately within each parent) -->
//ul/li[1]          <!-- first <li> of each <ul> -->
//ul/li[last()]     <!-- last <li> of each <ul> -->
//ul/li[position() <= 3]  <!-- first three <li> of each <ul> -->

<!-- By attribute presence -->
//a[@href]

<!-- By attribute value -->
//a[@href="https://example.com"]
//input[@type="submit"]

<!-- Partial attribute match (contains) -->
//div[contains(@class, "card")]

<!-- Multiple predicates (AND) -->
//input[@type="text"][@required]

<!-- By child element existence -->
//li[a]

<!-- By text content -->
//button[text()="Submit"]
//h2[contains(text(), "XPath")]
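When the text in a predicate comes from a variable rather than being typed by hand, quoting becomes an issue: XPath 1.0 string literals have no escape character, so a value cannot contain the quote that delimits it. A common workaround is to fall back to concat(). The sketch below shows one way to do that in JavaScript; the helper names xpathLiteral and buttonByText are hypothetical.

```javascript
// Hypothetical helper: turn any JS string into a valid XPath 1.0 string expression.
function xpathLiteral(value) {
  if (!value.includes('"')) return `"${value}"`; // no double quotes: wrap in "..."
  if (!value.includes("'")) return `'${value}'`; // no single quotes: wrap in '...'
  // Both quote types present: split on " and rejoin the pieces with concat(),
  // inserting a single-quoted " between them.
  const parts = value.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// Hypothetical helper: build a text predicate from arbitrary input.
function buttonByText(label) {
  return `//button[normalize-space(.)=${xpathLiteral(label)}]`;
}

// buttonByText('Submit') builds: //button[normalize-space(.)="Submit"]
```

The resulting string can be passed to any XPath 1.0 engine, including document.evaluate() in the browser.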

When you need case-insensitive matching in XPath 1.0, combine translate() with contains() because there is no lower-case() function in the 1.0 spec:

//div[contains(
  translate(@class, "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                    "abcdefghijklmnopqrstuvwxyz"),
  "highlight"
)]
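Typing both alphabets by hand is error-prone, so it can help to generate the expression. The following sketch builds the translate()/contains() pattern shown above; the helper name containsIgnoreCase is hypothetical, and like the pattern itself it only folds the ASCII letters A to Z.

```javascript
const UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
const LOWER = 'abcdefghijklmnopqrstuvwxyz';

// Hypothetical helper.
// `target` is an XPath sub-expression such as "@class" or ".".
// `needle` is plain text; it is lowercased here before being embedded.
function containsIgnoreCase(target, needle) {
  if (needle.includes('"')) {
    throw new Error('needle must not contain double quotes');
  }
  return `contains(translate(${target}, "${UPPER}", "${LOWER}"), "${needle.toLowerCase()}")`;
}

// Example: a case-insensitive class match.
const expression = `//div[${containsIgnoreCase('@class', 'Highlight')}]`;
```

Here expression holds the same query as the hand-written version above, on a single line.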

Selecting Attributes and Text

Attributes are selected with the @ shorthand for the attribute:: axis. Text nodes are selected with text() or the string-value functions.

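| Goal | XPath | Notes |
| --- | --- | --- |
| Get href value | //a/@href | Returns attribute node |
| Get src of images | //img/@src | |
| Get raw text node | //p/text() | Only direct child text nodes, not text inside nested elements |
| Get all text (including nested) | string(//p) | Concatenates all descendant text of the first matching <p> |
| Trim whitespace | normalize-space(//h1) | Trims both ends and collapses whitespace runs; uses the first matching <h1> |
| Substring extraction | substring(//span, 1, 5) | 1-based indexing |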

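A common mistake is using text() when the element contains mixed content (text interleaved with child elements like <strong> or <em>). In that case text() selects each direct text node separately and skips the text inside the child elements, and XPath 1.0 string functions such as contains() look only at the first node they are given. Use normalize-space(.) or string(.) to work with the full string value of the element.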

XPath for HTML Scraping vs Strict XML

XPath was designed for well-formed XML, but most web scraping targets HTML which is frequently malformed. Differences to keep in mind:

  • Case sensitivity. XML element names are case-sensitive. //DIV and //div are different in XML. HTML parsers typically normalise tag names to lowercase, so //div works reliably for HTML scraping.
  • Namespaces. XHTML documents use the http://www.w3.org/1999/xhtml namespace. HTML-mode parsers (the browser's HTML parser, lxml in HTML mode) generally let you write unprefixed names such as //div, but when the document is parsed as XML, XPath 1.0 needs a prefix bound to that namespace on every element step.
  • Self-closing tags. <br/> and <img/> are legal in XML. HTML5 parsers handle them, but an XML parser will fail on an unclosed <br>.
  • class attribute. In CSS you write .card to match a class. XPath has no class selector, so the usual substitute is contains(@class, "card"), but be careful: that also matches class="postcard". A more precise match is: contains(concat(' ',@class,' '), ' card ').
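The padded class test is easy to mistype, so a small generator can help. The sketch below is one possible version, with hypothetical helper names hasClass and matchesToken. As an addition to the expression above, it wraps @class in normalize-space() so that tabs or newlines between class names behave like single spaces. The second function repeats the padding trick in plain JavaScript to show why the surrounding spaces are needed.

```javascript
// Hypothetical helper: predicate body that matches one whole class token.
function hasClass(token) {
  if (!/^[^\s"']+$/.test(token)) {
    throw new Error('token must be a single class name without quotes');
  }
  return `contains(concat(" ", normalize-space(@class), " "), " ${token} ")`;
}

// The same idea in plain JavaScript: pad the attribute value and the token
// with spaces so that only whole tokens can match.
function matchesToken(classAttr, token) {
  const padded = ` ${classAttr.trim().split(/\s+/).join(' ')} `;
  return padded.includes(` ${token} `);
}

// Usage: select every <div> whose class list contains the token "card".
const cardSelector = `//div[${hasClass('card')}]`;
```

With this padding, "postcard wide" no longer matches the token card, while "wide card" still does.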

If you are comparing XPath with CSS selectors, check the CSS Selector Tester for side-by-side experimentation. For pattern-matching within text values, the concepts in our Regex Cheatsheet complement XPath predicates well.

Running XPath in the Browser

Every modern browser exposes document.evaluate() which executes XPath 1.0 against the live DOM. This is ideal for debugging selectors before putting them in a scraper.

// Basic usage: get a single node
const result = document.evaluate(
  '//h1',                         // XPath expression
  document,                       // context node (root)
  null,                           // namespace resolver
  XPathResult.FIRST_ORDERED_NODE_TYPE,
  null
);
console.log(result.singleNodeValue?.textContent);

// Get all matching nodes as a snapshot
const snapshot = document.evaluate(
  '//a[@href]',
  document,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);
for (let i = 0; i < snapshot.snapshotLength; i++) {
  console.log(snapshot.snapshotItem(i).href);
}

// Evaluate a boolean (useful for existence checks)
const hasForm = document.evaluate(
  'boolean(//form[@id="login"])',
  document,
  null,
  XPathResult.BOOLEAN_TYPE,
  null
).booleanValue;

// Shorthand via $x() in the DevTools console
$x('//h2[contains(text(),"XPath")]')

The $x() helper is a console utility, not part of any web standard. Chrome, Edge, and Firefox all provide it in their DevTools consoles, but it does not exist in page scripts, so use document.evaluate() in any code that runs outside the console.
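Because $x() exists only in DevTools consoles, scripts that want the same convenience need their own wrapper around document.evaluate(). The following is a minimal sketch of such a wrapper; the name xpathAll is hypothetical, and it returns the matches as a plain array.

```javascript
// Hypothetical stand-in for $x() that works in ordinary page scripts.
// `context` may be the document itself or any element inside it.
function xpathAll(expression, context = document) {
  // A Document has ownerDocument === null, so fall back to the context itself.
  const doc = context.ownerDocument ?? context;
  // Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
  const ORDERED_NODE_SNAPSHOT_TYPE = 7;
  const snapshot = doc.evaluate(
    expression,
    context,
    null,
    ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  return Array.from(
    { length: snapshot.snapshotLength },
    (_, i) => snapshot.snapshotItem(i)
  );
}

// Browser usage:
// xpathAll('//a[@href]').map((a) => a.href);
// xpathAll('.//li', document.querySelector('nav'));
```

A snapshot result is used here because it stays valid even if the page changes while the array is being built, unlike an iterator result.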

Common XPath Functions

| Function | Type | Description |
| --- | --- | --- |
| text() | Node test | Selects text nodes |
| normalize-space(s) | String | Strips leading/trailing whitespace, collapses internal runs |
| contains(s, sub) | Boolean | True if string s contains sub |
| starts-with(s, pre) | Boolean | True if s starts with pre |
| string-length(s) | Number | Character count |
| substring(s, start, len) | String | Extracts substring; 1-based index |
| translate(s, from, to) | String | Character-by-character replacement (used for case folding) |
| count(nodeset) | Number | Number of nodes in the set |
| position() | Number | 1-based position within context |
| last() | Number | Size of context node-list |
| not(expr) | Boolean | Logical negation |
| boolean(obj) | Boolean | Converts to boolean (empty nodeset = false) |
| string(obj) | String | String value of a node or expression |
| number(obj) | Number | Numeric conversion |
| concat(s1, s2, …) | String | Concatenates two or more strings |

Real-World Examples

Here are common patterns you will encounter when scraping or processing documents:

<!-- Sample HTML fragment used in the examples below -->
<article id="post-42">
  <h2 class="post-title">Getting Started with XPath</h2>
  <p class="byline">By <a href="/authors/jane">Jane Smith</a> on <time datetime="2026-06-11">June 11</time></p>
  <ul class="tags">
    <li><a href="/tag/xml">XML</a></li>
    <li><a href="/tag/xpath">XPath</a></li>
    <li><a href="/tag/scraping">Scraping</a></li>
  </ul>
</article>

<!-- Get the article title -->
//article[@id="post-42"]/h2/text()

<!-- Get the author link URL -->
//p[@class="byline"]/a/@href

<!-- Get the machine-readable date -->
//time/@datetime

<!-- Get all tag labels -->
//ul[@class="tags"]/li/a/text()

<!-- Get all tag hrefs -->
//ul[@class="tags"]/li/a/@href

<!-- Get the tag that comes after "XML" -->
//a[text()="XML"]/parent::li/following-sibling::li[1]/a/text()

<!-- Count tags -->
count(//ul[@class="tags"]/li)

<!-- Check if "XPath" tag exists -->
boolean(//ul[@class="tags"]//a[text()="XPath"])
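In a browser script, several of these expressions can be combined into one extraction step. The following is a rough sketch, assuming the sample fragment above is part of the loaded page; the function name extractPost and the shape of the returned object are hypothetical, and postId is assumed to contain no double quotes.

```javascript
// Hypothetical extractor for the sample fragment above.
// `doc` is a Document (or any object exposing the same evaluate() method).
function extractPost(doc, postId) {
  const STRING_TYPE = 2;                // value of XPathResult.STRING_TYPE
  const ORDERED_NODE_SNAPSHOT_TYPE = 7; // value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  const base = `//article[@id="${postId}"]`;

  // Evaluate a path under the article and return its trimmed string value.
  const str = (path) =>
    doc.evaluate(`normalize-space(${base}${path})`, doc, null, STRING_TYPE, null)
      .stringValue;

  // Collect every tag link under the article as a snapshot.
  const tagNodes = doc.evaluate(
    `${base}//ul[@class="tags"]/li/a`, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null
  );
  const tags = [];
  for (let i = 0; i < tagNodes.snapshotLength; i++) {
    tags.push(tagNodes.snapshotItem(i).textContent.trim());
  }

  return {
    title: str('/h2'),
    authorUrl: str('//p[@class="byline"]/a/@href'),
    date: str('//time/@datetime'),
    tags,
  };
}

// Browser usage: extractPost(document, 'post-42');
```

Every path is prefixed with the article anchor, so a second article on the same page would not leak into the result.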

Best Practices

  • Anchor to stable IDs or roles. Prefer //section[@id='main']//p over a deeply nested absolute path. IDs change less often than DOM structure.
  • Avoid positional predicates on dynamic lists. //ul/li[3] breaks if an item is inserted. Filter by content or attribute instead: //li[contains(@class,'selected')].
  • Scope your // selectors. Write //nav//a instead of //a to avoid matching every link on the page.
  • Use normalize-space() for text comparisons. Whitespace in HTML is unpredictable. normalize-space(.)='Submit' is more robust than text()='Submit'.
  • Test incrementally. Build your expression one step at a time in the XPath Tester to verify each axis and predicate before adding the next.

Ready to put these patterns into practice? Open the XPath Tester and paste in any XML or HTML document to run every expression from this guide against your own data in seconds, with no browser console required.