HTML Entities: When to Escape, When Not To

HTML entity encoding is one of those topics that seems simple until you need to decide whether to encode user input going into an HTML attribute, text content, a URL parameter, or a JSON payload embedded in a <script> tag. Get it wrong and you open the door to cross-site scripting (XSS) attacks. Encode too aggressively and you break legitimate content.

This guide explains when HTML entity encoding is required for security, when it is optional, and how to safely handle data in different HTML contexts without breaking functionality or introducing vulnerabilities.

What Are HTML Entities?

HTML entities are special sequences that represent reserved characters in HTML. They start with & and end with ;.

Character  Entity Name  Entity Number  Why Encode
<          &lt;         &#60;          Starts HTML tag
>          &gt;         &#62;          Ends HTML tag
&          &amp;        &#38;          Starts entity
"          &quot;       &#34;          Attribute delimiter
'          &apos;       &#39;          Attribute delimiter

The browser automatically decodes these entities when rendering the page. If you see &lt;script&gt; in your HTML source, the browser displays <script> as plain text rather than executing it as a tag.

Context Matters: Where You Need Encoding

The golden rule: encoding requirements depend on context. HTML has multiple contexts where data can appear, and each has different escaping rules.

Context 1: HTML Text Content

Text content between tags (not inside attributes) requires encoding for <, >, and &:

<p>User comment: <script>alert('XSS')</script></p>

<!-- Vulnerable: browser executes the script -->

<p>User comment: &lt;script&gt;alert('XSS')&lt;/script&gt;</p>

<!-- Safe: browser displays as text -->

Required encoding: < → &lt;, > → &gt;, & → &amp;

Optional: Quotes (" and ') do not need encoding in text content because they have no special meaning outside attributes.

Context 2: HTML Attributes

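Attribute values require encoding for the quote character that delimits them (" or ') and for &. Always quote attribute values: in an unquoted attribute, a single space is enough to start a new attribute. The first example shows what happens when the delimiter is left unencoded: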

<input value="User input: " onclick="alert(1)">

<!-- Vulnerable: attacker injects onclick attribute -->

<input value="User input: &quot; onclick=&quot;alert(1)">

<!-- Safe: quotes are encoded, no attribute injection -->

Required encoding in double-quoted attributes: " → &quot;, & → &amp;

Required encoding in single-quoted attributes: ' → &apos; or &#39;, & → &amp;
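The snippets in this article call an htmlEncode helper without defining it. A minimal sketch of such a helper follows; the name and the choice to encode all five reserved characters everywhere (rather than a different set per context) are assumptions made for illustration, not a specific library's API.

```javascript
// Hypothetical helper: the name htmlEncode mirrors the one used in this article.
// Encoding all five characters makes the output usable in both text content
// and quoted attribute values.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function htmlEncode(value) {
  // Coerce to a string so numbers or null do not throw.
  // A single pass avoids double-encoding the & of entities produced here.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const comment = `<script>alert('XSS')</script>`;
const title = `" onclick="alert(1)`;

const page = `<p title="${htmlEncode(title)}">${htmlEncode(comment)}</p>`;
console.log(page);
```

This helper covers text content and quoted attributes only. It is not suitable for values placed inside <script> blocks or for building URLs.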

Context 3: JavaScript Inside HTML

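Embedding data in <script> tags is one of the most dangerous contexts. HTML entity encoding does not help here: the browser does not decode entities inside a <script> element, and the content is parsed as JavaScript, where entities have no meaning.

<script>
  const userName = "&lt;script&gt;alert(1)&lt;/script&gt;";
</script>

<!-- Entities inside <script> are NOT decoded: the JavaScript engine sees them as-is -->
<!-- Result: userName holds the literal text &lt;script&gt;alert(1)&lt;/script&gt; (corrupted data) -->
<!-- Entity encoding also leaves backslashes and line breaks untouched, so input can still break the string literal -->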

Safe approach: Use JSON encoding and proper escaping:

<script>
  const userName = ${JSON.stringify(userInput)};
</script>

<!-- Better: JSON.stringify escapes quotes, backslashes, and line breaks -->

However, even JSON.stringify is not enough if the string contains </script>:

<script>
  const data = "</script><script>alert(1)</script>";
</script>

<!-- Breaks out of the script tag and executes injected code -->

Fully safe approach: Escape </script> sequences:

<script>
  const data = ${JSON.stringify(userInput).replace(/</g, '\\u003c')};
</script>

<!-- Safe: < is escaped as Unicode \u003c -->
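The escaping step can be wrapped in a small function. The sketch below assumes server-side JavaScript generating the page; serializeForScript is a hypothetical name. Note that the replacement string must contain a literal backslash (written "\\u003c" in source), because "\u003c" on its own is just another way of writing <. The extra handling of U+2028 and U+2029 is an addition for older JavaScript engines that treat those characters as line terminators inside string literals.

```javascript
// Hypothetical helper: serialize a value for embedding inside a <script> block.
function serializeForScript(value) {
  const json = JSON.stringify(value);
  if (json === undefined) {
    // JSON.stringify returns undefined for undefined, functions, and symbols.
    throw new TypeError("Value cannot be serialized as JSON");
  }
  return json
    .replace(/</g, "\\u003c") // prevents </script> and <!-- from being seen by the HTML parser
    .replace(/\u2028/g, "\\u2028") // line separator
    .replace(/\u2029/g, "\\u2029"); // paragraph separator
}

const userInput = `</script><script>alert(1)</script>`;
const serialized = serializeForScript(userInput);

const html = `<script>const data = ${serialized};</script>`;
console.log(html);
```

Because \u003c is a valid escape in both JSON and JavaScript string literals, the embedded value still evaluates to the original string.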

Context 4: URL Attributes

URLs in href or src attributes require different handling:

<a href="javascript:alert(1)">Click</a>

<!-- Dangerous: executes JavaScript -->

<a href="https://example.com?q=user input">Link</a>

<!-- Requires URL encoding, not HTML entity encoding -->

For URLs, use URL encoding (percent encoding) first, then HTML entity encoding for the attribute context:

const safeUrl = "https://example.com?q=" + encodeURIComponent(userInput);
// Then embed in HTML attribute with entity encoding for quotes

Use a URL encoder to handle query string values, then apply HTML entity encoding when embedding the full URL in an attribute.
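The two layers of encoding, plus a check against javascript: links, might be combined as in the sketch below. safeHref, ALLOWED_PROTOCOLS, and the local htmlEncode are hypothetical names, and the code assumes the base URL is absolute and carries no query string of its own.

```javascript
// Hypothetical HTML encoder for quoted attribute values.
function htmlEncode(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Assumed allow-list; adjust to the schemes your application needs.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

function safeHref(base, params = {}) {
  // new URL() throws a TypeError on relative or malformed input.
  const { protocol } = new URL(base);
  if (!ALLOWED_PROTOCOLS.has(protocol)) {
    throw new Error(`Blocked URL scheme: ${protocol}`);
  }
  // Layer 1: percent-encode every key and value of the query string.
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  const url = query ? `${base}?${query}` : base;
  // Layer 2: entity-encode the finished URL for the attribute context.
  return htmlEncode(url);
}

const href = safeHref("https://example.com/search", { q: "user input", lang: "en" });
console.log(`<a href="${href}">Link</a>`);
```

The & that separates query parameters becomes &amp; in the HTML source; the browser decodes it back to & before following the link.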

XSS Prevention: Real Attack Scenarios

Attack 1: Breaking Out of Attributes

Imagine a search form that displays the user's query:

<input type="text" value="${userQuery}">

// If userQuery = "><script>alert(1)</script>
// Result:
<input type="text" value=""><script>alert(1)</script>">

Fix: Encode quotes in attribute values:

<input type="text" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">

// Safe: displayed as literal text
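One way to make this fix hard to forget is to route every interpolated value through the encoder automatically. The sketch below uses a JavaScript tagged template for that; the html tag and htmlEncode are hypothetical names, and the approach only covers text content and quoted attribute positions.

```javascript
// Hypothetical HTML encoder for text content and quoted attributes.
function htmlEncode(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical tag function: literal template parts pass through unchanged,
// every interpolated value is entity-encoded.
function html(strings, ...values) {
  return strings.reduce(
    (out, chunk, i) => out + chunk + (i < values.length ? htmlEncode(values[i]) : ""),
    ""
  );
}

const userQuery = `"><script>alert(1)</script>`;
const markup = html`<input type="text" value="${userQuery}">`;
console.log(markup);
```

The markup written by the developer stays as-is, while the attacker-controlled value can no longer close the attribute or open a tag.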

Attack 2: JSON in HTML Context

Developers often embed JSON configuration in HTML:

<script>
  const config = {"name": "${userName}"};
</script>

// If userName = Alice", "admin": true, "name": "
// Result:
const config = {"name": "Alice", "admin": true, "name": ""};

// Attacker injects JSON keys

Fix: Always use JSON.stringify and escape </script>:

<script>
  const config = ${JSON.stringify({name: userName}).replace(/</g, '\\u003c')};
</script>

Attack 3: Event Handler Injection

<div title="${userInput}">Hover me</div>

// If userInput = " onmouseover="alert(1)
// Result:
<div title="" onmouseover="alert(1)">Hover me</div>

Fix: Encode quotes in attributes:

<div title="&quot; onmouseover=&quot;alert(1)">Hover me</div>

// Safe: quotes are literal text, not attribute delimiters

When Encoding Is Optional

Static Content

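If the content is hardcoded and not user-controlled, leaving it unencoded is not a security risk. Encoding is still needed for correctness whenever a reserved character should display literally:

<p>The <code> tag is used for code.</p>

<!-- No XSS risk, but the browser parses <code> as an element, so the tag name never appears in the rendered text -->

<p>The &lt;code&gt; tag is used for code.</p>

<!-- Correct: renders the literal text <code> -->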

Pre-Sanitized HTML

If you are intentionally rendering HTML (like from a rich text editor) after sanitizing it with a trusted library, you should not encode entities because that would break the HTML structure:

const container = document.getElementById('content'); // hypothetical target element
const sanitizedHTML = DOMPurify.sanitize(userHTML);
// Do NOT encode entities after sanitization
container.innerHTML = sanitizedHTML;

// If you encode it:
container.innerHTML = htmlEncode(sanitizedHTML);
// Result: HTML tags display as text, not rendered

JSON Responses

JSON API responses do not need HTML entity encoding because they are not parsed as HTML:

// API response (Content-Type: application/json)
{
  "comment": "<script>alert(1)</script>"
}

// Safe: JSON is not HTML, no parsing as tags
// HTML encoding would corrupt the data

Common Mistakes

Mistake 1: Encoding Too Early

Encoding data when storing it in the database instead of when rendering it causes corruption:

// Wrong: encode before storing
db.save({ name: htmlEncode(userName) });

// Result: database contains "&lt;Alice&gt;" instead of "<Alice>"
// If you later send this in JSON API, clients get corrupted data

// Right: store raw, encode when rendering HTML
db.save({ name: userName });
// When rendering:
<p>${htmlEncode(data.name)}</p>
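The corruption is easy to reproduce. The sketch below compares the two strategies for the same input; htmlEncode is a hypothetical helper, and plain variables stand in for the database.

```javascript
// Hypothetical HTML encoder.
function htmlEncode(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

const raw = "<Alice>";

// Strategy 1: encode before storing. The stored value is now HTML-specific.
const storedEncoded = htmlEncode(raw);
const apiFromEncoded = JSON.stringify({ name: storedEncoded });
// A template that encodes on output (as it should) encodes a second time.
const htmlFromEncoded = `<p>${htmlEncode(storedEncoded)}</p>`;

// Strategy 2: store raw, encode only when rendering HTML.
const storedRaw = raw;
const apiFromRaw = JSON.stringify({ name: storedRaw });
const htmlFromRaw = `<p>${htmlEncode(storedRaw)}</p>`;

console.log(apiFromEncoded);  // API clients receive entities instead of the name
console.log(htmlFromEncoded); // page shows the text &lt;Alice&gt; to the user
console.log(apiFromRaw);
console.log(htmlFromRaw);
```

With the raw value stored, each output channel applies its own encoding exactly once.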

Mistake 2: Using HTML Encoding for URLs

// Wrong: HTML encoding doesn't work for URLs
<a href="/search?q=${htmlEncode('hello world')}">Link</a>
// Result: /search?q=hello world
// Broken: space not encoded for URL

// Right: use URL encoding
<a href="/search?q=${encodeURIComponent('hello world')}">Link</a>
// Result: /search?q=hello%20world

Use a URL encoder for query parameters and an HTML encoder for text content and attributes.

Mistake 3: Trusting Client-Side Encoding Only

Encoding on the client does not protect the server. Attackers can bypass client-side code and send raw requests:

// Client-side encoding:
const encoded = htmlEncode(userInput);
fetch('/api/comment', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: encoded })
});

// Attacker bypasses client, sends raw request:
POST /api/comment
{"text": "<script>alert(1)</script>"}

// Server must encode when rendering HTML

Always encode on the server when generating HTML.

Best Practices for Safe HTML Rendering

1. Encode at the Last Possible Moment

Store raw data, encode only when rendering HTML. This keeps data clean for other uses (API responses, exports, searches).

2. Use Context-Appropriate Encoding

  • HTML text content: Encode <, >, &
  • HTML attributes: Encode " (or ') and &
  • JavaScript context: Use JSON.stringify + escape </script>
  • URL context: Use URL encoding (percent encoding)

3. Use Framework Built-Ins

Modern frameworks handle encoding automatically:

// React: automatically encodes text content and attributes
<div title={userInput}>{userInput}</div>

// Vue: same auto-encoding
<div :title="userInput">{{ userInput }}</div>

// Angular: same
<div [title]="userInput">{{ userInput }}</div>

Only bypass framework encoding if you are intentionally rendering sanitized HTML, for example with dangerouslySetInnerHTML in React or v-html in Vue.

4. Test With Special Characters

During development, test with inputs that contain <>"'& and </script> to verify encoding works correctly. Use an HTML encoder/decoder to generate test payloads and verify how they render.
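Such a check can be automated with a few lines. The sketch below runs a small list of payloads through an encoder and reports any whose output still contains a raw reserved character; findUnsafe, the payload list, and htmlEncode are all hypothetical and far from an exhaustive XSS test suite.

```javascript
// Hypothetical HTML encoder under test.
function htmlEncode(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// A few inputs that exercise every reserved character.
const payloads = [
  `<>"'&`,
  `</script>`,
  `" onmouseover="alert(1)`,
  `<img src=x onerror=alert(1)>`,
];

// Matches a raw <, >, ", or ', or an & that does not start one of the
// entities this encoder is expected to produce.
const RAW_RESERVED = /[<>"']|&(?!(?:amp|lt|gt|quot|#39);)/;

// Returns the inputs whose encoded form still contains a raw reserved character.
function findUnsafe(encode, inputs) {
  return inputs.filter((input) => RAW_RESERVED.test(encode(input)));
}

const failures = findUnsafe(htmlEncode, payloads);
console.log(failures.length === 0 ? "all payloads encoded" : failures);
```

Passing a do-nothing encoder such as (x) => x should report every payload, which is a quick way to confirm the check itself works.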

5. Validate Input, Encode Output

Input validation rejects malformed data (e.g., email format checks). Output encoding prevents XSS when rendering. Both are necessary:

  • Validation prevents bad data from entering your system
  • Encoding prevents stored data from executing as code when displayed
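The sketch below shows why validation alone is not enough. It uses a deliberately simple, hypothetical email pattern; real applications often use a looser check or a dedicated library. A value can pass the format check and still contain markup, so the output step must encode it regardless.

```javascript
// Hypothetical HTML encoder.
function htmlEncode(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Assumed format check: something@something.something, no whitespace.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateEmail(input) {
  return EMAIL_PATTERN.test(input);
}

function renderEmail(email) {
  // Output encoding happens here, at render time, for every value.
  return `<span class="email">${htmlEncode(email)}</span>`;
}

const malformed = "not-an-email";
const tricky = "<script>alert(1)</script>@example.com";

console.log(validateEmail(malformed)); // rejected at the input stage
console.log(validateEmail(tricky));    // accepted by this simple pattern
console.log(renderEmail(tricky));      // still rendered as inert text
```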

Tools for HTML Entity Encoding

When debugging or manually constructing HTML, a dedicated HTML encoder/decoder is useful. With the DevToys Pro HTML Encoder/Decoder you can:

  • Encode user input to see how it should appear in HTML source
  • Decode HTML source to verify the original data
  • Test edge cases like quotes, ampersands, and angle brackets
  • Verify whether data is double-encoded or corrupted
  • Generate test payloads for security testing

Quick Reference: Encoding Rules by Context

Context              Required Encoding                  Tool to Use
HTML text content    < > &                              HTML encoder
HTML attributes      " ' &                              HTML encoder
JavaScript <script>  JSON.stringify + escape </script>  JSON + custom escape
URL parameters       Percent encoding (RFC 3986)        URL encoder
JSON API responses   None (not HTML context)            JSON serializer

Conclusion

HTML entity encoding is required when embedding user-controlled data in HTML to prevent XSS attacks. The key is understanding context: text content, attributes, JavaScript, and URLs each have different encoding requirements.

Key takeaways:

  • Always encode user input when rendering HTML
  • Encode at render time, not at storage time
  • Use context-appropriate encoding: HTML entities for text/attributes, JSON for script tags, percent encoding for URLs
  • Never trust client-side encoding for server-side security
  • Test with special characters to verify encoding works correctly

For manual testing and debugging, use an HTML encoder/decoder to verify how characters are encoded and ensure your output is safe from XSS vulnerabilities while preserving legitimate content.