Markdown to HTML Guide: Flavors, Parsers, Sanitization, and Edge Cases
Markdown looks deceptively simple — until you need to convert it reliably across different environments. A document that renders perfectly in your editor may break in your CMS, produce unsafe output in your app, or render as plain text in an email client. Use the Markdown to HTML converter to follow along, or the Markdown preview tool for quick live editing.
The Flavor Zoo
There is no single Markdown. What exists is a family of dialects built on top of John Gruber's original 2004 spec, each adding or removing syntax:
- CommonMark: A formal, unambiguous specification (commonmark.org). Fixes hundreds of edge cases in the original Gruber spec. The baseline most modern parsers implement. Does not include tables, task lists, or strikethrough.
- GFM (GitHub Flavored Markdown): CommonMark plus tables, task lists, strikethrough (~~text~~), and autolinks. The most common dialect on the web. Used by GitHub, GitLab, npm, and many CMSes.
- MDX: Markdown plus JSX. Lets you embed React components directly in Markdown files. Used by Next.js docs, Docusaurus, Astro. Not a rendering target: it compiles to JavaScript, not HTML.
- CommonMark extensions: Footnotes ([^1]), definition lists, attributes ({.class #id}), subscript/superscript. Available via markdown-it plugins or remark plugins; not part of any universal standard.
The practical consequence: Markdown you write for GitHub will not render identically in every parser. Tables and task lists require explicit GFM support. Footnotes require a separate plugin. MDX files cannot be fed to a plain HTML renderer.
What Breaks Across Flavors
| Feature | CommonMark | GFM | MDX | Notes |
|---|---|---|---|---|
| Tables | No | Yes | Yes (via remark-gfm) | Pipe syntax: \| col \| col \| |
| Task lists | No | Yes | Yes (via remark-gfm) | - [x] done, - [ ] todo |
| Strikethrough | No | Yes | Yes (via remark-gfm) | ~~text~~ |
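| Autolinks | Angle brackets only | Angle brackets and bare URLs | Bare URLs (via remark-gfm) | GFM linkifies https://... without brackets; MDX does not accept the angle-bracket form because < starts JSX |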
| Footnotes | No | No (GitHub renders them) | Plugin required | Non-standard; use markdown-it-footnote, or remark-gfm, which includes footnotes in recent versions and superseded remark-footnotes |
| Raw HTML | Allowed | Allowed (filtered) | JSX only | MDX treats <div> as a component, not HTML |
The most common breakage in practice: a document with a pipe table or task list fed to a plain CommonMark parser renders the raw syntax as literal text. Always confirm which flavor your parser supports before writing content.
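One way to catch this early is to scan the source for GFM-only syntax before handing it to a plain CommonMark parser. The sketch below is a line-based heuristic, not a parser; detectGfmFeatures and its feature names are hypothetical, and it will miss edge cases such as single-column tables or URLs inside inline code.

```javascript
// Hypothetical helper: heuristic detection of GFM-only syntax in Markdown source.
// Content inside fenced code blocks is skipped so examples are not reported.
function detectGfmFeatures(markdown) {
  const found = new Set();
  let inFence = false;
  for (const line of markdown.split('\n')) {
    // Toggle on fence lines (backticks or tildes) and ignore everything inside.
    if (/^ {0,3}(```|~~~)/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    // Table delimiter row with at least two columns, e.g. |---|:---:|
    if (/^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)+\|?\s*$/.test(line)) found.add('table');
    // Task list item, e.g. "- [x] done" or "1. [ ] todo"
    if (/^\s*(?:[-*+]|\d+[.)])\s+\[[ xX]\]\s/.test(line)) found.add('task-list');
    // Strikethrough, e.g. ~~text~~
    if (/~~[^~\s][^~]*~~/.test(line)) found.add('strikethrough');
    // Bare URL: not wrapped in <...>, not a link target (...), not a link definition
    const isLinkDefinition = /^ {0,3}\[[^\]]+\]:/.test(line);
    if (!isLinkDefinition && /(^|[^<(\w])https?:\/\/\S+/.test(line)) found.add('bare-url');
  }
  return [...found].sort();
}
```

An empty result does not prove the document is portable; it only means none of these four patterns appeared. A non-empty result is a reasonable trigger for a warning in an editor or a CI check.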
Sanitization Is Mandatory
Markdown allows raw HTML passthrough. This means a user who controls Markdown input can inject arbitrary HTML — including <script> tags, event handlers, and data-exfiltrating <img> tags — directly into your rendered output. Never insert Markdown-rendered HTML into the DOM without sanitizing it first.
Two widely used options:
- DOMPurify (browser) — Fast, battle-tested, works in any browser context. Strips dangerous elements and attributes while preserving safe markup.
- rehype-sanitize (Node.js / unified pipeline) — AST-level sanitization inside the remark/rehype pipeline. Safer than post-processing because it never constructs the dangerous DOM node at all.
// Browser: DOMPurify after parsing
import MarkdownIt from 'markdown-it';
import DOMPurify from 'dompurify';
const md = new MarkdownIt({ html: false });
const rawHtml = md.render(userInput);
// Sanitize before touching the DOM
const safeHtml = DOMPurify.sanitize(rawHtml);
outputElement.innerHTML = safeHtml;

// Node.js: rehype-sanitize inside unified pipeline
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkRehype from 'remark-rehype';
import rehypeSanitize from 'rehype-sanitize';
import rehypeStringify from 'rehype-stringify';
const file = await unified()
  .use(remarkParse)
  .use(remarkRehype, { allowDangerousHtml: false })
  .use(rehypeSanitize) // removes unsafe nodes from the AST
  .use(rehypeStringify)
  .process(userInput);
const safeHtml = String(file);
Setting html: false in markdown-it disables raw HTML passthrough entirely, which is a good default when you do not need to preserve author-written HTML. If you do enable html: true, keep DOMPurify in place: it is then the layer that removes dangerous author-written markup before it reaches the DOM.
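Whichever sanitizer you choose, it helps to keep a regression check that feeds known payloads through the pipeline and inspects the output. The sketch below is one possible shape for that check. findUnsafeMarkup and its pattern list are hypothetical, and it is explicitly not a sanitizer: regex checks like these are easy to bypass and are only useful for noticing that a sanitizing step was removed or misconfigured.

```javascript
// Hypothetical smoke test for HTML that has ALREADY been sanitized.
// NOT a sanitizer: use DOMPurify or rehype-sanitize for the actual cleaning.
const UNSAFE_PATTERNS = [
  { name: 'script-tag', pattern: /<\s*script\b/i },
  { name: 'event-handler', pattern: /<[^>]*\son[a-z]+\s*=/i },
  { name: 'javascript-url', pattern: /(?:href|src)\s*=\s*["']?\s*javascript:/i },
];

// Returns the names of the patterns that still appear in the given HTML string.
function findUnsafeMarkup(html) {
  return UNSAFE_PATTERNS
    .filter(({ pattern }) => pattern.test(html))
    .map(({ name }) => name);
}
```

In a test suite you might render a handful of hostile Markdown inputs, sanitize them, and assert that findUnsafeMarkup returns an empty array for each result.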
The Parser Ecosystem
Three parsers cover the vast majority of JavaScript use cases:
- markdown-it: Fast, extensible, synchronous. Follows CommonMark and ships GFM-style tables and strikethrough; task lists and footnotes come from plugins (markdown-it-task-lists, markdown-it-footnote). Best choice for browser rendering and server-side APIs where you need predictable performance.
- remark / unified: AST-based pipeline. Parses Markdown to an abstract syntax tree (mdast), transforms it, converts to an HTML AST (hast), then serializes. Composable and powerful, but typically slower than markdown-it. Best for complex pipelines: syntax highlighting, custom components, content transformation.
- marked: Minimal, zero-dependency, fast. Less extensible than the other two. Good for simple use cases where you need a small bundle size and basic CommonMark/GFM rendering.
// markdown-it with GFM-like extensions
import MarkdownIt from 'markdown-it';
import taskLists from 'markdown-it-task-lists';
import footnote from 'markdown-it-footnote';
const md = new MarkdownIt({
  html: false, // disable raw HTML passthrough
  linkify: true, // auto-link bare URLs (like GFM)
  typographer: true, // smart quotes and dashes
})
  .use(taskLists)
  .use(footnote);
const html = md.render(input);
For the reference CommonMark implementation, the commonmark npm package provides a spec-compliant parser. Use it when spec-conformance matters more than extensions or speed.
Code Blocks and Syntax Highlighting
Fenced code blocks (triple backtick with a language hint) are supported by all major flavors. The language identifier is passed to your syntax highlighter as a hint — the parser itself does not validate it.
```javascript
const x = 1;
```
```python
x = 1
```
Two popular syntax highlighting libraries:
- highlight.js: Runs in the browser or Node.js. Detects language automatically if not specified. Integrates with markdown-it via the highlight option callback.
- Shiki: Uses VS Code TextMate grammars for higher-fidelity highlighting. Outputs inline styles (no separate CSS file needed). Async; better suited for build-time rendering than runtime preview.
// markdown-it with highlight.js
import MarkdownIt from 'markdown-it';
import hljs from 'highlight.js';
const md = new MarkdownIt({
  highlight(code, lang) {
    if (lang && hljs.getLanguage(lang)) {
      return hljs.highlight(code, { language: lang }).value;
    }
    return ''; // let markdown-it escape the code block
  },
});
Inline code (`backtick`) does not carry a language hint and is not syntax highlighted. Indented code blocks (4 spaces) are CommonMark-valid but discouraged: they conflict with list item continuation and produce no language hint for highlighting.
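To see which language hints a document actually uses (for example, to decide which highlighter grammars to load), you can list its fenced blocks before rendering. The sketch below is a simplified reading of the CommonMark fence rules; extractFences is a hypothetical helper, and it does not handle fences nested inside lists or block quotes.

```javascript
// Hypothetical helper: list fenced code blocks and their language hints.
// Handles backtick and tilde fences with up to three spaces of indentation.
function extractFences(markdown) {
  const blocks = [];
  let open = null; // { marker, length, lang, lines } while inside a fence
  for (const line of markdown.split('\n')) {
    if (open === null) {
      // Opening fence; a backtick fence may not have backticks in its info string.
      const m = /^ {0,3}(`{3,}(?=[^`]*$)|~{3,})\s*(\S*)/.exec(line);
      if (m) open = { marker: m[1][0], length: m[1].length, lang: m[2], lines: [] };
      continue;
    }
    // Closing fence: same character, at least as long, nothing else on the line.
    const close = /^ {0,3}(`{3,}|~{3,})\s*$/.exec(line);
    if (close && close[1][0] === open.marker && close[1].length >= open.length) {
      blocks.push({ lang: open.lang || null, code: open.lines.join('\n') });
      open = null;
    } else {
      open.lines.push(line);
    }
  }
  // An unclosed fence runs to the end of the document.
  if (open !== null) blocks.push({ lang: open.lang || null, code: open.lines.join('\n') });
  return blocks;
}
```

The lang value is whatever the author typed after the fence, so treat it as untrusted input: check it against the languages your highlighter knows, as the hljs.getLanguage call above does.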
Email Rendering
Email clients render only a restricted subset of HTML and CSS. Many ignore external stylesheets, some strip <style> blocks, and multi-column layouts are only reliable when built from tables. Markdown-to-email therefore needs a dedicated pipeline:
- Parse Markdown to HTML as normal.
- Convert the HTML to email-safe markup: table-based layout, inline CSS, no <div> for layout.
- Use MJML or email-templates to abstract the table-based layout layer.
- Inline CSS with juice or inline-css: every style must live on the element itself as a style attribute.
import MarkdownIt from 'markdown-it';
import juice from 'juice';
const md = new MarkdownIt({ html: false });
// Step 1: Markdown to HTML
const bodyHtml = md.render(markdownInput);
// Step 2: Wrap in a template with styles to be inlined
const template = `<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; font-size: 14px; }
p { line-height: 1.5; margin: 0 0 12px; }
code { background: #f4f4f4; padding: 2px 4px; font-size: 13px; }
pre { background: #f4f4f4; padding: 12px; overflow: auto; }
table { border-collapse: collapse; width: 100%; }
td, th { border: 1px solid #ddd; padding: 6px 10px; }
</style>
</head>
<body>${bodyHtml}</body>
</html>`;
// Step 3: Inline the CSS so email clients render it
const emailHtml = juice(template);
Markdown tables map reasonably well to HTML tables, which email clients do support. Avoid nested tables and complex colspan layouts, which break in Outlook. Footnotes should be converted to inline parenthetical text for email, since anchored references do not work in most clients.
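The footnote conversion can happen on the Markdown source, before it reaches the parser. The sketch below shows one way to do it, assuming single-line definitions of the form [^id]: text. inlineFootnotes is a hypothetical helper; it does not handle multi-paragraph footnotes and does not skip fenced code blocks.

```javascript
// Hypothetical preprocessing step: fold footnotes into inline parentheticals
// before rendering Markdown for email.
function inlineFootnotes(markdown) {
  const definitions = new Map();
  const body = [];
  for (const line of markdown.split('\n')) {
    // Collect definition lines such as "[^1]: Some note." and drop them from the body.
    const m = /^\[\^([^\]\s]+)\]:\s*(.*)$/.exec(line);
    if (m) {
      definitions.set(m[1], m[2].trim());
    } else {
      body.push(line);
    }
  }
  // Replace each reference with its text; leave unknown references untouched.
  const text = body.join('\n').replace(/\[\^([^\]\s]+)\]/g, (ref, id) =>
    definitions.has(id) ? ` (${definitions.get(id)})` : ref
  );
  return text.trimEnd() + '\n';
}
```

With this in place, the email pipeline would call md.render(inlineFootnotes(markdownInput)) instead of rendering the raw input, so no footnote plugin is needed on the email path.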
CMS Preview Workflow
Live Markdown preview in a CMS editor requires three considerations: performance, scroll synchronization, and debouncing. Parsing on every keystroke is wasteful; a 100–200ms debounce keeps the preview feeling live while avoiding unnecessary work.
import MarkdownIt from 'markdown-it';
import DOMPurify from 'dompurify';
const md = new MarkdownIt({ html: false, linkify: true });
let timer;
editor.addEventListener('input', () => {
  clearTimeout(timer);
  timer = setTimeout(() => {
    const rawHtml = md.render(editor.value);
    // Always sanitize before setting HTML
    preview.innerHTML = DOMPurify.sanitize(rawHtml);
  }, 150); // 150ms: fast enough to feel live, slow enough not to thrash
});
For scroll synchronization, track the cursor line in the editor and map it to the corresponding element in the preview using source position data. Both remark and markdown-it can attach source position annotations to output elements via plugins, making this line-to-element mapping possible.
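The line-to-element lookup itself can be a small binary search. The sketch below assumes you have already collected, in ascending order, the source line on which each preview block starts (for instance from a data attribute written by a plugin; that attribute and the findBlockIndex name are assumptions, not part of either library's API).

```javascript
// Hypothetical lookup for scroll sync: find the last preview block that starts
// at or before the cursor line. `blockLines` must be sorted in ascending order.
function findBlockIndex(blockLines, cursorLine) {
  let lo = 0;
  let hi = blockLines.length - 1;
  let result = -1; // -1 means the cursor is above the first block
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (blockLines[mid] <= cursorLine) {
      result = mid; // candidate; keep looking for a later block that still qualifies
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return result;
}
```

In the browser, the returned index would select the matching preview element, which you can then bring into view with scrollIntoView. Rebuild blockLines after each render, since line numbers shift as the author types.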
Common Gotchas
- Indented code vs fenced code: Four-space indentation creates a code block in CommonMark. This collides with list continuation paragraphs. Prefer fenced blocks (```) in all new content.
- Hard-break newlines: In CommonMark, a single newline inside a paragraph is not a line break; it is a soft break that browsers display as a space. A hard break requires two trailing spaces or a backslash before the newline. The GFM spec keeps this rule, but some renderers do not: GitHub turns newlines into <br> in issue and pull request comments, and markdown-it does the same when its breaks option is enabled.
- Raw URLs vs autolinks: CommonMark only linkifies URLs wrapped in angle brackets (<https://example.com>). GFM linkifies bare URLs. If your content moves between parsers, angle-bracket wrapping is the portable choice.
- HTML entities in code spans: Content inside backtick code spans is treated as literal text and HTML-escaped by the parser. You do not need to manually escape < or & inside code spans.
- Loose vs tight lists: A blank line between list items switches the list to "loose" mode, wrapping each item's content in a <p> tag. This changes rendered spacing and surprises authors who expect tight lists.
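The loose-versus-tight rule is easy to check mechanically for simple lists. The sketch below assumes the input contains exactly one flat list whose items are single paragraphs (no nested lists, no code blocks); under that assumption, any interior blank line makes the list loose. Both function names are hypothetical.

```javascript
// Hypothetical check: would this list source render as a "loose" list?
// Assumes `source` is exactly one flat list of single-paragraph items.
function isLooseList(source) {
  const lines = source.split('\n');
  // Ignore blank lines before the first item and after the last one.
  while (lines.length && lines[0].trim() === '') lines.shift();
  while (lines.length && lines[lines.length - 1].trim() === '') lines.pop();
  return lines.some((line) => line.trim() === '');
}

// Hypothetical fix-up: drop blank lines so the list becomes tight.
// Only safe under the same assumption; multi-paragraph items would be merged.
function tightenList(source) {
  return source
    .split('\n')
    .filter((line) => line.trim() !== '')
    .join('\n') + '\n';
}
```

A linter built on isLooseList could flag lists that mix both styles by accident, which is usually the case that surprises authors.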