Text ↔ Unicode Converter
How the Text ↔ Unicode Converter Works
What the Tool Does
The Text ↔ Unicode tool converts text to and from Unicode code points in multiple notations: U+XXXX, JavaScript escapes (\uXXXX or \u{XXXXX}), decimal, 0xHEX, and HTML numeric entities (&#N; or &#xHEX;). The decoder is permissive — it recognizes any combination of these notations interspersed with text and extracts the code points. The encoder lets you choose the output format and a separator between code points.
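As a rough illustration of the permissive decoding described above, here is a minimal Python sketch. It is not the tool's actual implementation: the names `TOKEN` and `decode` are hypothetical, and it assumes that text outside a recognized token passes through unchanged. Bare decimal numbers are left out because they cannot be told apart from ordinary digits in running text.

```python
import re

# One alternative per notation; each alternative has exactly one named
# group, which captures the digits of the code point.
TOKEN = re.compile(
    r"U\+(?P<uplus>[0-9A-Fa-f]{4,6})"        # U+XXXX
    r"|\\u\{(?P<brace>[0-9A-Fa-f]{1,6})\}"   # \u{XXXXX}
    r"|\\u(?P<u4>[0-9A-Fa-f]{4})"            # \uXXXX
    r"|&#[xX](?P<hexent>[0-9A-Fa-f]+);"      # &#xHEX;
    r"|&#(?P<decent>[0-9]+);"                # &#N;
    r"|0[xX](?P<hex>[0-9A-Fa-f]+)"           # 0xHEX
)


def decode(text):
    """Replace every recognized escape in `text` with its character."""

    def replace(match):
        kind = match.lastgroup                  # name of the group that matched
        radix = 10 if kind == "decent" else 16  # only &#N; is decimal
        cp = int(match.group(kind), radix)
        if cp > 0x10FFFF:
            return match.group(0)               # out of range: leave the token as-is
        return chr(cp)

    decoded = TOKEN.sub(replace, text)
    # Two adjacent \uXXXX escapes may form a surrogate pair; a UTF-16
    # round trip joins such pairs into one astral code point.
    raw = decoded.encode("utf-16-le", "surrogatepass")
    return raw.decode("utf-16-le", "surrogatepass")
```

For example, `decode("U+0041 \\u0042 &#67;")` yields `"A B C"`, with the spaces between tokens preserved as ordinary text.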
Common Developer Use Cases
Frontend developers use the tool to translate emoji and CJK characters into safe ASCII escape sequences for JSON files, source code, or transport over restrictive channels. Localization engineers verify the exact code points used in a translated string. The tool is also helpful for spotting invisible or look-alike characters (homoglyphs, zero-width joiners, RTL marks) hiding inside a copied snippet.
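The homoglyph and invisible-character check can be approximated in a few lines of Python. This is a sketch under simple assumptions, not the tool's logic: the helper name `flag_suspicious` is hypothetical, and it treats every non-ASCII character as worth a look, labelling Unicode format characters (general category Cf, which includes zero-width joiners and directional marks) separately.

```python
import unicodedata


def flag_suspicious(text):
    """List non-ASCII characters with their position, code point, and name."""
    flagged = []
    for index, ch in enumerate(text):
        if ord(ch) <= 0x7F:
            continue  # plain ASCII is assumed to be harmless here
        # Category Cf covers invisible format characters such as ZWJ and RLM.
        reason = "format" if unicodedata.category(ch) == "Cf" else "non-ascii"
        name = unicodedata.name(ch, "<unnamed>")
        flagged.append((index, f"U+{ord(ch):04X}", reason, name))
    return flagged
```

Running it on a string that contains a Cyrillic "а" in place of a Latin "a" reports the offending position together with the character's official Unicode name.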
Data Formats, Types, or Variants
Standard Unicode notation is U+ followed by four to six hexadecimal digits: U+XXXX for the Basic Multilingual Plane (BMP) and U+XXXXX or U+XXXXXX for supplementary planes, up to U+10FFFF. JavaScript escapes use \uXXXX for BMP code points and \u{XXXXX} (ES2015+) for the full range. HTML accepts numeric entities in decimal (&#N;) or hex (&#xHEX;). Plain decimal and 0xHEX values round-trip through codePointAt / fromCodePoint without further interpretation. Astral characters (emoji such as 🌍, U+1F30D) are each represented as a single code point above U+FFFF.
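The notations listed above can be produced from a code point with ordinary string formatting. The following Python sketch shows one way to do it; the format keys (`"u+"`, `"js"`, `"dec"`, `"hex"`, `"html"`, `"html-hex"`) and the `encode` helper are hypothetical names chosen for illustration, not the tool's option names.

```python
# Map a format key to a function that renders one code point (an int).
FORMATTERS = {
    "u+": lambda cp: f"U+{cp:04X}",
    # \uXXXX only fits BMP code points; astral ones use the braced form.
    "js": lambda cp: f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}",
    "dec": str,
    "hex": lambda cp: f"0x{cp:X}",
    "html": lambda cp: f"&#{cp};",
    "html-hex": lambda cp: f"&#x{cp:X};",
}


def encode(text, fmt="u+", sep=" "):
    """Render each code point of `text` in the chosen notation."""
    render = FORMATTERS[fmt]
    # Iterating a Python str yields whole code points, astral ones included.
    return sep.join(render(ord(ch)) for ch in text)
```

With these definitions, `encode("A🌍")` gives `U+0041 U+1F30D`, and `encode("A🌍", "js", "")` gives `\u0041\u{1F30D}`.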
Common Pitfalls and Edge Cases
Astral characters cannot be expressed with a single \uXXXX escape because each escape holds only four hex digits; use \u{...} or a surrogate pair (two consecutive \uXXXX escapes). Code points above U+10FFFF are invalid per the Unicode Standard and will be rejected. Combining marks and emoji ZWJ sequences appear as multiple code points even though they render as a single glyph. The decoder does not interpret HTML named entities (&copy;, &amp;); use a dedicated HTML decoder for those.
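The surrogate-pair arithmetic and the range limit mentioned above can be made concrete with a short Python sketch. The function names are hypothetical, and the code illustrates the standard UTF-16 calculation rather than the tool's own source.

```python
MAX_CODE_POINT = 0x10FFFF


def to_surrogate_pair(cp):
    """Split an astral code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not an astral code point: {cp:#x}")
    offset = cp - 0x10000              # 20 bits remain after removing the BMP
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


def js_escape_pair(cp):
    """Render an astral code point as two legacy \\uXXXX escapes."""
    high, low = to_surrogate_pair(cp)
    return f"\\u{high:04X}\\u{low:04X}"


def code_points(text):
    """List the code points of `text` in U+ notation."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

For 🌍 (U+1F30D), `js_escape_pair(0x1F30D)` returns `\uD83C\uDF0D`. Calling `code_points` on an "e" followed by a combining acute accent returns two entries, and the woman-technologist emoji (woman, zero-width joiner, laptop) returns three, even though each renders as one glyph.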
When to Use This Tool vs Code
Use the browser tool for quick inspection, escape-sequence generation, and homoglyph hunting. In code, prefer language-native string APIs (`String.fromCodePoint`, and `Array.from(str)` for code-point iteration, in JS; `chr` and `ord` in Python; `String.codePointAt` and `Character.toChars` in Java, with `Character.toCodePoint` for combining a surrogate pair), and reach for a full ICU library when you need normalization (NFC/NFD), case folding, or grapheme-cluster segmentation.
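As an example of the in-code route, the Python sketch below uses only the standard library: `ord` and `chr` for code-point conversion, `unicodedata.normalize` for NFC/NFD, and `str.casefold` for case folding. The helper name `same_text` is hypothetical, and grapheme-cluster segmentation is outside the scope of this sketch.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings after composing them to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e followed by a combining acute accent

# The raw strings differ code point by code point, yet normalize to the same text.
differ_raw = composed != decomposed
equal_normalized = same_text(composed, decomposed)

# ord and chr round-trip any code point, including astral ones.
globe = chr(0x1F30D)
globe_cp = ord(globe)

# Case folding is more aggressive than lower(): the German sharp s expands.
folded = "Straße".casefold()
```

Comparing normalized forms rather than raw strings avoids false mismatches between precomposed and decomposed spellings of the same text.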