Inspect Unicode code points, escapes, normalization forms, and byte breakdowns of any string.
Unicode Inspector decodes any string into its underlying code points, encodings, and normalization forms so you can debug emoji, accented characters, bidi text, zero-width gremlins, and mystery bytes in one place. Everything runs in your browser — paste a string and get a per-character breakdown with UTF-8 bytes, UTF-16 units, HTML entities, and JavaScript / CSS escape sequences, plus all four Unicode normalization forms (NFC, NFD, NFKC, NFKD) side-by-side.
Paste or type a string
Drop in any text — plain ASCII, multilingual scripts, emoji with skin-tone modifiers, or invisible control characters. The tool inspects whatever you give it.
Read the summary row
See characters, graphemes (via Intl.Segmenter), code points, UTF-8 bytes, and UTF-16 code units at a glance. Graphemes explain why one visible emoji can be many code points.
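The summary counts can be reproduced with the same built-in APIs the tool uses; a minimal sketch (the sample string and variable names are my own):

```javascript
// Summary-row counts for 'e' + combining acute (renders as é) followed by 👍🏽
// (thumbs-up + skin-tone modifier).
const input = "e\u0301\u{1F44D}\u{1F3FD}";

const utf16Units = input.length;                            // 6  — what .length counts
const codePoints = [...input].length;                       // 4  — iteration is per code point
const utf8Bytes  = new TextEncoder().encode(input).length;  // 11 — storage size in UTF-8
const graphemes  = [...new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment(input)].length;                                  // 2  — what the user sees
```

Four different answers to "how long is this string?" — which is exactly why the summary row shows all of them.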
Drill into each code point
The table lists every code point with its U+XXXX value, UTF-8 and UTF-16 byte breakdown, HTML entity, JS escape, and CSS escape — ready to copy into source code.
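One way such a row can be generated — a sketch, with the function name and output shape being my own choices:

```javascript
// Per-code-point breakdown: U+ value, UTF-8 bytes, HTML entity, JS and CSS escapes.
function inspectCodePoints(str) {
  return [...str].map(ch => {
    const cp  = ch.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    return {
      codePoint:  "U+" + hex.padStart(4, "0"),
      utf8Bytes:  [...new TextEncoder().encode(ch)]
                    .map(b => b.toString(16).toUpperCase().padStart(2, "0")),
      htmlEntity: "&#x" + hex + ";",
      // BMP code points use \uXXXX; astral code points need the \u{...} form
      jsEscape:   cp > 0xFFFF ? "\\u{" + hex + "}" : "\\u" + hex.padStart(4, "0"),
      // the trailing space safely terminates the CSS escape
      cssEscape:  "\\" + hex + " ",
    };
  });
}
```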
Compare normalization forms
Check whether NFC, NFD, NFKC, or NFKD change your string. Useful for debugging search, dedup, and storage bugs caused by combining marks or compatibility characters.
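A quick way to run the same check yourself (a sketch; the sample string is my own):

```javascript
// Which normalization forms actually change a string?
// Sample: 'fi' ligature (U+FB01) + 'e' + combining acute (U+0301).
const s = "\uFB01e\u0301";
const changedForms = ["NFC", "NFD", "NFKC", "NFKD"]
  .filter(form => s.normalize(form) !== s);
// NFC composes e + U+0301 into é; NFKC and NFKD additionally expand the
// ligature to "fi"; NFD leaves this already-decomposed string untouched.
```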
Copy the hex dump
Grab a classic offset / hex / ASCII dump of the UTF-8 bytes — ideal for pasting into bug reports, tickets, or commit messages.
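A minimal version of such a dump — the exact column layout here is my own, not necessarily the tool's:

```javascript
// Classic offset / hex / ASCII dump of a string's UTF-8 bytes, 16 per line.
function hexDump(str, width = 16) {
  const bytes = new TextEncoder().encode(str);
  const lines = [];
  for (let off = 0; off < bytes.length; off += width) {
    const chunk = Array.from(bytes.subarray(off, off + width));
    const hex   = chunk.map(b => b.toString(16).padStart(2, "0")).join(" ");
    const ascii = chunk
      .map(b => (b >= 0x20 && b <= 0x7e) ? String.fromCharCode(b) : ".")
      .join("");
    lines.push(
      off.toString(16).padStart(8, "0") + "  " +
      hex.padEnd(width * 3 - 1) + "  " + ascii
    );
  }
  return lines.join("\n");
}
```

Non-ASCII bytes show up as `.` in the right-hand column, which makes multi-byte UTF-8 sequences easy to spot.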
Many emoji are built from multiple code points joined with zero-width joiners (ZWJ) or followed by modifier characters such as skin-tone modifiers. Intl.Segmenter groups these into a single grapheme — that's the visible 'character' a user sees, while the code point count reflects how the string is actually stored.
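For example, in an environment with Intl.Segmenter (assumed here), a ZWJ family emoji is one grapheme but several code points:

```javascript
// 👨‍👩‍👧 is man + ZWJ + woman + ZWJ + girl: five code points, one grapheme.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
const codePointCount = [...family].length;                        // 5
const graphemeCount  = [...new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment(family)].length;                                       // 1
```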
NFC composes characters (e.g. 'é' stays as one code point), NFD decomposes them (e.g. 'é' becomes 'e' + combining acute). NFKC and NFKD additionally apply compatibility mappings — things like turning ligatures and full-width forms into their plain equivalents. If search, deduplication, or string comparison is misbehaving, normalization mismatches are usually the cause.
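Both examples map directly onto String.prototype.normalize; a short sketch:

```javascript
// Canonical forms: é round-trips between one composed and two decomposed code points.
const composed   = "\u00E9";                   // é as a single code point (NFC form)
const decomposed = composed.normalize("NFD");  // "e" + U+0301 combining acute

// Compatibility mapping: the 'fi' ligature U+FB01 becomes plain "fi" under NFKC.
const plain = "\uFB01".normalize("NFKC");
```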
That's a UTF-8 byte-order mark (BOM, U+FEFF). It's invisible but can break string comparisons and JSON parsing. The inspector surfaces it so you can see and remove it.
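A minimal sketch of detecting and stripping a leading BOM before comparing or parsing (the helper name is mine):

```javascript
// Strip a leading BOM (U+FEFF) if present; otherwise return the string unchanged.
function stripBOM(str) {
  return str.charCodeAt(0) === 0xFEFF ? str.slice(1) : str;
}

// A BOM is not valid JSON whitespace, so "\uFEFF{...}" throws in JSON.parse;
// stripping it first makes the payload parseable.
```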
No. JavaScript uses \uXXXX for BMP code points and \u{XXXXX} for astral ones. CSS uses a backslash followed by one to six hex digits, optionally terminated by whitespace (needed when the next character is a hex digit), and that one syntax covers any code point. The tool shows both so you can paste straight into source code.
No. All analysis uses built-in JavaScript APIs (TextEncoder, Intl.Segmenter, String.prototype.normalize). No network requests, no logging.