Inspect Unicode code points, escapes, normalization forms, and byte breakdowns of any string.
Unicode Inspector decodes any string into its underlying code points, encodings, and normalization forms so you can debug emoji, accented characters, bidi text, zero-width gremlins, and mystery bytes in one place. Everything runs in your browser — paste a string and get a per-character breakdown with UTF-8 bytes, UTF-16 units, HTML entities, and JavaScript / CSS escape sequences, plus all four Unicode normalization forms (NFC, NFD, NFKC, NFKD) side-by-side.
Paste or type a string
Drop in any text — plain ASCII, multilingual scripts, emoji with skin-tone modifiers, or invisible control characters. The tool inspects whatever you give it.
Read the summary row
See characters, graphemes (via Intl.Segmenter), code points, UTF-8 bytes, and UTF-16 code units at a glance. Graphemes explain why one visible emoji can be many code points.
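The summary counts can be reproduced with the same built-in APIs the tool uses; a minimal sketch (the sample string and variable names are my own):

```javascript
// Summary-row counts for 'e' + combining acute (renders as é) followed by 👍🏽
// (thumbs-up + skin-tone modifier).
const input = "e\u0301\u{1F44D}\u{1F3FD}";

const utf16Units = input.length;                            // 6  — what .length counts
const codePoints = [...input].length;                       // 4  — iteration is per code point
const utf8Bytes  = new TextEncoder().encode(input).length;  // 11 — storage size in UTF-8
const graphemes  = [...new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment(input)].length;                                  // 2  — what the user sees
```

Four different answers to "how long is this string?" — which is exactly why the summary row shows all of them.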
Drill into each code point
The table lists every code point with its U+XXXX value, UTF-8 and UTF-16 byte breakdown, HTML entity, JS escape, and CSS escape — ready to copy into source code.
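One way such a row can be generated — a sketch, with the function name and output shape being my own choices:

```javascript
// Per-code-point breakdown: U+ value, UTF-8 bytes, HTML entity, JS and CSS escapes.
function inspectCodePoints(str) {
  return [...str].map(ch => {
    const cp  = ch.codePointAt(0);
    const hex = cp.toString(16).toUpperCase();
    return {
      codePoint:  "U+" + hex.padStart(4, "0"),
      utf8Bytes:  [...new TextEncoder().encode(ch)]
                    .map(b => b.toString(16).toUpperCase().padStart(2, "0")),
      htmlEntity: "&#x" + hex + ";",
      // BMP code points use \uXXXX; astral code points need the \u{...} form
      jsEscape:   cp > 0xFFFF ? "\\u{" + hex + "}" : "\\u" + hex.padStart(4, "0"),
      // the trailing space safely terminates the CSS escape
      cssEscape:  "\\" + hex + " ",
    };
  });
}
```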
Compare normalization forms
Check whether NFC, NFD, NFKC, or NFKD change your string. Useful for debugging search, dedup, and storage bugs caused by combining marks or compatibility characters.
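A quick way to run the same check yourself (a sketch; the sample string is my own):

```javascript
// Which normalization forms actually change a string?
// Sample: 'fi' ligature (U+FB01) + 'e' + combining acute (U+0301).
const s = "\uFB01e\u0301";
const changedForms = ["NFC", "NFD", "NFKC", "NFKD"]
  .filter(form => s.normalize(form) !== s);
// NFC composes e + U+0301 into é; NFKC and NFKD additionally expand the
// ligature to "fi"; NFD leaves this already-decomposed string untouched.
```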
Copy the hex dump
Grab a classic offset / hex / ASCII dump of the UTF-8 bytes — ideal for pasting into bug reports, tickets, or commit messages.
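A minimal version of such a dump — the exact column layout here is my own, not necessarily the tool's:

```javascript
// Classic offset / hex / ASCII dump of a string's UTF-8 bytes, 16 per line.
function hexDump(str, width = 16) {
  const bytes = new TextEncoder().encode(str);
  const lines = [];
  for (let off = 0; off < bytes.length; off += width) {
    const chunk = Array.from(bytes.subarray(off, off + width));
    const hex   = chunk.map(b => b.toString(16).padStart(2, "0")).join(" ");
    const ascii = chunk
      .map(b => (b >= 0x20 && b <= 0x7e) ? String.fromCharCode(b) : ".")
      .join("");
    lines.push(
      off.toString(16).padStart(8, "0") + "  " +
      hex.padEnd(width * 3 - 1) + "  " + ascii
    );
  }
  return lines.join("\n");
}
```

Non-ASCII bytes show up as `.` in the right-hand column, which makes multi-byte UTF-8 sequences easy to spot.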
Many emoji are built from multiple code points joined with zero-width joiners (ZWJ) or followed by modifier characters such as skin-tone modifiers. Intl.Segmenter groups these into a single grapheme — that's the visible 'character' a user sees, while the code point count reflects how the string is actually stored.
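For example, in an environment with Intl.Segmenter (assumed here), a ZWJ family emoji is one grapheme but several code points:

```javascript
// 👨‍👩‍👧 is man + ZWJ + woman + ZWJ + girl: five code points, one grapheme.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
const codePointCount = [...family].length;                        // 5
const graphemeCount  = [...new Intl.Segmenter("en", { granularity: "grapheme" })
  .segment(family)].length;                                       // 1
```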
NFC composes characters (e.g. 'é' stays as one code point), NFD decomposes them (e.g. 'é' becomes 'e' + combining acute). NFKC and NFKD additionally apply compatibility mappings — things like turning ligatures and full-width forms into their plain equivalents. If search, deduplication, or string comparison is misbehaving, normalization mismatches are usually the cause.
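Both examples map directly onto String.prototype.normalize; a short sketch:

```javascript
// Canonical forms: é round-trips between one composed and two decomposed code points.
const composed   = "\u00E9";                   // é as a single code point (NFC form)
const decomposed = composed.normalize("NFD");  // "e" + U+0301 combining acute

// Compatibility mapping: the 'fi' ligature U+FB01 becomes plain "fi" under NFKC.
const plain = "\uFB01".normalize("NFKC");
```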
That's a UTF-8 byte-order mark (BOM, U+FEFF). It's invisible but can break string comparisons and JSON parsing. The inspector surfaces it so you can see and remove it.
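A minimal sketch of detecting and stripping a leading BOM before comparing or parsing (the helper name is mine):

```javascript
// Strip a leading BOM (U+FEFF) if present; otherwise return the string unchanged.
function stripBOM(str) {
  return str.charCodeAt(0) === 0xFEFF ? str.slice(1) : str;
}

// A BOM is not valid JSON whitespace, so "\uFEFF{...}" throws in JSON.parse;
// stripping it first makes the payload parseable.
```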
No. JavaScript uses \uXXXX for BMP code points and \u{XXXXX} for astral ones. CSS uses a backslash followed by one to six hex digits, optionally terminated by whitespace (needed when the next character is a hex digit), and that one syntax covers any code point. The tool shows both so you can paste straight into source code.
No. All analysis uses built-in JavaScript APIs (TextEncoder, Intl.Segmenter, String.prototype.normalize). No network requests, no logging.