CSV ↔ JSON Conversion Pitfalls: Escaping, Quoting, Encoding

CSV is the format everyone thinks they understand until they try to import 50,000 rows of customer data and half the addresses land in the wrong columns. It looks like "values separated by commas," and that definition is right about 85% of the time. The other 15% is where your Tuesday morning goes.

Here's what actually trips people up, and how to not become the person who ships a broken CSV.

The Spec Nobody Reads

There is, in fact, a spec: RFC 4180. It's two pages long. Most tools implement a dialect of it, not the thing itself. The key rules:

Fields containing commas, quotes, or newlines must be wrapped in double quotes
Double quotes inside a quoted field are escaped by doubling them: " becomes ""
Records are separated by CRLF — yes, Windows line endings, in 2026, because that's what the RFC says

Most modern parsers accept LF too. Some legacy ones do not.

CSV ↔ JSON

Convert between CSV and JSON, with header control. Fully in-browser.

Pitfall 1: Embedded Commas and Quotes

This is the one everyone hits first.

Broken

id,name,address
1,Jane Doe,123 Main St, Apt 4B, Portland
2,"Smith, Bob",42 Oak Ave, Chicago

The first row silently becomes 5 fields. The second row is technically valid but the address still spills over.

Correct

id,name,address
1,"Jane Doe","123 Main St, Apt 4B, Portland"
2,"Smith, Bob","42 Oak Ave, Chicago"

Any field that might contain the delimiter gets quoted. Consistent, safe, boring. Boring is the goal.

And the nested-quote case:

Escaping a literal double quote

If your generator emits \" instead of "", it's lying to you — that's JSON escaping, not CSV escaping. Different rules.

Pitfall 2: Excel's "Helpful" Auto-Interpretation

Excel is the most popular CSV editor in the world and it will actively corrupt your data.

01234 (ZIP code) → 1234 (leading zero stripped, cast to number)
0123456789 (phone) → 1.23457E+08 (scientific notation)
3/4 (fraction in a product SKU) → 4-Mar (guessed as a date)
GENE1 (gene symbol) → Jan-1 (Excel's gene-name bug is so famous scientists renamed 27 genes to work around it)
Long numeric IDs (credit card, order ID) → silently rounded because IEEE 754

Excel is not a CSV tool

If your users will open your CSV in Excel, you have three options: (1) ship XLSX instead and keep types, (2) force-quote every numeric-looking string, or (3) document that "double-click this file in Excel" is not a supported workflow. Pick one and mean it.

The force-quote trick works some of the time — "01234" in the file often still gets coerced by Excel on open. The truly paranoid prefix with a tab character or single quote: ="01234". It's ugly. That's the price.

Pitfall 3: Encoding — UTF-8 BOM, Windows-1252, the Usual Suspects

Your CSV is bytes. Bytes need an encoding. If you don't declare one, every tool guesses, and they guess differently.

The specific failures you'll see:

UTF-8 without BOM opened in Excel on Windows → café shows up as cafÃ©
UTF-8 with BOM sent to a strict parser → first column name becomes \uFEFFid and your JOIN fails
Windows-1252 ("ANSI") sent to a UTF-8 parser → emoji and smart quotes break
Mixed encoding within one file → someone concatenated two exports and you're in pain

The pragmatic default for modern pipelines is UTF-8 with BOM, because Excel needs the BOM to detect UTF-8 and most other parsers strip it gracefully. If your consumer is strictly a program (not a human opening the file), UTF-8 without BOM.

Pitfall 4: Line Endings

Linux writes \n. Windows wants \r\n. Some old Macs wrote \r. Most tools handle all three — until you get the one that doesn't, and your "CSV" is now a single row with 4 million fields.

Rule: emit \r\n if you're producing a file for unknown consumers. Accept anything when parsing.

Pitfall 5: Semicolons vs Commas (the Locale Wars)

In Europe, the decimal separator is a comma. So 1,5 means "one and a half." If you use , as both decimal separator and field separator, you're doomed. European Excel solves this by using ; as the field delimiter:

Locale-aware dialect

If you're shipping CSVs to a mixed audience, consider:

TSV (tab-separated) — unambiguous, no decimals collide with tabs
JSON — if both ends are machines, just do this
Explicitly document the delimiter and decimal separator in the file header or API docs

Rules for Shipping CSV That Survives First Contact

UTF-8 with BOM for human-consumed files; without BOM for machine pipelines
Always quote fields that contain ,, ", newlines, or leading/trailing whitespace
Double up internal quotes (""), never backslash-escape
CRLF line endings for maximum compatibility
No empty header row — if you have no header, say so explicitly
Treat strings as strings — wrap ZIP codes, phone numbers, and IDs in quotes and hope the consumer respects it
If you control both ends and it's more than trivial data, use JSON or Parquet instead

When JSON Is the Right Call

CSV was designed for spreadsheets. JSON was designed for structured data. If your fields contain nested objects, arrays, or anything that isn't a scalar string/number, you're fighting the format.

CSV ↔ JSON

Convert between CSV and JSON, with header control. Fully in-browser.

The converter handles the RFC 4180 quoting rules correctly and lets you pick the header behavior, so you can flip back and forth without hand-editing.

The Takeaway

CSV is a simple format with a complicated ecosystem. The format itself is fine — it's the 40 years of inconsistent implementations, Excel's opinions, and locale differences that will get you. Be boring: quote everything questionable, declare UTF-8, document your dialect, and for anything non-trivial, skip CSV entirely.

CSV ↔ JSON Conversion Pitfalls: Escaping, Quoting, Encoding

The Spec Nobody Reads

CSV ↔ JSON

Pitfall 1: Embedded Commas and Quotes

Pitfall 2: Excel's "Helpful" Auto-Interpretation

Pitfall 3: Encoding — UTF-8 BOM, Windows-1252, the Usual Suspects

Pitfall 4: Line Endings

Pitfall 5: Semicolons vs Commas (the Locale Wars)

Rules for Shipping CSV That Survives First Contact

When JSON Is the Right Call

CSV ↔ JSON

The Takeaway

Try these tools

More articles

PDF Compression Strategies: When Each Works Best

Cron Expression Cheatsheet With Real Examples

Favicon Sizes That Actually Matter in 2026