ZairussalamTools

PDF Compression Strategies: When Each Works Best

Why the same compress button gives wildly different results. Vector-text PDFs that grow instead of shrinking, scan-heavy PDFs that drop 90%, and when to skip compression entirely.

Ibrahimsyah Zairussalam

Someone hands you a 40MB PDF and says "can you make this smaller?" You run it through a compressor. Sometimes you get back 4MB. Sometimes you get back 41MB. Sometimes the text turns into blurry mush and your client is annoyed.

None of this is random. PDFs are containers with wildly different internal structures, and "compress" means different things depending on what's inside. Here's how to think about it.

What's Actually Inside a PDF

Every PDF is a bundle of streams. A stream might be:

  • Vector text and paths — glyphs, bezier curves. Tiny. Already compressed with Flate (zlib).
  • Embedded fonts — subsetted or full. A few hundred KB each.
  • Raster images — photos, scans, icons. These are 95% of the file size in a typical "big" PDF.
  • Forms, annotations, metadata — usually negligible.

The size of a PDF is almost always dominated by its raster images. Compression tools therefore work by re-encoding those images at lower quality or resolution. If there are no raster images to work with, there's nothing to compress.
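
To see which kind of file you have before compressing, you can measure how much of it is image data. The sketch below is a rough heuristic, not a real PDF parser: it scans the raw bytes for objects that carry a stream and counts a stream as an image when its dictionary contains /Subtype /Image. The function names are made up for illustration, and it ignores encrypted files and inline images. For anything serious, a proper PDF library is the better tool.

```python
import re

# Matches "N G obj ... endobj". Heuristic: binary stream data could, in
# principle, contain the bytes "endobj" and cut an object short.
_OBJ = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
_IMAGE = re.compile(rb"/Subtype\s*/Image\b")


def stream_bytes_by_kind(pdf_bytes: bytes) -> dict:
    """Approximate stream payload bytes, split into image vs. other streams."""
    totals = {"image": 0, "other": 0}
    for match in _OBJ.finditer(pdf_bytes):
        head, sep, rest = match.group(1).partition(b"stream")
        if not sep:
            continue  # no stream in this object (catalog, page tree, ...)
        payload = rest.split(b"endstream", 1)[0].strip(b"\r\n")
        kind = "image" if _IMAGE.search(head) else "other"
        totals[kind] += len(payload)
    return totals


def image_share(pdf_bytes: bytes) -> float:
    """Fraction of stream bytes that belong to image streams (0.0 to 1.0)."""
    totals = stream_bytes_by_kind(pdf_bytes)
    total = totals["image"] + totals["other"]
    return totals["image"] / total if total else 0.0


# Hypothetical usage ("input.pdf" is a placeholder path):
# with open("input.pdf", "rb") as f:
#     print(f"{image_share(f.read()):.0%} of stream bytes are images")
```

A share near zero suggests a vector-text file that compression is unlikely to help; a share near one suggests a scan-heavy file.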

The Four Types of PDF

Every PDF you'll encounter falls into roughly one of these buckets. The right strategy is different for each.

Vector-text PDF

Examples: LaTeX output, Figma exports, Word → Export as PDF, design portfolios

Contents: Text as real glyphs, vector shapes, maybe some small PNG icons.

Typical size: 50KB – 2MB for hundreds of pages.

Compression result: Often grows. Re-encoding pipelines may rasterize clean text into JPEG and inflate the file.

Scan-heavy PDF

Examples: Scanned contracts, receipts photographed with a phone, old books digitized

Contents: One giant raster image per page, usually 300 DPI color.

Typical size: 20MB – 500MB.

Compression result: Massive wins. 80–95% reduction is normal with JPEG re-encoding.

Mixed PDF

Examples: Product manuals, whitepapers with figures, reports with embedded screenshots

Contents: Real text plus raster images (screenshots, photos, diagrams).

Typical size: 5MB – 80MB.

Compression result: Moderate wins. 40–70% reduction, concentrated on the image streams. Text stays sharp.

Already-compressed PDF

Examples: Files that went through a compression pipeline, web-optimized exports, old faxes

Contents: Images already JPEGed at low quality, text flate-compressed.

Typical size: Whatever it is.

Compression result: Near-zero or negative. Running JPEG-on-JPEG adds artifacts without shrinking anything meaningful.

Why Vector-Text PDFs Sometimes Grow

This surprises people. You run a 300-page LaTeX thesis PDF (1.2MB) through a compressor and get back a 4.5MB file.

Here's what happened: the compressor rasterized each page at, say, 200 DPI, JPEGed the rasters, and dropped them back in. Your crisp vector text is now a pixel grid. Worse, a US Letter page (8.5 × 11 inches) at 200 DPI is 1700 × 2200 pixels, about 3.7 megapixels, and 300 JPEG-compressed pages of that size add up to far more than the original vector streams.
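
The arithmetic is worth running once. This sketch uses the numbers from the example above (300 pages, 1.2MB, 200 DPI, US Letter) and assumes 1MB means 1,000,000 bytes. It works out how few bytes per pixel a rasterized page would need to match the vector original; the helper name is illustrative.

```python
def page_pixels(dpi: float, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Pixel count of one page rasterized at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)


pages = 300
original_bytes = 1_200_000              # the 1.2MB thesis, decimal megabytes assumed
pixels = page_pixels(200)               # 1700 * 2200 = 3_740_000
vector_bytes_per_page = original_bytes / pages   # 4000.0 bytes per page

# Bits per pixel the JPEG would have to hit just to break even.
break_even_bits = vector_bytes_per_page / pixels * 8

print(f"{pixels:,} pixels per page")
print(f"{vector_bytes_per_page:.0f} bytes per page in the vector original")
print(f"break-even: {break_even_bits:.4f} bits per pixel")
```

The break-even figure comes out under a hundredth of a bit per pixel, which is far below what JPEG typically manages even on mostly blank pages. That is why the rasterized file grows.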

If your PDF is already text-based, don't compress it

Open the PDF. If you can select text, zoom to 400% and it stays crisp, and the file is under ~5MB, just ship it. Compression is for files dominated by raster content. Running it on a clean vector PDF is how you make things worse.

Scan-Heavy PDFs: Where Compression Earns Its Keep

Phone-scanned receipts are the worst offenders. A modern phone camera shoots 12MP+. A scanning app often saves each page as a lossless PNG or a very-high-quality JPEG and wraps it in a PDF. A 10-page scan can easily be 200MB.

These compress beautifully because:

  • 300 DPI is usually overkill — 150 DPI reads fine on screen and for home printers
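  • JPEG at quality 75 is usually hard to tell from quality 95 on a scanned page at normal reading zoom, even though text edges are where JPEG artifacts show up first
  • Converting black-and-white documents from RGB color to grayscale drops two of the three channels, cutting the raw pixel data by two thirds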

A good PDF compressor applied to a scan-heavy file should get you 80–95% reduction with no visible loss. If you're not hitting those numbers, either the scans were already compressed once, or the compressor is being too conservative.
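
A quick calculation shows why the numbers get that large. The sketch below compares raw, uncompressed pixel data for one US Letter page before and after halving the DPI and dropping to grayscale. Raw bytes do not map one-to-one onto JPEG file sizes, so treat the ratio as a rough guide rather than a prediction; the helper name is illustrative.

```python
def raw_bytes(dpi: float, channels: int,
              width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Uncompressed size of one page at 8 bits per channel."""
    return round(width_in * dpi) * round(height_in * dpi) * channels


before = raw_bytes(300, channels=3)   # 300 DPI RGB: 2550 * 3300 * 3 bytes
after = raw_bytes(150, channels=1)    # 150 DPI grayscale: 1275 * 1650 bytes

ratio = before / after                # 4x from the DPI, 3x from the channels
print(f"{before:,} -> {after:,} bytes ({ratio:.0f}x smaller)")
```

Halving the DPI quarters the pixel count, and grayscale cuts the channels from three to one, so the raw data shrinks twelvefold before JPEG quality is touched.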

The OCR Question

If your scan contains text (contracts, books, receipts), consider re-OCRing before compressing. Here's why:

  • A searchable PDF has a hidden text layer behind the image. That text layer is vector and tiny.
  • After aggressive image compression, the text layer still makes the document searchable and copyable.
  • Some scan tools save both the text layer AND a high-res image "just in case" — bloating the file.

A clean workflow for a scanned contract:

  1. Run OCR to produce a text layer
  2. Compress the image layer aggressively (150 DPI, JPEG 70)
  3. Keep the text layer intact

The result is small AND searchable. Many consumer tools skip the OCR step and just compress, losing the searchability even if it was there.
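
One way to script this workflow is with the open-source OCRmyPDF command-line tool, driven here from Python. This is a sketch that assumes OCRmyPDF is installed and on your PATH; the wrapper functions and file names are hypothetical. The --skip-text flag leaves pages that already have a text layer alone, and --optimize with --jpeg-quality asks for lossy image re-encoding. As far as I know OCRmyPDF does not downsample images, so the 150 DPI step would need a separate tool, and the higher optimization levels may require extra helper programs.

```python
import shutil
import subprocess


def build_ocr_command(src: str, dst: str, jpeg_quality: int = 70) -> list:
    """Build an OCRmyPDF command line: OCR untouched pages, then optimize images."""
    return [
        "ocrmypdf",
        "--skip-text",                  # keep pages that already have text as they are
        "--optimize", "3",              # most aggressive (lossy) optimization level
        "--jpeg-quality", str(jpeg_quality),
        src,
        dst,
    ]


def ocr_and_compress(src: str, dst: str) -> None:
    """Run the command, failing loudly if the tool is missing or errors out."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    subprocess.run(build_ocr_command(src, dst), check=True)


# Hypothetical usage:
# ocr_and_compress("contract_scan.pdf", "contract_small.pdf")
```

After running it, open the output and confirm that text is still selectable and that the pages read cleanly at the zoom your recipients will use.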

When to Split Instead

Sometimes the real answer isn't "compress" — it's "split." A 300MB product catalog full of images should be 300MB if the images are meant to be high-res. Chopping it into section PDFs makes it deliverable without lossy re-encoding.

Reach for split when:

  • The file is bloated because it's comprehensive, not because it's wasteful
  • Different sections go to different audiences
  • You're hitting email attachment limits (25MB is typical) and sending a link to a 120MB download would be the wrong fix
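
If the goal is to fit under an attachment limit, the split points can be planned before touching the file. The sketch below greedily groups consecutive pages so that each part stays under a limit. The per-page sizes are hypothetical inputs that you would have to estimate yourself, and real parts will come out somewhat different because shared resources such as fonts end up in every part.

```python
def plan_parts(page_sizes_mb, limit_mb=25.0):
    """Group consecutive pages into (start, end) ranges, end exclusive.

    Each range stays within limit_mb unless a single page already
    exceeds it, in which case that page gets a part of its own.
    """
    parts = []
    start, running = 0, 0.0
    for i, size in enumerate(page_sizes_mb):
        # Close the current part if adding this page would overflow it.
        if i > start and running + size > limit_mb:
            parts.append((start, i))
            start, running = i, 0.0
        running += size
    if start < len(page_sizes_mb):
        parts.append((start, len(page_sizes_mb)))
    return parts


# Four 10MB pages with a 25MB limit split into two parts of two pages each.
print(plan_parts([10, 10, 10, 10]))
```

In practice you would usually snap these ranges to section boundaries, since a part that ends mid-chapter is harder for the recipient to use.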

When to Skip Compression and Re-Export From the Source

The absolute best "compression" is often not compression at all:

  • Figma / design tools → Export as PDF with "Embed fonts: subset" and image quality set explicitly
  • InDesign → Use "Smallest File Size" preset, which tunes image DPI and JPEG quality automatically
  • Word / Google Docs → Use the smallest-size PDF export option where one exists (in Word, "Minimum size (publishing online)")
  • LaTeX → Use \pdfcompresslevel=9 and subset fonts; don't post-process

A well-exported 10MB PDF beats a badly-compressed 8MB one every time, because the compressor isn't guessing what's important.

If you don't have access to the source app — which is the common case — in-browser compression is the right call. Run it, check the output at the zoom levels your recipients will use, and don't trust the percentage reduction blindly.

A Decision Flowchart

When someone hands you a PDF and says "make it smaller":

  1. Is it under 5MB and mostly text? → Don't compress. Ship as is.
  2. Is it mostly scans / photos? → Compress aggressively. Expect 80%+ reduction.
  3. Is it a mixed document with real text + images? → Compress moderately. Check text quality at 100% zoom after.
  4. Has it already been through a compression pipeline? → Don't double-compress. Go back to the source if you can.
  5. Is it huge because it's a comprehensive catalog? → Split, don't compress.
  6. Do you control the source app? → Re-export with better settings instead.
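
The same checklist can be written as a small function. This is a sketch with one assumption of my own: when several answers are "yes" at once, the options that avoid lossy re-encoding win, so the checks run in a different order than the list. The parameter names and return strings are illustrative.

```python
def choose_strategy(size_mb, mostly_text=False, mostly_scans=False,
                    already_compressed=False, comprehensive=False,
                    have_source=False):
    """Pick a strategy for shrinking a PDF, preferring non-lossy options."""
    if have_source:
        return "re-export from source"
    if already_compressed:
        return "do not recompress"
    if size_mb < 5 and mostly_text:
        return "ship as is"
    if comprehensive:
        return "split"
    if mostly_scans:
        return "compress aggressively"
    # Mixed documents: real text plus images.
    return "compress moderately"


print(choose_strategy(1.2, mostly_text=True))      # ship as is
print(choose_strategy(200, mostly_scans=True))     # compress aggressively
print(choose_strategy(40))                         # compress moderately
```

The inputs are judgment calls you make by opening the file, so the function is mostly a way to keep the order of questions straight.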

The Takeaway

PDF compression isn't a button; it's a strategy that depends on what's inside the file. Raster-heavy files have room to shrink; vector-heavy files don't. Check what you have before you start. And remember that "smaller" isn't the only goal: sharper text, faster opening, and preserved searchability are often worth more than the last 10% of size.