DevToys Pro

free web developer tools

GZip vs Deflate vs Zlib — Formats and Headers

14 min read

You're trying to decompress data and getting "invalid header" or "incorrect header check" errors. Or maybe compressed data works with one tool but fails with another. The problem? GZip, Deflate, and Zlib all use the same core compression algorithm (DEFLATE) but wrap it in different headers and checksums. Understanding these format differences is essential for debugging compression issues, working with APIs, and choosing the right compression format.

The Core: DEFLATE Algorithm

At the heart of all three formats is the DEFLATE compression algorithm, defined in RFC 1951. DEFLATE combines LZ77 (sliding window compression) with Huffman coding to achieve efficient data compression. It's the same algorithm used in ZIP files, PNG images, and HTTP compression.

DEFLATE produces a raw compressed data stream with no headers, no checksums, and no metadata about the original data. This raw format is efficient but requires additional wrapping to be useful in practice.
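
One way to see that the three formats share a single payload is to compress the same input three ways with Python's standard zlib module and strip the framing. This is a minimal sketch: the helper name `deflate_with` is made up for illustration, and the 10-byte GZip header assumes no optional fields, which is what zlib writes by default.

```python
import zlib

def deflate_with(data: bytes, wbits: int) -> bytes:
    """Compress data; wbits selects the wrapper (hypothetical helper)."""
    compressor = zlib.compressobj(level=6, wbits=wbits)
    return compressor.compress(data) + compressor.flush()

text = b"Hello, World!"
raw = deflate_with(text, -15)          # raw DEFLATE, no framing
zlib_stream = deflate_with(text, 15)   # 2-byte header + 4-byte Adler-32
gzip_stream = deflate_with(text, 31)   # 10-byte header + 8-byte trailer

# Stripping the framing should leave the same DEFLATE bytes in each case
print(zlib_stream[2:-4] == raw)
print(gzip_stream[10:-8] == raw)
```

The sign and offset of `wbits` choose the wrapper; the compressor underneath is the same in all three calls.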

Format #1: GZip (RFC 1952)

GZip is the most common format you'll encounter. It wraps DEFLATE compressed data with:

  • 10-byte header: Starts with magic bytes 1F 8B
  • Metadata: Compression method, flags, timestamp, OS type
  • Optional extras: Original filename, comment, extra fields
  • DEFLATE compressed data
  • 8-byte footer: CRC-32 checksum + uncompressed size (mod 2^32)

GZip Header Structure

Byte 0-1:   1F 8B          # Magic number (identifies GZip format)
Byte 2:     08             # Compression method (08 = DEFLATE)
Byte 3:     Flags          # Optional features (filename, comment, etc.)
Byte 4-7:   Timestamp      # Modification time (Unix timestamp)
Byte 8:     Extra flags    # Compression level indicator
Byte 9:     OS             # Operating system (0=FAT, 3=Unix, 11=NTFS)
...         [optional]     # Filename, comment, extra data
...         DEFLATE data   # Compressed payload
...         CRC-32         # 4 bytes: checksum of uncompressed data
...         Size           # 4 bytes: uncompressed size mod 2^32

When GZip is Used

  • HTTP compression: Content-Encoding: gzip
  • File compression: .gz files (e.g., archive.tar.gz)
  • Command-line tools: gzip / gunzip

GZip Example

# Original text
"Hello, World!"

# GZip compressed (hex):
1F 8B 08 00 00 00 00 00 00 03 F3 48 CD C9 C9 D7 
51 08 CF 2F CA 49 51 04 00 D0 C3 4A EC 0D 00 00 00

# Breakdown:
1F 8B       - GZip magic number
08          - DEFLATE compression
00          - No flags
00 00 00 00 - No timestamp
00          - Default compression
03          - Unix OS
...         - DEFLATE compressed data
D0 C3 4A EC - CRC-32 checksum
0D 00 00 00 - Uncompressed size (13 bytes)
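
The breakdown above can be checked mechanically. The sketch below unpacks the fixed header and the trailer of the example bytes with Python's struct module and recomputes the checksum. It assumes the flags byte is 0, so no optional fields sit between the header and the payload; the variable names are illustrative only.

```python
import struct
import zlib

stream = bytes.fromhex(
    "1F8B0800000000000003"              # 10-byte header
    "F348CDC9C9D75108CF2FCA49510400"    # DEFLATE payload
    "D0C34AEC0D000000"                  # CRC-32 + size trailer
)

# Fixed header: magic, method, flags, mtime, extra flags, OS (little-endian)
magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", stream[:10])
# Trailer: CRC-32 and uncompressed size, both little-endian
crc, size = struct.unpack("<II", stream[-8:])

# With flags == 0 the payload is everything between header and trailer
payload = zlib.decompress(stream[10:-8], wbits=-zlib.MAX_WBITS)

print(magic.hex(), method, flags, mtime, xfl, os_id)
print(payload, hex(crc), size)
print(zlib.crc32(payload) == crc, len(payload) == size)
```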

Format #2: Zlib (RFC 1950)

Zlib is similar to GZip but with a simpler header structure. It wraps DEFLATE compressed data with:

  • 2-byte header: Compression method and flags
  • DEFLATE compressed data
  • 4-byte footer: Adler-32 checksum (faster than CRC-32)

Zlib Header Structure

Byte 0:     CMF            # Compression Method and Flags
            Bits 0-3:  CM = 8       # Compression method (8 = DEFLATE)
            Bits 4-7:  CINFO       # Window size (log2(window) - 8)
Byte 1:     FLG            # Flags
            Bits 0-4:  FCHECK      # Header checksum
            Bit 5:     FDICT       # Preset dictionary flag
            Bits 6-7:  FLEVEL      # Compression level
...         DEFLATE data   # Compressed payload
...         Adler-32       # 4 bytes: checksum of uncompressed data

When Zlib is Used

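  • PNG images: The IDAT image data forms a single Zlib stream (compressed text and ICC profile chunks use Zlib as well)
  • PDF files: FlateDecode streams use Zlib compression
  • Git: Objects are stored Zlib-compressed
  • Python zlib module: Default format
  • Java Deflater/Inflater: Default format
  • OpenSSL: Optional compression support is built on the zlib library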

Zlib Example

# Original text
"Hello, World!"

# Zlib compressed (hex):
78 9C F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 
00 1F 9E 04 6A

# Breakdown:
78          - CMF: DEFLATE, 32K window
9C          - FLG: Default compression, checksum
...         - DEFLATE compressed data
1F 9E 04 6A - Adler-32 checksum
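
To make the bit layout concrete, the following sketch decodes the two header bytes of the example above and compares the stored Adler-32 with a freshly computed one. It only uses Python's standard zlib module; the variable names are illustrative.

```python
import zlib

stream = bytes.fromhex(
    "789C"                              # CMF, FLG
    "F348CDC9C9D75108CF2FCA49510400"    # DEFLATE payload
    "1F9E046A"                          # Adler-32
)

cmf, flg = stream[0], stream[1]
method = cmf & 0x0F                       # CM: 8 means DEFLATE
window = 1 << ((cmf >> 4) + 8)            # CINFO is log2(window) - 8
fcheck_ok = (cmf * 256 + flg) % 31 == 0   # FCHECK makes the pair a multiple of 31
fdict = (flg >> 5) & 1                    # preset dictionary flag
flevel = flg >> 6                         # compression level hint, 0-3

# Unlike GZip's little-endian trailer, the Adler-32 is stored big-endian
stored_adler = int.from_bytes(stream[-4:], "big")
payload = zlib.decompress(stream)

print(method, window, fcheck_ok, fdict, flevel)
print(stored_adler == zlib.adler32(payload))
```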

Format #3: Raw DEFLATE

Raw DEFLATE is just the compressed data stream with no headers or checksums at all. This is the most compact format but requires external knowledge of the data to decompress and verify correctly.

When Raw DEFLATE is Used

  • HTTP compression: Content-Encoding: deflate (technically should be Zlib, but some servers send raw DEFLATE)
  • ZIP files: Individual entries use raw DEFLATE (headers/checksums are in the ZIP structure)
  • 7-Zip: DEFLATE method uses raw format

Raw DEFLATE Example

# Original text
"Hello, World!"

# Raw DEFLATE compressed (hex):
F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00

# No header, no footer, just compressed data
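
Because the framing lives entirely outside the payload, a raw stream can be wrapped after the fact. The sketch below decompresses the raw bytes above, then rebuilds Zlib and GZip streams by hand using the header values from the earlier examples. It is an illustration of the layout, not a replacement for the library encoders.

```python
import gzip
import struct
import zlib

raw = bytes.fromhex("F348CDC9C9D75108CF2FCA49510400")
text = zlib.decompress(raw, wbits=-zlib.MAX_WBITS)   # negative wbits = raw mode

# Zlib framing: 2-byte header, payload, big-endian Adler-32
as_zlib = b"\x78\x9c" + raw + struct.pack(">I", zlib.adler32(text))

# GZip framing: 10-byte header, payload, little-endian CRC-32 and size
gzip_header = b"\x1f\x8b\x08\x00" + b"\x00\x00\x00\x00" + b"\x00\x03"
as_gzip = gzip_header + raw + struct.pack("<II", zlib.crc32(text), len(text))

print(text)
print(zlib.decompress(as_zlib) == text)
print(gzip.decompress(as_gzip) == text)
```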

Format Comparison Table

| Feature          | GZip      | Zlib      | Raw DEFLATE |
|------------------|-----------|-----------|-------------|
| Header Size      | 10+ bytes | 2 bytes   | 0 bytes     |
| Checksum         | CRC-32    | Adler-32  | None        |
| Metadata         | Yes       | Minimal   | None        |
| Overhead         | ~18 bytes | ~6 bytes  | 0 bytes     |
| Magic Bytes      | 1F 8B     | 78 XX     | None        |
| Size Info        | Yes       | No        | No          |
| Timestamp        | Yes       | No        | No          |
| RFC              | 1952      | 1950      | 1951        |

Why Tools Fail: Format Mismatch

The most common compression error is trying to decompress data with the wrong format expectation:

Error #1: "incorrect header check"

You're trying to decompress GZip data as Zlib (or vice versa). The decompressor reads the first two bytes as a Zlib header and the checksum fails.

# Python example: Wrong format
import zlib

gzip_data = b'\x1f\x8b\x08...'  # GZip format
zlib.decompress(gzip_data)
# Error: zlib.error: Error -3 while decompressing data: incorrect header check

Fix: Use zlib.decompress(data, wbits=16+zlib.MAX_WBITS) for GZip or gzip.decompress()
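
The wbits offsets are easier to remember when seen side by side. In this sketch, `inflate_framed` is a hypothetical helper name; the offsets themselves (16 for GZip only, 32 for automatic Zlib or GZip detection) come from the Python zlib module. Note that automatic detection does not cover raw DEFLATE.

```python
import gzip
import zlib

def inflate_framed(data: bytes) -> bytes:
    """Decompress Zlib or GZip input (hypothetical helper name)."""
    # 32 + MAX_WBITS asks zlib to detect a Zlib or GZip header by itself
    return zlib.decompress(data, wbits=32 + zlib.MAX_WBITS)

gzip_data = gzip.compress(b"Hello, World!")

try:
    zlib.decompress(gzip_data)        # default wbits expects a Zlib header
except zlib.error as exc:
    print("zlib mode failed:", exc)

# 16 + MAX_WBITS accepts GZip only
print(zlib.decompress(gzip_data, wbits=16 + zlib.MAX_WBITS))
print(inflate_framed(gzip_data))
print(inflate_framed(zlib.compress(b"Hello, World!")))
```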

Error #2: "invalid stored block lengths"

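You're trying to decompress Zlib data as raw DEFLATE (or vice versa). In raw mode the Zlib header byte 0x78 parses as the start of a stored (uncompressed) block, and the bytes that follow fail that block's length check.

# Python example: Zlib data fed to a raw DEFLATE decoder
import zlib

zlib_data = b'\x78\x9c...'  # Zlib format
zlib.decompress(zlib_data, wbits=-zlib.MAX_WBITS)  # Raw DEFLATE mode
# Error: zlib.error: Error -3 while decompressing data: invalid stored block lengths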

Fix: Use positive wbits for Zlib, negative for raw DEFLATE

Error #3: "invalid distance too far back"

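A back-reference in the stream points further back than the data the decompressor has available. Either the data was compressed with a larger window than the one you are decompressing with, or the stream is corrupted or missing its beginning.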

Language-Specific Compression APIs

Python

import zlib
import gzip

# GZip format
gzip_data = gzip.compress(b"Hello")       # Create GZip
original = gzip.decompress(gzip_data)     # Decompress GZip

# Zlib format (default)
zlib_data = zlib.compress(b"Hello")       # Create Zlib
original = zlib.decompress(zlib_data)     # Decompress Zlib

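# Raw DEFLATE (compressobj works on every Python 3; zlib.compress only accepts wbits from 3.11)
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
deflate_data = compressor.compress(b"Hello") + compressor.flush()
original = zlib.decompress(deflate_data, wbits=-zlib.MAX_WBITS)

# Auto-detect format
def decompress_any(data):
    # GZip: magic bytes 1F 8B
    if data[:2] == b'\x1f\x8b':
        return gzip.decompress(data)
    # Zlib: CM nibble is 8 and the two header bytes are a multiple of 31
    if len(data) >= 2 and data[0] & 0x0F == 8 and (data[0] * 256 + data[1]) % 31 == 0:
        try:
            return zlib.decompress(data)
        except zlib.error:
            pass  # Raw DEFLATE can pass the header test by chance
    # Raw DEFLATE: no header to check, so this is the fallback
    return zlib.decompress(data, wbits=-zlib.MAX_WBITS)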

JavaScript (Node.js)

const zlib = require('zlib');

// GZip format
const gzipData = zlib.gzipSync(Buffer.from('Hello'));
const fromGzip = zlib.gunzipSync(gzipData);

// Zlib format
const zlibData = zlib.deflateSync(Buffer.from('Hello'));
const fromZlib = zlib.inflateSync(zlibData);

// Raw DEFLATE
const deflateData = zlib.deflateRawSync(Buffer.from('Hello'));
const fromRaw = zlib.inflateRawSync(deflateData);

Java

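import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class CompressionFormats {
    // nowrap=false emits Zlib (the default); nowrap=true emits raw DEFLATE
    static byte[] deflate(byte[] data, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {  // Drain until all output is produced
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();  // Release native memory
        return out.toByteArray();
    }

    // GZip format
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
            gzos.write(data);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello".getBytes(StandardCharsets.UTF_8);
        byte[] zlibData = deflate(data, false);  // Zlib format
        byte[] rawData = deflate(data, true);    // Raw DEFLATE
        byte[] gzipData = gzip(data);            // GZip format
    }
}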

HTTP Content-Encoding Confusion

The HTTP Content-Encoding: deflate header is ambiguous in practice. RFC 2616 (and its successor, RFC 9110) defines it as the Zlib format, but many implementations send raw DEFLATE instead.

What Browsers Expect

  • Content-Encoding: gzip — GZip format (most common, best supported)
  • Content-Encoding: deflate — Zlib format (RFC spec) OR raw DEFLATE (common bug)
  • Content-Encoding: br — Brotli (newer, better compression)
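
A client that has to accept deflate responses can cope with both interpretations by trying the Zlib framing first and falling back to raw DEFLATE. The sketch below is one possible approach; `decode_http_deflate` is a hypothetical helper, not part of any HTTP library.

```python
import zlib

def decode_http_deflate(body: bytes) -> bytes:
    """Decode a 'Content-Encoding: deflate' body (hypothetical helper).

    Tries the Zlib framing the spec calls for, then falls back to raw DEFLATE.
    """
    try:
        return zlib.decompress(body)                          # Zlib (per spec)
    except zlib.error:
        return zlib.decompress(body, wbits=-zlib.MAX_WBITS)   # raw DEFLATE

# Build one body of each kind to exercise both paths
spec_body = zlib.compress(b'{"ok": true}')
raw_compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_body = raw_compressor.compress(b'{"ok": true}') + raw_compressor.flush()

print(decode_http_deflate(spec_body))
print(decode_http_deflate(raw_body))
```

Trying Zlib first is the safer order: a Zlib stream carries a header check and an Adler-32, so a raw stream is very unlikely to be accepted by mistake.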

Best Practice for HTTP

Always use GZip for HTTP compression. It's universally supported, clearly defined, and avoids the deflate ambiguity.

# Nginx configuration
gzip on;
gzip_types text/plain text/css application/json application/javascript;

# Apache configuration (mod_deflate's DEFLATE filter emits GZip, despite the name)
<IfModule mod_deflate.c>
  AddOutputFilterByType DEFLATE text/html text/plain text/xml
</IfModule>

Detecting Compression Format

You can detect the format by inspecting the first few bytes:

function detectFormat(data) {
  if (data.length < 2) return 'unknown';
  
  // Check GZip magic number
  if (data[0] === 0x1f && data[1] === 0x8b) {
    return 'gzip';
  }
  
  // Check Zlib header
  // CMF low nibble must be 8 (DEFLATE); 0x78 (32K window) is the usual value
  // FLG byte varies but header checksum must be valid
  if ((data[0] & 0x0f) === 8 && (data[0] >> 4) <= 7 && (data[0] * 256 + data[1]) % 31 === 0) {
    return 'zlib';
  }
  
  // Might be raw DEFLATE (no reliable detection)
  return 'raw-deflate (or unknown)';
}

// Examples (leading bytes only)
detectFormat([0x1f, 0x8b, 0x08]);  // 'gzip'
detectFormat([0x78, 0x9c]);        // 'zlib'
detectFormat([0x78, 0xda]);        // 'zlib' (higher compression)
detectFormat([0xf3, 0x48]);        // 'raw-deflate (or unknown)'

Compression Levels and Trade-offs

The compression level is an encoder setting rather than part of the format, so all three formats support the same levels (typically 0-9 or 1-9, where higher = better compression but slower):

  • Level 0: No compression (store only) — fast but no space savings
  • Level 1-3: Fast compression — good for real-time data
  • Level 6: Default — balanced speed and compression
  • Level 9: Maximum compression — slow but best ratio
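
The effect of the level is easy to measure. The sketch below compresses a synthetic, repetitive JSON-like sample at a few levels; the sample is an assumption chosen for illustration, and real ratios depend heavily on the input.

```python
import zlib

# Repetitive JSON-like sample; real ratios depend heavily on the input
sample = b"".join(
    b'{"id": %d, "name": "user%d", "active": true}\n' % (i, i)
    for i in range(2000)
)

sizes = {level: len(zlib.compress(sample, level)) for level in (0, 1, 6, 9)}

for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({size / len(sample):.1%} of original)")
```

Level 0 stores the data uncompressed, so its output is slightly larger than the input once the framing is added.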

Compression Ratio Comparison

For typical text data (like JSON or HTML):

  • GZip level 6: ~70-80% compression (3-5x smaller)
  • GZip level 9: ~75-85% compression (slightly better, much slower)
  • Brotli level 11: ~80-90% compression (best for static files)

Real-World Debugging Scenario

You're downloading compressed API responses and getting decompression errors. Here's how to debug:

Step 1: Capture the Raw Data

# Python example
import requests

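response = requests.get('https://api.example.com/data',
                        headers={'Accept-Encoding': 'gzip'},
                        stream=True)  # Leave the body unread so the raw bytes stay available

# response.content is decompressed by requests automatically, so read the undecoded body
compressed = response.raw.read(decode_content=False)

# Check Content-Encoding header
print(response.headers.get('Content-Encoding'))  # 'gzip' or 'deflate'

# Inspect first bytes
print(compressed[:10].hex())  # '1f8b08...' = GZip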

Step 2: Detect Format

import gzip
import zlib

# Check magic bytes
if compressed[:2] == b'\x1f\x8b':
    print("Format: GZip")
    data = gzip.decompress(compressed)
elif compressed[:1] == b'\x78':
    print("Format: Zlib")
    data = zlib.decompress(compressed)
else:
    print("Format: Raw DEFLATE (or unknown)")
    try:
        data = zlib.decompress(compressed, wbits=-zlib.MAX_WBITS)
    except zlib.error as exc:
        data = None
        print(f"Not raw DEFLATE either: {exc}")

Step 3: Verify Decompression

# Check if decompressed data looks correct
print(f"Decompressed size: {len(data)} bytes")
print(f"First 100 chars: {data[:100]}")

# Try parsing as JSON if expected
import json
try:
    parsed = json.loads(data)
    print("Valid JSON!")
except ValueError:  # Covers json.JSONDecodeError and UnicodeDecodeError
    print("Not valid JSON - might still be compressed or corrupted")

Using Compression Tools

When working with compressed data, use tools that support all three formats. The GZip tool on DevToys Pro supports:

  • GZip, Zlib, and raw DEFLATE formats
  • Automatic format detection
  • Adjustable compression levels
  • Base64 encoding for transport
  • Error messages when decompression fails

Best Practices

  1. Use GZip for HTTP and file compression — most widely supported
  2. Use Zlib for embedded compression — PNG, PDF, internal data structures
  3. Avoid raw DEFLATE on its own: without a checksum, corruption goes undetected unless the container (like ZIP) adds one
  4. Always verify checksums — detect corrupted data early
  5. Document your format choice — prevent integration issues
  6. Test with incompressible data — ensure your code handles expansion gracefully
  7. Consider Brotli for static files — better compression than GZip
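
To illustrate why checksums matter, the sketch below flips a single bit in the Adler-32 trailer of a Zlib stream, which has the same effect on verification as a payload that decodes to the wrong bytes. It then decodes the same payload in raw mode, where there is nothing to verify. The variable names are illustrative.

```python
import zlib

original = b"Hello, World!" * 50
stream = bytearray(zlib.compress(original))
stream[-1] ^= 0x01           # flip one bit in the stored Adler-32

try:
    zlib.decompress(bytes(stream))
    detected = False
except zlib.error as exc:
    detected = True
    print("corruption detected:", exc)

# Raw mode has no trailer to check: the payload decodes without verification
unchecked = zlib.decompress(bytes(stream[2:-4]), wbits=-zlib.MAX_WBITS)
print(unchecked == original)
```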

Common Pitfalls

  • Assuming "deflate" means raw DEFLATE — it should be Zlib but isn't always
  • Not checking magic bytes — leads to cryptic decompression errors
  • Ignoring checksum errors — corrupted data can cause security issues
  • Using wrong window size — causes "invalid distance" errors
  • Compressing already compressed data: wastes CPU and usually makes the output slightly larger
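
The last pitfall is easy to reproduce. In the sketch below, random bytes stand in for input that is already compressed (an assumption made so the example is self-contained); DEFLATE finds nothing to exploit, so the framing and block overhead make the output larger than the input.

```python
import gzip
import os

random_data = os.urandom(100_000)       # stands in for already-compressed input
compressed = gzip.compress(random_data)

growth = len(compressed) - len(random_data)
print(f"{len(random_data)} -> {len(compressed)} bytes ({growth:+d})")
```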

Key Takeaways

  • GZip, Zlib, and raw DEFLATE all use the same DEFLATE algorithm
  • GZip: 10-byte header + CRC-32, most common for files and HTTP
  • Zlib: 2-byte header + Adler-32, used in PNG, PDF, and many libraries
  • Raw DEFLATE: no headers or checksums, used in ZIP files
  • Detect format by checking the leading bytes: 0x1f8b = GZip, 0x78XX with a valid header check = usually Zlib
  • HTTP "deflate" encoding is ambiguous — stick with GZip
  • Always use checksums to detect corrupted data
