GZip vs Deflate vs Zlib — Formats and Headers
You're trying to decompress data and getting "invalid header" or "incorrect header check" errors. Or maybe compressed data works with one tool but fails with another. The problem? GZip, Deflate, and Zlib all use the same core compression algorithm (DEFLATE) but wrap it in different headers and checksums. Understanding these format differences is essential for debugging compression issues, working with APIs, and choosing the right compression format.
The Core: DEFLATE Algorithm
At the heart of all three formats is the DEFLATE compression algorithm, defined in RFC 1951. DEFLATE combines LZ77 (sliding window compression) with Huffman coding to achieve efficient data compression. It's the same algorithm used in ZIP files, PNG images, and HTTP compression.
DEFLATE produces a raw compressed data stream with no headers, no checksums, and no metadata about the original data. This raw format is efficient but requires additional wrapping to be useful in practice.
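Python's standard zlib module makes this concrete. Its `wbits` argument picks the wrapper while the DEFLATE engine stays the same. The sketch below compresses one input three ways and compares the results. The helper name `deflate_with_wbits` is ours.

```python
import zlib

payload = b"Hello, World!"

def deflate_with_wbits(data: bytes, wbits: int) -> bytes:
    """Compress with zlib's deflate engine; wbits selects the wrapper.

    -15      -> raw DEFLATE (no header, no trailer)
     15      -> Zlib wrapper (2-byte header, Adler-32 trailer)
     16 + 15 -> GZip wrapper (10-byte header, CRC-32 + size trailer)
    """
    compressor = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()

raw = deflate_with_wbits(payload, -15)
zlib_stream = deflate_with_wbits(payload, 15)
gzip_stream = deflate_with_wbits(payload, 16 + 15)

# The compressed body is byte-for-byte identical; only the wrapping differs.
print(zlib_stream[2:-4] == raw)   # True
print(gzip_stream[10:-8] == raw)  # True
print(len(zlib_stream) - len(raw), len(gzip_stream) - len(raw))  # 6 18
```

The 6-byte and 18-byte differences are the fixed per-stream overhead of the Zlib and GZip wrappers.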
Format #1: GZip (RFC 1952)
GZip is the most common format you'll encounter. It wraps DEFLATE compressed data with:
- 10-byte header: Starts with the magic bytes 1F 8B
- Metadata: Compression method, flags, timestamp, OS type (part of the same 10-byte header)
- Optional extras: Original filename, comment, extra fields
- DEFLATE compressed data
- 8-byte footer: CRC-32 checksum + uncompressed size (mod 2^32)
GZip Header Structure
Byte 0-1: 1F 8B # Magic number (identifies GZip format)
Byte 2: 08 # Compression method (08 = DEFLATE)
Byte 3: Flags # Optional features (filename, comment, etc.)
Byte 4-7: Timestamp # Modification time (Unix timestamp)
Byte 8: Extra flags # Compression level indicator
Byte 9: OS # Operating system (0=FAT, 3=Unix, 11=NTFS)
... [optional] # Filename, comment, extra data
... DEFLATE data # Compressed payload
... CRC-32 # 4 bytes: checksum of uncompressed data
... Size # 4 bytes: uncompressed size mod 2^32
When GZip is Used
- HTTP compression: Content-Encoding: gzip
- File compression: .gz files (e.g., archive.tar.gz)
- Gzip command-line tool: gzip/gunzip
- Git objects: Often assumed to be GZip, but Git actually stores objects as Zlib streams
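The header and footer fields laid out above can be read back with Python's standard `struct` module. This is a minimal sketch. It assumes the data came from `gzip.compress`, which writes no optional fields, so the DEFLATE payload starts right at byte 10.

```python
import gzip
import struct
import zlib

original = b"Hello, World!"
data = gzip.compress(original, mtime=0)  # mtime=0 keeps the output reproducible

# Fixed 10-byte header: ID1, ID2, CM, FLG, MTIME (4 bytes, little-endian), XFL, OS
id1, id2, cm, flg, mtime, xfl, os_byte = struct.unpack("<BBBBIBB", data[:10])
assert (id1, id2) == (0x1F, 0x8B), "not GZip"
assert cm == 8, "compression method is not DEFLATE"

# FLG bits (RFC 1952): which optional fields follow the fixed header
FLAG_NAMES = {0x01: "FTEXT", 0x02: "FHCRC", 0x04: "FEXTRA", 0x08: "FNAME", 0x10: "FCOMMENT"}
set_flags = [name for bit, name in FLAG_NAMES.items() if flg & bit]
print(f"flags={set_flags} mtime={mtime} xfl={xfl} os={os_byte}")

# 8-byte trailer: CRC-32 and ISIZE (uncompressed size mod 2**32), both little-endian
crc, isize = struct.unpack("<II", data[-8:])
print(crc == zlib.crc32(original), isize == len(original) % 2**32)  # True True
```

The XFL and OS values depend on the Python version and platform, so the sketch prints them rather than checking them.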
GZip Example
# Original text
"Hello, World!"
# GZip compressed (hex):
1F 8B 08 00 00 00 00 00 00 03 F3 48 CD C9 C9 D7
51 08 CF 2F CA 49 51 04 00 D0 C3 4A EC 0D 00 00 00
# Breakdown:
1F 8B - GZip magic number
08 - DEFLATE compression
00 - No flags
00 00 00 00 - No timestamp
00 - Extra flags (no level hint; 02 = maximum compression, 04 = fastest)
03 - Unix OS
... - DEFLATE compressed data
D0 C3 4A EC - CRC-32 checksum
0D 00 00 00 - Uncompressed size (13 bytes)
Format #2: Zlib (RFC 1950)
Zlib is similar to GZip but with a simpler header structure. It wraps DEFLATE compressed data with:
- 2-byte header: Compression method and flags
- DEFLATE compressed data
- 4-byte footer: Adler-32 checksum (faster than CRC-32)
Zlib Header Structure
Byte 0: CMF # Compression Method and Flags
Bits 0-3: CM = 8 # Compression method (8 = DEFLATE)
Bits 4-7: CINFO # Window size (log2(window) - 8)
Byte 1: FLG # Flags
Bits 0-4: FCHECK # Header checksum
Bit 5: FDICT # Preset dictionary flag
Bits 6-7: FLEVEL # Compression level
... DEFLATE data # Compressed payload
... Adler-32 # 4 bytes: checksum of uncompressed data
When Zlib is Used
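- PNG images: The image data (all IDAT chunks concatenated) forms one Zlib stream; compressed text and ICC profile chunks (zTXt, iTXt, iCCP) also use Zlib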
- PDF files: Embedded streams use Zlib compression
- Python zlib module: Default format
- Java Deflater/Inflater: Default format
- OpenSSL: Its optional zlib compression support (when built with zlib) uses this format
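The CMF/FLG bit layout above and the Adler-32 trailer can be checked in a few lines of Python. This is a minimal sketch that uses `zlib.compress` at its default level as the input.

```python
import struct
import zlib

original = b"Hello, World!"
data = zlib.compress(original)  # Zlib format, default level

cmf, flg = data[0], data[1]
cm = cmf & 0x0F                          # bits 0-3: compression method (8 = DEFLATE)
cinfo = cmf >> 4                         # bits 4-7: log2(window size) - 8
fcheck_ok = (cmf * 256 + flg) % 31 == 0  # FCHECK makes the 16-bit header a multiple of 31
fdict = bool(flg & 0x20)                 # bit 5: preset dictionary follows
flevel = flg >> 6                        # bits 6-7: compression level hint (0-3)

print(cm, 2 ** (cinfo + 8), fcheck_ok, fdict, flevel)  # 8 32768 True False 2

# Trailer: Adler-32 of the uncompressed data, stored big-endian (unlike GZip's CRC-32)
(adler,) = struct.unpack(">I", data[-4:])
print(hex(adler), adler == zlib.adler32(original))  # 0x1f9e046a True
```

The Adler-32 value matches the `1F 9E 04 6A` trailer in the example that follows.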
Zlib Example
# Original text
"Hello, World!"
# Zlib compressed (hex):
78 9C F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04
00 1F 9E 04 6A
# Breakdown:
78 - CMF: DEFLATE, 32K window
9C - FLG: Default compression level, no preset dictionary, valid FCHECK
... - DEFLATE compressed data
1F 9E 04 6A - Adler-32 checksum
Format #3: Raw DEFLATE
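Raw DEFLATE is just the compressed data stream with no headers or checksums at all. This is the most compact format. However, nothing in the data identifies it as DEFLATE, so the decompressor must be told to expect it. Nothing verifies the result either, so integrity checking has to come from the surrounding container or protocol.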
When Raw DEFLATE is Used
- HTTP compression: Content-Encoding: deflate (technically should be Zlib, but some servers send raw DEFLATE)
- ZIP files: Individual entries use raw DEFLATE (headers/checksums are in the ZIP structure)
- 7-Zip: DEFLATE method uses raw format
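Raw DEFLATE has no header or trailer, but the stream does mark its own final block. That is how containers such as ZIP can place other data directly after it. This is a minimal sketch with Python's zlib. The trailing bytes are made up for illustration.

```python
import zlib

original = b"Hello, World!"

# Raw DEFLATE: negative wbits means "no header, no trailer"
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw = compressor.compress(original) + compressor.flush()

# Simulate a container that places other bytes right after the DEFLATE stream
container = raw + b"NEXT-RECORD"

decompressor = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
result = decompressor.decompress(container)
print(result)                    # b'Hello, World!'
print(decompressor.eof)          # True: the final block was reached
print(decompressor.unused_data)  # b'NEXT-RECORD'
```

The decoder knows where the stream ends, but it cannot tell whether the bytes it decoded are the bytes that were compressed. That check is left to the container.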
Raw DEFLATE Example
# Original text
"Hello, World!"
# Raw DEFLATE compressed (hex):
F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00
# No header, no footer, just compressed data
Format Comparison Table
| Feature | GZip | Zlib | Raw DEFLATE |
|------------------|-----------|-----------|-------------|
| Header Size | 10+ bytes | 2 bytes | 0 bytes |
| Checksum | CRC-32 | Adler-32 | None |
| Metadata | Yes | Minimal | None |
| Overhead | ~18 bytes | ~6 bytes | 0 bytes |
| Magic Bytes | 1F 8B | 78 XX (typical) | None |
| Size Info | Yes | No | No |
| Timestamp | Yes | No | No |
| RFC | 1952 | 1950 | 1951 |
Why Tools Fail: Format Mismatch
The most common compression error is trying to decompress data with the wrong format expectation:
Error #1: "incorrect header check"
You're trying to decompress GZip data as Zlib (or vice versa). The decompressor reads the first two bytes as a Zlib header and the checksum fails.
# Python example: Wrong format
import gzip
import zlib
gzip_data = gzip.compress(b"Hello, World!")  # GZip format (starts with 1F 8B)
zlib.decompress(gzip_data)
# zlib.error: Error -3 while decompressing data: incorrect header check
Fix: Use zlib.decompress(data, wbits=16+zlib.MAX_WBITS) for GZip, or gzip.decompress()
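Besides `16 + MAX_WBITS`, Python's zlib module documents a `32 + MAX_WBITS` mode that accepts either a GZip or a Zlib header. This helps when the same endpoint may return either format. A short sketch:

```python
import gzip
import zlib

text = b"Hello, World!"
gzip_data = gzip.compress(text)
zlib_data = zlib.compress(text)

# 16 + MAX_WBITS (31): expect a GZip wrapper only
print(zlib.decompress(gzip_data, wbits=16 + zlib.MAX_WBITS))  # b'Hello, World!'

# 32 + MAX_WBITS (47): auto-detect GZip or Zlib from the header
for blob in (gzip_data, zlib_data):
    print(zlib.decompress(blob, wbits=32 + zlib.MAX_WBITS))  # b'Hello, World!'
```

Auto-detection does not cover raw DEFLATE. That format still needs a negative `wbits`.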
Error #2: "invalid stored block lengths"
You're trying to decompress Zlib data as raw DEFLATE (or vice versa).
# Python example: Zlib data decoded as raw DEFLATE
import zlib
zlib_data = zlib.compress(b"Hello, World!")  # Zlib format (starts with 78 9C)
zlib.decompress(zlib_data, wbits=-zlib.MAX_WBITS)  # Raw DEFLATE mode
# zlib.error: Error -3 while decompressing data: invalid stored block lengths
Fix: Use positive wbits for Zlib, negative for raw DEFLATE
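When the wrapper is unknown, a quick diagnostic is to try each `wbits` setting and see which one decodes cleanly. This is a minimal sketch; `WBITS` and `which_wbits_works` are hypothetical names. It may decompress the input up to three times, so it suits debugging rather than hot paths.

```python
import zlib

# wbits value for each wrapper (zlib convention)
WBITS = {"zlib": zlib.MAX_WBITS, "gzip": 16 + zlib.MAX_WBITS, "raw": -zlib.MAX_WBITS}

def which_wbits_works(blob: bytes) -> list:
    """Return the names of the wrappers under which blob decompresses without error."""
    working = []
    for name, wbits in WBITS.items():
        try:
            zlib.decompress(blob, wbits=wbits)
            working.append(name)
        except zlib.error:
            pass
    return working

# Build one sample per format, then check that each decodes only under its own wbits
text = b"Hello, World!"
samples = {}
for name, wbits in WBITS.items():
    compressor = zlib.compressobj(wbits=wbits)
    samples[name] = compressor.compress(text) + compressor.flush()

for name, blob in samples.items():
    print(name, "->", which_wbits_works(blob))
```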
Error #3: "invalid distance too far back"
You're decompressing with a smaller window size than the data was compressed with, decoding from the middle of a stream, or the data is corrupted.
Language-Specific Compression APIs
Python
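import gzip
import zlib

# GZip format
gzip_data = gzip.compress(b"Hello")    # Create GZip
original = gzip.decompress(gzip_data)  # Decompress GZip

# Zlib format (zlib module default)
zlib_data = zlib.compress(b"Hello")    # Create Zlib
original = zlib.decompress(zlib_data)  # Decompress Zlib

# Raw DEFLATE: negative wbits = no header/trailer
# (zlib.compress() only accepts wbits on Python 3.11+, so use compressobj)
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
deflate_data = compressor.compress(b"Hello") + compressor.flush()
original = zlib.decompress(deflate_data, wbits=-zlib.MAX_WBITS)

# Auto-detect format from the first bytes
def decompress_any(data: bytes) -> bytes:
    # GZip: magic bytes 1F 8B
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    # Zlib: CM nibble is 8 and the 16-bit header is a multiple of 31
    if len(data) >= 2 and (data[0] & 0x0F) == 8 and (data[0] * 256 + data[1]) % 31 == 0:
        return zlib.decompress(data)
    # Otherwise assume raw DEFLATE
    return zlib.decompress(data, wbits=-zlib.MAX_WBITS)
JavaScript (Node.js)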
const zlib = require('zlib');
// GZip format
const gzipData = zlib.gzipSync(Buffer.from('Hello'));
const fromGzip = zlib.gunzipSync(gzipData);
// Zlib format
const zlibData = zlib.deflateSync(Buffer.from('Hello'));
const fromZlib = zlib.inflateSync(zlibData);
// Raw DEFLATE
const deflateData = zlib.deflateRawSync(Buffer.from('Hello'));
const fromRaw = zlib.inflateRawSync(deflateData);
// zlib.unzipSync() auto-detects GZip or Zlib (not raw DEFLATE)
const fromEither = zlib.unzipSync(gzipData);
Java
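import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

byte[] data = "Hello".getBytes(StandardCharsets.UTF_8);

// Zlib format (default: nowrap = false)
Deflater deflater = new Deflater();
deflater.setInput(data);
deflater.finish();
byte[] zlibData = new byte[1024];       // large enough for this small input
int size = deflater.deflate(zlibData);  // number of bytes actually written
deflater.end();                         // release native memory

// Raw DEFLATE (nowrap = true)
Deflater rawDeflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
// ... same setInput/finish/deflate/end calls as above

// GZip format (write() and close() can throw IOException)
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (GZIPOutputStream gzos = new GZIPOutputStream(baos)) {
    gzos.write(data);
}
byte[] gzipData = baos.toByteArray();
HTTP Content-Encoding Confusion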
The HTTP Content-Encoding: deflate header is ambiguous in practice. RFC 2616 (and its successor, RFC 9110) defines it as the Zlib format, but many implementations send raw DEFLATE instead.
What Browsers Expect
- Content-Encoding: gzip means GZip format (most common, best supported)
- Content-Encoding: deflate means Zlib format (RFC spec) or raw DEFLATE (a common bug)
- Content-Encoding: br means Brotli (newer, better compression)
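Because of that ambiguity, a client that has to accept `deflate` responses can try the spec-compliant Zlib wrapper first and fall back to raw DEFLATE. This is a minimal sketch. The helper name `decode_http_deflate` is hypothetical, and the sample bodies are built locally rather than fetched from a server.

```python
import zlib

def decode_http_deflate(body: bytes) -> bytes:
    """Decode a 'Content-Encoding: deflate' body that may be Zlib or raw DEFLATE."""
    try:
        return zlib.decompress(body)                         # Zlib (RFC 1950), per the spec
    except zlib.error:
        return zlib.decompress(body, wbits=-zlib.MAX_WBITS)  # raw DEFLATE (RFC 1951)

# A compliant server sends Zlib; a non-compliant one sends raw DEFLATE
compliant = zlib.compress(b'{"ok": true}')
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
noncompliant = compressor.compress(b'{"ok": true}') + compressor.flush()

print(decode_http_deflate(compliant))     # b'{"ok": true}'
print(decode_http_deflate(noncompliant))  # b'{"ok": true}'
```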
Best Practice for HTTP
Always use GZip for HTTP compression. It's universally supported, clearly defined, and avoids the deflate ambiguity.
# Nginx configuration
gzip on;
gzip_types text/plain text/css application/json application/javascript;
# Apache configuration (despite its name, mod_deflate sends Content-Encoding: gzip)
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/html text/plain text/xml
</IfModule>
Detecting Compression Format
You can detect the format by inspecting the first few bytes:
function detectFormat(data) {
  if (data.length < 2) return 'unknown';
  // Check GZip magic number
  if (data[0] === 0x1f && data[1] === 0x8b) {
    return 'gzip';
  }
  // Check Zlib header
  // CMF low nibble must be 8 (DEFLATE) and CINFO at most 7; 0x78 (32K window) is most common
  // FLG byte varies, but (CMF * 256 + FLG) must be a multiple of 31 (FCHECK)
  if ((data[0] & 0x0f) === 8 && (data[0] >> 4) <= 7 && (data[0] * 256 + data[1]) % 31 === 0) {
    return 'zlib';
  }
  // Might be raw DEFLATE (no reliable detection)
  return 'raw-deflate (or unknown)';
}
// Examples
detectFormat([0x1f, 0x8b, 0x08]); // 'gzip'
detectFormat([0x78, 0x9c]);       // 'zlib'
detectFormat([0x78, 0xda]);       // 'zlib' (maximum compression)
detectFormat([0xf3, 0x48]);       // 'raw-deflate (or unknown)'
Compression Levels and Trade-offs
All three formats support compression levels (typically 0-9 or 1-9, where higher = better compression but slower):
- Level 0: No compression (store only) — fast but no space savings
- Level 1-3: Fast compression — good for real-time data
- Level 6: Default — balanced speed and compression
- Level 9: Maximum compression — slow but best ratio
Compression Ratio Comparison
For typical text data (like JSON or HTML), rough figures are as follows; actual ratios depend heavily on the input:
- GZip level 6: ~70-80% compression (3-5x smaller)
- GZip level 9: ~75-85% compression (slightly better, much slower)
- Brotli level 11: ~80-90% compression (best for static files)
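Because these figures vary with the input, it is worth measuring on your own payloads. This is a minimal sketch with Python's zlib. The sample JSON is invented for illustration. Brotli is left out because it is not in Python's standard library.

```python
import json
import zlib

# Hypothetical sample payload: repetitive JSON, similar in spirit to an API response
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(500)]
payload = json.dumps(records).encode("utf-8")

for level in (0, 1, 6, 9):
    compressed = zlib.compress(payload, level)
    saved = 1 - len(compressed) / len(payload)  # negative means the output grew
    print(f"level {level}: {len(compressed):6d} bytes ({saved:.0%} smaller)")
```

Level 0 only stores the data, so its output is slightly larger than the input.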
Real-World Debugging Scenario
You're downloading compressed API responses and getting decompression errors. Here's how to debug:
Step 1: Capture the Raw Data
# Python example
import requests
response = requests.get('https://api.example.com/data',
                        headers={'Accept-Encoding': 'gzip'},
                        stream=True)
# requests decompresses response.content automatically, so read the
# undecoded bytes from the underlying urllib3 response instead
compressed = response.raw.read(decode_content=False)
# Check Content-Encoding header
print(response.headers.get('Content-Encoding'))  # 'gzip' or 'deflate'
# Inspect first bytes
print(compressed[:10].hex())  # '1f8b08...' = GZip
Step 2: Detect Format
# Check magic bytes
import gzip
import zlib

if compressed[:2] == b'\x1f\x8b':
    print("Format: GZip")
    data = gzip.decompress(compressed)
elif compressed[:1] == b'\x78':
    print("Format: Zlib")
    data = zlib.decompress(compressed)
else:
    print("Format: Raw DEFLATE (or unknown)")
    try:
        data = zlib.decompress(compressed, wbits=-zlib.MAX_WBITS)
    except zlib.error as exc:
        print(f"Not raw DEFLATE either: {exc}")
        raise
Step 3: Verify Decompression
# Check if decompressed data looks correct
print(f"Decompressed size: {len(data)} bytes")
print(f"First 100 bytes: {data[:100]!r}")
# Try parsing as JSON if expected
import json
try:
    parsed = json.loads(data)
    print("Valid JSON!")
except ValueError:  # covers json.JSONDecodeError and UnicodeDecodeError
    print("Not valid JSON - might still be compressed or corrupted")
Using Compression Tools
When working with compressed data, use tools that support all three formats:
- GZip Compressor/Decompressor — Compress and decompress text with GZip format
- Server-Side GZip Processor — High-performance compression for large files
The GZip tool on DevToys Pro supports:
- GZip, Zlib, and raw DEFLATE formats
- Automatic format detection
- Adjustable compression levels
- Base64 encoding for transport
- Error messages when decompression fails
Best Practices
- Use GZip for HTTP and file compression — most widely supported
- Use Zlib for embedded compression — PNG, PDF, internal data structures
- Avoid raw DEFLATE on its own: without a checksum, corruption can go undetected unless a container (such as ZIP) supplies one
- Always verify checksums — detect corrupted data early
- Document your format choice — prevent integration issues
- Test with incompressible data — ensure your code handles expansion gracefully
- Consider Brotli for static files — better compression than GZip
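Two of these practices, testing with incompressible input and relying on checksums, are easy to try out. This is a minimal sketch using only Python's standard library. The first half shows random input growing slightly. The second half shows `gzip.decompress` rejecting data whose CRC-32 no longer matches.

```python
import gzip
import os
import zlib

# Incompressible input: random bytes. DEFLATE falls back to stored blocks,
# so the output is slightly LARGER than the input.
random_bytes = os.urandom(4096)
packed = zlib.compress(random_bytes, 9)
print(len(random_bytes), len(packed))  # packed is a few bytes larger

# Checksum verification: corrupt one byte of the CRC-32 trailer, then decompress.
good = gzip.compress(b"Hello, World!")
corrupted = bytearray(good)
corrupted[-8] ^= 0xFF  # first byte of the little-endian CRC-32 field

rejected = False
try:
    gzip.decompress(bytes(corrupted))
except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError (Python 3.8+)
    rejected = True
    print("rejected:", exc)
```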
Common Pitfalls
- Assuming "deflate" means raw DEFLATE — it should be Zlib but isn't always
- Not checking magic bytes — leads to cryptic decompression errors
- Ignoring checksum errors — corrupted data can cause security issues
- Using wrong window size — causes "invalid distance" errors
- Compressing already compressed data: wastes CPU and usually makes the output slightly larger
Key Takeaways
- GZip, Zlib, and raw DEFLATE all use the same DEFLATE algorithm
- GZip: 10-byte header + CRC-32, most common for files and HTTP
- Zlib: 2-byte header + Adler-32, used in PNG, PDF, and many libraries
- Raw DEFLATE: no headers or checksums, used in ZIP files
- Detect format by checking magic bytes: 0x1f8b = GZip, 0x78XX = Zlib
- HTTP "deflate" encoding is ambiguous — stick with GZip
- Always use checksums to detect corrupted data