How to gzip Data in Memory Using Objective-C

March 2, 2009

I recently had to write a utility for compressing data in memory using the gzip format. Thankfully, there’s a C library called zlib that does the actual compression (and you can link to libz.dylib on the iPhone). Using the library is not trivial, however, so I spent a day reading the zlib headers and documentation and searching for examples from other developers (one example shared by Robbie Hanson was particularly helpful).

While Robbie’s example is great, I wanted something a bit more robust and easier to “plug in” to any existing project. As part of making it “plug ‘n play,” I also wanted to make it developer-friendly: if something goes wrong, the utility should help solve the problem instead of just exiting with a cryptic error code. That means adding a healthy amount of documentation and descriptive error logging so that Joe Developer, who just wanted to copy and paste the utility into his project and move on, can quickly understand the code and the error message if problems come up.

Here’s the source for the class, header first and then the implementation:

/**
 @file LFCGzipUtility.h
 @author Clint Harris (www.clintharris.net)

 Note: The code in this file has been commented so as to be compatible with
 Doxygen, a tool for automatically generating HTML-based documentation from
 source code. See http://www.doxygen.org for more info.
 */

#import <Foundation/Foundation.h>
#import "zlib.h"

@interface LFCGzipUtility : NSObject
{

}

/***************************************************************************//**
 Uses zlib to compress the given data. Note that gzip headers will be added so
 that the data can be easily decompressed using a tool like WinZip, gunzip, etc.

 Note: Special thanks to Robbie Hanson of Deusty Designs for sharing sample code
 showing how deflateInit2() can be used to make zlib generate a compressed file
 with gzip headers:
 http://deusty.blogspot.com/2007/07/gzip-compressiondecompression.html

 @param pUncompressedData memory buffer of bytes to compress
 @return Compressed data as an NSData object
 */
+(NSData*) gzipData: (NSData*)pUncompressedData;

@end
/**
 @file LFCGzipUtility.m
 @author Clint Harris (www.clintharris.net)

 Note: The code in this file has been commented so as to be compatible with
 Doxygen, a tool for automatically generating HTML-based documentation from
 source code. See http://www.doxygen.org for more info.
 */

#import "LFCGzipUtility.h"

@implementation LFCGzipUtility

/*******************************************************************************
 See header for documentation.
 */
+(NSData*) gzipData: (NSData*)pUncompressedData
{
	/*
	 Special thanks to Robbie Hanson of Deusty Designs for sharing sample code
	 showing how deflateInit2() can be used to make zlib generate a compressed
	 file with gzip headers:
	 http://deusty.blogspot.com/2007/07/gzip-compressiondecompression.html
	 */

	if (!pUncompressedData || [pUncompressedData length] == 0)
	{
		NSLog(@"%s: Error: Can't compress an empty or null NSData object.", __func__);
		return nil;
	}

	/* Before we can begin compressing (aka "deflating") data using the zlib
	 functions, we must initialize zlib. Normally this is done by calling the
	 deflateInit() function; in this case, however, we'll use deflateInit2() so
	 that the compressed data will have gzip headers. This will make it easy to
	 decompress the data later using a tool like gunzip, WinZip, etc.

	 deflateInit2() accepts many parameters, the first of which is a C struct of
	 type "z_stream" defined in zlib.h. The properties of this struct are used to
	 control how the compression algorithms work. z_stream is also used to
	 maintain pointers to the "input" and "output" byte buffers (next_in/out) as
	 well as information about how many bytes have been processed, how many are
	 left to process, etc. */
	z_stream zlibStreamStruct;
	zlibStreamStruct.zalloc    = Z_NULL; // Set zalloc, zfree, and opaque to Z_NULL so
	zlibStreamStruct.zfree     = Z_NULL; // that when we call deflateInit2 they will be
	zlibStreamStruct.opaque    = Z_NULL; // updated to use default allocation functions.
	zlibStreamStruct.msg       = Z_NULL; // No error message yet; keeps the error logging below from reading an uninitialized pointer
	zlibStreamStruct.total_out = 0; // Total number of output bytes produced so far
	zlibStreamStruct.next_in   = (Bytef*)[pUncompressedData bytes]; // Pointer to input bytes
	zlibStreamStruct.avail_in  = (uInt)[pUncompressedData length]; // Number of input bytes left to process

	/* Initialize the zlib deflation (i.e. compression) internals with deflateInit2().
	 The parameters are as follows:

	 z_streamp strm - Pointer to a z_stream struct
	 int level      - Compression level. Must be Z_DEFAULT_COMPRESSION, or between
	                  0 and 9: 1 gives best speed, 9 gives best compression, 0 gives
	                  no compression.
	 int method     - Compression method. Only method supported is "Z_DEFLATED".
	 int windowBits - Base two logarithm of the maximum window size (the size of
	                  the history buffer). It should be in the range 8..15. Add
	                  16 to windowBits to write a simple gzip header and trailer
	                  around the compressed data instead of a zlib wrapper. The
	                  gzip header will have no file name, no extra data, no comment,
	                  no modification time (set to zero), no header crc, and the
	                  operating system will be set to 255 (unknown).
	 int memLevel   - Amount of memory allocated for internal compression state.
	                  1 uses minimum memory but is slow and reduces compression
	                  ratio; 9 uses maximum memory for optimal speed. Default value
	                  is 8.
	 int strategy   - Used to tune the compression algorithm. Use the value
	                  Z_DEFAULT_STRATEGY for normal data, Z_FILTERED for data
	                  produced by a filter (or predictor), or Z_HUFFMAN_ONLY to
	                  force Huffman encoding only (no string match) */
	int initError = deflateInit2(&zlibStreamStruct, Z_DEFAULT_COMPRESSION, Z_DEFLATED, (15+16), 8, Z_DEFAULT_STRATEGY);
	if (initError != Z_OK)
	{
		NSString *errorMsg = nil;
		switch (initError)
		{
			case Z_STREAM_ERROR:
				errorMsg = @"Invalid parameter passed in to function.";
				break;
			case Z_MEM_ERROR:
				errorMsg = @"Insufficient memory.";
				break;
			case Z_VERSION_ERROR:
				errorMsg = @"The version of zlib.h and the version of the library linked do not match.";
				break;
			default:
				errorMsg = @"Unknown error code.";
				break;
		}
		// errorMsg points at a string literal we do not own, so it must not be released.
		NSLog(@"%s: deflateInit2() Error: \"%@\" Message: \"%s\"", __func__, errorMsg, zlibStreamStruct.msg);
		return nil;
	}

	// Create output memory buffer for compressed data. deflateBound() returns an upper
	// bound on the compressed size for the stream initialized above, including the gzip
	// header and trailer, so even very small inputs get a large enough buffer.
	NSMutableData *compressedData = [NSMutableData dataWithLength:deflateBound(&zlibStreamStruct, [pUncompressedData length])];

	int deflateStatus;
	do
	{
		// If the output buffer is already full, expand it before calling deflate() again
		if (zlibStreamStruct.total_out >= [compressedData length])
		{
			[compressedData increaseLengthBy: 16384];
		}

		// Store location where next byte should be put in next_out
		zlibStreamStruct.next_out = (Bytef*)[compressedData mutableBytes] + zlibStreamStruct.total_out;

		// Calculate the amount of remaining free space in the output buffer
		// by subtracting the number of bytes that have been written so far
		// from the buffer's total capacity
		zlibStreamStruct.avail_out = (uInt)([compressedData length] - zlibStreamStruct.total_out);

		/* deflate() compresses as much data as possible, and stops/returns when
		 the input buffer becomes empty or the output buffer becomes full. If
		 deflate() returns Z_OK, it means that there are more bytes left to
		 compress or flush but the output buffer is full; the buffer is expanded
		 at the top of the loop and deflate() is called again (i.e., the loop
		 continues to run). If deflate() returns Z_STREAM_END, the end of the
		 input stream was reached (i.e., all of the data has been compressed)
		 and the loop should stop. */
		deflateStatus = deflate(&zlibStreamStruct, Z_FINISH);

	} while ( deflateStatus == Z_OK );

	// Check for zlib error and convert code to usable error message if appropriate
	if (deflateStatus != Z_STREAM_END)
	{
		NSString *errorMsg = nil;
		switch (deflateStatus)
		{
			case Z_ERRNO:
				errorMsg = @"Error occurred while reading file.";
				break;
			case Z_STREAM_ERROR:
				errorMsg = @"The stream state was inconsistent (e.g., next_in or next_out was NULL).";
				break;
			case Z_DATA_ERROR:
				errorMsg = @"The deflate data was invalid or incomplete.";
				break;
			case Z_MEM_ERROR:
				errorMsg = @"Memory could not be allocated for processing.";
				break;
			case Z_BUF_ERROR:
				errorMsg = @"Ran out of output buffer for writing compressed bytes.";
				break;
			case Z_VERSION_ERROR:
				errorMsg = @"The version of zlib.h and the version of the library linked do not match.";
				break;
			default:
				errorMsg = @"Unknown error code.";
				break;
		}
		// errorMsg points at a string literal we do not own, so it must not be released.
		NSLog(@"%s: zlib error while attempting compression: \"%@\" Message: \"%s\"", __func__, errorMsg, zlibStreamStruct.msg);

		// Free data structures that were dynamically created for the stream.
		deflateEnd(&zlibStreamStruct);

		return nil;
	}
	// Free data structures that were dynamically created for the stream.
	deflateEnd(&zlibStreamStruct);
	[compressedData setLength: zlibStreamStruct.total_out];
	NSLog(@"%s: Compressed data from %lu KB to %lu KB", __func__, (unsigned long)([pUncompressedData length]/1024), (unsigned long)([compressedData length]/1024));

	return compressedData;
}

@end
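
If you want a quick sanity check that the bytes coming back really carry a gzip wrapper, one option is to look at the fixed header fields defined in RFC 1952. The sketch below is plain C (which also compiles inside an Objective-C file). The helper name LFCLooksLikeGzip is hypothetical and not part of the utility above, and it only inspects the first three bytes rather than validating the whole stream.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper (not part of LFCGzipUtility): returns true when the
   buffer starts with the fixed gzip header fields described in RFC 1952. */
static bool LFCLooksLikeGzip(const unsigned char *bytes, size_t length)
{
    /* A gzip member has a 10-byte fixed header and an 8-byte trailer
       (CRC-32 plus uncompressed size), so anything shorter than 18 bytes
       cannot be a complete gzip stream. */
    if (bytes == NULL || length < 18) {
        return false;
    }
    return bytes[0] == 0x1f   /* ID1 magic byte */
        && bytes[1] == 0x8b   /* ID2 magic byte */
        && bytes[2] == 8;     /* CM = 8 means "deflate" */
}
```

Given the NSData returned by gzipData:, a call could look like LFCLooksLikeGzip([compressed bytes], [compressed length]). Data compressed with a plain zlib wrapper (windowBits without the extra 16) would fail this check, since it starts with different bytes.
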
Clint Harris is an independent software consultant living in Brooklyn, New York. He can be contacted directly at ten.sirrahtnilc@tnilc.

8 comments

Mike Orlov 05.14.09 at 4:52 am

Clint,

Thank you very much. It’s the most useful info and the clearest example (actually, just ‘take and go’) I’ve ever found on the subject!
It took me several days of digging through a lot of web pages, including the Mac Dev Center ones, only to collect negative emotions about why such a widely used thing is not properly covered by the Mac guys with an Objective-C library, and why the ways to apply gzip are so restricted. I had added the gzip C sources to my project but kept running into problems at run time. Now I’m in the saddle again and can run further :) Thank you again!

Casey 09.07.09 at 11:41 am

Hey,
I get this error when I try to compile, any ideas?

Undefined symbols:
"_deflate", referenced from:
+[aViewController gzipData:] in aViewController.o
"_deflateInit2_", referenced from:
+[aViewController gzipData:] in aViewController.o
"_deflateEnd", referenced from:
+[aViewController gzipData:] in aViewController.o
+[aViewController gzipData:] in aViewController.o

I think it is because I am not installing zlib properly. How do you get zlib.dylib? All I can find is source code that has all the .h and .c files. I have copied them all into my Classes directory, but still get this error.

Clint 09.08.09 at 6:30 pm

@Casey: Yep, it sounds like you need to link to the libz library.

Try the following:

1. Select your main executable (under Targets) and hit Cmd+i
2. Select the General tab
3. Under Linked Libraries, click the “+” button
4. Scroll down and select libz.dylib, then click the Add button

Casey 09.12.09 at 12:47 am

But where do you download zlib.dylib from? All I can find is .h and .c files.

kiamlaluno 12.17.09 at 5:45 pm

The zlib library is already installed on Mac OS X. It is under /usr/lib, which is a hidden directory, so you cannot see it in Finder (unless you use a tool that shows hidden directories in Finder).

andy 12.21.09 at 7:22 pm

will this work with a zip archive?

thanks

andy

pwnified 01.10.10 at 12:13 am

I like the code, the only thing is the [errorMsg release] calls, you should remove those.

JohnB 02.01.10 at 7:26 pm

Sweet - this is very helpful!

I wonder about the reverse? I’m building iPhone stuff and imagine zipping stuff to make transfer over the net faster… but then I’d need to unzip when it comes back down… or is that likely to be too much for an iPhone’s processor? (In other words might it take longer to unzip than to just download the bits over the wireless network?)
