How to gzip Data in Memory Using Objective-C

by Clint on March 2, 2009

I recently had to write a utility for compressing data in memory using the gzip format. Thankfully, there’s a C library called zlib that does the actual compression, and you can link to libz.dylib on the iPhone. Using the library is not trivial, however, so I spent a day reading the zlib headers and documentation and searching for examples from other developers (one example shared by Robbie Hanson was particularly helpful).

While Robbie’s example is great, I wanted something a bit more robust and easier to “plug in” to any existing project. As part of making it “plug ‘n play,” I also wanted to make it developer-friendly: if something goes wrong, the utility should help solve the problem instead of just exiting with a cryptic error code. That means adding a healthy amount of documentation and descriptive error logging, so that Joe Developer (who just wanted to copy and paste the utility into his project and move on) can quickly understand the code and the error message if problems come up.

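The class exposes a single method: pass an NSData object to [LFCGzipUtility gzipData:] and it returns the gzip-compressed bytes as a new NSData object, or nil if something goes wrong. Here’s the full source, header first and then the implementation: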

/**
 @file LFCGzipUtility.h
 @author Clint Harris (clintharris.net)

 Note: The code in this file has been commented so as to be compatible with
 Doxygen, a tool for automatically generating HTML-based documentation from
 source code. See http://www.doxygen.org for more info.
 */

#import <Foundation/Foundation.h>
#import "zlib.h"

@interface LFCGzipUtility : NSObject
{

}

/***************************************************************************//**
 Uses zlib to compress the given data. Note that gzip headers will be added so
 that the data can be easily decompressed using a tool like WinZip, gunzip, etc.

 Note: Special thanks to Robbie Hanson of Deusty Designs for sharing sample code
 showing how deflateInit2() can be used to make zlib generate a compressed file
 with gzip headers:

http://deusty.blogspot.com/2007/07/gzip-compressiondecompression.html

 @param pUncompressedData memory buffer of bytes to compress
 @return Compressed data as an NSData object
 */
+(NSData*) gzipData: (NSData*)pUncompressedData;

@end
/**
 @file LFCGzipUtility.m
 @author Clint Harris (clintharris.net)

 Note: The code in this file has been commented so as to be compatible with
 Doxygen, a tool for automatically generating HTML-based documentation from
 source code. See http://www.doxygen.org for more info.
 */

#import "LFCGzipUtility.h"

@implementation LFCGzipUtility

/*******************************************************************************
 See header for documentation.
 */
+(NSData*) gzipData: (NSData*)pUncompressedData
{
        /*
         Special thanks to Robbie Hanson of Deusty Designs for sharing sample code
         showing how deflateInit2() can be used to make zlib generate a compressed
         file with gzip headers:

http://deusty.blogspot.com/2007/07/gzip-compressiondecompression.html

         */

        if (!pUncompressedData || [pUncompressedData length] == 0)
        {
                NSLog(@"%s: Error: Can't compress an empty or null NSData object.", __func__);
                return nil;
        }

        /* Before we can begin compressing (aka "deflating") data using the zlib
         functions, we must initialize zlib. Normally this is done by calling the
         deflateInit() function; in this case, however, we'll use deflateInit2() so
         that the compressed data will have gzip headers. This will make it easy to
         decompress the data later using a tool like gunzip, WinZip, etc.

         deflateInit2() accepts many parameters, the first of which is a C struct of
         type "z_stream" defined in zlib.h. The properties of this struct are used to
         control how the compression algorithms work. z_stream is also used to
         maintain pointers to the "input" and "output" byte buffers (next_in/out) as
         well as information about how many bytes have been processed, how many are
         left to process, etc. */
        z_stream zlibStreamStruct;
        zlibStreamStruct.zalloc    = Z_NULL; // Set zalloc, zfree, and opaque to Z_NULL so
        zlibStreamStruct.zfree     = Z_NULL; // that when we call deflateInit2 they will be
        zlibStreamStruct.opaque    = Z_NULL; // updated to use default allocation functions.
        zlibStreamStruct.total_out = 0; // Total number of output bytes produced so far
        zlibStreamStruct.next_in   = (Bytef*)[pUncompressedData bytes]; // Pointer to input bytes
        zlibStreamStruct.avail_in  = (uInt)[pUncompressedData length]; // Number of input bytes left to process

        /* Initialize the zlib deflation (i.e. compression) internals with deflateInit2().
         The parameters are as follows:

         z_streamp strm - Pointer to a z_stream struct
         int level      - Compression level. Must be Z_DEFAULT_COMPRESSION, or between
                          0 and 9: 1 gives best speed, 9 gives best compression, 0 gives
                          no compression.
         int method     - Compression method. Only method supported is "Z_DEFLATED".
         int windowBits - Base two logarithm of the maximum window size (the size of
                          the history buffer). It should be in the range 8..15. Add 
                          16 to windowBits to write a simple gzip header and trailer 
                          around the compressed data instead of a zlib wrapper. The 
                          gzip header will have no file name, no extra data, no comment, 
                          no modification time (set to zero), no header crc, and the 
                          operating system will be set to 255 (unknown). 
         int memLevel   - Amount of memory allocated for internal compression state.
                          1 uses minimum memory but is slow and reduces compression
                          ratio; 9 uses maximum memory for optimal speed. Default value
                          is 8.
         int strategy   - Used to tune the compression algorithm. Use the value
                          Z_DEFAULT_STRATEGY for normal data, Z_FILTERED for data
                          produced by a filter (or predictor), or Z_HUFFMAN_ONLY to
                          force Huffman encoding only (no string match) */
        int initError = deflateInit2(&zlibStreamStruct, Z_DEFAULT_COMPRESSION, Z_DEFLATED, (15+16), 8, Z_DEFAULT_STRATEGY);
        if (initError != Z_OK)
        {
                NSString *errorMsg = nil;
                switch (initError)
                {
                        case Z_STREAM_ERROR:
                                errorMsg = @"Invalid parameter passed in to function.";
                                break;
                        case Z_MEM_ERROR:
                                errorMsg = @"Insufficient memory.";
                                break;
                        case Z_VERSION_ERROR:
                                errorMsg = @"The version of zlib.h and the version of the library linked do not match.";
                                break;
                        default:
                                errorMsg = @"Unknown error code.";
                                break;
                }
                NSLog(@"%s: deflateInit2() Error: \"%@\" Message: \"%s\"", __func__, errorMsg, zlibStreamStruct.msg);
                // errorMsg points to a string literal that we don't own, so it must not be released.
                return nil;
        }

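        // Create the initial output buffer for compressed data. Older zlib documentation
        // states that the destination buffer must be at least 0.1% larger than avail_in
        // plus 12 bytes; 1% is used here for extra headroom. Note that this estimate does
        // not cover the 18 bytes of gzip header and trailer, so it can be too small for
        // very short inputs.
        NSMutableData *compressedData = [NSMutableData dataWithLength:[pUncompressedData length] * 1.01 + 12];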

        int deflateStatus;
        do
        {
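                // If the output buffer is already full, expand it so deflate() has room to continue
                if (zlibStreamStruct.total_out >= [compressedData length])
                {
                        [compressedData increaseLengthBy:16384];
                }

                // Store location where next byte should be put in next_out
                zlibStreamStruct.next_out = (Bytef*)[compressedData mutableBytes] + zlibStreamStruct.total_out;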

                // Calculate the amount of remaining free space in the output buffer
                // by subtracting the number of bytes that have been written so far
                // from the buffer's total capacity
                zlibStreamStruct.avail_out = [compressedData length] - zlibStreamStruct.total_out;

                /* deflate() compresses as much data as possible, and stops/returns when
                 the input buffer becomes empty or the output buffer becomes full. If
                 deflate() returns Z_OK, it means that there are more bytes left to
                 compress in the input buffer but the output buffer is full; the output
                 buffer should be expanded and deflate should be called again (i.e., the
                 loop should continue to run). If deflate() returns Z_STREAM_END, the
                 end of the input stream was reached (i.e., all of the data has been
                 compressed) and the loop should stop. */
                deflateStatus = deflate(&zlibStreamStruct, Z_FINISH);

        } while ( deflateStatus == Z_OK );              

        // Check for zlib error and convert code to usable error message if appropriate
        if (deflateStatus != Z_STREAM_END)
        {
                NSString *errorMsg = nil;
                switch (deflateStatus)
                {
                        case Z_ERRNO:
                                errorMsg = @"Error occurred while reading file.";
                                break;
                        case Z_STREAM_ERROR:
                                errorMsg = @"The stream state was inconsistent (e.g., next_in or next_out was NULL).";
                                break;
                        case Z_DATA_ERROR:
                                errorMsg = @"The deflate data was invalid or incomplete.";
                                break;
                        case Z_MEM_ERROR:
                                errorMsg = @"Memory could not be allocated for processing.";
                                break;
                        case Z_BUF_ERROR:
                                errorMsg = @"Ran out of output buffer for writing compressed bytes.";
                                break;
                        case Z_VERSION_ERROR:
                                errorMsg = @"The version of zlib.h and the version of the library linked do not match.";
                                break;
                        default:
                                errorMsg = @"Unknown error code.";
                                break;
                }
                NSLog(@"%s: zlib error while attempting compression: \"%@\" Message: \"%s\"", __func__, errorMsg, zlibStreamStruct.msg);
                // errorMsg points to a string literal that we don't own, so it must not be released.

                // Free data structures that were dynamically created for the stream.
                deflateEnd(&zlibStreamStruct);

                return nil;
        }
        // Free data structures that were dynamically created for the stream.
        deflateEnd(&zlibStreamStruct);
        [compressedData setLength: zlibStreamStruct.total_out];
        NSLog(@"%s: Compressed file from %lu KB to %lu KB", __func__, (unsigned long)([pUncompressedData length]/1024), (unsigned long)([compressedData length]/1024));

        return compressedData;
}

@end
Clint Harris is an independent software consultant living in Brooklyn, New York. He can be contacted directly at ten.sirrahtnilc@tnilc.
  • Mike Orlov

    Clint,

    Thank you very much. This is the most useful information and the clearest example (really just ‘take and go’) I have found on the subject!
    It took me several days of digging through web pages, including Apple’s Dev Center, mostly collecting frustration over why such a widely used thing isn’t properly covered by an Objective-C library from Apple and why gzip can only be applied in such a restricted way. I had added the gzip C sources to my project but kept running into problems at run time. Now I’m back in the saddle and can move on :) Thank you again!

  • Casey

    Hey,
    I get this error when I try to compile, any ideas?

    Undefined symbols:
    "_deflate", referenced from:
    +[aViewController gzipData:] in aViewController.o
    "_deflateInit2_", referenced from:
    +[aViewController gzipData:] in aViewController.o
    "_deflateEnd", referenced from:
    +[aViewController gzipData:] in aViewController.o
    +[aViewController gzipData:] in aViewController.o

    I think it is because I am not installing zlib properly. How do you get zlib.dylib? All I can find is source code that has all the .h and .c files. I have copied them all into my Classes directory, but still get this error.

  • https://clintharris.net Clint

    @Casey: Yep, it sounds like you need to link to the libz library.

    Try the following:

    1. Select your main executable (under Targets) and hit Cmd+i
    2. Select the General tab
    3. Under Linked Libraries, click the “+” button
    4. Scroll down and select libz.dylib, then click the Add button

  • Casey

    But where do you download zlib.dylib from? All I can find is .h and .c files.

  • kiamlaluno

    The zlib library is already installed on Mac OS X. It is under /usr/lib, which is a hidden directory, and you cannot see it using Finder (if you don’t use a tool that enables hidden directories view on Finder).

  • andy

    will this work with a zip archive?

    thanks

    andy
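On the zip question: gzip and zip are different container formats, even though both usually carry deflate-compressed data. The class above produces a single gzip stream, not a zip archive with a file directory. As a rough illustration, the two can be told apart by their leading bytes. The helper below is a hypothetical sketch (its name and enum are not part of zlib or of LFCGzipUtility); it assumes a gzip stream per RFC 1952 and a non-empty zip archive that starts with a local file header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of zlib: guess the container format of a
   buffer from its first few bytes. */
typedef enum { FORMAT_UNKNOWN, FORMAT_GZIP, FORMAT_ZIP } ContainerFormat;

ContainerFormat guess_container_format(const unsigned char *bytes, size_t length)
{
    /* gzip (RFC 1952): ID1 = 0x1f, ID2 = 0x8b, then CM = 8 for deflate. */
    if (length >= 3 && bytes[0] == 0x1f && bytes[1] == 0x8b && bytes[2] == 8)
        return FORMAT_GZIP;

    /* zip: a local file header begins with the signature 'P' 'K' 0x03 0x04. */
    if (length >= 4 && memcmp(bytes, "PK\x03\x04", 4) == 0)
        return FORMAT_ZIP;

    return FORMAT_UNKNOWN;
}
```

Running the output of gzipData: through a check like this should report gzip, which is why tools that expect a .zip archive will not open it directly.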

  • http://www.harmonicdog.com pwnified

    I like the code, the only thing is the [errorMsg release] calls, you should remove those.

  • JohnB

    Sweet – this is very helpful!

    I wonder about the reverse? I’m building iPhone stuff and imagine zipping stuff to make transfer over the net faster… but then I’d need to unzip when it comes back down… or is that likely to be too much for an iPhone’s processor? (In other words might it take longer to unzip than to just download the bits over the wireless network?)
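For the reverse direction, zlib's inflate functions (inflateInit2(), inflate(), inflateEnd()) mirror the deflate calls used above, and adding 16 to windowBits in inflateInit2() tells zlib to expect a gzip wrapper. One practical detail when inflating in memory is choosing an output buffer size. A gzip stream ends with a 4-byte little-endian field (ISIZE) holding the uncompressed length modulo 2^32, which can serve as a starting hint. The helper below is a hypothetical sketch, not a zlib function, and it assumes the buffer holds exactly one complete gzip member.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of zlib: read the ISIZE field from the last
   4 bytes of a single-member gzip buffer. Returns 0 on success, or -1 if the
   buffer is too short or does not start with the gzip magic bytes.
   ISIZE is the uncompressed size modulo 2^32, so treat it as a hint only. */
int gzip_uncompressed_size_hint(const unsigned char *gz, size_t length, uint32_t *size_out)
{
    /* 10-byte header + 8-byte trailer is the smallest possible wrapper. */
    if (length < 18 || gz[0] != 0x1f || gz[1] != 0x8b)
        return -1;

    const unsigned char *isize = gz + length - 4;
    *size_out = (uint32_t)isize[0]
              | ((uint32_t)isize[1] << 8)
              | ((uint32_t)isize[2] << 16)
              | ((uint32_t)isize[3] << 24);
    return 0;
}
```

Because the field comes from untrusted data, a real decompressor should still grow its output buffer as needed rather than rely on the hint alone.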

  • Pete

    this is absolutely excellent work. saved me a huge headache.

  • Tyrone

    Hi, This is very useful for me. Thanks.
    I am a newbie. I got the code to compile. I am hitting “Ran out of output buffer for writing compressed bytes.” error. I create my data as follows before calling the zip method:
    NSString *msg = [[NSString alloc] initWithString:@"hello"];
    NSData* messageData = [msg dataUsingEncoding:NSUTF8StringEncoding];
    Any help is appreciated.
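The arithmetic behind this error can be checked by hand. The listing sizes its initial output buffer as length * 1.01 + 12, and a gzip wrapper alone takes 18 bytes (a 10-byte header plus an 8-byte trailer, per RFC 1952). For a 5-byte input such as "hello", that initial size is 17 bytes, so the output cannot fit unless the buffer grows. The sketch below only reproduces that calculation; the function names are hypothetical and nothing here calls zlib.

```c
#include <stddef.h>

/* Fixed size of a minimal gzip wrapper per RFC 1952: 10-byte header plus
   8-byte trailer (CRC-32 and input size). */
enum { GZIP_HEADER_BYTES = 10, GZIP_TRAILER_BYTES = 8 };

/* The initial output buffer size used in the listing for n input bytes. */
size_t listing_buffer_size(size_t n)
{
    return (size_t)(n * 1.01 + 12);
}

/* Hypothetical check: is that buffer strictly larger than the gzip wrapper,
   i.e. is there any room at all for compressed payload bytes? */
int buffer_has_room_for_payload(size_t n)
{
    return listing_buffer_size(n) > (size_t)(GZIP_HEADER_BYTES + GZIP_TRAILER_BYTES);
}
```

With n = 5 the buffer comes out at 17 bytes and the check fails. When zlib is linked, deflateBound(), called after deflateInit2(), is the library's own way to get an upper bound that accounts for the chosen wrapper.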

  • Michael

    Is there an example on how to compress NSData objects?

  • JT

    Hi, this is great!
    may I know what is the license for this code? is it a public domain?
    Thanks

  • https://clintharris.net Clint

    @JT: Do whatever you want with it! Thanks for the feedback.

  • Matt

    I had issues initially with this code when I was going cross platform (.NET XML web service to iOS… the .NET XML web service worked with other platforms including BlackBerry, etc.).

    At any rate, I am sending compressed data back and forth between each of these, and I used this gzip implementation as a guide for deflating a string. My process is: deflate, Base64 encode, then send. With several gzip inflate implementations on the receiving side, I could not get the compressed data produced by this function to decompress properly.

    However, the solution is as follows… the length needs to be tagged on to the resulting bytes. I added the code at the very bottom of this function above … might be a little rough here, but I’m under the gun to get a prototype out:


    // Free data structures that were dynamically created for the stream.
    deflateEnd(&zlibStreamStruct);
    [compressedData setLength: zlibStreamStruct.total_out];
    NSLog(@"%s: Compressed file from %d KB to %d KB", __func__, [pUncompressedData length]/1024, [compressedData length]/1024);

    //add the length bytes to the compressed data...
    NSUInteger len = [compressedData length];
    Byte *byteData = (Byte*)malloc(len);
    memcpy(byteData, [compressedData bytes], len);

    compressedData = [[NSMutableData alloc] init];
    [compressedData setLength:len + 4];

    Byte *compressedBytes = (Byte*)[compressedData bytes];
    memcpy(compressedBytes + 4, byteData, len);

    len = [pUncompressedData length];
    Byte *byteLength = (Byte*)malloc(4);
    byteLength[0] = (len & 0xff);
    byteLength[1] = (len >> 8) & 0xff;
    byteLength[2] = (len >> 16) & 0xff;
    byteLength[3] = (len >> 24) & 0xff;

    memcpy(compressedBytes, byteLength, 4);

    free(byteData);
    free(byteLength);

    return compressedData;
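The framing described in this comment, a 4-byte little-endian uncompressed length followed by the gzip bytes, can be sketched in plain C without the intermediate copies. This is an illustration of that specific framing only; it is a convention of the receiving code in this comment, not something gzip itself requires, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a new buffer containing original_len as a
   4-byte little-endian prefix followed by the payload bytes. Returns a
   malloc'd buffer the caller must free(), or NULL if allocation fails. */
unsigned char *prepend_length_le32(const unsigned char *payload, size_t payload_len,
                                   uint32_t original_len, size_t *out_len)
{
    unsigned char *out = malloc(payload_len + 4);
    if (out == NULL)
        return NULL;

    /* Least significant byte first, matching the byte order in the comment above. */
    out[0] = (unsigned char)(original_len & 0xff);
    out[1] = (unsigned char)((original_len >> 8) & 0xff);
    out[2] = (unsigned char)((original_len >> 16) & 0xff);
    out[3] = (unsigned char)((original_len >> 24) & 0xff);

    memcpy(out + 4, payload, payload_len);
    *out_len = payload_len + 4;
    return out;
}
```

In Objective-C the same result could presumably be had by creating an NSMutableData from the 4 prefix bytes and appending the compressed data to it, which avoids writing through the pointer returned by bytes.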

  • suresh kumar

    While trying to use the above mentioned source, I’m getting the Z_STREAM_ERROR while compressing the data.

    Z_STREAM_ERROR if level is not a valid compression level.
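One way to narrow down a Z_STREAM_ERROR is to check the arguments against the ranges that the listing's comments and zlib.h describe before calling deflateInit2(). The sketch below is a hypothetical pre-flight check for the gzip case only (windowBits with 16 added). The constants are copied from zlib.h so the block compiles without the library, and the lower bound of 9 for the base window size is a deliberately conservative assumption, since zlib.h notes that a value of 8 is not supported by the current deflate implementation.

```c
/* Values copied from zlib.h so this sketch compiles on its own; the SK_
   prefix marks them as belonging to the sketch, not to zlib. */
enum {
    SK_Z_DEFAULT_COMPRESSION = -1,
    SK_Z_DEFLATED            = 8,
    SK_Z_DEFAULT_STRATEGY    = 0,
    SK_Z_FIXED               = 4   /* highest strategy value in zlib.h */
};

/* Hypothetical helper: returns 1 if the arguments fall inside the ranges
   documented for deflateInit2() when a gzip wrapper is requested, else 0. */
int gzip_deflate_params_look_valid(int level, int method, int windowBits,
                                   int memLevel, int strategy)
{
    int baseBits = windowBits - 16;   /* 16 was added to request gzip framing */

    if (level != SK_Z_DEFAULT_COMPRESSION && (level < 0 || level > 9))
        return 0;
    if (method != SK_Z_DEFLATED)
        return 0;
    if (baseBits < 9 || baseBits > 15)
        return 0;
    if (memLevel < 1 || memLevel > 9)
        return 0;
    if (strategy < SK_Z_DEFAULT_STRATEGY || strategy > SK_Z_FIXED)
        return 0;
    return 1;
}
```

The arguments used in the listing (Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15+16, 8, Z_DEFAULT_STRATEGY) pass this check, so if the error comes from deflate() rather than deflateInit2(), the stream state (for example a NULL next_in or next_out) is the more likely place to look.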

  • Anonymous

    add that in your libraries in your project. Target -> build phases -> link Binary with libraries

  • jianjian

    thanks, could you please give the compress wrapper also?

  • hex

    Thank you very much. This article is the only place on the whole internet where I’ve found really meaningful information about this specific problem. For some reason everybody is interested in decompression exclusively :)

  • http://www.sourcekode.in Karan vora

    Finally a solution that works, thanks a lot Clint