rahulsingh71/compress

compress

This package provides an optimized Deflate implementation, which is used by the gzip, zip and zlib packages.

It offers slightly better compression at the lower compression levels, and up to 3x faster encoding at the highest compression level.

usage

The packages are drop-in replacements for the standard library packages. Simply replace the import path to use them:

old import          new import
compress/gzip       github.com/klauspost/compress/gzip
compress/zlib       github.com/klauspost/compress/zlib
archive/zip         github.com/klauspost/compress/zip
compress/flate      github.com/klauspost/compress/flate

You may also be interested in pgzip, a drop-in replacement for gzip that supports multithreaded compression of large files, and in the optimized crc32 package used by these packages.

The packages expose the same API as the standard library, so you can use the standard library godoc: gzip, zip, zlib, flate.

Currently there is only a minor speedup on decompression, primarily from faster CRC32 calculation.

deflate optimizations

  • Minimum matches are 4 bytes, which leads to fewer searches and better compression.
  • Stronger hash (iSCSI CRC32) for matches on x64 with SSE 4.2 support, which leads to fewer hash collisions.
  • Literal byte matching using SSE 4.2 for faster string comparisons.
  • Bulk hashing on matches.
  • Much faster dictionary indexing with NewWriterDict()/Reset().
  • Faster bit coder, by assuming a 64-bit CPU.
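To illustrate the first two points, here is a toy greedy LZ77 match finder that indexes every 4-byte sequence with a CRC-32C (iSCSI) hash and only reports verified matches of 4 or more bytes. This is a sketch under stated assumptions, not this package's encoder: the table size (hashBits), the greedy strategy, and the names findMatches and hash4 are hypothetical, and the real code adds lazy matching, bulk hashing and assembly.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const (
	minMatch = 4       // shortest match reported (the 4-byte minimum above)
	maxMatch = 258     // longest match deflate can encode
	maxDist  = 1 << 15 // deflate's 32 KiB window
	hashBits = 15      // hypothetical table size: 1<<15 buckets
)

// castagnoli is the CRC-32C ("iSCSI") table. On amd64 with SSE 4.2, Go's
// hash/crc32 computes checksums with this table using the CRC32 instruction.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// hash4 hashes the 4 bytes at the start of b into a table index.
func hash4(b []byte) uint32 {
	return crc32.Checksum(b[:minMatch], castagnoli) & (1<<hashBits - 1)
}

// match: the bytes at pos repeat the bytes dist positions earlier, for length bytes.
type match struct{ pos, dist, length int }

// findMatches indexes each 4-byte sequence by hash and, on a hit,
// verifies and extends the candidate match.
func findMatches(src []byte) []match {
	table := make([]int, 1<<hashBits) // last position+1 per hash; 0 means empty
	var out []match
	for i := 0; i+minMatch <= len(src); {
		h := hash4(src[i:])
		cand := table[h] - 1
		table[h] = i + 1
		if cand >= 0 && i-cand <= maxDist {
			n := 0
			for n < maxMatch && i+n < len(src) && src[cand+n] == src[i+n] {
				n++
			}
			// A hash hit may be a collision, so only accept verified matches of 4+ bytes.
			if n >= minMatch {
				out = append(out, match{pos: i, dist: i - cand, length: n})
				i += n // greedy: skip the matched bytes without indexing them
				continue
			}
		}
		i++
	}
	return out
}

func main() {
	src := []byte("the quick brown fox jumps; the quick brown fox sleeps")
	for _, m := range findMatches(src) {
		fmt.Printf("pos %d: copy %d bytes from %d back: %q\n",
			m.pos, m.length, m.dist, src[m.pos:m.pos+m.length])
	}
}
```

A stronger hash means fewer false candidates reach the byte-by-byte verification loop, which is where the "fewer hash collisions" point pays off.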

With assembly (amd64 with SSE 4.2):

benchmark                            old ns/op     new ns/op     delta
BenchmarkEncodeDigitsSpeed1e4        571065        571799        +0.13%
BenchmarkEncodeDigitsSpeed1e5        3680010       4645932       +26.25%
BenchmarkEncodeDigitsSpeed1e6        34667982      45532604      +31.34%
BenchmarkEncodeDigitsDefault1e4      770694        619535        -19.61%
BenchmarkEncodeDigitsDefault1e5      13682782      6032845       -55.91%
BenchmarkEncodeDigitsDefault1e6      152778738     61443514      -59.78%
BenchmarkEncodeDigitsCompress1e4     771094        620635        -19.51%
BenchmarkEncodeDigitsCompress1e5     13683782      5999343       -56.16%
BenchmarkEncodeDigitsCompress1e6     152648731     61228502      -59.89%
BenchmarkEncodeTwainSpeed1e4         595100        570165        -4.19%
BenchmarkEncodeTwainSpeed1e5         3432796       3376593       -1.64%
BenchmarkEncodeTwainSpeed1e6         31573806      30687755      -2.81%
BenchmarkEncodeTwainDefault1e4       828697        674388        -18.62%
BenchmarkEncodeTwainDefault1e5       11572161      6733885       -41.81%
BenchmarkEncodeTwainDefault1e6       122607013     68998946      -43.72%
BenchmarkEncodeTwainCompress1e4      833297        679738        -18.43%
BenchmarkEncodeTwainCompress1e5      14539831      7372921       -49.29%
BenchmarkEncodeTwainCompress1e6      160019152     77099410      -51.82%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkEncodeDigitsSpeed1e4        17.51        17.49        1.00x
BenchmarkEncodeDigitsSpeed1e5        27.17        21.52        0.79x
BenchmarkEncodeDigitsSpeed1e6        28.85        21.96        0.76x
BenchmarkEncodeDigitsDefault1e4      12.98        16.14        1.24x
BenchmarkEncodeDigitsDefault1e5      7.31         16.58        2.27x
BenchmarkEncodeDigitsDefault1e6      6.55         16.28        2.49x
BenchmarkEncodeDigitsCompress1e4     12.97        16.11        1.24x
BenchmarkEncodeDigitsCompress1e5     7.31         16.67        2.28x
BenchmarkEncodeDigitsCompress1e6     6.55         16.33        2.49x
BenchmarkEncodeTwainSpeed1e4         16.80        17.54        1.04x
BenchmarkEncodeTwainSpeed1e5         29.13        29.62        1.02x
BenchmarkEncodeTwainSpeed1e6         31.67        32.59        1.03x
BenchmarkEncodeTwainDefault1e4       12.07        14.83        1.23x
BenchmarkEncodeTwainDefault1e5       8.64         14.85        1.72x
BenchmarkEncodeTwainDefault1e6       8.16         14.49        1.78x
BenchmarkEncodeTwainCompress1e4      12.00        14.71        1.23x
BenchmarkEncodeTwainCompress1e5      6.88         13.56        1.97x
BenchmarkEncodeTwainCompress1e6      6.25         12.97        2.08x

  • "Speed" is compression level 1.
  • "Default" is compression level 6.
  • "Compress" is compression level 9.
  • Test files are Digits (low-matching) and Twain (plain text).

As can be seen, on the low-matching Digits input the new version is slower at compression level 1 (0.76x-1.00x the throughput of the standard library), but at the default level it is 1.24x-2.49x faster.

Twain is a much more realistic benchmark and should be closer to JSON/HTML performance. Here speed is equal or better at every level, up to 2.08x faster.
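The MB/s and speedup columns follow directly from the ns/op columns. Assuming the 1e4/1e5/1e6 suffix is the input size in bytes (which matches the MB/s figures), Go's testing package reports MB/s as bytes per op divided by seconds per op, in units of 1e6 bytes. The sketch below reproduces the BenchmarkEncodeDigitsDefault1e6 row; mbPerSec and round2 are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// mbPerSec converts ns/op into MB/s: (bytes/1e6) / (ns/1e9) = bytes*1e3/ns.
func mbPerSec(bytesPerOp int64, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// round2 rounds to two decimals, as in the tables.
func round2(x float64) float64 { return math.Round(x*100) / 100 }

func main() {
	// BenchmarkEncodeDigitsDefault1e6: 1e6 input bytes, old and new ns/op.
	oldMB := mbPerSec(1e6, 152778738)
	newMB := mbPerSec(1e6, 61443514)
	fmt.Println(round2(oldMB), round2(newMB), round2(newMB/oldMB)) // prints: 6.55 16.28 2.49
}
```

The speedup is simply old ns/op divided by new ns/op, so it is the same whether computed from time or from throughput.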

Without assembly, which is what you can expect on systems that lack amd64 and SSE 4.2:

benchmark                            old ns/op     new ns/op     delta
BenchmarkEncodeDigitsSpeed1e4        571065        647787        +13.43%
BenchmarkEncodeDigitsSpeed1e5        3680010       5925338       +61.01%
BenchmarkEncodeDigitsSpeed1e6        34667982      59040043      +70.30%
BenchmarkEncodeDigitsDefault1e4      770694        723391        -6.14%
BenchmarkEncodeDigitsDefault1e5      13682782      9633051       -29.60%
BenchmarkEncodeDigitsDefault1e6      152778738     102595868     -32.85%
BenchmarkEncodeDigitsCompress1e4     771094        724141        -6.09%
BenchmarkEncodeDigitsCompress1e5     13683782      9589048       -29.92%
BenchmarkEncodeDigitsCompress1e6     152648731     102295851     -32.99%
BenchmarkEncodeTwainSpeed1e4         595100        620835        +4.32%
BenchmarkEncodeTwainSpeed1e5         3432796       4013029       +16.90%
BenchmarkEncodeTwainSpeed1e6         31573806      37160125      +17.69%
BenchmarkEncodeTwainDefault1e4       828697        774044        -6.60%
BenchmarkEncodeTwainDefault1e5       11572161      9537045       -17.59%
BenchmarkEncodeTwainDefault1e6       122607013     99745705      -18.65%
BenchmarkEncodeTwainCompress1e4      833297        784094        -5.90%
BenchmarkEncodeTwainCompress1e5      14539831      10679610      -26.55%
BenchmarkEncodeTwainCompress1e6      160019152     113616498     -29.00%

benchmark                            old MB/s     new MB/s     speedup
BenchmarkEncodeDigitsSpeed1e4        17.51        15.44        0.88x
BenchmarkEncodeDigitsSpeed1e5        27.17        16.88        0.62x
BenchmarkEncodeDigitsSpeed1e6        28.85        16.94        0.59x
BenchmarkEncodeDigitsDefault1e4      12.98        13.82        1.06x
BenchmarkEncodeDigitsDefault1e5      7.31         10.38        1.42x
BenchmarkEncodeDigitsDefault1e6      6.55         9.75         1.49x
BenchmarkEncodeDigitsCompress1e4     12.97        13.81        1.06x
BenchmarkEncodeDigitsCompress1e5     7.31         10.43        1.43x
BenchmarkEncodeDigitsCompress1e6     6.55         9.78         1.49x
BenchmarkEncodeTwainSpeed1e4         16.80        16.11        0.96x
BenchmarkEncodeTwainSpeed1e5         29.13        24.92        0.86x
BenchmarkEncodeTwainSpeed1e6         31.67        26.91        0.85x
BenchmarkEncodeTwainDefault1e4       12.07        12.92        1.07x
BenchmarkEncodeTwainDefault1e5       8.64         10.49        1.21x
BenchmarkEncodeTwainDefault1e6       8.16         10.03        1.23x
BenchmarkEncodeTwainCompress1e4      12.00        12.75        1.06x
BenchmarkEncodeTwainCompress1e5      6.88         9.36         1.36x
BenchmarkEncodeTwainCompress1e6      6.25         8.80         1.41x

Compression level

This table shows the output size at each level, and that size as a percentage of the standard library's output at the same level. The input is Twain, see above.

Level   Bytes    % of stdlib size
1       180539   96.24%
2       174684   96.85%
3       170301   98.45%
4       165253   97.69%
5       161274   98.65%
6       160464   99.71%
7       160304   99.87%
8       160279   99.99%
9       160279   99.99%

To interpret an example: this version of deflate compresses the 407287-byte input to 180539 bytes at level 1, which is 96% of the 187563 bytes the standard library produces.

This means that at levels 1-5 you can expect output that is roughly 1-4% smaller.
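A table like the one above can be produced by compressing the same input at levels 1-9 with two encoders and dividing the sizes. The sketch below assumes a small adapter type (the hypothetical newWriter) so any encoder with the shape of flate.NewWriter can be plugged in; as written it compares the standard library with itself, so every percentage is 100%. To compare the optimized package, wrap its NewWriter the same way as stdlib.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"strings"
)

// newWriter has the shape of flate.NewWriter, returning an interface so that
// another deflate implementation can be wrapped and compared.
type newWriter func(w io.Writer, level int) (io.WriteCloser, error)

// stdlib wraps the standard library encoder.
func stdlib(w io.Writer, level int) (io.WriteCloser, error) { return flate.NewWriter(w, level) }

// compressedSize returns the deflate output size of data at level.
func compressedSize(nw newWriter, data []byte, level int) (int, error) {
	var buf bytes.Buffer
	w, err := nw(&buf, level)
	if err != nil {
		return 0, err
	}
	if _, err := w.Write(data); err != nil {
		return 0, err
	}
	if err := w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// sizeTable returns the candidate's size at levels 1-9 and that size as a
// percentage of the baseline's size at the same level.
func sizeTable(candidate, baseline newWriter, data []byte) ([9]int, [9]float64, error) {
	var sizes [9]int
	var pct [9]float64
	for level := 1; level <= 9; level++ {
		c, err := compressedSize(candidate, data, level)
		if err != nil {
			return sizes, pct, err
		}
		b, err := compressedSize(baseline, data, level)
		if err != nil {
			return sizes, pct, err
		}
		sizes[level-1] = c
		pct[level-1] = 100 * float64(c) / float64(b)
	}
	return sizes, pct, nil
}

func main() {
	data := []byte(strings.Repeat("It was a plain text sample, repeated for the test. ", 2000))
	sizes, pct, err := sizeTable(stdlib, stdlib, data)
	if err != nil {
		panic(err)
	}
	fmt.Println("Level   Bytes    % of stdlib size")
	for i := range sizes {
		fmt.Printf("%-7d %-8d %.2f%%\n", i+1, sizes[i], pct[i])
	}
}
```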

gzip/zip optimizations

  • Uses the faster deflate implementation.
  • Uses SSE 4.2 CRC32 calculations.

The speedup over the standard library is up to 3x, but usually around 30%. Without SSE 4.2, speed is roughly equivalent, but compression should be slightly better.

This benchmark, compressing a 2.3MB JSON file, is as close to a real-world workload as you will get.

benchmark           old ns/op     new ns/op     delta
BenchmarkGzipL1     95035436      71914113      -24.33%
BenchmarkGzipL2     100665758     74774276      -25.72%
BenchmarkGzipL3     111666387     80764620      -27.67%
BenchmarkGzipL4     141848114     101145785     -28.69%
BenchmarkGzipL5     185630618     127187274     -31.48%
BenchmarkGzipL6     207511870     137047840     -33.96%
BenchmarkGzipL7     265115163     183970522     -30.61%
BenchmarkGzipL8     454926020     348619940     -23.37%
BenchmarkGzipL9     488327935     377671600     -22.66%

benchmark           old MB/s     new MB/s     speedup
BenchmarkGzipL1     52.21        69.00        1.32x
BenchmarkGzipL2     49.29        66.36        1.35x
BenchmarkGzipL3     44.43        61.43        1.38x
BenchmarkGzipL4     34.98        49.06        1.40x
BenchmarkGzipL5     26.73        39.01        1.46x
BenchmarkGzipL6     23.91        36.20        1.51x
BenchmarkGzipL7     18.72        26.97        1.44x
BenchmarkGzipL8     10.91        14.23        1.30x
BenchmarkGzipL9     10.16        13.14        1.29x

Multithreaded compression using pgzip, compared with the standard library (quad-core, CPU = 8):

benchmark           old ns/op     new ns/op     delta
BenchmarkGzipL1     95035436      30381737      -68.03%
BenchmarkGzipL2     100665758     31341793      -68.87%
BenchmarkGzipL3     111666387     32891881      -70.54%
BenchmarkGzipL4     141848114     41767389      -70.55%
BenchmarkGzipL5     185630618     47742730      -74.28%
BenchmarkGzipL6     207511870     50272875      -75.77%
BenchmarkGzipL7     265115163     62693586      -76.35%
BenchmarkGzipL8     454926020     107436145     -76.38%
BenchmarkGzipL9     488327935     114066524     -76.64%

benchmark           old MB/s     new MB/s     speedup
BenchmarkGzipL1     52.21        163.31       3.13x
BenchmarkGzipL2     49.29        158.31       3.21x
BenchmarkGzipL3     44.43        150.85       3.40x
BenchmarkGzipL4     34.98        118.80       3.40x
BenchmarkGzipL5     26.73        103.93       3.89x
BenchmarkGzipL6     23.91        98.70        4.13x
BenchmarkGzipL7     18.72        79.14        4.23x
BenchmarkGzipL8     10.91        46.18        4.23x
BenchmarkGzipL9     10.16        43.50        4.28x

About

Optimized compression packages

Languages

  • Go 98.6%
  • Assembly 1.4%