Experiments in RPM delta compression

Joe Desbonnet, 13 Mar 2005

A snapshot of the Fedora Core 3 (i386) updates directory was downloaded (approx date 10 Mar 2005). A script was written to make a repository of deltas (binary diffs) against the original RPMs in the FC3 distribution.

Two algorithms were utilized: xdelta [1] and jbdiff [2]. Before the algorithm is applied, the RPM payload is uncompressed. In general the bsdiff algorithm produces the smallest deltas, but is computationally expensive [3]. For large updates the xdelta algorithm is used instead.

From the table below, it can be seen that a vast majority of the update RPMs can be compressed by a significant amount. There are some exceptions: Omni-foomatic (delta 138% of RPM), boost-devel (jddiff, 132% of RPM), libofx-devel (jbdiff, 126% of RPM ), openoffice.org-libs (167% of RPM), ruby-tcltk (jbdiff, 118% of RPM), selinux-policy-strict (jbdiff, 197% of RPM), words (jbdiff, 174% of RPM).

The size of all all the deltas is 206MBytes. The size of all the update RPMs is 1070MBytes. Total time to create delta repository: approx 2 hours on 1.8GHz AMD64 with 1GB RAM.

update RPM size distribution RPM size algorithm delta size Saving (%)

[1] XDelta implementation used was javaxdelta

[2] The 'bsdiff' algorithm is described at http://www.daemonology.net/bsdiff/ The implemenation used was JBDiff a Java port of the bsdiff utility.

[3] The JBDiff implementation requires approx (n * 8 + m) memory where n is the source file size and m is the target file size. As memory access is very random the memory must be real memory.