RPM Delta Compression Documentation

Joe Desbonnet, joe@galway.net
Software version: 0.2.0, 28 Feb 2006

About this software

The problem:

Not long after a new Fedora Core release, the bandwidth required to update a fresh installation can become burdensome for low/medium bandwidth internet connections. Some update RPMs can be over 80 MBytes in size, yet the update may be almost identical to the original RPM on the distribution disk.

This software presents two solutions to administrators of Fedora computers:
  1. An HTTP service which acts as a virtual update RPM repository. It dynamically re-creates update RPMs by downloading a small 'patch' or 'delta' file and applying it to locally stored RPMs from the original distribution.
  2. A command line tool which can regenerate an update RPM repository from a delta repository + RPMs from the original distribution.

In addition to the virtual update repository, this software distribution also contains the tools needed to create and maintain a delta repository.

Changes since the last version

The most significant change since the last release is that all my own binary difference code has been removed. Instead, I rely exclusively on the SUSE DeltaRPM tool by Michael Schroeder.

See the release notes and ChangeLog for details on recent changes.

Licensing

This software is released under the terms of the GPL v2 license. See LICENSE.txt in the distribution for details.

Installation

This software has four major dependencies:

Java and Tomcat RPMs are available in the Fedora Core and Extras repositories. Alternatively, you can install from the tarballs available on Sun's Java site and the Apache Jakarta Project site.

DeltaRPM is available from ftp://ftp.suse.com/pub/projects/deltarpm/. Download it and follow the make/install procedure (i.e. just 'make install'). This software has been tested with version 3.3 of DeltaRPM. It will look for the DeltaRPM binaries in /bin, /usr/bin and /usr/local/bin.
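
To check in advance whether the DeltaRPM binaries are in one of those three directories, something like the following Python sketch can be used. It is not part of this distribution. The binary names makedeltarpm and applydeltarpm are the tools shipped with DeltaRPM, and searching the directories in the order listed is an assumption.

```python
import os

# Directories named in the documentation; searching them in this order is an assumption.
SEARCH_DIRS = ("/bin", "/usr/bin", "/usr/local/bin")


def find_binary(name, search_dirs=SEARCH_DIRS):
    """Return the full path of the first executable called `name`, or None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Report where (or whether) each DeltaRPM tool was found.
for tool in ("makedeltarpm", "applydeltarpm"):
    print(tool, "->", find_binary(tool) or "not found")
```

If either tool is reported as "not found", install DeltaRPM or copy the binaries into one of the listed directories.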

Copy RPMDC.war into the Tomcat webapps directory. You may need to restart the Tomcat server.

The following instructions assume that your Tomcat server is running on http://localhost:8080/. Substitute with the appropriate host name and port number if this assumption is incorrect.

Open a browser on the home page: http://localhost:8080/RPMDC/. Click on server administration and set:

You can now test the operation of the server by going back to the home page and clicking on http://localhost:8080/RPMDC/repo/.

Finally, edit the yum configuration on the computers that need to be updated. On Fedora Core 4, edit /etc/yum.repos.d/fedora-updates.repo. In the 'updates-released' section of the file change the baseurl to

baseurl=http://localhost:8080/RPMDC/repo/pub/fedora/linux/core/$releasever/$basearch/os/
(i.e. replace 'download.fedora.redhat.com' with 'localhost:8080/RPMDC/repo').
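
If several computers need the same change, the substitution can be scripted. The following Python sketch is only an illustration and is not shipped with this software. The function name and the sample file content are made up for the example, and it only touches uncommented baseurl lines inside the named section.

```python
def point_baseurl_at_rpmdc(repo_text, section="updates-released",
                           old_host="download.fedora.redhat.com",
                           new_host="localhost:8080/RPMDC/repo"):
    """Return repo_text with old_host replaced by new_host in the baseurl
    lines of one section; every other line is copied unchanged."""
    out = []
    in_section = False
    for line in repo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new [section] header starts here.
            in_section = stripped[1:-1] == section
        elif in_section and stripped.startswith("baseurl="):
            line = line.replace(old_host, new_host)
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative file content, not a copy of the real fedora-updates.repo.
SAMPLE = (
    "[updates-released]\n"
    "baseurl=http://download.fedora.redhat.com/pub/fedora/linux/core/$releasever/$basearch/os/\n"
    "enabled=1\n"
)
print(point_baseurl_at_rpmdc(SAMPLE))
```

Keep a backup of the original .repo file before writing the result back.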

Creating a delta repository

Creating a delta repository is not required for the operation of the virtual repository. This tool is bundled in the same distribution for convenience.

To create a delta repository you need the following:

The creation and maintenance of the delta repository is currently accomplished with a command line tool.

Copy repotool.jar to a convenient location. Set this directory to be your current working directory. Use the following command line:

java -jar repotool.jar   orig-dist-dir    updates-dir   delta-dir

Optional switches are:
--version Display software version and exit.
--help Display short usage help message and exit.
--temp-dir=dir A directory to use for temporary files. There must be enough space to expand a full RPM here. For Fedora Core 3 this can take over 600MBytes for OpenOffice RPMs. The temporary directory defaults to /var/tmp.
--no-clobber If the delta file already exists do not regenerate it.
--test-delta Test each delta to ensure that the recreated RPM is identical to the update RPM.
--latest-update-only If there is more than one update for a package, generate deltas for the latest update only.
--packages=[pkg1[,pkg2[,pkg3...]]] By default all packages in the updates directory are considered. This option allows only certain packages to be processed. Package names exclude version or release number (eg 'kernel', 'glibc-common').
--ignore-packages=[pkg1[,pkg2[,pkg3...]]] If set, any packages listed here will be skipped.
--max-rpm-size=n Do not process if the source or target RPM exceeds this limit in bytes. This option is useful on computers with low memory. It is recommended that the limit be no greater than half the computer's RAM.
--min-saving=n Minimum saving in bytes that must be realized by using a delta vs downloading the update RPM. If the saving is lower than this threshold then the delta is discarded. Defaults to 20KBytes.
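
To illustrate what "package names exclude version or release number" means for --packages and --ignore-packages, here is a Python sketch of how a package name can be taken from an RPM file name of the usual form name-version-release.arch.rpm. It is not the code used by repotool. The function names are made up, and the way the two options combine (ignore wins over include) is an assumption.

```python
def rpm_package_name(filename):
    """Extract the package name from 'name-version-release.arch.rpm'.

    Raises ValueError if the file name does not have that shape.
    """
    base = filename[:-len(".rpm")] if filename.endswith(".rpm") else filename
    nvr, _, _arch = base.rpartition(".")             # drop the architecture
    name, _version, _release = nvr.rsplit("-", 2)    # name may contain hyphens
    return name


def select_packages(filenames, packages=None, ignore_packages=()):
    """Keep files whose package name is wanted and not ignored.

    packages=None means "consider all packages" (the documented default).
    """
    selected = []
    for filename in filenames:
        name = rpm_package_name(filename)
        if packages is not None and name not in packages:
            continue
        if name in ignore_packages:
            continue
        selected.append(filename)
    return selected


# Hypothetical file names, for illustration only.
files = ["glibc-common-2.3.5-10.i386.rpm", "Omni-0.9.2-1.1.i386.rpm"]
print(select_packages(files, ignore_packages={"Omni", "Omni-foomatic"}))
```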

Example:

java -jar repotool.jar ./fc4dist ./fc4updates ./fc4deltas --latest-update-only --no-clobber --ignore-packages=Omni,Omni-foomatic

Note: by default the repository builder discards any delta that saves less than 20KBytes compared with downloading the full update RPM. This threshold can be changed with --min-saving=n (n in bytes).
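
The threshold rule can be written out as a short Python sketch. This is an illustration of the rule as described, not the tool's own code. It assumes that "20KBytes" means 20 * 1024 bytes and that a saving exactly equal to the threshold is kept.

```python
# Assumption: the documented default of "20KBytes" is 20 * 1024 bytes.
DEFAULT_MIN_SAVING = 20 * 1024


def keep_delta(target_rpm_size, delta_size, min_saving=DEFAULT_MIN_SAVING):
    """Return True if downloading the delta saves at least min_saving bytes
    compared with downloading the full update RPM."""
    saving = target_rpm_size - delta_size
    return saving >= min_saving


# A 456789 byte update RPM with a 1234 byte delta is well worth keeping.
print(keep_delta(456789, 1234))
# A 30000 byte update RPM with a 25000 byte delta saves only 5000 bytes.
print(keep_delta(30000, 25000))
```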

Creating an update repository

There are two ways to recreate an update RPM repository from the deltas:

  1. If you have deployed RPMDC as a webapp you can use wget:
    wget --mirror http://localhost:8080/RPMDC/repo
    
  2. The second method is ideal for those who do not want to deploy a servlet container such as Tomcat. In addition to the original RPMs from the distribution CD/DVD you will need to download the delta repository. This can be achieved as follows:
    wget --mirror http://rpmdelta.wombat.ie/deltarepo
    

    Now create a new directory to hold your recreated update RPMs. You can now create an update RPM repository with this line:

    java -jar repotool.jar --deltas-to-updates distDir recreatedUpdatesDir deltasDir
    

    You can verify the integrity of the resulting RPMs with rpm --checksig *.rpm
[Note: a script to download the deltas and create the repository is on the to-do list. Or if someone makes one, please email it to me.]
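
Until such a script exists, the second method can be strung together along the following lines. This Python sketch is untested against a real delta repository and only wraps the commands given above. The function names are made up. It assumes that wget --mirror stores the download under a directory named after the host (here rpmdelta.wombat.ie/deltarepo) in the current working directory, and that the recreated RPMs are written directly into the updates directory.

```python
import glob
import os
import subprocess

DELTA_REPO_URL = "http://rpmdelta.wombat.ie/deltarepo"


def build_commands(dist_dir, updates_dir, deltas_dir,
                   repotool_jar="repotool.jar", delta_repo_url=DELTA_REPO_URL):
    """Return the download and rebuild commands as argument lists."""
    return [
        ["wget", "--mirror", delta_repo_url],
        ["java", "-jar", repotool_jar, "--deltas-to-updates",
         dist_dir, updates_dir, deltas_dir],
    ]


def recreate_updates(dist_dir, updates_dir, deltas_dir):
    """Download the deltas, rebuild the update RPMs and check their signatures."""
    os.makedirs(updates_dir, exist_ok=True)
    for command in build_commands(dist_dir, updates_dir, deltas_dir):
        subprocess.run(command, check=True)   # stop at the first failure
    # Assumption: recreated RPMs sit at the top level of updates_dir.
    rpms = sorted(glob.glob(os.path.join(updates_dir, "*.rpm")))
    if rpms:
        subprocess.run(["rpm", "--checksig"] + rpms, check=True)
```

A call might look like recreate_updates("./fc4dist", "./fc4updates", "./rpmdelta.wombat.ie/deltarepo"), with the paths adjusted to your own layout.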

Delta Repository Format

The target RPM is defined as the update RPM for which a request is made, for example mypackage-1.0-3.i386.rpm. The source RPM is defined as the RPM to which a delta is applied (usually an RPM in the original distribution, e.g. mypackage-1.0-1.i386.rpm).

For each target RPM a directory is created. The directory name is the file name of the target RPM with the ".rpm" suffix removed.

The target directory can hold zero or more deltas which can be applied to various source RPMs. A file called deltas.xml provides metadata about each delta in the directory.
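
The naming rule can be expressed as a small Python sketch. The function names are made up for illustration, and it assumes that the target directories sit directly under the repository root that is passed in.

```python
import posixpath


def target_directory(target_rpm):
    """Return the directory name for a target RPM: the file name minus '.rpm'."""
    if not target_rpm.endswith(".rpm"):
        raise ValueError("not an RPM file name: %r" % target_rpm)
    return target_rpm[:-len(".rpm")]


def manifest_path(repo_root, target_rpm):
    """Return the path of the deltas.xml manifest for a target RPM.

    Assumption: target directories are direct children of repo_root.
    """
    return posixpath.join(repo_root, target_directory(target_rpm), "deltas.xml")


print(manifest_path("deltarepo/fc4/i386", "mypackage-1.0-3.i386.rpm"))
```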

Example: mypackage-1.0-1.i386.rpm is in the original distribution. Two updates are released some time later: mypackage-1.0-2.i386.rpm and mypackage-1.0-3.i386.rpm. The repository files for these updates will be as follows:

The deltas.xml file looks like this:

<?xml version="1.0"?>
<delta-manifest version="0.2">
<delta 
	file="from_mypackage-1.0-1.i386.deltarpm"
	size="1234"
	algorithm="deltarpm" 
	source="mypackage-1.0-1.i386.rpm"
	target="mypackage-1.0-3.i386.rpm"
	target-size="456789"
/>
<delta 
	file="from_mypackage-1.0-2.i386.deltarpm"
	size="1211"
	algorithm="deltarpm"
	source="mypackage-1.0-2.i386.rpm"
	target="mypackage-1.0-3.i386.rpm"
	target-size="456789"
/>
</delta-manifest>
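
A client can use this manifest to decide which delta to download. The following Python sketch shows one possible way: it keeps only the deltas whose source RPM is available locally and picks the smallest. It is not the selection logic of the RPMDC server, which may differ, and the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

MANIFEST = """<?xml version="1.0"?>
<delta-manifest version="0.2">
<delta file="from_mypackage-1.0-1.i386.deltarpm" size="1234"
       algorithm="deltarpm" source="mypackage-1.0-1.i386.rpm"
       target="mypackage-1.0-3.i386.rpm" target-size="456789" />
<delta file="from_mypackage-1.0-2.i386.deltarpm" size="1211"
       algorithm="deltarpm" source="mypackage-1.0-2.i386.rpm"
       target="mypackage-1.0-3.i386.rpm" target-size="456789" />
</delta-manifest>
"""


def choose_delta(manifest_xml, available_sources):
    """Return the attributes of the smallest delta whose source RPM is in
    available_sources, or None if no delta applies."""
    root = ET.fromstring(manifest_xml)
    candidates = [dict(delta.attrib) for delta in root.findall("delta")
                  if delta.get("source") in available_sources]
    if not candidates:
        return None
    return min(candidates, key=lambda delta: int(delta["size"]))


# Only the original distribution RPM is on disk, so only the first delta applies.
chosen = choose_delta(MANIFEST, {"mypackage-1.0-1.i386.rpm"})
print(chosen["file"], chosen["size"])
```

When no delta applies, a client would have to fall back to downloading the full update RPM.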

Testing

I have made a delta repository for the Fedora Core 4 distribution at http://rpmdelta.wombat.ie/deltarepo/fc4/i386/. This is provided without any reliability guarantees. If bandwidth consumption becomes an issue I may have to switch it off. If this software proves successful I hope that the delta repositories will be mirrored along with the distribution and updates.

For testing I recommend that you build your own repository. See section on building a repository.

To test you need to make a directory with all the FC4 RPMs from the original distribution. You can make this by copying /media/cdrom/Fedora/RPMS/* from the 4 distribution CDs or by downloading all the RPM files from a distribution mirror. For example, the following 'wget' command will download all distribution RPMs from download.fedora.redhat.com:

wget --mirror ftp://download.fedora.redhat.com/pub/fedora/linux/core/4/i386/os/Fedora/RPMS/

Please use your nearest mirror site if possible.

I found VMware Server (http://www.vmware.com/) useful for testing. This software is free of charge, but is not open source. I was able to install a fresh copy of FC4 and upgrade it using this system (although I did have to run yum -y update several times before it was successful -- see note in release notes).

Development Plan

This software is in development and should be considered 'alpha' grade software. Protocols, file formats, schemas, repository layout and repository locations may change as the software matures.

There is no timetable for future releases. If the software proves useful I hope to release version 1.0 a few months after the release of FC5.

Any feedback will be greatly appreciated. In particular I'm looking for feedback on the nomenclature used ('delta' vs 'diff', 'proxy server' vs 'virtual repository' etc), and on the format of the delta repository and the schema of the deltas.xml manifest file.

The current version has some obvious limitations which I plan to remove in future releases:

Related Projects