Insufficiently Random

The lonely musings of a loosely connected software developer.

Tuesday, July 8, 2008

Using jgit To Publish on Amazon S3

Recent versions of jgit, the 100% pure Java implementation of the Git version control system, support fetch and push directly over Amazon S3.

It behaves like http push does in C git in that it is transparent to the end-user. Transparent client-side encryption can also be enabled, in case the repository data must be protected from the operators of S3.

First, create a bucket using a standard S3 tool. I used jets3t's cockpit tool to create "gitney". A bucket may hold any number of repositories and acts as a root directory. It may also be a domain name if you want to use S3-based virtual hosting.

Next, create a properties file containing your AWSAccessKeyId and AWSSecretAccessKey so that jgit can authenticate itself with the S3 service. Since the AWSSecretAccessKey should be kept private, it's a good idea to store it in a protected file within your home directory.


$ touch ~/.jgit_s3_public
$ chmod 600 ~/.jgit_s3_public
$ cat >>~/.jgit_s3_public <<EOF
accesskey: AWSAccessKeyId
secretkey: AWSSecretAccessKey
acl: public
EOF


We also include acl: public so all objects (files) created by jgit through this configuration file are readable by anyone. The default (if not specified) is acl: private, making the objects readable only by yourself, and those who manage the S3 service.
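
As a rough illustration of how such a file can be read, the sketch below parses the same key: value lines with java.util.Properties and falls back to private when no acl line is present. It assumes the file follows standard Java properties syntax; the class S3ConfigSketch and its methods are hypothetical names made up for this example, not jgit's API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of jgit. */
class S3ConfigSketch {
    /** Parses "key: value" lines using standard Java properties syntax. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Returns the configured acl, or "private" when the file does not set one. */
    static String acl(Properties props) {
        return props.getProperty("acl", "private");
    }

    public static void main(String[] args) {
        Properties withAcl = load(
                "accesskey: AWSAccessKeyId\nsecretkey: AWSSecretAccessKey\nacl: public\n");
        Properties withoutAcl = load(
                "accesskey: AWSAccessKeyId\nsecretkey: AWSSecretAccessKey\n");
        System.out.println(acl(withAcl));    // public
        System.out.println(acl(withoutAcl)); // private
    }
}
```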

Next we configure the remote in Git and push to the S3 bucket:

$ git remote add s3 amazon-s3://.jgit_s3_public@gitney/projects/egit.git/
$ jgit push s3 refs/heads/master
$ jgit push --tags s3


Future updates are just as easy:

$ jgit push s3 refs/heads/master

(or)

$ git config --add remote.s3.push refs/heads/master
$ jgit push s3

Pushes are always incremental and consequently there is relatively little bandwidth usage during subsequent pushes.

Our repository is now cloneable directly over HTTP (assuming we used acl: public):

$ git clone http://gitney.s3.amazonaws.com/projects/egit.git


A jgit amazon-s3 URL is organized as:

amazon-s3://$config@$bucket/$prefix
http://$bucket.s3.amazonaws.com/$prefix

where the three major components are:

  • $config is the name of the configuration properties file stored in $GIT_DIR/$config or $HOME/$config (searched for in that order).

  • $bucket is the name of the Amazon S3 bucket holding the objects.

  • $prefix is the prefix to apply to all objects (files) within this repository. It implicitly ends in "/". You may omit this portion of the URI if you want the bucket to contain only one repository.
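
The search order for $config can be sketched as follows. This is only an illustration of "look in $GIT_DIR first, then $HOME"; ConfigLookupSketch and its methods are hypothetical names, and jgit's own lookup code may differ in detail.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of jgit. */
class ConfigLookupSketch {
    /** Candidate locations, in the order they are searched. */
    static List<File> candidates(File gitDir, File home, String config) {
        return Arrays.asList(new File(gitDir, config), new File(home, config));
    }

    /** Returns the first candidate that exists as a regular file, or null if none does. */
    static File find(File gitDir, File home, String config) {
        for (File candidate : candidates(gitDir, home, config)) {
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        File gitDir = new File(".git"); // assumed location of $GIT_DIR
        File home = new File(System.getProperty("user.home"));
        System.out.println(find(gitDir, home, ".jgit_s3_public"));
    }
}
```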


The amazon-s3:// form is something of an abuse of URI syntax, since the traditional username field holds the name of a file in either $GIT_DIR or $HOME. However, it keeps the secret access key away from prying eyes and provides a way to carry more information (such as acl or encryption settings) than can appear in a URI.
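
To make the mapping between the two URL forms concrete, here is a minimal sketch that splits an amazon-s3:// URL into $config, $bucket and $prefix and builds the matching HTTP URL. It uses plain string handling rather than jgit's own URI classes, and S3UrlSketch is a hypothetical name invented for this example.

```java
/** Hypothetical helper for illustration only; not part of jgit. */
class S3UrlSketch {
    static final String SCHEME = "amazon-s3://";

    final String config;
    final String bucket;
    final String prefix;

    S3UrlSketch(String config, String bucket, String prefix) {
        this.config = config;
        this.bucket = bucket;
        this.prefix = prefix;
    }

    /** Splits amazon-s3://$config@$bucket/$prefix into its three components. */
    static S3UrlSketch parse(String url) {
        if (!url.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not an amazon-s3 URL: " + url);
        }
        String rest = url.substring(SCHEME.length());
        int at = rest.indexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing $config@ part: " + url);
        }
        String config = rest.substring(0, at);
        String afterAt = rest.substring(at + 1);
        int slash = afterAt.indexOf('/');
        String bucket = slash < 0 ? afterAt : afterAt.substring(0, slash);
        String prefix = slash < 0 ? "" : afterAt.substring(slash + 1);
        // The prefix implicitly ends in "/" unless it was omitted entirely.
        if (!prefix.isEmpty() && !prefix.endsWith("/")) {
            prefix += "/";
        }
        return new S3UrlSketch(config, bucket, prefix);
    }

    /** Builds the HTTP form: http://$bucket.s3.amazonaws.com/$prefix */
    String toHttpUrl() {
        return "http://" + bucket + ".s3.amazonaws.com/" + prefix;
    }

    public static void main(String[] args) {
        S3UrlSketch u = parse("amazon-s3://.jgit_s3_public@gitney/projects/egit.git/");
        System.out.println(u.config);      // .jgit_s3_public
        System.out.println(u.bucket);      // gitney
        System.out.println(u.prefix);      // projects/egit.git/
        System.out.println(u.toHttpUrl()); // http://gitney.s3.amazonaws.com/projects/egit.git/
    }
}
```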

Transparent client-side encryption for a repository stored on S3 can be enabled by adding a password to the properties file:

$ cp ~/.jgit_s3_public ~/.jgit_s3_private
$ echo password: Sup3rS3cr3t >>~/.jgit_s3_private

and using .jgit_s3_private in the $config field of an amazon-s3:// URL. The encryption algorithm can also be set with the crypto.algorithm property, which defaults to PBEWithMD5AndDES.
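
For readers unfamiliar with password-based encryption in Java, the sketch below shows a round trip with the standard javax.crypto API and the PBEWithMD5AndDES algorithm name. The salt and iteration count are arbitrary values chosen for the example, not the ones jgit or jets3t use, and PbeSketch is a hypothetical class name, so this does not reproduce the on-disk format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.PBEParameterSpec;

/** Hypothetical helper for illustration only; not part of jgit or jets3t. */
class PbeSketch {
    static final String ALGORITHM = "PBEWithMD5AndDES";
    // Arbitrary example parameters (assumption); this algorithm needs an 8-byte salt.
    static final byte[] SALT = {1, 2, 3, 4, 5, 6, 7, 8};
    static final int ITERATIONS = 1000;

    /** Encrypts or decrypts input, depending on mode, with a key derived from the password. */
    static byte[] crypt(int mode, String password, byte[] input) {
        try {
            SecretKey key = SecretKeyFactory.getInstance(ALGORITHM)
                    .generateSecret(new PBEKeySpec(password.toCharArray()));
            Cipher cipher = Cipher.getInstance(ALGORITHM);
            cipher.init(mode, key, new PBEParameterSpec(SALT, ITERATIONS));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "object data".getBytes(StandardCharsets.UTF_8);
        byte[] secret = crypt(Cipher.ENCRYPT_MODE, "Sup3rS3cr3t", plain);
        byte[] back = crypt(Cipher.DECRYPT_MODE, "Sup3rS3cr3t", secret);
        System.out.println(new String(back, StandardCharsets.UTF_8)); // object data
    }
}
```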

The encryption format currently used by jgit matches the format used by jets3t (specifically format version 2), making it possible to download and decrypt a repository through cockpit in the event that jgit is not readily available.

7 comments:

Guillermo said...

Hi. I've been trying to install the git eclipse plugin in my eclipse, but I'm having trouble compiling the package org.spearce.jgit. I have eight errors that prevent the projects from being built. This is the eclipse output:
The method getRequestorType() is undefined for the type AwtAuthenticator AwtAuthenticator.java org.spearce.jgit/src/org/spearce/jgit/awtui line 94 Java Problem
The method getRequestorType() is undefined for the type AwtAuthenticator AwtAuthenticator.java org.spearce.jgit/src/org/spearce/jgit/awtui line 95 Java Problem
The method getRequestingURL() is undefined for the type AwtAuthenticator AwtAuthenticator.java org.spearce.jgit/src/org/spearce/jgit/awtui line 103 Java Problem
The method openConnection() in the type URL is not applicable for the arguments (Proxy) TransportHttp.java org.spearce.jgit/src/org/spearce/jgit/transport line 179 Java Problem
The method setFixedLengthStreamingMode(int) is undefined for the type HttpURLConnection AmazonS3.java org.spearce.jgit/src/org/spearce/jgit/transport line 396 Java Problem
The method setFixedLengthStreamingMode(int) is undefined for the type HttpURLConnection AmazonS3.java org.spearce.jgit/src/org/spearce/jgit/transport line 482 Java Problem
The method openConnection() in the type URL is not applicable for the arguments (Proxy) AmazonS3.java org.spearce.jgit/src/org/spearce/jgit/transport line 565 Java Problem

My HEAD is at: da33dee3a7b93370f50cb5134d7f4fa49ce2e3bf

Thanks :-)

spearce said...

These errors seem to be on APIs that were introduced in Java 5.

Check that your projects are pointing to a Java 5 or Java 6 JRE. I suspect they are configured to point to a Java 1.4 JRE. Generally we change this setting in the workspace, and allow the projects to inherit the workspace setting.

Git on S3 | Web Initiative said...

[...] author of jgit has implemented an Amazon S3 protocol to support git fetch and git push on a S3 bucket. Anyone who compiled jgit on [...]

jc said...

That's great, however jgit's installation is completely retarded. Why require eclipse? Anyways, the whole thing is a mess of java nonsense. I'm sure there's a better way.

spearce said...

Actually, there is a script in the top level directory: ./make_jgit.sh

Run that script to compile with plain old javac, and avoid Eclipse entirely.

Troy Hakala said...

I can't find a better way to contact you...

I'm trying to use jgit (v0.5.0) to push a repository to S3 and I get the following. Any ideas how to fix this?:

Counting objects: 8430
Compressing objects: 100% (8430/8430)
Writing objects: 99% (8346/8430)
java.lang.IllegalArgumentException: invalid content length
    at java.net.HttpURLConnection.setFixedLengthStreamingMode(HttpURLConnection.java:109)
    at org.spearce.jgit.transport.AmazonS3.putImpl(AmazonS3.java:482)
    at org.spearce.jgit.transport.AmazonS3.access$000(AmazonS3.java:106)
    at org.spearce.jgit.transport.AmazonS3$1.close(AmazonS3.java:453)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
    at org.spearce.jgit.transport.WalkPushConnection.sendpack(WalkPushConnection.java:249)
    at org.spearce.jgit.transport.WalkPushConnection.push(WalkPushConnection.java:155)
    at org.spearce.jgit.transport.PushProcess.execute(PushProcess.java:127)
    at org.spearce.jgit.transport.Transport.push(Transport.java:866)
    at org.spearce.jgit.pgm.Push.run(Push.java:124)
    at org.spearce.jgit.pgm.TextBuiltin.execute(TextBuiltin.java:131)
    at org.spearce.jgit.pgm.Main.execute(Main.java:159)
    at org.spearce.jgit.pgm.Main.main(Main.java:84)

spearce said...

For the record, my email address is on the top right in the sidebar, and the JGit project can be reached through its mailing list at https://dev.eclipse.org/mailman/listinfo/jgit-dev.

It's strange we got an invalid content length. Is it possible the repository is very large, like over 2 GiB, and thus the content length might have wrapped around on us to be negative here? If I recall that section of code, we probably don't handle anything over 2 GiB in a single push because the setFixedLengthStreamingMode method takes a signed int for its argument.
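
The suspected wrap-around can be illustrated with a small sketch. This is not the actual AmazonS3.java code: ContentLengthSketch is a hypothetical class and example.invalid is a placeholder host. It only shows that casting a length above 2 GiB to int yields a negative number, and that setFixedLengthStreamingMode(int) rejects negative lengths.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical demonstration only; not the jgit code path. */
class ContentLengthSketch {
    /** Narrows a long byte count to int the way a plain cast does. */
    static int narrow(long length) {
        return (int) length;
    }

    /** Returns true if HttpURLConnection refuses the given fixed content length. */
    static boolean rejected(int contentLength) {
        try {
            // openConnection() only creates the object; nothing is sent over the network.
            HttpURLConnection c = (HttpURLConnection)
                    URI.create("http://example.invalid/").toURL().openConnection();
            c.setFixedLengthStreamingMode(contentLength);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        int wrapped = narrow(threeGiB);
        System.out.println(wrapped);           // -1073741824
        System.out.println(rejected(wrapped)); // true
    }
}
```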
