Insufficiently Random

The lonely musings of a loosely connected software developer.

Friday, May 14, 2010

Don't Assume You Know What's Best

Yesterday I think we finally found the cause of Gerrit Code Review issue 390 and JGit bug 308945. In issue 390 a long-running Gerrit Code Review daemon suddenly loses access to objects in one or more Git repositories. The daemon's error log shows the server cannot find a commit, but git cat-file on the command line is able to read the same commit without errors. A restart of the daemon JVM always corrects the problem. So we knew the problem had to be data corruption within JGit's in-memory caches.

All along I've been looking for some sort of data corruption in the list of known pack files. The list is managed using volatiles and AtomicReferences, and does most atomic operations itself, rather than building upon the data structures offered by the java.util.concurrent package. Since I'm not Doug Lea, it's entirely possible that there is a race or unsafe write within this code. Fortunately, this code appears to be OK. I've gone over it dozens of times and cannot find a logic fault.

Then along comes bug 308945, where JGit is reading from a closed pack file.

JGit opens each pack file once using a RandomAccessFile, and then uses the NIO API read(ByteBuffer, long) to execute a thread-safe pread(2) system call any time it needs data from the file. This allows JGit to reuse the same file descriptor across multiple concurrent threads, reducing the number of times that it needs to check the pack file's header and footer. It also minimizes the number of open file descriptors required to service a given traffic load. To prevent the file from being closed by one thread while it's being read by another, JGit keeps its own internal 'in-use' counter for each file, and only closes the file descriptor when this counter drops to 0. In theory, this should work well.
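As a rough sketch of that scheme (this is not JGit's actual code; the class SharedPackHandle, its methods, and the choice to start the counter at 1 for the owner are all made up for illustration), the idea looks something like this:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a shared pack file handle; not JGit's real class. */
final class SharedPackHandle {
    private final RandomAccessFile file;
    private final FileChannel channel;
    // Assumption: the owner (e.g. the pack list) holds the first reference.
    private final AtomicInteger inUse = new AtomicInteger(1);

    SharedPackHandle(Path path) throws IOException {
        file = new RandomAccessFile(path.toFile(), "r");
        channel = file.getChannel();
    }

    /** Take a reference; fails once the counter has already dropped to 0. */
    boolean acquire() {
        for (;;) {
            int n = inUse.get();
            if (n == 0)
                return false;
            if (inUse.compareAndSet(n, n + 1))
                return true;
        }
    }

    /** Drop a reference; whoever brings the counter to 0 closes the descriptor. */
    void release() throws IOException {
        if (inUse.decrementAndGet() == 0)
            file.close();
    }

    /** Positional read: leaves the channel's own position alone, so threads can share it. */
    int readAt(ByteBuffer dst, long position) throws IOException {
        return channel.read(dst, position);
    }

    boolean isOpen() {
        return channel.isOpen();
    }

    /** Small demo: read 4 bytes at offset 3 of a 10-byte temp file. */
    static String demo() {
        try {
            Path p = Files.createTempFile("pack", ".bin");
            try {
                Files.write(p, "0123456789".getBytes(StandardCharsets.US_ASCII));
                SharedPackHandle h = new SharedPackHandle(p);
                ByteBuffer buf = ByteBuffer.allocate(4);
                if (!h.acquire())
                    throw new IllegalStateException("handle already closed");
                try {
                    while (buf.hasRemaining() && h.readAt(buf, 3 + buf.position()) > 0) {
                        // keep reading until the buffer is full or we hit EOF
                    }
                } finally {
                    h.release(); // reader is done; the owner's reference keeps the file open
                }
                boolean openAfterReader = h.isOpen();
                h.release(); // owner lets go; the counter reaches 0 and the file is closed
                return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII)
                        + ":" + openAfterReader + ":" + h.isOpen();
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SharedPackHandle.demo() should give "3456:true:false": the reader gets its bytes, the file stays open while the owner still holds a reference, and it closes only when the counter reaches 0. Note that the counter only knows about calls to release(); it has no idea if something else closes the channel behind its back.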

Right until we mixed in MINA SSHD, a pure Java SSH server. When a client disconnects unexpectedly, MINA appears to be sending an interrupt to the thread that last read data from the client connection. If that thread is currently inside of read(ByteBuffer,long), the read is interrupted, the file descriptor is closed, and an exception is thrown to the caller.

Wait, what? The file descriptor is closed?

And therein lies the bug. When I selected read(ByteBuffer,long) and this file descriptor reuse strategy for JGit, I failed to notice the documentation for throws ClosedByInterruptException. That oversight on my part led to the misuse of an API, and an ugly race condition.
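The behavior is easy to reproduce in isolation. Here is a minimal sketch (the class and method names are hypothetical, and the thread interrupts itself as a stand-in for the interrupt that would normally come from another thread) that performs a positional read with the interrupt status already set and then checks whether the channel survived:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class InterruptedReadDemo {
    /** Returns true if an interrupted positional read left the channel closed. */
    static boolean interruptClosesChannel() {
        try {
            Path p = Files.createTempFile("pack", ".bin");
            Files.write(p, new byte[] {1, 2, 3, 4});
            try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
                FileChannel ch = raf.getChannel();
                // Stand-in for the interrupt another thread would send.
                Thread.currentThread().interrupt();
                try {
                    ch.read(ByteBuffer.allocate(4), 0);
                    return false; // the read completed without being interrupted
                } catch (ClosedByInterruptException e) {
                    return !ch.isOpen(); // the channel was closed underneath us
                } finally {
                    Thread.interrupted(); // clear the flag so cleanup is unaffected
                }
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

InterruptedReadDemo.interruptClosesChannel() should return true: the read throws ClosedByInterruptException and the channel, along with its file descriptor, is gone for every other thread that was sharing it.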

When MINA SSHD interrupts a working JGit thread at the right place, the current pack file gets closed, but JGit's in-use counter thinks it's still open. Subsequent attempts to access that file all fail, because it's closed. When JGit encounters a pack file that is failing to perform IO as expected, it removes the file from the in-memory pack file list, but leaves it alone on disk. JGit never picks up the pack file again, as the pack file list is only updated when the modification time on the GIT_DIR/objects/pack directory changes. Without the pack file in the list, its contained objects cannot be found, and they appear to just vanish from the repository.

I don't know what possessed the people who worked on JSR 51 (New I/O APIs for the Java™ Platform) to think that closing a file descriptor automatically during an interrupt was a good idea. RandomAccessFile's own read method doesn't do this, but its associated FileChannel does. In my opinion, they might as well have just invoked a setuid root copy of /sbin/halt and powered off the host computer.
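One possible way around this, sketched below, is to read through RandomAccessFile itself and pair seek with readFully under a lock. This is only an illustration of the difference between the two APIs, not necessarily what JGit does; the class LockedPackReader and its methods are hypothetical, and the lock serializes readers, which gives up the concurrency that made pread(2) attractive in the first place.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical reader that avoids the interruptible FileChannel read path. */
final class LockedPackReader implements AutoCloseable {
    private final RandomAccessFile file;

    LockedPackReader(Path path) throws IOException {
        file = new RandomAccessFile(path.toFile(), "r");
    }

    /** seek + readFully must happen as one unit, so both run under the same lock. */
    void readFully(long position, byte[] dst) throws IOException {
        synchronized (file) {
            file.seek(position);
            file.readFully(dst);
        }
    }

    boolean isOpen() {
        return file.getChannel().isOpen();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Demo: read with the interrupt flag set and report whether the file survived. */
    static String demo() {
        try {
            Path p = Files.createTempFile("pack", ".bin");
            Files.write(p, "0123456789".getBytes(StandardCharsets.US_ASCII));
            try (LockedPackReader r = new LockedPackReader(p)) {
                byte[] buf = new byte[4];
                Thread.currentThread().interrupt(); // stand-in for an interrupt from another thread
                try {
                    r.readFully(3, buf);
                } finally {
                    Thread.interrupted(); // clear the flag again
                }
                return new String(buf, StandardCharsets.US_ASCII) + ":" + r.isOpen();
            } finally {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

LockedPackReader.demo() should return "3456:true": the interrupt is simply ignored by the read, and the file descriptor stays open for everyone else.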

Wednesday, April 28, 2010

Why am I surprised when things work?

I recently purchased a Fujitsu ScanSnap S1500M. This isn't interesting, it's a scanner. You plug it into your computer, and it's supposed to make picture files from paper. Yay. We've had scanners for ages. It's not blog worthy.

What shocked me was, the damn thing does exactly what it says on the tin.

Load its sheet feeder up with paper, plug it into the computer's USB port. Push the only "Scan" button on the front. Next thing you know, there is a folder full of sequentially numbered JPEG files. It automatically detects the length of the paper. It scans double-sided at the same speed it scans single-sided. It automatically drops back sides which are completely blank. Pages narrower than 8.5" are correctly detected and scanned with a narrower image width. It goes through 20 pages per minute. That's fast enough that it's done before you realize it's started.

I realized after scanning several hundred pages in just a few minutes that very few things I purchase these days "just work". Most products still require a lot of tinkering from the user, or are still so complex that you need an advanced degree to operate them. This scanner, well, anyone's cat could use it. Just tap that scan button.

Most products require you to purchase additional stuff, e.g. cables, to get them to work. Fujitsu actually included a USB cable in the box. Just unpack, plug in, and go. It's hard to argue with that. Even my HD TiVo was harder to get set up and going.

To organize that directory of image files, I started using Brad Fitzpatrick's scanningcabinet application, though I did make a few changes in my own scanningcabinet fork on GitHub. Now if only Google AppEngine supported full text search better...

Friday, April 23, 2010

Gerrit Code Review on FLOSS Weekly

On Wednesday I recorded a netcast for FLOSS Weekly with Randal Schwartz and Randi Harper about Gerrit Code Review, JGit, EGit, and Git in general. The video and audio versions of the netcast are now available.

It was fun recording the show. I don't usually do these sorts of things; I find talking to a laptop somewhat challenging conceptually. It's just a thing sitting there, and it doesn't talk back. You can't see your audience's reactions to your words. I guess that's why I never got into radio: I couldn't sit and talk to a wall for four hours a day, every day. I definitely prefer getting up on stage and giving a talk in person.

Monday, April 12, 2010

Pre-testing commits with Git

The awesome folks who work on Hudson CI have finally brought us pre-tested commits with Gerrit Code Review. Their solution of watching everything under refs/changes/ is a bit brute-force, but it's an amazing first start, because Hudson can "vote" on the change and prevent it from being submitted if the build failed.

A few years ago I started a similar sort of thing for Git. It's carried in the contrib/continuous directory of the git.git source code distribution. But this whole Hudson-Gerrit integration is way better, because it lets you catch the failure before it's submitted to your development branches.

Wednesday, April 7, 2010

Git is moving...

Cedric recently wrote an interesting post on his blog, Git for the nervous developer. Unlike a lot of the other blogs out there, he approached Git from the kicking and screaming angle, where he was already comfortable with another VCS and was forced to switch to Git for his day job. It's an interesting perspective that he has, how he has found some sort of happiness with a tool he didn't choose to use.

Today I also gave a talk on Git, JGit, EGit and Gerrit Code Review at the Sonatype Maven Meetup in Philadelphia. The talk was really well attended; according to Jason van Zyl, everyone chose my talk during the first time slot of the day. Most of the audience was apparently from the financial services sector, so they are a bit behind the bleeding edge of the open source VCS curve, but they were aware of Git and asked some really great questions about its capabilities. I'm glad I went; maybe we'll see some wider adoption of Git outside of the more usual open source communities.

Monday, March 29, 2010

JGit 0.7.1

We finally managed to release 0.7.1 of JGit, through the Eclipse Foundation's incubation process. Unfortunately we have yet to figure out how to get our Hudson CI server to produce a Maven update site and make that available through the download farm used by Eclipse projects. So we have yet to get an official Maven repository published.

But Eclipse users can install EGit 0.7.1 through the official P2 update site.

[Update] We now have an official JGit Maven site.

Saturday, March 27, 2010

Can't beat the cloud

I have decided it is time to stop running my own web server just for my silly little blog. And that was about the only reason I was still renting a virtual server from Slicehost (err Rackspace). At $20/month it just doesn't make much sense anymore.

So I've moved the blog onto Blogger, static files onto AWS S3, and some small URL redirection glue onto Google AppEngine. With free DNS hosting provided by my registrar, free blog hosting at Blogger, and free redirection glue on AppEngine, I can cut my costs by nearly $20/month. The only real cost is the static files on S3, which is just a couple of tiny images. I fortunately have never had a very media rich site. Given S3's prices, this is pennies/month.

The domain's email was moved months ago onto Google Apps for Your Domain. I just couldn't keep SpamAssassin running with sufficiently up-to-date rules on a tiny virtual server with only 256 MB of memory allocated to it. Fortunately, Gmail works great over IMAP and direct SMTP. And I do like having the fast web based search every once in a while.

This means the experiment will be going away soon. Most likely I'll keep my Git repositories at GitHub, or Fortunately these are small and generally just mirrors of open source projects whose primary repository lives elsewhere.