Insufficiently Random: 2010

Friday, May 14, 2010

Don't Assume You Know What's Best

Yesterday I think we finally found the cause of Gerrit Code Review issue 390 and JGit bug 308945. In issue 390 a long-running Gerrit Code Review daemon suddenly loses access to objects in one or more Git repositories. The daemon's error log shows the server cannot find a commit, but git cat-file on the command line is able to read the same commit without errors. A restart of the daemon JVM always corrects the problem. So we knew the problem had to be data corruption within JGit's in-memory caches.

All along I've been looking for some sort of data corruption in the list of known pack files. The list is managed using volatiles and AtomicReferences, and does most atomic operations itself, rather than building upon the data structures offered by the java.util.concurrent package. Since I'm not Doug Lea, it's entirely possible that there is a race or unsafe write within this code. Fortunately, this code appears to be OK. I've gone over it dozens of times and cannot find a logic fault.

Then along comes bug 308945, where JGit is reading from a closed pack file.

JGit opens each pack file once using a RandomAccessFile, and then uses the NIO API read(ByteBuffer, long) to execute a thread-safe pread(2) system call anytime it needs data from the file. This allows JGit to reuse the same file descriptor across multiple concurrent threads, reducing the number of times that it needs to check the pack file's header and footer. It also minimizes the number of open file descriptors required to service a given traffic load. To prevent the file from being closed by one thread while it's being read by another, JGit keeps its own internal 'in-use' counter for each file, and only closes the file descriptor when this counter drops to 0. In theory, this should work well.
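The strategy described above can be sketched roughly as follows. This is a minimal illustration, not JGit's actual code: the class name SharedPackSketch and its methods are hypothetical, and a plain synchronized counter stands in for whatever JGit uses internally.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a shared pack file handle; not JGit's class. */
final class SharedPackSketch {
    private final Path path;
    private RandomAccessFile file; // opened once, shared by every reader
    private int inUse;             // readers currently relying on the descriptor

    SharedPackSketch(Path path) {
        this.path = path;
    }

    /** Registers a reader, opening the file if this is the first one. */
    synchronized void beginUse() throws IOException {
        if (inUse == 0) {
            file = new RandomAccessFile(path.toFile(), "r");
        }
        inUse++;
    }

    /** Unregisters a reader; the last one out closes the descriptor. */
    synchronized void endUse() throws IOException {
        if (--inUse == 0) {
            file.close();
            file = null;
        }
    }

    synchronized boolean isOpen() {
        return file != null;
    }

    /**
     * Fills dst starting at the given file offset. The positional read never
     * touches the channel's own position, so threads can share one descriptor.
     * The caller must be between beginUse() and endUse().
     */
    void readFullyAt(ByteBuffer dst, long position) throws IOException {
        FileChannel ch;
        synchronized (this) {
            ch = file.getChannel();
        }
        while (dst.hasRemaining()) {
            int n = ch.read(dst, position);
            if (n < 0) {
                throw new EOFException();
            }
            position += n;
        }
    }

    /** Two overlapping readers on a temp file, one shared descriptor. */
    static String demo() {
        Path p = null;
        try {
            p = Files.createTempFile("pack-sketch", ".bin");
            Files.write(p, "PACKdata".getBytes(StandardCharsets.US_ASCII));
            SharedPackSketch pack = new SharedPackSketch(p);
            pack.beginUse();                    // reader A opens the file
            pack.beginUse();                    // reader B reuses it
            ByteBuffer buf = ByteBuffer.allocate(4);
            pack.readFullyAt(buf, 4);           // 4 bytes starting at offset 4
            pack.endUse();                      // A is done, B still holds it
            boolean openWhileShared = pack.isOpen();
            pack.endUse();                      // last reader closes it
            return new String(buf.array(), StandardCharsets.US_ASCII)
                    + ":" + openWhileShared + ":" + pack.isOpen();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (p != null) {
                p.toFile().delete();
            }
        }
    }
}
```

The counter is the only thing that decides when the descriptor is closed, which is exactly the assumption that breaks if something else closes the channel behind its back.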

Right until we mixed in MINA SSHD, a pure Java SSH server. When a client disconnects unexpectedly, MINA appears to be sending an interrupt to the thread that last read data from the client connection. If that thread is currently inside of read(ByteBuffer,long), the read is interrupted, the file descriptor is closed, and an exception is thrown to the caller.

Wait, what? The file descriptor is closed?

And therein lies the bug. When I selected read(ByteBuffer,long) and this file descriptor reuse strategy for JGit, I failed to notice the documentation for throws ClosedByInterruptException. That oversight on my part led to the misuse of an API, and an ugly race condition.
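The behavior is easy to reproduce in isolation. Here is a minimal sketch, assuming a throwaway temp file; the class and method names are hypothetical. The java.nio documentation says that if a thread's interrupt status is already set when it starts a blocking operation on an interruptible channel, the channel is closed and the thread gets a ClosedByInterruptException. So the sketch sets the flag on the current thread rather than racing a second thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo class; shows the channel being closed by an interrupt. */
final class InterruptClosesChannelDemo {

    /** Returns true if the interrupted read threw and left the channel closed. */
    static boolean channelClosedByInterrupt() {
        Path p = null;
        try {
            p = Files.createTempFile("interrupt-demo", ".bin");
            Files.write(p, new byte[] {1, 2, 3, 4});
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                // Simulate an interrupt arriving just before the read.
                Thread.currentThread().interrupt();
                boolean sawException = false;
                try {
                    ch.read(ByteBuffer.allocate(4), 0);
                } catch (ClosedByInterruptException e) {
                    sawException = true;
                }
                // The channel is now closed for every thread sharing it.
                return sawException && !ch.isOpen();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            Thread.interrupted(); // clear the flag so later I/O is unaffected
            if (p != null) {
                p.toFile().delete();
            }
        }
    }
}
```

Any bookkeeping kept outside the channel, such as an in-use counter, has no way to learn that this close happened.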

When MINA SSHD interrupts a working JGit thread at the right place, the current pack file gets closed, but JGit's in-use counter thinks it's still open. Subsequent attempts to access that file all fail, because it's closed. When JGit encounters a pack file that is failing to perform IO as expected, it removes the file from the in-memory pack file list, but leaves it alone on disk. JGit never picks up the pack file again, as the pack file list is only updated when the modification time on the GIT_DIR/objects/pack directory changes. Without the pack file in the list, its contained objects cannot be found, and they appear to just vanish from the repository.

I don't know what possessed the people who worked on JSR 51 (New I/O APIs for the Java™ Platform) to think that closing a file descriptor automatically during an interrupt was a good idea. RandomAccessFile's own read method doesn't do this, but its associated FileChannel does. In my opinion, they might as well have just invoked a setuid root copy of /sbin/halt and powered off the host computer.
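For contrast, here is a sketch of a positional read built only on RandomAccessFile's own seek and readFully, which are not interruptible channel operations. It illustrates the difference described above and is not necessarily the fix JGit adopted; the class name LockedRandomAccessReader is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical reader that avoids FileChannel entirely. */
final class LockedRandomAccessReader implements AutoCloseable {
    private final RandomAccessFile file;

    LockedRandomAccessReader(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
    }

    /**
     * seek + readFully share the file's single position, so the pair must be
     * atomic with respect to other readers; hence the lock.
     */
    synchronized void readFullyAt(long position, byte[] dst) throws IOException {
        file.seek(position);
        file.readFully(dst);
    }

    @Override
    public synchronized void close() throws IOException {
        file.close();
    }

    /** Reads twice with the interrupt flag set; the descriptor stays usable. */
    static String demo() {
        Path p = null;
        try {
            p = Files.createTempFile("raf-demo", ".bin");
            Files.write(p, "PACKdata".getBytes(StandardCharsets.US_ASCII));
            try (LockedRandomAccessReader r = new LockedRandomAccessReader(p)) {
                Thread.currentThread().interrupt(); // pending interrupt
                byte[] first = new byte[4];
                r.readFullyAt(0, first);            // not closed by the interrupt
                byte[] second = new byte[4];
                r.readFullyAt(4, second);           // still readable afterwards
                return new String(first, StandardCharsets.US_ASCII)
                        + new String(second, StandardCharsets.US_ASCII);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            Thread.interrupted(); // clear the flag set by the demo
            if (p != null) {
                p.toFile().delete();
            }
        }
    }
}
```

The trade-off is that the lock serializes readers of the same file, giving up the concurrency that the pread(2) style read allowed.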

Wednesday, April 28, 2010

Why am I surprised when things work?

I recently purchased a Fujitsu ScanSnap S1500M. This isn't interesting, it's a scanner. You plug it into your computer, and it's supposed to make picture files from paper. Yay. We've had scanners for ages. It's not blog worthy.

What shocked me was, the damn thing does exactly what it says on the tin.

Load its sheet feeder up with paper, plug it into the computer's USB port. Push the only "Scan" button on the front. Next thing you know, there is a folder full of sequentially numbered JPEG files. It automatically detects the length of the paper. It scans double-sided at the same speed it scans single-sided. It automatically drops back sides which are completely blank. Pages narrower than 8.5" are correctly detected and scanned with a narrower image width. It goes through 20 pages per minute. That's fast enough that it's done before you realize it's started.

I realized after scanning several hundred pages in just a few minutes that very few things I purchase these days "just work". Most products still require a lot of tinkering from the user, or are still so complex that you need an advanced degree to operate them. This scanner, well, anyone's cat could use it. Just tap that scan button.

Most products require you to purchase additional stuff, e.g. cables, to get them to work. Fujitsu actually included a USB cable in the box. Just unpack, plug in, and go. It's hard to argue with that. Even my HD TiVo was harder to get set up and going.

To organize that directory of image files, I started using Brad Fitzpatrick's scanningcabinet application, though I did make a few changes in my own scanningcabinet fork on GitHub. Now if only Google AppEngine supported full text search better...

Friday, April 23, 2010

Gerrit Code Review on FLOSS Weekly

On Wednesday I recorded a netcast for FLOSS Weekly with Randal Schwartz and Randi Harper about Gerrit Code Review, JGit, EGit, and Git in general. The video and audio versions of the netcast are now available.

It was fun recording the show. I don't usually do these sorts of things; I find talking to a laptop somewhat challenging conceptually. It's just a thing sitting there, and it doesn't talk back. You can't see your audience's reactions to your words. I guess that's why I never got into radio; I couldn't sit and talk to a wall for four hours a day, every day. I definitely prefer getting up on stage and giving a talk in person.

Monday, April 12, 2010

Pre-testing commits with Git

The awesome folks who work on Hudson CI have finally brought us pre-tested commits with Gerrit Code Review. Their solution of watching everything under refs/changes/ is a bit brute-force, but it's an amazing start, because Hudson can "vote" on the change and prevent it from being submitted if the build failed.
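For context, Gerrit publishes each patch set as a ref of the form refs/changes/NN/CHANGE/PATCHSET, where NN is the last two digits of the change number. A watcher that sees everything under refs/changes/ might pick change and patch set numbers apart along these lines. This is only a sketch with a hypothetical class name, not the Hudson integration's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for recognizing Gerrit patch set refs. */
final class ChangeRefSketch {
    // refs/changes/<last two digits of change>/<change number>/<patch set>
    private static final Pattern CHANGE_REF =
            Pattern.compile("^refs/changes/(\\d{2})/(\\d{1,9})/(\\d{1,9})$");

    /** Returns "change,patchSet" for a patch set ref, or null otherwise. */
    static String parse(String ref) {
        Matcher m = CHANGE_REF.matcher(ref);
        if (!m.matches()) {
            return null;
        }
        int shard = Integer.parseInt(m.group(1));
        int change = Integer.parseInt(m.group(2));
        int patchSet = Integer.parseInt(m.group(3));
        // The two-digit directory must agree with the change number.
        if (change % 100 != shard) {
            return null;
        }
        return change + "," + patchSet;
    }
}
```

For example, parse("refs/changes/34/1234/2") yields "1234,2", while branch refs such as refs/heads/master are rejected.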

A few years ago I started a similar sort of thing for Git. It's carried in the contrib/continuous directory of the git.git source code distribution. But this whole Hudson-Gerrit integration is way better, because it lets you catch the failure before it's submitted to your development branches.

Wednesday, April 7, 2010

Git is moving...

Cedric recently wrote an interesting post on his blog, Git for the nervous developer. Unlike a lot of the other blogs out there, he approached Git from the kicking and screaming angle, where he was already comfortable with another VCS and was forced to switch to Git for his day-job. It's an interesting perspective that he has, how he has found some sort of happiness with the tool he didn't choose to use.

Today I also gave a talk on Git, JGit, EGit and Gerrit Code Review at the Sonatype Maven Meetup in Philadelphia. The talk was really well attended; according to Jason van Zyl, everyone chose my talk during the first time slot of the day. Most of the audience is apparently from the financial services sector, so they are a bit behind the bleeding edge of the open source VCS curve, but they were aware of Git and asking some really great questions about its capabilities. I'm glad I went; maybe we'll see some wider adoption of Git outside of the more usual open source communities.

Monday, March 29, 2010

JGit 0.7.1

We finally managed to release 0.7.1 of JGit, through the Eclipse Foundation's incubation process. Unfortunately we have yet to figure out how to get our Hudson CI server to produce a Maven update site and make that available through the download farm used by Eclipse projects. So we have yet to get an official Maven repository published.

But Eclipse users can install EGit 0.7.1 through the official P2 update site, http://download.eclipse.org/egit/updates.

[Update] We now have an official JGit Maven site.

Saturday, March 27, 2010

Can't beat the cloud

I have decided it is time to stop running my own web server just for my silly little blog. And that is about the only reason I'm still renting a virtual server from Slicehost (err Rackspace). At $20/month it just doesn't make much sense anymore.

So I've moved the blog onto Blogger, static files onto AWS S3, and some small URL redirection glue onto Google AppEngine. With free DNS hosting provided by my registrar, free blog hosting at Blogger, and free redirection glue on AppEngine, I can cut my costs by nearly $20/month. The only real cost is the static files on S3, which is just a couple of tiny images. I fortunately have never had a very media rich site. Given S3's prices, this is pennies/month.

The domain's email was moved months ago onto Google Apps for Your Domain. I just couldn't keep SpamAssassin running with sufficiently up-to-date rules on a tiny virtual server with only 256 MB of memory allocated to it. Fortunately, Gmail works great over IMAP and direct SMTP. And I do like having the fast web based search every once in a while.

This means the git.spearce.org experiment will be going away soon. Most likely I'll keep my Git repositories at GitHub, or repo.or.cz. Fortunately these are small and generally just mirrors of open source projects whose primary repository lives elsewhere.

Wednesday, February 10, 2010

The Eclipse.org JGit follies continue...

Another day. Another complaint from me about running a project at Eclipse.org. This time it wound up in the jgit-dev mailing list archives, as replies to a thread that I think started from my blog post on the tragedy of Eclipse.

Instead of reposting the whole thing, I'll just point to my two messages in context:

why do I need to spend my time on this crap?
why is the new file header what it is?

Monday, February 8, 2010

The tragedy of Eclipse.org

I've probably posted something about this before. But I'm really getting fed up with the Eclipse Development Process. It's a frelling nightmare for a committer to work with. I'm really starting to regret moving JGit there.

Right now, if I have X hours to work on a project, I seem to be averaging what feels like X/2 hours in paperwork and other "important steps" of the development process. None of which have helped my project to ship higher quality, or more feature complete code. Which means either my or my employer's time is being wasted. I don't have time to waste when I have 108 bugs open in Gerrit Code Review, and 64 bugs open in EGit and JGit.

Based on a private email chain I'm having with the Eclipse IP review team, it looks like the initial EGit code contribution was bungled not just by myself, but also by the foundation's IP review process. Which means I probably have to run EGit back through IP review, almost from scratch. But only after I write a script to datamine contributors out of the old EGit history and inject a complete, per-file git short-log into each file header. It's a good thing I have an awesome version control system like Git to keep these records for me. Too bad nobody else on the planet can use it to obtain information they might want to know about our source code. I guess running software to read information about a file is too scary for some individuals. So I have to do it for them. Now, and for every change we make in the future. Yay. :-(

The astute reader may notice in that above paragraph, "private email chain" doesn't jibe with other publications from the Eclipse Foundation demanding that projects be run in an open and transparent manner (see how do I start a project on Eclipse Newcomers). I really do feel like JGit is a less open project now that it has moved to Eclipse.org. Conversations with the Eclipse IP team about the legal status of any contribution are always held by private email. These things never make it to the project mailing list. The IPzilla database is closed to everyone but committers. There are backroom deals going on about what our file headers should look like in order to sufficiently convey that the source code is under the new-style BSD. The discussion that led to the approval of the EGit IP log for 0.7.0, approved despite what appears to be an error in the initial review, also happened by private email.

It took a significant amount of effort on my part to even get JGit hosted at Eclipse.org. Originally, the new-style BSD license wasn't permissible for a hosted project, and I had to seek a special exemption from the Eclipse Board of Directors. A process that required significant backroom conversations, over at least 6 months. Again, not exactly open. The only reason I think I haven't pulled the project back is because of the huge initial investment I've already made in this.

Maybe JGit and EGit are just unique projects. But in my experience, I am not a unique snowflake, and neither is my work. I'm not as special as I might seem at first glance.

I wouldn't be surprised if I've lost at least 2 days every month to paperwork. That's about 30 days, or 1.5 person-months since the project really started this move in January 2009. 1/12 of my time over the past year has just gone to catering to the Eclipse development process. Food for thought. Join Eclipse... make sure you pick up at least 1/12 of another full-time developer just to deal with the red tape.

The part that really troubles me with the red tape isn't so much that it is there, but that committers bear the brunt of the effort, while large corporations that are strategic members reap the benefits of having a concise change history listed inside of each source code file, or knowing that every contributor who ever touched this source code has been grilled in detail on a bug tracker.

So back to my post title. The real tragedy is, these corporations who sell commercial products based on top of Eclipse.org distributions are pushing not just the open source development work, but also a whole ton of onerous legal and reporting constraints back onto their project committers. It's enough to make this committer start to reconsider things. I wish I had been using a time clock this past year, to accurately record how many days the Eclipse development process has robbed me of since the start of all of this. It feels significant enough that if I went to my manager with it, I think he'd go ballistic.

Thursday, February 4, 2010

Why commit messages matter

Some folks wonder why I want longer, detailed commit messages in a project. Often other people claim "Fix the frobinator bug when it frobs too slow" might be sufficiently detailed to cover a change. But it's usually not.

Monday, January 4, 2010

How class names can go horribly wrong

Somehow I found myself writing this in a JGit test case:

assertTrue("isa TransportHttp", t instanceof TransportHttp);
assertTrue("isa HttpTransport", t instanceof HttpTransport);

What is wrong with me...

Insufficiently Random

The lonely musings of a loosely connected software developer.
