Insufficiently Random

The lonely musings of a loosely connected software developer.

Showing posts with label jgit.

Friday, May 14, 2010

Don't Assume You Know What's Best

Yesterday I think we finally found the cause of Gerrit Code Review issue 390 and JGit bug 308945. In issue 390 a long-running Gerrit Code Review daemon suddenly loses access to objects in one or more Git repositories. The daemon's error log shows the server cannot find a commit, but git cat-file on the command line is able to read the same commit without errors. A restart of the daemon JVM always corrects the problem. So we knew the problem had to be data corruption within JGit's in-memory caches.

All along I've been looking for some sort of data corruption in the list of known pack files. The list is managed using volatiles and AtomicReferences, and implements most of its atomic operations itself rather than building on the data structures offered by the java.util.concurrent package. Since I'm not Doug Lea, it's entirely possible that there is a race or unsafe write within this code. Fortunately, this code appears to be OK. I've gone over it dozens of times and cannot find a logic fault.
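Here is a minimal sketch of the pattern described above: a copy-on-write list published through an AtomicReference. It is not JGit's actual pack list code. The class name PackListHolder and its methods are hypothetical, and plain pack names stand in for real pack file handles. Readers get a snapshot with a single volatile read. Writers build a new immutable list and publish it with compareAndSet, retrying if another thread published first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch (not JGit's code): readers never lock, writers
 * swap in a whole new immutable list with compareAndSet.
 */
final class PackListHolder {
  private final AtomicReference<List<String>> packs =
      new AtomicReference<>(Collections.<String>emptyList());

  /** Lock-free read; the returned list is never modified after publication. */
  List<String> snapshot() {
    return packs.get();
  }

  void add(String packName) {
    while (true) {
      List<String> cur = packs.get();
      List<String> next = new ArrayList<>(cur);
      next.add(packName);
      if (packs.compareAndSet(cur, Collections.unmodifiableList(next))) {
        return;
      }
      // Another writer published first; rebuild from its list and retry.
    }
  }

  /** Returns false if the pack was not in the list. */
  boolean remove(String packName) {
    while (true) {
      List<String> cur = packs.get();
      if (!cur.contains(packName)) {
        return false;
      }
      List<String> next = new ArrayList<>(cur);
      next.remove(packName);
      if (packs.compareAndSet(cur, Collections.unmodifiableList(next))) {
        return true;
      }
    }
  }
}
```

Each snapshot stays valid for as long as a reader holds it, even while writers publish newer lists. That property makes hand-rolled structures like this attractive, and it also makes them hard to audit.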

Then along comes bug 308945, where JGit is reading from a closed pack file.

JGit opens each pack file once using a RandomAccessFile, and then uses the NIO API read(ByteBuffer, long) on its FileChannel to execute a thread-safe pread(2) system call any time it needs data from the file. This allows JGit to reuse the same file descriptor across multiple concurrent threads, reducing the number of times that it needs to check the pack file's header and footer. It also minimizes the number of open file descriptors required to service a given traffic load. To prevent the file from being closed by one thread while it's being read by another, JGit keeps its own internal 'in-use' counter for each file, and only closes the file descriptor when this counter drops to 0. In theory, this should work well.
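Below is a minimal sketch of that sharing scheme. It uses a hypothetical SharedPackFile class rather than JGit's real code. There is one RandomAccessFile per pack, positional reads go through its FileChannel, and an AtomicInteger in-use counter closes the descriptor only when the last user releases it. The counter starts at 1 to represent the owner's own reference.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: one shared descriptor per pack, guarded by an in-use counter. */
final class SharedPackFile {
  private final RandomAccessFile raf;
  private final FileChannel channel;
  // 1 = the owner's reference; each active reader adds one more.
  private final AtomicInteger inUse = new AtomicInteger(1);

  SharedPackFile(File path) throws IOException {
    raf = new RandomAccessFile(path, "r");
    channel = raf.getChannel();
  }

  /** Positional read (pread-style): no shared file pointer moves, so no lock is needed. */
  int readAt(long position, byte[] dst) throws IOException {
    acquire();
    try {
      ByteBuffer buf = ByteBuffer.wrap(dst);
      int total = 0;
      while (buf.hasRemaining()) {
        int n = channel.read(buf, position + total);
        if (n < 0) {
          break; // end of file
        }
        total += n;
      }
      return total;
    } finally {
      release();
    }
  }

  void acquire() {
    inUse.incrementAndGet();
  }

  /** Closes the descriptor only when the last reference is released. */
  void release() throws IOException {
    if (inUse.decrementAndGet() == 0) {
      raf.close(); // closing the RandomAccessFile also closes its channel
    }
  }

  boolean isOpen() {
    return channel.isOpen();
  }
}
```

The counter only tracks who is still using the descriptor. It has no way of knowing whether the descriptor is actually still open.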

Right until we mixed in MINA SSHD, a pure Java SSH server. When a client disconnects unexpectedly, MINA appears to be sending an interrupt to the thread that last read data from the client connection. If that thread is currently inside of read(ByteBuffer,long), the read is interrupted, the file descriptor is closed, and an exception is thrown to the caller.

Wait, what? The file descriptor is closed?

And therein lies the bug. When I selected read(ByteBuffer,long) and this file descriptor reuse strategy for JGit, I failed to notice that its documentation lists ClosedByInterruptException among the exceptions it throws. That oversight on my part led to the misuse of an API, and an ugly race condition.
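The behavior is easy to reproduce with the standard library alone. In the small demonstration below, the class and method names are made up for illustration. It sets the calling thread's interrupt flag, standing in for the interrupt that MINA SSHD delivers from another thread, and then performs a positional read on a channel. In JGit's design, every thread would share that channel. The read fails with ClosedByInterruptException and the channel is left closed, while any in-use counter wrapped around it still says the file is open.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;

/** Demonstrates that an interrupted FileChannel read closes the channel itself. */
final class InterruptClosesChannel {
  /** Returns true if the channel was closed as a side effect of the interrupt. */
  static boolean readWhileInterrupted(File file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      FileChannel shared = raf.getChannel();
      // Stand-in for an interrupt sent by another thread; setting our own
      // flag just makes the demonstration deterministic.
      Thread.currentThread().interrupt();
      try {
        shared.read(ByteBuffer.allocate(16), 0);
      } catch (ClosedByInterruptException expected) {
        // The channel, and with it the file descriptor, is now closed.
      } finally {
        Thread.interrupted(); // clear the flag so the caller is unaffected
      }
      return !shared.isOpen();
    }
  }
}
```

Any other thread still holding that channel now fails with ClosedChannelException on its next read, even though it was never interrupted.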

When MINA SSHD interrupts a working JGit thread at the right place, the current pack file gets closed, but JGit's in-use counter thinks it's still open. Subsequent attempts to access that file all fail, because it's closed. When JGit encounters a pack file that is failing to perform IO as expected, it removes the file from the in-memory pack file list, but leaves it alone on disk. JGit never picks up the pack file again, as the pack file list is only updated when the modification time on the GIT_DIR/objects/pack directory changes. Without the pack file in the list, its contained objects cannot be found, and they appear to just vanish from the repository.
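The reason the objects stay gone can be sketched as a cache that only rescans when the directory's modification time changes. This is a hypothetical ScannedPackList, not JGit's implementation. The directory mtime is passed in as a plain long and the directory listing as a Supplier, so the example does not depend on filesystem timestamp granularity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: a pack list that is rebuilt only when the pack
 * directory's modification time differs from the one last scanned.
 */
final class ScannedPackList {
  private List<String> packs = new ArrayList<>();
  private long scannedAtMtime = Long.MIN_VALUE;

  /** Calls lister to rescan only if dirMtime changed since the last scan. */
  synchronized List<String> get(long dirMtime, Supplier<List<String>> lister) {
    if (dirMtime != scannedAtMtime) {
      packs = new ArrayList<>(lister.get());
      scannedAtMtime = dirMtime;
    }
    return new ArrayList<>(packs); // copy so callers see a stable view
  }

  /** Called after an IO failure: forget the pack in memory, leave the file on disk. */
  synchronized void dropBroken(String packName) {
    packs.remove(packName);
  }
}
```

Once a healthy pack has been dropped because of a closed descriptor, nothing brings it back until some unrelated event, such as a new push or a repack, touches the pack directory.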

I don't know what possessed the people who worked on JSR 51 (New I/O APIs for the Java™ Platform) to think that closing a file descriptor automatically during an interrupt was a good idea. RandomAccessFile's own read method doesn't do this, but its associated FileChannel does. In my opinion, they might as well have just invoked a setuid root copy of /sbin/halt and powered off the host computer.
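For contrast, a read through RandomAccessFile's own methods does not close the file when the thread is interrupted. The sketch below uses a hypothetical class name, LockedRandomAccessRead, and reads with seek() plus readFully(). Because seek() moves a file pointer shared by every thread, the reads must be serialized with a lock. That gives up the lock-free concurrent reads that the positional FileChannel read provides. The sketch illustrates the trade-off only; it is not the fix JGit adopted.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical sketch: reads via RandomAccessFile's own methods, which are
 * not an interruptible channel, so an interrupt does not close the file.
 */
final class LockedRandomAccessRead {
  private final RandomAccessFile raf;

  LockedRandomAccessRead(File file) throws IOException {
    raf = new RandomAccessFile(file, "r");
  }

  /** Serialized because seek() moves a file pointer shared by all threads. */
  synchronized void readAt(long position, byte[] dst) throws IOException {
    raf.seek(position);
    raf.readFully(dst);
  }

  void close() throws IOException {
    raf.close();
  }
}
```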

Friday, April 23, 2010

Gerrit Code Review on FLOSS Weekly

On Wednesday I recorded a netcast for FLOSS Weekly with Randal Schwartz and Randi Harper about Gerrit Code Review, JGit, EGit, and Git in general. The video and audio versions of the netcast are now available.

It was fun recording the show. I don't usually do these sorts of things; I find talking to a laptop somewhat challenging conceptually. It's just a thing sitting there, and it doesn't talk back. You can't see your audience's reactions to your words. I guess that's why I never got into radio: I couldn't sit and talk to a wall for four hours a day, every day. I definitely prefer getting up on stage and giving a talk in person.

Monday, March 29, 2010

JGit 0.7.1

We finally managed to release JGit 0.7.1 through the Eclipse Foundation's incubation process. Unfortunately we have yet to figure out how to get our Hudson CI server to produce a Maven update site and make that available through the download farm used by Eclipse projects. So we have yet to get an official Maven repository published.

But Eclipse users can install EGit 0.7.1 through the official P2 update site, http://download.eclipse.org/egit/updates.

[Update] We now have an official JGit Maven site.

Wednesday, February 10, 2010

The Eclipse.org JGit follies continue...

Another day. Another complaint from me about running a project at Eclipse.org. This time it wound up in the jgit-dev mailing list archives, as replies to a thread that I think started from my blog post on the tragedy of Eclipse.

Instead of reposting the whole thing, I'll just point to my two messages in context:

why do I need to spend my time on this crap?
why is the new file header what it is?

Monday, February 8, 2010

The tragedy of Eclipse.org

I've probably posted something about this before. But I'm really getting fed up with the Eclipse Development Process. It's a frelling nightmare for a committer to work with. I'm really starting to regret moving JGit there.

Right now, if I have X hours to work on a project, I seem to be averaging what feels like X/2 hours in paperwork and other "important steps" of the development process. None of which have helped my project to ship higher quality, or more feature complete code. Which means either my or my employer's time is being wasted. I don't have time to waste when I have 108 bugs open in Gerrit Code Review, and 64 bugs open in EGit and JGit.

Based on a private email chain I'm having with the Eclipse IP review team, it looks like the initial EGit code contribution was bungled not just by myself, but also by the foundation's IP review process. Which means I probably have to run EGit back through IP review, almost from scratch. But only after I write a script to datamine contributors out of the old EGit history and inject a complete, per-file git short-log into each file header. It's a good thing I have an awesome version control system like Git to keep these records for me. Too bad nobody else on the planet can use it to obtain information they might want to know about our source code. I guess running software to read information about a file is too scary for some individuals. So I have to do it for them. Now, and for every change we make in the future. Yay. :-(

The astute reader may notice in the paragraph above that "private email chain" doesn't jibe with other publications from the Eclipse Foundation demanding that projects be run in an open and transparent manner (see how do I start a project on Eclipse Newcomers). I really do feel like JGit is a less open project now that it has moved to Eclipse.org. Conversations with the Eclipse IP team about the legal status of any contribution always happen by private email. These things never make it to the project mailing list. The IPzilla database is closed to everyone but committers. There are backroom deals going on about what our file headers should look like in order to sufficiently convey that the source code is under the new-style BSD. The discussion that led to the approval of the EGit IP log for 0.7.0, approved despite what appears to be an error in the initial review, also happened by private email.

It took a significant amount of effort on my part to even get JGit hosted at Eclipse.org. Originally, the new-style BSD license wasn't permissible for a hosted project, and I had to seek a special exemption from the Eclipse Board of Directors. A process that required significant backroom conversations, over at least 6 months. Again, not exactly open. The only reason I think I haven't pulled the project back is because of the huge initial investment I've already made in this.

Maybe JGit and EGit are just unique projects. But in my experience, I am not a unique snowflake, and neither is my work. I'm not as special as I might seem at first glance.

I wouldn't be surprised if I've lost at least 2 days every month to paperwork. That's about 30 days, or 1.5 person-months, since the project really started this move in January 2009. 1/12 of my time over the past year has just gone to catering to the Eclipse development process. Food for thought. Join Eclipse... make sure you pick up at least 1/12 of another full-time developer just to deal with the red tape.

The part that really troubles me about the red tape isn't so much that it is there, but that committers bear the brunt of the effort, while large corporations that are strategic members reap the benefits of having a concise change history listed inside of each source code file, or knowing that every contributor who ever touched this source code has been grilled in detail on a bug tracker.

So back to my post title. The real tragedy is that these corporations, who sell commercial products built on top of Eclipse.org distributions, are pushing not just the open source development work, but also a whole ton of onerous legal and reporting constraints back onto their project committers. It's enough to make this committer start to reconsider things. I wish I had been using a time clock this past year, to accurately record how many days the Eclipse development process has robbed me of since the start of all of this. It feels significant enough that if I went to my manager with it, I think he'd go ballistic.

Thursday, December 3, 2009

EGit at Eclipse

A few months ago we moved EGit, the Git team provider for Eclipse, over to the Eclipse Foundation. Along the way we decided to try out some new development techniques, like taking advantage of my day-job project Gerrit Code Review to help us discuss pending changes. This led us down the road of not paying too much attention to the Eclipse IP process, and failing to tag all contributed patches with the +iplog flag in Bugzilla.

Fortunately Wayne Beaton helped us get Gerrit configured in a way that meets the foundation's IP process guidelines, and has encouraged us to continue forward.  This is great, because it means we can rely on Git for attribution tracking, rather than Bugzilla.

Also, since the project moved homes we picked up 3 prolific contributors, and all of them have turned into committers on the project.

Tuesday, July 8, 2008

Using jgit To Publish on Amazon S3

Recent versions of jgit, the 100% pure Java implementation of the Git version control system, support fetch and push directly over Amazon S3.

It behaves like HTTP push does in C git, in that it is transparent to the end user. Transparent client-side encryption can also be enabled, in case the repository data must be protected from the operators of S3.