Insufficiently Random

The lonely musings of a loosely connected software developer.

Friday, May 14, 2010

Don't Assume You Know What's Best

Yesterday I think we finally found the cause of Gerrit Code Review issue 390 and JGit bug 308945. In issue 390 a long-running Gerrit Code Review daemon suddenly loses access to objects in one or more Git repositories. The daemon's error log shows the server cannot find a commit, but git cat-file on the command line is able to read the same commit without errors. A restart of the daemon JVM always corrects the problem. So we knew the problem had to be data corruption within JGit's in-memory caches.

All along I've been looking for some sort of data corruption in the list of known pack files. The list is managed using volatiles and AtomicReferences, and does most atomic operations itself, rather than building upon the data structures offered by the java.util.concurrent package. Since I'm not Doug Lea, its entirely possible that there is a race or unsafe write within this code. Fortunately, this code appears to be OK. I've gone over it dozens of times and cannot find a logic fault.

Then along comes bug 308945, where JGit is reading from a closed pack file.

JGit opens each pack file once using a RandomAccessFile, and then uses the NIO API read(ByteBuffer, long) to execute a thread-safe pread(2) system call anytime it needs data from the file. This allows JGit to reuse the same file descriptor across multiple concurrent threads, reducing the number of times that it needs to check the pack file's header and footer. It also minimizes the number of open file descriptors required to service a given traffic load. To prevent the file from being closed by one thread while its being read by another, JGit keeps its own internal 'in-use' counter for each file, and only closes the file descriptor when this counter drops to 0. In theory, this should work well.

Right until we mixed in MINA SSHD, a pure Java SSH server. When a client disconnects unexpectedly, MINA appears to be sending an interrupt to the thread that last read data from the client connection. If that thread is currently inside of read(ByteBuffer,long), the read is interrupted, the file descriptor is closed, and an exception is thrown to the caller.

Wait, what? The file descriptor is closed?

And therein lies the bug. When I selected read(ByteBuffer,long) and this file descriptor reuse strategy for JGit, I failed to notice the documentation for throws ClosedByInterruptException. That oversight on my part lead to the misuse of an API, and an ugly race condition.

When MINA SSHD interrupts a working JGit thread at the right place, the current pack file gets closed, but JGit's in-use counter thinks its still open. Subsequent attempts to access that file all fail, because its closed. When JGit encounters a pack file that is failing to perform IO as expected, it removes the file from the in-memory pack file list, but leaves it alone on disk. JGit never picks up the pack file again, as the pack file list is only updated when the modification time on the GIT_DIR/objects/pack directory changes. Without the pack file in the list, its contained objects cannot be found, and they appear to just vanish from the repository.

I don't know what possessed the people who worked on JSR 51 (New I/O APIs for the JavaTM Platform) to think that closing a file descriptor automatically during an interrupt was a good idea. RandomAccessFile's own read method doesn't do this, but its associated FileChannel does. In my opinion, they might as well have just invoked a setuid root copy of /sbin/halt and powered off the host computer.

7 comments :

Unknown said...

Oh wow, that's a tough one! I can well imagine the frustration, but I must admit I laughed a few times when I read this post, thanks to the great wording..
Let's hope this fixes it once and for all! :-)
Cheers!

Anonymous said...

Ah, I noticed this peculiarity a (long) while back. If I recall right the controversial design was required because the state of the file descriptor could not be guaranteed on all operating systems. Native bugs or something?

I did a quick google to look for references, but nothing came up. Anyone?

Hmm say would the setuid thing be platform independent? ;) Seriously though, good catch..

BR,
volley

Shawn said...

Given that the older IO APIs (RandomAccessFile and Socket) are interruptible without closing the underlying file descriptor, I'm not entirely convinced that NIO's close-on-interrupt behavior is forced by operating system limitations. Based on the information I was able to find in a limited period of time, it seemed like this choice was to help application developers avoid leaking a file descriptor when their current task was interrupted. If that is true, then its a horribly flawed assumption, because not everyone is writing a network server that needs to disconnect the client when the current thread is interrupted.

Anonymous said...

The new APIs are more closely tied to the underlying OS. For example, FileChannel supports "transferTo" and ByteBuffers can be allocated as direct. The new APIs hence live in a nastier world. I suspect if you tried to use RandomAccessFile directly (rather than its sibling channel), some other problem (possibly performance) would pop up. I wouldn't bet my live on it though...

(And yep, if the intention was to "help" it would indeed be a horrible design flaw from a seemingly very competent group..)

Ernst Sjöstrand said...

Hurray! Hope to see this in production at $DAYJOB soon! Any ETA for the fix? In which projects will you do the fixes needed?

Shawn said...

I'm mainly doing this to solve Gerrit Code Review's issue 390 (link above in the blog post). I think I can get away with fixing this by just changing how JGit reads from files. Instead of sharing an open RandomAccessFile across threads, I'm trying to create a pool of RandomAccessFiles where a thread can checkout an exclusive file object for use, then return it when its done. This may mean we open the same file multiple times if there are concurrent readers.

Right now I'm largely trying to ignore anything that isn't this task. I had hoped to finish the code and get it posted for code review on the JGit review site by Monday, but personal life got in the way a bit. Hopefully by the end of today or tomorrow I should have the first draft ready for review.

Shawn said...
This comment has been removed by the author.

Post a Comment