Don't mess with Time Machine

Having just recently survived the classic “Oh yeah, I have backups… erase drive… oh… wait…” moment, I thought I'd write down my experiences for posterity.

The setup

My backup setup is pretty much your typical “slightly more techie Mac geek” variety, combining the ease of use of Time Machine (TM) with the awesome power and versatility of FreeNAS. Both my machines are backed up this way: my main 27” iMac (mid 2011) and my (t)rusty 2008 MacBook Pro, both running 10.9. I had been planning to give the iMac a much-needed I/O boost by replacing the optical drive with an SSD and building a home-brew Fusion Drive. Naturally, this required formatting the internal HDD, but “No matter, I have Time Machine”, I thought. How very wrong I was…

To cut a long story short, when it came time to restore the backup, it didn’t work, and the whole problem can be summed up by this screenshot:

No mountable file systems

Here’s what popped up in system.log when trying to mount the sparse bundle from the Finder:

28/11/13 21:59:16,000 kernel[0]: hfs: early journal init: volume on disk4s2 is read-only and journal is dirty.  Can not mount volume.
28/11/13 21:59:16,000 kernel[0]: hfs_mountfs: hfs_early_journal_init failed, erroring out 
28/11/13 21:59:16,000 kernel[0]: hfs_mount: hfs_mountfs returned error=22 for device disk4s2
28/11/13 21:59:16,603 diskarbitrationd[17]: unable to mount /dev/disk4s2 (status code 0x00000001).

My “go-to” solution in these situations has usually been:

- attach the broken image (no mount)
- rebuild (or preview) the attached partition with DiskWarrior

except in this case, DiskWarrior totally bailed with a “Not enough memory” error. It did recommend booting from its recovery DVD, which I tried, but that proved of little use since the data’s on a file server and there’s no way to mount shares from the DVD (the DVD does include Terminal, but there’s no way to launch it… weird).

DiskWarrior not enough memory

I was able to attach the drive with

$ hdiutil attach -nomount /Volumes/TimeMachine/flipMac.sparsebundle

but that would always attach the device read-only, rendering fsck completely useless. Trying to attach it read-write (so that I could fix it with fsck) would not work:

$ hdiutil attach -nomount -readwrite /Volumes/TimeMachine/flipMac.sparsebundle/
hdiutil: attach failed - Permission denied

… even after running chflags on the whole sparse bundle.

The diagnosis

What I had on my hands was an HFS+J volume with a corrupt journal file. To delete the journal, you have to be able to mount the volume read/write, but to do that the volume has to have a valid journal. In other words, diskutil and fsck are helpless in these situations.

-j saves the day

I was getting desperate. All my truly critical data (source code) was safe, but just the thought of losing all my music felt depressing and more and more real.

Then I got an idea: would it be possible to mount the volume read-only while ignoring the journal completely? It turns out it is. From the mount_hfs man page:

-j Ignore the journal for this mount.

sudo -s
hdiutil attach -nomount /Volumes/TimeMachine/flipMac.sparsebundle/
mkdir /tmp/mount
mount_hfs -o ro -j /dev/disk4s2 /tmp/mount/
ditto -V --noacl /tmp/mount/Backups.backupdb/flipMac/Latest/Macintosh\ HD/Users/filipp/ /Users/filipp

… after which you can go and re-create the user account on the new machine, drag over any applications you need, yadayadayada.
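The device node (disk4s2 above) will differ from machine to machine, so here is a rough sketch of how the attach and mount steps could be wrapped up so that the node is picked out of the hdiutil output instead of being typed by hand. The function names are made up for illustration, and it assumes hdiutil attach prints one row per partition with the device node in the first column and the partition type (Apple_HFS for the backup volume) in the second.

```shell
#!/bin/sh
# Pick the HFS+ partition's device node out of `hdiutil attach` output.
# Assumption: each row looks like "<device node> <partition type> [mount point]".
hfs_device_from_attach() {
    awk '$2 == "Apple_HFS" { print $1; exit }'
}

# Hypothetical wrapper around the attach + mount steps; run as root.
# $1 = path to the sparse bundle, $2 = mount point (created if missing).
mount_ignoring_journal() {
    bundle=$1
    mountpoint=$2
    # Attach without mounting and remember which node holds the HFS+ volume.
    dev=$(hdiutil attach -nomount "$bundle" | hfs_device_from_attach)
    [ -n "$dev" ] || { echo "no Apple_HFS partition found" >&2; return 1; }
    mkdir -p "$mountpoint"
    # Read-only mount that skips the journal entirely.
    mount_hfs -o ro -j "$dev" "$mountpoint"
}
```

Something like `mount_ignoring_journal /Volumes/TimeMachine/flipMac.sparsebundle /tmp/mount` would then stand in for the hdiutil, mkdir and mount_hfs lines, leaving only the ditto copy to run by hand.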

Post mortem

My TM backups were running just fine; there was absolutely no indication that anything was wrong. I think what caused this was that, while the iMac was offline reinstalling 10.9, I mounted the TM sparse bundle from my MacBook Pro (to retrieve a file I needed to work on and hadn’t synced to any cloud). I mounted the image, got the file and then at some point, without thinking about it, put the MacBook Pro to sleep. When the machine woke up, the image (“Time Machine Backups”) was still mounted, so I ejected it and continued working. After a little while I needed to access the image again, but this time got a “resource unavailable” after double-clicking the image. I didn’t pay it much attention at that time (mounting images over wireless isn’t totally bomb-proof), but in hindsight, the image was already corrupt at that time.

This would also explain a previous episode I had with TM where it suddenly said that “there’s a problem with the backup and that it has to be reinitialised” (or something like that). I had probably just opened the image manually at some point which caused the same corruption. Back then it just seemed like an annoyance having to back everything up from scratch again, but this time, shit was serious…

So to summarise: don’t go opening TM sparsebundles just because you can, or if you really have to, mount them read-only!
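For the record, a minimal sketch of what “mount them read-only” could look like from the Terminal, using hdiutil's -readonly option instead of double-clicking the bundle in the Finder. The attach_readonly name and the HDIUTIL override are hypothetical conveniences, not anything Time Machine provides, and this only follows the advice above rather than proving it prevents the corruption.

```shell
#!/bin/sh
# Attach a disk image with the device forced read-only.
# HDIUTIL is a hypothetical override hook (e.g. HDIUTIL=echo for a dry run).
attach_readonly() {
    [ -n "$1" ] || { echo "usage: attach_readonly <image>" >&2; return 2; }
    "${HDIUTIL:-hdiutil}" attach -readonly "$1"
}
```

Usage would be along the lines of `attach_readonly /Volumes/TimeMachine/flipMac.sparsebundle`, followed by a normal eject from the Finder once you have the file you came for.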