Finding things

Sat Mar 17, 2007 · 642 words

I find it ironic that things get increasingly difficult to find as their physical proximity increases (i.e. as the distance decreases). Thanks to Google, I can easily find the solution to a problem that some guy thousands of kilometres and, more importantly, years away once solved, and yet I'm utterly helpless when I have to find my bus ticket under the pile of papers that is my desk. In other words, filipp's search theorem states that “the physical proximity of an object is inversely proportional to the chance of me actually finding it”.

My media library is another example (and I'm sure I'm not alone in this). I have tens of thousands of files stored on different media (computers and discs) within the confines of my home, only tens of metres apart, but when I need to find a single file, it's pretty much hopeless. While I can't really do much about my desk, since that's just force of habit, I'm still convinced I can do something about those files.

So what are the solutions?

Consolidate your storage by getting a bigger disk. I could imagine this working quite well. If I had about a terabyte of local storage, I could just store everything in one place and use Spotlight to find it. Of course, this is a band-aid: in a year I would need 2 TB, at least. It's also expensive.
Stop hoarding stuff. This isn't really an option for me, but the basic idea is that the truly important data doesn't take up much space. For me it's mostly text, plus some pictures, all of which could easily fit within 10 GB. The rest (music and videos) is just entertainment, right? Applications have become the most disposable form of “hd filler”. But then there's also music bought from iTunes that you simply don't have anywhere else.
Start using some sort of DAM (Digital Asset Management) system. We're already doing this in many areas: Mail.app, iTunes, iPhoto, del.icio.us etc. all manage some kind of asset. But which one should I install? I'm sure there's a really fancy and expensive solution out there, but all I really need is a generic tool that takes snapshots of folders and then lets me search them.
Hack Spotlight to support read-only media and file servers. I actually tried this, but after failed attempts to index SMB shares, I quickly gave up. Who knows, Leopard might help here, but I'm not betting on it.

Taking a snapshot of a folder on a UNIX system is, of course, a piece of cake:

> find ~/Movies \! -name '.*' > movies.txt
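For comparison, here is a rough Python 3 counterpart of that find command. It's a sketch, not anything lumi.py is known to contain, and snapshot is a hypothetical helper name. One behavioural difference: find \! -name '.*' only filters out dot-named entries (it still descends into hidden directories and prints the root folder itself), whereas this sketch prunes hidden directories entirely and leaves the root out.

```python
import os
import sys


def snapshot(root):
    """Return a sorted list of paths under root, skipping dot-files.

    Unlike the find expression, hidden directories are pruned, so
    nothing inside e.g. .hidden/ is listed, and root itself is omitted.
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in dirnames + filenames:
            if not name.startswith("."):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


if __name__ == "__main__":
    # Usage: python3 snapshot.py ~/Movies > movies.txt
    for root in sys.argv[1:]:
        for path in snapshot(os.path.expanduser(root)):
            print(path)
```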

With more effort, this could be made to check for duplicates (when importing the same folder again) and do all sorts of cool things. Things get hairy when I need to store extra attributes for an item, like the QuickTime comment of a .mov or the year of an MP3. For that, a flat file becomes tedious.
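The duplicate check and the extra attributes could look something like this minimal sketch. It assumes an in-memory catalog (a plain dict) keyed by path, where each path maps to a dict of attributes. The names merge, comment and year are purely illustrative; actually reading a QuickTime comment or an MP3's year tag would need a separate library and isn't shown.

```python
def merge(catalog, paths, attributes=None):
    """Add paths to catalog, skipping ones that are already recorded.

    catalog maps each path to a dict of extra attributes (hypothetical
    keys such as "comment" or "year"). attributes optionally maps
    path -> dict for the incoming items. Returns the newly added paths.
    """
    attributes = attributes or {}
    added = []
    for path in paths:
        if path in catalog:
            continue  # duplicate, e.g. from re-importing the same folder
        # Copy so later edits to the caller's dict don't leak in.
        catalog[path] = dict(attributes.get(path, {}))
        added.append(path)
    return added


if __name__ == "__main__":
    catalog = {}
    merge(catalog, ["/m/a.mov", "/m/b.mp3"], {"/m/b.mp3": {"year": "2006"}})
    print(merge(catalog, ["/m/a.mov", "/m/c.avi"]))  # only the new file
```

Keying by path makes re-importing a folder a cheap lookup; the catch is that a moved or renamed file looks like a new item.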

The snapshot approach is really all I need: a medium-agnostic method for indexing and searching file archives offline. So I'm starting yet another side project, a Python script that does exactly that. Here's what I have so far:

> ./lumi.py /Volumes/300G/Movies/

 > importing /Volumes/300G/Movies/

 > Successfully imported 1296 items

> ./lumi.py list

 > 2007-03-17T12:22:23.827078 (/Volumes/300G/Movies/)

 > 2007-03-17T12:23:04.715256 (/Volumes/Shared Items/Movies/)

> ./lumi.py name blender

 > /Volumes/300G/Movies/Training (blendervt-interface-3dviewport-v234-r0.avi)

 > /Volumes/300G/Movies/Training (blendervt-interface-concept-v234-r0.avi)

 > /Volumes/Shared Items/Movies/Animation (Blender3d_SIGGRAPH2005_DVDRip.avi)

 > /Volumes/Shared Items/Movies/Animation (BlenderTricksEpisode.avi)
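Judging from the output, the name search behaves like a case-insensitive substring match on the file name (Blender3d_SIGGRAPH2005_DVDRip.avi matches "blender"), printed as the directory followed by the file name in parentheses. The sketch below is a guess at that behaviour, not the actual lumi.py code; find_by_name and format_hit are hypothetical names, and whether the real script also matches directory names isn't stated.

```python
import os


def find_by_name(items, term):
    """Yield stored paths whose file name contains term, ignoring case."""
    term = term.lower()
    for path in items:
        if term in os.path.basename(path).lower():
            yield path


def format_hit(path):
    """Render a hit as 'directory (file name)', like the output above."""
    directory, name = os.path.split(path)
    return f"{directory} ({name})"


if __name__ == "__main__":
    items = [
        "/Volumes/300G/Movies/Training/blendervt-interface-concept-v234-r0.avi",
        "/Volumes/Shared Items/Movies/Animation/BlenderTricksEpisode.avi",
    ]
    for hit in find_by_name(items, "blender"):
        print(format_hit(hit))
```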

The datastore is XML. Every import action gets its own Import element, which makes things like undo possible. Oh, and the trick to getting Unicode to work somewhat properly is to add:

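# Python 2 only: site.py deletes sys.setdefaultencoding after sitecustomize is imported, so this must live in sitecustomize.py
import sys
sys.setdefaultencoding('iso-8859-1')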

to /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/sitecustomize.py
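As for the XML datastore with one Import element per import, a minimal sketch using Python 3's xml.etree.ElementTree might look like this. Only the Import element comes from the description above; the Library root, the Item children and the date/path attributes are assumptions. Because each import lives in its own element, undoing the last import is just removing that element. The timestamps printed by the list command resemble datetime.isoformat() output, so that's what this sketch stores.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def add_import(root, source, paths):
    """Append one Import element, holding one Item per path, to root."""
    imp = ET.SubElement(root, "Import", date=datetime.now().isoformat(), path=source)
    for p in paths:
        ET.SubElement(imp, "Item").text = p
    return imp


def undo_last_import(root):
    """Remove the most recent Import element; return it, or None if none exist."""
    imports = root.findall("Import")
    if not imports:
        return None
    last = imports[-1]
    root.remove(last)
    return last


def list_imports(root):
    """Return (date, path) pairs, roughly what 'lumi.py list' prints."""
    return [(imp.get("date"), imp.get("path")) for imp in root.findall("Import")]


if __name__ == "__main__":
    library = ET.Element("Library")
    add_import(library, "/Volumes/300G/Movies/", ["/Volumes/300G/Movies/a.avi"])
    for date, path in list_imports(library):
        print(f"{date} ({path})")
```

Saving could then be ET.ElementTree(library).write("lumi.xml", encoding="utf-8", xml_declaration=True). In Python 3, sys.setdefaultencoding no longer exists and the default string encoding is UTF-8, so the sitecustomize trick doesn't apply there.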

While all of this is very basic ATM, it's already useful to me. No longer do I have to boot up a file server just to find out whether some music video is stored on it.