A Universal Subtitle Format and distribution method

April 1, 2007

OnTrees will be a weekly “column” about ideas for tools and technologies that I think would make working with computers and consuming media more enjoyable and fulfilling. The name comes from the slogan of another site I run. I normally just jot my ideas down as iCal To Dos, but I think it’s time to try to articulate them, if for no other reason than to understand them better myself, and obviously in the hope that someone else picks up on them.

This week is all about subtitles and translations. Is it just me, or is this an area that has gone totally unnoticed by the media industry? I’m a firm believer in the power and increasing importance of online distribution, but as you’ll see in the findings to come, this “tiny” issue has only been addressed by pirates! There are ~6000 languages in the world and English is just one of them. Of course, it’s also the language in which most of the media that people want to buy is produced. Not to mention the hearing-impaired native English speakers, who need subtitles too.

The iTunes Store doesn’t carry subtitles (the Podcasting XML extensions define a tag by that name, but for another purpose). Amazon’s Unbox service is only available in the US store; none of the titles I clicked through were available outside the US, and none of them seemed to have subtitles. CinemaNow (among other stores) doesn’t work in anything other than Internet Explorer (a whole subject in itself), to the extent that it’s not even possible to get more information about a title (such as available subs) without IE. Grr… Netflix, another US-only service, carries subtitles, but that’s of course because it’s a DVD rental. There is talk of Netflix starting a download service; we’ll see how that pans out.

Trying to analyse the legitimate movie download business from this perspective is a surprisingly frustrating experience. Most of the services are limited to the US and none of them provided sufficient information about a title, such as the available subtitles. If anyone has any positive examples of such services, I’d be very interested in hearing about them.

It really seems like the pirates have the definite upper hand here. First of all, there’s technology in place to support subs. SRT (SubRip) and SUB (VOBSub), two of the most popular subtitle formats, have been around for years and both stem from (Windows-only) tools that rip DVDs. But maybe even more importantly, they’ve created some sort of format for people who want to translate movies themselves.

Translation is hard work, especially for movies without a transcript. I have no idea how the procedure works exactly, but I would guess that the local movie distributors are in charge of commissioning a translation for the theatrical release, which is then reused for the DVD. It’s probably not cheap, but subtitles are a pretty important feature on a DVD and very few people would buy DVDs without them (duh. Download services, hello?). I would also imagine that the subs are covered by the same kind of copyright as the movie, with the rights belonging to the particular distributor.

Then there’s the large number of really good films that might never be released in your home country, and therefore never get an “official” translation. To give just two examples: Donnie Darko never made it to theatres or to DVD in Estonia, and most Finns will never understand the philosophical commentary in Waking Life. Most documentaries are like this, not to mention the thousands of classic titles that will never get translated by the distributors simply because it’s not profitable. The fewer people speak your language, the bigger the chance that you’ll never get a movie translation. Luckily there are local communities that work hard on translating movies into their mother tongue, like divxfinland.org. More power to them!

The problem is that this area is extremely fragmented and there are a ton of compatibility issues. First of all, it’s not always easy to find subs for your title. Different communities work on subtitles, so sometimes you can find different subs in the same language for the same movie, and none of them might work with your copy of that title. DVD rips from different regions come in different frame rates, which creates a whole bunch of problems. Releases are sometimes also of different lengths, so subs made for one release of a title will usually not work with another. Video players also handle subtitles differently, with just a few of them allowing you to tweak timing, usually with disastrous effects. I must admit the smoothest subtitle experience I’ve had was with BSPlayer, which is Windows-only. On top of it all, getting subtitles to work is definitely not something that every computer user is capable of.
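To make the frame rate problem concrete: if two rips contain exactly the same frames and differ only in playback speed, fixing the subs is just a matter of multiplying every timestamp by the same factor. Here is a minimal Python sketch of that idea. It assumes cues are plain (start, end, text) tuples with times in seconds, and the function name `retime` is made up for illustration. It does nothing for releases of different lengths (extra or missing scenes), which is the harder problem.

```python
def retime(cues, source_fps, target_fps):
    """Rescale cue times written for a source_fps rip to a target_fps rip.

    Assumption: both rips contain the same frames, so frame N is shown at
    N / source_fps seconds in one and N / target_fps seconds in the other.
    """
    factor = source_fps / target_fps
    return [(start * factor, end * factor, text) for start, end, text in cues]


# Subs timed against a 25 fps rip, to be used with a 24 fps rip.
pal_cues = [(60.0, 62.4, "Hello")]
film_cues = retime(pal_cues, source_fps=25, target_fps=24)
```

With these numbers the factor is 25/24, so a cue that started at 60 seconds now starts at 62.5 seconds. The drift grows over the length of the film, which is why a fixed offset in the player never quite fixes it.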

In other words, while we kinda have all the necessary components, this whole area is a mess and I think it’s high time we did something about it. Here are a few suggestions:

1) Start using a unified file format. Preferably something structured and human-readable, with a metadata header that would, at the very least, say which language the file is in (which neither SUB nor SRT seems to have!) and what frame rate it was timed for. Call it Universal Subtitle Format. Well, what do you know, there actually is one already! The only problem is, their website is down and I couldn’t get to the specs even through Google cache. I would personally leave styling out of the subtitle spec, but I could see how some would need it for highlights or whatever (since this is more of a general-purpose titling format). The default encoding should be Unicode, and a structured format would allow us to extend it later.

In a perfect world, all subs for every language would be nicely distributed inside the media file itself, but I don’t see this happening any time soon.

UPDATE: After taking a closer look at USF (from the info here) I get the feeling this is not what I want. The spec has been in draft v0.16 since 2002, it’s overly complicated (although there’s a mention of a plan to introduce different levels) and verbose (thanks to XML alone). Player support is also vague, to say the least. I tried reading the example USF file with VLC and nothing happened.

The format should be simple and use an industry-standard timecode format. The only problem I see with using YAML is that you would need a library to take advantage of it easily, and those are not yet available for every programming language. It also seems like a bit of overkill for a simple task such as this (simple header + timecode/titles). Frame-based timecode seemed like a good idea at first (very easy to count), but it seems like you might run into sync issues if the media skips: how would the player know to re-sync the titles if the media skipped an unknown number of frames?
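To show how little machinery “simple header + timecode/titles” really needs, here is a rough Python sketch that reads such a file with nothing but the standard library. The layout is purely my assumption for the sake of the example, not an existing spec: a block of `key: value` header lines, a blank line, then cues made of a `HH:MM:SS.mmm --> HH:MM:SS.mmm` line followed by the text. The header keys and the function names are hypothetical too.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d{3})")


def parse_time(stamp):
    """Convert 'HH:MM:SS.mmm' to milliseconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad timecode: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_subtitles(text):
    """Parse the hypothetical 'header + cues' layout into (metadata, cues)."""
    blocks = [block for block in text.strip().split("\n\n") if block.strip()]
    # The first block is the metadata header: one 'key: value' pair per line.
    metadata = {}
    for line in blocks[0].splitlines():
        key, _, value = line.partition(":")
        metadata[key.strip().lower()] = value.strip()
    # Every following block is one cue: a time range line, then the text.
    cues = []
    for block in blocks[1:]:
        lines = block.splitlines()
        start, _, end = lines[0].partition("-->")
        cues.append((parse_time(start), parse_time(end), "\n".join(lines[1:])))
    return metadata, cues


SAMPLE = """\
language: et
framerate: 25
title: Donnie Darko

00:01:02.500 --> 00:01:05.000
First title

00:01:06.000 --> 00:01:08.250
Second title
on two lines
"""

metadata, cues = parse_subtitles(SAMPLE)
```

Because the cues carry wall-clock times rather than frame numbers, a player that skips ahead only has to look up its current position, whatever the number of frames it dropped. The frame rate stays in the header as a hint for tools that need to convert.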

2) Create a universal online database to host the subtitles, preferably something with an XML-RPC backend so that video players and other clients could easily find and download subs for a title. Imagine if Wikipedia and all its translations were scattered around the web on different domains, because that’s exactly the situation we’re in here at the moment. Think of this as CDDB for subtitles. Host the subs under some kind of accessible, globally recognized license. Add in some basic version control.
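For a feel of what a player would actually send to such a database, here is a small sketch using Python’s standard `xmlrpc.client` module. No such service exists, so everything service-specific is invented for illustration: the method name `subtitles.search`, the query fields and the runtime value. The example only builds and decodes the request locally and never touches the network.

```python
import xmlrpc.client


def build_search_request(title, language, runtime_seconds):
    """Build the XML-RPC request body for a hypothetical subtitle lookup."""
    # Hypothetical query fields; a real service would define its own.
    query = {"title": title, "language": language, "runtime": runtime_seconds}
    return xmlrpc.client.dumps((query,), methodname="subtitles.search")


# 6000 seconds is a made-up runtime, used only to fill in the field.
request_xml = build_search_request("Waking Life", "fi", 6000)

# Decode the request again, as a server would on receiving it.
params, method = xmlrpc.client.loads(request_xml)
```

Against a live server the same call would go through `xmlrpc.client.ServerProxy(url).subtitles.search(query)`, with `url` pointing at wherever the database lived. Sending the runtime along is one possible way to tell apart releases of different lengths, but that part is pure guesswork on my side.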

3) Make it as easy as possible to write and improve subtitles. There are a bunch of subtitle editors out there, but none of the ones I’ve seen could be called user-friendly. This situation might actually be better on Windows. Ideally, you could be watching a movie and make corrections on the fly by just pausing the film and editing the text in place, or after watching it, by leaving markers at places you think could be improved. This editor could also be used as a separate subtitle player that would overlay the text over any video player, using the title timecode. I know this should be possible on OS X, but I’m not sure about other platforms. This player should also intelligently adjust for frame rate and have some optional features to time-shift titles. Make this editor work with the aforementioned database so that after you revise a file, you could just hit save and everyone else could take advantage of your work instantly.

This player/editor should also support importing subs from SRT and SUB formats and have a command for “burning” the titles into a media file (for standalone players).
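Here is a rough Python sketch of two of those features together: importing an SRT file and re-syncing it. For the re-sync I’m assuming one possible reading of “intelligently adjust”: the user marks the correct time for two titles (say, the first and the last one), and the editor solves new = scale * old + offset from those two points, which covers both a frame rate mismatch and a plain time shift. All function names are made up, and SUB import is left out, since VobSub files store the titles as images rather than text.

```python
import re

SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_seconds(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    match = SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(text):
    """Return a list of (start, end, text) cues with times in seconds."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        # Line 0 is the running index, line 1 the time range, the rest is text.
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        start, _, end = lines[1].partition("-->")
        cues.append(
            (srt_time_to_seconds(start), srt_time_to_seconds(end), "\n".join(lines[2:]))
        )
    return cues


def fit_sync(old_a, new_a, old_b, new_b):
    """Solve new = scale * old + offset from two user-marked anchor points."""
    if old_a == old_b:
        raise ValueError("the two anchor points must be at different times")
    scale = (new_b - new_a) / (old_b - old_a)
    offset = new_a - scale * old_a
    return scale, offset


def resync(cues, scale, offset):
    """Apply the linear correction, never letting a time drop below zero."""
    return [
        (max(0.0, start * scale + offset), max(0.0, end * scale + offset), text)
        for start, end, text in cues
    ]


SAMPLE_SRT = """\
1
00:00:10,000 --> 00:00:12,000
First line

2
00:01:50,000 --> 00:01:53,500
Last line
"""

cues = parse_srt(SAMPLE_SRT)
# The user says: the first title belongs at 12 s, the last one at 116 s.
scale, offset = fit_sync(old_a=10.0, new_a=12.0, old_b=110.0, new_b=116.0)
fixed = resync(cues, scale, offset)
```

Two anchor points are enough as long as the releases differ only in speed and start time. A release with added or removed scenes would need more markers and a piecewise correction, which this sketch doesn’t attempt.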