Blog Pseudoaccidentale

2009-06-05

query-replace-regexp or “making movies enjoyable”

One of the movies I recently watched came with subtitles in a separate *.srt file. The subtitles had 99% correct timing information, a pretty amazing feat. They had a minor glitch though. Many instances of lowercase ‘el’ had been replaced with uppercase ‘i’.

I noticed this tiny glitch in the first few lines of text, and then discovered that it was a buglet that occurred far too often. My OCD side started feeling bad about the movie, because all the bogus capital ‘i’ letters were distracting me from the “real” fun of watching the actual movie.

So I stopped watching, and I fired up GNU Emacs on the srt file.

After 1-2 minutes of work and a lot of fun with query-replace-regexp, I found a nice replacement pattern to interactively fix all the broken ‘i’ instances:

M-x query-replace-regexp RET
    \([^I ]*\)\(I+\)\([^I ]*\) RET
    \1\,(replace-regexp-in-string "i" "l" (downcase \2))\3 RET

This had quite a few false positives, but it did 95% of the work, so I manually fixed the 5-10 instances it didn’t catch, and I was finally able to enjoy the movie.

Note: The perceptive reader who is also a fan of regular expressions will probably notice very quickly that part of the third line is Lisp code. This is an amazing feature of regexp replacement in Emacs. When the special sequence \, appears in the replacement text, Emacs evaluates the expression that follows it as Emacs Lisp. An arbitrarily complex Lisp expression can be used after \, and its return value is used as the replacement text. Inside that expression, \2 stands for the text matched by the second group, as a string.
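
For readers who want to experiment outside the interactive prompt, here is a minimal sketch of the same substitution as a plain function on strings. The name my/fix-capital-i is hypothetical and not part of Emacs. The sketch matches only the runs of capital ‘I’ (the second group of the pattern above), since the first and third groups are put back unchanged anyway. It assumes, like the interactive command, that every capital ‘I’ is bogus, so it produces the same false positives.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or the original post.
(defun my/fix-capital-i (text)
  "Return TEXT with every run of capital I replaced by lowercase l.
Legitimate capitals (for example the pronoun \"I\") are rewritten too."
  (let ((case-fold-search nil))         ; match uppercase I only
    (replace-regexp-in-string
     "I+"
     (lambda (match)
       ;; The same expression that follows \, in the replacement text,
       ;; with \2 replaced by the MATCH argument.
       (replace-regexp-in-string "i" "l" (downcase match)))
     text
     t      ; FIXEDCASE: keep the replacement lowercase as computed
     t)))   ; LITERAL: no backslash processing in the replacement

(my/fix-capital-i "HeIIo, worId")       ; => "Hello, world"
```

The FIXEDCASE argument matters here: without it, Emacs would try to mirror the case of the matched text and turn the replacement back into capitals. Because nothing asks for confirmation, a line such as "I think" comes out as "l think", which is exactly why the interactive query-replace-regexp was the better tool for the subtitle file.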

I’m positively thrilled that Emacs saved the day… again :-)
