Tag Archives: FreeBSD

Powerful Regular Expressions Combined with Lisp in Emacs

Regular expressions are a powerful text transformation tool. Any UNIX geek will tell you that. It’s so deeply ingrained into our culture, that we even make jokes about it. Another thing that we also love is having a powerful extension language at hand, and Lisp is one of the most powerful extension languages around (and of course, we make jokes about that too).

Emacs, one of the most famous Lisp applications today, has for a while now the ability to combine both of these, to reach entirely new levels of usefulness. Combining regular expressions and Lisp can do really magical things.

An example that I recently used a few times is parsing & de-humanizing numbers in dstat output. The output of dstat includes numbers that are printed with a suffix, like ‘B’ for bytes, ‘k’ for kilobytes and ‘M’ for megabytes, e.g.:

----system---- ----total-cpu-usage---- --net/eth0- -dsk/total- sda-
     time     |usr sys idl wai hiq siq| recv  send| read  writ|util
16-05 08:36:15|  2   3  96   0   0   0|  66B  178B|   0     0 |   0
16-05 08:36:16| 42  14  37   0   0   7|  92M 1268k|   0     0 |   0
16-05 08:36:17| 45  11  36   0   0   7|  76M 1135k|   0     0 |   0
16-05 08:36:18| 27  55   8   0   0  11|  67M  754k|   0    99M|79.6
16-05 08:36:19| 29  41  16   5   0  10| 113M 2079k|4096B   63M|59.6
16-05 08:36:20| 28  48  12   4   0   8|  58M  397k|   0    95M|76.0
16-05 08:36:21| 38  37  14   1   0  10| 114M 2620k|4096B   52M|23.2
16-05 08:36:22| 37  54   0   1   0   8|  76M 1506k|8192B   76M|33.6

So if you want to graph one of the columns, it’s useful to convert all the numbers in the same unit. Bytes would be nice in this case.

Separating all columns with ‘|’ characters is a good start, so you can use e.g. a CSV-capable graphing tool, or even simple awk scripts to extract a specific column. ‘C-x r t’ can do that in Emacs, and you end up with something like this:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66B| 178B|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1268k|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1135k|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 754k|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2079k|4096B|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 397k|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2620k|4096B|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1506k|8192B|  76M|33.6|

The leading and trailing ‘|’ characters are there so we can later use orgtbl-mode, an awesome table editing and realignment tool of Emacs. Now to the really magical step: regular expressions and lisp working together.

What we would like to do is convert text like “408B” to just “408”, text like “1268k” to the value of (1268 * 1024), and finally text like “67M” to the value of (67 * 1024 * 1024). The first part is easy:

M-x replace-regexp RET \([0-9]+\)B RET \1 RET

This should just strip the “B” suffix from byte values.

For the kilobyte and megabyte values what we would like is to be able to evaluate an arithmetic expression that involves \1. Something like “replace \1 with the value of (expression \1)“. This is possible in Emacs by prefixing the substitution pattern with \,. This instructs Emacs to evaluate the rest of the substitution pattern as a Lisp expression, and use its string representation as the “real” substitution text.

So if we match all numeric values that are suffixed by ‘k’, we can use (string-to-number \1) to convert the matching digits to an integer, multiply by 1024 and insert the resulting value by using the following substitution pattern:

\,(* 1024 (string-to-number \1))

The full Emacs command would then become:

M-x replace-regexp RET \([0-9]+\)k RET \,(* 1024 (string-to-number \1)) RET

This, and the byte suffix removal, yield now the following text in our Emacs buffer:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  92M|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  76M|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  67M| 772096|   0 |  99M|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 113M|2128896|4096|  63M|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  58M| 406528|   0 |  95M|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 114M|2682880|4096|  52M|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  76M|1542144|8192|  76M|33.6|

Note: Some of the columns are indeed not aligned very well. We’ll fix that later. On to the megabyte conversion:

M-x replace-regexp RET \([0-9]+\)M RET \,(* 1024 1024 (string-to-number \1)) RET

Which produces a version that has no suffixes at all:

|     time     |cpu|cpu|cpu|cpu|cpu|cpu|eth0 |eth0 | disk| disk|sda-|
|     time     |usr|sys|idl|wai|hiq|siq| recv| send| read| writ|util|
|16-05 08:36:15|  2|  3| 96|  0|  0|  0|  66| 178|   0 |   0 |   0|
|16-05 08:36:16| 42| 14| 37|  0|  0|  7|  96468992|1298432|   0 |   0 |   0|
|16-05 08:36:17| 45| 11| 36|  0|  0|  7|  79691776|1162240|   0 |   0 |   0|
|16-05 08:36:18| 27| 55|  8|  0|  0| 11|  70254592| 772096|   0 |  103809024|79.6|
|16-05 08:36:19| 29| 41| 16|  5|  0| 10| 118489088|2128896|4096|  66060288|59.6|
|16-05 08:36:20| 28| 48| 12|  4|  0|  8|  60817408| 406528|   0 |  99614720|76.0|
|16-05 08:36:21| 38| 37| 14|  1|  0| 10| 119537664|2682880|4096|  54525952|23.2|
|16-05 08:36:22| 37| 54|  0|  1|  0|  8|  79691776|1542144|8192|  79691776|33.6|

Finally, to align everything in neat, pipe-separated columns, we enable M-x orgtbl-mode, and type “C-c C-c” with the pointer somewhere inside the transformed dstat output. The buffer now becomes something usable for pretty-much any graphing tool out there:

| time           | cpu | cpu | cpu | cpu | cpu | cpu |      eth0 |    eth0 |  disk |      disk | sda- |
| time           | usr | sys | idl | wai | hiq | siq |      recv |    send |  read |      writ | util |
| 16-05 08:36:15 |   2 |   3 |  96 |   0 |   0 |   0 |        66 |     178 |     0 |         0 |    0 |
| 16-05 08:36:16 |  42 |  14 |  37 |   0 |   0 |   7 |  96468992 | 1298432 |     0 |         0 |    0 |
| 16-05 08:36:17 |  45 |  11 |  36 |   0 |   0 |   7 |  79691776 | 1162240 |     0 |         0 |    0 |
| 16-05 08:36:18 |  27 |  55 |   8 |   0 |   0 |  11 |  70254592 |  772096 |     0 | 103809024 | 79.6 |
| 16-05 08:36:19 |  29 |  41 |  16 |   5 |   0 |  10 | 118489088 | 2128896 |  4096 |  66060288 | 59.6 |
| 16-05 08:36:20 |  28 |  48 |  12 |   4 |   0 |   8 |  60817408 |  406528 |     0 |  99614720 | 76.0 |
| 16-05 08:36:21 |  38 |  37 |  14 |   1 |   0 |  10 | 119537664 | 2682880 |  4096 |  54525952 | 23.2 |
| 16-05 08:36:22 |  37 |  54 |   0 |   1 |   0 |   8 |  79691776 | 1542144 |  8192 |  79691776 | 33.6 |

The trick of combining arbitrary Lisp expressions with regexp substitution patterns like \1, \2\9 is something I have found immensely useful in Emacs. Now that you know how it works, I hope you can find even more amusing use-cases for it.

Update: The Emacs manual has a few more useful examples of \, in action, as pointed out by tunixman on Twitter.

Fixing Shifted-Arrow Keys in 256-Color Terminals on Linux

The terminfo entry for “xterm-256color” that ships by default as part of ncurses-base on Debian Linux and its derivatives is a bit annoying. In particular, shifted up-arrow key presses work fine in some programs, but fail in others. It’s a bit of a gamble if Shift-Up works in joe, pico, vim, emacs, mutt, slrn, or what have you.

THis afternoon I got bored enough of losing my selected region in Emacs, because I forgot that I was typing in a terminal launched by a Linux desktop. SO I thought “what the heck… let’s give the FreeBSD termcap entry for xterm-256color a try”:

keramida> scp bsd:/etc/termcap /tmp/termcap-bsd
keramida> captoinfo -e $(                                  \
  echo $( grep '^xterm' termcap | sed -e 's/[:|].*//' ) |  \
  sed -e 's/ /,/g'                                         \
  ) /tmp/termcap  > /tmp/terminfo.src
keramida> tic /tmp/terminfo.src

Restarted my terminal, and quite unsurprisingly, the problem of Shift-Up keys was gone.

The broken xterm-256color terminfo entry from /lib/terminfo/x/xterm-256color is now shadowed by ~/.terminfo/x/xterm-256color, and I can happily keep typing without having to worry about losing mental state because of this annoying little misfeature of Linux terminfo entries.

The official terminfo database sources[1], also work fine. So now I think some extra digging is required to see what ncurses-base ships with. There’s definitely something broken in the terminfo entry of ncurses-base, but it will be nice to know which terminal capabilities the Linux package botched.

Notes:
[1] http://invisible-island.net/ncurses/ncurses.faq.html#which_terminfo

Mutt-like Scrolling for Gnus

Mutt scrolls the index of email folders up or down, one line at a time, with the press of a single key: ‘<‘ or ‘>’. This is a very convenient way to skim through email folder listings, so I wrote a small bit of Emacs Lisp to do the same in Gnus tonight.

;;;
;; Scrolling like mutt for group, summary, and article buffers.
;;
;; Being able to scroll the current buffer view by one line with a
;; single key, rather than having to guess a random number and recenter
;; with `C-u NUM C-l' is _very_ convenient.  Mutt binds scrolling by one
;; line to '<' and '>', and it's something I often miss when working
;; with Gnus buffers.  Thanks to the practically infinite customizability
;; of Gnus, this doesn't have to be an annoyance anymore.

(defun keramida-mutt-like-scrolling ()
  "Set up '<' and '>' keys to scroll down/up one line, like mutt."
  ;; mutt-like scrolling of summary buffers with '<' and '>' keys.
  (local-set-key (kbd ">") 'scroll-up-line)
  (local-set-key (kbd "<") 'scroll-down-line))

(add-hook 'gnus-group-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-summary-mode-hook 'keramida-mutt-like-scrolling)
(add-hook 'gnus-article-prepare-hook 'keramida-mutt-like-scrolling)

This is now the latest addition to my ~/.gnus startup code, and we’re one step closer to making Gnus behave like my favorite old-time mailer.

Concatenating Video Files with MPlayer

I found out how to concatenate the parts of a video to a single file with MPlayer. It’s relatively easy, so this is just a mini-post to save it for posterity:

$ cat video_part1.avi video_part2.avi ... > temp.avi
$ mencoder -forceidx -oac copy -ovc copy \
    temp.avi -o final.avi && \
    rm temp.avi

Unit Testing Uncovers Bugs

As part of the ‘utility’ library in one of the projects we are using at work, I wrote two small wrappers around strtol() and strtoul(). These two functions support a much more useful error reporting mechanism than the plain atoi() and atol() functions, but getting the error checking right in all the places they are called is a bit boring and cumbersome. This is probably part of the reason why there are still programs out there that use atoi() and atol(). Continue reading

Mercurial Clones without a Working Copy

Mercurial repository clones can have two parts:

  1. An .hg/ subdirectory, where all the repository metadata is stored
  2. A “working copy” area, where checked out files may live

The .hg/ subdirectory stores the repository metadata of the specific clone, including the history of all changesets stored in the specific clone, clone-specific hooks and scripts, information about local tags and bookmarks, and so on. This is the only part of a Mercurial repository that is actually mandatory for a functional repository. Continue reading

FOSDEM 2010

Earlier tonight, on December 7 2009, a friend and me booked our flight tickets for FOSDEM 2010. I am really excited that I am going to attend another open source & free software conference. It has been a while since I had a chance to meet with other BSD people. The last time was in Milan, in EuroBSDCon 2006. It will certainly be tons of fun to meet in person with other free and open source fans, contributors and developers!

About FOSDEM

FOSDEM is an open conference, organized every year by volunteers to promote the widespread use of Free Software and Open Source Software. It takes place in the beautiful city of Brussels (Belgium). FOSDEM meetings are recognized as “The best Free Software and Open Source events in Europe”. Continue reading

Using the Condensed DejaVu Font-Family Variant by Default

The DejaVu font family is a very popular font collection for Linux and BSD systems. The font package of DejaVu includes a condensed variant; a variation of the same basic font theme that sports narrower characters. Continue reading

fts(3) or Avoiding to Reinvent the Wheel

One of the C programs I was working on this weekend had to find all files that satisfy a certain predicate and add them to a list of “pending work”. The first thing that comes to mind is probably a custom opendir(), readdir(), closedir() hack. This is probably ok when one only has these structures, but it also a bit cumbersome. There are various sorts of DIR and dirent structures and recursing down a large path requires manually keeping track of a lot of state. Continue reading