Michael McClimon


Goofy Program Files: fold-headers

August 2, 2023

My friend Chasen sent me a message tonight that he would also like to see more of the goofy program files, and who am I to say no?

Today’s entry comes via my last job, where I spent an awful lot of time looking at email. You probably know what email is, but maybe you don’t know what it actually looks like underneath.1 The part of an email you actually read is the content; email also has a large set of headers that contain metadata about the message and its path to you; they’re very similar to HTTP headers.

Your mail client will definitely show you some of these headers: From, To, Subject, and Date are all headers. (CC is also a header, but BCC is not a header, even though it looks like one; email is weird.) There’s a lot more your mail client probably hides from you, because most people don’t ever need to care about them. They contain things like authentication information (DKIM-Signature, Authentication-Results headers), the path it took to get to your inbox (Received headers), information about the structure of the message (Content-Type, Content-Transfer-Encoding), and lots of other stuff (proprietary anti-spam headers, unsubscribe headers, maybe an Autocrypt header if you get emails from people with dubious opinions about encrypted email).

In my last job, we very often needed to see the raw headers from users who were having problems for whatever reason; because there’s so much information there, the full headers are a really useful debugging tool, if you know how to read them. The problem is that if you just paste the full headers into a text field, you lose all the formatting, making them roughly impossible to read.

This means we’d regularly get headers that look like this:2

Delivered-To: me@example.com
Received: by 2002:a05:7108:628f:b0:31f:8a9e:3cf3 with SMTP id l15csp768970gdq;
Wed, 2 Aug 2023 01:28:27 -0700 (PDT)
X-Google-Smtp-Source: APBJJlFPo9WVsfCEDg9JSwp22b4N2KkT9xpvL/7Mw/UGn13rGy3ofTkqkL5Sr8KgYMFerlwfEeCU
X-Received: by 2002:a17:902:d303:b0:1b8:95a1:847c with SMTP id b3-20020a170902d30300b001b895a1847cmr15716429plc.40.1690964907040;
Wed, 02 Aug 2023 01:28:27 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1690964907; cv=none;
d=google.com; s=arc-20160816;
b=aL63zEpTMIZI4PKsrkozrgwJVCJVGTYxi06NxiU1m9K6TSIa1xFXfIeoNfY046wef7
R53P1ESzBjMMTxoMgrwFfeTMFdPY6yihg20cguHb5EpL7NTTTT/UGI3ykIMf6+7ssF6B
pKiw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
h=feedback-id:list-unsubscribe:message-id:date:subject:cc:to:from
:mime-version:dkim-signature:dkim-signature;
bh=+pmocgT3Uz69wC0izq4zWpavLD9Wlxr9s4p4RaMhhRI=;
fh=D9GT/t9o2pWT+j0wI/al/jbbeAiaiysa/vCNUoNlIuA=;
eBGA==
ARC-Authentication-Results: i=1; mx.google.com;
dkim=pass header.i=@dropbox.com header.s=hj554x4v55d33qgv7p7y4b5spdsgt6wl header.b=jVY63uHt;
dkim=pass header.i=@amazonses.com header.s=hsbnp7p3ensaochzwyq5wwmceodymuwv header.b=ixBC+R6W;
spf=pass smtp.mailfrom=01010189bc15-46de-a633-f1c7fd3231fa-000000@email.dropbox.com;
dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=dropbox.com
Received: from a60-122.smtp-out.us-west-2.amazonses.com (a60-122.smtp-out.us-west-2.amazonses.com. [54.240.60.122])
by mx.google.com with ESMTPS id h2-20020a170902748200b001b9eb349550si10567839pll.391.2023.08.02.01.28.26
for <me@example.com>
(version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
Wed, 02 Aug 2023 01:28:27 -0700 (PDT)

This is, maybe obviously, roughly impossible to read. The way headers are supposed to be formatted is the header name, followed by a colon, followed by the value. If the value spans multiple lines, all subsequent lines must be indented. This makes it much easier to scan for the thing you’re actually looking for.

But this is all fine; I know how to program! Enter today’s goofy program, fold-headers:3

I wrote this program originally in Perl, then ported it to Go, and then ported it again to Python. (This sounds more exciting than it was; each port took 5 minutes, probably.) Like all goofy programs, it is very straightforward.

We start by assuming we’re in the headers (they come first in the raw email). There’s a regex for a header name, which is just: a series of letters or numbers or hyphens, followed by a colon, followed by a space.

Then, we loop through the lines. If we see a blank line, we’re no longer in the headers and we just print the line as is. If we see a line that looks like the start of a header, we print it; otherwise, we print 4 spaces and then the line.
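The steps just described can be sketched roughly like this in Python. This is a reconstruction from the prose, not the original program: the names fold_headers, HEADER_START, and main are made up for illustration, and the regex is my reading of “letters or numbers or hyphens, followed by a colon, followed by a space.”

```python
import re
import sys

# A header name: letters, digits, or hyphens, then a colon and a space.
HEADER_START = re.compile(r"^[A-Za-z0-9-]+: ")


def fold_headers(lines):
    """Yield each line, indenting header continuation lines by four spaces."""
    in_headers = True
    for line in lines:
        # The first blank line ends the header block.
        if in_headers and line.strip() == "":
            in_headers = False
        if not in_headers or HEADER_START.match(line):
            yield line
        else:
            yield "    " + line


def main():
    # Read the mangled message on stdin, write the folded version to stdout.
    for line in fold_headers(sys.stdin):
        sys.stdout.write(line)
```

A real script would finish by calling main(), so that something like pbpaste piped into it prints the folded headers.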

This program turns the garbage above into this:

Delivered-To: me@example.com
Received: by 2002:a05:7108:628f:b0:31f:8a9e:3cf3 with SMTP id l15csp768970gdq;
    Wed, 2 Aug 2023 01:28:27 -0700 (PDT)
X-Google-Smtp-Source: APBJJlFPo9WVsfCEDg9JSwp22b4N2KkT9xpvL/7Mw/UGn13rGy3ofTkqkL5Sr8KgYMFerlwfEeCU
X-Received: by 2002:a17:902:d303:b0:1b8:95a1:847c with SMTP id b3-20020a170902d30300b001b895a1847cmr15716429plc.40.1690964907040;
    Wed, 02 Aug 2023 01:28:27 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1690964907; cv=none;
    d=google.com; s=arc-20160816;
    b=aL63zEpTMIZI4PKsrkozrgwJVCJVGTYxi06NxiU1m9K6TSIa1xFXfIeoNfY046wef7
    R53P1ESzBjMMTxoMgrwFfeTMFdPY6yihg20cguHb5EpL7NTTTT/UGI3ykIMf6+7ssF6B
    pKiw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
    h=feedback-id:list-unsubscribe:message-id:date:subject:cc:to:from
    :mime-version:dkim-signature:dkim-signature;
    bh=+pmocgT3Uz69wC0izq4zWpavLD9Wlxr9s4p4RaMhhRI=;
    fh=D9GT/t9o2pWT+j0wI/al/jbbeAiaiysa/vCNUoNlIuA=;
    eBGA==
ARC-Authentication-Results: i=1; mx.google.com;
    dkim=pass header.i=@dropbox.com header.s=hj554x4v55d33qgv7p7y4b5spdsgt6wl header.b=jVY63uHt;
    dkim=pass header.i=@amazonses.com header.s=hsbnp7p3ensaochzwyq5wwmceodymuwv header.b=ixBC+R6W;
    spf=pass smtp.mailfrom=01010189b55ccf28-36d1bd9a-dc15-46de-a633-f1c7fd3231fa-000000@email.dropbox.com;
    dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=dropbox.com
Received: from a60-122.smtp-out.us-west-2.amazonses.com (a60-122.smtp-out.us-west-2.amazonses.com. [54.240.60.122])
    by mx.google.com with ESMTPS id h2-20020a170902748200b001b9eb349550si10567839pll.391.2023.08.02.01.28.26
    for <me@example.com>
    (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
    Wed, 02 Aug 2023 01:28:27 -0700 (PDT)

…which is actually readable!

What a good program would do is parse the message using the email library in whatever language you’re using and print out just the headers. This isn’t a good program, though, it’s a goofy program, and so it does this instead, which was good enough for 100% of the cases I needed it for.4
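For comparison, here is a minimal sketch of what the “good program” approach might look like with Python’s standard email package. The function name header_lines is hypothetical, and this assumes the input is a well-formed message; on the mangled input shown earlier, a real parser may well stop reading headers early.

```python
from email import message_from_string, policy


def header_lines(raw: str) -> list[str]:
    """Parse a raw message and return one 'Name: value' string per header."""
    msg = message_from_string(raw, policy=policy.default)
    return [f"{name}: {value}" for name, value in msg.items()]


raw = "Subject: hi\nTo: me@example.com\n\nbody text\n"
lines = header_lines(raw)
```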


  1. To be perfectly honest, if you wound up on this page, you probably know me in real life, in which case the odds are pretty high you do know what it looks like underneath. That doesn’t make for an interesting read though, so instead you can just continue to marvel at the fact that I manage to overuse footnotes in every one of these posts. ↩︎

  2. These are headers from an email I got telling me I hadn’t used Dropbox in a while and wouldn’t I please start again. I have mangled them sufficiently for space and privacy, so although you could try some funny business, it’s unlikely to turn up anything useful. ↩︎

  3. I said in the inaugural post that I really think of these programs as “stupid programs” rather than “goofy programs”: for full disclosure, this program actually exists on my machine as ~/bin/shitty-fold-headers. ↩︎

  4. The reason this isn’t a good program is also practical: in most of the cases I wanted to use this program, the email would be mangled in some other way as well and the email parser probably couldn’t parse it to begin with, and a program that exited every time saying “uh, this email is garbage and I can’t parse it” would be totally useless. ↩︎

Goofy Program Files: jlink

July 14, 2023

After my last (aka, the inaugural) goofy program entry, I was overwhelmed with feedback1 that people wanted more, and who am I to deny them?

Today’s entry is a program I call jlink (for “Jira link”).2 We use Jira at work for task tracking, and the ticket identifiers (like “ABC-1234”) wind up littering everything we do. One day at work I was writing a document that had a long listing of Jira tickets, and I wanted to add links to them. I was doing that by: a) opening the Jira page for the ticket; b) ⌘L to select the address bar; c) ⌘C to copy; d) changing back to the tab I was writing in; e) ⌘K to open the add-link dialog; f) ⌘V to paste.

After doing this a bunch of times, I thought to myself “there’s got to be a better way!” (Incidentally, this runs through my head every time in David Cross’s voice, from his timeless bit about electric scissors.) Then, I spent 10 minutes (probably; maybe less) writing this program:

This program is very simple (recall that I really think of these as “stupid programs”). There is some basic argument parsing, and then it takes the first command-line argument you give it, turns it into uppercase, makes an appropriate URL, and causes it to wind up on my clipboard.3
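A sketch of a program along those lines, assuming macOS (for pbcopy) and a made-up Jira base URL; the real one isn’t shown here. The names JIRA_BASE, ticket_url, copy_to_clipboard, and main are all illustrative.

```python
import argparse
import subprocess

# Hypothetical base URL; substitute whatever your Jira instance uses.
JIRA_BASE = "https://jira.example.com/browse"


def ticket_url(ticket: str) -> str:
    """Upper-case the ticket identifier and build a link to it."""
    return f"{JIRA_BASE}/{ticket.upper()}"


def copy_to_clipboard(text: str) -> None:
    """Pipe text to pbcopy (macOS only); check=True raises if pbcopy fails."""
    subprocess.run(["pbcopy"], input=text, text=True, check=True)


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(
        description="Copy a link to a Jira ticket to the clipboard."
    )
    parser.add_argument("ticket", help="ticket identifier, e.g. abc-1234")
    args = parser.parse_args(argv)
    copy_to_clipboard(ticket_url(args.ticket))
```

A real script would end with a call to main(); there is deliberately no validation, so a bogus argument yields a bogus link.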

Here are some things I like about this program:

  1. Python’s default option parser is nice. It’s not my favorite option parser in the world (that’s a tossup between clap and Getopt::Long::Descriptive, depending on what kind of mood I’m in), but it’s easy enough to write, provides --help without needing to do anything, and handles dealing with positional parameters in ARGV just fine.4

  2. Python f-strings are nice; one of the reasons I didn’t like Python many years ago is that interpolating variables in strings was super weird. As a Perl programmer I realize I have no basis for complaining about weird syntax, but "Hello %s" % "world" still feels so strange to me. "Hello {}".format("world") is better I guess, but f-strings are actually good and intuitive!

  3. The subprocess module makes a bunch of things easy that might otherwise be tedious. You can do this correctly in Perl using only the built-in stuff (maybe using open, for crying out loud!), but more often I’d reach for IPC::Run3 or Capture::Tiny, plus Process::Status. The subprocess docs in Python are often confusing to me (actually, I find most of the Python docs to be sort of hard to navigate), but it’s trivial to just run a program and grab its output, having the program fail if something goes wrong.

  4. This program is written for me alone. I don’t need to think about whether this program works on Linux (it doesn’t), because I don’t run Linux. If pbcopy fails and I get an ugly exception, that’s fine; I don’t care. If I run jlink foo it will happily copy a link to the bogus URL ending in /FOO; that’s fine too. If I were writing a non-goofy program, I’d make sure the exceptions were friendly, and do a bunch of error-checking up front to make sure the user wasn’t surprised. As is, the program is for me; if it breaks, I won’t be surprised, because it’s a stupid program!

This last thing is, I think, the best thing about goofy programs. I do care a lot about code quality, and software as craft, and maintainability, and all that other stuff. But at the end of the day, computers are just tools to make humans’ lives a little easier, and sometimes the only human I need to satisfy is myself. There’s a time and a place for everything, after all, and I like the place these goofy programs occupy in my life.


  1. Three people said they’d read another similar post, which is very close to 100% of the readership, as far as I can tell. ↩︎

  2. I can call it jlink because I do not write Java, and thus never have a need to use the Java linker of the same name. ↩︎

  3. Lest anyone question my opsec: a lot of Mongo’s Jira instance is actually public, though the project I work on most often is not. I had a small part in finding the bug in GODRIVER-2773, for example, though the actual issue I filed (GODRIVER-2768, mentioned there) is private because it links to a bunch of private code. ↩︎

  4. And uses double-dashes for long options. Sorry (not sorry) Rob Pike, I will be carrying this to my grave. ↩︎

Goofy Program Files: git-slog

July 4, 2023

Most of the programs I write are what I generally think of as stupid programs. That is: they do one tiny thing that was annoying me for whatever reason, and usually the thing wasn’t even all that interesting to begin with. A lot of the code I write, especially off the clock, falls into this category, and I feel like this is a category of program that doesn’t get talked about much, so I’m going to start talking about them.

“Stupid” isn’t quite the right word, because it implies that the programs are bad, which they’re not. I think maybe “goofy” is better: the New Oxford American dictionary (read: Dictionary.app on macOS) says that goofy can mean “harmlessly eccentric,” which also describes the way I see them. So let’s go with that: the Goofy Program Files.

I’m on record as saying that git is my favorite software. Its interface can be pretty obtuse, but the guts are extremely well designed, you can basically always do what you want to do, and it’s very easy to extend it in the cases where it can’t. During the first part of COVID quarantines, I wrote my own little git implementation in Rust; it’s called pidgit, and it totally works.1 Because of this, I’ve developed a (probably well-deserved) reputation as a git nerd, and am happy to help with any and all git questions or problems.

My friend Rob recently started working full-time on ZFS. This means he often mutters about locks and data structures and pastes totally inscrutable walls of C at me that I don’t really understand, but more relevant to this post, needs to sign off all his commits. Recently he asked me if there was some way of getting git log to show some sort of indicator if a commit has a Signed-off-by trailer or not.2

If you are not aware: when you commit, you can pass the -s (or --signoff) option, which adds a line at the bottom of the commit message (a.k.a., a “trailer”) like this:

Signed-off-by: Your Name <yourname@example.com>

You probably wouldn’t have run across it if you just use git personally or for your job, but it’s pretty common to require it in open-source projects. It generally means that you’ve agreed to a DCO (Developer Certificate of Origin), certifying that you have the right to submit the commit under the project’s license, or similar. Rob only signs off on commits when he’s done with them; stuff still in progress doesn’t get signed off, because it’s not ready to be published yet.

All of this is an extremely long prelude to introducing this goofy program, jeez, let’s just get to it. Here it is:

It’s in Perl, because it was contract work, and because I know Rob has a working Perl install and I do not know (and didn’t bother asking) if he has a working Python install (given that I’ve been writing a lot of Python recently). It’s very simple (one might also say “stupid”): it opens a pipe to git log with a funny format, then loops through the lines it prints, munging them ever so slightly as it goes.

The only interesting part is this somewhat inscrutable line:

--pretty=tformat:%(trailers:key=Signed-off-by,keyonly=yes,separator=+) %C(auto)%h%d %s

Breaking it down a bit, from the git log docs:

  • --pretty= just introduces the format string.
  • tformat: provides “terminator semantics”, which means that the last line is formatted like all the others. (I usually use this in programs, so that you don’t have to special-case anything.)
  • The %(trailers...) bit is the important part:
    • key=Signed-off-by prints the Signed-off-by trailer
    • keyonly=yes prints only the key, not the value (the name/email)
    • separator=+ means that if a commit has multiple Signed-off-by trailers, they’ll be joined together with a plus sign. I needed this because the default is a newline, and I wanted everything to be on a single line.
  • %C(auto) turns on git’s normal coloring for the rest of the line
  • %h%d %s is the short hash, the decoration (branch name, basically), and short subject for the commit. This is the same thing you get from git log --oneline.

The body of the while loop just reads each of these lines, checks for the signoff magic (the %(trailers) format is empty if there isn’t a signoff), and turns it into a single character. I used a check mark and upside down question mark, though I assume Rob will change them to something less exciting when he gets the program.

You can dump this program in your PATH as (say) git-slog, and then if you just run git slog it will Just Work, as if it were built into git itself.

Screenshot of the output of git slog: it's four lines of one-line git log output; three lines have check marks in front, and one has an upside down question mark

I will mention one other thing I usually put in my git-log wrapper programs, which is: including @ARGV in the pipe invocation. This makes this program way more useful, because it will just accept all the arguments to git log. You can call git slog --since '3 days ago' --author michael main.. and it will transparently pass those along to git log and do what you want.
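For readers who’d rather see the shape of it in Python, here is a rough sketch of the same idea: run git log with the format string above, pass any extra arguments straight through, and swap the trailer field for a one-character marker. The names PRETTY, mark_line, and main are mine, and the marker characters are my guess at “check mark” and “upside down question mark.”

```python
import subprocess
import sys

# The format string discussed above, split across lines for readability.
PRETTY = (
    "--pretty=tformat:"
    "%(trailers:key=Signed-off-by,keyonly=yes,separator=+) "
    "%C(auto)%h%d %s"
)

SIGNED = "\u2713"    # check mark
UNSIGNED = "\u00bf"  # inverted question mark


def mark_line(line: str) -> str:
    """Replace the leading trailer field (empty if unsigned) with a marker."""
    trailers, _, rest = line.partition(" ")
    return f"{SIGNED if trailers else UNSIGNED} {rest}"


def main(argv=None) -> int:
    # Everything on the command line is handed to git log untouched.
    extra = sys.argv[1:] if argv is None else argv
    cmd = ["git", "log", PRETTY, *extra]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            print(mark_line(line.rstrip("\n")))
    return proc.returncode
```

A real script would end with something like sys.exit(main()) and live in your PATH as git-slog.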

It’s a goofy program, and it took me way longer to write this goofy post about it than it took me to write the program. But at least it’s harmlessly eccentric.


  1. I used James Coglan’s excellent Building Git book for this; his examples are in Ruby, but I was following along in Rust. I strongly recommend it, if you’re interested in that sort of thing! ↩︎

  2. Rob would disagree with “he asked me,” to be fair. It was more like he mentioned that such a thing would be nice in a place where he knew I would see it, but that’s basically equivalent to asking me to write it for him. ↩︎

I've been writing a bunch of Python lately

June 24, 2023

In February, I left Fastmail to join the team at MongoDB.1 Among other things, this means that after nearly six years, I’m no longer paid to write Perl. This is bittersweet: Perl was the first programming language I ever loved; it’s responsible for my leap from academia and accompanying move to Philadelphia, and indirectly responsible for some dear friendships. As a language, though, I think it has no future, and as such I’ve been trying to write my own little programs in other languages.

My job at Mongo is primarily writing Go (though there’s little bits of Python and JavaScript here and there). For a while, I was rewriting my little Perl tools in Go (you can even look at them on GitHub if you want), but mostly as a learning exercise to get more fluent in the language I was to be writing full-time at work. More recently, I’ve stopped doing that, because my feelings toward Go are… let’s say complicated. It’s a fine language; there are a lot of things to like about it, but I do not find writing it particularly fun, which is a nice-to-have when writing code off-the-clock.

What I’ve been doing instead (as you could probably guess from the title of this post) is writing a lot of Python. I fully realize that “Area man switches from Perl to Python” is a headline straight from 2009, but hey, I’ve always been a little behind the times.

And besides, Python is like, way nicer than it was last time I looked at it seriously (which would have been around 2012, if memory serves, in the midst of the pretty rocky 2-to-3 transition).2 Python has steadily gotten better over the last decade or so, and is really nice to use.

Some things I really like about writing Python:

  • It has a type system now! I think Python’s approach to adding types to the language is really pragmatic. It’s all gradually typed, but the interpreter itself doesn’t read the types, and instead leaves that to third-party tools. If you don’t want to use the type system, you don’t have to, but when you do, it just works. This is way nicer than my experience with (say) TypeScript, because there’s no transpilation or anything.
  • The standard library is really good. One of the tedious things about writing personal things in Perl is that anything useful (niche things like JSON or HTTPS, for crying out loud) needs to be downloaded from CPAN, which means you need a working toolchain, and so on. Though PyPI is great, you can get a long way with just the stdlib, which makes it really nice for silly little command-line things I want for personal use.
  • The standard option parser, argparse, parses options like everyone on the planet expects options to be parsed. (Yes, this bullet is just an excuse to complain about Go.) It drives me batty that Go’s standard flag package uses single dashes for long options, which means that my-cool-program -qv does not work the way you expect. This means that I add a third-party flag library (usually pflag) to literally every Go tool I write, because I simply cannot even. I mean honestly.
  • Python programs are basically the same as their Perl equivalents. There are definitely things I miss from Perl (non-weird ternary operators, postfix conditionals, unless, etc.), but it’s easy for me to figure out how to write a thing in Python because it’s usually pretty similar to how I’d do it in Perl.
  • Python has a built-in set type. This is such a minor thing, but I use it all the time, and it’s way nicer than my %set = map {; $_ => 1 } @list.
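A tiny illustration of two of the points above, gradual typing and the built-in set, assuming Python 3.9 or later for the list[str] syntax. The function name dedupe is made up for the example.

```python
def dedupe(words: list[str]) -> set[str]:
    """Collect the distinct words; the annotations are hints, not runtime checks."""
    return set(words)


seen = dedupe(["main", "master", "main"])

# The interpreter ignores the annotations, so a tuple works at runtime,
# even though a third-party checker such as mypy would complain about it.
also_fine = dedupe(("a", "b"))  # type: ignore[arg-type]
```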

I did write a new Perl program once in the last several weeks, because I needed it right now and didn’t want to think. This week I did the release of mongosync v1.4, which meant I needed to push directly to the upstream repository (which I usually do not do), and I didn’t want to push some random guff there.

This, then, is the pre-push git hook I whipped up in 5 minutes:

#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# See man githooks for this format.
my ($remote_name, $url) = @ARGV;
exit 0 unless $remote_name eq 'gitbox';  # my preferred upstream name

my @forbidden = qw(
  main
  master
  release-1.x
);

my $re = join q{|}, map {; quotemeta } @forbidden;

for (<STDIN>) {
  my ($local_ref, $local_sha, $remote_ref, $remote_sha) = split;

  # Anchor the match so that only the exact branch names are forbidden.
  if ($remote_ref =~ m!^refs/heads/($re)\z!) {
    say "Probably you don't want to push to $remote_name/$1.";
    say "If you really do, pass --no-verify.";
    exit 1;
  }
}

The program is very simple: unless I’m trying to push to the upstream, it exits 0 (which tells git to permit the push). If I am trying to push to the upstream, it forbids the push if I’m trying to push to one of the important branches.

Here’s the same program, which I rewrote in Python later. (The program has since gotten a bit more complicated after I accidentally pushed some random tags to the upstream, 🤦🏻‍♂️.)

#!/usr/bin/env python
import sys

remote_name, url = sys.argv[1:3]
if remote_name != 'gitbox':
    sys.exit(0)

forbidden = {
    'main',
    'master',
    'release-1.x',
}

for line in sys.stdin:
    local_ref, local_sha, remote_ref, remote_sha = line.split()
    # Strip the full prefix, including the trailing slash, to get the branch name.
    short_ref = remote_ref.removeprefix("refs/heads/")

    if short_ref in forbidden:
        print(f"Probably you don't want to push to {remote_name}/{short_ref}.")
        print("If you really do, pass --no-verify.")
        sys.exit(1)

Using a set is obviously nicer than building a regex. (I could have done the same in Perl, but again, I wrote the program in 5 minutes, and regex can solve all problems for Perl programmers.) But the program is basically equivalent, and I could have written it just as fast if I didn’t need to look up how to use sys.stdin.

Anyway: Python is pretty good. I’ve been rewriting a bunch of my personal tools in it, and I’ve been liking it. There’s some weird stuff about it (why are there so many build tools?), and if I keep up I’ll inevitably find some stuff to complain about, but for now I’ll just enjoy writing in a language where people don’t say “I didn’t know anybody still used that.”


  1. If I actually wrote in this blog regularly, I’d have written about that, but I don’t, so I didn’t, and am unlikely to do so. I still remember how to overuse footnotes, though. ↩︎

  2. My last big project at Fastmail, incidentally, was deciding what language would replace Perl there in the long term. I wrote a bunch of example software in TypeScript, Rust, Go, and Python, and ended up recommending Python. This was a bit of a surprise to the team (and to me), because I also had a reputation as a big fan of Rust, which was true then and remains true now. ↩︎

Writing Text with Flag Emojis

March 27, 2022

In our Slack at work, we have an #availability channel, where people can post their comings and goings. In the middle of the day, there is often a stack of posts that say simply “lunch” or “back.” Friday, my friend Joe, rather than writing “back”, did this:

Slack screenshot of the word “back” and 🇧🇦🇨🇰

That is: instead of the word “back” (as I posted), Joe posted “🇧🇦🇨🇰”. If you’re not super into vexillology, those are the flags for Bosnia and Herzegovina (left) and the Cook Islands (right). This makes roughly no sense, unless you happen to know that the ISO 3166-1 alpha-2 country codes use BA for Bosnia and Herzegovina and CK for the Cook Islands.1 Later that afternoon, Joe (who normally writes in a more civilized programming language) told me he was working on a secret Perl project, which I guessed more or less immediately.

In other words, this is all Joe’s fault. He decided not to finish his Perl project and instead nerdsniped me into writing the program, and also convinced me I should write this post. That’s because now, you can ask our Slack bot to speak to you in flags:

Slack screenshot of the text 'I am a normal person with normal hobbies,' with many of the letters replaced with flag emojis

A digression into Unicode

To understand this, we first need to understand how the flag emojis work. Most emojis are a single Unicode code point: 🌲 (\N{EVERGREEN TREE}, my favorite emoji) is code point U+1F332, for example. Some other emojis are represented by multiple code points, sometimes glued together with a Zero-Width Joiner (ZWJ). The emoji 👍🏻, for example, is two code points: 👍 (U+1F44D, \N{THUMBS UP SIGN}) followed by 🏻 (U+1F3FB, \N{EMOJI MODIFIER FITZPATRICK TYPE-1-2}); no joiner is needed for a skin-tone modifier.

This is not how the flag emojis work. Instead, they use flag sequences. There are 26 code points, with names like \N{REGIONAL INDICATOR SYMBOL LETTER A} (U+1F1E6). By themselves, these don’t look like much; on macOS, I see them as capital letters surrounded by a box, like this: 🇦. But when you put two of them together, and they form a valid two-letter country code, you get a flag emoji! That is, if you put 🇦 (regional indicator A) right next to 🇺 (regional indicator U), you get 🇦🇺, the flag for Australia.
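You can poke at this from Python with nothing but the standard library. This is only a small illustration; the variable names are mine.

```python
import unicodedata

REGIONAL_INDICATOR_A = "\U0001F1E6"
REGIONAL_INDICATOR_U = "\U0001F1FA"

# Two regional indicators side by side; fonts that know about flag
# sequences render this pair as the Australian flag.
australia = REGIONAL_INDICATOR_A + REGIONAL_INDICATOR_U

# Under the hood it is still just two code points with ordinary names.
names = [unicodedata.name(ch) for ch in australia]
```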

Back to the Slack bot

Now that we know how the flag emojis are made, it’s approaching trivial to write a program to do the transliteration for us. For any given string, we just need to check every adjacent pair of letters to see if it’s a valid country code. You can read the whole commit if you want, but the core of it is very straightforward:

sub to_flags ($s) {
  require Locale::Codes;

  my %char_for = qw(
    a 🇦   b 🇧   c 🇨   d 🇩   e 🇪   f 🇫   g 🇬   h 🇭   i 🇮
    j 🇯   k 🇰   l 🇱   m 🇲   n 🇳   o 🇴   p 🇵   q 🇶   r 🇷
    s 🇸   t 🇹   u 🇺   v 🇻   w 🇼   x 🇽   y 🇾   z 🇿
  );

  my $lc = Locale::Codes->new('country');
  my %is_country = map {; $_ => 1 } $lc->all_codes('alpha-2');

  my $out = '';

  for (my $i = 0; $i < (length $s) - 1; $i++) {
    my $digraph = lc substr $s, $i, 2;

    if ($is_country{$digraph}) {
      $out .= $char_for{$_} for split //, $digraph;
      $i++; # no double-counting
    } else {
      $out .= substr $s, $i, 1;
    }

    # make sure we don't drop the last char if we need to
    $out .= substr $s, -1, 1 if $i == (length $s) - 2;
  }

  return $out;
};

This is Perl, but doing it in another language is equally trivial. First, we make a map of ASCII letters to regional indicators2, and load up a list of valid countries (I used Locale::Codes here just to avoid having to write out the list myself). Then, for every pair of letters in the source string $s, we check if it’s a valid country. If it is, we add the two relevant regional indicators (i.e., the flag emoji) to the output string, and if not we add the character itself. (Aside: this would be much less tedious in a language with subscriptable strings.)
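As a rough idea of what that looks like in a language with subscriptable strings, here is a Python sketch of the same loop. It is not the bot’s actual code: SAMPLE_CODES is a tiny illustrative subset rather than the full ISO 3166-1 list that Locale::Codes provides, and the regional indicators are computed from the ASCII letters instead of being written out in a table.

```python
REGIONAL_INDICATOR_A = 0x1F1E6

# Illustrative subset only; a real version would load every alpha-2 code.
SAMPLE_CODES = frozenset({"th", "at", "am", "re", "ba", "ck"})


def regional_indicator(letter: str) -> str:
    """Map an ASCII letter to its regional indicator symbol."""
    return chr(REGIONAL_INDICATOR_A + ord(letter.lower()) - ord("a"))


def to_flags(s: str, codes=SAMPLE_CODES) -> str:
    """Replace each adjacent letter pair that is a country code with its flag."""
    out = []
    i = 0
    while i < len(s):
        digraph = s[i:i + 2].lower()
        if len(digraph) == 2 and digraph in codes:
            out.append("".join(regional_indicator(c) for c in digraph))
            i += 2  # no double-counting
        else:
            out.append(s[i])
            i += 1
    return "".join(out)
```

With those sample codes, to_flags("back") gives the BA and CK flags, and a string with no matching pairs comes back unchanged.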

This means we can translate any arbitrary string to be full of flags! The string “that” comes out to “🇹🇭🇦🇹” (Thailand + Austria), “that’s amore” to “🇹🇭🇦🇹’s 🇦🇲o🇷🇪” (plus Armenia and Réunion), and “support” to, well, “support.”

Trivia

This line of programming led to some obvious questions, for which I have some answers. (I’m just using the word list that ships with macOS at /usr/share/dict/words for this.)

  • The longest reasonable English words that can be written entirely with flag emojis are “inconclusiveness”, “nonimpressionist”, and “sacrilegiousness”. That’s 🇮🇳🇨🇴🇳🇨🇱🇺🇸🇮🇻🇪🇳🇪🇸🇸 (India + Colombia + New Caledonia + Luxembourg + Slovenia + Venezuela + Niger + South Sudan), 🇳🇴🇳🇮🇲🇵🇷🇪🇸🇸🇮🇴🇳🇮🇸🇹 (Norway + Nicaragua + Northern Mariana Islands + Réunion + South Sudan + British Indian Ocean Territory + Nicaragua again + Sao Tome and Principe), and 🇸🇦🇨🇷🇮🇱🇪🇬🇮🇴🇺🇸🇳🇪🇸🇸 (Saudi Arabia + Costa Rica + Israel + Egypt + British Indian Ocean Territory + United States + Niger + South Sudan).
  • If you count unreasonable English words, “gastropancreatitis” wins. That’s 🇬🇦🇸🇹🇷🇴🇵🇦🇳🇨🇷🇪🇦🇹🇮🇹🇮🇸: Gabon + Sao Tome and Principe + Romania + Panama + New Caledonia + Réunion + Austria + Italy + Iceland.
  • The longest English words that have no valid flag digraphs are “equipollent,” “unsupported,” and “unturbulent.”
  • One letter shorter you get many more interesting flagless words, including “kookaburra,” “ponticello,” “surfactant,” “antelopian,” and “workfellow.”
  • The longest country name that can be spelled entirely with flags is Bangladesh. That’s 🇧🇦🇳🇬🇱🇦🇩🇪🇸🇭 (Bosnia and Herzegovina + Nigeria + Laos + Germany + Saint Helena, but notably, not Bangladesh itself, which is BD 🇧🇩). The others are Brazil (🇧🇷🇦🇿🇮🇱), Cyprus (🇨🇾🇵🇷🇺🇸), Monaco (🇲🇴🇳🇦🇨🇴, which does not contain 🇲🇨), Panama (🇵🇦🇳🇦🇲🇦), Cuba (🇨🇺🇧🇦), Guam (🇬🇺🇦🇲), Iraq (🇮🇷🇦🇶), Mali (🇲🇦🇱🇮), Peru (🇵🇪🇷🇺), and Chad (🇨🇭🇦🇩).
  • Sudan is the only country that cannot be written with a flag!
  • The five most common flags in my word list are 🇪🇷 (Eritrea/ER), 🇦🇱 (Albania/AL), 🇸🇹 (Sao Tome and Principe/ST), 🇳🇪 (Niger/NE), and 🇱🇮 (Liechtenstein/LI).
  • Eight countries’ flags never appear in my word list: 🇨🇬 (Congo/CG), 🇨🇻 (Cabo Verde/CV), 🇨🇽 (Christmas Island/CX), 🇬🇶 (Equatorial Guinea/GQ), 🇲🇶 (Martinique/MQ), 🇲🇽 (Mexico/MX), 🇲🇿 (Mozambique/MZ), and 🇸🇽 (Sint Maarten (Dutch part)/SX).
  • My vim is very bad at flag digraphs, making this blog post quite difficult to write.

Thanks Joe, for the weekend diversion!


  1. I didn’t know these particular flags, but if you hover over them, Slack helpfully provides :flag_ba: and :flag_ck: tooltips. ↩︎

  2. Yes, I could do this some other way and generate them from the ASCII letters programmatically, but it was Friday night and I was lazy. ↩︎