In our Slack at work, we have an #availability channel, where people can post their comings and goings. In the middle of the day, there is often a stack of posts that say simply “lunch” or “back.” Friday, my friend Joe, rather than writing “back”, did this:
That is: instead of the word “back” (as I posted), Joe posted “🇧🇦🇨🇰”. If you’re not super into vexillology, those are the flags for Bosnia and Herzegovina (left) and the Cook Islands (right). This makes roughly no sense, unless you happen to know that the ISO 3166-1 alpha-2 country codes use BA for Bosnia and Herzegovina and CK for the Cook Islands. Later that afternoon, Joe (who normally writes in a more civilized programming language) told me he was working on a secret Perl project, which I guessed more or less immediately.
In other words, this is all Joe’s fault. He decided not to finish his Perl project and instead nerdsniped me into writing the program, and also convinced me I should write this post. That’s because now, you can ask our Slack bot to speak to you in flags:
A digression into Unicode
To understand this, we first need to understand how the flag emojis work. Most
emojis are a single Unicode code point: 🌲 (\N{EVERGREEN TREE}, my favorite
emoji) is code point U+1F332, for example. Some other emojis are sequences of
multiple code points, sometimes glued together with a Zero-Width Joiner (ZWJ).
The emoji 👍🏻, for example, is two code points: 👍 (U+1F44D,
\N{THUMBS UP SIGN}) followed by the modifier 🏻 (U+1F3FB,
\N{EMOJI MODIFIER FITZPATRICK TYPE-1-2}). Skin-tone modifiers like this one
attach directly to the base emoji, with no ZWJ in between.
This is not how the flag emojis work. Instead, they use flag sequences. There
are 26 code points, with names like \N{REGIONAL INDICATOR SYMBOL LETTER A}
(U+1F1E6). By themselves, these don’t look like much; on macOS, I see them as
capital letters surrounded by a box, like this: 🇦. But when you put two of them
together, and they form a valid two-letter country code, you get a flag emoji!
That is, if you put 🇦 (regional indicator A) right next to 🇺 (regional
indicator U), you get 🇦🇺, the flag for Australia.
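The arithmetic behind a flag sequence can be sketched in a few lines. This is Python rather than the Perl used for the bot, and the helper names (`regional_indicator`, `flag`, `describe`) are made up for illustration. It assumes only what is described above: 26 consecutive code points starting at U+1F1E6. Note that `flag` does not check whether the code belongs to a real country.

```python
import unicodedata

# U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A; B through Z follow in order.
REGIONAL_INDICATOR_A = 0x1F1E6


def regional_indicator(letter):
    """Map one ASCII letter (either case) to its regional indicator symbol."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"not an ASCII letter: {letter!r}")
    return chr(REGIONAL_INDICATOR_A + ord(letter.upper()) - ord("A"))


def flag(code):
    """Join two regional indicators; no check that the code is a real country."""
    if len(code) != 2:
        raise ValueError(f"expected a two-letter code, got {code!r}")
    return "".join(regional_indicator(c) for c in code)


def describe(s):
    """List each code point in s with its official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


australia = flag("AU")
print(australia)
for line in describe(australia):
    print(line)
```

Whether the pair shows up as one flag or as two boxed letters is up to the font and renderer; the string itself is always just two code points.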
Back to the Slack bot
Now that we know how the flag emojis are made, it’s approaching trivial to write a program to do the transliteration for us. For any given string, we just need to check every adjacent pair of letters to see if it’s a valid country code. You can read the whole commit if you want, but the core of it is very straightforward:
# Assumes the enclosing file has `use utf8;` and subroutine signatures enabled.
sub to_flags ($s) {
  require Locale::Codes;
  # ASCII letter => regional indicator symbol
  my %char_for = qw(
    a 🇦 b 🇧 c 🇨 d 🇩 e 🇪 f 🇫 g 🇬 h 🇭 i 🇮
    j 🇯 k 🇰 l 🇱 m 🇲 n 🇳 o 🇴 p 🇵 q 🇶 r 🇷
    s 🇸 t 🇹 u 🇺 v 🇻 w 🇼 x 🇽 y 🇾 z 🇿
  );
  my $lc = Locale::Codes->new('country');
  my %is_country = map {; $_ => 1 } $lc->all_codes('alpha-2');
  # Nothing to pair up in an empty or one-character string. Without this
  # guard the loop below never runs, and a lone character would be lost.
  return $s if (length $s) < 2;
  my $out = '';
  for (my $i = 0; $i < (length $s) - 1; $i++) {
    my $digraph = lc substr $s, $i, 2;
    if ($is_country{$digraph}) {
      $out .= $char_for{$_} for split //, $digraph;
      $i++; # no double-counting
    } else {
      $out .= substr $s, $i, 1;
    }
    # make sure we don't drop the last char if we need to
    $out .= substr $s, -1, 1 if $i == (length $s) - 2;
  }
  return $out;
}
This is Perl, but doing it in another language is equally trivial. First, we
make a map of ASCII letters to regional indicators, and load up a list of
valid countries (I used Locale::Codes here just to avoid having to write out
the list myself). Then, for every pair of letters in the source string $s, we
check if it’s a valid country. If it is, we add the two relevant regional
indicators (i.e., the flag emoji) to the output string, and if not we add the
character itself. (Aside: this would be much less tedious in a language with
subscriptable strings.)
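For comparison, here is a rough sketch of the same greedy scan in Python, where strings can be sliced directly. It is not the bot's code. `SAMPLE_CODES` is a hypothetical, hand-picked subset of the ISO 3166-1 alpha-2 list, just enough for the examples in this post; a real version would load the full list from a library or data file.

```python
# Hypothetical subset of ISO 3166-1 alpha-2 codes, for illustration only.
SAMPLE_CODES = {"am", "at", "au", "ba", "ck", "re", "th"}

REGIONAL_INDICATOR_A = 0x1F1E6  # code point of regional indicator "A"


def to_flags(s, codes=SAMPLE_CODES):
    """Replace each valid two-letter code, scanning left to right, with a flag."""
    out = []
    i = 0
    while i < len(s):
        digraph = s[i:i + 2].lower()
        if digraph in codes:
            # Shift both letters into the regional indicator block.
            out.append("".join(
                chr(REGIONAL_INDICATOR_A + ord(c) - ord("a")) for c in digraph
            ))
            i += 2  # skip both letters so no letter is used twice
        else:
            out.append(s[i])  # not a country code: keep the character as-is
            i += 1
    return "".join(out)


print(to_flags("back"))
print(to_flags("that’s amore"))
```

Because the slice at the final position is only one character long, it can never match a code, so the last character needs no special handling here.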
This means we can translate any arbitrary string to be full of flags! The string “that” comes out to “🇹🇭🇦🇹” (Thailand + Austria), “that’s amore” to “🇹🇭🇦🇹’s 🇦🇲o🇷🇪” (plus Armenia and Réunion), and “support” to, well, “support.”
Trivia
This line of programming led to some obvious questions, for which I have some
answers. (I’m just using the word list that ships with macOS at
/usr/share/dict/words for this.)
- The longest reasonable English words that can be written entirely with flag emojis are “inconclusiveness”, “nonimpressionist”, and “sacrilegiousness”. That’s 🇮🇳🇨🇴🇳🇨🇱🇺🇸🇮🇻🇪🇳🇪🇸🇸 (India + Colombia + New Caledonia + Luxembourg + Slovenia + Venezuela + Niger + South Sudan), 🇳🇴🇳🇮🇲🇵🇷🇪🇸🇸🇮🇴🇳🇮🇸🇹 (Norway + Nicaragua + Northern Mariana Islands + Réunion + South Sudan + British Indian Ocean Territory + Nicaragua again + Sao Tome and Principe), and 🇸🇦🇨🇷🇮🇱🇪🇬🇮🇴🇺🇸🇳🇪🇸🇸 (Saudi Arabia + Costa Rica + Israel + Egypt + British Indian Ocean Territory + United States + Niger + South Sudan).
- If you count unreasonable English words, “gastropancreatitis” wins. That’s 🇬🇦🇸🇹🇷🇴🇵🇦🇳🇨🇷🇪🇦🇹🇮🇹🇮🇸: Gabon + Sao Tome and Principe + Romania + Panama + New Caledonia + Réunion + Austria + Italy + Iceland.
- The longest English words that have no valid flag digraphs are “equipollent,” “unsupported,” and “unturbulent.”
- One letter shorter you get many more interesting flagless words, including “kookaburra,” “ponticello,” “surfactant,” “antelopian,” and “workfellow.”
- The longest country that can be spelled entirely with flags is Bangladesh. That’s 🇧🇦🇳🇬🇱🇦🇩🇪🇸🇭 (Bosnia and Herzegovina + Nigeria + Laos + Germany + Saint Helena, but notably, not Bangladesh itself, which is BD 🇧🇩). The others are Brazil (🇧🇷🇦🇿🇮🇱), Cyprus (🇨🇾🇵🇷🇺🇸), Monaco (🇲🇴🇳🇦🇨🇴, which does not contain 🇲🇨), Panama (🇵🇦🇳🇦🇲🇦), Cuba (🇨🇺🇧🇦), Guam (🇬🇺🇦🇲), Iraq (🇮🇷🇦🇶), Mali (🇲🇦🇱🇮), Peru (🇵🇪🇷🇺), and Chad (🇨🇭🇦🇩).
- Sudan is the only country whose name cannot be written with even a single flag!
- The five most common flags in my word list are 🇪🇷 (Eritrea/ER), 🇦🇱 (Albania/AL), 🇸🇹 (Sao Tome and Principe/ST), 🇳🇪 (Niger/NE), and 🇱🇮 (Liechtenstein/LI).
- Eight countries’ flags never appear in my word list: 🇨🇬 (Congo/CG), 🇨🇻 (Cabo Verde/CV), 🇨🇽 (Christmas Island/CX), 🇬🇶 (Equatorial Guinea/GQ), 🇲🇶 (Martinique/MQ), 🇲🇽 (Mexico/MX), 🇲🇿 (Mozambique/MZ), and 🇸🇽 (Sint Maarten (Dutch part)/SX).
- My vim is very bad at flag digraphs, making this blog post quite difficult to write.
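If you want to poke at questions like these yourself, the checks can be sketched roughly as follows. This is Python, not the script I actually ran. The function names are invented for illustration, and `SAMPLE_CODES` is a hypothetical, deliberately tiny subset of the country codes, so the results below say nothing about the full list.

```python
# Hypothetical, deliberately tiny subset of ISO 3166-1 alpha-2 codes.
SAMPLE_CODES = {"ba", "ng", "la", "de", "sh", "th", "at", "cu"}


def greedy_split(word, codes):
    """Split a word into two-letter codes where the scan finds one, else single letters."""
    w = word.lower()
    pieces = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in codes:
            pieces.append(pair)
            i += 2
        else:
            pieces.append(w[i])
            i += 1
    return pieces


def all_flags(word, codes):
    """True if the whole word is consumed by flags, with no letters left over."""
    pieces = greedy_split(word, codes)
    return bool(pieces) and all(len(p) == 2 for p in pieces)


def flagless(word, codes):
    """True if the scan finds no flag anywhere in the word."""
    return all(len(p) == 1 for p in greedy_split(word, codes))


def longest(words, predicate):
    """Return, sorted, every word of maximal length that satisfies predicate."""
    matches = [w for w in words if predicate(w)]
    if not matches:
        return []
    top = max(len(w) for w in matches)
    return sorted(w for w in matches if len(w) == top)


words = ["Bangladesh", "Cuba", "that", "thats", "Sudan", "zzz"]
print(longest(words, lambda w: all_flags(w, SAMPLE_CODES)))
print(longest(words, lambda w: flagless(w, SAMPLE_CODES)))
```

To run it over a real word list, you would swap `words` for the lines of a file such as /usr/share/dict/words and `SAMPLE_CODES` for the complete set of codes.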
Thanks Joe, for the weekend diversion!