In our Slack at work, we have an #availability channel, where people can post their comings and goings. In the middle of the day, there is often a stack of posts that say simply “lunch” or “back.” Friday, my friend Joe, rather than writing “back”, did this:

Slack screenshot of the word “back” and 🇧🇦🇨🇰

That is: instead of the word “back” (as I posted), Joe posted “🇧🇦🇨🇰”. If you’re not super into vexillology, those are the flags for Bosnia and Herzegovina (left) and the Cook Islands (right). This makes roughly no sense, unless you happen to know that the ISO 3166-1 alpha-2 country codes use BA for Bosnia and Herzegovina and CK for the Cook Islands.1 Later that afternoon, Joe (who normally writes in a more civilized programming language) told me he was working on a secret Perl project, which I guessed more or less immediately.

In other words, this is all Joe’s fault. He decided not to finish his Perl project and instead nerdsniped me into writing the program, and also convinced me I should write this post. That’s because now, you can ask our Slack bot to speak to you in flags:

Slack screenshot of the text 'I am a normal person with normal hobbies,' with many of the letters replaced with flag emojis

A digression into Unicode

To understand this, we first need to understand how the flag emojis work. Most emojis are a single Unicode code point: 🌲 (\N{EVERGREEN TREE}, my favorite emoji) is code point U+1F332, for example. Some other emojis are represented by multiple code points. The emoji 👍🏻, for example, is two code points: 👍 (U+1F44D, \N{THUMBS UP SIGN}) followed immediately by the modifier 🏻 (U+1F3FB, \N{EMOJI MODIFIER FITZPATRICK TYPE-1-2}). Others, like the family emojis, are several code points glued together with a Zero-Width Joiner (ZWJ, U+200D).
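If you want to see what a given emoji is made of, a small sketch like this prints each code point in a string along with its Unicode name. The helper name describe is made up for illustration, and the names come from whatever Unicode tables your perl ships with, so an older perl may not know the newer ones.

```perl
use strict;
use warnings;
use charnames ();                       # for charnames::viacode
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: describe each code point in a string as "U+XXXX NAME".
sub describe {
  my ($s) = @_;
  return map {
    sprintf 'U+%04X %s', ord($_), charnames::viacode(ord $_) // '(no name in this perl)'
  } split //, $s;
}

print "$_\n" for describe("\x{1F332}");            # the evergreen tree
print "$_\n" for describe("\x{1F44D}\x{1F3FB}");   # thumbs up plus a skin-tone modifier
```

The strings are written with \x{...} escapes so the example does not depend on how your editor saves the file.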

This is not how the flag emojis work. Instead, they use flag sequences. There are 26 code points, with names like \N{REGIONAL INDICATOR SYMBOL LETTER A} (U+1F1E6). By themselves, these don’t look like much; on macOS, I see them as capital letters surrounded by a box, like this: 🇦. But when you put two of them together, and they form a valid two-letter country code, you get a flag emoji! That is, if you put 🇦 (regional indicator A) right next to 🇺 (regional indicator U), you get 🇦🇺, the flag for Australia.
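Because the 26 regional indicators are contiguous (A is U+1F1E6 and Z is U+1F1FF), each one sits at a fixed offset from its ASCII letter. Here is a minimal sketch of building a flag sequence from a two-letter code; regional_indicator and flag_for are names I made up for illustration, and flag_for does not check that the code belongs to a real country.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Map one ASCII letter to its regional indicator by offset from U+1F1E6 (A).
sub regional_indicator {
  my ($letter) = @_;
  die "not an ASCII letter: $letter" unless $letter =~ /\A[A-Za-z]\z/;
  return chr(0x1F1E6 + ord(uc $letter) - ord('A'));
}

# Hypothetical helper: turn a two-letter code such as "AU" into a flag sequence.
# No validation against ISO 3166-1 happens here.
sub flag_for {
  my ($code) = @_;
  return join '', map { regional_indicator($_) } split //, $code;
}

print flag_for('AU'), "\n";   # regional indicators A and U, drawn as a flag where the font supports it
```

Whether the pair actually shows up as a flag depends on the platform's emoji font; the string itself is always just the two indicator code points.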

Back to the Slack bot

Now that we know how the flag emojis are made, it’s approaching trivial to write a program to do the transliteration for us. For any given string, we just need to check every adjacent pair of letters to see if it’s a valid country code. You can read the whole commit if you want, but the core of it is very straightforward:

sub to_flags ($s) {
  require Locale::Codes;

  my %char_for = qw(
    a 🇦   b 🇧   c 🇨   d 🇩   e 🇪   f 🇫   g 🇬   h 🇭   i 🇮
    j 🇯   k 🇰   l 🇱   m 🇲   n 🇳   o 🇴   p 🇵   q 🇶   r 🇷
    s 🇸   t 🇹   u 🇺   v 🇻   w 🇼   x 🇽   y 🇾   z 🇿
  );

  my $lc = Locale::Codes->new('country');
  my %is_country = map {; $_ => 1 } $lc->all_codes('alpha-2');

  my $out = '';

  # Look at the two-character window starting at each position. At the final
  # character the window is only one character long, so it can never be a
  # country code and the character is copied through unchanged. This also
  # covers one-character input.
  for (my $i = 0; $i < length $s; $i++) {
    my $digraph = lc substr $s, $i, 2;

    if ($is_country{$digraph}) {
      $out .= $char_for{$_} for split //, $digraph;
      $i++; # no double-counting
    } else {
      $out .= substr $s, $i, 1;
    }
  }

  return $out;
};

This is Perl, but doing it in another language is equally trivial. First, we make a map of ASCII letters to regional indicators2, and load up a list of valid countries (I used Locale::Codes here just to avoid having to write out the list myself). Then, for every pair of letters in the source string $s, we check if it’s a valid country. If it is, we add the two relevant regional indicators (i.e., the flag emoji) to the output string, and if not we add the character itself. (Aside: this would be much less tedious in a language with subscriptable strings.)

This means we can translate any arbitrary string to be full of flags! The string “that” comes out to “🇹🇭🇦🇹” (Thailand + Austria), “that’s amore” to “🇹🇭🇦🇹’s 🇦🇲o🇷🇪” (plus Armenia and Réunion), and “support” to, well, “support.”
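For comparison, the same left-to-right scan can be sketched as a single regex substitution. This is an alternative formulation, not what the bot's commit does, and to keep it self-contained it swaps Locale::Codes for a tiny hand-written list of codes (an assumption: just enough to reproduce the examples above). The name to_flags_re is made up.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for the full ISO 3166-1 alpha-2 list.
my @codes   = qw(th at am re ba ck);
my $code_re = join '|', @codes;

sub to_flags_re {
  my ($s) = @_;
  # At each position the regex engine tries every code in the list; if none
  # matches it moves one character forward, so matches never overlap.
  $s =~ s{($code_re)}{
    join '', map { chr(0x1F1E6 + ord(lc $_) - ord('a')) } split //, $1
  }gie;
  return $s;
}

print to_flags_re('that'), "\n";
print to_flags_re("that's amore"), "\n";
print to_flags_re('support'), "\n";   # unchanged: nothing in the list matches
```

With the real code list you would build @codes from Locale::Codes instead; since every code is exactly two ASCII letters, the alternation needs no escaping.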

Trivia

This line of programming led to some obvious questions, for which I have some answers. (I’m just using the word list that ships with macOS at /usr/share/dict/words for this.)

  • The longest reasonable English words that can be written entirely with flag emojis are “inconclusiveness”, “nonimpressionist”, and “sacrilegiousness”. That’s 🇮🇳🇨🇴🇳🇨🇱🇺🇸🇮🇻🇪🇳🇪🇸🇸 (India + Colombia + New Caledonia + Luxembourg + Slovenia + Venezuela + Niger + South Sudan), 🇳🇴🇳🇮🇲🇵🇷🇪🇸🇸🇮🇴🇳🇮🇸🇹 (Norway + Nicaragua + Northern Mariana Islands + Réunion + South Sudan + British Indian Ocean Territory + Nicaragua again + Sao Tome and Principe), and 🇸🇦🇨🇷🇮🇱🇪🇬🇮🇴🇺🇸🇳🇪🇸🇸 (Saudi Arabia + Costa Rica + Israel + Egypt + British Indian Ocean Territory + United States + Niger + South Sudan).
  • If you count unreasonable English words, “gastropancreatitis” wins. That’s 🇬🇦🇸🇹🇷🇴🇵🇦🇳🇨🇷🇪🇦🇹🇮🇹🇮🇸: Gabon + Sao Tome and Principe + Romania + Panama + New Caledonia + Réunion + Austria + Italy + Iceland.
  • The longest English words that have no valid flag digraphs are “equipollent,” “unsupported,” and “unturbulent.”
  • One letter shorter you get many more interesting flagless words, including “kookaburra,” “ponticello,” “surfactant,” “antelopian,” and “workfellow.”
  • The longest country name that can be spelled entirely with flags is Bangladesh. That’s 🇧🇦🇳🇬🇱🇦🇩🇪🇸🇭 (Bosnia and Herzegovina + Nigeria + Laos + Germany + Saint Helena, but notably, not Bangladesh itself, which is BD 🇧🇩). The others are Brazil (🇧🇷🇦🇿🇮🇱), Cyprus (🇨🇾🇵🇷🇺🇸), Monaco (🇲🇴🇳🇦🇨🇴, which does not contain 🇲🇨), Panama (🇵🇦🇳🇦🇲🇦), Cuba (🇨🇺🇧🇦), Guam (🇬🇺🇦🇲), Iraq (🇮🇷🇦🇶), Mali (🇲🇦🇱🇮), Peru (🇵🇪🇷🇺), and Chad (🇨🇭🇦🇩).
  • Sudan is the only country whose name contains no flag digraphs at all!
  • The five most common flags in my word list are 🇪🇷 (Eritrea/ER), 🇦🇱 (Albania/AL), 🇸🇹 (Sao Tome and Principe/ST), 🇳🇪 (Niger/NE), and 🇱🇮 (Liechtenstein/LI).
  • Eight countries’ flags never appear in my word list: 🇨🇬 (Congo/CG), 🇨🇻 (Cabo Verde/CV), 🇨🇽 (Christmas Island/CX), 🇬🇶 (Equatorial Guinea/GQ), 🇲🇶 (Martinique/MQ), 🇲🇽 (Mexico/MX), 🇲🇿 (Mozambique/MZ), and 🇸🇽 (Sint Maarten (Dutch part)/SX).
  • My vim is very bad at flag digraphs, making this blog post quite difficult to write.
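The two word-level questions above (“entirely flags” and “no flags at all”) come down to two small predicates. This is a sketch of how they could be checked, not the script I actually ran; all_flags and flagless are made-up names, and the code list is a tiny hypothetical stand-in for the full ISO 3166-1 alpha-2 set, so the answers are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the full list of alpha-2 codes.
my %is_country = map { $_ => 1 } qw(ba ck ng la de sh th at);

# True if the word splits cleanly into two-letter codes: it must have an
# even, non-zero length and every pair at an even offset must be a code.
sub all_flags {
  my ($word) = @_;
  return 0 if !length($word) || length($word) % 2;
  for my $pair (lc($word) =~ /(..)/g) {
    return 0 unless $is_country{$pair};
  }
  return 1;
}

# True if no adjacent pair of letters anywhere in the word is a code.
sub flagless {
  my ($word) = @_;
  for my $i (0 .. length($word) - 2) {
    return 0 if $is_country{ lc(substr($word, $i, 2)) };
  }
  return 1;
}

for my $word (qw(bangladesh back support that)) {
  printf "%-12s all flags: %-3s  flagless: %s\n", $word,
    (all_flags($word) ? 'yes' : 'no'), (flagless($word) ? 'yes' : 'no');
}
```

To answer the questions for real, you would fill %is_country from the full code list and loop over each line of a word list such as /usr/share/dict/words, keeping the longest words that pass each test.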

Thanks Joe, for the weekend diversion!


  1. I didn’t know these particular flags, but if you hover over them, Slack helpfully provides :flag_ba: and :flag_ck: tooltips. ↩︎

  2. Yes, I could do this some other way and generate them from the ASCII letters programmatically, but it was Friday night and I was lazy. ↩︎

Tagged: misc, programming