Let’s play Scattergories, but instead of rolling a letter die, I choose the letter, and the letter is N. Quick, name one thing in each of these categories that starts with N:
Category | Possible answers |
---|---|
Animals | Newt, narwhal, nightingale, nautilus, or naked mole rat |
Colors | Navy blue or neon green |
Restaurant chains | Nando’s or Noodles & Company |
Sports | Netball, Newcomb, or the only Olympic N sport, Nordic Combined |
Board or card games | Nerts, Nine men’s morris, or Niagara |
Flowers | Narcissus or night-scented stock |
Musical instruments | Nose flute or nadaswaram |
Vegetables | Napa cabbage or nopal, or maybe nori |
Gemstones | Good luck knowing nambulite or nephrite off the top of your head! |
That was… weirdly hard, wasn’t it? N gives off common-letter vibes—it’s one point in Scrabble, there are 8 of them in a Bananagrams pouch, it’s one of the RSTLNE letters on Wheel of Fortune—so by all accounts this should’ve been easy. So then what’s going on here? Why are all these super-normal categories so deficient in vitamin N?
I know what you’re thinking: “Adam, you can’t just cherry-pick a bunch of categories to prove your point!” And you’re right! To do that, we’re gonna need to look at the data.
For this data analysis, we’re using the ENABLE (Enhanced North American Benchmark LExicon) wordlist, a free-to-use 172,000-word English dictionary that’s based on the Official Scrabble Player’s Dictionary but isn’t throttled to words under a certain length. You probably don’t recognize its name, but you may know its work—ENABLE is the base wordlist for Words With Friends and the default English dictionary on Wordlisted. It’s not perfect, but it’s a representative enough survey of English words for our purposes.
Also, we’re looking at the English dictionary in particular, not a natural language corpus, since we’re more concerned with how many different words there are rather than how often a given word appears. That means letters like T and H won’t be weighted for appearing in common words like the and that, meaning our letter frequency ranking will look a little different than rankings that take word frequency into account, like the classic Linotype ordering etaoin shrdlu.
So here are the letters of the alphabet ranked by how often they appear in the English dictionary. On the left is how often the letters appear overall, while on the right is how often they appear as the first letter of a word:
Immediately, you can see N plummeting: from the big leagues of overall letter frequency, down to the bowels of initial letter frequency, joining the likes of WVKJQYZX in the dregs of alphabet society.
Sidenote, it’s eerie how similar these two graphs’ shapes are: despite having two completely different letter orderings, they both have three big leaders, six-ish more trailing behind, and then a steep dropoff into the long tail. Zipf’s law is crazy like that.
Here’s another way to visualize the same data, this time combining both distributions into one coordinate plane:
The dotted line here indicates where every letter would be if letter frequency was uniform throughout the word, i.e. if initial letter frequency was the same as overall letter frequency. So the farther a letter is from that dotted line, you could say the farther its initial letter frequency strays from expectation.
As expected, there’s the JQXZ crew hanging out at the bottom left, and S leagues ahead at the top right (thanks to plurals, and also just being an ultra-common starting letter). Way above the line, we see letters like P, C, B, and F, which are much more common as first letters than they are in general. Every vowel is below the line, which makes sense, since vowels are omnipresent but tend to be less common as starting letters. And then there’s N, the farthest consonant below the line, the most disproportionately rare starting consonant in the alphabet.
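If you want to reproduce the comparison yourself, here is a minimal Python sketch of the counting involved. It is not the script behind the graphs, and the function names and the toy wordlist are made up for illustration. Each distinct word counts once, and a ratio of 1.0 corresponds to sitting exactly on the dotted line.

```python
from collections import Counter
import string


def letter_shares(words):
    """Return (overall, initial) dicts mapping each letter to its share of the total."""
    overall = Counter()
    initial = Counter()
    for word in words:
        letters = [ch for ch in word.lower() if ch in string.ascii_lowercase]
        if not letters:
            continue
        overall.update(letters)    # every letter of every distinct word counts
        initial[letters[0]] += 1   # only the first letter counts here
    total_letters = sum(overall.values())
    total_words = sum(initial.values())
    overall_share = {ch: n / total_letters for ch, n in overall.items()}
    initial_share = {ch: initial[ch] / total_words for ch in overall}
    return overall_share, initial_share


def initial_ratio(words):
    """Initial share divided by overall share; 1.0 would sit on the dotted line."""
    overall_share, initial_share = letter_shares(words)
    return {ch: initial_share[ch] / overall_share[ch] for ch in overall_share}


# Toy wordlist for illustration only (not ENABLE).
sample = ["newt", "nun", "banana", "pen", "cannon"]
ratios = initial_ratio(sample)
for letter in sorted(ratios, key=ratios.get):
    print(letter, round(ratios[letter], 2))
```

To run it on the real thing, you would read the ENABLE file into `words` (one word per line) from wherever you have it saved.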
They don’t teach you this in school, folks. It almost feels wrong, like a forbidden piece of information, that N is such a rare first letter. So why might this be?
Let’s consider a few prevailing theories, some combination of which explains why N is such an uncommon starting letter despite its overall commonness.
First, while N is undeniably a common letter in general, the bulk of its commonness is for reasons other than being at the beginning of words. Instead, its position in overall frequency rankings is inflated by the ubiquity of prefixes like un- and in-, as well as suffixes like -tion and -ing. In natural language, N’s frequency is further bolstered by its appearance in ultra-common words like in, an, and and. N is everywhere, but it’s not usually the letter in charge. In NBA terms, N is less of a star point guard and more of a reliable role player.
Second, N has less potential as an initial letter because of the lack of English consonant clusters beginning with N. Take a look at the most disproportionately common starting letters, the ones farthest above the dotted line of expectation: P, C, B, and F. They have one thing in common: they lead common consonant clusters like pr-, ch-, bl-, and fl-, giving them much more mileage in the initial position. But compared to all the cool consonants, N has nothing of the sort—the only words in the ENABLE wordlist that start with N followed by another consonant are nth, ngwee (1/100 of a kwacha, which is Zambia’s official currency), and ngultrum(s) (whaddya know, Bhutan’s official currency). Without consonant clusters, the priors for N as a first letter are lower, since there are fewer possible ways to construct a word starting with it.
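The consonant-cluster check is easy to sketch in Python. This is an illustration rather than the original search: the function name and sample list are hypothetical, and it assumes Y counts as a vowel (otherwise words like nylon would show up too).

```python
# Assumption: Y is treated as a vowel, so "nylon" does not count as a cluster.
VOWELS = set("aeiouy")


def n_cluster_words(words):
    """Return the words that start with N followed immediately by a consonant."""
    return [
        w for w in words
        if len(w) >= 2
        and w[0].lower() == "n"
        and w[1].isalpha()
        and w[1].lower() not in VOWELS
    ]


# Toy list for illustration only (not ENABLE).
sample = ["newt", "nth", "ngwee", "nylon", "ngultrum", "snow"]
print(n_cluster_words(sample))
```

Swapping the leading letter for P, C, B, or F would give a rough count of how many more cluster words those letters get.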
Third, there is some historical phonological evidence that English words, especially nouns, have evolved away from starting with N. In English, we turn our indefinite article a into an when it’s followed by a word that starts with a vowel sound, enabling great jokes like “You’re going to Antarctica? Have an ice time!” But jokes aside, misunderstandings of N’s placement have led to the actual evolution of words as we know them today, in a process called faulty separation (also known as false splitting, a special case of rebracketing). For instance, an apron came from a misunderstanding of a napron, an umpire was once a noumpere, and an adder was a naddre. Granted, this transformation has also happened the other way around, turning an eft into a newt and an eke name into a nickname. Still, it’s likely that some of N’s unpopularity as a first letter was destined by a natural selection process—or should I say, an atural selection process.
Now you know that N is a lowkey rare first letter, and you can tell your friends about it, and they can be like, “neat!” That’s pretty much it. I don’t know, did you think there would be some higher purpose to a blog post about the letter N?
I mean, if you’re making a crossword, be careful putting an entry with a bunch of N’s in it at 1-Across. If you’re writing an acrostic poem or an alphabet book, make sure you have something good for N before you get in too deep. And maybe, if you’re playing Scattergories and the die lands on N, roll again.
If you ask the university’s marketing office, their answer is perfectly clear: University of Illinois Urbana-Champaign. If you came here for a simple answer, there you go, but buckle up, because there’s a lot more to the story.
To truly understand the nuances of Illinois’s flagship university’s name and why people are so confused about it, we have to take a journey through 156 years of geopolitics, branding, and grammar.
The year was 1867, and Illinois needed a new school. When Abraham Lincoln signed the Morrill Land-Grant Act five years earlier, the federal government granted every state a piece of land to establish a federally endowed university, and each state got to choose where to put it. The states also kicked out lots of Indigenous people in the process, which the universities occasionally acknowledge to this day.
After a bidding war, the humble town of Urbana won Illinois’s jackpot, and in 1867, a new land-grant university was born: Illinois Industrial University. It was founded in Urbana by academic warhorse John Milton Gregory, who was more of a liberal arts guy himself but called the university “industrial” to appease industry-obsessed lawmakers.
Gregory served as president of the university for 13 years until he tossed his papers into the air and resigned in 1880. Soon after, the university was beginning to realize it wasn’t just “industrial,” with burgeoning programs in agriculture, engineering, and Gregory’s favorite liberal arts. So in 1881, a year after Gregory’s resignation, students voted 250–20 to ditch the word “industrial” in favor of a new name. It took four years, but in 1885, the university finally changed its name to the more holistic University of Illinois, a name that stuck for a while.
John Milton Gregory died in 1898 and was buried next to Altgeld Hall on the university’s Main Quad. Legend has it, Gregory’s dying wish was to leave a modest legacy and have nothing named after him. So he’d be thrilled to know the university’s Department of History is now housed at Gregory Hall, which is a quick walk away from both Gregory Street and Gregory Drive.
It was the turn of the 20th century, and the University of Illinois was expanding. It was already leaking into Champaign, Urbana’s larger neighbor to the west, but it was time to go north.
In 1896, the Chicago College of Pharmacy joined forces with the university, officially becoming the School of Pharmacy of the University of Illinois. Over the next couple decades, the University of Illinois family also gained a College of Medicine and a College of Dentistry, both up in the Windy City.
Sometime around 1905, letters and publications from University of Illinois administrators gradually started including “Urbana” with the university’s name, probably to distinguish the university’s main campus from its growing medical presence in Chicago. This riled up citizens and business owners of Champaign, who wanted their name on the university that spilled into their city. Champaignians published multiple op-eds in local newspapers arguing that Champaign and Urbana should split the bill. Urbanans rightly pointed out that the bulk of the university was in Urbana, including its administrative offices (and thus the university’s mailing address).
The Urbana vs. Champaign debate heated up, and in September 1906, the university’s Board of Trustees held an actual meeting to resolve it. What came out of this meeting was the name “Urbana-Champaign”—with Urbana first and foremost, like the university itself. Soon after, “Urbana-Champaign” began appearing on official university correspondence, and over the course of the next few decades, it became a commonplace way to refer to the campus. But it wasn’t until 1969 that the university officially codified its new name, the University of Illinois at Urbana-Champaign.
The Illini Union, as seen from the Main Quad (in Urbana, not Champaign)
If you solved my latest crossword, or if you’re from the area, or if you know too much, you’d know that the metro area including the twin cities of Urbana and Champaign is called Champaign–Urbana (or C‑U, or Chambana, or Shampoo–Banana), not Urbana–Champaign. That’s because Champaign has pretty much always been more populous than Urbana, and metro areas are conventionally named with the more populous cities first, like Dallas–Fort Worth or New York–Newark–Jersey City.
So we have a university campus called Urbana-Champaign, in Champaign–Urbana. And you’re just gonna have to deal with it.
You might have noticed another difference between Urbana-Champaign and Champaign–Urbana: Urbana-Champaign is written with a hyphen (-), while Champaign–Urbana is written with the slightly longer en dash (–). This isn’t a mistake, because if it was, I wouldn’t be pointing it out. So what’s going on here?
If you’re big into style guides, you might know that hyphens generally join two parts of one word or name (like post-punk or Anya Taylor-Joy), whereas en dashes join two associated but distinct things (like red–green colorblindness or the Spanish–American War). You can remember that hyphens are shorter, so they connect things more closely than en dashes do.
As far as I could tell, the campus name Urbana-Champaign has always used a hyphen in an official capacity, possibly because Urbana and Champaign are two continuous parts of one campus, or possibly because hyphens are easier to type than en dashes. However, this didn’t stop the Wikipedia article for the school from being titled University of Illinois at Urbana–Champaign (with an en dash), after a zealous editor decided it adhered to Wikipedia’s style guide in 2010. The article’s title stayed this way until 2021, when the hyphen triumphantly returned after a lengthy talk page discussion. As the user JustinMal1 put it, “In many ways, the campus is much like a marital union, and marital unions are hyphenated, not en dashed.”
The metro area Champaign–Urbana, on the other hand, takes an en dash, since Champaign and Urbana are two distinct entities that just so happen to be the metro’s two largest cities. Or if you’re typing on a typewriter, you’ll just have to settle for a hyphen.
Remember those medical schools in Chicago? In 1961, they officially became a new campus, called the University of Illinois at the Medical Center. Then in 1965, another Chicago campus was established, named the University of Illinois at Chicago Circle after a nearby freeway interchange. In 1982, these two Chicago campuses consolidated into the University of Illinois at Chicago (UIC), a proud member of the University of Illinois family.
And then there was little Sangamon State University, Illinois’s smallest state university in its capital city Springfield, which lies in Sangamon County. In 1995, Sangamon State University was incorporated into the University of Illinois family and renamed the University of Illinois at Springfield (UIS).
Since then, these three campuses—Urbana-Champaign, Chicago, and Springfield—have comprised the University of Illinois System, whose website is uillinois.edu, not to be confused with Urbana-Champaign’s illinois.edu, and whose legal name is the University of Illinois, not to be confused with the university formerly known as the University of Illinois.
The University of Illinois System was a well-oiled machine until 2009, when Springfield went rogue and axed the “at” in their name, becoming University of Illinois Springfield. The inconsistency remained for 11 years, with the other universities still “at Chicago” and “at Urbana-Champaign.”
Then something in 2020 gave Chicago and Urbana-Champaign some time for self-reflection. That fall, they finally followed Springfield’s lead, quietly removing the “at” and rebranding to the University of Illinois Chicago and the University of Illinois Urbana-Champaign. But not everyone got the memo.
Excerpt from the University of Illinois System style guide
It wasn’t until spring 2021 that the university’s Wikipedia article was moved to remove the “at,” as a result of the same talk page discussion that restored the hyphen. Even still, the press isn’t on the same page about the name of the University of Illinois Urbana-Champaign. You’ll still find the “at” in The New York Times, the Chicago Tribune, and even style guide goliath AP. If you ask even a current student at the university, chances are they won’t know the “at” was removed, since the university never formally announced it.
Well, consider this your announcement. There is no “at” in the University of Illinois Urbana-Champaign.
In casual conversation, reciting the 14-syllable University of Illinois Urbana-Champaign every time you refer to the school will get tiring. But lucky us, the university officially recognizes four nicknames for use on “second and subsequent references.” Let’s break them down:
The first, University of Illinois, is a former name of the university that still sticks around as an abbreviation of sorts, but the university has mixed feelings about it. Since it’s also the official name of the University of Illinois System, the Office of Public Affairs at the Urbana-Champaign campus declared as of 2018, “Do not use the name ‘University of Illinois’ to refer to this campus.”
But people do anyway. In fact, if you search “University of Illinois” on Wikipedia, it redirects to the Urbana-Champaign campus, not the system.
You might be thinking, aren’t there three Universities of Illinois? What do the Chicago and Springfield campuses think of this? Well, the Urbana-Champaign campus is the O.G., the flagship, and the system’s largest campus to this day, with about 56,000 students compared to UIC’s 34,000 and UIS’s 4,000. As an anonymous redditor posted last year, “nobody on this planet refers to UIC or UIS as ‘The University of Illinois,’” so if you trust that comment’s 50-something upvotes, I don’t think anybody’s feelings are being hurt. But “University of Illinois” is still kind of a mouthful.
U of I is probably the most common way to refer to the school if you’re in Illinois, talking to other people from Illinois. In the Chicago suburbs, where I’m from, it’s what everyone calls the school. It’s also an officially sanctioned shorthand for campus tour guides to use, and the Office of Public Affairs permits it “for in-state and alumni audiences.”
The mammoth statue at the university’s Natural History Building. I just think he’s neat.
Only problem is, if you say “U of I” anywhere outside of Illinois, you’ll be met with confused looks. As of now, Wikipedia lists seven different universities on the “U of I” disambiguation page, including neighboring state university and fellow Big Ten member University of Iowa. Not great! But luckily, there’s another option, and it’s the same number of letters.
Every good school has an acronym. Chicago has UIC, Springfield has UIS, and Urbana-Champaign has UIUC.
The acronym UIUC has been in use to some degree since the ’70s, especially by professors and nerds, and especially on the internet. Registered in 1985, uiuc.edu was one of the oldest .edu domains, serving as the university’s website and email domain until it moved to illinois.edu in a 2008 rebrand. UIUC is also the name of the university’s subreddit, which was at one point the largest university subreddit in the country (curse you r/berkeley).
But the acronym isn’t without its downsides. The university doesn’t really use it in any official marketing material, especially since the 2008 rebrand. The acronym is better suited for text than for speech, with the muddle of “you-eye-you-see” often indistinguishable from UIC when spoken aloud. Relatively new in the lifespan of the university, the acronym also leaves a bit of a generational gap. My grandparents, who attended the university in the 1950s, never called it UIUC, and my mom, who has lived in Illinois all her life, had never heard UIUC until I was applying there in high school.
At the end of the day, kids these days still call it UIUC, and if you say it to someone who has been in school in the past decade, they’ll probably know what you’re talking about. And it’s about time someone puts it in a crossword.
Appearances (or lack thereof) of UIUC in major crossword outlets, per Crossword Tracker
The university has leaned into the nickname Illinois since 2008, and for good reason. It’s iconic, it works in both text and speech, and it’s unambiguous (assuming you’re not talking about UIC, UIS, Illinois State, Illinois Tech, or Illinois College). As the flagship state university of Illinois, it’s metonymous with the state itself, like how Michigan and Minnesota also refer to their respective flagship schools.
The nickname “Illinois” will be especially recognizable to anyone who’s ever looked at the Big Ten standings or a March Madness bracket, since ESPN has no time to rattle off the university’s full name. It’s all over T-shirts and hoodies, it’s in every student’s email address, and it’s plastered at the top of the university’s website.
The Alma Mater statue, as pictured on illinois.edu. Not to be confused with the university’s alma mater song “Hail to the Orange,” which aptly ends “Victory, Illinois, Varsity.”
This school has gone through a lot of names in its 156 years, from Illinois Industrial University to the University of Illinois to the University of Illinois at Urbana-Champaign to the University of Illinois Urbana-Champaign. But today, if someone asks me where I go to school, my answer will be simple, and it’s been in the name all along: Illinois.
For more adjacent to the University of Illinois Urbana-Champaign, check out these Wikipedia articles I recently wrote on Pinto Bean and Unofficial!
***
“Hmm, I do have a bunch of time to kill,” I said, being in my last winter break of college.
A week later, New Year Zone was born. But to tell the full story, we have to go back.
Back in the 1980s, I came up with the fun idea to compile a list of locations worldwide that celebrated the New Year at each hour of the day on December 31. I kept the handwritten list in a safe place and pulled it out every New Year’s Eve for years. (Actually, I kept the list inside a Styrofoam skimmer hat that I’m pretty sure I got from Constructive Playthings (US Toy Company) when I ran a Purim carnival together with my friend Dave sometime in the early ’80s. Also, I hand-colored a half-inch dowel rod with festive stripes, and each year on New Year’s Eve, I’d happily wear the skimmer hat and carry the dowel rod (as though I was some sort of circus barker), celebrating the New Year each hour according to my list.)
Flash forward to December 31, 2021, when I mentioned to Adam that I wanted to create an app or website that would alert people each hour on the hour where it was a New Year all day on New Year’s Eve. Seed planted! A year later, Adam took the leading oar and made it happen. Thank you Adam. And welcome to NEW YEAR ZONE™!
That Larry guy, what a character! Anyway, New Year Zone is a website that counts down to the New Year in every time zone, so that you can celebrate every hour like my dad did in the ’80s. It was a joint project between the two of us: think of me as the lead engineer and designer, and my dad as the product manager. The site’s design takes inspiration from my dad’s striped dowel rod, but unlike his dowel rod, it changes colors for each new time zone!
I developed New Year Zone in TypeScript and React, with SCSS for styling and Framer Motion for animation. The project is built with Vite and hosted with Cloudflare Pages (which I gotta thank my friend Christian for helping me configure).
Building the site presented tons of fun little technical challenges, like how to synchronize 38 clocks, how to fit a timer perfectly to the width of the screen (it’s harder than it should be), or how to implement a Wordlesque sharing functionality. The countdowns use the widely loved JavaScript Date API, syncing with your device’s clock to accurately determine the current time and your local time zone.
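The site itself runs on the JavaScript Date API, but the core arithmetic behind each countdown is small enough to sketch in Python. To be clear, this is not the site’s code: the function name is hypothetical, and it assumes each zone is a fixed UTC offset, ignoring daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_new_year(now_utc, utc_offset_hours):
    """Seconds from now_utc until the next January 1 midnight in a fixed-offset zone."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(zone)
    next_new_year = datetime(local_now.year + 1, 1, 1, tzinfo=zone)
    return (next_new_year - local_now).total_seconds()


# Noon UTC on New Year's Eve, checked against three example offsets.
noon_utc = datetime(2022, 12, 31, 12, 0, tzinfo=timezone.utc)
for offset in (0, 5.5, -12):
    print(offset, seconds_until_new_year(noon_utc, offset))
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps all the countdowns working from the same instant.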
As it turns out, when you make a website about time zones, you learn a lot about time zones along the way. Here are some of my favorite time zone facts:
Now it’s your turn to be the New Year circus barker! Grab the nearest styrofoam hat and striped dowel rod, and tell your friends about New Year Zone. And wherever you are in the world, Happy New Year!
When I unboxed the AirPods on February 24, 2020, in my Illinois dorm room, the first thing I listened to on them was Steely Dan’s album Aja, because of course it was. I’m no audiophile, but these were good headphones, and I was about to listen to a lot more music on them.
I like making lists of things, and this seemed like as good a time as any to start making a list of the albums I’ve listened to.
I was going to take this list seriously. These were the arbitrary rules I laid out for myself:
I would keep the list in Evernote at first, transitioning to Notion around the start of 2021.
I wanted some way to rate each album. I didn’t want there to be too much granularity, or else I’d spend less time listening and more time thinking “is this more of a 7.5 or a 7.6?”
So I devised a totally subjective, low-granularity, vibes-based rating system using moon phase emojis for some reason. Introducing the Moon System:
I would assign every album on the list one of these moons immediately after listening. I could go back and amend previous moon ratings, but only if I listened to the album again.
By its very nature, the Moon System skews positive. The half moon is in the middle, and it’s still pretty good. And every album I rate is one that I voluntarily listened all the way through, so the scale is biased toward albums I like anyway.
Two and a half years later, the Moon System is still going strong. I’ve gotten ridiculous mileage out of my AirPods (now nicknamed AdamPods in my phone). I’ve listened to over 950 albums on these bad boys, on track to pass 1,000 by the end of the year. Of those, I’ve given 76 the coveted Full Moon.
Up until now, my album list has existed purely for my own record. Notion has proven great for keeping a giant list of albums, letting me sort and filter the list by artist, date, or moon rating. It’s perfect for getting a glance at which Beatles albums I like the most, or how many albums I listened to in October. But I’m not really doing anything with all this data.
It’s about time for that to change. I’m not releasing the entire list, because no one cares what albums I didn’t really like. But I do think there’s value in sharing my favorites. These are the ones I keep coming back to, the masterpieces, the S-tier, the all bangers, no skips, all bops, no flops, all killer, no filler:
Take these as my recommendations, and let me know if you find something in there you enjoy! Just like my list in Notion, you can sort the Full Moon Albums in various orders and filter them by genre and vibes. After curating this list, it feels like a well-rounded portrait of my taste in music. Of course, my taste is just my opinion, but I think there’s something for everyone in there.
I made the site in TypeScript with React and SCSS, along with some Python preprocessing that uses the Spotify API to fetch the cover art and Spotify link for each album. I used Vite to build the project, which I highly recommend for TypeScript React apps. For the curious, all the source code is here.
As my album list barrels into quadruple digits, I plan on updating Full Moon Albums regularly. Discovering new music is one of my greatest joys, and I’m hoping Full Moon Albums can share some of that joy with the world!
Dec 25, 2022 update:
Today I logged album #1000, which was New York–London–Paris–Munich by M, one of my dad’s favorites that I thought was apropos for the milestone. I gave it a 🌖.
For the uninitiated, here’s how it works. Players take turns taking trick shots, and if one player makes a shot, the other player has to match it. If the other player misses, they gain a letter in the word H-O-R-S-E, starting with H. If you miss five shots that your opponent made and spell the whole word, you lose! It’s a timeless classic, enjoyed by everyone from middle schoolers to NBA All-Stars to Obama.
There’s nothing special about horses.[citation needed] You could just as easily play Z-E-B-R-A or H-U-M-A-N, and the game would play out exactly the same. Players at home might be familiar with P-I-G, a game that’s identical to H-O-R-S-E, except you lose after three letters instead of five. This is perfect if you’re in a time crunch, or you otherwise lack the commitment to play for five whole letters.
But what if you’re a real power player? What if you have all the time in the world? What if you want to play H-O-R-S-E, but make it even longer?
What’s the longest animal name you can play H-O-R-S-E with?
You can’t play H-I-P-P-O, for example. If a player gains a P, how do you know if they missed their third or fourth shot? In H-O-R-S-E, you can describe a game’s state just by saying “Alice is at R, Bob is at S.” But in H-I-P-P-O, you’d have to say “Alice is at H-I-P, Bob is at H-I-P-P.” That’s way too much work, and it’ll get really tiring for even longer animal names.
So we have a rule: no repeated letters allowed. In other words, the animal’s name has to be an isogram, a word where every letter appears exactly once (alternatively known as a heterogram, but isogram wins because it’s one of those nice self-descriptive wordplay terms—isogram is itself an isogram).
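In code, the rule is a one-liner. Here is a minimal Python sketch (the function name is my own) that ignores case, spaces, and punctuation, so multi-word names can qualify too:

```python
def is_isogram(name):
    """True if no letter repeats; case, spaces, and punctuation are ignored."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    return len(letters) == len(set(letters))


for candidate in ("horse", "hippo", "isogram"):
    print(candidate, is_isogram(candidate))
```

Counting only letters is a judgment call: it means a space or an apostrophe doesn’t disqualify a name, which matches how you’d actually call out letters during a game.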
That means our question boils down to: what’s the longest animal name that’s an isogram?
I posed this question to my friends while we were waiting for our pub trivia sheet to be graded, and we contemplated it for a solid 15 minutes. The longest one we thought of that night at Murphy’s Pub was the 10-letter ANGLERFISH (hat tip to my friend Aakash for that). That would make for one long game of H-O-R-S-E!
But we can do better. It’s surprisingly hard to think of isograms, let alone ones longer than 10 letters. That’s where computers come in.
It seems like no one has tried to solve this problem before, possibly because it bears no societal implications.
Isograms are rare, especially long ones. As words get longer, they contain more letters,[citation needed] so there are fewer possible letters to add that aren’t already in the word somewhere. This means the probability that a randomly spelled word is an isogram decays even faster than exponentially as its length increases, hitting zero past 26 letters. (Incidentally, this is just like the birthday problem, which asks about the probability of a random group of people having any shared birthdays. Except here, the people come from a planet with 26-day years.)
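For the randomly spelled case, the probability can be worked out directly: each new letter has to dodge all the letters already used. A minimal Python sketch, assuming every letter is drawn uniformly and independently from a 26-letter alphabet (which real English certainly is not):

```python
def random_isogram_probability(length, alphabet_size=26):
    """Probability that `length` uniformly random letters are all distinct."""
    probability = 1.0
    for already_used in range(length):
        # The next letter must avoid every letter used so far.
        probability *= (alphabet_size - already_used) / alphabet_size
    return probability


for n in (5, 10, 15, 27):
    print(n, random_isogram_probability(n))
```

Setting `alphabet_size=365` turns the same function into the classic birthday problem.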
English words aren’t spelled with random letters, but they still more or less follow the same isogram distribution. If we look at a list of English words, we can see the steep falloff in isograms as word length increases. About 66% of five-letter words are isograms, versus about 5% of ten-letter words and 0.06% of fifteen-letter words—the only two being DERMATOGLYPHICS and UNCOPYRIGHTABLE—and there are none any longer:
Of course, we’re not looking for any old isogram, we want animal isograms! Rather than an English dictionary, we need a list of animal names. In particular, we want a massive list of animal names. More animals means more chances for rare long isograms to appear.
Once we have a list, finding its longest isogram is actually the easy part. We can just use Wordlisted, a webapp I made that lets you upload any wordlist, search it for words with various features (one of which is isograms, conveniently), and sort the results by length. Thanks, past me!
So here’s the plan: acquire a comprehensive list of animal names, and throw it into Wordlisted to find its longest isogram. Easy, right?
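For anyone following along without Wordlisted, the same search can be sketched in a few lines of Python. This is a stand-in for illustration, not Wordlisted’s actual code, and the helper names and sample list are hypothetical.

```python
def letters_only(name):
    """Lowercased letters of a name, dropping spaces and punctuation."""
    return [ch for ch in name.lower() if ch.isalpha()]


def longest_isograms(names):
    """Return the isogram names with the most letters (ties included)."""
    isograms = [
        n for n in names
        if len(letters_only(n)) == len(set(letters_only(n)))
    ]
    if not isograms:
        return []
    best = max(len(letters_only(n)) for n in isograms)
    return [n for n in isograms if len(letters_only(n)) == best]


# Toy list for illustration only.
sample = ["horse", "hippo", "anglerfish", "zebra", "narwhal"]
print(longest_isograms(sample))
```

Returning every tie instead of a single winner is deliberate, since two animals of the same length are equally good for a game.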
But uh, where do you get a comprehensive list of animal names?
If you google “list of animals,” one of the top hits is the aptly named website A-Z Animals. It’s a big honkin’ list of animals ranging from aardvark to zuchon, with fun facts about each one. This seems like a good starting place.
So I whipped up a Python script to scrape the A-Z Animals list using Beautiful Soup, and before I knew it I had a txt file of 2,093 animal names. Maybe not comprehensive, but pretty good. I tossed it into Wordlisted, and out popped a 12-letter isogram, the new front runner for longest H-O-R-S-E animal:
BREDL’S PYTHON
from A-Z Animals
How fitting! Python in, python out. Bredl’s python is a python species native to Australia, named after Australian snake guy Josef Bredl. According to A-Z Animals, “these snakes love to climb trees, and young snakes often hide high in the branches.”
That’s progress! But we can do better.
My next order of business was to consult wordplay programming maven Alex Boisvert of Crossword Nexus fame. I asked him the question at hand: what’s the best way to scrape a massive list of animal names? Spitballing, I suggested I could try WordNet, an enormous database of lexical items tied together with synonyms and other relations.
As it turned out, one of WordNet’s relations is hyponyms, words that describe a subset of other words—for example, square is a hyponym of rectangle, and waffle is a hyponym of food. That means finding a list of animals was as simple as finding every word in WordNet that’s a hyponym of animal. Naturally, Alex had a chunk of Python code sitting around that did exactly that: find every word in WordNet that’s a hyponym of some other word.
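To make the hyponym idea concrete, here's a rough Python sketch that collects every hyponym of a word, including hyponyms of hyponyms. To be clear, this is not Alex's code and it doesn't touch WordNet at all: the `HYPONYMS` dictionary is a tiny made-up graph, and `all_hyponyms` is a hypothetical helper that just walks it.

```python
# Toy hyponym graph: each key maps to its direct hyponyms (more specific terms).
# These entries are invented for illustration and are not taken from WordNet.
HYPONYMS = {
    "animal": ["mammal", "reptile"],
    "mammal": ["aardvark", "narwhal"],
    "reptile": ["python"],
    "food": ["waffle"],
}


def all_hyponyms(word, graph):
    """Collect every term reachable from `word` by following hyponym links."""
    found = set()
    stack = list(graph.get(word, []))
    while stack:
        term = stack.pop()
        if term in found:
            continue  # already visited, so don't walk it again
        found.add(term)
        stack.extend(graph.get(term, []))
    return sorted(found)


print(all_hyponyms("animal", HYPONYMS))
# ['aardvark', 'mammal', 'narwhal', 'python', 'reptile']
```

Notice that the result mixes actual animals with in-between categories like "mammal," which hints at why a list built this way needs some cleanup afterward.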
After a few minutes of figuring out how to download WordNet (it’s pretty easy with NLTK), a few seconds of running Alex’s code, and a few minutes of cleaning up the resulting data, I now had a txt file of 7,262 animals. A marked improvement over A-Z Animals!
With more animals came more isograms, and with more isograms came a new longest isogram. Introducing an animal you can play a 13-letter game of H-O-R-S-E with:
JUNCO HYEMALIS
from Wikimedia Commons
Junco hyemalis is the scientific name of the dark-eyed junco, a species of sparrow native to North America. That’s right, WordNet has both common names and scientific names. That feels fine to me, since this is all in the name of science.
It just so happens that Junco hyemalis contains exactly one of each vowel, making it supervocalic (another self-descriptive wordplay term). It even has a Y, making it euryvocalic (another!). In fact, you’ll see that long isograms tend to be super- or euryvocalic. As words get longer, they need more vowels, and since isograms can’t repeat old vowels, they’ll eventually need to use all five or six.
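Both properties are easy to check by counting vowels. Here's a small Python sketch, assuming the definitions used above: supervocalic means exactly one each of A, E, I, O, and U, and euryvocalic means exactly one Y on top of that. The function names are my own invention.

```python
from collections import Counter


def vowel_counts(name, vowels):
    """Count how many times each of the given vowels appears in the name."""
    counts = Counter(ch for ch in name.lower() if ch.isalpha())
    return [counts[v] for v in vowels]


def is_supervocalic(name):
    """True if A, E, I, O, and U each appear exactly once."""
    return all(count == 1 for count in vowel_counts(name, "aeiou"))


def is_euryvocalic(name):
    """True if A, E, I, O, U, and Y each appear exactly once."""
    return all(count == 1 for count in vowel_counts(name, "aeiouy"))


for name in ["Junco hyemalis", "Bredl's python", "supervocalic", "euryvocalic"]:
    print(name, is_supervocalic(name), is_euryvocalic(name))
```

Running it confirms that both wordplay terms really are self-descriptive, while Bredl's python falls short on vowels.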
Unlike WordNet, A-Z Animals doesn’t list scientific names like Junco hyemalis, but it does list the dark-eyed junco! Their fun fact about our favorite euryvocalic isogram bird is: “they are called snowbirds because many subspecies reappear in the winter.”
Lovely! But we can do better.
I figured there had to be some way to scrape every animal from either Wikipedia or its sister database Wikidata.
So I reached out to Lucas Werkmeister, a Wikidata developer who I found through some shenanigans he did with depths of wikipedia. I asked him my question, and soon enough he sent me this query written in SPARQL (pronounced “sparkle”) that scours Wikidata for every species in the animal kingdom along with its taxon name (Wikidata’s term for scientific name). I tried running his query in the Wikidata Query Service, but the request timed out. This was both a problem, since I couldn’t get the results, and a good sign, since that meant there were a ton of results.
Thankfully, Lucas pointed me toward a Mastodon post that addressed this very issue. One reply to this post led me to QLever (pronounced “clever”), an open-source SPARQL engine developed at the University of Freiburg in Germany. Not only did QLever run Lucas’s Wikidata query without timing out, it ran it in like 2 seconds.
After downloading the data from QLever and cleaning it up in Python, I suddenly had a txt file of over 1,700,000 animal names, ranging from the absurdly-named beetle Aaaaba nodosus to the sponge-dwelling cnidarian Zyzzyzus warreni. Take that, A-Z Animals! Like WordNet, Wikidata stores both common names and scientific names, but unlike WordNet, Wikidata has several orders of magnitude more animals.
Plugging a list of 1,700,000 animals into Wordlisted was a real moment of truth, both as a search for isograms and as a stress test for Wordlisted. But this was a success on all counts, yielding a result in a league of its own. Against all odds, here’s a 16-letter isogram animal:
HABRONYX FULVIPES
Habronyx fulvipes is a species of wasp first described in 1965 by Henry Keith Townes, Setsuya Momoi, and Marjorie Townes. It has no Wikipedia page in English, but it does have one in Dutch and a few other languages. It also has no A-Z Animals page, which means I can’t tell you their fun fact, so instead I’ll come up with my own. My fun fact is, “its species name fulvipes is Latin for yellow legs, which is a reference to the wasp’s yellow legs.”
More importantly, Habronyx fulvipes is an ultra-rare 16-letter isogram (that also happens to be euryvocalic), longer than any isogram in the English dictionary and more than three times the length of H-O-R-S-E. For all intents and purposes, this is the answer to our question. But here are some honorable mentions:
Hopefully that should give you plenty of options. A little Wikidata query will go a long way!
Society has progressed past the need for H-O-R-S-E. Soon enough, all the cool kids will be on the court, sinking trickshot after trickshot, playing H-A-B-R-O-N-Y-X-F-U-L-V-I-P-E-S.
Naturally, your first guess is 50, since you’re trying to cut the possibilities perfectly in half. I tell you it’s higher, so you guess 75. I tell you it’s lower. You keep splitting the possibilities in half, until eventually you’ve narrowed it down to one number (it was 59, good job). With this strategy, it turns out you can figure out any number from 1 to 100 in at most seven guesses. Or any number from 1 to a million in at most twenty.
In computer science, we’ve got a name for this narrowing-down procedure: binary search. As long as the possibilities are sorted in some way (like numbers in numerical order), we can home in on one at lightning speed just by repeatedly cutting the possibilities in half.
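Here's a rough Python sketch of that guessing strategy. It assumes the guesser always picks the midpoint of the remaining range (rounding down) and that the answerer replies "higher," "lower," or "correct." The function name is just for illustration.

```python
def guesses_needed(secret, low, high):
    """Count the guesses binary search takes to land on `secret` in [low, high]."""
    guesses = 0
    while low <= high:
        guess = (low + high) // 2  # split the remaining range in half
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            low = guess + 1   # answer was "higher"
        else:
            high = guess - 1  # answer was "lower"
    raise ValueError("secret is outside the starting range")


print(guesses_needed(59, 1, 100))  # 5 (the guesses are 50, 75, 62, 56, 59)

# Worst case over every possible secret from 1 to 100.
print(max(guesses_needed(s, 1, 100) for s in range(1, 101)))  # 7
```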
But guessing numbers isn’t very interesting. What if we could generalize binary search to more than just numbers—what if we could binary search through all of the things?
As it turns out, you can. It’s a little game called Twenty Questions.
We’ve all played it. At the core of Twenty Questions, you have an answerer and one or more guessers. The answerer thinks of a thing, and the guessers have to deduce that thing purely by asking the answerer yes–no questions. Traditionally, the guessers have a whopping twenty questions to guess the thing, otherwise they lose.
As a game, this doesn’t really seem fair. If the answerer is motivated to win, they can just think of a thing so obscure that the guessers couldn’t possibly guess it in twenty questions. And that’s super easy to do, since twenty questions isn’t enough to guess almost anything!
To understand why, we have to do some math. Asking a yes–no question is just like asking if a number is higher or lower: it gives you a bit of information that cuts the possibilities into two groups. That means two yes–no questions create four groups, three questions create eight groups, and in general, n yes–no questions create 2^n groups. In particular, twenty questions is enough to distinguish 2^20, or 1,048,576, types of things, which is exactly why you can deduce any number from 1 to a million in twenty guesses.
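If you'd like to check that arithmetic yourself, here's a quick Python sketch. It assumes the best case, where every yes–no question splits the remaining possibilities as evenly as possible, and `questions_needed` is a made-up helper name.

```python
def questions_needed(num_things):
    """Fewest yes-no questions that can tell `num_things` possibilities apart."""
    if num_things < 1:
        raise ValueError("need at least one possibility")
    # (n - 1).bit_length() is the smallest q with 2**q >= n, in exact integers.
    return (num_things - 1).bit_length()


print(2 ** 20)                      # 1048576
print(questions_needed(1_000_000))  # 20
print(questions_needed(1_048_576))  # 20
print(questions_needed(1_048_577))  # 21
```

One extra possibility past 1,048,576 is all it takes to need a twenty-first question.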
But in Twenty Questions, the answerer doesn’t have to think of a number from 1 to a million, they can think of anything. And, well, there are way more than 1,048,576 things. For example, numbers are things, and we can conceive of way more than a million numbers (like negatives, or fractions, or 1,048,577), so twenty questions isn’t even enough for numbers. But chances are the answerer isn’t thinking of a number. What about every possible word, or location, or event, or type of pasta? It doesn’t take long to realize that the number of conceivable things is infinite. As John Green (or Georg Cantor) taught us, some infinities are bigger than others—and the number of things is a really big infinity.
You might be thinking, wait, if there are infinitely many things to narrow down, couldn’t the game take infinitely long? Theoretically, maybe, but in practice, no. This is kind of a paradox. Even though we can conceive of infinitely many things, any particular thing will be guessable in a finite number of questions. After all, the answerer can only reach so far into the infinite depths of the universe before they decide on a thing, and any particular thing will be expressible in a finite number of words. Now it might not take twenty questions (in fact, it almost always won’t), but the game will end eventually.
How would I know? Throughout my life, I’ve played a lot of Twenty Questions, whether it was with my cabinmates at overnight camp, with my friends in high school and college, or with random people on Sporcle’s Twenty Questions forum. Even with infinite questions, the games always find a way to end—everyone still calls it Twenty Questions, though, for old time’s sake. The way I see it, playing with no question cap distills the game to its purest form. Besides, it’s far more satisfying for everyone if the guessers can reach that “aha!” moment, rather than having the game be artificially cut short after a measly twenty questions.
But wait a second, if there’s no question cap, how is this even a game? There isn’t really a winner or a loser, unless the guessers give up, in which case the answerer is disappointed and everyone loses.
That’s because Twenty Questions is a cooperative, asymmetric game. It’s cooperative because everyone is ultimately working toward a common goal: deducing the answer. And it’s asymmetric because different players have different amounts of information: the answerer knows the thing, but the guessers don’t, and they gain more information about the thing as they ask more questions. Information asymmetry is unstable and makes us uncomfortable, but if the guessers figure out the thing, the asymmetry is eliminated and everyone’s happy. So how can they get there in as few questions as possible?
This goes back to binary search. Remember how if I’m thinking of a number from 1 to 100, you wanted to guess 50 to split the options perfectly in half? If you had instead guessed 90, the majority of the answers would be lower, so most of the time I’d say “lower” and you’d still have 89 numbers to sift through. But by guessing 50, you minimize that majority, which ends up minimizing the expected number of remaining options. Splitting the options perfectly in half is the optimal strategy, ensuring that every guess gives you as much information as possible.
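We can put numbers on that. The Python sketch below computes the expected number of options left after a single guess, assuming every number from 1 to n is equally likely to be the secret and that a correct guess leaves zero options. The function name is mine, not a standard one.

```python
from fractions import Fraction


def expected_remaining(guess, n):
    """Expected number of candidates left after one guess in the 1-to-n game."""
    lower = guess - 1   # candidates left if the answer is "lower"
    higher = n - guess  # candidates left if the answer is "higher"
    # Each outcome happens with probability (its size) / n, and a correct
    # guess contributes nothing, so the expectation is a sum of squares over n.
    return Fraction(lower * lower + higher * higher, n)


print(float(expected_remaining(50, 100)))  # 49.01
print(float(expected_remaining(90, 100)))  # 80.21

# The guess with the smallest expected leftover (51 ties with 50).
print(min(range(1, 101), key=lambda g: expected_remaining(g, 100)))  # 50
```

So a guess of 90 leaves about 80 options on average, while a guess of 50 leaves about 49.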
The same logic applies to Twenty Questions. In a truly optimal strategy, each question would split the remaining things perfectly in half. Realistically, this is pretty infeasible, unless you have a perfect mental taxonomy of everything in existence. But you can still use this as a heuristic to guide your question-choosing strategy.
For one, don’t start guessing individual things (like asking “is it a toothbrush?”) until you’ve already narrowed things down significantly. If the guess is right, then it’s your lucky day, but if not, it makes essentially no progress and leaves you in almost exactly the same place as you were before. It’s like playing the 1 to 100 game and guessing 100 first.
With that in mind, we can start thinking about what questions are effective in a game of Twenty Questions. I’ll start with the first question I ask in every single game, a question so important that many other questions make no sense without answering it first, a question so ubiquitous that my friends and I started abbreviating it simply as “is it T?”:
“Is it tangible?”
This question crucially distinguishes two types of things: ones that exist in a physical form (like objects, places, and living things), and ones that don’t (like ideas, actions, and events). This is about as good of a 50/50 split as it gets—each subgroup is its own little infinity.
So now you know whether it’s tangible. Great! Now what? It’s time to narrow things down.
Tangible things are often easier to figure out. We have a strong intuition for how to categorize and compare physical objects, so we can more closely approximate binary search with them (remember, binary search only works on sorted data). But there are a lot of tangible things: infinitely many! Here are some questions that roughly halve that infinity into two smaller infinities:
“Is it alive?”
People like to think of animals, plants, and especially other people, and luckily scientists have gotten really good at classifying these. If it’s living, you can start deducing the thing taxonomically (“Is it an animal? Is it a mammal? Is it a member of the family Mustelidae?”). And if not, you’ve just cut out a ton of possibilities.
“Is it countable?”
This is a little more subtle. It basically asks if the answer is more of an uncountable “stuff” (like sand or grape juice) or a countable “thing” (like a bench or a fanny pack). Lots of questions only make sense for stuffs or for things, so this is a helpful distinction moving forward.
“Is there more than one of it?”
Hey, it’s a question that only makes sense for things, not stuffs! If it’s a yes, you can start to ask questions about how many there are or where they’re usually found (inside, outside, in the bathroom, on the wall, etc.). But if it’s a no, you can start narrowing down on the thing’s location, which is usually a straightforward path to the endgame.
“Is it bigger than a breadbox?”
This question has been known to prompt confusions like “well, it’s bigger in one axis, but smaller in another” or “wait, what’s a breadbox?” But this is an unironically great question. In practice, about half of the physical things we think of are bigger than a breadbox, so this question is useful to gauge whether we’re dealing with more of a fridge-sized thing or a baseball-sized thing. Either way, it’s such a classic question that it’s irresistible to ask.
“Is it made of a common material?”
From here, if it’s a yes, you can go on to deduce the actual material, which is good to know. This is one of many “meta-questions” that I’ve found surprisingly useful: questions that don’t carry much information themselves, but can help guide the guesser’s train of thought and open the door for more informative questions. Other questions of this kind include “is it usually a certain color?”, “is it typically used by a certain type of person?”, and even questions like “would it be helpful to ask about X?”
Intangible things can be harder to conceptualize and categorize, which often makes them trickier to figure out. But don’t panic! These can make for very interesting games, and they’re doable as long as the guessers have a decent plan of attack.
Here’s one bad question people always seem to be tempted to ask: “Is it an idea?” Really, basically everything is an idea, so this doesn’t actually give you any information. Instead, I recommend trying some of these questions:
“Can you observe it with any of the other four senses?”
Sure, you can’t touch it. But can you see it, hear it, smell it, or taste it? Okay, probably not smell or taste, unless the answerer is thinking of “umami” or something. But plenty of intangible things are visible or audible (like the color purple, or the Nike Swoosh, or “Mambo No. 5”), so this is useful information.
“Is it something that happens?”
A lot of intangible things are events, phenomena, or actions: things that happen! If this is the case, you can go on to figure out where, when, and other conditions under which it happens.
“Is it fictional?”
Fictional characters and things are a surprisingly tough type of answer to figure out, since it’s easy to go down a rabbit hole of intangibility before realizing it’s just a tangible thing in an intangible universe. It’s not a bad idea to ask this early on just to crack open this case.
“Would you learn about it in a particular class?”
This is probably my favorite meta-question for intangible things, because it helps you deduce so many different things. If the answer here is yes, you can go on to deduce the class, which can be a helpful frame of reference. This is huge for mathematical objects, scientific phenomena, historical events, and lots of other categories of things.
“Does it involve a particular tangible thing?”
Many intangible things can’t exist without something tangible. For example, tennis isn’t tangible, but it involves tennis balls and tennis rackets. A high five isn’t tangible, but it involves human hands. In a lot of cases, this question can reduce a hard intangible Twenty Questions game into a much easier tangible game. Once you figure out that tangible thing, all that’s left is to deduce how exactly it’s involved. Does it require more than one of them? Does the thing move? The world is your oyster.
Now we’ve talked a lot about the guessers’ strategy. But that’s just one side of the story! There’s also a surprising amount of strategy involved in being the answerer.
In many ways, the answerer is the leader of the Twenty Questions game. Like the creator of an escape room, or the dungeon master in Dungeons & Dragons, their ultimate goal is to make the game fair and fun for the players. The answerer has two main responsibilities: answering the questions, and coming up with a thing in the first place.
When it comes to answering questions, you might think the answerer’s role is pretty clear-cut: just say yes or no. But in practice, many yes–no questions are more accurately and fairly answered with more nuance than that. In cases where “yes” or “no” don’t tell the full story, it’s in everyone’s best interest to give answers like “it depends,” “usually,” “irrelevant,” or “I can’t answer that because your question entails a false assumption about the answer.”
I have to get this off my chest. Look, I love word games as much as anyone else. But at its core, Twenty Questions is a thing game, not a word game, and there are a few problems with treating it like a word game.
First off, if the answerer is just thinking of a word, then the guessers can just deduce the answer by rote binary search, asking “is the first letter in the first half of the alphabet?” and “is the first letter between H and M?” and so on. Now if this is your definition of fun, then have at it. But to me, these kinds of questions collapse the game into a triviality, eliminating any notion of strategy or creativity. If the guessers get so frustrated that they resort to these questions, I try not to answer them and instead steer them in the right direction.
But in most cases, it doesn’t even make sense to reduce the thing to the word that represents it. What we actually care about is the referent, the thing the word actually refers to, not the word itself. This is because one word can often have multiple possible referents, and the game falls apart unless the thing is just one of them. For instance, one of my friends once chose the thing “mark” (since my buddy Mark was right there). The thing is, “mark” can mean a lot of things—it has dozens of definitions listed on Merriam-Webster as a noun alone—and my friend wasn’t thinking of one in particular. This meant pretty much all his answers were wishy-washy and unclear, and it didn’t take long for us guessers to rage-quit.
Sometimes the distinction between meanings is more subtle, but equally game-ruining. Let’s say the answerer is thinking of McDonald’s. Now, is it tangible? The correct answer is: it depends. They might be thinking of a physical McDonald’s restaurant building, which is tangible, or they might be thinking of the corporate entity McDonald’s, which is intangible. It’s subtle, but the distinction is there. This is what linguists call polysemy, when one word has multiple meanings that are closely related, but distinct.
So, when you’re the answerer, make sure to think of a thing, not just a word. A neat side effect of this is that the guessers don’t have to name the thing verbatim, as long as it refers to the same thing. If the answerer thinks of a roundabout, the game still ends if someone guesses a “rotary” or a “traffic circle.”
Alright, the only exception here is if the answerer’s thing is literally a word, in which case the best way to deduce it is by first figuring out that it’s a word, then binary searching the alphabet. But I’ve tried this before and I can’t say I recommend it. You can imagine my friends’ excitement when they found out my thing was “the word because.”
Finally, this brings us to the strategy behind selecting a thing. There are a few factors to consider here.
If the answerer chooses a ludicrously obscure thing, the guessers will never figure it out, which is no fun. But if the thing is so straightforward that the guessers will quickly get it, there’s less of an “aha!” moment, which makes the game less satisfying. That means selecting a thing is a balancing act: think of something that’s a little out-there, but not too out-there.
In my experience, the most interesting things for an answerer to choose are ones that are well-known but difficult to categorize. Something like a person, food, animal, or location is a perfectly cromulent thing, but once the guessers figure out its category, deducing the thing suddenly becomes straightforward. But if the thing isn’t in a clear-cut category, it takes a little more mental gymnastics to get there.
Here’s an assortment of things I’ve found that strike that balance—familiar, but challenging to deduce—and make for a certifiably fun game of Twenty Questions:
Now it’s your turn! Go play Twenty Questions with your friends (but with infinite questions, of course), and give some of the above things a try or come up with your own. You’ll quickly be reminded how elegant the game is: few rules, zero equipment, and truly infinite replay value. It feels inevitable—if you restarted society from scratch, someone somewhere would eventually reinvent Twenty Questions. It’s what I like to call a “nothing game,” in that you need absolutely nothing to play it, just some friends and a little bit of patience.
I might go into Twenty Questions variants and other “nothing games” in future posts. But while you’re waiting, there’s no better way to pass the time than a good old fashioned game of Twenty Questions.