Last December I participated in the 2014 London Music Hack Day. These events are always a lot of fun, lots of great people hacking on lots of great things, a list of which you can find on Hacker League

My hack for this edition was a rhyming Twitterbot (@RhymeBustingBot) made using the MusixMatch and Echonest API’s and the Natural Language Toolkit (NLTK) Python library. The bot was MusixMatch’s favourite hack and won their hack day prize (thanks guys!)

Here’s how it works:

  • Find two random rappers using the Echonest api, one based on familiarity and the other on “hottness” (this was meant to pitch a new school rapper against an old school one but in practice it turns out to be pretty random).
  • Find the lyrics to two random songs featuring those rappers using the Musixmatch api.
  • Using the NLTK library, more specifically the Carnegie Mellon Pronounciation dictionary, see if there is a line in the song by rapper a that rhymes with a line from the song by rapper b.
  • If any such rhymes are found, tweet the “best” lines with links to the original lyrics.

Here are two of my favorite rhymes that the bot has unearthed:

Pharcyde vs. House of Pain

ODB takes on the crown of the Fresh Prince

The bot was inspired by the ever awesome @Pentametron who’s writing the worlds longest poems using Tweets. Its rhyming skills are way better than the RhymeBustingBots, combine this with its use of meter and the random nature of the tweets and you get a pretty funny bot.

Finding rhymes

If you are technically minded you can take a look at the code that makes up the RhymeBustingBot on Github. Here’s a few words on how it works.

The NLTK includes the CMU pronouncing dictionary, and we use that to compare the pronounciations of the last word of each line. Starting from the end of each word, count back and see how the wowels and the consonants match up.

Let’s take a classic Biggie line to illustrate:

It was all a dream, I used to read Word Up! magazine
Salt-n-Pepa and Heavy D up in the limousine

Here’s the pronounciation of the two rhyming words according to the Carnegie Mellon dict

  • Magazine: [u’M’, u’AE1’, u’G’, u’AH0’, u’Z’, u’IY2’, u’N’]
  • Limousine: [u’L’, u’IH1’, u’M’, u’AH0’, u’Z’, u’IY2’, u’N’]

The letters outline the different sounds used and the numbers are used to indicate where the emphasis is placed. You can see how the n, iy, z and ah sounds match up in this instance and that the emphasis is the same on the wowels too. Based on this the rhyme finding algorithm assigns this rhyme a value of 4.

False positives!

That is all good and well, but the bot does have a problem with false positives.1 Like this tweet, where the bot seems to think “bacon” and “woman” make a good rhyme. Maybe a drawling southern rapper could sell that as a great rhyme but to me it doesn’t really hold. The bot however, sees both these words as ending with an AH-N sound and therefore as a rhyme with a rating of two, making it the best rhyme found in these two songs:

  • Bacon - [u’B’, u’EY1’, u’K’, u’AH0’, u’N’]
  • Woman - [u’W’, u’UH1’, u’M’, u’AH0’, u’N’]

I’m not sure how to improve on this, other than to start white- or blacklisting rhymes or provide the bot with some editorial oversight (which seems to defeat the whole point).

Regardless, putting this thing together was a fun project and got me a bit more familiar with natural language processing. Also, every once in a while, the bot stumbles upon a very random but good rhyme which always makes me chuckle.

  1. As I noted during my demo fail at MHD: “60% of the time it works every time” (a well placed Anchorman quote will get you out of almost any trouble)