
How to Imitate Trump With Markov Chains

Published Aug 31, 2017 · Last updated Feb 27, 2018

Have you ever wanted to sound presidential? Today, we’ll learn how Markov chains can help us do that. We’ll be using data from many of Donald J. Trump’s speeches to help our program speak like him. Before we get started: This article is meant to be a lighthearted and practical introduction to Markov chains, not a heavy political commentary.


What is a Markov chain?

Markov chains were invented by a mathematician named Andrey Markov. They’re commonly used in probability to determine the chance an event will happen, given that another event already happened. Here’s a useful diagram to explain how they work:


[Diagram: a weather Markov chain with three states (cloudy, rain, sunny) and transition probabilities between them. Credit: http://techeffigytutorials.blogspot.com]

In this diagram, we see three possible events: cloudy, rain, and sunny. The diagram displays the probabilities of the next event, given that one of those three events happened. For example, if the weather today was cloudy, then we could use these probabilities to predict the weather for tomorrow. If we follow the arrows, we can see there’s a 10% chance it will be cloudy again, a 50% chance it will rain, and a 40% chance it will be sunny.

So we understand how Markov chains work, but how are they created? How do we know there’s a 10% chance it will be cloudy again? We find out by looking at prior events. For example, given a data set of past weather, we would look at what the weather was after every cloudy day. If the next day was cloudy 1 out of 10 times, then we can say there is a 10% chance the weather will be cloudy the next day. We repeat this for all variations of weather to build a full list, or chain, of probabilities.
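To make that concrete, here’s a small sketch (my own illustration, not from the original article) that derives those probabilities from a made-up record of daily weather:

```js
// A hypothetical record of observed weather, one entry per day.
var days = ["cloudy", "rain", "sunny", "cloudy", "cloudy",
            "rain", "sunny", "sunny", "cloudy", "rain"];

// Count how often each kind of weather follows each other kind.
var counts = {};
for (var i = 0; i < days.length - 1; i++) {
  var today = days[i], tomorrow = days[i + 1];
  counts[today] = counts[today] || {};
  counts[today][tomorrow] = (counts[today][tomorrow] || 0) + 1;
}

// Turn each row of counts into probabilities, e.g. "cloudy -> rain: 0.5".
for (var state in counts) {
  var total = 0;
  for (var next in counts[state]) {
    total += counts[state][next];
  }
  for (var next in counts[state]) {
    console.log(state + " -> " + next + ": " + counts[state][next] / total);
  }
}
```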


Adding in Some Context

We’re going to use a Markov chain to generate unique sentences that sound like they could come from Donald Trump. To do this, we’ll take the text of some speeches he has given and create a Markov chain from it. First, we’ll have to split the text into words, or as we’ll call them, tokens. These tokens will be the “events” in our Markov chain. Next, we’ll choose a word to start our chain. To get the next word, we find the probabilities of all the words that come after the current one, pick one, and repeat that process until we reach the end of a sentence.

For example, if our starting word is “I”, then we find all the occurrences of words following “I”. Maybe “will” comes after it 5 times, “am” comes after it 3 times, and “support” comes after it 2 times. In total, there are 10, so we can easily calculate percentages. After the word “I”, there is a 50% chance the next word is “will”, a 30% chance it is “am”, and a 20% chance it is “support”. We choose one of these words at random, weighted by those percentages.

After we choose the second word, we repeat that process to find the word that comes after that, and the word that comes after that, etc. We stop after we find a word that contains a period, and that means we have generated a sentence!

Adding in Some JavaScript

To make our lives easier, we’re going to use a library called nlp, which will help us split the text into tokens (words). Let’s start off by creating our HTML file and linking to the nlp library.

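The code sample originally embedded here is no longer available, so what follows is a minimal sketch of what that file could look like. The nlp library was published on npm as nlp_compromise at the time; the exact CDN URL and build path are my assumptions, so adjust them to wherever you host the script.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Trump Markov Chain</title>
  <!-- Assumption: a browser build of nlp_compromise served from a CDN -->
  <script src="https://unpkg.com/nlp_compromise/builds/nlp_compromise.min.js"></script>
</head>
<body>
  <script>
    // Our Markov chain code will live here.
  </script>
</body>
</html>
```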

Great! We’re all set up to start making some Markov chains. Let’s initialize the library and create some variables to hold all our text and tokens (we’ll focus on just the script tag for now).

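Again a reconstruction rather than the original: this sketch assumes the build registers a window.nlp_compromise global, and uses a short stand-in string where the article loaded a much larger trump_speeches corpus.

```js
// Grab the global the nlp_compromise build exposes (an assumption;
// check your build if nlp comes out undefined).
var nlp = window.nlp_compromise;

// Stand-in speech text; swap in the full speech data for real results.
var trump_speeches = "I will build a wall. I will win. " +
                     "I am going to make America great again.";

// Every word from the speeches, in order.
var tokens = [];
```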

Next, we’ll create a function to fill up the tokens variable with the words from trump_speeches.

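Here’s a sketch of that function, laid out so the walkthrough below still matches (in particular, the .terms() call lands on line 5). nlp.text(), .terms(), and .text are the calls the article names; the intermediate variable names are mine.

```js
function createTokens() {
  // Parse the raw speeches with the nlp library.
  var text = nlp.text(trump_speeches);
  // Split the parsed text into term (word) objects.
  var terms = text.terms();
  // Store each term's raw text in our tokens array.
  for (var i = 0; i < terms.length; i++) {
    tokens.push(terms[i].text);
  }
}
```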

Let’s break this down so we can see how this code works. First off, we create the nlp object that we need by calling nlp.text() on our data. The next step is to split this into tokens so we can use it to create our Markov chain. We do this on line 5 by calling the .terms() method on our object. Lastly, we loop through the term objects and push each one’s .text attribute onto our tokens array.

If you remember from the explanation above, we now have to choose a starting word. We’ll do this randomly by using Math.random().

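A sketch of that helper (the name chooseStartingToken comes from the wrap-up later in the article):

```js
function chooseStartingToken() {
  // Pick a random index into the token list and return that word.
  return tokens[Math.floor(Math.random() * tokens.length)];
}
```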

All that function does is choose a random item from our list of tokens. We’ll call this function later on when we want to create our sentence. For now, let’s move on to creating the actual Markov chain.

Adding in Some Probabilities

Now that we have our starting word, we need to find the next word. If you recall, we do this by checking all the occurrences of the starting word, and creating a list of words that follow it. We then select a random word from this list. We can do this using the code below.

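A reconstruction of findNextWord (the name appears in the wrap-up below), laid out so the line-by-line breakdown that follows still lines up:

```js
function findNextWord(currentWord) {
  // Every word that follows an occurrence of the current word.
  var nextWords = [];
  for (var i = 0; i < tokens.length - 1; i++) {
    if (tokens[i] === currentWord) {
      nextWords.push(tokens[i + 1]);
    }
  }

  // Duplicates in nextWords weight the pick toward frequent successors.
  return nextWords[Math.floor(Math.random() * nextWords.length)];
}
```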

Let’s break it down!

  • Line 3: Creates a list of words that come after the current word
  • Line 4: Loops through the tokens
  • Lines 5–6: If we find the current word, add the following word to nextWords
  • Lines 10–11: Return a random word from the nextWords list

You might be wondering why we didn’t calculate any probabilities. The reason is that we don’t have to! If a word shows up more than once after our current word, it gets added to the list more than once. More occurrences of a next word mean a higher chance that it will be selected when we return a random item from the list. Simple, isn’t it?

Putting it All Together

Let’s tie together all the functions we made in one final function.

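The original sample is gone; here’s a sketch of that wrap-up function built from the helpers above. The name generateSentence and the dead-end guard (for a word with no recorded successor) are my additions.

```js
function generateSentence() {
  createTokens();
  // Start from a random word and keep chaining until we hit a period.
  var word = chooseStartingToken();
  var sentence = word;
  while (word.indexOf(".") === -1) {
    var next = findNextWord(word);
    if (!next) { sentence += "."; break; } // dead end: no recorded successor
    word = next;
    sentence += " " + word;
  }
  // Output the generated sentence to the page.
  document.write(sentence);
}

generateSentence();
```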

Here we split the text into tokens using createTokens, choose a starting word with chooseStartingToken, and then call findNextWord repeatedly until we hit a period. We wrap it up by calling document.write() to output our text.

And that should do the trick! Here’s what all our code looks like:

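Reconstructed in full, combining the sketches above (the CDN path and the stand-in speech text remain assumptions):

```html
<!DOCTYPE html>
<html>
<head>
  <title>Trump Markov Chain</title>
  <!-- Assumption: a browser build of nlp_compromise served from a CDN -->
  <script src="https://unpkg.com/nlp_compromise/builds/nlp_compromise.min.js"></script>
</head>
<body>
  <script>
    var nlp = window.nlp_compromise;

    // Stand-in speech text; swap in the full speech data for real results.
    var trump_speeches = "I will build a wall. I will win. " +
                         "I am going to make America great again.";
    var tokens = [];

    function createTokens() {
      var text = nlp.text(trump_speeches);
      var terms = text.terms();
      for (var i = 0; i < terms.length; i++) {
        tokens.push(terms[i].text);
      }
    }

    function chooseStartingToken() {
      return tokens[Math.floor(Math.random() * tokens.length)];
    }

    function findNextWord(currentWord) {
      var nextWords = [];
      for (var i = 0; i < tokens.length - 1; i++) {
        if (tokens[i] === currentWord) {
          nextWords.push(tokens[i + 1]);
        }
      }
      return nextWords[Math.floor(Math.random() * nextWords.length)];
    }

    function generateSentence() {
      createTokens();
      var word = chooseStartingToken();
      var sentence = word;
      while (word.indexOf(".") === -1) {
        var next = findNextWord(word);
        if (!next) { sentence += "."; break; } // no recorded successor
        word = next;
        sentence += " " + word;
      }
      document.write(sentence);
    }

    generateSentence();
  </script>
</body>
</html>
```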

And there you have it. Open that file in any browser and see what it writes! Most of the time it will be gibberish, because our program has no concept of grammar. However, many of Trump’s real quotes are hard to understand as well. Here are some of my favorites that the program generated:

He said, “You know, we’re illegal executive order on the candidates, they make it for president.

Tiffany, Evanka did a lot of Common Core.

We will do very, very well.


If you enjoyed reading this article, make sure to check out another one I wrote on Machine Learning with Python.
