How I made a talking emoji using regular emojis and JavaScript

Published Dec 18, 2017. Last updated Jan 20, 2018
arnolod ipsum.gif

Today, while I was working, someone sent an interesting little script in a chat group:
Original script on jsbin. The original code was written by Martin Kleppe; kudos to him for the short implementation.

setInterval(_=>{
  document.body.innerHTML = "<h1>" + [
    ..."😮😀😁😐😑😬"
  ][new Date%6]
},99)

The code is very simple, but the effect is very interesting. The emoji looks like it is talking.

Now let's see what the code does. The first part of the code, [ ..."😮😀😁😐😑😬" ], uses the spread syntax to turn the string of emojis into an array. This way, we can select a single element from that array.

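You are probably wondering why we transform a string into an array to select a character from it. This is because an emoji isn't a single character to JavaScript. Strings are indexed by 16-bit UTF-16 code units, and each of these emojis takes two of them (a surrogate pair). Encoded as UTF-8, the same emoji takes four bytes, for example "\xF0\x9F\x98\x81" for 😁.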

With this technique, it is possible to make the JavaScript engine preserve the structure of the Unicode character and split the list of emojis in the right way.

[ ..."😮😀😁😐😑😬" ][0] // => 😮
"😮😀😁😐😑😬"[0] // => � (55357) wrong value
"😮😀😁😐😑😬".codePointAt(0) // => 128558 correct value
String.fromCodePoint("😮😀😁😐😑😬".codePointAt(0)) // => 😮

As you can see, we have two valid ways to access the right emoji, and the array one is a lot shorter and easier to remember.

The array method works thanks to the iterator implementation of the String object. Instead of stepping through every UTF-16 code unit like ""[n] does, it iterates over code points, so each of these emojis comes out whole.
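To make the difference concrete, here is a small sketch that compares index access with iteration. The variable names are only illustrative.

```javascript
const faces = "😮😀😁😐😑😬";

// .length and [n] count UTF-16 code units: two per emoji here.
const codeUnits = faces.length;

// Spread, Array.from and for...of all use the string iterator,
// which walks code points and keeps each surrogate pair together.
const viaSpread = [...faces];
const viaArrayFrom = Array.from(faces);

const viaLoop = [];
for (const face of faces) {
  viaLoop.push(face);
}

console.log(codeUnits);                  // 12
console.log(viaSpread.length);           // 6
console.log(viaArrayFrom[1]);            // 😀
console.log(viaLoop.join("") === faces); // true

// Index access only sees the first half of the surrogate pair:
console.log(faces.charCodeAt(0));        // 55357
```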

Now that the string is an array of well-separated emojis, we can select them one by one to get a working emoji. The emoji is selected using [new Date%6]. The % operator converts the Date to a number, the current time in milliseconds, so the expression yields a different index from one millisecond to the next, and the %6 makes sure that we don't go out of bounds.

4%6 == 4
5%6 == 5
6%6 == 0
7%6 == 1
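The index expression relies on % turning its Date operand into a number of milliseconds. A minimal sketch with a fixed timestamp (picked arbitrarily so the result is the same on every run):

```javascript
// A fixed moment instead of "now", so the output does not change between runs.
const when = new Date(1513376683300);

// Arithmetic operators convert a Date to its millisecond timestamp.
const asNumber = +when;

// The remainder is always between 0 and 5, a valid index into six emojis.
const index = when % 6;

console.log(asNumber); // 1513376683300
console.log(index);    // 4
```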

Finally, the document.body.innerHTML = "<h1>" + sets the emoji as the only content of the page.

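But, as you can see, the emoji doesn't change that fast: it doesn't change every millisecond, yet it still follows a pattern.

This is because

setInterval(_=>{
},99)

executes the function roughly every 99 milliseconds, so the timestamp grows by about 99 between calls and the index is the remainder of that growing number. How the remainder moves depends on the divisor: 99 % 6 is 3, so with perfectly regular ticks the index would jump three places each time, and it is the small drift of the timer that makes it wander through the other faces. The step is easiest to see with %10, where adding 99 is the same as going back by one: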

x = 0
x += 99 // === 99
x %= 10 // === 9
x += 99 // === 108
x %= 10 // === 8
x += 99 // === 107
x %= 10 // === 7
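The same stepping can be simulated without a timer. This sketch assumes idealized ticks exactly 99 ms apart, which a real setInterval does not guarantee; remaindersAt is a name made up for the example.

```javascript
// Remainders seen on the first `count` ticks of an ideal timer.
function remaindersAt(interval, divisor, count) {
  const result = [];
  for (let tick = 0; tick < count; tick++) {
    result.push((tick * interval) % divisor);
  }
  return result;
}

console.log(remaindersAt(99, 10, 5)); // [ 0, 9, 8, 7, 6 ]
console.log(remaindersAt(99, 6, 5));  // [ 0, 3, 0, 3, 0 ]
```

With a divisor of 10 every tick goes back one step. With a divisor of 6 an ideal timer would only alternate between two values, so any other face on screen has to come from ticks arriving a little early or late.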

Now that we understand how that code works, we can start extending it. The first feature that comes to mind is to make some text appear below the emoji:
http://jsbin.com/sohirid/1/edit?js,output.

const message = "Hello!";
const delay = 200;
setInterval(_=>{
  document.body.innerHTML = "<h1>" + [
    ..."😮😀😁😐😑😬"
  ][Math.floor(new Date / delay)%6] + "</h1><h2>" + message.substr(0, Math.floor(new Date / delay)%(message.length+1))
},delay)

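The previous code did not step through the emojis in order. I needed to avoid that, so I need a number that increases by one on every tick. To obtain that, I use Math.floor(new Date / delay). Instead of returning the current time in milliseconds, it returns the number of delay-sized intervals elapsed so far. The log below was taken with delay = 100, so it counts tenths of a second.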

setInterval(_=>{
console.log(Math.floor(new Date / delay))
},delay)
// 15133766833
// 15133766834
// 15133766835
// 15133766836
// ...

Now that the number is progressive, I just need to limit it to a range. For the emojis, I need to stick to the number of emojis (6), and for the text, I need the length of the text + 1 (I use +1 so that the last character is shown too; remember that length % length == 0).
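Pulling the time out into a parameter makes the two ranges easy to check. This is a sketch, not the code from the jsbin; frameAt and its parameters are names invented for the example.

```javascript
function frameAt(timestamp, message, delay, emojiCount) {
  // Whole number of delay-sized steps since the epoch; grows by one per step.
  const tick = Math.floor(timestamp / delay);
  return {
    emojiIndex: tick % emojiCount,
    // The extra +1 gives one step on which the full message is visible.
    visibleText: message.slice(0, tick % (message.length + 1)),
  };
}

console.log(frameAt(1000, "Hello!", 200, 6)); // { emojiIndex: 5, visibleText: 'Hello' }
console.log(frameAt(1200, "Hello!", 200, 6)); // { emojiIndex: 0, visibleText: 'Hello!' }
console.log(frameAt(1400, "Hello!", 200, 6)); // { emojiIndex: 1, visibleText: '' }
```

Without the +1, the text would wrap back to empty right after "Hello" and the exclamation mark would never be shown.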

The result is kind of nice: the emoji moves its mouth and text appears below. But the movement of the mouth is totally unrelated to the text. I've been watching dubbed movies since I was born, and the badly dubbed ones have always irritated me, so I have to do something to make that emoji animate in a better way.

The first thing I searched for was an image that illustrates the various movements of the mouth for each letter. After Googling for five minutes, I finally found the right image:

31aa050c23bc12b5c9c5aba749b34a41--animation-tools-animation-reference.jpg

Time to map the various emojis to each letter. For that, I need a nice page with all of the emojis that I can possibly need: https://emojipedia.org/apple/.

After selecting one for each unique mouth movement in the picture, I proceed to create the map:

const emojiMap = {
  "😮": ["o", "e"],
  "😐": ["b", "p", "m"],
  "🙂": ["c", "g", "j", "k", "n", "r", "s", "t", "v", "x", "z"],
  "😲": ["d", "l"],
  "😯": ["q", "u", "w", "y"],
  "😀": ["a", "i"]
}
const defaultEmoji = "😐"

The default emoji covers non-letter characters (" ", "!", "," ...). Now I just need to change the code to retrieve the right emoji.

http://jsbin.com/babivo/1/edit?js,output

setInterval(_=>{
  const character = message.toLowerCase()[Math.floor(new Date / delay)%(message.length+1)]
  document.body.innerHTML = "<h1>" + (Object.keys(emojiMap).find(emoji => emojiMap[emoji].includes(character)) || defaultEmoji) +
    "</h1><h2>" + message.substr(0, Math.floor(new Date / delay)%(message.length+1))
},delay)

This new code is divided into two parts:

  1. Finding the character the emoji is currently pronouncing:
const character = message.toLowerCase()[Math.floor(new Date / delay)%(message.length+1)]

The code is simple, because we are using the same code logic that shows the message but selecting only one character. The main difference is the message.toLowerCase(), because I need it to be case insensitive when I'm checking if the character matches with the ones in my emoji map.

  2. Selecting the right emoji:
(Object.keys(emojiMap).find(emoji => emojiMap[emoji].includes(character)) || defaultEmoji)

This code first turns the emoji map into an array of its keys, the emojis. This way, I can go through the emojis one by one and check the letters stored under each. For example:

const character = "n"
["o", "e"].includes("n") // false
["b", "p", "m"].includes("n") // false
["c", "g", "j", "k", "n", "r", "s", "t", "v", "x", "z"].includes("n") // true
["d", "l"].includes("n") // false
["q", "u", "w", "y"].includes("n") // false
["a", "i"].includes("n") // false

The find function will check the values one by one until the function I execute for that value returns true. In our case, the function I'm using is emoji => emojiMap[emoji].includes(character). That simply checks if the set of characters for that emoji includes the character I'm searching for.

If find doesn't find anything, it returns undefined, which is a falsy value. By using || defaultEmoji, I can make my code return defaultEmoji when find doesn't find anything.
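Wrapped in a function, the lookup can be tried on its own. The sketch below reuses the map from above; emojiFor is a name chosen for the example.

```javascript
const emojiMap = {
  "😮": ["o", "e"],
  "😐": ["b", "p", "m"],
  "🙂": ["c", "g", "j", "k", "n", "r", "s", "t", "v", "x", "z"],
  "😲": ["d", "l"],
  "😯": ["q", "u", "w", "y"],
  "😀": ["a", "i"]
};
const defaultEmoji = "😐";

function emojiFor(character) {
  // find returns the first key whose letter list contains the character,
  // or undefined, in which case || falls back to the default.
  return Object.keys(emojiMap).find(emoji => emojiMap[emoji].includes(character)) || defaultEmoji;
}

console.log(emojiFor("n"));       // 🙂
console.log(emojiFor("!"));       // 😐
console.log(emojiFor(undefined)); // 😐
```

The last call matters in practice: when the index equals message.length, message[index] is undefined, and the fallback keeps the mouth closed instead of breaking the page.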

The functionality now works and the emoji speaks correctly, but on the phone (on Android) it looks terrible. I want it to work on mobile too, so I need to make the code return an emoji that looks the same on every platform. To do that, I'm going to use twemoji.

Twemoji is a library from Twitter to use their emojis everywhere. The library is simple to use: it has a method named parse that takes a string of text and returns a string of HTML where every emoji is an image.

http://jsbin.com/dolayav/1/edit?js,output

That is perfect for now, and very simple to implement. I first include the script in the page, and I make a small change to convert my emojis into images:

document.body.innerHTML = "<h1>" + twemoji.parse(Object.keys(emojiMap).find(emoji => emojiMap[emoji].includes(character)) || defaultEmoji)

The image is a little bit low quality. Maybe we can do something. Looking at the documentation, I notice I can use SVGs.

Let's use them:

http://jsbin.com/suroqa/1/edit

document.body.innerHTML = "<h1>" + twemoji.parse(Object.keys(emojiMap).find(emoji => emojiMap[emoji].includes(character)) || defaultEmoji, {
  folder: 'svg',
  ext: '.svg'
})

The emoji now looks fine on the mobile phone, but there's too much text. To solve that, I split the text I'm showing into words, and I show only the last two:

http://jsbin.com/yegisak/1/edit

The trick is simple. To split the words, I use .split(' '), which breaks a string into an array of strings at every space.

const words = message.substr(0, Math.floor(new Date / delay)%(message.length+1)).split(' ')

Then, I obtain the current word the emoji is saying using .pop()

const current = words.pop()
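Here is the same idea as a small function that returns the last two words of whatever text is visible so far. It is a sketch with made-up names (lastTwoWords, previous), and the || "" fallback is my addition: pop() returns undefined on an empty array, which would otherwise end up printed as "undefined".

```javascript
function lastTwoWords(visibleText) {
  // "Hello wor" -> ["Hello", "wor"]
  const words = visibleText.split(" ");
  // The word currently being "spoken" is the last one.
  const current = words.pop();
  // The word before it, or an empty string while the first word is typed.
  const previous = words.pop() || "";
  return { previous, current };
}

console.log(lastTwoWords("Hello wor"));     // { previous: 'Hello', current: 'wor' }
console.log(lastTwoWords("He"));            // { previous: '', current: 'He' }
console.log(lastTwoWords("one two three")); // { previous: 'two', current: 'three' }
```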

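I add everything to the page, popping out the word before the latest one (the || "" keeps the page from showing "undefined" while the first word is still being typed).

 document.body.innerHTML = "<h1>" + ... + 
    "</h1><h3>" + (words.pop() || "") + "</h3><h2>" + current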

Finally, the only minor thing left is that the image with the mouth positions shows not only single characters, but also combinations, like sh and th.

It would be amazing if the script could catch them.

https://jsbin.com/hejotaw/1/edit?js,output

To catch those two-character combinations, I need to look not only at the current character, but also at the previous one and the next one.

const character = message.toLowerCase()[index]
// slice(start, end) is used here; substr would read the second argument as a length
const previousDouble = index > 0 ? message.toLowerCase().slice(index - 1, index + 1) : ""
const nextDouble = message.toLowerCase().slice(index, index + 2)

The code I just wrote creates three variables: the current character, the current character together with the previous one, and the current character together with the next one.

const message = "arnold"

// with index = 1
// character -> "r"
// previousDouble -> "ar"
// nextDouble -> "rn"

// with index = 2
// character -> "n"
// previousDouble -> "rn"
// nextDouble -> "no"
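The three values can be checked with a small helper. One thing to watch: substr takes a start and a length, while slice takes a start and an end, so only the latter gives two-character pairs with these arguments. pairsAt is a name invented for this sketch.

```javascript
function pairsAt(message, index) {
  const lower = message.toLowerCase();
  return {
    character: lower[index],
    // slice(start, end) stops before `end`, so both pairs are two characters wide.
    previousDouble: index > 0 ? lower.slice(index - 1, index + 1) : "",
    nextDouble: lower.slice(index, index + 2),
  };
}

console.log(pairsAt("arnold", 1)); // { character: 'r', previousDouble: 'ar', nextDouble: 'rn' }
console.log(pairsAt("arnold", 2)); // { character: 'n', previousDouble: 'rn', nextDouble: 'no' }

// substr reads its second argument as a length, not an end position:
console.log("arnold".substr(1, 3)); // rno
console.log("arnold".slice(1, 3));  // rn
```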

Now that we have these three variables, we need to give priority to the pairs of characters. To do that, we first search for the previousDouble in the emoji map. If we find nothing, we search for the nextDouble and finally for the current character.

// emojis is assumed here to be Object.keys(emojiMap)
emojis.find(e => emojiMap[e].indexOf(previousDouble) !== -1)
|| emojis.find(e => emojiMap[e].indexOf(nextDouble) !== -1)
|| emojis.find(e => emojiMap[e].indexOf(character) !== -1)
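Put together, the priority lookup could look like the sketch below. The two-letter entries ("sh" and "th") and the emojis I picked for them are hypothetical, since the map shown earlier only has single letters, and pickEmoji and holds are names invented for the example. I also assume emojis is Object.keys(emojiMap) and keep the default emoji as the last fallback.

```javascript
const emojiMap = {
  "😮": ["o", "e"],
  "😐": ["b", "p", "m"],
  "🙂": ["c", "g", "j", "k", "n", "r", "s", "t", "v", "x", "z"],
  "😲": ["d", "l"],
  "😯": ["q", "u", "w", "y"],
  "😀": ["a", "i"],
  // Hypothetical entries for the example; the map above does not list them.
  "😬": ["sh"],
  "😛": ["th"]
};
const defaultEmoji = "😐";
const emojis = Object.keys(emojiMap);

function pickEmoji(message, index) {
  const lower = message.toLowerCase();
  const character = lower[index];
  const previousDouble = index > 0 ? lower.slice(index - 1, index + 1) : "";
  const nextDouble = lower.slice(index, index + 2);
  // First emoji whose list contains the value, or undefined.
  const holds = value => emojis.find(e => emojiMap[e].indexOf(value) !== -1);
  // Pairs win over the single character; the default covers everything else.
  return holds(previousDouble) || holds(nextDouble) || holds(character) || defaultEmoji;
}

console.log(pickEmoji("the", 0)); // 😛 ("th" starts here)
console.log(pickEmoji("the", 1)); // 😛 ("th" ends here)
console.log(pickEmoji("the", 2)); // 😮
console.log(pickEmoji("hi!", 2)); // 😐
```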

Finally, a little bit of cosmetic improvement and the talking emoji is done!

There's still a lot of room for improvement. Here are a couple of ideas if you want to extend this script further:

  1. Use the Web Speech API to make the emoji talk for real. This will bring several challenges, for example syncing the "lip movement" with the voice.
    Difficulty: 4/5
  2. Use D3 to animate the transition from one emoji to another, making it more natural and realistic.
    Difficulty: 3/5
  3. Stop using emojis and use more expressive images, and also recognise more letter combinations. This is easier because the code needs nearly zero changes.
    Difficulty: 1/5
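For the first idea, a possible starting point is the boundary event of the Web Speech API, which reports the character index of the word the synthesizer has reached. This is only a rough sketch: whether and how often boundary events fire depends on the browser and the voice, and speak and onWord are names invented here.

```javascript
function speak(message, onWord) {
  // Outside a browser (or without speech support) there is nothing to do.
  if (typeof speechSynthesis === "undefined" || typeof SpeechSynthesisUtterance === "undefined") {
    return false;
  }
  const utterance = new SpeechSynthesisUtterance(message);
  // charIndex is the position in `message` of the word about to be spoken.
  utterance.onboundary = event => onWord(event.charIndex);
  speechSynthesis.speak(utterance);
  return true;
}

const started = speak("Hello!", index => console.log("now at character", index));
console.log(started ? "speaking" : "speech synthesis not available");
```

The callback only fires at word boundaries, so the mouth movement between two boundaries would still have to be driven by a timer like the one used above.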
Discover and read more posts from Maurizio Carboni