Codementor Events

How and why I built Snapchat Filter System

Published Jul 23, 2019

About me

I am an experienced data scientist and web developer with over four years of experience. I specialize in using data analytics to help businesses make smarter decisions and scale. I also provide end-to-end website development services to help you establish an online presence, maximize your reach, and expand your customer base. Additionally, I can help you integrate AI services into your stack, regardless of the size of your organization. I embed modern, minimalistic UI design principles into my applications, which helps your customers associate your brand with quality.

The problem I wanted to solve

I had been reading about Facial Keypoint Regression during one of my courses on Coursera about computer vision. When I read about it, I was instantly able to relate it to the way Snapchat created its addictive filter system. So, I wanted to build my own version of Snapchat's filters using facial keypoint regression.

Tech stack

I downloaded a keypoint regression dataset from Kaggle and created a CNN-based model using Keras. Finally, I deployed the trained model using OpenCV.

The process of building Snapchat Filter System

The Dataset

The dataset I used is Facial Keypoints Detection, provided by Dr. Yoshua Bengio of the University of Montreal.
Each predicted keypoint is specified by an (x,y) real-valued pair in the space of pixel indices. There are 15 key points, which represent the different elements of the face. The input image is given in the last field of the data files, and consists of a list of pixels (ordered by row), as integers in (0,255). The images are 96x96 pixels.
Now that we have a good idea about the kind of data we are dealing with, we need to preprocess it so that we can use it as inputs to our model.
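The loading step can be sketched roughly as follows. This is a minimal illustration, assuming the CSV layout described above (30 keypoint columns plus an `Image` column of space-separated pixel values); scaling the targets to [0, 1] is my illustrative choice, not necessarily what the original code does:

```python
import numpy as np
import pandas as pd

def load_training_data(path="training.csv"):
    """Parse the Kaggle CSV: 30 keypoint columns plus a space-separated
    96x96 pixel string per row."""
    df = pd.read_csv(path)
    df = df.dropna()  # keep only samples with all 15 keypoints present
    # Each 'Image' cell is a string of 96*96 = 9216 integers in [0, 255]
    X = np.vstack(df["Image"].apply(
        lambda s: np.array(s.split(), dtype=np.float32)).values)
    X = X.reshape(-1, 96, 96, 1) / 255.0          # normalize pixels to [0, 1]
    y = df.drop(columns=["Image"]).values / 96.0  # scale coordinates to [0, 1]
    return X, y
```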

Step 1: Data Preprocessing and other shenanigans

The above dataset has two files that we need to concern ourselves with — training.csv and test.csv. The training file has 31 columns: 30 for the keypoint coordinates, and a final column containing the image data as a string. It contains 7049 samples; however, many of these have ‘NaN’ values for some keypoints, which makes things tough for us. So we shall only consider the samples without any NaN values.

Everything well and good? Not really, no. It turns out only 2140 samples contained no NaN values at all. That is far too few to train a generalized, accurate model. So, to create more data, we need to augment the data we have.

Data Augmentation is a technique for generating more data from existing data, using transformations like scaling, translation, rotation, etc. In this case, I mirrored each image and its corresponding keypoints, because techniques like scaling and rotation might have distorted the face images and thus thrown off the model. Finally, I combined the original data with the new augmented data to get a total of 4280 samples.
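The mirroring step can be sketched as below. One subtlety worth noting: after a horizontal flip, each "left" keypoint physically becomes a "right" one, so the corresponding column pairs should be swapped. The `FLIP_PAIRS` indices here are placeholders, since the exact mapping depends on the dataset's column order:

```python
import numpy as np

# Pairs of column indices to swap after mirroring (left <-> right labels).
# Illustrative placeholders — adjust to match training.csv's column order.
FLIP_PAIRS = [(0, 2), (1, 3)]

def mirror(images, keypoints, width=96):
    """Horizontally flip images and adjust keypoints to match."""
    flipped = images[:, :, ::-1, :]            # mirror along the x axis
    kps = keypoints.copy()
    kps[:, 0::2] = (width - 1) - kps[:, 0::2]  # new_x = (W-1) - x; y unchanged
    for a, b in FLIP_PAIRS:                    # swap left/right labelled columns
        kps[:, [a, b]] = kps[:, [b, a]]
    return flipped, kps
```

Concatenating the mirrored set with the original doubles the sample count, which matches the 2140 → 4280 jump above.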

Step 2: Model architecture and Training

Now let’s dive into the Deep Learning section of the project. Our aim is to predict coordinate values for each keypoint of an unseen face, hence it’s a regression problem. Since we are working with images, a Convolutional Neural Network is a pretty obvious choice for feature extraction. These extracted features are then passed to a fully connected neural network which outputs the coordinates. The final Dense layer needs 30 neurons because we need 30 values (15 pairs of (x, y) coordinates).

  • ‘ReLU’ activations are used after each Convolutional and Dense layer, except for the last Dense layer, since its outputs are the raw coordinate values we require
  • Dropout Regularization is used to prevent overfitting
  • Max Pooling is added for Dimensionality Reduction
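An architecture along these lines can be sketched in Keras as follows. The layer counts and sizes here are illustrative assumptions (the post doesn't list exact hyperparameters), but the structure follows the bullets above: conv + ReLU blocks with max pooling, dropout before the head, and a linear 30-unit output:

```python
from tensorflow.keras import layers, models

def build_model():
    """CNN regressor: conv/ReLU blocks with max pooling for feature
    extraction, dropout for regularization, and a linear 30-unit
    output layer (15 (x, y) keypoint pairs)."""
    return models.Sequential([
        layers.Input(shape=(96, 96, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(30),  # linear activation: raw coordinate values
    ])

model = build_model()
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
```

MSE is the natural loss for coordinate regression; the accuracy metric mirrors the figure reported below.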

The model was able to reach a minimum loss of ~0.0113 and an accuracy of ~80%, which I thought was decent enough. Here are a few results from the model's performance on the test set:




Step 3: Put the model into action

We got our model working, so all we gotta do now is use OpenCV to do the following:

  1. Get image frames from the webcam
  2. Detect region of the face in each image frame because the other sections of the image are useless to the model (I used the Frontal Face Haar Cascade to crop out the region of the face)
  3. Preprocess this cropped region by — converting to grayscale, normalizing, and reshaping
  4. Pass the preprocessed image as input to the model
  5. Get predictions for the key points and use them to position different filters on the face
I did not have any particular filters in mind when I began testing. I came up with the idea for the project around 22 December 2018, and being a huge Christmas fanboy like any other normal human being, I decided to go with the following filters:



I used particular key points for the scaling and positioning of each of the above filters:

  • Glasses Filter: The distance between the left-eye-left-keypoint and the right-eye-right-keypoint is used for the scaling. The brow-keypoint and left-eye-left-keypoint are used for the positioning of the glasses
  • Beard Filter: The distance between the left-lip-keypoint and the right-lip-keypoint is used for the scaling. The top-lip-keypoint and left-lip-keypoint are used for the positioning of the beard
  • Hat Filter: The width of the face is used for the scaling. The brow-keypoint and left-eye-left-keypoint are used for the positioning of the hat

Result

What I learned

One of the most important things about the project was the fact that I took a concept and made something practical out of it. I could have just mugged it up and forgotten it about a month later, but actually trying to implement it by getting my hands dirty proved to be immensely beneficial and gave me a much clearer understanding of the concept.

Future Work and Conclusion

Although the project works pretty well, I did discover a few shortcomings which make it a little shy of perfect:

  • Not the most accurate model. Although 80% is pretty decent in my opinion, it still has a lot of room for improvement.
  • This current implementation works only for the selected set of filters because I had to do some manual tweaking for more accurate positioning and scaling.
  • The process of applying the filter to the image is pretty computationally inefficient because to overlay the .png filter image onto the webcam image based on the alpha channel, I had to apply the filter pixel-by-pixel wherever the alpha was not equal to 0. This sometimes leads to the program crashing when it detects more than one face in the image.
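One possible fix for the last point is to replace the Python pixel loop with a vectorized NumPy blend, which composites the whole RGBA filter onto the frame in a handful of array operations. This is a sketch of that idea, not the repo's code; it also clips the filter at the frame edges so partial overlaps don't crash:

```python
import numpy as np

def overlay_rgba(frame, filt, x0, y0):
    """Alpha-blend an RGBA filter onto a BGR frame in one vectorized
    step instead of iterating pixel by pixel."""
    h, w = filt.shape[:2]
    H, W = frame.shape[:2]
    # clip the filter region to the frame bounds
    x1, y1 = max(x0, 0), max(y0, 0)
    x2, y2 = min(x0 + w, W), min(y0 + h, H)
    if x1 >= x2 or y1 >= y2:
        return frame  # filter lies entirely outside the frame
    crop = filt[y1 - y0:y2 - y0, x1 - x0:x2 - x0]
    alpha = crop[:, :, 3:4].astype(np.float32) / 255.0  # 0 = transparent
    roi = frame[y1:y2, x1:x2]
    roi[:] = (alpha * crop[:, :, :3] + (1 - alpha) * roi).astype(np.uint8)
    return frame
```

Because the blend is weighted by alpha rather than gated on `alpha != 0`, it also handles semi-transparent filter edges more smoothly than a hard pixel-by-pixel copy.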

The complete code for the project is on my Github: Github Link

If you’d like to improve upon the project, or if you have any suggestions for solving the above issues, be sure to leave a response below and open a pull request on the Github repo. Thanks for stopping by, hope you enjoyed the read.
Ciao!

Discover and read more posts from Rohit Agrawal