Arjun S Kashyap

Coding things up for 15 years. Learnt things the hard way and have found simpler ways to explain concepts and make things easier!

Precision and recall — a simplified view - Towards Data Science

Published Dec 29, 2019

Understanding precision and recall is essential to perfecting any machine learning model. It is a skill needed to fine-tune a model so that it produces accurate results. Some models need higher precision, while others need higher recall; we discuss which is needed when at the end of the article. This article aims to reduce precision and recall to apples and oranges. I literally mean it, so let’s take the example of an apple classifier whose main aim is to tell apples from oranges.

So, assume there is a huge farm filled with apple and orange trees. The owner of the farm wants to build a classifier that correctly identifies apples and oranges so that he can sort and sell them. Apples fetch a much higher price than oranges, so he specifically wants a classifier that detects apples. In that pursuit, he builds a detector and sends it a random sample of 13 fruits to classify. Because he is more focused on predicting apples (as they are costlier), the model treats apples as positives and oranges as negatives.

He made the chart below to check how well the model had performed:
[Figure: confusion matrix with values]

True positives: These are the apples that the model correctly predicted as apples.
False positives: These are the oranges that the model predicted as apples.
False negatives: These are the apples that the model predicted as oranges.
True negatives: These are the oranges that the model correctly predicted as oranges.

From the chart, we can draw the following inferences:

The model classified 2 oranges as apples (false positives)
The model classified 3 apples as oranges (false negatives)
The model classified 5 apples correctly (true positives)
The model classified 3 oranges correctly (true negatives)
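
The four counts above can be reproduced with a few lines of Python. This is only a sketch: the `actual` and `predicted` lists are reconstructed from the counts in the chart (the order of the 13 fruits is an assumption), and `confusion_counts` is a hypothetical helper name, not something from a library.

```python
from collections import Counter

# 1 = apple (positive), 0 = orange (negative).
# Assumed sample: 8 actual apples followed by 5 actual oranges.
actual = [1] * 8 + [0] * 5
# Assumed predictions matching the chart: 5 apples found, 3 apples missed,
# 2 oranges mistaken for apples, 3 oranges found.
predicted = [1] * 5 + [0] * 3 + [1] * 2 + [0] * 3


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels."""
    pairs = Counter(zip(actual, predicted))
    return {
        "tp": pairs[(1, 1)],  # apples predicted as apples
        "fp": pairs[(0, 1)],  # oranges predicted as apples
        "fn": pairs[(1, 0)],  # apples predicted as oranges
        "tn": pairs[(0, 0)],  # oranges predicted as oranges
    }


counts = confusion_counts(actual, predicted)
print(counts)  # {'tp': 5, 'fp': 2, 'fn': 3, 'tn': 3}
```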

Let’s modify the diagram to understand precision and recall properly:
[Figure: modified confusion matrix with values]
The left-hand side of the image contains the prediction results on actual apples. The right-hand side of the image contains prediction results on actual oranges.

The above image gives us a different insight into the model’s predictions:

Out of 8 actual apples, the model classified 5 as apples and 3 as oranges.
Out of 5 actual oranges, the model classified 3 as oranges and 2 as apples.

Now, let’s dive into precision and recall, with reference to the above image.

Precision

It is the fraction of the model’s positive predictions that are right. In simpler words, it is:
Number of apples predicted correctly by the model / Total number of fruits the model predicted as apples
It doesn’t consider the apples that the model missed (the false negatives).

The formula for precision:

No. of true positives/ (No. of true positives + No. of false positives)

Precision for apple predictor: 5/(5+2) = 5/7 = 0.714
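
As a quick check, the same arithmetic can be written as a small Python function. This is a minimal sketch; `precision` is a hypothetical helper, and returning 0.0 when the model predicts no positives at all is an assumed convention, not something the article specifies.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # Assumed convention: no positive predictions means precision 0.0.
        return 0.0
    return tp / predicted_positives


# Apple predictor: 5 true positives, 2 false positives.
apple_precision = precision(tp=5, fp=2)
print(round(apple_precision, 3))  # 0.714
```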

Recall

It is the fraction of all the actual positives that the model predicted correctly. In simpler words, it is:
Number of apples predicted correctly by the model / Total number of apples
The total number of apples is the number of apples sent to the system, i.e., 8.

It accounts for the apples that the model wrongly predicted as oranges (the false negatives), but not for the oranges it predicted as apples.

The formula for recall:
No. of true positives/(No. of false negatives + No. of true positives)

For the above example, it is: 5/(5+3) = 5/8 = 0.625
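
The recall calculation can be sketched the same way in Python. `recall` is a hypothetical helper, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(tp, fn):
    """Fraction of actual positives that the model found."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # Assumed convention: no actual positives means recall 0.0.
        return 0.0
    return tp / actual_positives


# Apple predictor: 5 true positives, 3 false negatives (8 apples in total).
apple_recall = recall(tp=5, fn=3)
print(apple_recall)  # 0.625
```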

So, we know that the model created by the owner of the farm has higher precision (0.714) than recall (0.625)!

When do we need high precision or high recall?

Models need high recall when missing a positive case is costly. For example, predicting cancer or detecting terrorists needs high recall; in other words, you need to keep false negatives to a minimum. It is OK if a non-cancerous tumor is flagged as cancerous, but a cancerous tumor should not be labeled non-cancerous.

Similarly, we need high precision in places such as recommendation engines and spam mail detection, where false negatives matter less and the focus is on true positives and false positives. It is OK if spam lands in the inbox, but a really important mail shouldn’t go into the spam folder.

Concerning the example mentioned at the start, for the farm owner to maximize profit, he shouldn’t let apples be misclassified as oranges. So, the model needs a high recall of apples.

How to tune a machine learning model to adjust to high precision or recall?

If it is a neural network, choose a loss function that is sensitive to small changes in the output and doesn’t round off the values unnecessarily. The main key is the threshold that you apply to the output of the final layer of the neural network. In binary classification, set the threshold so as to maximize recall or precision, whichever is needed. If you want to maximize recall, set the threshold below 0.5, somewhere around 0.2. For example, with a threshold of 0.2, a score of 0.3 is an apple and a score of 0.1 is not an apple. This will increase the recall of the system.

For precision, the threshold can be set to a much higher value, such as 0.6 or 0.7. This way you can tune the precision and recall of a neural network.
If it is any other machine learning model, you would need to tune the hyper-parameters and probability threshold to achieve higher precision or recall.
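
To see the effect of the threshold, here is a small Python sketch. The `scores` below are made-up model outputs for 13 fruits (they are not from the farm owner's model), and `precision_recall_at` is a hypothetical helper. A fruit is treated as an apple when its score is at or above the threshold.

```python
# 1 = apple, 0 = orange: 8 apples followed by 5 oranges.
labels = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical "probability of apple" scores, invented for illustration.
scores = [0.95, 0.9, 0.8, 0.65, 0.55, 0.45, 0.35, 0.25,
          0.75, 0.6, 0.4, 0.3, 0.05]


def precision_recall_at(threshold, scores, labels):
    """Return (precision, recall) when score >= threshold means 'apple'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.2, 0.5, 0.7):
    p, r = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}")
# threshold=0.2: precision=0.667, recall=1.000
# threshold=0.5: precision=0.714, recall=0.625
# threshold=0.7: precision=0.750, recall=0.375
```

With these invented scores, lowering the threshold to 0.2 catches every apple at the cost of more oranges flagged as apples, while raising it to 0.7 makes the apple predictions cleaner but misses more apples. Real models will not always trade off this smoothly, so it is worth checking several thresholds on held-out data.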

Thank you for reading! ☺

