
Image Aesthetic Assessment

Published Jul 15, 2017
'Beauty is really in the eye of the beholder'

Image aesthetics assessment is an attempt to quantify the beauty of an image.
While everyone has different tastes, there are widely accepted norms when it comes to beauty: things that almost everyone agrees are beautiful, like a sunset or sunrise over the mountains or the ocean.

Beautiful Image

Some of the visual features that come in handy are

  • edge distributions,
  • color histograms,
  • photographic rules such as the rule of thirds, which also influence the beauty of an image.
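
To make two of these hand-crafted features concrete, here is a minimal sketch in plain Python. The function names, the bin count, and the idea of scoring a subject by its distance to the nearest rule-of-thirds intersection are illustrative assumptions, not something taken from the original experiments.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram; each channel is split into `bins` ranges.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    """
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
    total = len(pixels)
    return [count / total for count in hist]


def thirds_distance(x, y, width, height):
    """Distance from (x, y) to the nearest rule-of-thirds intersection,
    normalized by the image diagonal (0.0 means exactly on an intersection)."""
    points = [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]
    diagonal = (width ** 2 + height ** 2) ** 0.5
    nearest = min(((x - px) ** 2 + (y - py) ** 2) ** 0.5 for px, py in points)
    return nearest / diagonal
```

Features like these are easy to compute, but each one captures only a single, manually chosen notion of quality.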

Defining image quality with such visual features, like any other manually curated features, is limited in scope.

The two photographers' story.

Great Shot!! So what?
Beautiful image (left), low-quality image (right)

Both images show the same place, taken with different lighting, angle, and contrast adjustment. It is obvious that the image on the left has better aesthetic appeal.

Significance of Image Aesthetics

For a platform that serves media content, one of the crucial aspects is showing high-quality content. With social sites and the ‘selfie’ trend, we are generating a huge amount of data in the form of images and videos.
Keeping track of its quality will always be helpful.

Curated Content User Generated Content

Can we model such Human Perception?

Deep learning

The topic needs no introduction. It has been a revolution over the last 5 years, especially in the image classification domain. AlexNet winning the ImageNet competition, improving the error rate by a huge margin, acted as a spark in the field. Since then, CNNs have set many state-of-the-art results.

Network architecture of AlexNet.

AlexNet architecture

The first layer is the input layer, where the input is fed to the network. It is followed by convolution and pooling operations, then a fully connected layer,
and a final softmax layer that turns the outputs into a probability for each class label.
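
As a small illustration of that last step, the sketch below shows how a softmax turns raw class scores into probabilities. It is a generic, plain-Python version written for this post; the example scores are made up.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability;
    # this does not change the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for two classes, e.g. "high quality" and "low quality".
probabilities = softmax([2.0, 0.5])
```

The class with the largest raw score always receives the largest probability, so the ranking of the classes is preserved.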

Fixed size input constraint

Input Layer

We always resize the input to the network. If the image is larger, it is cropped;
if its dimensions are smaller, it is padded, so that a fixed-size input is fed to the network.
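
The crop-or-pad step can be sketched as follows. This is a simplified illustration on a single-channel image stored as a list of rows; the helper names, the center crop, and the zero padding are assumptions chosen for the example, not the preprocessing of any particular framework.

```python
def fit_1d(seq, target, make_pad):
    """Center-crop `seq` to `target` items, or pad it on both sides."""
    if len(seq) >= target:
        start = (len(seq) - target) // 2
        return list(seq[start:start + target])
    missing = target - len(seq)
    before = missing // 2
    after = missing - before
    return ([make_pad() for _ in range(before)]
            + list(seq)
            + [make_pad() for _ in range(after)])


def fit_to_square(image, size, fill=0):
    """Return a size x size image: crop where too large, pad where too small."""
    # Fix the width of every row first, then the number of rows.
    rows = [fit_1d(row, size, lambda: fill) for row in image]
    return fit_1d(rows, size, lambda: [fill] * size)


# A wide 2 x 6 "landscape" image forced into a 4 x 4 square.
wide = [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
square = fit_to_square(wide, 4)
```

In this example the left and right columns are thrown away and empty rows are added at the top and bottom, which is exactly the kind of change to the composition that the next section is concerned with.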

The Mountains Qutub Minar

The above two images are beautiful in their original aspect ratio.
What happens if we resize them to a fixed size of 224 × 224?
The image will certainly lose its original aesthetic value!
Going from landscape to square does the damage: the original composition is lost when the image is resized.

Demystifying the Network Architecture

Network Architecture

Let’s unveil the hidden layers! After the input, there are a few layers of operations.
The operations are either max pooling or convolution with a filter.
So why is a fixed-size input required at all?
It is because of the fully connected layer just before the outputs: its weight matrix expects a fixed number of inputs, whereas convolution and pooling work on inputs of any size.
Fully connected layers are in the network to learn non-linear combinations of the features extracted by the preceding convolution layers.
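
The arithmetic behind this constraint can be sketched in a few lines. The layer stack and the channel count below are hypothetical values picked for illustration, not the AlexNet configuration; the point is only that the length of the flattened feature vector changes with the input size, while a fully connected layer expects one fixed length.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_length(height, width, channels, layers):
    """Length of the vector handed to the fully connected layer."""
    for kernel, stride in layers:
        height = output_size(height, kernel, stride)
        width = output_size(width, kernel, stride)
    return height * width * channels


# Hypothetical stack: 3x3 conv, 2x2 pool, 3x3 conv, 2x2 pool, 64 output channels.
layers = [(3, 1), (2, 2), (3, 1), (2, 2)]
square_input = flattened_length(224, 224, 64, layers)     # 54 * 54 * 64
landscape_input = flattened_length(320, 240, 64, layers)  # 78 * 58 * 64
```

The two inputs give vectors of different lengths, so a fully connected layer sized for one of them cannot accept the other.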

Let's understand bit by bit.

Max Pooling

Max pooling layers down-sample the feature maps while maintaining spatial information.
Max Pooling in action

Max Pooling
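
For readers who prefer code to pictures, here is a minimal plain-Python sketch of max pooling over a single feature map. The 2 × 2 window with stride 2 is a common choice and is used here as an assumption.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max-pool a 2-D feature map given as a list of rows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - size + 1, stride):
        pooled_row = []
        for left in range(0, cols - size + 1, stride):
            # Keep only the largest value inside the current window.
            window = [feature_map[r][c]
                      for r in range(top, top + size)
                      for c in range(left, left + size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(feature_map)  # [[6, 8], [3, 4]]
```

Note that the output size still depends on the input size: a larger feature map produces a larger pooled map.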

Spatial Pyramid Pooling

In SPP, the feature map of an image is divided into a fixed number of bins, and each bin is pooled in turn. As the number of bins is fixed,
we always get a fixed-shape output, whatever the input size.
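
The idea can be sketched in plain Python for a single feature map. The pyramid levels (1, 2, 4) follow the bin sizes mentioned later in this post and are read here as 1 × 1, 2 × 2 and 4 × 4 grids; the way bin boundaries are rounded is an assumption of this sketch and may differ from a given SPP implementation.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over n x n grids of bins, for each n in `levels`.

    The output always has sum(n * n for n in levels) values,
    regardless of the feature map's height and width.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            top, bottom = i * rows // n, ceil_div((i + 1) * rows, n)
            for j in range(n):
                left, right = j * cols // n, ceil_div((j + 1) * cols, n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(top, bottom)
                                  for c in range(left, right)))
    return pooled


# Two feature maps of different shapes give vectors of the same length.
landscape = [[r * 8 + c for c in range(8)] for r in range(6)]  # 6 x 8
square = [[r * 5 + c for c in range(5)] for r in range(5)]     # 5 x 5
assert len(spatial_pyramid_pool(landscape)) == len(spatial_pyramid_pool(square)) == 21
```

With several channels, the same pooling would be applied to each channel and the results concatenated, so the vector length depends only on the number of channels and bins.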

SPP operation in action

Spatial Pyramid Pooling


Spp Network Architecture


The first network is a traditional CNN, where we can see the max-pool layer just before the fully connected layer.
In the second architecture, the last max-pooling layer is replaced by an SPP layer.
With fixed bin sizes (1, 2, 4), we make sure that the fully connected layer gets a fixed-shape input.

Spp Network Architecture

Training the Spp-Net

Trained on the live-dataset, a very small dataset of about 1K images in total, the Spp-Net model achieved an accuracy of 75% on the training data and 83% on the test data.

Figures: Spp-Net accuracy, Spp-Net training loss


With SPP in the network

  • The model learns scale-invariant features, like SIFT (a traditional image processing algorithm).
  • One of the challenges in text classification with deep learning is the fixed-size feature vector representation of a sentence.

Interesting Results

Blurred, cropped image: the model predicted a high score of 0.46
Blurred Image
Complete image: the model predicted a high score of 0.94
Complete Image


This blog originally appeared here
With that, I would like to wrap up. Any questions?

Discover and read more posts from Amit Kushwaha