Denormalization: How, When and Why (part2)

Published Oct 04, 2017. Last updated Apr 02, 2018.

Yesterday, October 3rd, 2017, the Firebase team announced a new member of the Firebase family: Cloud Firestore, a fully managed NoSQL document database for mobile and web app development. It is designed to easily store and sync app data at global scale, and it resolves the data modeling limitations of the Realtime Database in an elegant way. It is now available in beta.

In the previous post, we saw how to use the data fan-out technique to reduce the size of the data snapshots sent to the client app. This also directly reduces the time the user waits for their data to load, since no unnecessary data is loaded. This is a good first step, but our work is not complete: doing just this introduces a new problem that affects the read performance of the database. Let's find out why.

Why

We broke up our data to create collections at the root of the database for easier access and we kept references to related data using a foreign key. This enabled us to access the related data by using a join like this:

 DatabaseReference mPassengerReference;
 DatabaseReference mUserdetailReference;
 // Listen to the list of passenger keys for asset pmt1
 mPassengerReference = FirebaseDatabase.getInstance().getReference().child("passengers/pmt1");
 ChildEventListener passengerListener = new ChildEventListener() {
     @Override
     public void onChildAdded(DataSnapshot dataSnapshot,
                              String previousChildName) {
         // Called once for each user key on the list (jane, john)
         mUserdetailReference = FirebaseDatabase.getInstance().getReference().child("users/"
                 + dataSnapshot.getKey());
         mUserdetailReference.addListenerForSingleValueEvent(new ValueEventListener() {
              @Override
               public void onDataChange(DataSnapshot dataSnapshot) {
                    // Obtain user data
               }
        });
    }
    ...
    ...
    ...
    ...
 };
 mPassengerReference.addChildEventListener(passengerListener);

This means that we nested listeners to access the individual details of each passenger, since we only saved a reference to each user in the passengers object:

"passengers": {
    "pmt1": {
        "jane": true,
        "john": true
    },
    "pmt2": { ... },
    "pmt3": { ... }
}

It's okay to access the data this way if you don't run this query often or if the data set is small. But once the data grows larger and the query is made often, it's a good idea to eliminate the inner listener and read the data once, like this:

 mPassengerReference = FirebaseDatabase.getInstance().getReference().child("passengers/pmt1");
 ChildEventListener passengerListener = new ChildEventListener() {
     @Override
     public void onChildAdded(DataSnapshot dataSnapshot,
                              String previousChildName) {
         // Each child snapshot already holds the passenger's full details
     }
    ...
    ...
    ...
    ...
 };
 mPassengerReference.addChildEventListener(passengerListener);

Fetching passenger data this way means a single read contains all the passengers' details. This requires denormalizing our current data model.

Denormalization is the duplication of the same data at the different locations in our database where it is needed, in order to simplify or reduce querying.

How

Our current data looks like this:

{
  "assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "lat": 6.5244,
      "long": 3.3792
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "passengers": {
    "pmt1": {
      "jane": true,
      "john": true
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "users": {
    "jane": {
      "name": "Jane Doe",
      "age": 24
    },
    "john": {
      "name": "John Doe",
      "age": 24
    }
  }
}

To denormalize this data, the passengers node has to be self-contained and able to provide the full details of each passenger, like this:

{
  "assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "lat": 6.5244,
      "long": 3.3792
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "passengers": {
    "pmt1": {
      "jane": {
        "name": "Jane Doe",
        "age": 24
      },
      "john": {
        "name": "John Doe",
        "age": 24
      }
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "users": {
    "jane": {
      "name": "Jane Doe",
      "age": 24
    },
    "john": {
      "name": "John Doe",
      "age": 24
    }
  }
}

Notice that the passengers node now contains the full details of each passenger, and that this data is duplicated in both the passengers and users nodes. This ensures that we attach a listener only once to get our full list of passengers, and we no longer have to do a complex join.
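
To make the cost of the join concrete, here is a minimal sketch that counts reads against plain in-memory maps. It does not use the Firebase SDK: the class ReadCountSketch and its helpers (read, user, namesWithJoin, namesDenormalized) are hypothetical names introduced only for illustration, and one call to read stands in for one round trip to the database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory stand-in for the database; this is NOT the Firebase SDK. */
final class ReadCountSketch {
    // Number of simulated reads performed so far.
    static int reads = 0;

    // Hypothetical helper: simulates one read of one child node.
    static <T> T read(Map<String, T> node, String key) {
        reads++;
        return node.get(key);
    }

    // Hypothetical helper: builds a user record like the ones in the post.
    static Map<String, Object> user(String name, int age) {
        Map<String, Object> u = new LinkedHashMap<>();
        u.put("name", name);
        u.put("age", age);
        return u;
    }

    /** Normalized model: read the key list, then one more read per passenger. */
    static Map<String, String> namesWithJoin(
            Map<String, Map<String, Boolean>> passengers,
            Map<String, Map<String, Object>> users,
            String assetId) {
        Map<String, String> names = new LinkedHashMap<>();
        Map<String, Boolean> keys = read(passengers, assetId);
        for (String userId : keys.keySet()) {
            Map<String, Object> details = read(users, userId);
            names.put(userId, (String) details.get("name"));
        }
        return names;
    }

    /** Denormalized model: a single read returns every passenger's details. */
    static Map<String, String> namesDenormalized(
            Map<String, Map<String, Map<String, Object>>> passengers,
            String assetId) {
        Map<String, String> names = new LinkedHashMap<>();
        Map<String, Map<String, Object>> details = read(passengers, assetId);
        for (Map.Entry<String, Map<String, Object>> e : details.entrySet()) {
            names.put(e.getKey(), (String) e.getValue().get("name"));
        }
        return names;
    }
}
```

With a list of N passengers, the join performs 1 + N simulated reads while the denormalized model performs one, which is the gap that grows as the data set and query frequency grow.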

One major concern with denormalization is data consistency. If we duplicate user data all over the place, what happens when a user modifies their data at one location, e.g. John Doe changes his name to John King because he has found out who he really is? Making this change at one location does not automatically propagate it to the others. So how do we ensure that user data stays consistent across all nodes? As we have come to expect, the Firebase team provides a technique for this, called multi-path updates.

Multi-path updates bring consistency to denormalization. They provide an atomic operation: either the user information is updated at all nodes, or the operation fails and nothing is updated, so there are no incomplete updates. In Java, we do this by using a HashMap.

Now that we know John's full name, updating his data at all nodes goes like this:

 DatabaseReference mRootReference;
 mRootReference = FirebaseDatabase.getInstance().getReference();
 // Map each path that holds John's name to its new value
 Map<String, Object> updateObj = new HashMap<>();
 updateObj.put("users/john/name", "John King");
 updateObj.put("passengers/pmt1/john/name", "John King");

 // Write every path in a single atomic operation
 mRootReference.updateChildren(updateObj);

And with that, our data is updated at all nodes at once.
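
When a user rides on more than one asset, typing every path by hand gets error-prone. The following is a minimal sketch of a helper that builds the update map for you. The class UserFanOut and the method nameUpdate are hypothetical names, and the sketch assumes the caller already knows which asset lists contain the user; it only builds the map and does not talk to Firebase itself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that builds a multi-path update map for a rename. */
final class UserFanOut {
    /**
     * Returns a path -> value map covering the canonical users node and
     * every passengers list that holds a copy of the user's profile.
     * assetIds is assumed to be supplied by the caller.
     */
    static Map<String, Object> nameUpdate(String userId, String newName,
                                          List<String> assetIds) {
        Map<String, Object> update = new LinkedHashMap<>();
        // Canonical copy of the user.
        update.put("users/" + userId + "/name", newName);
        // Duplicated copies under each asset's passenger list.
        for (String assetId : assetIds) {
            update.put("passengers/" + assetId + "/" + userId + "/name", newName);
        }
        return update;
    }
}
```

The resulting map can be passed to updateChildren on the root reference, exactly as updateObj is above, so all copies change in one atomic write.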

Denormalization is overkill for simple applications with small data sets. A good question to ask is: when do I know to denormalize? Let's find out.

When

A general rule of thumb here is to "structure your data according to your view". Assume we are building an application that displays a list of assets whose destination is Lagos, and that this application has a huge data set.
Instead of structuring our data like this:

"assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "destination": "Kaduna"
    },
    "pmt2": {
      "name": "pearl transit 02",
      "destination": "Lagos"
    },
    "pmt3": {
      "name": "pearl transit 03",
      "destination": "Enugu"
    }
  }

and writing queries like this to filter our data:

 DatabaseReference mAssetReference;
 mAssetReference = FirebaseDatabase.getInstance().getReference().child("assets");
 // Filter the children of "assets" by their "destination" value
 Query query = mAssetReference.orderByChild("destination").equalTo("Lagos");
 query.addValueEventListener.....

We can simply denormalize our data according to our view by creating another root node, asset_destination, like this:

"assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "destination": "Kaduna"
    },
    "pmt2": {
      "name": "pearl transit 02",
      "destination": "Lagos"
    },
    "pmt3": {
      "name": "pearl transit 03",
      "destination": "Enugu"
    }
  },

"asset_destination": {
    "KD": {
        "pmt1": {
             "name": "pearl transit 01",
             "destination": "Kaduna"
         }
     },
     "LG": {
        "pmt2": {
             "name": "pearl transit 02",
             "destination": "Lagos"
         }
     },
     "EN": {
        "pmt3": {
             "name": "pearl transit 03",
             "destination": "Enugu"
         }
     }
 }

And when we need a list of assets with destination Lagos, we can simply read from its node like this:

 DatabaseReference destinationRef;
 destinationRef = FirebaseDatabase.getInstance().getReference().child("asset_destination");
 
 Query query = destinationRef.child("LG");
 query.addValueEventListener.....

And that's it: we have successfully simplified the query through denormalization, and our application doesn't have to keep filtering our huge data set for assets with destination Lagos.
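
The asset_destination node only stays useful if every asset is written to both places. One way to do that is to build both writes into a single multi-path update. The following is a minimal sketch under that assumption: the class AssetFanOut and the method assetWrite are hypothetical names, and the destination code ("KD", "LG", "EN") is assumed to be chosen by the caller, as in the data above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that writes an asset to both of its locations. */
final class AssetFanOut {
    /**
     * Returns a path -> value map that stores the asset under "assets"
     * and under the matching "asset_destination" bucket.
     */
    static Map<String, Object> assetWrite(String assetId, String name,
                                          String destination,
                                          String destinationCode) {
        Map<String, Object> asset = new LinkedHashMap<>();
        asset.put("name", name);
        asset.put("destination", destination);

        Map<String, Object> update = new LinkedHashMap<>();
        // Canonical copy.
        update.put("assets/" + assetId, asset);
        // Copy shaped for the "assets by destination" view.
        update.put("asset_destination/" + destinationCode + "/" + assetId,
                new LinkedHashMap<String, Object>(asset));
        return update;
    }
}
```

Passing the returned map to updateChildren on the root reference writes both copies together, so the destination view cannot drift from the assets node on a partial failure.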

Discover and read more posts from Michael Nwankwo