Denormalization: How, When and Why (Part 2)

Published Oct 04, 2017. Last updated Oct 05, 2017.

Yesterday, October 3rd, 2017, the Firebase team announced a new member of the Firebase family: Cloud Firestore, a fully managed NoSQL document database for mobile and web app development. It is designed to store and sync app data at global scale, and it resolves some data modeling limitations of the Realtime Database in an elegant way. It is now available in beta.

In the previous post, we saw how to use the data fan-out technique to reduce the size of the data snapshots sent to the client app. This also directly reduces the time the user waits for their data to load, since no unnecessary data is downloaded. It is a good first step, but our work is not complete: doing just this introduces a new problem that affects the read performance of the database. Let's find out why.

Why

We broke up our data into collections at the root of the database for easier access, and we kept references to related data using a foreign key. This lets us reach the related data with a client-side join like this:

 DatabaseReference mPassengerReference;
 mPassengerReference = FirebaseDatabase.getInstance().getReference().child("passengers/pmt1");
 ChildEventListener passengerListener = new ChildEventListener() {
     @Override
     public void onChildAdded(DataSnapshot dataSnapshot, String previousChildName) {
         // Called once for each passenger key on the list (here: jane and john)
         DatabaseReference mUserdetailReference = FirebaseDatabase.getInstance().getReference()
                 .child("users/" + dataSnapshot.getKey());
         // Inner listener: one extra read for every passenger
         mUserdetailReference.addListenerForSingleValueEvent(new ValueEventListener() {
             @Override
             public void onDataChange(DataSnapshot dataSnapshot) {
                 // Obtain user data
             }

             @Override
             public void onCancelled(DatabaseError databaseError) {
                 // Handle the failed read
             }
         });
     }
     // onChildChanged, onChildRemoved, onChildMoved and onCancelled omitted
     ...
 };
 mPassengerReference.addChildEventListener(passengerListener);

This means we nest listeners to read the details of each passenger, since the passengers object only stores a reference to each user:

"passengers": {
   "pmt1": {
      "jane": true,
      "john": true
   },
   "pmt2": { ... },
   "pmt3": { ... }
}

It's okay to access the data this way if you don't run this query often or if the data set is small. But once the data grows and the query runs frequently, it's a good idea to eliminate the inner listener and read the data once, like this:

 mPassengerReference = FirebaseDatabase.getInstance().getReference().child("passengers/pmt1");
 ChildEventListener passengerListener = new ChildEventListener() {
     @Override
     public void onChildAdded(DataSnapshot dataSnapshot, String previousChildName) {
         // dataSnapshot already holds this passenger's full details,
         // so no inner listener is needed
     }
     // onChildChanged, onChildRemoved, onChildMoved and onCancelled omitted
     ...
 };
 mPassengerReference.addChildEventListener(passengerListener);

Fetching passenger data this way means a single read returns the details of every passenger. To get there, we have to denormalize our current data model.

Denormalization is the duplication of the same data at the different locations in our database where it is needed, in order to simplify or reduce querying.

How

Our current data looks like this:

{
  "assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "lat": 6.5244,
      "long": 3.3792
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "passengers": {
    "pmt1": {
      "jane": true,
      "john": true
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "users": {
    "jane": {
      "name": "Jane Doe",
      "age": 24
    },
    "john": {
      "name": "John Doe",
      "age": 24
    }
  }
}

To denormalize this data, the passengers node has to be self-contained and able to provide the full details of each passenger, like this:

{
  "assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "lat": 6.5244,
      "long": 3.3792
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "passengers": {
    "pmt1": {
      "jane": {
        "name": "Jane Doe",
        "age": 24
      },
      "john": {
        "name": "John Doe",
        "age": 24
      }
    },
    "pmt2": { ... },
    "pmt3": { ... }
  },

  "users": {
    "jane": {
      "name": "Jane Doe",
      "age": 24
    },
    "john": {
      "name": "John Doe",
      "age": 24
    }
  }
}

Notice that the passengers node now contains the full details of each passenger, and that this data is duplicated in both the passengers and users nodes. This way we attach a listener only once to get our full list of passengers, and we no longer need a complex join.
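
To make the transformation concrete, here is a minimal in-memory sketch in plain Java, with no Firebase calls. It expands a passengers index of user ids into the self-contained shape shown above. The class name PassengerDenormalizer and the method denormalize are hypothetical names chosen for this illustration, and the maps simply stand in for the JSON tree.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory sketch: expands a "passengers" index of user ids into full user details. */
final class PassengerDenormalizer {

    /**
     * passengerIndex: asset id -> (user id -> true), as in the normalized model.
     * users:          user id -> user details.
     * Returns         asset id -> (user id -> copy of that user's details).
     */
    static Map<String, Map<String, Map<String, Object>>> denormalize(
            Map<String, Map<String, Boolean>> passengerIndex,
            Map<String, Map<String, Object>> users) {
        Map<String, Map<String, Map<String, Object>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Boolean>> asset : passengerIndex.entrySet()) {
            Map<String, Map<String, Object>> passengers = new LinkedHashMap<>();
            for (String userId : asset.getValue().keySet()) {
                Map<String, Object> details = users.get(userId);
                if (details != null) {
                    // Copy the details so each location holds its own duplicate.
                    passengers.put(userId, new LinkedHashMap<>(details));
                }
            }
            result.put(asset.getKey(), passengers);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> jane = new LinkedHashMap<>();
        jane.put("name", "Jane Doe");
        jane.put("age", 24);
        Map<String, Map<String, Object>> users = new LinkedHashMap<>();
        users.put("jane", jane);

        Map<String, Boolean> pmt1 = new LinkedHashMap<>();
        pmt1.put("jane", true);
        Map<String, Map<String, Boolean>> index = new LinkedHashMap<>();
        index.put("pmt1", pmt1);

        // Prints the denormalized passengers tree for pmt1.
        System.out.println(denormalize(index, users));
    }
}
```

The copy is deliberate: after denormalization the two locations are independent, which is exactly why the consistency question arises.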

One major concern with denormalization is data consistency. If we duplicate user data all over the place, what happens when the user modifies it at one location, e.g. John Doe changes his name to John King because he has found out who he really is? Making this change at one location does not automatically propagate it to the others. So how do we ensure that user data stays consistent at all nodes? As we have come to expect, the Firebase team provides a technique for this, called multi-path updates.

Multi-path updates bring consistency to denormalized data. They are atomic: either every listed location is updated, or the whole operation fails and nothing is updated, so there are no partial updates. In Java, we do this with a HashMap whose keys are the paths to update.

Now that we know John's full name, updating his data at all nodes goes like this:

 DatabaseReference mRootReference;
 mRootReference = FirebaseDatabase.getInstance().getReference();
 // Keys are paths relative to the root; values are the new data
 Map<String, Object> updateObj = new HashMap<>();
 updateObj.put("users/john/name", "John King");
 updateObj.put("passengers/pmt1/john/name", "John King");

 // All paths are written together, or none are
 mRootReference.updateChildren(updateObj);

And with that, our data is updated at all nodes at once.
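
In a real app a user may be duplicated under more than one asset, so the update map has to list every copy. The plain-Java sketch below builds such a map. It assumes the app already knows which asset ids contain the user; how that list is tracked is not covered in this post. UserNameFanOut and buildNameUpdate are hypothetical names for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: builds the path -> value map for a multi-path name update. */
final class UserNameFanOut {

    /**
     * userId:   key of the user, e.g. "john".
     * newName:  the new value for the "name" field.
     * assetIds: ids of the assets whose passengers node holds a copy of this user
     *           (assumed to be known by the caller).
     */
    static Map<String, Object> buildNameUpdate(
            String userId, String newName, List<String> assetIds) {
        Map<String, Object> update = new LinkedHashMap<>();
        // The canonical copy under "users".
        update.put("users/" + userId + "/name", newName);
        // Every duplicated copy under "passengers".
        for (String assetId : assetIds) {
            update.put("passengers/" + assetId + "/" + userId + "/name", newName);
        }
        return update;
    }

    public static void main(String[] args) {
        // Prints one entry per location that holds John's name.
        System.out.println(buildNameUpdate("john", "John King", List.of("pmt1")));
    }
}
```

The returned map has the same shape as updateObj above, so it could be handed to updateChildren on the root reference in the same way.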

Denormalization is overkill for simple applications with small data sets. A natural question is: how do I know when to denormalize? Let's find out.

When

A general rule of thumb here is to "structure your data according to your view". Assume we are building an application that displays a list of assets whose destination is Lagos, and that this application has a huge data set.
Instead of structuring our data like this:

"assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "destination": "Kaduna"
    },
    "pmt2": {
      "name": "pearl transit 02",
      "destination": "Lagos"
    },
    "pmt3": {
      "name": "pearl transit 03",
      "destination": "Enugu"
    }
  }

and writing queries like this to filter our data:

 DatabaseReference mAssetReference;
 mAssetReference = FirebaseDatabase.getInstance().getReference().child("assets");
 // Order the assets by their "destination" child and keep only those equal to "Lagos"
 Query query = mAssetReference.orderByChild("destination").equalTo("Lagos");
 query.addValueEventListener.....

We can simply denormalize our data according to our view by creating another root node, asset_destination, like this:

"assets": {
    "pmt1": {
      "name": "pearl transit 01",
      "destination": "Kaduna"
    },
    "pmt2": {
      "name": "pearl transit 02",
      "destination": "Lagos"
    },
    "pmt3": {
      "name": "pearl transit 03",
      "destination": "Enugu"
    }
  }

"asset_destination": {
    "KD": {
        "pmt1": {
             "name": "pearl transit 01",
             "destination": "Kaduna"
         }
     },
     "LG": {
        "pmt2": {
             "name": "pearl transit 02",
             "destination": "Lagos"
         }
     },
     "EN": {
        "pmt3": {
             "name": "pearl transit 03",
             "destination": "Enugu"
         }
     }
 }

And when we need a list of assets with destination Lagos, we can simply read from its node like this:

 DatabaseReference destinationRef;
 destinationRef = FirebaseDatabase.getInstance().getReference().child("asset_destination");

 // "LG" already holds only the assets bound for Lagos, so no filtering is needed
 Query query = destinationRef.child("LG");
 query.addValueEventListener.....

And that's it. We have successfully simplified the query through denormalization, and our application doesn't have to keep filtering our huge data set for assets with destination Lagos.
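
Since each asset now lives in two places, both copies should be written together. One way to do that, sketched below in plain Java, is to build a single path-to-value map covering both locations, in the same shape as the multi-path update map used earlier. AssetFanOut and buildAssetWrite are hypothetical names, and passing the destination code ("KD", "LG", "EN") in from the caller is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: builds one multi-path write that keeps both copies of an asset in step. */
final class AssetFanOut {

    /**
     * assetId:         key of the asset, e.g. "pmt2".
     * destinationCode: key used under "asset_destination", e.g. "LG"
     *                  (supplied by the caller in this sketch).
     */
    static Map<String, Object> buildAssetWrite(
            String assetId, String name, String destination, String destinationCode) {
        Map<String, Object> update = new LinkedHashMap<>();
        // Copy under the main "assets" node.
        update.put("assets/" + assetId, assetValue(name, destination));
        // Duplicate under the view-specific "asset_destination" node.
        update.put("asset_destination/" + destinationCode + "/" + assetId,
                assetValue(name, destination));
        return update;
    }

    /** Builds a fresh value map so the two locations do not share one object. */
    private static Map<String, Object> assetValue(String name, String destination) {
        Map<String, Object> value = new LinkedHashMap<>();
        value.put("name", name);
        value.put("destination", destination);
        return value;
    }

    public static void main(String[] args) {
        // Prints the two paths that would be written for pmt2.
        System.out.println(buildAssetWrite("pmt2", "pearl transit 02", "Lagos", "LG"));
    }
}
```

If an asset's destination ever changes, the old entry under asset_destination would also need to be cleared in the same update; that case is left out here.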

Discover and read more posts from Nwankwo .C. Michael

Patrick Courtnage
14 days ago

This was really helpful. Does Firestore also handle multi-path updates the same as the real-time database?

Nwankwo .C. Michael
14 days ago

Hi Patrick, glad you found it helpful. Am not sure of its full features for now but it could. I will be making a post on it soon. Stay tuned