Published May 07, 2017. Last updated May 17, 2017.

Normalization originated from the idea that certain table designs (and the design of the relationships between them) can increase the efficiency of data flow, and can influence the length and complexity of queries as well as their ability to produce the results you really need.

Consider a workshop manager who wants to add a list of materials to a table:

He knows that each material has certain qualities and uses, and he wants to store them in the best possible way. A database administrator may, for example, be given a list of inventory and nothing more. But a list of inventory does not tell us, perhaps, which tools should be used with which machine, or where they are located in the workshop. Let us start by considering what we want to know about our environment (in this case), and include this in our data design:


FIRSTLY - The rules for first normal form dictate that each data table must represent a single subject. So we may start by thinking about a process and its parts, or different business documents, or the people involved - "machines" or "materials" and "workshop staff", or perhaps "vehicles": the things, people, objects or subjects involved in a function.

A table about employees contains employee data. A table about tools contains data about tools. And those tables contain data ONLY about their specific subject matter.

SECONDLY - data items are not to be unnecessarily repeated. On closer inspection this makes sense: we would not like to have to make updates in multiple places if the price of an object or service changes.
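To see why repeated data items cause trouble, here is a small sketch in plain Python (the article itself shows no code; the materials, job numbers and prices are invented for illustration):

```python
# Denormalized rows: the unit price of the same material is repeated per job.
jobs = [
    {"job": 101, "material": "steel plate", "price": 50},
    {"job": 102, "material": "steel plate", "price": 50},
]

# The price changes, but only one copy gets updated -- a classic update anomaly.
jobs[0]["price"] = 65
prices = {row["price"] for row in jobs if row["material"] == "steel plate"}
print(len(prices))  # 2 -- the same material now carries two conflicting prices

# Normalized: store the price once, keyed by material, and reference it per job.
material_prices = {"steel plate": 65}
normalized_jobs = [{"job": 101, "material": "steel plate"},
                   {"job": 102, "material": "steel plate"}]
print(material_prices[normalized_jobs[1]["material"]])  # 65 -- one source of truth
```

With the price stored once, every job that references the material sees the new value automatically.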

THIRDLY - A primary key is established in a relational data structure so that all other items in the table may be referenced by it. This usually ends up being some form of record ID (e.g. an employee number), but it depends on the design and the information available.

So if it is as simple as this, why is normalization such a scary concept, and why the hell are there four normal forms in my text reference?


Second normal form is a technique that may help you identify columns in an existing table that could be primary keys but were not identified as such before. That data may then be split out into tables of their own, with their own primary keys. The main idea is that data fields in a table depend on each other, and those with strong dependencies on each other belong together. For example, 'employee name', 'surname' and 'address' in a table with 'warehouse storage area' (as work area), 'access code' and 'access level' may work better as two tables - e.g. 'employee' and 'security' - because of the way the fields depend on each other. NOTE: a table must already be in first normal form before it can be in second normal form.
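The employee/security split described above might look like this in sqlite3 (the column values and names are illustrative assumptions, not the article's data). Facts about the person go in one table; facts about access go in another, linked back by the employee number:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# 'employee' holds facts about the person.
cur.execute("""CREATE TABLE employee (
    employee_num INTEGER PRIMARY KEY, name TEXT, surname TEXT, address TEXT)""")

# 'security' holds facts about access, tied to the employee by its key.
cur.execute("""CREATE TABLE security (
    employee_num INTEGER REFERENCES employee (employee_num),
    work_area TEXT, access_code TEXT, access_level INTEGER)""")

cur.execute("INSERT INTO employee VALUES (7, 'Sipho', 'Dlamini', '12 Mill Rd')")
cur.execute("INSERT INTO security VALUES (7, 'warehouse storage area', 'A-42', 2)")

# A join reunites the two subjects whenever both are needed at once.
row = cur.execute("""
    SELECT e.name, s.access_level
    FROM employee e JOIN security s ON e.employee_num = s.employee_num
""").fetchone()
print(row)  # ('Sipho', 2)
```

Changing an employee's address now touches only the 'employee' table, and changing an access code touches only 'security'.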


What is a transitive dependency? These are fields that, as data changes over time, cause data anomalies. It can be anything that multiple records in a table have in common, and that, if one record is updated without updating the others, may cause confusion or corrupt calculations - for example a table 'jobcards' that initially contains the fields 'employee num', 'job class' and 'charge'. By changing the 'charge' amount on one record, you might later forget to update the others. Thus 'job class' and 'charge' probably work better in their own table.
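A minimal sqlite3 sketch of that jobcards split, under assumed names and figures: 'charge' depends on 'job class' rather than on the jobcard itself, so the pair moves into its own table and a single update fixes every jobcard at once:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The charge belongs to the job class, so it lives with the job class.
cur.execute("CREATE TABLE job_classes (job_class TEXT PRIMARY KEY, charge REAL)")
cur.execute("""CREATE TABLE jobcards (
    jobcard_id INTEGER PRIMARY KEY, employee_num INTEGER,
    job_class TEXT REFERENCES job_classes (job_class))""")

cur.execute("INSERT INTO job_classes VALUES ('welding', 300.0)")
cur.executemany("INSERT INTO jobcards VALUES (?, ?, ?)",
                [(1, 7, "welding"), (2, 9, "welding")])

# One UPDATE changes the charge for every welding jobcard -- nothing to forget.
cur.execute("UPDATE job_classes SET charge = 350.0 WHERE job_class = 'welding'")

charges = cur.execute("""
    SELECT DISTINCT c.charge
    FROM jobcards j JOIN job_classes c ON j.job_class = c.job_class
""").fetchall()
print(charges)  # [(350.0,)] -- every jobcard agrees on the new charge
```

Had 'charge' stayed on each jobcard row, the same UPDATE would have had to find and touch every affected record individually.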


Fourth normal form - third normal form without multiple sets of multivalued dependencies:
A single tool may have multiple uses. If more than one category exists for using a tool, we may want to consider creating a categories table, splitting all category-related data out into it, and then referencing it from the tools table.
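One common way to model this is sketched below in sqlite3 (the tool and category names are invented). Since a single tool may fall under several categories, a simple junction table is used here to carry the references between the two, rather than a single foreign key column on the tools table, which could only hold one category per tool:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE tools (tool_id INTEGER PRIMARY KEY, description TEXT)")
cur.execute("CREATE TABLE categories (category_id INTEGER PRIMARY KEY, name TEXT)")

# The junction table lets one tool belong to many categories, and vice versa.
cur.execute("""CREATE TABLE tool_categories (
    tool_id INTEGER REFERENCES tools (tool_id),
    category_id INTEGER REFERENCES categories (category_id))""")

cur.execute("INSERT INTO tools VALUES (1, 'angle grinder')")
cur.executemany("INSERT INTO categories VALUES (?, ?)",
                [(1, "cutting"), (2, "grinding")])
cur.executemany("INSERT INTO tool_categories VALUES (1, ?)", [(1,), (2,)])

names = [r[0] for r in cur.execute("""
    SELECT c.name
    FROM tool_categories tc
    JOIN categories c ON c.category_id = tc.category_id
    WHERE tc.tool_id = 1 ORDER BY c.name
""")]
print(names)  # ['cutting', 'grinding']
```

All category-related data (names, descriptions, and so on) now lives in one place, and adding a new use for a tool is a single row in the junction table.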

Normalization may in fact allow us to design better databases. Yes; it very well may.

Discover and read more posts from Theresa Mostert