It makes cents: data representation choices
How should you store the value of 5%? Or $10 in expenses?
These seemingly trivial little questions can cause development pain and allow errors to seep into a program.
Let's start with money.
Monetary values are represented in all kinds of ways. One million dollars might
be represented as "$1M" or "1,000,000 USD" or "$1,000,000.00" Smaller numbers
can likewise be represented in different ways, and as these three examples show
its not just the currency symbols that affect this.
Hopefully your first thought is to store a number. Storing "$1M" might be
acceptable if you need some additional cached value, but the source of truth
should be a number. This makes different representation simple as well as allows
But you probably already had a numeric format in mind.
That leaves floats and integers, right? Floats allow you to store data past the
decimal point and integers can be used for both whole number amounts and numbers
with fractional values by padding the stored amount, e.g. multiplying by 100. So
which is the better choice?
The punchline is neither. Use a decimal instance.
Float values are subject to unexpected rounding and imprecise results, which is
not acceptable for most financial calculations or representations. And while the
padded integer trick works, it requires not only that you enforce a specific
level of precision (using a coefficient of 100 you won't be able to represent
half cents) but that you always remember the conversion.
For Python users, use
Surely percentages are simpler. 99.9% should be stored as
Not so fast! A percentage is a decimal value, a fraction of 100. Chances are
pretty good that if you're storing percentage values you're making some kind
of calculation with them. In this case you'll be transforming these values
everytime you make that calculation.
Instead, store the underlying value, that is
Yep, you're going to have to transform this when you represent it or store user
input. The difference between this conversion and the one mentioned with money
is that the former padded integer conversion involves converting to an
unexpected format whereas encountering a percetage value as a decimal is
expected. So you remove the unexpected conversion from all calculations and
add in understandable conversion to and from the user.
Store as a decimal, using the actual percentage value.
This one has to be silly obvious. You store a number with whatever sign it has
in the calculation. Right?
Here there isn't a right or wrong answer, rather you should consider the
calculations you'll perform with the numbers (if any). For instance, it might
make sense to store an expense as a negative number. This would allow you to
simply sum expesnes and income values, for example, and get the result.
Of course computationally it's trivial to negate these if they're stored as
positive values. The question is, again, what is expected? If you find yourself
frequently or always negating a value for either calculations or
representations, it might make more sense to store with that sign.
The commmon thread
In each case we're presented with a choice: modify the value for calculations or
for representation. For instance, if you need to use a threshold percentage
value, storing this as a percentage value,
0.78 instead of
78 means there's
fewer opportunities for confusion and hence errors when updating calculations.
This does mean that the value a user inputs and the value presented to a user
might need to be transformed. And this might even be more work. However the risk
to code maintenance and correctness is greater.
A rule of thumb I use is how much sense would it make seeing this value in a
database and then performing calculations in SQL with it. Are you dividing
numbers by some arbitrary coefficient to get something sensible to add? That's a
sign you probably have a representation format where you want a storage format.