My Logging Practices
Anyone who has ever worked on a legacy system (you know, that code you 'just' wrote three months ago) knows the importance of good logs.
Seeing the right log message when looking for a bug could save you hours of debugging time. Distracting log messages cluttering the log could achieve the opposite effect.
Let's start with some common pitfalls to see what could go wrong:
Types of Log Files
At the extreme, a 'void' log file might look something like this:
2017-08-25-19:13:44 Info Process started successfully!
That's it. That was the entire log file.
I'm pretty sure no one has ever said "Oh boy, I wish my log would look like that one!"
If you hear anyone saying that, you are looking at a sadistic maniac and you should back away slowly...
The void might take other forms, like writing the same obsolete log message that may have been relevant years ago, but right now is just noise.
The Astronaut - 'Houston, we have a problem'
2017-08-25-19:13:44 Error Process starting
2017-08-25-19:13:44 Error Initializing map of value
2017-08-25-19:13:44 Error Values initialized
2017-08-25-19:13:44 Error Usimg values for random calculation: 438, 276, 95, 8731...
2017-08-25-19:13:44 Error Process started successfully!
At the complete opposite of the Void is the Astronaut, which insists on reporting everything to the log (most of the time reported as errors).
An experienced developer might derive some value from these errors while they look at the relevant code. But what about a junior developer? A developer from another team? Operations (Ops) guy? A manager? The log will not make sense for them.
The Astronaut sometimes manifests itself after a manager decides that "we need to log EVERYTHING." Developers therefore do as they are told and log what they consider everything.
Another great benefit of the Astronaut is the number of angry phone calls in the middle of the night, when the program puts a high contention on the IO and takes much more storage than needed. The Ops team LOVES that!
Our last category of logs are the Philosophers. Everything that gets written is either so abstract or generic that it could fit in pretty much any context.
The Philosopher has an interesting source — generic messages such as:
2017-08-25-19:13:44 Info File saved successfully
2017-08-25-19:13:44 Info Timer activated
2017-08-25-19:13:44 Error - received invalid request
2017-08-25-19:13:44 Error - NullPointerException
These are actually low level details!
- I don't care that the file was written successfully. I care that the backup processes ended and where the backup is saved. That is the business logic I'm after. The file is just the low level mechanism.
- Normally, I would not care that a request was invalid. I could only care about the request details and why the request was not valid.
That isn't to say NullPointerException isn't helpful in a log, but I would much rather see the business logic that failed. Google NullPointerException + popular framework and you will find many places that it's helped people.
That being said, normally NullPointerException has a stack trace attached and requires more data for the reader to work with to be a helpful error.
Things to Remember
Talking about the bad is relatively easy. Let's talk about the good.
I have a few guidelines that helped me, but they are incomplete.
There is much more to the subject that I intend to write about in future posts.
Don't take any of these guidelines as rock solid rules.
Production Logs Should Not Contain Only Errors
Remember that not all bugs result in an error. When bug hunting (or just analyzing the logs), seeing (or not seeing) events will allow you to compare with previous versions.
Well written log message should also help with understanding execution flow.
The Log Is Not Meant For You!
You may be the one writing the log, but just like with code, you are not the only reader.
When in production, the first people to see logs are the Ops team or on-call developers.
You don't want be called every time something breaks. Hence, you should write the log as clearly as possible for them.
You can even ask them to review your log files — they may have interesting insights...
In my opinion, you should make your logs useful for the Operations team.
I'll write more on that in a future blog post.
Test Your Log!
Logging is just like any other production code — it needs to be tested.
- Are the log messages correct?
- Are the log messages clear and short?
- Are the log messages spamming?
- Are the log messages sufficient?
- Are the log messages secure? (should not contain any sensitive information)
- Conformity to other log messages in your app/process.
When starting out, you want to consult others.
The log may look good to you, but horrible for others.
Maximize Information While Minimizing Logging
As I stated above, in my opinion, you should aim for your logs to help
the Operations team. That means you should minimize logging
and, instead, write more data to other instruments that are less verbose.
The log should be used as a last resort. Maximize the usage of graphs,
pie charts, and other visual aids.
These tools must support a timeline — you will find out interesting stuff:
- Why is there a network peak at exactly 18:54 everyday?
- Is it normal that this process is running at 100% CPU?
- The new manager asked to stop everything and work late hours
to investigate a bug that has appeared to exist for several years.
I've witnessed all the above, which could have easily been resolved with a basic timeline presentation.
Don't choose a tool that does not give you a good timeline.
Logging is an important part of your project. Just like with code, you should have a clear methodology and peer review the output.
I couldn't find any silver bullet for logging (again, just like with code),
but I think I've noticed repeating patterns that are harmful/useful,
which I'm going to continue writing about in future posts.
This post is originally published by the author here. This version has been edited for clarity and may appear different from the original post.