To log or not to log?
Logging - the when, the where, and the what.
2023-01-06 Update! I’ve decided to expand on what you should consider logging and add a new section on what to avoid logging at the end of this post. Check it out!
2023-04-26 Update! A recent study on logging levels came to my attention, and I have updated this post’s introduction accordingly.
Yes, you should log, and you know it. :-) You have this piece of information, and you are deciding if that is the right moment to log, if that’s the right place to add the log call, or if that particular information should be logged at all.
This is the standard guideline I adopt to decide whether I should log something or not.
But first, we should discuss what logging levels mean, because the framework or library you use may have different names for these levels. Thankfully, in a recent research publication, the authors identified 19 severity levels across 27 studies, 40 logging libraries, and practitioners’ views, in what is called a “multivocal systematic mapping”:
E. Mendes and F. Petrillo, “Log severity levels matter: A multivocal mapping,” 2021 IEEE 21st International Conference on Software Quality, Reliability and Security (QRS), Hainan, China, 2021, pp. 1002-1013. DOI: 10.1109/QRS54544.2021.00109
They arrived at 6 fundamental levels. Surprisingly (or not), these are the same 6 levels I originally adopted in this post. :-)
And thankfully, they have also mapped the other levels to these 6, based on their meaning/usage. So if your framework or library uses different levels, you can use the mapping in their paper as a guide.
Note: For more tips about information to log related to exceptions, visit this other post about Exception Handling.
Log Levels Usage
Fatal
One or more key business functionalities are not working, and the system as a whole can no longer fulfill its business purpose. In other words, use it for severe or catastrophic errors: those that cause premature termination or may cause the application to crash.
Error
One or more functionalities are not working correctly. In comparison to Fatal, use Error for other runtime errors or unexpected conditions, such as errors that prevent a request from being completed. It is usually the result of an exception that could not be completely handled or worked around.
Warn
Unexpected behavior happened inside the application, but it is continuing its work, and the key business features are operating as expected. Use it for deprecated APIs, poor use of an API, ‘almost’ errors, and other runtime situations that are undesirable or unexpected, but not necessarily “wrong”. These can indicate that a problem has happened and the system recovered (like an exception that was worked around) or flag something that could be a problem depending on the circumstances. If the code can determine whether those circumstances are present, log an error when they are instead.
Info
An event happened, but it is purely informative and can be ignored during normal operations. Use it for interesting runtime events (startup/shutdown of services, etc.).
Debug
Debug is a log level used for events considered helpful during software debugging, when more granular information is needed. Use it for detailed information on the flow through the system. This is expected to be very verbose. It should be enabled during testing but not in production, except in extreme circumstances. Even then, enable it only for limited scopes (specific packages/classes) and for a limited time.
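To illustrate the “limited scopes” idea, here is a minimal sketch using Python’s standard logging module. The package names (shop.payments, shop.orders) and the two helper functions are hypothetical; your framework will have its own way of scoping levels.

```python
import logging

# Production default: INFO and above for the whole application.
logging.basicConfig(level=logging.INFO, force=True)


def enable_debug(scope):
    """Turn on DEBUG for one package only (hypothetical helper)."""
    logging.getLogger(scope).setLevel(logging.DEBUG)


def disable_debug(scope):
    """Hand the package back to the level inherited from the root logger."""
    logging.getLogger(scope).setLevel(logging.NOTSET)


payments_log = logging.getLogger("shop.payments")
orders_log = logging.getLogger("shop.orders")

enable_debug("shop.payments")
payments_log.debug("charge request built: order_id=%s", 1234)  # emitted
orders_log.debug("cart recalculated: order_id=%s", 1234)       # suppressed

# The "limited time" part: switch it off once the investigation is over.
disable_debug("shop.payments")
```

Because child loggers inherit from their parents, enabling a scope also covers everything below it (shop.payments.gateway, for example).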
Trace
A log level describing events that show the step-by-step execution of your code. They can be ignored during standard operation but may be useful during extended debugging sessions. Use it, for example, to annotate each step in an algorithm or each individual query with its parameters. It is generally not OK to have it enabled in environments other than dev, but exceptions can be considered.
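Not every library ships all six levels. As a sketch of how they could be lined up in Python’s standard logging module: Fatal corresponds to CRITICAL (logging.FATAL is an alias for it), and there is no built-in Trace, so I register one myself. The numeric value 5 and the LEVELS dictionary are my own assumptions, not something the library or the paper prescribes.

```python
import logging

# Assumption: 5 is a free slot below DEBUG (10); the standard library has no TRACE.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# The six levels from this post, mapped onto Python's numeric levels.
LEVELS = {
    "Fatal": logging.CRITICAL,  # logging.FATAL is an alias of CRITICAL
    "Error": logging.ERROR,
    "Warn": logging.WARNING,
    "Info": logging.INFO,
    "Debug": logging.DEBUG,
    "Trace": TRACE,
}

logging.basicConfig(format="%(levelname)s %(message)s")
log = logging.getLogger("levels.demo")
log.setLevel(TRACE)  # dev-only verbosity, as discussed above

log.log(LEVELS["Info"], "service started")
log.log(LEVELS["Debug"], "loading prices for order_id=%s", 1234)
log.log(LEVELS["Trace"], "query=%s params=%s",
        "SELECT id FROM orders WHERE user_id = ?", (42,))
```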
Log Placement
Fatal
Usually, inside catch blocks. When inside catch blocks, if the exception is not being rethrown (encapsulated or as-is), it is acceptable to log the exception at this time.
Error
Usually, inside catch blocks. When inside catch blocks, if the exception is not being rethrown (encapsulated or as-is), it is acceptable to log the exception at this time.
Warn
Often inside catch blocks. Because a warning is not a blocking issue, the exception should be either rethrown (encapsulated or as-is) or worked around. If it is worked around, logging the exception is optional. If it is rethrown, it shouldn’t be logged here, since another catch block will deal with it and logging in both places may result in duplicate stack traces for the same event.
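Here is how I would sketch these placement rules in Python, where catch blocks are except blocks. All names (PriceLookupError, fetch_price, and so on) are hypothetical and exist only to show the three cases: rethrown, worked around, and not rethrown.

```python
import logging

log = logging.getLogger("placement.demo")


class PriceLookupError(Exception):
    """Hypothetical domain exception that wraps lower-level failures."""


def fetch_price(sku, source):
    # Rethrown (encapsulated): no logging here, the final handler logs it once.
    try:
        return source[sku]
    except KeyError as exc:
        raise PriceLookupError(f"no price for sku={sku}") from exc


def price_with_fallback(sku, primary, fallback):
    # Worked around: the system recovers, so this is a warning at most.
    try:
        return fetch_price(sku, primary)
    except PriceLookupError:
        log.warning("price missing in primary source, using fallback: sku=%s", sku)
        return fetch_price(sku, fallback)


def handle_request(sku, primary, fallback):
    # Not rethrown: this is the last catch block, so the exception is logged here.
    try:
        return {"status": 200, "price": price_with_fallback(sku, primary, fallback)}
    except PriceLookupError:
        log.exception("could not price request: sku=%s", sku)  # ERROR + stack trace
        return {"status": 500}
```

Each failure ends up with exactly one stack trace in the log, written by the block that finally decides what to do about it.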
Info
Usually found before or after (or both) specific method calls or at the beginning and/or end of the execution of a method.
Debug/Trace
Anywhere.
Log information
Log information usually contains exceptions/stack traces, messages, and parameter/attribute values. Avoid logging any sensitive information from users, like information that would allow correlating data to their identities.
Security
I recommend storing security logs separately, as they may increase your log files too much, too fast, and they are not always relevant. Always add a string category to help you filter them out if you decide to keep them with other categories of logging. Here are some suggestions:
- Logon success, failures, and logout.
- Password changes and resets.
- Authorization failures.
The following events are better stored with audit logs. However, if you don’t have audit logs, consider adding them under the security category:
- User creation, deletion, and suspension.
- Changes to access permissions, roles, and user groups.
- High-impact, critical actions (you may want to consult business analysts and product owners to identify these).
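As one possible way to keep security events apart, here is a sketch with Python’s standard logging module: a dedicated logger whose name doubles as the string category and which writes to its own file. The build_security_logger helper, the file location, and the user and resource values are all assumptions for illustration.

```python
import logging
import os
import tempfile


def build_security_logger(path):
    """Return a logger that writes security events to their own file."""
    logger = logging.getLogger("security")  # the name doubles as the category
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these events out of the main application log
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(name)s] %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Assumption: a temporary directory stands in for your real log location.
log_path = os.path.join(tempfile.mkdtemp(), "security.log")
security_log = build_security_logger(log_path)

security_log.info("logon success: user_id=%s", "u-1029")
security_log.warning("authorization failure: user_id=%s resource=%s", "u-1029", "/admin")
```

If you decide to keep everything in one file instead, leaving propagate at its default and filtering on the [security] tag gives you the same separation at query time.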
At any level
- User IDs. It’s always important to capture who did it. It will help later with customer incidents.
- Thankfully, logging frameworks usually take care of timestamps (preferably UTC+0) and code location (the file and line where the logging entry is), but if it’s not being handled by default, please consider doing it. If you are worried about exposing your code structure, replace the location with a UUID that you can search for.
- Relevant method parameters. You know the line where you are adding the log call, so you must know some parameter values that could have led the flow to that point. You may not want this at the “info” level, but do consider it for all the others.
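The three items above can be sketched with Python’s standard logging module: a formatter that renders UTC timestamps and the code location, plus a LoggerAdapter that attaches the user ID to every record. The user_id field name and the apply_discount function are hypothetical.

```python
import logging
import time

formatter = logging.Formatter(
    "%(asctime)sZ %(levelname)s %(filename)s:%(lineno)d user=%(user_id)s %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S",
)
formatter.converter = time.gmtime  # render timestamps in UTC instead of local time

handler = logging.StreamHandler()
handler.setFormatter(formatter)

log = logging.getLogger("context.demo")
log.addHandler(handler)
log.setLevel(logging.DEBUG)
log.propagate = False


def apply_discount(user_id, order_id, percent):
    # The adapter adds user_id to every record it emits.
    ctx = logging.LoggerAdapter(log, {"user_id": user_id})
    # Relevant parameters: the values that led the flow to this point.
    ctx.debug("apply_discount called: order_id=%s percent=%s", order_id, percent)
    if not 0 <= percent <= 100:
        ctx.warning("discount out of range, ignoring: order_id=%s percent=%s",
                    order_id, percent)
        return 0
    return percent


apply_discount("u-1029", "o-77", 150)
```

Note that this format expects every record to carry user_id, so it only works for loggers that go through the adapter.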
Fatal
Log exceptions, the parameters that caused the exception or fatal condition to occur and, if any attempt was made to recover, log the attempt (with parameters) and results.
Error
Log exceptions, the parameters that caused the exception or condition to occur and, if any attempt was made to recover, log the attempt (with parameters) and results.
Warn
Log exceptions, the parameters that caused the exception or condition to occur and, if any attempt was made to recover, log the attempt (with parameters) and results. If this warning flags a possible problem, log when this should be considered a problem and/or when it shouldn’t.
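To make the “exception, parameters, recovery attempt, and result” idea concrete, here is a small retry sketch in Python. The send_with_retry helper and its arguments are hypothetical; the point is only what each log call carries.

```python
import logging

log = logging.getLogger("recovery.demo")


def send_with_retry(send, payload_id, retries=2):
    """Call send(payload_id), retrying on ConnectionError; returns None on failure."""
    total = retries + 1
    for attempt in range(1, total + 1):
        try:
            result = send(payload_id)
        except ConnectionError:
            if attempt < total:
                # Recoverable so far: exception, parameters, and the recovery attempt.
                log.warning("send failed, retrying: payload_id=%s attempt=%d/%d",
                            payload_id, attempt, total, exc_info=True)
                continue
            # Recovery exhausted: exception, parameters, and the final result.
            log.error("send failed, giving up: payload_id=%s attempts=%d",
                      payload_id, total, exc_info=True)
            return None
        if attempt > 1:
            # Result of the recovery, so the earlier warnings have a conclusion.
            log.info("send recovered: payload_id=%s attempt=%d/%d",
                     payload_id, attempt, total)
        return result
```

The error is logged here because the exception is not rethrown; if the function rethrew it instead, I would leave the logging to whoever catches it.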
Info
Messages that are usually unrelated to exceptions. Avoid logging stack traces using info, unless highly relevant.
Debug/Trace
Anything.
Audit Logs are not Technical Logs
I’ve seen some engineers considering using logging frameworks to implement audit log requirements. While I believe it is not fundamentally wrong, some caution is advised.
Audit logs are a business requirement. They are to be analyzed by auditors with business knowledge, and no technical knowledge should be required. Auditors are not expected to scavenge for the information they need between lines of warnings and stack traces. Furthermore, audit logs change as business requirements change. So you may be required to change message templates or to make the logs accessible from user interfaces (not text files).
If you do consider using a logging framework to implement an audit log requirement, I’d strongly suggest you:
- Store the logs in a database, making it easier to develop a UI for the auditors to access and query over.
- Encapsulate the framework to ensure the usage of string templates in your messages. It will make it easier if you need to modify them later without making the change in every place where the event is being logged.
- Considering a scenario where you may have multiple services, I’d have either a specific service to handle that or at least a shared component that encapsulates the framework and ensures consistency between logged events.
- You may want to use annotations with AOP (aspect-oriented programming) to trigger the logging events in certain methods or classes without compromising too much of the code readability.
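Here is a rough sketch of how these suggestions could fit together. For brevity it writes straight to a database with Python’s sqlite3 module rather than wrapping a logging framework, and a decorator stands in for an AOP annotation. The table layout, the template keys, and every name in it are assumptions of mine.

```python
import functools
import sqlite3
from datetime import datetime, timezone

# Message templates live in one place, so wording changes don't touch call sites.
TEMPLATES = {
    "role.changed": "User {actor} changed role of {target} to {role}",
    "user.suspended": "User {actor} suspended {target}",
}


class AuditLog:
    """Hypothetical shared component; table and column names are assumptions."""

    def __init__(self, connection):
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS audit_log "
            "(at TEXT, event TEXT, actor TEXT, message TEXT)"
        )

    def record(self, event, actor, **fields):
        message = TEMPLATES[event].format(actor=actor, **fields)
        at = datetime.now(timezone.utc).isoformat()
        self.db.execute("INSERT INTO audit_log VALUES (?, ?, ?, ?)",
                        (at, event, actor, message))
        self.db.commit()


def audited(audit, event):
    """Decorator standing in for an AOP annotation: records the event on success."""
    def wrap(func):
        @functools.wraps(func)
        def inner(actor, **fields):
            result = func(actor, **fields)
            audit.record(event, actor, **fields)
            return result
        return inner
    return wrap


audit = AuditLog(sqlite3.connect(":memory:"))


@audited(audit, "role.changed")
def change_role(actor, *, target, role):
    return f"{target} is now {role}"  # stand-in for the real business logic


change_role("u-1", target="u-2", role="admin")
rows = audit.db.execute("SELECT event, actor, message FROM audit_log").fetchall()
```

With the events in a table, a UI for auditors becomes a matter of querying audit_log, and the business method itself contains no logging code at all.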
Avoid logging these things
- Personally Identifiable Information (A.K.A. PII): Remember NEVER to log sensitive information about your users or even information that would allow someone to identify the user. If you are unsure whether a piece of information is personally identifying or sensitive, ask the organization. Some obvious examples are Social Insurance/Security Numbers (SIN/SSN), email addresses, names, usernames (login), addresses, and combinations of data, like last name + date of birth.
- Of course, NEVER log passwords, even if there is a login attempt with a wrong password. Multiple wrong attempts can easily disclose the correct one.
- Health or financial-related data.
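The best protection is not writing this information into log calls in the first place, but a safety net can catch slips. Below is a sketch of a redacting filter for Python’s standard logging module. The two patterns (email addresses and US-style SSNs) are illustrative assumptions only; real PII detection needs far more than this.

```python
import logging
import re

# Assumption: illustrative patterns only, not a complete PII detector.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[email]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[ssn]"),
]


def redact(text):
    """Replace every pattern match in text with its placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class RedactingFilter(logging.Filter):
    """Masks matches in the formatted message before a handler writes it."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True


log = logging.getLogger("redaction.demo")
log.propagate = False
handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
log.addHandler(handler)

log.warning("password reset requested for %s", "jane.doe@example.com")
```

This filter only touches the message itself; stack traces attached to the record are not redacted, so it is a complement to the rules above, not a replacement.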
I know it may be difficult to work when some of this information is missing. I have written a post about Working with limited production data access that may be of interest to you.
If you have any more suggestions around auditing logs or logging usage in general, sound off in the comments! :-)
If you like this post, please share it (you can use the buttons at the end of this post). It will help me a lot and keep me motivated to write more. Also, subscribe to get notified of new posts when they come out.