5 Logging subsystem provides facilities for capturing, recording and
6 reporting events which occur within the OpenDaylight system.
7 These events are the primary means of system diagnostics and troubleshooting,
8 serving a wide audience including automated monitoring systems, operators,
9 administrators, support personnel and development engineers.
11 In order to provide a 'single system' experience, all software components should
12 follow same basic rules for interfacing with the logging system.
13 While it is not practical to force these rules on the various third parties,
14 they should to be followed by all newly-developed components.
19 Java provides a diverse set of logging APIs and implementations.
20 OpenDaylight has selected `SLF4J API <http://www.slf4j.org/apidocs/index.html>`__,
21 with the implementation being provided by the Karaf container.
23 The primary reasons for this decision are:
25 - proper split between API and implementation, allowing us to change
26 the implementation without impacting code
27 - ability to provide legacy APIs (JUL, JCL, etc.) for third-party code
32 SLF4J defines five levels of logging messages, ranging from **TRACE** to
34 The guidelines for their use are very vague and in the grand UNIX tradition mix
35 severity of the message with its granularity.
36 The following summary defines the basic rules on when to use a specific level,
37 who the target audience is
38 and how any message of the specified level will be interpreted.
40 Note that all events with level **INFO or higher present an API-like contract**
41 of the system from the integration point-of-view:
42 if they change, third-party systems such as monitoring may need to be updated to
43 work correctly with the new system release.
44 The message text on these levels should be understandable to people with
45 networking and systems administration background, so any language assuming
46 knowledge of programming in general, or Java in particular,
47 should be avoided if at all possible.
49 Messages on **DEBUG** and higher present are part of interface contract with
50 support entities, e.g. if they are changed operator and troubleshooting manuals,
51 as well as knowledge-base systems may need to be updated to correctly interpret
52 the information conveyed.
53 On the **DEBUG** level, messages may assume a slight level of familiarity with
54 general programming concepts. Terminology specific to any programming
55 language should be avoided if possible.
60 This level serves as the general error facility.
61 It should be used whenever the software encounters an unexpected error which
62 prevents further processing.
63 The reporting component may attempt recovery which may affect system
64 availability or performance for a period of time, or it may forcefully terminate
65 if it is unable to recover.
66 This level is always reported for all software components.
68 There are three examples when this level should be used:
70 #. Incorrigible internal state inconsistency, such as when a JVM
71 reports an OutOfMemoryError. The top-level handler would log this
72 condition and force a JVM exit, as it is not capable of continuing
74 #. Internal state inconsistency, which is correctable by flushing and
75 re-creating the state. In this case the component would log an event,
76 which would indicate what assertion has been violated and that the
77 state has been flushed to recover.
78 #. Request-level processing error, e.g. the application encounters an
79 error which is preventing a particular request from completing, but
80 there is no indication of systematic failure which would prevent
81 other requests from being successfully processed.
83 The primary audience are monitoring systems and system operators,
84 as the events reported have impact on operational performance of the system.
89 This level serves for events which indicate irregular circumstances,
90 from which we have a clearly-defined recovery strategy which has no impact on
91 system availability or performance as seen by the reporting component.
92 Events on this level are usually reported.
94 A typical example for a candidate event is when a software component detects
95 inconsistency within an external data feed and it performs a corrective action
97 Let's say we process a list of key/value pairs and encounter a duplicate key:
98 we can either overwrite the old occurance, ignore the new occurance or abort.
99 If we take any of the first two actions, we should report a **WARN** event.
100 If we take the third, we should report an **ERROR** event.
102 The primary audience of these events are automated systems, operators and
103 administrators, as this level of messages indicate non-optimal system operation
104 (e.g. data feeds could use normalization) or may forebode a future failure.
109 This level serves for events which constitute major state changes within
110 a software component -- such as initialization, shutdown, persistent
111 resource allocation, etc. -- which are part of normal operations.
112 Events on this level are typically reported for non-library components.
114 Each software component should log at least four events on this level:
116 #. when it starts its initialization
117 #. when it becomes operational,
118 #. when it starts orderly shutdown,
119 #. just before it terminates normally
121 The primary audience of these events are operators and administrators,
122 who use them to confirm major interactions (such as restarting
123 components) have occurred within the system.
128 This is the coarse diagnostic level, serving for events which indicate
129 internal state transitions and detail what processing occurs.
130 Events on this level are usually not reported, but are enabled when a subsystem
131 code flows need to be examined for troubleshooting purposes.
133 Placement and amount of events generated at this level are at the discretion of
134 the development engineers, as both relate directly to the component logic.
135 The amount of detail in these events has to be limited to a single line of
136 information useful for pinning down a misbehaving software component in a
137 seemingly normal system and should be directly cross-referenceable to either
138 previous **DEBUG** events or component configuration.
139 A **hard requirement** is that there has to be at least one flow control
140 statement (such as if, for, while) or a call to a different software component
141 between two **DEBUG** event triggers.
143 Primary audience of these events are administrators and support personnel
144 diagnosing operational issues, mainly in real-time as they occur on the system.
149 This is the fine-grained diagnostic level, serving for events which
150 indicate internal state transitions in full detail.
151 Events on this level are not reported, but have to be explicitly enabled and may
152 be collected for support purposes.
154 Placement, amount and contents of these events is completely at the description
155 of development engineers.
156 These events are completely release-specific, may change even between minor
158 Examples of events reported at this level would be method entry and exit,
159 possibly including detailed input arguments, and dumps of internal data as it is
162 Primary audience of these events are senior support personnel and development
163 engineers diagnosing operational irregularities which relate directly to code
164 structure, mainly offline after being captured on a live system.
169 Each class containing calls to the logging subsystem **MUST** have its own
170 logger, which is not shared with any other class.
171 The variable holding reference to this logger **MUST** me named **'LOG'**.
172 The easiest way to ensure you have the proper logger declaration is to use
173 the following pattern:
177 import org.slf4j.Logger;
178 import org.slf4j.LoggerFactory;
181 private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
185 Use parameterized logging
186 ~~~~~~~~~~~~~~~~~~~~~~~~~
188 Using dynamically-constructed message strings contributes to major overhead as
189 the message string has to be constructed before the call to logging method is
190 performed, thus forcing overhead even if the constructed string is not used
191 (for example **DEBUG** level is not enabled).
193 Another issue with dynamically-constructed message strings is that they cannot
194 be easily extracted by static source code analysis -- a process critical for
195 creating message catalogue of a particular software release, which in turn is
196 needed for things like support knowledge bases, internationalization, etc.
198 While the former concern is addressed by Logger classes exposing methods such as
199 ``LOG.isDebugEnabled()``, the second concern can only be alleviated by using
200 explicit String literals when calling the Logger methods.
201 The correct way to address both concerns is to use parameterized logging as
202 described at http://www.slf4j.org/faq.html#logging_performance.
203 The basic pattern to follow is this:
208 private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
210 // GOOD: string literal, no dynamic objects
211 public void good_method(Object arg) {
212 LOG.debug("Method called with arg {}", arg);
215 // BAD: string varies with argument
216 public bad_method1(Object arg) {
217 LOG.debug("Method called with arg " + arg);
221 public void bad_method2(Object arg) {
222 if (LOG.isDebugEnabled()) {
223 LOG.debug("Method called with arg {}", arg);
227 // BAD: wrong level of language, this would be okay on TRACE
228 public bad_method3(Object arg) {
229 LOG.debug("arg is {}", arg);
233 There is one thing that needs to be noted in this style, which is that logging
234 an exception is properly supported if you supply it as the last argument,
235 but you have to **MAKE SURE IT IS NOT HINTED TO IN THE MESSAGE STRING**:
240 private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
242 // GOOD: note how there is no "{}" for ex
243 public void good_method(Object arg) {
247 } catch (SomeException ex) {
248 LOG.warn("Failed to do something with {}, continuing", arg, ex);
253 // - exception is interpreted as an object
254 // - exception chaining cause is lost
255 // - stack trace is lost
256 public void bad_method(Object arg) {
260 } catch (SomeException ex) {
261 LOG.warn("Failed to do something with {} because {}, continuing", arg, ex);
267 Avoid calls to the methods is{Trace|Debug|Info|Warn|Error}Enabled()
268 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
270 While it is true that methods such as ``isDebugEnabled()`` & Co. eliminate the
271 minor overhead associated with the variadic method call, the burden on
272 the developer is not acceptable simply because there are much better
273 methods of automatic control of this overhead, without having any impact
274 on the source code (or even the class files).
275 One of them is JIT-level optimizations stemming from the ability to inline calls
277 The other is the set of interfaces from ``java.lang.instrument`` package,
278 which can be used to completely eliminate the call overhead by removing all
279 calls to ``LOG.debug()`` from the class bytecode based on the logger configuration.
281 The exception to this rule can be if you log something that has some cost to
282 calculate - and if that log statement runs a lot.
283 Something (made up) like:
287 for (int i = 0; i < 100000; i++) {
288 if (LOG.isDebugEnabled()) {
289 LOG.debug("The size is: {}", expensiveMethodToCalculateSize());
293 Note that you can and always should pass Object and thus never ``toString()`` your
294 objects passed to a Logger.
295 For example, this is WRONG:
299 List<Interface> interfaces;
300 if (LOG.isDebugEnabled()) {
301 LOG.info("Interfaces: {}", interfaces.toString());
304 and instead you can simply do:
308 LOG.info("Interfaces: {}", interfaces); // no need to guard this with isDebugEnabled!
310 Provide useful event context
311 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
313 Each logging call should provide useful context in which it occurred.
314 This is not usually the case with a lot of Java-based software, notably even
315 with some JVM implementations.
316 Here are some typical anti-patterns which contribute to mitigated ability to
317 diagnose problems when they happen:
322 private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
325 // - no context provided
326 // - non-constant message string
327 // - assumes useful toString()
328 public bad_method1(Object arg) {
329 LOG.debug(arg.toString());
333 // - no context provided
334 public bad_method2(Object arg) {
335 LOG.debug("{}", arg);
339 // - silently ignoring errors!!!
340 public bad_method3(Object arg) {
344 } catch (SomeException ex) {
349 // - message is not constant
350 // - no context is provided
351 // - ex.getCause() is lost
352 // - call stack is lost
353 public void bad_method4(Object arg) {
357 } catch (SomeException ex) {
358 LOG.warn(ex.getMessage());
363 // - message is not constant
364 // - no context is provided
365 // - ex.getCause() is probably lost
366 // - call stack is probably lost
367 // - assumes useful toString()
368 public void bad_method5(Object arg) {
372 } catch (SomeException ex) {
373 LOG.warn(ex.toString());
378 // - no useful context is provided
379 // - ex.getCause() is probably lost
380 // - call stack is probably lost
381 // - administrators don't know what an Exception is!
382 public void bad_method6(Object arg) {
386 } catch (SomeException ex) {
387 LOG.warn("Exception {}", ex);
392 The proper fix for these anti-patterns is to always provide key information
393 in the logging event:
396 - how badly it went wrong
397 - in case we recover, shortly describe how (especially on **WARN**
404 private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
408 // - we explain what we tried to do
409 // - we pass along information we have about the failure
410 // - we explain that we recovered from the failure
411 public void good_method1(Object arg) {
415 } catch (SomeException ex) {
416 LOG.warn("Failed to do something with {}, ignoring it", arg, ex);
422 // - we explain what we tried to do
423 // - we pass along information we have about the failure
424 // - we escalate the failure to our caller
425 // - we also 'chain' the exception so it is not lost and can be
427 public void good_method2(Object arg) {
431 } catch (SomeException ex) {
432 LOG.error("Failed to do something with {}", arg, ex);
433 throw new RuntimeException("Failed to do something", ex);