docs/developer-guides/logging-subsystem.rst

   1 #################
   2 Logging subsystem
   3 #################
   4
   5 Logging subsystem provides facilities for capturing, recording and
   6 reporting events which occur within the OpenDaylight system.
   7 These events are the primary means of system diagnostics and troubleshooting,
   8 serving a wide audience including automated monitoring systems, operators,
   9 administrators, support personnel and development engineers.
  10
  11 In order to provide a 'single system' experience, all software components should
  12 follow same basic rules for interfacing with the logging system.
  13 While it is not practical to force these rules on the various third parties,
  14 they should to be followed by all newly-developed components.
  15
  16 Logging in Java
  17 ~~~~~~~~~~~~~~~
  18
  19 Java provides a diverse set of logging APIs and implementations.
  20 OpenDaylight has selected `SLF4J API <http://www.slf4j.org/apidocs/index.html>`__,
  21 with the implementation being provided by the Karaf container.
  22
  23 The primary reasons for this decision are:
  24
  25 -  proper split between API and implementation, allowing us to change
  26    the implementation without impacting code
  27 -  ability to provide legacy APIs (JUL, JCL, etc.) for third-party code
  28
  29 Message levels
  30 ~~~~~~~~~~~~~~
  31
  32 SLF4J defines five levels of logging messages, ranging from **TRACE** to
  33 **ERROR**.
  34 The guidelines for their use are very vague and in the grand UNIX tradition mix
  35 severity of the message with its granularity.
  36 The following summary defines the basic rules on when to use a specific level,
  37 who the target audience is
  38 and how any message of the specified level will be interpreted.
  39
  40 Note that all events with level **INFO or higher present an API-like contract**
  41 of the system from the integration point-of-view:
  42 if they change, third-party systems such as monitoring may need to be updated to
  43 work correctly with the new system release.
  44 The message text on these levels should be understandable to people with
  45 networking and systems administration background, so any language assuming
  46 knowledge of programming in general, or Java in particular,
  47 should be avoided if at all possible.
  48
  49 Messages on **DEBUG** and higher present are part of interface contract with
  50 support entities, e.g. if they are changed operator and troubleshooting manuals,
  51 as well as knowledge-base systems may need to be updated to correctly interpret
  52 the information conveyed.
  53 On the **DEBUG** level, messages may assume a slight level of familiarity with
  54 general programming concepts. Terminology specific to any programming
  55 language should be avoided if possible.
  56
  57 ERROR
  58 ^^^^^
  59
  60 This level serves as the general error facility.
  61 It should be used whenever the software encounters an unexpected error which
  62 prevents further processing.
  63 The reporting component may attempt recovery which may affect system
  64 availability or performance for a period of time, or it may forcefully terminate
  65 if it is unable to recover.
  66 This level is always reported for all software components.
  67
  68 There are three examples when this level should be used:
  69
  70 #. Incorrigible internal state inconsistency, such as when a JVM
  71    reports an OutOfMemoryError. The top-level handler would log this
  72    condition and force a JVM exit, as it is not capable of continuing
  73    execution.
  74 #. Internal state inconsistency, which is correctable by flushing and
  75    re-creating the state. In this case the component would log an event,
  76    which would indicate what assertion has been violated and that the
  77    state has been flushed to recover.
  78 #. Request-level processing error, e.g. the application encounters an
  79    error which is preventing a particular request from completing, but
  80    there is no indication of systematic failure which would prevent
  81    other requests from being successfully processed.
  82
  83 The primary audience are monitoring systems and system operators,
  84 as the events reported have impact on operational performance of the system.
  85
  86 WARN
  87 ^^^^
  88
  89 This level serves for events which indicate irregular circumstances,
  90 from which we have a clearly-defined recovery strategy which has no impact on
  91 system availability or performance as seen by the reporting component.
  92 Events on this level are usually reported.
  93
  94 A typical example for a candidate event is when a software component detects
  95 inconsistency within an external data feed and it performs a corrective action
  96 to compensate for it.
  97 Let's say we process a list of key/value pairs and encounter a duplicate key:
  98 we can either overwrite the old occurance, ignore the new occurance or abort.
  99 If we take any of the first two actions, we should report a **WARN** event.
 100 If we take the third, we should report an **ERROR** event.
 101
 102 The primary audience of these events are automated systems, operators and
 103 administrators, as this level of messages indicate non-optimal system operation
 104 (e.g. data feeds could use normalization) or may forebode a future failure.
 105
 106 INFO
 107 ^^^^
 108
 109 This level serves for events which constitute major state changes within
 110 a software component -- such as initialization, shutdown, persistent
 111 resource allocation, etc. -- which are part of normal operations.
 112 Events on this level are typically reported for non-library components.
 113
 114 Each software component should log at least four events on this level:
 115
 116 #. when it starts its initialization
 117 #. when it becomes operational,
 118 #. when it starts orderly shutdown,
 119 #. just before it terminates normally
 120
 121 The primary audience of these events are operators and administrators,
 122 who use them to confirm major interactions (such as restarting
 123 components) have occurred within the system.
 124
 125 DEBUG
 126 ^^^^^
 127
 128 This is the coarse diagnostic level, serving for events which indicate
 129 internal state transitions and detail what processing occurs.
 130 Events on this level are usually not reported, but are enabled when a subsystem
 131 code flows need to be examined for troubleshooting purposes.
 132
 133 Placement and amount of events generated at this level are at the discretion of
 134 the development engineers, as both relate directly to the component logic.
 135 The amount of detail in these events has to be limited to a single line of
 136 information useful for pinning down a misbehaving software component in a
 137 seemingly normal system and should be directly cross-referenceable to either
 138 previous **DEBUG** events or component configuration.
 139 A **hard requirement** is that there has to be at least one flow control
 140 statement (such as if, for, while) or a call to a different software component
 141 between two **DEBUG** event triggers.
 142
 143 Primary audience of these events are administrators and support personnel
 144 diagnosing operational issues, mainly in real-time as they occur on the system.
 145
 146 TRACE
 147 ^^^^^
 148
 149 This is the fine-grained diagnostic level, serving for events which
 150 indicate internal state transitions in full detail.
 151 Events on this level are not reported, but have to be explicitly enabled and may
 152 be collected for support purposes.
 153
 154 Placement, amount and contents of these events is completely at the description
 155 of development engineers.
 156 These events are completely release-specific, may change even between minor
 157 releases.
 158 Examples of events reported at this level would be method entry and exit,
 159 possibly including detailed input arguments, and dumps of internal data as it is
 160 being modified.
 161
 162 Primary audience of these events are senior support personnel and development
 163 engineers diagnosing operational irregularities which relate directly to code
 164 structure, mainly offline after being captured on a live system.
 165
 166 Logger instances
 167 ~~~~~~~~~~~~~~~~
 168
 169 Each class containing calls to the logging subsystem **MUST** have its own
 170 logger, which is not shared with any other class.
 171 The variable holding reference to this logger **MUST** me named **'LOG'**.
 172 The easiest way to ensure you have the proper logger declaration is to use
 173 the following pattern:
 174
 175 .. code:: java
 176
 177    import org.slf4j.Logger;
 178    import org.slf4j.LoggerFactory;
 179
 180    class Foo {
 181        private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
 182        ...
 183    }
 184
 185 Use parameterized logging
 186 ~~~~~~~~~~~~~~~~~~~~~~~~~
 187
 188 Using dynamically-constructed message strings contributes to major overhead as
 189 the message string has to be constructed before the call to logging method is
 190 performed, thus forcing overhead even if the constructed string is not used
 191 (for example **DEBUG** level is not enabled).
 192
 193 Another issue with dynamically-constructed message strings is that they cannot
 194 be easily extracted by static source code analysis -- a process critical for
 195 creating message catalogue of a particular software release, which in turn is
 196 needed for things like support knowledge bases, internationalization, etc.
 197
 198 While the former concern is addressed by Logger classes exposing methods such as
 199 ``LOG.isDebugEnabled()``, the second concern can only be alleviated by using
 200 explicit String literals when calling the Logger methods.
 201 The correct way to address both concerns is to use parameterized logging as
 202 described at http://www.slf4j.org/faq.html#logging_performance.
 203 The basic pattern to follow is this:
 204
 205 .. code:: java
 206
 207    class Foo {
 208        private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
 209
 210        // GOOD: string literal, no dynamic objects
 211        public void good_method(Object arg) {
 212            LOG.debug("Method called with arg {}", arg);
 213        }
 214
 215        // BAD: string varies with argument
 216        public bad_method1(Object arg) {
 217            LOG.debug("Method called with arg " + arg);
 218        }
 219
 220        // BAD: code clutter
 221        public void bad_method2(Object arg) {
 222            if (LOG.isDebugEnabled()) {
 223                LOG.debug("Method called with arg {}", arg);
 224            }
 225        }
 226
 227        // BAD: wrong level of language, this would be okay on TRACE
 228        public bad_method3(Object arg) {
 229            LOG.debug("arg is {}", arg);
 230        }
 231    }
 232
 233 There is one thing that needs to be noted in this style, which is that logging
 234 an exception is properly supported if you supply it as the last argument,
 235 but you have to **MAKE SURE IT IS NOT HINTED TO IN THE MESSAGE STRING**:
 236
 237 .. code:: java
 238
 239    class Foo {
 240        private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
 241
 242        // GOOD: note how there is no "{}" for ex
 243        public void good_method(Object arg) {
 244            try {
 245                doSomething(arg);
 246                ...
 247            } catch (SomeException ex) {
 248                LOG.warn("Failed to do something with {}, continuing", arg, ex);
 249            }
 250        }
 251
 252        // BAD:
 253        // - exception is interpreted as an object
 254        // - exception chaining cause is lost
 255        // - stack trace is lost
 256        public void bad_method(Object arg) {
 257            try {
 258                doSomething(arg);
 259                ...
 260            } catch (SomeException ex) {
 261                LOG.warn("Failed to do something with {} because {}, continuing", arg, ex);
 262            }
 263        }
 264    }
 265
 266
 267 Avoid calls to the methods is{Trace|Debug|Info|Warn|Error}Enabled()
 268 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 269
 270 While it is true that methods such as ``isDebugEnabled()`` & Co. eliminate the
 271 minor overhead associated with the variadic method call, the burden on
 272 the developer is not acceptable simply because there are much better
 273 methods of automatic control of this overhead, without having any impact
 274 on the source code (or even the class files).
 275 One of them is JIT-level optimizations stemming from the ability to inline calls
 276 to ``LOG.debug()``.
 277 The other is the set of interfaces from ``java.lang.instrument`` package,
 278 which can be used to completely eliminate the call overhead by removing all
 279 calls to ``LOG.debug()`` from the class bytecode based on the logger configuration.
 280
 281 The exception to this rule can be if you log something that has some cost to
 282 calculate - and if that log statement runs a lot.
 283 Something (made up) like:
 284
 285 .. code:: java
 286
 287    for (int i = 0; i < 100000; i++) {
 288        if (LOG.isDebugEnabled()) {
 289            LOG.debug("The size is: {}", expensiveMethodToCalculateSize());
 290        }
 291    }
 292
 293 Note that you can and always should pass Object and thus never ``toString()`` your
 294 objects passed to a Logger.
 295 For example, this is WRONG:
 296
 297 .. code:: java
 298
 299    List<Interface> interfaces;
 300    if (LOG.isDebugEnabled()) {
 301        LOG.info("Interfaces: {}", interfaces.toString());
 302    }
 303
 304 and instead you can simply do:
 305
 306 .. code:: java
 307
 308    LOG.info("Interfaces: {}", interfaces); // no need to guard this with isDebugEnabled!
 309
 310 Provide useful event context
 311 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 312
 313 Each logging call should provide useful context in which it occurred.
 314 This is not usually the case with a lot of Java-based software, notably even
 315 with some JVM implementations.
 316 Here are some typical anti-patterns which contribute to mitigated ability to
 317 diagnose problems when they happen:
 318
 319 .. code:: java
 320
 321    class Foo {
 322        private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
 323
 324        // VERY BAD:
 325        // - no context provided
 326        // - non-constant message string
 327        // - assumes useful toString()
 328        public bad_method1(Object arg) {
 329            LOG.debug(arg.toString());
 330        }
 331
 332        // VERY BAD:
 333        // - no context provided
 334        public bad_method2(Object arg) {
 335            LOG.debug("{}", arg);
 336        }
 337
 338        // COMPLETELY BAD:
 339        // - silently ignoring errors!!!
 340        public bad_method3(Object arg) {
 341            try {
 342                doSomething(arg);
 343                ...
 344            } catch (SomeException ex) {
 345            }
 346        }
 347
 348        // EXTREMELY BAD:
 349        // - message is not constant
 350        // - no context is provided
 351        // - ex.getCause() is lost
 352        // - call stack is lost
 353        public void bad_method4(Object arg) {
 354            try {
 355                doSomething(arg);
 356                ...
 357            } catch (SomeException ex) {
 358                LOG.warn(ex.getMessage());
 359            }
 360        }
 361
 362        // EXTREMELY BAD:
 363        // - message is not constant
 364        // - no context is provided
 365        // - ex.getCause() is probably lost
 366        // - call stack is probably lost
 367        // - assumes useful toString()
 368        public void bad_method5(Object arg) {
 369            try {
 370                doSomething(arg);
 371                ...
 372            } catch (SomeException ex) {
 373                LOG.warn(ex.toString());
 374            }
 375        }
 376
 377        // VERY BAD:
 378        // - no useful context is provided
 379        // - ex.getCause() is probably lost
 380        // - call stack is probably lost
 381        // - administrators don't know what an Exception is!
 382        public void bad_method6(Object arg) {
 383            try {
 384                doSomething(arg);
 385                ...
 386            } catch (SomeException ex) {
 387                LOG.warn("Exception {}", ex);
 388            }
 389        }
 390    }
 391
 392 The proper fix for these anti-patterns is to always provide key information
 393 in the logging event:
 394
 395 -  what went wrong
 396 -  how badly it went wrong
 397 -  in case we recover, shortly describe how (especially on **WARN**
 398    level)
 399
 400
 401 .. code:: java
 402
 403    class Foo {
 404        private static final Logger LOG = LoggerFactory.getLogger(Foo.class);
 405
 406        // GOOD:
 407        // - string literal
 408        // - we explain what we tried to do
 409        // - we pass along information we have about the failure
 410        // - we explain that we recovered from the failure
 411        public void good_method1(Object arg) {
 412            try {
 413                doSomething(arg);
 414                ...
 415            } catch (SomeException ex) {
 416                LOG.warn("Failed to do something with {}, ignoring it", arg, ex);
 417            }
 418        }
 419
 420        // GOOD:
 421        // - string literal
 422        // - we explain what we tried to do
 423        // - we pass along information we have about the failure
 424        // - we escalate the failure to our caller
 425        // - we also 'chain' the exception so it is not lost and can be
 426        // correlated
 427        public void good_method2(Object arg) {
 428            try {
 429                doSomething(arg);
 430                ...
 431            } catch (SomeException ex) {
 432                LOG.error("Failed to do something with {}", arg, ex);
 433                throw new RuntimeException("Failed to do something", ex);
 434            }
 435        }
 436    }