Users should strive to implement canonical log lines (when it makes sense).
Note: what follows is a quick primer on the topic; see the original Stripe blog post for more details.
Canonical log line: the authoritative log line for a particular request, in the same vein as the IETF’s canonical link relation, which specifies an authoritative URL.
In addition to their normal log traces, requests (or some other unit of work that’s executing) also emit ONE LONG LOG LINE at the end that pulls all of their key telemetry into one place, e.g.,
[2019-03-18 22:48:32.999] "canonical-log-line" alloc_count=9123 auth_type=api_key database_queries=34 duration=0.009 http_method=POST http_path=/v1/charges http_status=200 key_id=mk_123 permissions_used=account_write rate_allowed=true rate_quota=100 rate_remaining=99 request_id=req_123 team=acquiring user_id=usr_123†
- † A UUID (aka GUID) may be useful here, e.g.,
import uuid
user_id = uuid.uuid4()
See also: When are you truly forced to use UUID as part of the design?
The above sample shows the kind of information that a canonical line might contain:
- HTTP request:
  - verb
  - path
  - response status
- Authenticated user and related information, e.g.,
  - authentication method (API key, password)
  - ID of the API key they used
- Rate limiters which allowed the request and related statistics, e.g.,
  - allotted quota
  - what portion remains
- Timing information, e.g.,
  - total request duration
  - time spent in database queries
- Number of:
  - database queries issued
  - objects allocated by the VM
  - etc.
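The fields listed above can be gathered while a request runs and written out once at the end. The following is a minimal sketch of that pattern using only the Python standard library; it is not this package's implementation. The `CanonicalLogLine` class, its `set`, `increment`, and `emit` methods, and the `handle_request` function are hypothetical names introduced for illustration.

```python
import logging
import time
import uuid


def _format(value: object) -> str:
    # Render booleans as true/false to match the sample line above.
    if isinstance(value, bool):
        return str(value).lower()
    return str(value)


class CanonicalLogLine:
    """Hypothetical helper: collects telemetry for one unit of work, emits it once."""

    def __init__(self, logger: logging.Logger) -> None:
        self._logger = logger
        self._fields: dict = {}
        self._start = time.perf_counter()

    def set(self, **fields: object) -> None:
        # A later value overwrites an earlier one for the same key.
        self._fields.update(fields)

    def increment(self, key: str, amount: int = 1) -> None:
        # Counters such as database_queries start at zero.
        self._fields[key] = self._fields.get(key, 0) + amount

    def emit(self) -> str:
        # Record the total duration, then write every field on a single line.
        self.set(duration=round(time.perf_counter() - self._start, 3))
        pairs = " ".join(f"{k}={_format(v)}" for k, v in sorted(self._fields.items()))
        line = f"canonical-log-line {pairs}"
        self._logger.info(line)
        return line


def handle_request(logger: logging.Logger) -> str:
    """Hypothetical request handler showing where each field would be recorded."""
    line = CanonicalLogLine(logger)
    line.set(request_id=f"req_{uuid.uuid4().hex}", http_method="POST", http_path="/v1/charges")
    line.set(auth_type="api_key", key_id="mk_123", user_id="usr_123")
    line.set(rate_allowed=True, rate_quota=100, rate_remaining=99)
    line.increment("database_queries")
    line.increment("database_queries")
    line.set(http_status=200)
    return line.emit()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="[%(asctime)s] %(message)s")
    handle_request(logging.getLogger(__name__))
```

Sorting the keys keeps the field order stable from one line to the next, which makes lines easier to compare by eye. In a real service, `emit` would typically be called from a `finally` block or middleware so the line is written even when the request fails.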
Canonical lines are an ergonomic feature.
By colocating everything that’s important to us, we make it accessible through queries that are easy for people to write, even under the duress of a production incident.
Because the underlying logging system doesn’t need to piece together multiple log lines at query time, they’re also cheap for computers to retrieve and aggregate, which makes them fast to use.
The wide variety of information being logged provides almost limitless flexibility in what can be queried. This is especially valuable during the discovery phase of an incident where it’s understood that something’s wrong, but it’s still a mystery as to what.
Getting insight into, say, a rate-limiting problem becomes as simple as the following query (shown here in Splunk’s built-in query language):
canonical-log-line rate_allowed=false | stats count by user_id
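For readers without Splunk at hand, the same aggregation can be approximated over raw log lines in Python. This is a rough sketch, assuming every value is free of whitespace (true of the sample line, but not of quoted values containing spaces); `parse_canonical_line` and `count_rate_limited_by_user` are hypothetical helper names.

```python
import re
from collections import Counter

# key=value pairs; assumes values contain no whitespace.
_PAIR = re.compile(r"(\w+)=(\S+)")


def parse_canonical_line(line: str) -> dict:
    """Return the key=value fields of one log line as a dict of strings."""
    return dict(_PAIR.findall(line))


def count_rate_limited_by_user(lines) -> Counter:
    """Count canonical lines with rate_allowed=false, grouped by user_id."""
    counts: Counter = Counter()
    for line in lines:
        # Skip ordinary log traces; only canonical lines carry the full field set.
        if "canonical-log-line" not in line:
            continue
        fields = parse_canonical_line(line)
        if fields.get("rate_allowed") == "false":
            counts[fields.get("user_id", "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '[2019-03-18 22:48:32.999] "canonical-log-line" rate_allowed=false user_id=usr_123',
        '[2019-03-18 22:48:33.101] "canonical-log-line" rate_allowed=true user_id=usr_123',
        '[2019-03-18 22:48:33.250] "canonical-log-line" rate_allowed=false user_id=usr_456',
    ]
    for user_id, count in count_rate_limited_by_user(sample).most_common():
        print(user_id, count)
```

Because each canonical line is self-contained, the function needs no joins across lines: one pass, one filter, one counter.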
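Package Contents
Functions
- get_config_dict(): Convenience function to get the local logging configuration dictionary, e.g., to help configure loggers from other libraries.
- get_logger(): Convenience function that returns a logger
- get_namespaced_module_name(__file__)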
Attributes
- structlog_sentry_logger.getLogger
CamelCase alias for structlog_sentry_logger.get_logger.
- structlog_sentry_logger.get_config_dict()
Convenience function to get the local logging configuration dictionary, e.g., to help configure loggers from other libraries.
Returns: The logging configuration dictionary that would be used to configure the Python logging library component of the logger
- Return type
- structlog_sentry_logger.get_logger()
Convenience function that returns a logger
Returns: A proxy that creates a correctly configured logger bound to the __name__ of the calling module
- Return type
Any
- structlog_sentry_logger.get_namespaced_module_name(__file__)
- Parameters
__file__ (Union[pathlib.Path, str]) –
- Return type