Gaining insight into your system health
https://cgos.github.io/observability-presentation
Adrian Cole
Peter Bourgon
Brendan D. Gregg
Yuri Shkuro
Ben Sigelman
Cindy Sridharan
The Dapper paper from 2010
Request management
Service offering – Confluence
Announcements – Yammer (IT Monitoring)
In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.
Observability is an attribute of the system: it emits signals about its internal state.
Monitoring is an action taken by a human or a machine in response to an event.
Ref: Peter Bourgon
https://12factor.net/logs
- Logs are the stream of aggregated, time-ordered events collected from the output streams of all running processes.
- Logs in their raw form are typically a text format with one event per line (exceptions may span multiple lines).
- Logs have no fixed beginning or end, but flow continuously as long as the app is operating.
An app should not attempt to write to or manage logfiles.
Instead, each running process writes its event stream, unbuffered, to stdout.
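A minimal sketch of that advice, assuming a plain-text layout of timestamp, level, and message. The class name StdoutEventLog and the line layout are hypothetical and not part of the twelve-factor text. Each event is written as one line and flushed immediately, so the execution environment can capture and route the stream.

```java
import java.io.PrintStream;
import java.time.Instant;

final class StdoutEventLog {
    // Hypothetical line layout: ISO-8601 timestamp, level, message.
    static String format(Instant timestamp, String level, String message) {
        return timestamp + " " + level + " " + message;
    }

    // Writes one event per line and flushes right away, so nothing
    // lingers in a buffer if the process stops unexpectedly.
    static void emit(PrintStream out, String level, String message) {
        out.println(format(Instant.now(), level, message));
        out.flush();
    }

    public static void main(String[] args) {
        // The app only writes to stdout; it never opens or rotates a logfile.
        emit(System.out, "INFO", "application started");
        emit(System.out, "INFO", "temperature set to 42");
    }
}
```

Routing and storage are left to the environment, for example a log collector that reads the process output.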
Ref: fluentd
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Wombat {
    final Logger logger = LoggerFactory.getLogger(Wombat.class);
    Integer t;
    Integer oldT;

    public void setTemperature(Integer temperature) {
        oldT = t;
        t = temperature;
        // One {} placeholder per argument; SLF4J substitutes them in order.
        logger.debug("Temperature set to {}. Old temperature was {}.", t, oldT);
        if (temperature.intValue() > 50) {
            logger.info("Temperature has risen above 50 degrees.");
        }
    }
}
Metrics are a numeric representation of data measured over intervals of time. Metrics can harness the power of mathematical modeling and prediction to derive knowledge of the behavior of a system over intervals of time in the present and future.
Ref: Cindy Sridharan
A time series is simply a series of data points ordered in time.
In a time series, time is often the independent variable and the goal is usually to make a forecast for the future.
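As an illustration, a time series can be sketched as a time-ordered list of (timestamp, value) points. The TimeSeries class below is hypothetical. Its forecast is a deliberately naive moving average, chosen for brevity and not a recommendation of any particular forecasting model.

```java
import java.util.ArrayList;
import java.util.List;

final class TimeSeries {
    // One data point: when it was measured and what was measured.
    static final class Point {
        final long epochSecond;
        final double value;

        Point(long epochSecond, double value) {
            this.epochSecond = epochSecond;
            this.value = value;
        }
    }

    private final List<Point> points = new ArrayList<>();

    // Appends a point; rejects timestamps that go backwards so the
    // series stays ordered in time.
    void add(long epochSecond, double value) {
        if (!points.isEmpty()
                && epochSecond < points.get(points.size() - 1).epochSecond) {
            throw new IllegalArgumentException("points must be added in time order");
        }
        points.add(new Point(epochSecond, value));
    }

    // Naive forecast for the next value: the mean of the last `window` points
    // (or of all points when fewer than `window` exist).
    double forecastNext(int window) {
        if (window <= 0 || points.isEmpty()) {
            throw new IllegalStateException("need a positive window and at least one point");
        }
        int from = Math.max(0, points.size() - window);
        double sum = 0;
        for (int i = from; i < points.size(); i++) {
            sum += points.get(i).value;
        }
        return sum / (points.size() - from);
    }
}
```

Time is the independent variable here: callers supply the timestamps, and the forecast depends only on the most recent measured values.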
Strava
Distributed tracing is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.
https://opentracing.io
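To make the idea concrete, here is a hand-rolled sketch of a trace as a list of spans linked by parent IDs. It does not use the OpenTracing API; the Span fields and helper names are hypothetical. It assumes the children of a span do not overlap in time, so a span's self time is its duration minus the durations of its direct children.

```java
import java.util.Arrays;
import java.util.List;

final class TraceSketch {
    // One timed operation within a request; all spans of a request share a traceId.
    static final class Span {
        final String traceId;
        final String spanId;
        final String parentId; // null for the root span
        final String operation;
        final long startMicros;
        final long endMicros;
        final boolean failed;

        Span(String traceId, String spanId, String parentId, String operation,
             long startMicros, long endMicros, boolean failed) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentId = parentId;
            this.operation = operation;
            this.startMicros = startMicros;
            this.endMicros = endMicros;
            this.failed = failed;
        }

        long durationMicros() {
            return endMicros - startMicros;
        }
    }

    // Time spent in the span itself, excluding its direct children
    // (assumes children run one after another, not in parallel).
    static long selfTimeMicros(Span span, List<Span> trace) {
        long childTime = 0;
        for (Span other : trace) {
            if (span.spanId.equals(other.parentId)) {
                childTime += other.durationMicros();
            }
        }
        return span.durationMicros() - childTime;
    }

    // The span with the largest self time: a first hint at poor performance.
    static Span slowestBySelfTime(List<Span> trace) {
        Span slowest = null;
        for (Span s : trace) {
            if (slowest == null
                    || selfTimeMicros(s, trace) > selfTimeMicros(slowest, trace)) {
                slowest = s;
            }
        }
        return slowest;
    }

    // The earliest-starting failed span, or null when nothing failed.
    static Span firstFailure(List<Span> trace) {
        Span first = null;
        for (Span s : trace) {
            if (s.failed && (first == null || s.startMicros < first.startMicros)) {
                first = s;
            }
        }
        return first;
    }

    public static void main(String[] args) {
        List<Span> trace = Arrays.asList(
            new Span("t1", "a", null, "gateway", 0, 100, false),
            new Span("t1", "b", "a", "auth", 5, 15, false),
            new Span("t1", "c", "a", "db-query", 20, 90, true));
        System.out.println("slowest: " + slowestBySelfTime(trace).operation);
        System.out.println("first failure: " + firstFailure(trace).operation);
    }
}
```

A real tracer also propagates the trace and span IDs across process boundaries, which this in-memory sketch leaves out.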