noel-gallagher

Observability


Monitoring


What do we want to monitor?


What do we want to alert on?


Logs

Capturing detailed information


ELK stack


Distributed Tracing

Ability to track an action end to end

For example: A user submits a request which travels through many backend services

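The request goes from A->B->C. The work done in each service (A, B, C) is an individual span, while the trace is the full set of spans for that one request (A->B->C); A->B and B->C are the parent-child hops inside it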
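The relationship between a trace and its spans can be sketched in a few lines of Python. This is only a toy model, not the API of any tracing library: the class names `Span` and `Trace`, the `start_span` and `path` helpers, and the ID format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import uuid


@dataclass
class Span:
    """One unit of work inside a single service (hypothetical model)."""
    trace_id: str             # shared by every span of the same request
    span_id: str
    service: str
    parent_id: Optional[str]  # None marks the root span


@dataclass
class Trace:
    """All spans recorded for one end-to-end request (hypothetical model)."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[Span] = field(default_factory=list)

    def start_span(self, service: str, parent: Optional[Span] = None) -> Span:
        # Each new span carries the trace ID and points at its parent span.
        span = Span(
            trace_id=self.trace_id,
            span_id=uuid.uuid4().hex[:16],
            service=service,
            parent_id=parent.span_id if parent else None,
        )
        self.spans.append(span)
        return span

    def path(self) -> List[str]:
        """Service names in the order their spans were started."""
        return [s.service for s in self.spans]


# A user request that travels A -> B -> C produces one trace with three spans.
trace = Trace()
a = trace.start_span("A")
b = trace.start_span("B", parent=a)
c = trace.start_span("C", parent=b)
print(" -> ".join(trace.path()))  # prints "A -> B -> C"
```

The shared `trace_id` is what lets a backend stitch spans from different services back into one request, and `parent_id` is what gives the trace its tree shape.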


Grafana


Prometheus:

collects and stores metrics as time series data
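As a rough illustration of "metrics as time series data": a series is identified by a metric name plus a set of labels, and holds timestamped samples. The sketch below is a toy in-memory model, not how Prometheus stores data internally; `TimeSeriesStore`, its methods, and the example metric name and labels are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A series is keyed by the metric name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]
Sample = Tuple[float, float]  # (unix timestamp, value)


class TimeSeriesStore:
    """Toy in-memory store: one list of samples per (name, labels) pair."""

    def __init__(self) -> None:
        self._data: Dict[SeriesKey, List[Sample]] = defaultdict(list)

    @staticmethod
    def _key(name: str, labels: Dict[str, str]) -> SeriesKey:
        # Sorting makes the key independent of label order.
        return (name, tuple(sorted(labels.items())))

    def append(self, name: str, labels: Dict[str, str],
               timestamp: float, value: float) -> None:
        self._data[self._key(name, labels)].append((timestamp, value))

    def samples(self, name: str, labels: Dict[str, str]) -> List[Sample]:
        return list(self._data.get(self._key(name, labels), []))


store = TimeSeriesStore()
get_ok = {"method": "GET", "status": "200"}
post_ok = {"method": "POST", "status": "200"}

# Two scrapes of the same series, 15 seconds apart, plus a different series.
store.append("http_requests_total", get_ok, 1000.0, 5.0)
store.append("http_requests_total", get_ok, 1015.0, 9.0)
store.append("http_requests_total", post_ok, 1000.0, 1.0)

print(store.samples("http_requests_total", get_ok))
```

Changing any label value yields a separate series, which is why label sets with many distinct values multiply the number of series stored.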


Prometheus architecture


Core metrics:

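counter: a cumulative value that only increases (e.g. total requests served, total errors)

gauge: a value that can increase or decrease (e.g. CPU usage, memory usage, other dynamic data)

histogram: distribution of observed values counted into buckets (e.g. latency percentiles such as P95, P98, P99)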
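The difference between the three metric types can be sketched with a small pure-Python model. This is not the Prometheus client library; the classes, the bucket bounds, and the `quantile_upper_bound` helper are assumptions made for illustration, and the percentile it returns is only a bucket upper bound rather than an interpolated estimate.

```python
import bisect
from typing import List


class Counter:
    """Cumulative value that can only go up."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter cannot decrease")
        self.value += amount


class Gauge:
    """Value that can be set, raised, or lowered at any time."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


class Histogram:
    """Counts observations per bucket, where each bucket has an upper bound."""

    def __init__(self, bounds: List[float]) -> None:
        self.bounds = sorted(bounds)
        # One extra bucket at the end catches values above every bound (+Inf).
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.count = 0
        self.sum = 0.0

    def observe(self, value: float) -> None:
        # Index of the first bound that is >= value ("less than or equal").
        idx = bisect.bisect_left(self.bounds, value)
        self.bucket_counts[idx] += 1
        self.count += 1
        self.sum += value

    def quantile_upper_bound(self, q: float) -> float:
        """Smallest bucket bound covering at least a fraction q of observations."""
        target = q * self.count
        running = 0
        for bound, n in zip(self.bounds + [float("inf")], self.bucket_counts):
            running += n
            if running >= target:
                return bound
        return float("inf")


requests_total = Counter()
memory_mb = Gauge()
latency_s = Histogram(bounds=[0.1, 0.5, 1.0])

for seconds in [0.05, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]:
    requests_total.inc()
    latency_s.observe(seconds)

memory_mb.set(512)
memory_mb.dec(64)

print(requests_total.value, memory_mb.value, latency_s.bucket_counts)
print("P95 is at most", latency_s.quantile_upper_bound(0.95), "seconds")
```

Because a histogram keeps only bucket counts, the precision of a percentile depends on how the bucket bounds are chosen for the values being measured.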


Integration