Gathering telemetry data is key to operating reliable distributed systems at scale. Once you have set up your monitoring systems and recorded all relevant data, the challenge is to make sense of it and extract useful information. Key questions include:
- How to interpret the telemetry data that is emitted from the systems you are running?
- How to measure the quality of APIs you provide and consume?
- How to aggregate metrics from single nodes to service-level views?
In this workshop we will address these questions with statistical methods such as data visualisation, averages, percentiles, outlier analysis, histograms, and regressions, and we will look at two properties that matter when applying them to telemetry: robustness and mergeability. We will cover the material from both a theoretical and a practical perspective. Bring pen and paper and a laptop!
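To give a flavour of what mergeability means when aggregating node metrics into a service-level view, here is a minimal Python sketch. The latency values, the bucket boundaries in `BOUNDS`, and all function names are made up for illustration and are not taken from the workshop material. It shows that (sum, count) pairs and histograms with shared bucket boundaries can be merged across nodes without losing information, while averaging per-node averages gives a different answer than pooling the raw data.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical request latencies in milliseconds from two nodes.
node_a = [10, 12, 11, 13, 12, 11, 10, 14]  # busy node, fast requests
node_b = [10, 500]                         # quiet node, one slow request

# Assumed upper bounds of the histogram buckets, shared by all nodes.
BOUNDS = [16, 32, 64, 128, 256, 512, 1024]


def histogram(samples):
    """Count samples per bucket; bucket i holds values <= BOUNDS[i]."""
    return Counter(bisect_left(BOUNDS, x) for x in samples)


def quantile_upper_bound(hist, q):
    """Upper bound of the first bucket at which the cumulative count reaches q * total."""
    rank = q * sum(hist.values())
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= rank:
            # Values above the last bound land in an overflow bucket.
            return BOUNDS[bucket] if bucket < len(BOUNDS) else float("inf")
    return float("inf")


def mean(samples):
    return sum(samples) / len(samples)


# Averages: the mean of per-node means ignores how many requests each node served.
naive_mean = (mean(node_a) + mean(node_b)) / 2
# Keeping (sum, count) per node makes the mean mergeable.
pooled_mean = (sum(node_a) + sum(node_b)) / (len(node_a) + len(node_b))

# Histograms: adding bucket counts equals the histogram of the pooled data.
merged = histogram(node_a) + histogram(node_b)
same_as_pooled = merged == histogram(node_a + node_b)

print("mean of node means:", naive_mean)
print("pooled mean:", pooled_mean)
print("merged histogram matches pooled data:", same_as_pooled)
print("service-level p99 is at most:", quantile_upper_bound(merged, 0.99), "ms")
```

The histogram only yields a bucket-level bound on the percentile, not its exact value, so the choice of bucket boundaries determines the precision of the service-level view.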