SREcon18 EMEA has ended


Track 4
Wednesday, August 29

11:20 CEST

Data Visualization for SREs—An Essential Skill for Quick Debugging

SREs are software engineers with a broad skill set who work with systems in general. Depending on the type of work and the team, our time is usually spent correlating incidental data to determine the causes of issues. While we use ELK, Splunk, etc. to visualize our logs, it’s an essential skill to parse log files by hand and visualize them to make useful observations quickly. We often end up writing APIs and command-line shortcuts to accelerate our debugging. The techniques I’ll show you can be used to visualize this data quickly.
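As a rough illustration of parsing a log by hand and visualizing it, here is a minimal Python sketch. It is not taken from the talk: the log line format, the sample data, and the function names `count_by_status` and `text_histogram` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical sample data: the line layout (timestamp, method, path,
# status, latency) is an assumption, not a format named in the talk.
SAMPLE_LOG = """\
2018-08-29T11:20:01 GET /api/items 200 12ms
2018-08-29T11:20:02 GET /api/items 500 340ms
2018-08-29T11:20:03 GET /api/users 200 9ms
2018-08-29T11:20:04 GET /api/missing 404 3ms
2018-08-29T11:20:05 POST /api/items 500 512ms
2018-08-29T11:20:06 GET /api/items 200 15ms
"""

LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<method>\S+) (?P<path>\S+) "
    r"(?P<status>\d{3}) (?P<latency>\d+)ms$"
)


def count_by_status(text):
    """Count log lines per HTTP status code, skipping lines that do not parse."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("status")] += 1
    return counts


def text_histogram(counts, width=40):
    """Render counts as a text bar chart, scaled so the largest bar is `width` wide."""
    if not counts:
        return ""
    peak = max(counts.values())
    rows = []
    for key in sorted(counts):
        bar = "#" * max(1, round(counts[key] * width / peak))
        rows.append(f"{key} {bar} {counts[key]}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(text_histogram(count_by_status(SAMPLE_LOG)))
```

A text bar chart like this needs no plotting library, so it can run directly on a production host during an incident; the same counts could be fed to a charting tool when one is available.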

WHY this talk?

SREs usually come from architectural or back-end backgrounds and generally lack experience with front-end work and visualization. The techniques I’ll show will hopefully be helpful to SREs in day-to-day scenarios.


Yash Shah

Yash Shah is a site reliability engineer at LinkedIn.

Wednesday August 29, 2018 11:20 - 12:30 CEST
4 - Leibniz Room

14:00 CEST

SRE Classroom, Or, How to Design a Distributed System in 3 Hours

This workshop ties together academic and practical aspects of systems engineering, with an emphasis on applying principles of systems design to a production service. We will analyze the service to quantify its performance, and iteratively improve the design. Participants will work together in small groups to sketch out the design, identify components and their relationships, and to assess the suitability of the design to the system’s Service Level Objective (SLO).
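One way to picture checking a design against an SLO is a back-of-the-envelope availability calculation. The sketch below is not workshop material: the component availabilities, the 99.9% target, the function names, and the assumption that components fail independently are all hypothetical.

```python
from math import prod


def serial_availability(parts):
    """Availability of components that must all be up (a serial dependency chain)."""
    return prod(parts)


def replicated_availability(single, replicas):
    """Availability of N replicas where one healthy replica is enough.

    Assumes replicas fail independently, which real systems often violate.
    """
    return 1 - (1 - single) ** replicas


def meets_slo(availability, slo):
    """True if the estimated availability reaches the SLO target."""
    return availability >= slo


if __name__ == "__main__":
    slo = 0.999  # hypothetical availability target

    # Iteration 1: one frontend and one storage node in series.
    first = serial_availability([0.999, 0.995])
    print(f"single instances: {first:.6f} meets SLO: {meets_slo(first, slo)}")

    # Iteration 2: two replicas of each component.
    second = serial_availability([
        replicated_availability(0.999, 2),
        replicated_availability(0.995, 2),
    ])
    print(f"two replicas each: {second:.6f} meets SLO: {meets_slo(second, slo)}")
```

The first design falls short of the target because serial dependencies multiply, while the replicated design clears it, which is the kind of iterate-and-recheck loop the abstract describes.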

Participants will have a system design and bill of materials at the conclusion of this workshop.

Participants will not need laptops or specific coding experience; participants will need enthusiasm for collaborating in small groups, and for discussion-based problem-solving. Participants will come away with an understanding of the principles of iterative systems engineering, popularly known as “Non-abstract large systems design.”

This workshop covers material critical for SRE, an increasingly broad field that combines software engineering and systems design.


Fabian Geisberger

Fabian is a Site Reliability Engineer at Google in New York, where he currently works on monitoring systems. He previously worked on the Ganeti SRE team, the Production Monitoring team, and several other Google services. Fabian received a Masters (Diploma) in Computer Science from...

Salim Virji

Google LLC
Salim Virji is a Site Reliability Engineer at Google, where he has worked on distributed compute, consensus, and storage systems.

Wednesday August 29, 2018 14:00 - 17:30 CEST
4 - Leibniz Room
Thursday, August 30

09:00 CEST

The EU's New Data Protection Law—A Survival Guide

What data do you hold?

Are you processing the data, or controlling it?

Do you have the consents to use that data like that?

Do you have a register of all that data and every way you use it, and what for?

Can you find every piece of data you hold that relates to an individual, copy it and send it to them—for free—within 30 days?

What happens when they say they want it erased?

The General Data Protection Regulation (GDPR) took effect on 25 May 2018. New powers mean regulators can impose fines of up to 4% of annual turnover for breaches. But that’s not the only thing that will drive compliance: the data supply chain and a fresh threat of litigation are both driving change in organisations as well. This workshop is for anyone trying to make sure that their organisation isn't in breach and can deal with requests related to the GDPR.

GDPR isn't just a compliance project. It's a business culture change project. Let's struggle our way through together.

This will be an audience-driven workshop session, so bring your hardest questions.


Simon McGarr

Data Compliance Europe
Simon McGarr is recognised as one of Ireland’s leading experts in Data Protection. A practising solicitor, Data Protection consultant and external DPO, he has lectured in the Law Society, regularly appears on national media discussing data issues and was recently invited by the...

Laura Nolan

Laura Nolan is a software engineer whose fascination with failure and fragility in systems drew her into the field of Site Reliability Engineering. She is a contributor to "Site Reliability Engineering: How Google Runs Production Systems" and "Seeking SRE", and writes a quarterly...

Thursday August 30, 2018 09:00 - 12:30 CEST
4 - Leibniz Room

14:00 CEST

The Art of Debugging

Are you one of those "gifted debuggers" that everyone turns to when they need to solve a difficult problem? Great! This workshop isn't for you. For the rest of us, debugging is often considered a mysterious trait that some engineers are born with and others, alas, simply aren't. This workshop is here to bust that myth.

In this workshop, we will practice a well-structured debugging methodology—conducting debugging "katas" with the aim of mastering debugging technique.

Let's stop using trial and error (and other witchcraft tactics) to find the cause(s) of our problems!

This workshop is for junior and senior engineers interested in improving their debugging methodology. Although debugging is a very common activity, in real-world scenarios, noise, cognitive biases, system complexity, and production pressures can easily lead us astray. By training yourself in debugging methodology, you can improve your real-world performance under these harsh conditions.


Nati Cohen

HERE Mobility
Nati Cohen is a Production Engineer at Here Technologies and a Teaching Assistant at the Interdisciplinary Center Herzliya. Previous experience includes: operations consulting, software development, *nix administration and security research in the Intelligence Corps as well as in...

Avishai Ish-Shalom

Engineer in Residence, Aleph VC
Avishai is a veteran operations and software engineer with years of high-scale production experience. At present, Avishai helps growing startups and the Israeli high-tech ecosystem as Engineer in Residence at the Aleph VC fund. In his spare time, Avishai is spreading weird ideas and...

Thursday August 30, 2018 14:00 - 17:30 CEST
4 - Leibniz Room
Friday, August 31

09:00 CEST

Building Blocks of Distributed Systems

All distributed systems make tradeoffs and compromises. Different designs vary widely in cost, performance, and behavior under failure conditions.

It's important to understand the tradeoffs that the building blocks in your systems make, and the implications this has for your system as a whole. In this workshop we'll look at several examples of different real-world distributed systems and discuss their strengths and shortcomings.

This workshop will include some practical elements. You will be given some system designs to read and to evaluate, and then we'll discuss the implications of each design together as a group.


John Looney

Production Engineering Manager, Facebook
John Looney has been an SRE since 2005, working with large distributed systems for Google and Facebook. He enjoys teaching SRE concepts with concrete examples. His day job is supporting teams that manage and deploy operating systems and firmware for Facebook.

Friday August 31, 2018 09:00 - 12:40 CEST
4 - Leibniz Room

14:00 CEST

Lessons Learned from Our Main Database Migrations at Facebook
At Facebook, we created a new MySQL storage engine called MyRocks. Our objective was to migrate one of our main databases (UDB) from compressed InnoDB to MyRocks and halve both the amount of storage and the number of servers used. In August 2017, we finished converting UDB from InnoDB to MyRocks. The migration was very carefully planned and executed, and it took nearly a year, but that was not the end of the work. SREs needed to continue to operate MyRocks databases reliably. It was also important to find any production issue and to mitigate or fix it before it became critical. Since MyRocks was a new database, we encountered several issues once it was running in production. In this session, I will introduce several interesting production issues that we have faced and how we have fixed them. Some of the issues were very hard to predict, which should make them interesting for attendees to learn about.

Attendees will learn about the following topics:

  • What MyRocks is, and why it was beneficial for large services like Facebook
  • What should be considered for a production database migration
  • How the migration should be executed
  • 4-6 real production issues


Yoshinori Matsunobu

Yoshinori Matsunobu is a Production Engineer at Facebook, where he leads the MyRocks project and its deployment. Yoshinori has been around the MySQL community for over 10 years. He was a senior consultant at MySQL Inc. from 2006 to 2010. Yoshinori created a couple of useful open source product/tools...

Friday August 31, 2018 14:00 - 14:50 CEST
4 - Leibniz Room

14:50 CEST

Have You Tried Turning It Off (and *Not* On Again)?
Productivity may not be a simple function of person-hours, but having zero person-hours available because the persons are preoccupied with legacy services will kill a launch or landing. Turning legacy off is important for sub-linear headcount growth.

The most promising targets for turndown are:

  • Old systems riddled with technical debt;
  • Internal tools and services nobody cares for anymore;
  • Yesterday's formerly-shiny ${THING} that ${THING2} has replaced;
  • Services at least one or two layers beneath what really matters to the organisation.

But there are always blockers even when the goal is clear and the motives are strong:

  • The system is in the critical path for something critical;
  • Other systems rely on it in some long-forgotten way;
  • The organisation's current "Death Star megaproject" is using it and you can't shift their migration timeline;
  • It offers some useful feature that the replacement doesn't have.

In this talk, I will explain why turning things off (for good) is desirable, describe a few services my team was responsible for turning down, illustrate how they related to other services, and cover what made turning them down possible and how we got there in the end (or didn't).


Josh Deprez

Google Australia
Josh is a senior SRE at Google, where he is TL in a team internally renowned for turning things off. He has a PhD in Mathematics, a fact which is not relevant to the talk topic.

Friday August 31, 2018 14:50 - 15:30 CEST
4 - Leibniz Room