Real-World SRE

Chapter 2. Monitoring

Oxford Dictionaries (https://en.oxforddictionaries.com/definition/monitor) defines the verb "monitor" as to "observe and check the progress or quality of (something) over a period; keep under systematic review." This definition points out two crucial details. First, you need to define what quality is and make sure that your system is either making progress toward that standard or staying within it. Second, you need to be systematic about this work: rather than looking at your system at random, you should take a consistent approach. The need for systematic measurement is one reason your dentist asks you to come in every six months, and one reason some insurance companies ask you to choose a dedicated primary care doctor.

In this chapter, we will focus on the tools and methodology of monitoring modern web services. We will cover what data to collect and how to collect, store, and display it for developers and anyone else who will find it useful. We will also discuss how to communicate about monitoring, why monitoring is essential, and how to get everyone in a company invested in it.