LEADER 00000cam a22004697a 4500
003 OCoLC
005 20240129213017.0
006 m o d
007 cr |n|||||||||
008 230409s2023 enk o 000 0 eng d
019 1378205075
020 9781804614563|q(electronic bk.)
020 1804614564|q(electronic bk.)
035 (OCoLC)1375289911|z(OCoLC)1378205075
037 9781803239224|bO'Reilly Media
037 10251327|bIEEE
040 YDX|beng|cYDX|dUKAHL|dORMDA|dOCLCF|dIEEEE|dOCLCO
049 INap
082 04 620/.00452
082 04 620/.00452|223/eng/20230509
099 eBook O'Reilly for Public Libraries
100 1 Proffitt, Jeremy,|eauthor.
245 10 BECOMING A ROCKSTAR SRE|h[electronic resource] : |belectrify your site reliability engineering mindset to build reliable, resilient, and efficient systems /|cJeremy Proffitt, Rod Anami.|h[O'Reilly electronic resources]
250 1st edition.
260 [England] :|bPACKT PUBLISHING LIMITED,|c2023.
300 1 online resource
505 0 Table of Contents -- SRE Job Role - Activities and Responsibilities -- Fundamental Numbers - Reliability Statistics -- Imperfect Habits - Duct Tape Architecture and Spaghetti Code -- Essential Observability - Metrics, Events, Logs, and Traces (MELT) -- Resolution Path - Master Troubleshooting -- Operational Framework - Managing Infrastructure and Systems -- Data Consumed - Observability Data Science -- Reliable Architecture - Systems Strategy and Design -- Valued Automation - Toil Discovery and Elimination -- Exposing Pipelines - GitOps and Testing Essentials -- Worker Bees - Orchestrations of Serverless, Containers, and Kubernetes -- Final Exam - Tests and Capacity Planning -- First Thing - Runbooks and Low Noise Outage Notifications -- Rapid Response - Outage Management Techniques -- Postmortem Candor - Long-Term Resolution -- Chaos Injector - Advanced Systems Stability -- Interview Advice - Hiring and Being Hired -- Appendix A The Site Reliability Engineer Manifesto -- Appendix B The 12-Factor App Questionnaire.
520 Excel in site reliability engineering by learning from field-driven lessons on observability and reliability in code, architecture, process, systems management, costs, and people to minimize downtime and enhance developers' output. Purchase of the print or Kindle book includes a free eBook in the PDF format. Key Features: Understand the goals of an SRE in terms of reliability, efficiency, and constant improvement; Master highly resilient architecture in server, serverless, and containerized workloads; Learn the why and when of employing Kubernetes, GitHub, Prometheus, Grafana, Terraform, Python, Argo CD, and GitOps. Book Description: Site reliability engineering is all about continuous improvement, finding the balance between business and product demands while working within technological limitations to drive higher revenue. But quantifying and understanding reliability, handling resources, and meeting developer requirements can sometimes be overwhelming. With a focus on reliability from an infrastructure and coding perspective, Becoming a Rockstar SRE brings forth the site reliability engineer (SRE) persona using real-world examples. This book will acquaint you with the role of an SRE, followed by the why and how of site reliability engineering. It walks you through the jobs of an SRE, from the automation of CI/CD pipelines and reducing toil to reliability best practices. You'll learn what creates bad code and how to circumvent it with reliable design and patterns. The book also guides you through interacting and negotiating with businesses and vendors on various technical matters, and explores observability, outages, and why and how to craft an excellent runbook. Finally, you'll learn how to elevate your site reliability engineering career, including certifications and interview tips and questions. By the end of this book, you'll be able to identify and measure reliability, reduce downtime, troubleshoot outages, and enhance productivity to become a true rockstar SRE!
What you will learn: Get insights into the SRE role and its evolution, starting from Google's original vision; Understand the key terms, such as golden signals, SLO, SLI, MTBF, MTTR, and MTTD; Overcome the challenges in adopting site reliability engineering; Employ reliable architecture and deployments with serverless, containerization, and release strategies; Identify monitoring targets and determine observability strategy; Reduce toil and leverage root cause analysis to enhance efficiency and reliability; Realize how business decisions can impact quality and reliability. Who this book is for: This book is for IT professionals, including developers looking to advance into an SRE role, system administrators mastering technologies, and executives experiencing repeated downtime in their organizations. Anyone interested in bringing reliability and automation to their organization to drive down customer impact and revenue loss while increasing development throughput will find this book useful. A basic understanding of API and web architecture and some experience with cloud computing and services will assist with understanding the concepts covered.
590 O'Reilly|bO'Reilly Online Learning: Academic/Public Library Edition
650 0 Reliability (Engineering)
650 0 Computer engineering.
650 6 Fiabilité.
650 6 Ordinateurs|xConception et construction.
650 7 Computer engineering|2fast
650 7 Reliability (Engineering)|2fast
700 1 Anami, Rod,|eauthor.
776 08 |iPrint version:|z9781804614563
776 08 |iPrint version:|z1803239220|z9781803239224 |w(OCoLC)1361689285
856 40 |uhttps://ezproxy.naperville-lib.org/login?url=https://learning.oreilly.com/library/view/~/9781803239224/?ar |zAvailable on O'Reilly for Public Libraries
938 YBP Library Services|bYANK|n19667518
938 Askews and Holts Library Services|bASKH|nAH41068694
994 92|bJFN