(This article, “DevOps Has IT Heroes Sleeping Through the Night,” was originally published in Wired’s Innovation Insights by James Brown, CXO at JumpCloud.)
IT admins and DevOps pros are used to fighting fires. They are up at all hours dealing with problems, outages, and failures. Unfortunately, that’s part of their job description. When the proverbial stuff hits the fan, they are the ones who get called. As more businesses have moved to online commerce, delivery, and operations, the problem has only gotten worse. It used to be that you ran your own data centers, network, servers, and applications. If any one of those layers of the stack broke, you were on-call and on-site trying to fix it. The whole organization counted on these modern-day firefighters. The best in the business had a swagger about them. They knew that they could save the day when somebody made a mistake, a piece of equipment broke, or a downstream provider went offline. These IT rockstars would dive right in, assess the situation, quickly poke around to gather some data, and get to work on resolving the problem. A culture was built up in IT around saving the day, or the night, as was usually the case. It was an adrenaline rush to walk in and be the hero.
How DevOps is changing IT
DevOps is challenging that notion in a serious way. These days, heroes are sleeping through the night. Our favorite protagonist is leveraging this new methodology to build better, more resilient systems so that, even with major failures, the company still operates. Many are familiar with Netflix’s famous resiliency testing (and its open source tools, the Simian Army, including the well-known Chaos Monkey), in which the company introduces random events and errors into its production environment just to ensure that its systems are built well enough to withstand these types of failures. For example, it will simulate an unplugged network cable, a disk array being turned off, or a network connection going offline. Historically, any of these failures would have been catastrophic, but these days each is just a blip on an otherwise normal day.
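To make the idea of failure injection concrete, here is a minimal, self-contained sketch in Python. It is not Netflix's tooling and does not use the Chaos Monkey API; the `Replica`, `Service`, and `inject_random_failure` names are hypothetical and exist only for illustration. The sketch assumes a service backed by several interchangeable replicas, knocks one out at random, and checks that requests are still served.

```python
import random


class Replica:
    """Hypothetical stand-in for one server behind a service."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Service:
    """Hypothetical service that fails over across its replicas."""

    def __init__(self, replicas):
        self.replicas = replicas

    def handle(self, request):
        # Try each replica in turn; give up only if every one is down.
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("all replicas are down")


def inject_random_failure(service, rng):
    """Mark one randomly chosen replica as unhealthy and return it."""
    victim = rng.choice(service.replicas)
    victim.healthy = False
    return victim


# Seeded generator so the experiment is repeatable.
rng = random.Random(0)
service = Service([Replica("a"), Replica("b"), Replica("c")])

victim = inject_random_failure(service, rng)
response = service.handle("GET /")
print(f"killed {victim.name}; response: {response}")
```

The point of the exercise is the check at the end: if losing a single replica produces an error instead of a response, the weakness is found during a controlled experiment rather than at 3 a.m.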
Why are the best in the business getting their 8 hours a night of sleep? There are several solid reasons:
- DevOps is changing the way IT runs its business.
- There is more planning, and shorter-term milestones help to increase quality.
- DevOps and IT pros are monitoring more of the stack more deeply.
- They’re generating more telemetry than ever, which is translating into more valuable insight.
- Finally, they’re managing their infrastructure more tightly to help avoid problems before they occur.
DevOps as a practice is built on short, iterative releases of functionality, with the release process integrated across the entire organization. Pushes to production are frequent; in some companies they happen multiple times a day. Functionality can be updated, changed, and released in a matter of hours. By its very definition, this requires greater collaboration, more automated testing, and checks and balances from security and operations. In short, DevOps instills higher release quality through its methodology. It is simply no longer possible to manually test every area in the time available before a release. Testing, quality, security, and scalability must all be built into the process rather than treated as features to add later. DevOps and IT leaders are therefore deliberately increasing the quality of their infrastructure.
Wise IT personnel know that things break, all the time and all over the place. So they don’t try to prevent every failure; instead, they use intelligent monitoring to learn that a failure is about to happen. DevOps as a culture and methodology loves data. The more telemetry these folks can generate and analyze, the better their insight into their infrastructure. The better the insight, the more warning they have before an issue can take down their systems. Combined with short, iterative sprint cycles, this means a budding problem can be addressed quickly, before it becomes a major issue. Deep monitoring can go a long way towards keeping our heroes sleeping soundly.
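As a rough illustration of knowing that a failure is about to happen, the Python sketch below fits a straight line to recent telemetry samples and estimates how long it will be before a metric crosses a threshold. The function name `hours_until_threshold`, the sample values, and the 90% disk-usage threshold are all assumptions made for this example; real monitoring systems typically use more robust forecasting.

```python
def hours_until_threshold(samples, threshold):
    """Estimate hours from the last sample until the metric reaches threshold.

    samples is a list of (hour, value) pairs. A least-squares line is fitted
    to them. Returns None if there are too few samples or the metric is not
    trending upward.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None

    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None

    intercept = mean_v - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    last_hour = samples[-1][0]
    # Clamp at zero if the threshold has already been crossed.
    return max(0.0, crossing_hour - last_hour)


# Hypothetical disk-usage readings: (hour, percent used).
disk_usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
remaining = hours_until_threshold(disk_usage, threshold=90.0)
print(f"estimated hours until 90% full: {remaining}")
```

With a steady climb of two percentage points per hour, the estimate gives roughly seven hours of warning, which is the kind of lead time that lets a problem be handled during the working day instead of through an overnight page.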
Modern-day IT admins and DevOps pros are students of management. They know that orchestrating their resources to produce the right results is critical to avoiding failures. Executing key processes on a regular basis, and having them planned, logged, and audited, is a prerequisite for success. Server management tools that help build and execute tasks such as user management, server and system maintenance, security patching, and logging and analysis help avoid future problems. An ounce of prevention is often worth pounds of cure as far as leading DevOps and IT admins are concerned. With capable tools and technology available to support their server orchestration efforts, there’s good reason why the best are savvy about what resources are available to them.
Fifteen years ago, at the height of the dot-com bubble, system administrators were burning the candle at both ends. With no cloud, Agile, or DevOps to help them, they were making it happen through sheer force of will and effort. As far as modern IT is concerned, those days are gone, and that’s for the best. The most forward-thinking pros are leveraging a better methodology in DevOps, a thirst for monitoring data, and the best practices available for managing servers and systems to automate their IT infrastructure and make it stronger and more resilient.