DevOps is evolving with some potentially very harmful choices embedded in it. Among these are poor adoption of sound computer science, little thought to the maintainability of DevOps code, and choices of tools based solely on productivity without concern for maintainability. Will this cause DevOps to fail to live up to its potential?
DevOps is a rapidly growing philosophy within the IT community, espousing close collaboration between software developers and system operations. DevOps holds that everyone involved in the design and construction of software should treat the production environment as their target. DevOps is how companies like Netflix and Amazon are able to maintain and enhance their software with few hiccups while serving tens of millions of people at a time. DevOps also has been touted as the new agile—as practiced by those who “do it right,” anyway.
The refrain “They did not do it right” is often used when Scrum has been implemented in an organization but the results are not what was expected. We now understand the concept of “being agile” versus “doing agile”: To “do agile” is to mindlessly follow the practices defined by an agile methodology such as Scrum—with poor results. In contrast, to “be agile” is to truly embrace agile's intentions and apply it in a way that leads to actual increased agility, quality, and value.
Is DevOps headed for a similar struggle to “do it right”? I think the direction it’s moving in may prevent it from delivering on its potential. For example, from many DevOps engineers I hear constant complaints of unreliability and a continuous stream of problems with differing causes. It is also clear that DevOps tools have completely ignored the hard-won lessons of computer science. DevOps users often build homegrown frameworks of great complexity, ignoring the agile value of simplicity—a value that is also paramount to reliability. Those who choose the tools seem to make their choices based on personal productivity rather than lifecycle maintainability. Finally, DevOps teams have recreated the batch job, with all of its shortcomings.
Let’s examine these challenges in more detail.
A Pattern of Unreliability
Between 2001 and 2006 I specialized in consulting for clients who had been experiencing chronic problems with IT systems but had been unable to find and fix a common cause. The root problems traced back to systems that had been created in a hurry by very smart individuals. Those individuals were key to each system’s maintenance, but once they were reassigned to other efforts, the systems were left to grow unreliable. Over time, the frequency of anomalies would increase.
These systems were not poorly designed, but they were complex—in one case, there was a Perl loop nested twelve levels deep!—and wholly without comments in the code explaining intent.
I see the same thing developing today in IT groups that are trying to practice DevOps: lots of homegrown frameworks of great complexity, with little documentation and lots of tribal knowledge needed to maintain them. This is a prescription for great headaches down the road as the gurus who created these frameworks move on.
Lessons of Computer Science Ignored
Most of today's DevOps tools rely on Ruby, a language designed and maintained by Yukihiro Matsumoto, known as Matz. Recently I had to build a multithreaded application and was shocked to discover that Ruby threading is broken. Ruby has threads, but they are not truly concurrent: as of this writing, Ruby's reference interpreter maintains a “global interpreter lock” that serializes the threads' execution. The threads operate like an old time-sharing system, running one at a time. Sun got Java threads working in short order. Ruby is now two decades old, and threads still don't work? To be fair, Java had the support of Sun, but that is rather the point: Java was nurtured into a robust enterprise language, with sufficient resources behind it to achieve that.
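The effect of the global interpreter lock is easy to observe for yourself: split a CPU-bound task across two threads and time it against running the same work sequentially. The sketch below is illustrative (the counting workload and timings are my own, not from any particular DevOps tool); on the reference Ruby interpreter the two timings come out roughly the same, because the lock lets only one thread execute Ruby code at a time.

```ruby
# Illustrative workload: a pure-Ruby busy loop, so the work is CPU-bound
# and cannot release the global interpreter lock the way blocking I/O can.
def count_up(n)
  i = 0
  i += 1 while i < n
  i
end

N = 5_000_000

# Sequential baseline: do the work twice, one run after the other.
t0 = Time.now
2.times { count_up(N) }
sequential = Time.now - t0

# "Parallel" version: the same total work split across two threads.
t0 = Time.now
threads = 2.times.map { Thread.new { count_up(N) } }
threads.each(&:join)
threaded = Time.now - t0

puts format("sequential: %.2fs, threaded: %.2fs", sequential, threaded)
# On the reference interpreter (MRI) the two timings are typically similar,
# because the lock serializes the threads; an implementation without the
# lock, such as JRuby, can show a real speedup here.
```

Note that Ruby threads are still useful for I/O-bound work, where the interpreter releases the lock while a thread waits; it is CPU-bound concurrency that the lock forecloses.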