DevOps is evolving with some potentially very harmful choices embedded in it. Among these are poor adoption of sound computer science, little thought to the maintainability of DevOps code, and choices of tools based solely on productivity without concern for maintainability. Will this cause DevOps to fail to live up to its potential?
DevOps is a rapidly growing philosophy within the IT community, espousing close collaboration between software developers and system operations. DevOps holds that everyone involved in the design and construction of software should think of the production environment as their target. DevOps is how companies like Netflix and Amazon are able to maintain and enhance their software with few hiccups while serving tens of millions of people at a time. DevOps has also been touted as the new agile, at least as practiced by those who “do it right.”
The refrain “They did not do it right” is often used when Scrum has been implemented in an organization but the results are not what was expected. We now understand the concept of “being agile” versus “doing agile”: To “do agile” is to mindlessly follow the practices defined by an agile methodology such as Scrum, with poor results. In contrast, to “be agile” is to truly embrace agile's intentions and apply them in a way that leads to actual increased agility, quality, and value.
Is DevOps headed for a similar struggle to “do it right”? I think the direction it’s moving might cause it to not deliver on its potential. For example, among many DevOps engineers I hear constant complaints of unreliability and a continuous stream of problems with different causes. It is also clear that DevOps tools have completely ignored the hard-won lessons of computer science. DevOps users often build homegrown frameworks of great complexity, ignoring the agile value of simplicity—a value that is also paramount to reliability. Those who choose the tools seem to make their choices based on personal productivity rather than lifecycle maintainability. Finally, DevOps teams have recreated the batch job, with all of its shortcomings.
Let’s examine these challenges in more detail.
A Pattern of Unreliability
Between 2001 and 2006 I specialized in consulting for clients who had been experiencing chronic problems with IT systems but had been unable to find a common cause to fix. The root problems had to do with systems that had been created in a hurry by very smart individuals who were key to the system’s maintenance but were assigned to other efforts, leaving the systems to grow unreliable. Over time, the frequency of anomalies would increase.
These systems were not poorly designed, but they were complex (in one case, there was a Perl loop nested twelve levels deep!) and wholly without comments explaining the intent of the code.
I see the same thing developing today in IT groups that are trying to practice DevOps: lots of homegrown frameworks of great complexity, with little documentation and lots of tribal knowledge needed to maintain them. This is a prescription for great headaches down the road as the gurus who created these frameworks move on.
Lessons of Computer Science Ignored
Many of today's most popular DevOps tools, such as Chef and Vagrant, rely on Ruby. Ruby is a language designed and still led by one individual, Yukihiro Matsumoto, known as Matz. Recently I had to build a multithreaded application and was shocked to discover that Ruby threading is broken. Ruby has threads, but they are not truly concurrent because (as of this writing) the standard Ruby interpreter maintains a “global interpreter lock” that allows only one thread to execute Ruby code at a time. The threads operate like an old time-sharing system, taking turns. Sun got Java threads working in short order. Ruby is now two decades old, and threads still don't work? To be fair, Java had the support of Sun, but that is kind of the point: Java was nurtured into a robust enterprise language, with sufficient resources to achieve that.
The problem is that Ruby, the “new Perl,” is not well suited to building systems that stay maintainable and reliable over time. Let me explain.
- Lack of Component Contracts
By the early 1980s the computing profession had realized that the freewheeling days of C programming and shell scripts had led to unmaintainable code, and one of the reasons was that these languages had no way to break a system up into pieces with well-defined behavioral boundaries. Software engineering pioneer David Parnas called the principle behind such boundaries “information hiding,” and it is now commonly known as “encapsulation.” Most languages developed during the following two decades therefore had mechanisms for encapsulation, and many of these languages were referred to as “object-oriented.” For example, Java lets the programmer define “interfaces” that specify a contract an object must adhere to, and C++ achieves the same thing with abstract classes.
This is relevant and important for DevOps, because DevOps involves a lot of passing things around. For example, the tools Vagrant and Chef are widely used together as a scripting framework for provisioning virtual machines. In a Vagrant script, one often sets JSON attributes that get read by Chef scripts, or “recipes.” If any attribute value is missing or set incorrectly, things will blow up, but only at runtime. This is exactly the kind of thing that interfaces are intended to prevent. Chef recipes also often define a kind of makeshift interface in the form of an attribute folder, but this is, frankly, so Fortran. It reminds me of common blocks. And here we are in 2014.
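To make the missing-contract problem concrete, here is a minimal sketch in plain Ruby of checking a set of attributes against a declared contract before any provisioning step uses them. It is not part of Chef or Vagrant; the attribute names (`app_port`, `app_user`, `deploy_dir`), the `REQUIRED_ATTRIBUTES` table, and the `contract_violations` helper are all hypothetical and exist only for illustration.

```ruby
require "json"

# Hypothetical contract: attribute names and expected types are
# illustrative assumptions, not anything defined by Chef or Vagrant.
REQUIRED_ATTRIBUTES = {
  "app_port"   => Integer,
  "app_user"   => String,
  "deploy_dir" => String
}.freeze

# Returns a list of human-readable problems; an empty list means the
# attributes satisfy the contract.
def contract_violations(attributes, contract = REQUIRED_ATTRIBUTES)
  contract.each_with_object([]) do |(name, type), problems|
    if !attributes.key?(name)
      problems << "missing attribute: #{name}"
    elsif !attributes[name].is_a?(type)
      problems << "#{name} should be #{type}, got #{attributes[name].class}"
    end
  end
end

# The port is a string instead of a number, and deploy_dir is absent.
node_json = JSON.parse('{"app_port": "8080", "app_user": "deploy"}')
contract_violations(node_json).each { |problem| puts problem }
```

A check like this still runs at runtime, so it is weaker than a compiler-enforced interface, but it at least fails at the boundary with a clear message rather than somewhere deep inside a recipe.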
- Lack of Closure
Ruby has very limited closure features. Here, I am not referring to closures, which are essentially method objects and which Ruby does have; I am referring to the computer science and mathematical concept of closure, which is important for composability, a system design principle that is critical for system security and reliability. In this context, closure refers to the ability to create a boundary around something. Closure is extremely important for maintainability because it enables program components to define their own little playground, and programmers don't have to worry that everything in the code base might be affecting the code they are looking at.
Languages such as Java, Ada, and C++ have very robust closure features. For example, in Java you can import a package and reference things in it. Ruby can do this too. But in Java, if that package itself imports another package, you can't reference things in that other package unless you use a fully qualified name. Not so in Ruby; in Ruby, things can come in from all over the place, leading to a nightmare when tracking down origins. This is not good for maintainability.
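The difference can be sketched with two throwaway files. In this illustration, which assumes nothing beyond the Ruby standard library, the file names `inner_lib` and `outer_lib`, the constant `INNER_SETTING`, and the module `Unrelated` are all hypothetical. The main program loads only `outer_lib`, yet a constant defined by the file that `outer_lib` itself loads becomes visible everywhere.

```ruby
require "tmpdir"

# Write two small library files into a temporary directory.
# outer_lib pulls in inner_lib; the main program asks only for outer_lib.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "inner_lib.rb"), "INNER_SETTING = 42\n")
  File.write(
    File.join(dir, "outer_lib.rb"),
    "require_relative 'inner_lib'\nmodule OuterLib; end\n"
  )
  require File.join(dir, "outer_lib")
end

# Code that never mentioned inner_lib can still see what it defined,
# with no qualification and no hint of where the name came from.
module Unrelated
  def self.peek
    INNER_SETTING
  end
end

puts Unrelated.peek
```

Wrapping definitions in modules limits the damage, but that is a convention the library author must follow, not a boundary the language enforces on the code doing the loading.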
I Want Candy
Management often feels that developers should be able to choose their own tools, but that does not mean there should be no oversight. Think about what developers consider when choosing a tool: What makes me most productive? (Not “What will make others who have to maintain my code more productive?”)
Today's DevOps teams in organizations around the world are picking tools for their organizations—foundational tools. And they are choosing based on what makes them most productive, and also what is popular today. They are not picking tools based on what will be maintainable.
The Batch Job Returns
I really thought the batch job was dead. Yet I am seeing it return in the form of the Jenkins job.
A developer tests something locally on a laptop, and it works. But that does not mean it will work in the cloud, so the developer submits a Jenkins job and watches while Jenkins waits to obtain a slave to execute the job. Finally it starts and finishes. It finds an error, such as a misspelled method name, in a code path that did not run locally. (Ruby does not catch such errors in code that it does not execute, which is one of the joys of an interpreted language.) So the developer fixes the error, pushes the code to Git, and waits for Jenkins again. It’s a batch job!
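The kind of error that surfaces only when a particular branch runs can be sketched as follows. The method names `provision` and `configure_load_balancer`, and the deliberate misspelling in the cloud branch, are hypothetical and chosen purely for illustration.

```ruby
# The "cloud" branch calls a misspelled method. Ruby parses this file
# without complaint; the mistake is found only if that branch executes.
def provision(environment)
  if environment == "cloud"
    configure_load_balancr(environment) # typo: should be configure_load_balancer
  else
    "local setup complete"
  end
end

def configure_load_balancer(environment)
  "load balancer configured for #{environment}"
end

# The local run succeeds, so the developer believes the code works.
puts provision("local")

# The same code fails only when the cloud branch is finally exercised.
begin
  provision("cloud")
rescue NameError => e
  puts "failed in the cloud: #{e.message}"
end
```

A compiled, statically checked language would reject the misspelled call before anything ran; here, only a test that exercises the cloud branch (or the Jenkins job itself) will find it.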
It is time to reassess where we are headed with DevOps. Are we creating things that will become unmaintainable legacy in short order?
Perhaps it is time to start demanding that maintainability be a strong factor to consider when deciding whether to develop homegrown frameworks and choosing which tools to use.
Perhaps it is time to start demanding that sound computer science principles be applied in the growing amounts of code our DevOps teams are creating. DevOps is not system administration; it is agile software development that requires system administration knowledge. Are the right people doing it? That is, people who have experience building maintainable systems?
Perhaps it is time to start demanding simplicity and focusing on how well application teams can learn the DevOps methods themselves, with the DevOps team acting as coaches and instructors instead of savants and gurus who create magic frameworks.
Perhaps it is time to stop repeating the mistakes of the 1970s, this time on the new mainframe that is the cloud.