Eitan Schichmanter’s team evolved its development process from a focus on build management to a more robust DevOps transformation, and they designed a custom gated check-in system to help. He shares how they tackled challenges and created an internal solution to address an important technical requirement.
One of the main challenges of any build system is returning fast feedback and retaining the integrity of the release branch. In DevOps, we call this a feedback loop. To achieve fast feedback, a continuous integration build system is created—getting the latest code on the branch, compiling it, running tests on the artifacts, deploying bits (where applicable), and running more advanced automatic tests on the deployed application.
There’s always the conflict between running as many tests as possible and retaining some semblance of fast feedback. (How fast is “fast”? Will you receive a reply from the build system that it is OK after five minutes? Fifteen minutes? Sixty-three minutes?)
Usually a compromise is the right course. This means running the minimal set of tests that ensure sanity and stability while keeping the full battery of tests for a later stage in the build lifecycle. That later stage is often either the continuous build phase or the continuous deployment phase.
However, retaining the integrity of the release branch takes a lot more thought. As soon as a developer commits code to the release branch, it becomes part of the release code, and if the ensuing build fails on that code, the release branch’s integrity is compromised.
The Gated Check-In System
There are many flavors, solutions, and implementations to the gated check-in, but the idea is simple: Stop the code from being committed to the release branch, verify it offline on a copy of the most recent release code, and pass it back only if the verification is successful.
They say necessity is the mother of invention, and we were certainly in need of a new solution. Adding relevant tests and maintaining the release branch’s integrity became very important and time-consuming tasks for us.
We gathered internal statistics, and they revealed that an alarming 15 percent to 30 percent of builds on the release branch failed for various reasons (code, test, and environment issues), all contributing to the poor stability of the release branch.
Moreover, this caused a severe decline in the developers’ velocity and eroded trust in the build system. When the cause was a code issue, the code was already committed, so the break spread virally through the development teams and set off a storm of “Who’s on it?” emails.
Each of the existing solutions we looked at has its pros and cons, but we couldn’t find a streamlined, simple, and developer-centric solution that is still open to any development stack and yet detached and nonintrusive.
After a lot of internal brainstorming, we decided to develop our own internal solution. We wanted it to meet the following requirements:
- Nonintrusive (doesn’t change the developer’s workflow)
- Easy to use and maintain (by developers and DevOps)
- Stand-alone and development-stack agnostic
- Lightweight and simple to enhance
We code-named our project “EverGreen” to emphasize our goal: keeping the release branch as green as possible. EverGreen is not a commercial HP product for sale. It is a tool that we developed for our own needs, and we are working on contributing it to the open source community. As you read through the details of how we created EverGreen, you may want to consider the challenges you are facing, and perhaps even solutions that you can create.
Into the Details
Our infrastructure technology stack is Git and Jenkins, so our starting point was this:
Users push their code to the Git provider and a build is triggered in Jenkins, failing 15 percent to 30 percent of the time.
Our solution is rather simple:
It has two server-side components:
- Pre-Receive Push Hook: Intercepts the push operations from the developers
- EverGreen Collector: A server-side component that:
  - Receives the commit information from the hook
  - Checks if it has information on the commit
  - If not:
    - Creates an ad-hoc temporary branch off the tip of the current latest code
    - Runs a verification process (using REST) against Jenkins
    - If the verification is successful (a warning is not a success criterion in our view), pushes the commit(s) back to the original protected branch. Otherwise, a notification is sent to the committer, who is the only one affected by the failure.
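The collector steps listed above can be sketched roughly as follows. This is a minimal illustration, not EverGreen's actual code: the class name `CollectorSketch`, the `Backend` interface, and all method names are hypothetical stand-ins for the Git and Jenkins operations. The source does not say what happens when the collector already has information on a commit, so the sketch assumes it simply returns the recorded status.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the collector's decision flow (names are illustrative only). */
final class CollectorSketch {

    /** Outcome of a verification build. A warning is deliberately not treated as success. */
    enum BuildResult { SUCCESS, WARNING, FAILURE }

    /** Final state the collector records for a commit. */
    enum Status { MERGED, REJECTED }

    /** Stand-in for the Git and Jenkins operations the collector relies on. */
    interface Backend {
        String createTempBranch(String commitId);      // branch off the tip of the latest code
        BuildResult verify(String tempBranch);         // e.g. a Jenkins job driven over REST
        void pushToProtectedBranch(String commitId);   // only called after a successful build
        void notifyCommitter(String commitId, BuildResult result);
    }

    private final Map<String, Status> known = new HashMap<>();
    private final Backend backend;

    CollectorSketch(Backend backend) {
        this.backend = backend;
    }

    /** Handles commit information received from the pre-receive hook. */
    Status onCommit(String commitId) {
        Status existing = known.get(commitId);
        if (existing != null) {
            // Assumption: a commit the collector already knows is not verified again.
            return existing;
        }
        String tempBranch = backend.createTempBranch(commitId);
        BuildResult result = backend.verify(tempBranch);
        Status status;
        if (result == BuildResult.SUCCESS) {
            backend.pushToProtectedBranch(commitId);
            status = Status.MERGED;
        } else {
            // Only the committer is notified; the protected branch is untouched.
            backend.notifyCommitter(commitId, result);
            status = Status.REJECTED;
        }
        known.put(commitId, status);
        return status;
    }
}
```

Keeping the Git and Jenkins calls behind one small interface is a design choice of this sketch only; it makes the flow easy to exercise with a fake backend.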
The algorithm handles multiple concurrent commits optimistically. If commit a is already being verified and commits b and c enter the system during that verification, they are merged on top of a (not on top of the previous HEAD), and a new verification starts on the superset of the commits.
If one of the commits fails along the way, we fail that permutation and fall back to a previous permutation. (So if a + b + c fails, we try to build a + b if a is still in an unknown status.)
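The stacking and fallback behavior can be illustrated with a simplified sketch. It is an assumption-laden model rather than EverGreen's implementation: the real system verifies permutations concurrently and tracks their statuses, while this version checks them one after another, largest first. The class `OptimisticQueueSketch` and its method names are hypothetical, and the `verify` predicate stands in for a full verification build.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, sequential model of the optimistic stacking described above. */
final class OptimisticQueueSketch {

    /** Builds the stacked permutations in arrival order: [a], [a, b], [a, b, c], ... */
    static List<List<String>> permutations(List<String> pending) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 1; i <= pending.size(); i++) {
            result.add(new ArrayList<>(pending.subList(0, i)));
        }
        return result;
    }

    /**
     * Tries the largest permutation first and falls back to the previous one
     * whenever verification fails. Returns the commits that may be pushed to the
     * protected branch, or an empty list if every permutation failed.
     */
    static List<String> largestPassing(List<String> pending, Predicate<List<String>> verify) {
        List<List<String>> candidates = permutations(pending);
        for (int i = candidates.size() - 1; i >= 0; i--) {
            List<String> candidate = candidates.get(i);
            if (verify.test(candidate)) {
                return candidate;
            }
        }
        return Collections.emptyList();
    }
}
```

For example, with pending commits a, b, and c and a verifier that fails any stack containing c, `largestPassing` rejects a + b + c and returns a + b.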
We developed a dedicated UI dashboard that shows the status of the various commits in the system.
This way, the developers are able to follow the commits and their progress and understand where they stand in terms of their individual commit.
To improve the visibility and clarity of the verification build, we developed a custom Jenkins plug-in that shows exactly what is being built and verified.
Developing the system using best practices and the latest technologies (the EG Collector is written in Java, and the UI is in AngularJS) yielded impressive results:
# of Commits | Commit/hour ratio
684 in 4 months | 1.43 / 1.62
968 in 6 months | 1.25 / 1.62
1001 in 6 months | 1.31 / 1.76
581 in 5 months | 1.32 / 1.62
661 in < 1 month | 1.86 / 2.90
Average failure rate intercepted: 24.85%
Average commit/hour increase: 23%
The interception rate reflects the failed builds that were intercepted by EverGreen and never reached the protected (release) branch.
From the table we see a few things:
- The protected branch was at least 15 percent less prone to build breaks. Our overall average interception rate holds at 24.85 percent.
- Across the teams we see a distinct commit/hour ratio increase; the average is 23 percent. This can be attributed to two factors:
  - Developers have more trust in the system, so they commit more often and their commits are more atomic, resulting in smaller, testable changes.
  - Because some build breaks stem from the environment rather than the code, developers now push more often to get their feedback sooner. This leads to an increased number of commits.
We’ve added this system to a number of groups and have had real success integrating it, receiving great results spanning geographies, cultures, and technologies.
The Road Ahead
We are in the process of contributing this software to the open source community so it will have its own lifecycle and will gain its own momentum. We have our backlog and receive feedback from our internal customers (the research and development teams working with EverGreen), so we’re on the fast track to improving and fine-tuning it to our needs.
It’s important to add that unlike Gerrit or Zuul, EverGreen doesn’t aim to be a code review system that does gated check-in. Instead it is a stand-alone, best-practice gated check-in system that gives developers the freedom of using their own code review tools without committing to a specific tool or system.
We believe in EverGreen and see its results in our daily work. We even use EverGreen on EverGreen to ensure our own commits are valid and tested before they make their way into our releases.
Adding a gated check-in system helps reduce code issues, ensuring peace and quiet for the developers and release managers. I hope this account inspired you to truly think about your team’s needs and consider whether a solution like this may help improve your code.