The matrix of infrastructures and platforms developers require presents a number of options for setting up and administering the ecosystem. Rami Honig describes his company’s journey to create a robust development environment that is functional and collaborative.
This article describes my company’s journey to create a robust development environment using the tools and automation required for an efficient development process.
Bintray development started like many rogue projects: with one of JFrog’s founders turning an idea into code on his laptop in the small hours of the morning. As code developed into functionality, the company’s VP of development was brought on board, and a proof of concept was born. At that point, the company took a strategic decision to turn this into a product, and a team of three full-time developers was put together.
Looking forward, the development lead correctly predicted a challenge in getting everyone working in a productive and collaborative environment.
First, there was the list of infrastructures being used in Bintray’s classic three-tier architecture.
● Grails: Front-end web application
● Nginx: Web server and load balancer
● Grails: Communication protocols to the back end
● Redis: Application cache
● CouchDB: Manages checksum-based storage
● MongoDB: Database for all Bintray application entities
● Elasticsearch: Searches for files in the databases
● ObjectStore: Stores binaries indexed by their checksums
● Amazon CloudFront CDN: Speeds up downloads
As a company that encourages diversity and respects each developer’s preferences, we also realized that we would need to support all these infrastructures on different platforms. The three Bintray pioneers were already developing on Windows and Mac, and we knew that a Linux developer would be coming on board pretty soon. Each developer would need four or five different servers installed for their platform in order to write code.
This matrix of infrastructures and platforms presented a number of options for setting up and administering the ecosystem.
The Central Approach
One approach is to install all the infrastructure components on central servers, then configure access for each developer with corresponding user accounts, database schemas, and permissions. A big advantage of this approach is that only one administrator (usually the company’s system administrator) needs to be very familiar with all the details of configuring these services.
The problem with this approach is that it is easy for developers to interfere with each other’s work. Also, not all of the databases provide good support for separate schemas for each of the users. One developer could easily overwrite another developer’s data. Even if you implement a separate (but centralized) installation for each developer, this requires good connectivity between the developer’s computer and the central infrastructure, which is not always the case when developers work outside of their LAN—they experience poor or no Internet connection when working from home, slow connections via VPN, etc.
The Local Approach
Another approach is for each developer to install all the infrastructures locally on his or her computer. This would certainly prevent one developer from overwriting another’s database, and connectivity is no longer an issue.
However, this approach is very time-consuming and requires the developer to install and configure four or five servers before writing one line of code. There are many technical details that a developer must be familiar with in order to maintain all those infrastructure components and get them all to work together. During ongoing development, developers frequently need to reinstall databases and other units that may get corrupted during development trials. The full onboarding process for a new developer could easily require a full week, and any developer who’s happy to get a new computer is quickly dismayed when she realizes she’s going to be out of commission for a week.
Once all the developers have everything installed, each may be running on a slightly different flavor of the ecosystem—and each of those flavors may be different from the production ecosystem on which the final product needs to run. Component versions may be different, patches and updates installed may be different, and even the same version of a component may behave differently on one platform compared to another. At some point in time, every developer will end up using that familiar excuse: “But it works on my computer.”
The Magical Solution of Virtualization
Virtualization solves the issues of both the central approach and the local approach. In our ecosystem we use three basic components to manage our infrastructures.
The first component is Chef, a systems integration framework that brings configuration management to infrastructures. This framework aptly defines a fundamental unit of configuration as a “cookbook” and uses scripts, also appropriately called “recipes,” to install required components on the target machine.
While Chef is usually run in a client-server configuration, there is an option to run recipes independently using Chef Solo (provided, of course, that you have the required recipes available locally on your disk). So we have our system administrator write the installation recipes, and we store those on GitHub with our source code. This was an important mind shift for us. It means that our system administrators still manage the environment on our development machines, but they do it by writing source code. These recipes, which are needed to build the environment on which our source code will be compiled and run, are stored with the application source code itself.
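A recipe of this kind is typically short. The following is a minimal sketch, with hypothetical names rather than our actual cookbooks, of what a recipe installing and configuring one of the infrastructure components, such as Redis, might look like:

```ruby
# Hypothetical cookbook: cookbooks/redis/recipes/default.rb
# Installs the Redis package and keeps the service enabled and running.

package 'redis' do
  action :install
end

service 'redis' do
  action [:enable, :start]
end

# A configuration file managed alongside the application source;
# the template name and path here are illustrative.
template '/etc/redis.conf' do
  source 'redis.conf.erb'
  owner  'root'
  mode   '0644'
  notifies :restart, 'service[redis]'
end
```

Because recipes like this live in the same GitHub repository as the application code, a change to the environment is reviewed, versioned, and pulled exactly like a change to the application itself.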
But this is not enough. Because each developer runs on a preferred platform, each has to run the right flavor of the cookbook for that platform, and has to do so every time a component needs to be installed. This brings us to the second component of our virtualization solution: Vagrant. Vagrant uses Oracle’s VirtualBox to dynamically build lightweight and portable virtual machines. So we use Vagrant to run CentOS within VirtualBox, and because Vagrant knows how to run Chef recipes, it then installs all of our infrastructures on this virtual machine.
Once our system administrator writes the relevant installation scripts, we have a simple two-step process for the developer:
1. Vagrant boots up a CentOS in VirtualBox.
2. Vagrant runs the Chef recipes to install the databases and service RPMs from a private YUM repository.
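Both steps can be captured in a single Vagrantfile. This is a sketch only; the box name, cookbook path, and recipe names below are illustrative, not our actual configuration:

```ruby
# Hypothetical Vagrantfile: boots CentOS in VirtualBox, then provisions with Chef Solo.
Vagrant.configure('2') do |config|
  # Step 1: a CentOS base box running under VirtualBox.
  config.vm.box = 'centos-base-box'   # illustrative box name

  # Step 2: Chef Solo runs the recipes checked in next to the source code.
  config.vm.provision 'chef_solo' do |chef|
    chef.cookbooks_path = 'chef/cookbooks'
    chef.add_recipe 'mongodb'
    chef.add_recipe 'redis'
    chef.add_recipe 'elasticsearch'
  end
end
```

With a file like this in the repository, `vagrant up` performs both steps for the developer, and `vagrant destroy` throws the whole machine away when it is no longer needed.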
For development, that’s enough, but to keep our integration and production environments coherently running the right version of everything, we use Jenkins, our third virtualization component. Our integration platform is not subject to the whims and preferences of any developer, so we can easily just decide that it will be Linux, and therefore we don’t need Vagrant. The Chef recipes we run for our developers within Vagrant will still work, so we do use those with Jenkins to pull the right sources and recipes that build our integration releases.
Our production platform adds another twist. We don’t want to run our production systems on internal hardware; we need them hosted on professional cloud infrastructure. But that challenge is easily met because all the infrastructures we use are available as professional cloud services. While we prefer software as a service because it’s more tailored and cost-effective for our needs, some of the components we use (Grails and Redis) are only available in the cloud as infrastructure as a service. We use jclouds as an abstraction layer to remain independent of any particular cloud provider, currently running on CentOS at SoftLayer. Chef also offers its infrastructure in the cloud (running in the Chef client-server configuration rather than with Chef Solo), MongoHQ provides our Mongo database, and Cloudant runs our CouchDB in the cloud.
The benefits of using virtualization are clear:
1. No installation or maintenance hell
With a single command, the developer can get a spanking-new system ready for coding.
2. Consistency between development and integration
Setting up the development and integration environments is very similar, and they use the same Chef recipes.
3. Easy to implement and distribute configuration changes
Any configuration changes can be quickly coded by the DevOps team, which has the core knowledge to manage them, and distribution is a simple matter of uploading a new Chef recipe to GitHub.
4. A developer-friendly solution
Developers can make any changes they want to their databases or infrastructure. When anything gets corrupted or unusable, it’s easy to just kill the VirtualBox and fire up a new one.
There are also a couple of minor drawbacks:
1. Chef Solo is not the same as Chef client-server
Recipes written for Chef Solo don’t always work smoothly on the full client-server Chef configuration. We sometimes need to tweak the recipes we use on the development and integration environments to adjust them to our production environment. While this adds some complexity to the overall process, it’s a small hardship that’s not very difficult to overcome.
2. Black-box installation
Because installing all the infrastructure is automatic, it becomes a “black box” for developers and limits their knowledge about the fundamental components they use in developing their product.
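The first drawback usually surfaces as small conditional tweaks inside recipes. One common pattern, sketched here with hypothetical attribute names rather than our actual recipes, is to branch on `Chef::Config[:solo]` so that a recipe skips server-only features such as `search` when it runs under Chef Solo:

```ruby
# Hypothetical recipe fragment: behave sensibly under both Chef Solo
# and the full client-server configuration.
if Chef::Config[:solo]
  # Chef Solo has no server to query; fall back to an attribute
  # supplied in the node JSON (attribute names are illustrative).
  db_host = node['bintray']['db_host']
else
  # In client-server mode, discover the database node via server-side search.
  db_host = search(:node, 'role:database').first['ipaddress']
end

template '/etc/bintray/db.conf' do
  source 'db.conf.erb'
  variables(db_host: db_host)
end
```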
Virtualization has proven very effective in our development efforts. Abstracting away the tweaking needed to adjust the various tools and infrastructures to the different platforms makes setting up a new system easy. The next step in our developmental evolution is that hot new kid on the virtualization block: Docker. As we start migrating our environment to this new technology, it’s rapidly becoming clear that the flexibility Docker offers, along with the fast, lightweight containers it runs, is the way to go for multifaceted development.
The journey to a robust development environment can be very challenging, but by putting the pieces together correctly, you will give your team the tools needed to efficiently deliver high-quality code.