The range, type, and volume of data that companies now handle is growing exponentially. Knowing where your data is, and where it came from, has become vital, particularly as legislation and customer expectations around data protection increase. Regulations and frameworks such as HIPAA, the EU–US Privacy Shield, and the General Data Protection Regulation (GDPR) are already in place, and the California Consumer Privacy Act (CCPA) takes effect on January 1, 2020.
At the same time, the growing adoption of DevOps is letting companies move from infrequent, big-bang releases to a constant stream of small releases, so that features get from developers’ keyboards into customers’ hands faster. That includes the database, because changes to front-end applications often require changes to the back-end database as well. Leaving database updates out of your DevOps process limits your ability to deliver new functionality continuously.
To create, accurately test, and deploy database updates quickly and seamlessly, however, developers often need copies of the production database, and those copies contain the very data that customers now expect to be protected. In a survey of almost five hundred SQL Server professionals, 83 percent of respondents said they want to use production data in development and testing, but they are restricted by concerns about data sensitivity, storage, and regulatory requirements. Without access to production-like data for testing, DevOps processes won’t reliably produce releasable code.
These two needs, end-to-end DevOps and the protection of sensitive data, can appear to be in conflict, but if handled correctly they support each other. DevOps practices such as version control and delivery automation introduce the very measures needed to properly control, audit, and protect sensitive production data.
The key to keeping production data safe while using it during your DevOps process is to focus on four areas.
1. Discovery: Understand Where Your Data Lives
This may sound easy, but data within organizations tends to spread to many different places, and any copy that isn’t protected while in use exposes the organization to data privacy risks.
All customer data begins in production, but check where else it exists within your systems. Consider using automated data discovery tools to scan your network and cloud providers so you get a full picture of what data you hold and where it is located.
Bear in mind that you need to know every stage of the data’s journey; it isn’t enough to know where data begins and where it ends up. For example, someone on the test team might set up a temporary testing database in Azure and then forget about it when they move on to something else. If this test database includes production data, it gives people unintended access to that data in ways that haven’t been considered or protected against.
The ultimate answer is to create a record of every server, backup, copy, and legacy system, so that you understand where the data flowing through your delivery process is stored and used, and who has access to it. Use configuration management tools and DevOps automation to track where data resides, and set up access controls so data is not left unprotected.
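Keeping that record as structured data, rather than in a spreadsheet, lets automation query it. The Python sketch below assumes a hypothetical inventory format (the `DataStore` fields and the `flag_risky_stores` helper are illustrative, not taken from any particular tool) and flags non-production copies that hold unmasked production data or look forgotten, like the temporary Azure test database described above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataStore:
    # Hypothetical inventory record; real discovery tools use their own schema.
    name: str
    environment: str              # e.g. "production", "test", "dev"
    holds_production_data: bool
    masked: bool
    owner: str
    last_used: date

def flag_risky_stores(inventory, today, stale_after_days=30):
    """Return (store name, reason) pairs for copies that need attention."""
    findings = []
    for store in inventory:
        if store.environment == "production":
            continue  # assumed to be covered by production access controls
        if store.holds_production_data and not store.masked:
            findings.append((store.name, "unmasked production data"))
        if (today - store.last_used).days > stale_after_days:
            findings.append((store.name, "possibly forgotten copy"))
    return findings

inventory = [
    DataStore("orders-prod", "production", True, False, "dba-team", date(2019, 6, 1)),
    DataStore("orders-dev", "dev", True, True, "dev-team", date(2019, 5, 30)),
    DataStore("tmp-test-azure", "test", True, False, "qa-team", date(2019, 3, 1)),
]
for name, reason in flag_risky_stores(inventory, today=date(2019, 6, 1)):
    print(f"{name}: {reason}")
```

The 30-day staleness threshold is an arbitrary assumption; the useful part is that the check runs automatically against the inventory rather than relying on someone remembering a forgotten database.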
2. Classification: Determine the Data’s Sensitivity
Once you’ve mapped out where your data resides, you need to understand what type of data it is. Is it sensitive? Does it include personally identifiable information (PII)? Classifying your data by type shows you what needs to be protected when it’s shared with people in your development and delivery process.
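As a rough illustration of column-level classification, the following Python sketch labels a column as sensitive if its name or a few sample values look like PII. The name list and regular expressions are hypothetical heuristics chosen for this example, not the rules of any specific classification product.

```python
import re

# Hypothetical rules; a real classification tool would use richer ones.
SENSITIVE_COLUMN_NAMES = {
    "first_name", "last_name", "full_name", "customer_name",
    "email", "phone", "address", "ssn", "date_of_birth",
}
VALUE_PATTERNS = [
    re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),          # looks like an email address
    re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),  # looks like a US phone number
]

def classify_column(column_name, sample_values):
    """Label a column 'sensitive' if its name or sample values suggest PII."""
    name = column_name.lower()
    if any(name == hint or name.endswith("_" + hint) for hint in SENSITIVE_COLUMN_NAMES):
        return "sensitive"
    for value in sample_values:
        if any(pattern.match(str(value)) for pattern in VALUE_PATTERNS):
            return "sensitive"
    return "not sensitive"

schema = {
    "order_id": [1001, 1002],
    "product_name": ["Widget", "Gadget"],
    "customer_name": ["Jane Doe", "John Roe"],
    "contact": ["jane@example.com", "john@example.com"],
}
for column, samples in schema.items():
    print(column, classify_column(column, samples))
```

Heuristics like these produce both false positives and false negatives, so their output is best treated as a starting point for human review rather than a final classification.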
A key method for protecting sensitive data while still providing access to those who need it is masking: replacing sensitive data with realistic, anonymized test data.
Masking everything is too time-consuming and complex. Instead, drill down to the column level and work out which information must be masked for each purpose. For example, data on what has been ordered from a retailer can often be displayed as is, but the names and personal details of the customers who placed those orders will need to be masked.
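A minimal sketch of column-level masking in Python, assuming hypothetical column names and fake-name lists: only the columns listed in `MASKING_RULES` are replaced, so order details stay readable while customer identity does not. The masking is deterministic (a keyed HMAC), which keeps the same customer consistent across tables; the key itself is a placeholder and would need to live outside the environments that receive masked data.

```python
import hashlib
import hmac

# Placeholder secret: keep it out of the environments that receive masked data,
# otherwise masked values could be recomputed from guessed originals.
MASKING_KEY = b"replace-with-a-secret-from-your-vault"

FAKE_FIRST = ["Alex", "Sam", "Jordan", "Taylor", "Casey", "Morgan"]
FAKE_LAST = ["Smith", "Lee", "Garcia", "Brown", "Khan", "Novak"]

def _digest(value, purpose):
    """Keyed hash, so the same input always masks to the same output."""
    message = f"{purpose}:{value}".encode("utf-8")
    return hmac.new(MASKING_KEY, message, hashlib.sha256).digest()

def fake_name(value):
    first = FAKE_FIRST[_digest(value, "first")[0] % len(FAKE_FIRST)]
    last = FAKE_LAST[_digest(value, "last")[0] % len(FAKE_LAST)]
    return f"{first} {last}"

def fake_email(value):
    return f"user_{_digest(value, 'email').hex()[:12]}@example.com"

# Column-level rules: only the columns listed here are masked.
MASKING_RULES = {"customer_name": fake_name, "customer_email": fake_email}

def mask_row(row, rules=MASKING_RULES):
    return {col: rules[col](str(val)) if col in rules else val
            for col, val in row.items()}

order = {"order_id": 1001, "product": "Widget", "quantity": 2,
         "customer_name": "Jane Doe", "customer_email": "jane@example.com"}
print(mask_row(order))
```

With only six first and last names, different customers can collide on the same fake name; that is usually acceptable for test data, but masked names should not be used as unique keys.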
3. Protection: Understand Purpose and Needed Access
In most companies, different teams need access to different data to do their jobs, so the way that data is protected will vary from team to team and needs to be controlled.
Development teams, for example, want realistic copies of the database to test their updates against, so that breaking changes are caught when they are made rather than later in the delivery pipeline. Here, all sensitive data should be masked, and access to these copies must be tightly controlled. Once the masking rules for this type of access have been decided, tools can automate both the masking and the provisioning of database copies with the appropriate access permissions, giving the development team what it needs without exposing sensitive information to others.
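The automated provisioning step can include a gate that refuses to hand out a copy if any column classified as sensitive lacks a masking rule. The Python sketch below illustrates that idea; `DatabaseCopy` and `provision_dev_copy` are hypothetical stand-ins for whatever cloning and permission tooling a team actually uses.

```python
from dataclasses import dataclass, field

@dataclass
class DatabaseCopy:
    # Hypothetical stand-in for a provisioned database clone.
    rows: list
    allowed_readers: set = field(default_factory=set)

def provision_dev_copy(rows, sensitive_columns, masking_rules, team):
    """Mask a copy of `rows` and grant access to `team` only.

    Refuses to provision if any column classified as sensitive has no masking rule.
    """
    missing = set(sensitive_columns) - set(masking_rules)
    if missing:
        raise ValueError(f"no masking rule for sensitive columns: {sorted(missing)}")
    masked_rows = [
        {col: masking_rules[col](val) if col in masking_rules else val
         for col, val in row.items()}
        for row in rows
    ]
    return DatabaseCopy(rows=masked_rows, allowed_readers={team})

rules = {"customer_name": lambda _: "Test Customer"}
dev_copy = provision_dev_copy(
    [{"product": "Widget", "customer_name": "Jane Doe"}],
    sensitive_columns={"customer_name"},
    masking_rules=rules,
    team="dev-team",
)
print(dev_copy.rows, dev_copy.allowed_readers)
```

Failing loudly when a rule is missing means a newly added sensitive column blocks the pipeline instead of silently reaching developers unmasked.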
Business intelligence teams, meanwhile, may want to analyze data for marketing, sales, or management purposes. Which products have been bought, which other products are bought at the same time, how much customers spend, and where customers are located would all be valuable information. In this case, partial masking is more appropriate: information such as which products sell best in which zip codes remains visible, while individual customer names and addresses stay hidden.
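One simple form of partial masking is an allowlisted view: the BI team sees product, zip code, and amount, while names and addresses never leave the source. The Python sketch below assumes hypothetical column names and shows the kind of zip-code and product analysis that remains possible.

```python
from collections import defaultdict

# Hypothetical allowlist: columns the BI team may see; everything else is dropped.
BI_VISIBLE_COLUMNS = {"product", "zip_code", "amount"}

def bi_view(orders, visible=BI_VISIBLE_COLUMNS):
    """Partially masked view: keep analytic columns, drop identifying ones."""
    return [{col: val for col, val in order.items() if col in visible}
            for order in orders]

def revenue_by_zip_and_product(rows):
    totals = defaultdict(float)
    for row in rows:
        totals[(row["zip_code"], row["product"])] += row["amount"]
    return dict(totals)

orders = [
    {"customer_name": "Jane Doe", "address": "1 Main St", "zip_code": "94105",
     "product": "Widget", "amount": 10.0},
    {"customer_name": "John Roe", "address": "2 Oak Ave", "zip_code": "94105",
     "product": "Widget", "amount": 5.0},
    {"customer_name": "Ann Poe", "address": "3 Elm Rd", "zip_code": "10001",
     "product": "Gadget", "amount": 20.0},
]
print(revenue_by_zip_and_product(bi_view(orders)))
```

Note that very small groups (for example, a zip code with a single customer) can still point to an individual, so a real deployment may also need to suppress or coarsen small groups.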
4. Monitoring: Ensure Ongoing Protection
Database monitoring has typically focused on performance, as organizations want to know about issues that affect customer usability and overall performance, such as slower-than-usual queries, deadlocks and blocking processes, large file sizes, and growth trends.
New data privacy regulations mean database monitoring must be taken to an entirely new level. Organizations are now required to monitor and manage access, ensure data is available and identifiable, and report any breaches that occur. They also need to know, and keep a record of, which servers and what data are being managed, and they must be able to determine the cause of any issue quickly and accurately.
Should a data breach occur, appropriate monitoring becomes even more crucial, because organizations are obligated under privacy laws to disclose the nature of the breach, the categories and number of customers affected, the likely consequences, and the measures taken to address the underlying issue.
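Answering those disclosure questions is only possible if monitoring recorded which records were touched. Assuming such an audit log exists, the Python sketch below (with a hypothetical `AccessEvent` record) summarizes the categories of data and the number of distinct customers involved in a set of suspect access events.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessEvent:
    # Hypothetical audit-log record captured by database monitoring.
    user: str
    customer_id: int
    data_category: str  # e.g. "contact details", "payment data"

def summarize_breach(events):
    """Summarize the categories and number of customers touched by suspect access."""
    return {
        "customers_affected": len({e.customer_id for e in events}),
        "data_categories": sorted({e.data_category for e in events}),
    }

suspect_events = [
    AccessEvent("svc-report", 1, "contact details"),
    AccessEvent("svc-report", 1, "payment data"),
    AccessEvent("svc-report", 2, "contact details"),
]
print(summarize_breach(suspect_events))
```

Counting distinct customer IDs rather than events matters here: one customer whose record was read twice is still one affected customer.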
All of these privacy concerns make an advanced monitoring solution, coupled with strong configuration management, a necessity. Together they let organizations monitor the availability of servers and databases containing personal data and be alerted to issues that could lead to a data breach before it happens.
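Alerting before a breach often starts with comparing current access against what is normal. The Python sketch below is one simple, hypothetical rule: flag any user who reads more than a multiple of their usual number of records in a time window. The baseline figures and the factor of 3 are illustrative assumptions, not recommendations from any monitoring product.

```python
from collections import Counter

def unusual_readers(access_log, baseline, factor=3):
    """Flag users reading more than `factor` times their usual number of records.

    access_log: list of dicts with at least a "user" key (one entry per record read).
    baseline: dict mapping user -> typical reads for the same time window.
    """
    counts = Counter(entry["user"] for entry in access_log)
    alerts = []
    for user, count in counts.items():
        usual = baseline.get(user, 0)
        if count > factor * max(usual, 1):  # unknown users get a threshold of `factor`
            alerts.append((user, count))
    return sorted(alerts)

log = [{"user": "alice"}] * 5 + [{"user": "bob"}] * 2 + [{"user": "carol"}] * 40
print(unusual_readers(log, baseline={"alice": 10, "bob": 1, "carol": 8}))
```

A fixed multiplier is crude; in practice the threshold would be tuned per environment, and the alert would feed the same incident process used for performance issues.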
Defending Data through DevOps
Data privacy and protection are becoming legal requirements for many organizations, and a moral duty for others. Regulations are obliging organizations to put controls and measures in place to protect personal data, and customers are also becoming more aware of their rights and freedoms.
As a direct consequence, organizations can no longer afford to ignore the threats to the personal data they hold, particularly when they are speeding up deployments through a DevOps process. Fortunately, DevOps practices such as automated configuration management and access control support the need to protect sensitive data. By identifying where data is and what it is, masking sensitive data appropriately, and monitoring databases in use, organizations can minimize breaches and exposure. Introducing DevOps processes to help protect your databases will both enable compliance and allow continuous delivery of value to customers.