Why NoSQL Matters and What Configuration Management Experts Need to Know About It

Summary:
NoSQL is a new approach to data storage that addresses many problems associated with relational databases. Many highly popular websites—including Facebook, Netflix, and Digg—are using NoSQL to crunch large volumes of data. Eugene Dvorkin gives us insight into why NoSQL is important and what CM experts need to know about it.

Information and applications are now delivered through many channels, including the web, mobile devices, and complex social networks. Many technology professionals use the term “Big Data” for the approach to crunching the huge amounts of information that these channels generate and that enterprises want to capture and use. NoSQL is a good fit for storing and accessing such large volumes of data. Tackling new technologies and approaches can be challenging, so this article offers some insight into why NoSQL is important and what CM experts need to know about it.

Developers often find that new challenges require new approaches and creative technical solutions to help their organization achieve its goals. Most large applications rely on systems that manage large stores of data, and for many years we software developers have built applications on relational databases. For the most part, relational databases have served us well and will be with us for many years to come.

So why are people suddenly developing and using alternative solutions for database storage? To answer this question, we need to consider what’s wrong with the relational database model.

Let’s take a look at scalability. With the explosive growth of applications on the web, mobile devices, and social networking websites, we have seen a corresponding growth in traffic and in users’ expectations for instant information. Scalability has become a very big challenge for application developers and architects, and relational databases often cannot meet speed and performance requirements.

Relational databases generally require that you purchase bigger hardware when the database size exceeds a certain limit. As the database grows, we have to add resources, usually CPUs and memory, to a single database server. Sometimes we move the whole database to bigger, more expensive hardware. This is called “vertical scalability” because we are scaling up by adding resources to a single node, and my experience is that it can be very expensive. It has also been my experience that NoSQL solutions can scale horizontally, via “sharding,” where each node processes and stores only a portion of the huge data set. You can think of this approach as a distributed database, which allows companies to collect huge amounts of data just by adding new nodes to the cluster. While scalability is essential, performance is even more important.
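
To make the idea of sharding concrete, here is a minimal sketch in Python. It is only an illustration: the `ShardedStore` class, its method names, and the use of plain dictionaries as “nodes” are assumptions made for this example and do not reflect how any particular NoSQL product routes data.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys across in-memory 'nodes'.

    Hypothetical example class: each dict stands in for one server.
    """

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash (not Python's per-process randomized hash())
        # so the same key always lands on the same node.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)


store = ShardedStore(node_count=3)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

# Each node ends up holding only part of the data set.
shard_sizes = [len(node) for node in store.nodes]
print(shard_sizes)
```

Simple modulo hashing like this forces most keys to move when the number of nodes changes, so a production system would need a more careful partitioning scheme. The sketch is only meant to show that each node stores a slice of the data.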

Performance is another issue that affects the relational database model. Today, users expect everything to work instantly. Most users feel that it’s unacceptable for a web page to take three seconds to load.

Software developers should know that joining two (or more) tables in a relational database can adversely impact performance. In fact, the more “joins” a query needs, the harder the database has to work, which can lead to poor performance. NoSQL solutions do not require table joins. Instead, they use either embedded documents, as MongoDB does, or a column-oriented layout, as HBase does.
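
The difference between a join and an embedded document can be sketched as follows. This is an illustrative comparison only: the table names, the sample author and posts, and the shape of `author_doc` are invented for the example. SQLite stands in for a relational database, and a plain Python dictionary stands in for a document that a store such as MongoDB might hold.

```python
import sqlite3

# Relational layout: authors and posts live in separate tables,
# so reading an author's posts requires a join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann');
    INSERT INTO posts VALUES (10, 1, 'Why NoSQL');
    INSERT INTO posts VALUES (11, 1, 'Sharding basics');
""")
rows = conn.execute("""
    SELECT authors.name, posts.title
    FROM authors
    JOIN posts ON posts.author_id = authors.id
    WHERE authors.id = 1
    ORDER BY posts.id
""").fetchall()
joined_titles = [title for _name, title in rows]

# Document layout: the posts are embedded inside the author record,
# so a single lookup returns everything with no join.
author_doc = {
    "_id": 1,
    "name": "Ann",
    "posts": [{"title": "Why NoSQL"}, {"title": "Sharding basics"}],
}
embedded_titles = [post["title"] for post in author_doc["posts"]]

print(joined_titles == embedded_titles)
```

The trade-off is that embedded data is duplicated if the same post needs to appear under several parents, so this layout suits data that is usually read together.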

NoSQL is commonly used for special purposes such as modeling graphs, like a family tree, or displaying connections on Facebook. Relational databases can be used to model graph relationships, but the result is much more complicated to model and code. The NoSQL database Neo4j, on the other hand, is commonly used to store and process graph relationships. Other unstructured data storage needs can sometimes be best handled by a document-based solution such as MongoDB. Many of these technologies are new and specialized, but you will find that they are often exactly what you need in order to handle large amounts of data efficiently. It is common these days to architect an application that talks to different data stores based on the specific use case. For example, a product catalog for an e-commerce site can be stored in MongoDB and session data in Redis, while all transactions are kept in a MySQL database; we call this “polyglot persistence.”
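
The e-commerce example of polyglot persistence might be sketched like this. Everything here is a stand-in chosen so the code runs without any servers: the `ShopStorage` class and its methods are hypothetical, one dictionary plays the role of a document store such as MongoDB, another plays the role of a key-value store such as Redis, and an in-memory SQLite database plays the role of MySQL.

```python
import sqlite3


class ShopStorage:
    """Hypothetical facade that routes each kind of data to its own store."""

    def __init__(self):
        self.catalog = {}    # stand-in for a document store (e.g. MongoDB)
        self.sessions = {}   # stand-in for a key-value store (e.g. Redis)
        # Stand-in for a relational database (e.g. MySQL).
        self.orders_db = sqlite3.connect(":memory:")
        self.orders_db.execute(
            "CREATE TABLE orders ("
            "id INTEGER PRIMARY KEY, sku TEXT, amount_cents INTEGER)"
        )

    def add_product(self, sku, document):
        # Product records vary in shape, which suits a document store.
        self.catalog[sku] = document

    def save_session(self, session_id, data):
        # Session data is short-lived and looked up by a single key.
        self.sessions[session_id] = data

    def record_order(self, sku, amount_cents):
        # Orders go through a transaction; 'with' commits or rolls back.
        with self.orders_db:
            cursor = self.orders_db.execute(
                "INSERT INTO orders (sku, amount_cents) VALUES (?, ?)",
                (sku, amount_cents),
            )
        return cursor.lastrowid


storage = ShopStorage()
storage.add_product("sku-1", {"name": "Camera", "tags": ["photo"]})
storage.save_session("abc", {"cart": ["sku-1"]})
order_id = storage.record_order("sku-1", 49900)
```

The application code sees one interface, while each kind of data goes to the store that fits it best. In a real system each attribute would hold a client for the corresponding database instead of an in-memory substitute.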

About the author

Eugene Dvorkin

Eugene Dvorkin is a software engineer with more than eighteen years of experience in the field. Currently, he is working for WebMD, where he creates real-time event processing applications using MongoDB and big data architecture. He is also building a DevOps culture around the project. He spends his free time researching the latest software trends and tools, attending technology meetups, or taking pictures with his beloved Nikon camera. He learns by doing and loves to share his discoveries with other engineers.