For those who don’t have time to read the full article, here’s a “too long; didn’t read” summary:
- NoSQL databases like MongoDB offer superior benefits when dealing with big data over SQL because of their flexible schema requirements
- However, SQL databases have been traditionally favored by most data managers for data analysis. Especially because most BI tools (e.g. Looker) will not let you query NoSQL databases.
- Nonetheless, people often need to do analysis on MongoDB data.
- Historically, it has been common to replicate MongoDB data into a SQL database for analytics using a process like ETL. Companies like Holistics and Fivetran do this. This practice is effective but costly and requires a lot of additional infrastructure.
- A new best practice is emerging in this space that gives analysts the best of both worlds – performing native MongoDB analytics with BI products like the one we created at Knowi.
There has been a lot of debate recently about whether you should perform data analysis directly on a MongoDB instance like you would with a traditional SQL based relational database. This has led to much stress for those trying to decide between SQL and NoSQL databases. For those with this dilemma, best practices should be defined to simplify the decision. Unfortunately, if you’re an analyst tasked with building a dashboard on data from a NoSQL database, you may not have much of a choice – you have to make it work, whether or not it’s best practice. Either way, you should be familiar with the options available when you are considering doing analytics on MongoDB.
In this article, we’ll briefly cover the distinct advantages of both document-based NoSQL databases like MongoDB as well as traditional Relational Database Management Systems (RDBMSs – we’ll call these relational databases or SQL databases). This will help us better understand the cause for concern about doing analytics on MongoDB. Then we’ll have a look at what new options are emerging for doing data analysis directly on MongoDB.
Want to jump in and get your hands dirty instead of reading an article? Check out the live demo on our MongoDB analytics page.
Advantages of SQL Databases
To summarize them in one sentence: traditional relational databases are highly structured databases consisting of normalized data tables which are related to each other by keys (ID fields) and may be queried using SQL, a structured query language that is nearly syntactically identical across various RDBMS providers.
What are the benefits of relational databases?
- They are highly structured – this makes them perfectly suited for applications like financial data, which should be highly normalized, consistent and clean.
- They are memory-efficient – since relational data is typically highly normalized, storage is not wasted on duplicated data.
- Data retrieval is easy – with SQL as a flexible, extensive query interface that is standard across suppliers, it is easy to query, transform, and join data from relational tables.
Advantages of MongoDB
NoSQL databases, on the other hand, are essentially any databases that don’t follow a relational database structure. In particular, document-based NoSQL databases like MongoDB consist of a series of documents which in turn contain JSON data objects with potentially deep nested data structures that aren’t necessarily consistent across objects. MongoDB is one of the most popular NoSQL databases and is used by millions of developers. What advantages does MongoDB have over a traditional relational database?
- Schemas are flexible – since MongoDB uses JSON objects to store data, an entity can contain lengthy, complex nested fields and different fields may be captured for different object instances.
- Speed – indexing on this document structure generally provides much quicker data access than is typical of relational databases.
- Scalability – MongoDB is built to handle big data efficiently by partitioning the database into shards.
Analytics: SQL Databases vs. MongoDB
How do these two types of databases compare when it comes to data analytics? As we alluded to earlier, analytics is typically seen as SQL database territory (e.g. MySQL Reporting and Analytics). Because SQL is so flexible, analysts can retrieve, filter, join and aggregate relational data with ease. As a result, visualizing and reporting on relational data also comes naturally.
Analytics on NoSQL databases, however, is a less well-developed area (which is consistent with NoSQL databases being a younger field in general). While the capability does exist, the query generation procedure is generally considered less clean than the SQL counterpart, partly due to the less structured nature of NoSQL databases. For example, joining across “tables” has historically not been supported for NoSQL database analytics. Further, the query language is not consistent across NoSQL databases in the way that SQL is consistent across relational databases.
What if I need to do Analytics on MongoDB?
So it appears that SQL Databases are better for analytics. But of course there are many other considerations when choosing a way to store your data, so maybe you choose MongoDB for the flexible schema, even though you will still need to do analytics on your data. Or maybe you don’t get to choose – your company already has a MongoDB database in use and you need to analyze its data. What are your options for analytics in this case?
One option is to replicate your MongoDB data in an SQL database, then use the relational version of the data for analytics. Many people go this route initially. This gives you all the advantages of doing analytics on a SQL database, and you may even be able to perform near real time analytics, depending on the ETL procedure your company is using to replicate data from MongoDB to the SQL database. Some companies like Holistics are doing good work in this area. Others like Fivetran are based entirely around helping companies move their data into a data warehouse (e.g. Snowflake) for this purpose. However, the downside of this approach is that it requires additional hardware resources, tedious configuration of the ETL process and maintenance of a whole new environment. What should have been the job of a data analyst is thus expanded to require support from several new groups, including data architects and data engineers. It is common for a company to share data to a replicated database from the original MongoDB data store, but what if you didn’t have to?
Some companies like Dremio are attempting to solve this problem by building a translation system that lets you put in SQL queries and have them translated into a MongoDB query. This is great for eliminating the need for ETL but runs into problems when we are trying to do things like cross-database joins.
On the other hand, at Knowi, we are taking a completely new approach. Keep reading for our secret sauce for doing real-time analytics on MongoDB.
At Knowi, we developed something completely different to solve this problem. Our secret sauce was building on Data Virtualization.
We connect directly to MongoDB but provide a high-level user interface for users to generate queries and manipulate data. You can think of it as data virtualization on steroids. Users build queries with point-and-click software or by using native MongoDB queries. Users can then further manipulate the data with an SQL based language called Cloud9QL. This is all done in real-time on the live MongoDB instance, so there’s no need to manage an ETL procedure to share and store the data in a SQL database. In other words, a Knowi customer can use the application to filter, join, and perform aggregation on real-time data from MongoDB just like the customer would do with a SQL database, but without actually using a SQL database.
We took this one step further and fully integrated this MongoDB query functionality into our business intelligence platform, so users can seamlessly perform real-time data analysis, data science, and data visualization on top of the MongoDB data. This means that your company can go from MongoDB to business intelligence application in just a few clicks.
Obviously, we like our solution the best, but there are a lot of good solution providers out there coming up with innovative ways to get analytics at scale on MongoDB and you should pick the one that is right for your business.
In the modern, dynamic data environment, we are constantly required to do data analysis and data science on a diverse variety of database types. Some of these are more amenable to analytics than others. MongoDB is not the first and won’t be the last database to cause confusion when it comes to analytics. Luckily, there are a lot of great innovative companies out there trying to solve this dilemma by standardizing and streamlining the process of analytics on various data sources. We at Knowi are one of a few playing in this arena. No matter which solution you pick, you no longer have to choose between MongoDB and analytics–you can have both.