MongoDB Analytics: Solutions & Best Practices

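If you need analytics on data stored in MongoDB, this guide is for you. It covers whether to ELT your data into a warehouse, best practices for setting up your cluster for analytics, and how to choose a MongoDB analytics solution.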

To ELT or Not?

No. Don’t do it.

An all-too-common approach is to use a data transporter (like Fivetran, Stitch, etc.) to move the data into a data warehouse (Snowflake, Redshift, etc.) for analytics. With this approach, your MongoDB data is replicated into a central data warehouse, where data analysts versed in SQL can use it with traditional SQL-based BI tools.

Here’s why you shouldn’t do it:

  • Balloons cost: you pay to move, store and compute over raw data, and the bill grows as data sizes grow.
  • The data warehouse becomes a dumping ground for raw data, most of which you might not need, given that the data is ingested via the oplog.
  • Converts MongoDB data into a SQL form. Nested objects, arrays and dynamic fields do not map cleanly onto rows and columns, which causes many problems.
  • Adds complexity and operational overhead, with multiple points of failure, processes, tools and people in between.
  • Not real time. There will always be a lag between your operational source and your data warehouse.
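To make the nested-data problem concrete, here is a minimal sketch of what flattening documents into warehouse-style rows can look like. The `orders` documents and the `flatten` helper are hypothetical, written for illustration only; real data transporters use their own mapping rules.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted column names; arrays become indexed columns."""
    row: dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}."))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    row.update(flatten(item, prefix=f"{column}.{i}."))
                else:
                    row[f"{column}.{i}"] = item
        else:
            row[column] = value
    return row


# Two hypothetical documents from the same collection, with different shapes.
orders = [
    {"_id": 1, "customer": {"name": "Ann", "tier": "gold"},
     "items": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "customer": {"name": "Bob"},
     "items": [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 5}],
     "coupon": "SPRING"},
]

rows = [flatten(order) for order in orders]
# The warehouse table needs the union of every column seen in any document.
columns = sorted({column for row in rows for column in row})
# Cells that a given document never populated would be stored as NULLs.
null_cells = sum(1 for row in rows for column in columns if column not in row)
```

Even with two small documents, the column set depends on the longest array and on every optional field ever seen, so the table schema keeps shifting as new document shapes arrive.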

That leaves running analytics directly on MongoDB. Before we get into solutions, let’s first cover best practices for setting up your MongoDB cluster so that analytics doesn’t impact your operational workload.

MongoDB Analytics Setup Best Practices

  1. At the very least, always point your analytics application to a secondary node instead of the primary, so that operational performance is not affected.

  2. Configure a hidden replica set member. A hidden member must have priority 0, so it can never become primary. It maintains a copy of the primary’s data but is invisible to client applications, and can thus be used for read-only analytics.

    To connect to a hidden member, use the directConnection option:

    Example: mongodb://user:password@192.168.0.1:27017/?authSource=db&directConnection=true

  3. If you are using MongoDB Atlas with an M10 or larger cluster, set up an Analytics Node. Analytics Nodes isolate queries on read-only nodes that do not contend with operational workloads. 

    To connect, use a connection string like the one below:

    mongodb+srv://<USERNAME>:<PASSWORD>@foo-q8x1v.mycluster.com/test?readPreference=secondary&readPreferenceTags=nodeType:ANALYTICS
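The setup steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive procedure: `hide_member` and `mongo_uri` are hypothetical helpers, the hosts and credentials are placeholders, and the code only builds the config document and connection strings without contacting a server.

```python
from copy import deepcopy
from urllib.parse import quote_plus, urlencode


def hide_member(config: dict, host: str) -> dict:
    """Return a copy of a replica set config with `host` marked as hidden."""
    new_config = deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["priority"] = 0  # hidden members must not be electable
            member["hidden"] = True
            break
    else:
        raise ValueError(f"{host} is not a member of this replica set")
    # Assumption: the config is submitted via the replSetReconfig command,
    # which expects a version number higher than the current one.
    new_config["version"] += 1
    return new_config


def mongo_uri(user: str, password: str, host: str, options: dict,
              scheme: str = "mongodb", database: str = "") -> str:
    """Build a connection string, percent-encoding the credentials."""
    query = urlencode(options, safe=":")  # keep ':' readable in tag values
    return (f"{scheme}://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}/{database}?{query}")


# Direct connection to a hidden member (placeholder host and credentials).
hidden_uri = mongo_uri(
    "user", "p@ssword", "192.168.0.1:27017",
    {"authSource": "db", "directConnection": "true"},
)

# Atlas Analytics Node, selected through read preference tags.
analytics_uri = mongo_uri(
    "user", "password", "foo-q8x1v.mycluster.com",
    {"readPreference": "secondary", "readPreferenceTags": "nodeType:ANALYTICS"},
    scheme="mongodb+srv", database="test",
)
```

Percent-encoding the username and password matters when they contain characters such as `@` or `:`, which would otherwise be read as part of the URI structure.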

Choosing a MongoDB Analytics Solution

1. MongoDB BI Connector + Your Existing BI tool

The BI Connector provides a bridge between semi-structured data in MongoDB and traditional BI tools that are designed for tabular rows and columns. It speaks the MySQL wire protocol, so BI tools connect to it as if it were a MySQL server, and it translates their SQL queries into MongoDB queries.

If you are using Atlas with an M10 or larger cluster, you can enable the BI Connector within Atlas.

For self-hosted MongoDB clusters, the BI Connector is available for download for customers with a MongoDB Enterprise Advanced subscription.

The BI Connector runs as a separate process (mongosqld), with a DRDL schema file passed in.

If your organization is locked into existing SQL-based BI tools, this approach is worth considering.

However, there are some downsides to this approach to be aware of:  

a) The conversion of semi-structured data into a relational format is inherently problematic. 

b) Queries are difficult to troubleshoot because of the translation layer.

c) Nested objects and arrays are particularly problematic and have performance implications.

d) It is not suited for dynamic fields that lack a fixed schema.
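As an illustration of point (c), the sketch below shows one common way a relational mapping can handle an array field: splitting it into a child table keyed by the parent `_id` and an index column. This is a simplified, hypothetical model, not the BI Connector's actual implementation; the `unwind_to_tables` helper, table names and sample documents are assumptions.

```python
from typing import Any


def unwind_to_tables(collection: str, docs: list[dict[str, Any]],
                     array_field: str) -> dict[str, list[dict[str, Any]]]:
    """Split documents into a parent table and a child table for one array field."""
    parent_rows: list[dict[str, Any]] = []
    child_rows: list[dict[str, Any]] = []
    for doc in docs:
        # The parent row keeps every field except the array itself.
        parent_rows.append({k: v for k, v in doc.items() if k != array_field})
        # Each array element (assumed to be an object) becomes its own child row.
        for idx, item in enumerate(doc.get(array_field, [])):
            child_rows.append({"_id": doc["_id"], f"{array_field}_idx": idx, **item})
    return {collection: parent_rows, f"{collection}_{array_field}": child_rows}


sample_orders = [
    {"_id": 1, "status": "shipped", "items": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "status": "open",
     "items": [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 5}]},
]

tables = unwind_to_tables("orders", sample_orders, "items")
```

A question as simple as "units per order status" now needs a join between the two tables, and the child table grows with the total number of array elements rather than the number of documents.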

2. MongoDB Charts

MongoDB Charts provides charting and dashboard capabilities on Atlas. MongoDB built Charts as a native analytics solution for MongoDB, in response to customer frustration with the limitations of the BI Connector approach.

Charts is only available on MongoDB Atlas. It’s free to use, for the most part.  

If you need basic visualization capabilities and all your data is in MongoDB hosted on Atlas, this might be the option for you.

3. Knowi

If MongoDB BI Connector and MongoDB Charts won’t cut it, Knowi might be your best MongoDB analytics solution. 

a) It provides native analytics on MongoDB, with a native Mongo query generator as well as the ability to write your own Mongo queries.

b) It provides a business user-friendly Dataset-as-a-service concept where the results from Mongo queries can be visualized, analyzed and embedded.

c) For non-technical users, it provides a natural language interface on top of the data, so that users can ask questions in plain English.

d) In addition to full BI and embed capabilities, it offers a powerful data engineering layer that allows you to join across collections, as well as other NoSQL and SQL datasources, along with data modeling, document level security and governance.  
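For context, a hand-written native Mongo query of the kind mentioned in (a) is typically an aggregation pipeline. The sketch below is generic MongoDB, not Knowi-specific syntax; the `orders` collection, its field names and the `top_skus_pipeline` helper are hypothetical.

```python
from typing import Any


def top_skus_pipeline(status: str, limit: int = 10) -> list[dict[str, Any]]:
    """Build a pipeline that ranks SKUs by total units for orders in one status."""
    return [
        {"$match": {"status": status}},   # filter early to cut the work downstream
        {"$unwind": "$items"},            # one document per array element
        {"$group": {"_id": "$items.sku", "units": {"$sum": "$items.qty"}}},
        {"$sort": {"units": -1}},         # highest totals first
        {"$limit": limit},
    ]


pipeline = top_skus_pipeline("shipped", limit=5)
# With PyMongo, this would run as: results = list(db.orders.aggregate(pipeline))
```

Because the pipeline works on the documents as stored, nested fields such as `items.sku` are addressed directly, with no flattening step in between.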

Knowi provides a playground for interacting with a live MongoDB database.

