An Overview of MongoDB
What is MongoDB?
MongoDB is a NoSQL database that stores data in a document-oriented format called BSON, a binary encoding of JSON-like documents. First released in February 2009 and written in C++, it was designed for scalability and for handling large amounts of unstructured data. As a “NoSQL database,” it stores data not in tables but in semi-structured documents, which users can access from a wide range of programming languages and data tools. These documents are grouped into “collections,” where they can be accessed through APIs. This document model was a major difference between MongoDB and its competitors when Eliot Horowitz and Dwight Merriman founded the company.
On their website, MongoDB writes, “We empower innovators to create, transform, and disrupt industries by unleashing the power of software and data.” (MongoDB) This idea stemmed from the many issues Horowitz encountered when trying to fit richly structured application data into relational databases. The inefficiency and incoherence of many databases at the time prompted Horowitz to found his company and revolutionize the database industry.
In his words, “People develop classes, they develop rich structures. What they do then is take those rich structures and attach some object-relational mapping to those structures, and try to store it in a relational database.” (Horowitz) Today, MongoDB is used by major corporations including Adobe, Verizon, Lyft, and Twitch, to name a few.
How does MongoDB Work?
MongoDB is unique in how it is structured as well as how data is stored within it. Read on to explore some basic concepts on how it organizes its data and backend components.
NoSQL vs. SQL Databases
SQL, which stands for “Structured Query Language,” is a domain-specific language (DSL). Users store data in a SQL-based relational database and use SQL to query and transform it. Databases that do not follow this model are known as “NoSQL” or “non-relational.” The word “relational” refers to a model in which data is stored in tables, and key identifying fields in each table are mapped in relation to one another. The term “NoSQL” (sometimes read as “Not only SQL”) is somewhat misleading, since NoSQL databases, including MongoDB, have their own query capabilities, even if they do not work the same way as SQL.
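To make the contrast concrete, here is a minimal sketch of how a MongoDB-style query is itself a document rather than a SQL string. The `matches` helper, the `users` list, and its field names are hypothetical and exist only for illustration; the helper handles a small subset of MongoDB's comparison operators and is not how the database evaluates queries internally.

```python
import operator

# A small, illustrative subset of MongoDB's comparison operators,
# mapped to the equivalent Python comparison functions.
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}

def matches(document, query):
    """Return True if `document` satisfies every condition in `query`."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30}}.
            # (This sketch assumes every key in the dict is an operator.)
            for op, operand in condition.items():
                if value is None or not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Shorthand equality form, e.g. {"city": "Austin"}.
            return False
    return True

# Hypothetical data, invented for this example.
users = [
    {"name": "Ada", "city": "Austin", "age": 36},
    {"name": "Lin", "city": "Austin", "age": 28},
    {"name": "Sam", "city": "Boston", "age": 41},
]

# Roughly equivalent SQL:
#   SELECT * FROM users WHERE city = 'Austin' AND age > 30;
query = {"city": "Austin", "age": {"$gt": 30}}
result = [user for user in users if matches(user, query)]
print(result)
```

With a real driver, the same `query` dictionary would be passed to the database instead of being evaluated in Python.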
Clustering, in data analysis, is a technique for grouping data points into clusters of similar items, using intrinsic characteristics of the data as measures of similarity. In MongoDB, the word “cluster” has a different meaning: it refers to a group of servers that together host a database. One form is the “replica set,” a group of nodes that each hold a copy of the same data, which provides redundancy and keeps the database available if a node fails. Another is the sharded cluster. Sharding, discussed further below alongside scalability, is the key tool MongoDB uses to support horizontal scalability. A shard key, a field (or set of fields) chosen from the documents in a collection, determines which shard stores each document, so data spread across different servers can still be located as part of one cluster.
MongoDB is a database that stores semi-structured data as BSON documents. Unlike relational data, which must be structured in cells, rows, and columns, semi-structured data does not have a fixed schema. MongoDB’s document-oriented system relies heavily on nesting within a document to organize data. These documents support unstructured data (e.g. videos, audio files, and emails) while clearly identifying fields for easy retrieval.
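The nesting described above can be sketched with an ordinary Python dictionary. The `customer` document and the `get_path` helper below are hypothetical; the helper mimics the dotted-path style MongoDB uses to address nested fields and array elements, but it is only an illustration, not the database's own lookup logic.

```python
def get_path(document, path):
    """Follow a dotted path such as "address.city" through nested dicts and lists.

    Returns None when any step of the path is missing.
    """
    current = document
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return None
    return current

# A hypothetical document: one record holds a nested object and an array,
# where a relational design would typically need several tables.
customer = {
    "_id": 1,
    "name": "Ada Lovelace",
    "address": {"city": "London", "postcode": "W1"},
    "orders": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

print(get_path(customer, "address.city"))
print(get_path(customer, "orders.1.sku"))
```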
To summarize, semi-structured data is an in-between of unstructured and structured data where there is structure without a tabular format. On one hand, semi-structured data allows for greater flexibility and freedom with how one chooses to order their data. Data can be integrated from multiple datasets without having to fit one format. On the other hand, without proper understanding of how to nest objects, data can easily be stored suboptimally within a collection causing an organizational headache.
What are the Advantages of MongoDB? What is MongoDB Used For?
Supporting semi-structured data allows users to import various types of unstructured datasets and reorganize their data into identifiable groups. Many of MongoDB’s customers store video files (NOD Games, Mediastream, etc.) while many others need to store audio files (Electronic Arts, Yousician, etc.). This type of data is awkward to store in a relational database management system, which typically has to treat it as an opaque binary object. In addition, the fact that many web APIs, including RESTful and other HTTP-based APIs, already use JSON as their data format gives MongoDB an edge over many SQL databases.
MongoDB supports both the import and export of JSON data so HTTP-based APIs have no issue transferring data with MongoDB. Overall, MongoDB is able to combine disparate data stored across programs into one centralized database using its document-oriented system.
MongoDB is designed to run across multiple servers and handle large volumes of data. It uses horizontal scaling, also known as a scale-out model, which spreads the dataset across multiple servers so that no single machine is overloaded. MongoDB also supports memory mapping, caching, and indexing, all of which help servers run smoothly. Horizontal scaling relies on sharding, which breaks a collection’s data into subsets known as shards. These shards are distributed across servers and nodes.
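The idea of spreading documents across servers by a key can be sketched in a few lines. This is a deliberately simplified model: it assigns each document to a shard by hashing a key and taking the remainder, whereas MongoDB assigns ranges of key (or hashed key) values to shards. The function names, the `user_id` field, and the shard count are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

def shard_for(shard_key_value, shard_count):
    """Map a shard key value to a shard number in a stable, repeatable way."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

def distribute(documents, shard_key, shard_count):
    """Group documents by the shard that their shard key value maps to."""
    shards = defaultdict(list)
    for doc in documents:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return dict(shards)

# Hypothetical documents keyed by a made-up "user_id" field.
documents = [{"user_id": i, "score": i * 10} for i in range(1000)]
shards = distribute(documents, "user_id", shard_count=4)

for shard_id in sorted(shards):
    print(f"shard {shard_id}: {len(shards[shard_id])} documents")
```

Because the mapping depends only on the key value, a query that includes the key can be routed to a single shard instead of being sent to all of them.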
Geospatial Indexes and Operators
MongoDB supports the creation of geospatial indexes, specifically the 2d index and the 2dsphere index. These indexes optimize spatial queries and enable fast retrieval of geospatial data. The 2d index is suitable for flat (Euclidean) coordinate systems, while the 2dsphere index supports spherical geometry, making it ideal for Earth-like models. When a geospatial query runs, MongoDB’s query planner selects which index to use, which improves query performance.
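As a rough illustration of the data a 2dsphere index works on, the sketch below builds a GeoJSON point, the format MongoDB expects for spherical queries, with longitude listed before latitude. The `make_point` helper and the `place` document are hypothetical. The final `index_spec` shows the (field, index type) pair that a driver such as PyMongo accepts in `create_index`; no database call is made here.

```python
def make_point(longitude, latitude):
    """Build a GeoJSON Point. Note the order: longitude first, then latitude."""
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}

# A hypothetical document with a location field.
place = {
    "name": "Hypothetical Cafe",
    "location": make_point(-73.9857, 40.7484),
}

# Index specification for the "location" field. With PyMongo this list
# could be passed as collection.create_index(index_spec); not executed here.
index_spec = [("location", "2dsphere")]

print(place)
print(index_spec)
```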
MongoDB’s aggregation framework also works with geospatial data. It enables analysis such as measuring distances between points or finding the documents that fall within a given shape. The most notable geospatial aggregation stage is ‘$geoNear’, which calculates the distance from a given point to each document and sorts the results by proximity.
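To show what a proximity sort involves, here is a small sketch that orders documents by great-circle distance from a point using the haversine formula. It is an approximation written for illustration, not MongoDB's implementation, and the place names and coordinates are invented. The `geo_near_stage` dictionary shows the general shape of a `$geoNear` stage with its `near`, `distanceField`, and `spherical` options; it is defined but not sent to a database.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (an approximation)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two longitude/latitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def geo_near(documents, near, distance_field="distance"):
    """Attach a distance to each document and sort nearest first."""
    lon, lat = near["coordinates"]
    results = []
    for doc in documents:
        doc_lon, doc_lat = doc["location"]["coordinates"]
        results.append({**doc, distance_field: haversine_m(lon, lat, doc_lon, doc_lat)})
    return sorted(results, key=lambda d: d[distance_field])

# Shape of the corresponding aggregation stage (not executed here).
geo_near_stage = {
    "$geoNear": {
        "near": {"type": "Point", "coordinates": [-73.99, 40.73]},
        "distanceField": "distance",
        "spherical": True,
    }
}

# Hypothetical places, invented for this example.
places = [
    {"name": "Far", "location": {"type": "Point", "coordinates": [-118.24, 34.05]}},
    {"name": "Near", "location": {"type": "Point", "coordinates": [-73.98, 40.74]}},
    {"name": "Middle", "location": {"type": "Point", "coordinates": [-74.10, 40.80]}},
]

nearest_first = geo_near(places, geo_near_stage["$geoNear"]["near"])
print([p["name"] for p in nearest_first])
```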
Users can export data from MongoDB with the command-line tool “mongoexport”, which writes documents to a JSON or CSV file. Although some tools can read data directly from MongoDB, many analysis tools require the data to be exported to a file first. One thing to be careful of: CSV is a flat format, so nested documents and arrays have to be flattened on export, which can leave the copy less faithful than the original.
The “mongoexport” command can also be used to move documents between clusters and projects. Once exported, data from MongoDB can be loaded into platforms such as Knowi to join it with other data and analyze it further. Knowi can also connect to MongoDB directly, given the host or IP address, the database name, and a password.
Additionally, “mongoexport” can filter documents with a query, select specific fields, and sort the output to better fit the user’s needs. This reduces problems later, when the data has to be queried or turned into a graph or chart.
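The CSV caveat is easier to see with a small example. The sketch below flattens nested documents into dotted column names and writes them with Python's `csv` module. It is not how mongoexport itself works; the `flatten` helper, the sample documents, and the choice to join arrays with semicolons are assumptions made to show where structure gets lost.

```python
import csv
import io

def flatten(document, prefix=""):
    """Turn nested dicts into a single level of dotted column names."""
    flat = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Arrays have no natural CSV form; joining them loses structure.
            flat[name] = ";".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat

# Hypothetical documents with different shapes, as a collection may contain.
documents = [
    {"name": "Ada", "address": {"city": "London"}, "tags": ["math", "engines"]},
    {"name": "Lin", "address": {"city": "Austin", "zip": "73301"}},
]

rows = [flatten(doc) for doc in documents]
fieldnames = sorted({column for row in rows for column in row})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Missing fields become empty cells and the array becomes a single string, so the CSV cannot be turned back into the original documents without extra knowledge.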
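As a sketch of what a filtered, sorted export might look like, the snippet below assembles a mongoexport command line using the tool's `--uri`, `--collection`, `--type`, `--fields`, `--query`, `--sort`, and `--out` options. The connection string, collection, and field names are hypothetical, and the command is only printed, not run.

```python
import json
import shlex

def build_mongoexport_args(uri, collection, fields, query=None, sort=None, out="export.csv"):
    """Assemble an argument list for a CSV export; nothing is executed."""
    args = [
        "mongoexport",
        f"--uri={uri}",
        f"--collection={collection}",
        "--type=csv",
        f"--fields={','.join(fields)}",  # CSV exports need an explicit field list
        f"--out={out}",
    ]
    if query is not None:
        args.append(f"--query={json.dumps(query)}")
    if sort is not None:
        args.append(f"--sort={json.dumps(sort)}")
    return args

# Hypothetical database, collection, and fields.
args = build_mongoexport_args(
    uri="mongodb://localhost:27017/shop",
    collection="orders",
    fields=["order_id", "customer.name", "total"],
    query={"status": "shipped"},
    sort={"total": -1},
)

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(args))
```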
MongoDB Realm (Mobile Applications)
Many mobile applications require a database that keeps working offline, caches data on the device for a more efficient user experience, and synchronizes once a connection is available. Apps need to have data such as user profiles on hand and stored in the proper location to function. Developers created MongoDB Realm to meet the needs of these apps.
Realm’s efficiency and integration across programming languages make it a great fit for this type of work. Backend programs are often overlooked, but they are integral to the functionality of any application. Realm is used by 7-Eleven, Imgur, and many more companies. Srikanth Gandra had this to say about Realm: “We’ve heard good feedback from store managers. They can start using devices immediately, rather than waiting minutes to download the data on initial startup, like they used to. Data accuracy, especially around inventory when sales happen or shipments arrive, has really improved.”
Interestingly enough, some video gaming startups use Realm with some success. It can be difficult to find a platform that can handle the traffic of successful mobile applications; MongoDB’s Realm provides a solution to many issues that developers face and has proved successful in all kinds of applications.
MongoDB Deployment Options: On-premise vs. Cloud
Many platforms are limited to either on-premise data storage or storage in the cloud. On-premise means that data is stored on servers the company runs itself, while “the cloud” refers to servers hosted and managed by a third-party provider and accessed over the internet.
MongoDB supports both: “MongoDB Atlas” is a managed service that hosts data on cloud servers, while the free Community Edition of MongoDB can be installed on-premise. On-premise hosting requires users to manage their own backups, but gives them access to their data without an internet connection. The cloud lets users retrieve their data from anywhere, provided they still have access to their account.
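From an application's point of view, the difference between the two deployment styles often shows up first in the connection string. The sketch below parses two hypothetical URIs with Python's standard library: a `mongodb://` string pointing at a local server, and a `mongodb+srv://` string of the kind Atlas typically provides. The hostnames, credentials, and database name are made up, and the helper handles only single-host URIs.

```python
from urllib.parse import urlsplit

def describe_uri(uri):
    """Summarize a single-host MongoDB connection string without connecting."""
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"not a MongoDB connection string: {uri!r}")
    return {
        "uses_srv": parts.scheme == "mongodb+srv",  # hosts resolved through DNS
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }

# Hypothetical on-premise and cloud connection strings.
local = describe_uri("mongodb://localhost:27017/inventory")
cloud = describe_uri("mongodb+srv://app_user:secret@cluster0.example.mongodb.net/inventory")

print(local)
print(cloud)
```

Note that an SRV-style string does not by itself prove a cluster is hosted on Atlas; it is simply the form most cloud-hosted clusters use.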
More on MongoDB Atlas
What is MongoDB Atlas and how does it work? To augment the explanation above, MongoDB Atlas is a cloud database which fully manages users’ data using a suite of data services. Atlas is known as a multi-cloud database as it uses AWS, Azure, and Google Cloud services.
MongoDB also describes Atlas as “multi-region,” since clusters can be deployed across multiple geographic regions of these cloud providers. Atlas stores clusters on these cloud services, and users can reach them from anywhere with an internet connection. Because the data is stored in the cloud, Atlas can monitor it for the user and generate performance reports and optimization recommendations.
Other than Atlas, there are certainly many NoSQL cloud databases on the market today. A few notable examples are Amazon’s DocumentDB, Azure’s CosmosDB, and Couchbase’s Capella.
DocumentDB is a JSON document database from Amazon Web Services. It is compatible with MongoDB’s API and, like MongoDB, is a NoSQL database built on a document model. It was built as an improved version of previous Amazon databases with much newer technology. One of these improvements is DocumentDB’s auto-growth capability, which scales storage capacity to meet user demands.
CosmosDB is a NoSQL, multi-model database from Microsoft Azure. Despite being a NoSQL database, CosmosDB has a unique way to use SQL through a JSON dialect, although it does not perform as well as SQL databases. It is available in over 54 regions worldwide and has features including turnkey global distribution to properly support international collaboration.
Finally, Capella is a NoSQL “Database-as-a-Service” tool from Couchbase. Capella, like CosmosDB, incorporates some of the strengths of a relational database and still maintains its performance capabilities. It supports ACID transactions and stores data in shards, both of which limit the impact of errors or bugs. Like Atlas, Capella uses JSON documents to store its data.
Overall, the primary differentiating factor between databases is whether they use SQL or not. Beyond that, most formatting between NoSQL databases is analogous.
How to Analyze MongoDB Data
Obviously, staring at raw collection data isn’t the best way to make sense of it. In order to take action on data, finding the right visualization and reporting tool is crucial. Here are a number of popular analytics tool options for MongoDB.
Native MongoDB Visualization Tools
These tools do not require an ETL tool or additional connector to analyze MongoDB data.
MongoDB has a native tool called MongoDB Charts. Charts is MongoDB’s visualization tool which connects to a MongoDB collection and creates a dashboard with various charts. MongoDB Charts, like many other visualization tools, requires specific formatting to properly map out data.
Ensuring all of a document’s data is correctly organized is necessary for Charts to work. Charts is limited to bar and line graphs, scatterplots, number charts, and other basic visualizations. Furthermore, Charts cannot easily join separate collections, which complicates the process of comparing data.
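For context on what a cross-collection join involves, MongoDB's own mechanism is the `$lookup` aggregation stage, which behaves like a left outer join and attaches matching documents as an array. The sketch below shows the shape of such a stage and emulates its effect in plain Python. The `orders` and `customers` collections and their fields are hypothetical, and the emulation is a simplification rather than the server's implementation.

```python
# Shape of a $lookup stage (defined here, not sent to a database).
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # field in the input documents
        "foreignField": "_id",        # field in the "from" collection
        "as": "customer",             # name of the output array field
    }
}

def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate a left outer join: every local doc gets a list of matches."""
    joined = []
    for doc in local_docs:
        found = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: found})
    return joined

# Hypothetical collections.
orders = [
    {"_id": 101, "customer_id": 1, "total": 40},
    {"_id": 102, "customer_id": 2, "total": 15},
    {"_id": 103, "customer_id": 9, "total": 99},  # no matching customer
]
customers = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Lin"},
]

spec = lookup_stage["$lookup"]
joined = lookup(orders, customers, spec["localField"], spec["foreignField"], spec["as"])
for row in joined:
    print(row)
```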
Within Knowi, users can access advanced visualization options without an additional tool for ETL. Knowi can join SQL and NoSQL databases into one and combine data from various data sources with ease. This feature allows companies with multiple data sources to compare disparate datasets without any extra fuss.
The tool also supports MongoDB Query Language (MQL) natively, allowing users to query in whichever language they prefer. One drawback of Knowi is that it has somewhat limited ML capabilities. If ML features are a must, another option in this list might be a better fit.
MongoDB Visualization Tools that require ETL
These tools require an ETL tool or additional connector to analyze MongoDB data.
Tableau Software, or simply Tableau, is strictly a data-visualization company with a focus on aesthetics and simplicity. Tableau tries to make the process of creating visualizations as easy as possible by minimizing gratuitous features. The tool was specifically designed for businesses and has a long-standing reputation in many industries.
One consideration to keep in mind is that Tableau is fairly pricey compared to other options on the market. It also has some issues with interfaces such as mobile applications and other tools it was not primarily designed for.
Looker has integrated support for SQL, making it easy to use with MySQL, PostgreSQL, Oracle Database, and other relational databases. LookML, Looker’s modeling language, groups data into “views,” which are analogous to tables in relational databases. Looker has a somewhat steeper learning curve than other analytics tools on this list, but many consider it a great option once past that barrier.
All of these tools have their own use cases where they are best fit for the job so make sure to thoroughly consider the different options. For a more in-depth look at analytics tools for MongoDB, take a look at this blog as it also takes a deeper dive into the options mentioned above.
Pros and Cons of MongoDB Reporting & Data Visualization Tools Compared
|Tool|Pros|Cons|
|---|---|---|
|Knowi|Supports MongoDB and MongoDB Atlas natively; seamlessly supports NoSQL/SQL data blending; supports MongoDB Query Language natively|Limited ML capabilities; may require query optimization with direct connections|
|MongoDB Charts|Already integrated with MongoDB data; drag and drop builder|Only supports MongoDB data; doesn’t easily support cross-collection joins|
|Looker|Lots of visualization options; supports git version control|Requires MongoDB connector and JDBC driver to pull data in; requires learning LookML|
|Tableau|Drag and drop interface; chart types are visually appealing|Requires MongoDB connector and driver; requires relational data|
MongoDB Interface in Knowi
To explore the MongoDB interface on the Knowi website click here. Below is a quick tutorial of how the process works and the specific features Knowi offers.
To connect MongoDB to Knowi, enter the datasource name, hostname/IP address, database name, and password. In the screenshot above, the information for the demonstration has already been entered. Now, just create a new query and select Mongo Database to begin querying.
Here is an example of a query in MQL on this NBA dataset. AI CodeGen, the button at the top, is powered by OpenAI and will explain the query as well as find any issues.
Knowi also has a drag and drop feature visible on the left side of the UI. After selecting a given collection, simply expand the dataset to access the collection’s documents and retrieve a given variable. Knowi will write the query in MQL once the desired metrics and groupings are put in.
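To give a rough idea of what a query built from metrics and groupings can look like in MQL, the sketch below defines an aggregation pipeline with `$match`, `$group`, and `$sort` stages and emulates it in plain Python. This is not Knowi's actual output: the field names (`team`, `season`, `points`) and the sample rows are invented, and the schema of the NBA dataset is not known here.

```python
from collections import defaultdict

# An aggregation pipeline: filter one season, average points per team,
# then sort teams from highest to lowest average.
pipeline = [
    {"$match": {"season": 2023}},
    {"$group": {"_id": "$team", "avg_points": {"$avg": "$points"}}},
    {"$sort": {"avg_points": -1}},
]

def run_example(games):
    """Emulate the pipeline above in plain Python, for this shape only."""
    season = pipeline[0]["$match"]["season"]
    points_by_team = defaultdict(list)
    for game in games:
        if game["season"] == season:
            points_by_team[game["team"]].append(game["points"])
    grouped = [
        {"_id": team, "avg_points": sum(points) / len(points)}
        for team, points in points_by_team.items()
    ]
    return sorted(grouped, key=lambda row: row["avg_points"], reverse=True)

# Hypothetical rows, invented for this example.
games = [
    {"team": "Lakers", "season": 2023, "points": 110},
    {"team": "Lakers", "season": 2023, "points": 120},
    {"team": "Celtics", "season": 2023, "points": 125},
    {"team": "Celtics", "season": 2022, "points": 90},
]

summary = run_example(games)
print(summary)
```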
MongoDB was founded with the objective of creating a better database for programmers and developers. Today, MongoDB plays an integral role in managing unstructured data for thousands of companies as a leading NoSQL database.
MongoDB’s scalability makes it great for any company in need of fast results on large datasets. Atlas limits the load on local servers by hosting data in the cloud, and Compass, MongoDB’s graphical interface, keeps that data easy to explore. MongoDB has spread fast, and leading corporations such as Adobe and Lyft are already using the platform to host all things data. Its scalability and versatility give MongoDB a decisive edge over many other NoSQL databases.
In conclusion, MongoDB is a great option for both on-premise and cloud work, and with the help of Knowi, companies can seamlessly query their data and turn data into action.