How to analyze and filter databases with MongoDB aggregation
Aggregation in MongoDB is a valuable tool for analyzing and filtering databases. The pipeline system makes it possible to specify queries, allowing for highly customized outputs.
What is aggregation in MongoDB?
MongoDB is a non-relational and document-oriented database that is designed for use with large and diverse amounts of data. By forgoing rigid tables and using techniques like sharding (storing data on different nodes), the NoSQL solution can scale horizontally while remaining highly flexible and resilient to failures.
Documents in the binary JSON format BSON are bundled in collections and can be queried and edited using the MongoDB Query Language (MQL). Even though this language offers many options, it’s not suitable (or perhaps not suitable enough) for data analysis. That’s why MongoDB provides aggregation.
In computer science, this term refers to various processes. In MongoDB, aggregation refers to the analysis and summarizing of data using various operations to produce a single and clear result. During this process, data from one or more documents is analyzed and filtered according to user-defined factors.
In the following sections, we not only look at the possibilities that MongoDB aggregation offers for comprehensive data analysis, but also provide examples of how you can use the aggregate() method with a database management system.
What do I need for MongoDB aggregation?
There are only a few requirements for using aggregation in MongoDB. The method is executed in the shell and works according to logical rules that you can tailor to meet the needs of your analysis.
To use aggregation in MongoDB, you need to have MongoDB already installed on your computer. If it isn’t, you can find out how to download, install and run the database in our comprehensive MongoDB tutorial.
You should also use a powerful firewall and make sure your database is set up according to all current security standards. To run aggregation in MongoDB, you need to have administration rights.
The database works across all platforms, so the steps described below apply to all operating systems.
What is the pipeline in the MongoDB aggregation framework?
In MongoDB, you can carry out simple searches or queries, with the database immediately displaying the results. However, this method is very limited, as it can only display results that already exist within the stored documents. This type of query is not intended for in-depth analysis, recurring patterns or for deriving further information.
Sometimes different sources within a database need to be taken into account in order to draw meaningful conclusions. MongoDB aggregation is used for situations like these. To achieve such results, the aggregate() method uses pipelines.
Role of the pipeline
Aggregation pipelines in MongoDB are processes in which existing data is analyzed and filtered with the help of various steps in order to display the result users are looking for. These steps are referred to as stages. Depending on the requirements, one or more stages can be initiated. These are executed one after the other and change your original input so that the output (the information you are looking for) can be displayed at the end.
While the input is made up of numerous documents, the output (i.e., the end result) is a single result set, which can contain one document or many. We’ll explain the different stages of MongoDB aggregation later on in this section.
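The idea of stages that run one after the other can be sketched in plain JavaScript. This is only an analogy under stated assumptions, not how MongoDB works internally: the `docs` array and the three stage functions are hypothetical stand-ins for a collection and for stages such as $match, $sort and $project.

```javascript
// Hypothetical in-memory documents standing in for a collection.
const docs = [
  { name: "A", country: "Italy", quantity: 10 },
  { name: "B", country: "Spain", quantity: 19 },
  { name: "C", country: "Italy", quantity: 21 },
];

// Each "stage" takes an array of documents and returns a new array.
const stages = [
  // Comparable to $match: keep only documents from Italy.
  (input) => input.filter((d) => d.country === "Italy"),
  // Comparable to $sort: highest quantity first (copy first, so docs stays untouched).
  (input) => [...input].sort((a, b) => b.quantity - a.quantity),
  // Comparable to $project: keep only name and quantity.
  (input) => input.map((d) => ({ name: d.name, quantity: d.quantity })),
];

// The output of one stage becomes the input of the next.
const result = stages.reduce((current, stage) => stage(current), docs);
console.log(result);
// → [ { name: 'C', quantity: 21 }, { name: 'A', quantity: 10 } ]
```

The original `docs` array is left unchanged, just as an aggregation pipeline reads documents without modifying the stored collection.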
Syntax of the MongoDB aggregation pipeline
First, it’s worth taking a brief look at the syntax of aggregation in MongoDB. The method is always structured according to the same format and can be adapted to your specific requirements. The basic structure looks like this:
db.collection_name.aggregate ( pipeline, options )
Here, collection_name is the name of the collection in question. The stages of MongoDB aggregation are listed as an array under pipeline. options is an optional document with further parameters that control how the aggregation is run.
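As a rough illustration of this structure, the following sketch passes both arguments. The collection name `orders` and the field `status` are hypothetical and only serve the example; `allowDiskUse` is one of the options the method accepts. The `typeof db` check is there only so that the snippet can also be loaded outside the MongoDB shell.

```javascript
// Hypothetical collection "orders" with a hypothetical "status" field.
const examplePipeline = [
  { $match: { status: "shipped" } }, // stage 1: filter
  { $limit: 5 },                     // stage 2: pass on at most five documents
];

// Optional second argument: lets large operations write temporary files to disk.
const exampleOptions = { allowDiskUse: true };

// "db" only exists inside the MongoDB shell.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(examplePipeline, exampleOptions).toArray());
}
```

The pipeline is always an array, even if it contains just one stage, and the options document can be left out entirely.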
Pipeline stages
There are numerous stages for the aggregation pipeline in MongoDB. Most of them can be used multiple times within a pipeline. It would go beyond the scope of this article to list all the options here, especially as some are only required for very specific operations. However, to give you an idea of the stages, we’ll list a few of the most frequently used ones here:
$count: Returns the number of documents that reach this stage of the pipeline.
$group: Groups documents by a specified key and can calculate values such as sums or averages for each group.
$limit: Limits the number of documents passed to the next stage in the pipeline.
$match: Filters the documents so that only those matching the given conditions are passed to the following stage.
$out: Writes the results of the MongoDB aggregation to a collection. This stage can only be used at the end of a pipeline.
$project: Use $project to select specific fields from the documents.
$skip: Skips a specified number of documents and passes the remaining ones to the next stage.
$sort: Sorts the documents by the specified fields. The documents themselves are not changed beyond this.
$unset: Excludes certain fields from the documents. It does the opposite of a $project stage that selects fields.
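To give an impression of how some of these stages look in practice, here is a minimal sketch. It assumes a collection called `customers` whose documents have the fields `country` and `quantity`, like the sample data used in the example section of this article. The output field names `totalQuantity`, `customerCount` and `germanCustomers` are freely chosen for illustration.

```javascript
// Assumes a "customers" collection with "country" and "quantity" fields.

// Total units per country, largest total first, top three only.
const perCountryPipeline = [
  {
    $group: {
      _id: "$country",                     // one group per country
      totalQuantity: { $sum: "$quantity" }, // add up the units
      customerCount: { $sum: 1 },           // count documents in the group
    },
  },
  { $sort: { totalQuantity: -1 } },
  { $limit: 3 },
];

// Count how many documents pass a filter.
const countPipeline = [
  { $match: { country: "Germany" } },
  { $count: "germanCustomers" },
];

// "db" only exists inside the MongoDB shell.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(perCountryPipeline).toArray());
  printjson(db.customers.aggregate(countPipeline).toArray());
}
```

Note that $group produces new documents with one entry per group, so fields that are not named in the stage no longer appear in the output.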
An example of aggregation in MongoDB
To help you better understand how aggregation in MongoDB works, we’ll show you some examples of different stages and how to use them. To use MongoDB aggregation, open the shell as an administrator. Normally, a test database will be displayed first. If you want to use a different database, use the use command.
For this example, let’s imagine a database that contains the data of customers who have purchased a specific product. To keep things simple, this database has just ten documents, which are all structured the same:
{
"name" : "Smith",
"city" : "Los Angeles",
"country" : "United States",
"quantity" : 14
}
The following information about the customers has been included: their name, place of residence, country and the number of products they have purchased.
If you want to try aggregation in MongoDB, you can use the insertMany() method to add all documents with customer data to the collection named “customers”:
db.customers.insertMany ( [
{ "name" : "Smith", "city" : "Los Angeles", "country" : "United States", "quantity" : 14 },
{ "name" : "Meyer", "city" : "Hamburg", "country" : "Germany", "quantity" : 26 },
{ "name" : "Lee", "city" : "Birmingham", "country" : "England", "quantity" : 5 },
{ "name" : "Rodriguez", "city" : "Madrid", "country" : "Spain", "quantity" : 19 },
{ "name" : "Nowak", "city" : "Krakow", "country" : "Poland", "quantity" : 13 },
{ "name" : "Rossi", "city" : "Milano", "country" : "Italy", "quantity" : 10 },
{ "name" : "Arslan", "city" : "Ankara", "country" : "Turkey", "quantity" : 18 },
{ "name" : "Martin", "city" : "Lyon", "country" : "France", "quantity" : 9 },
{ "name" : "Mancini", "city" : "Rome", "country" : "Italy", "quantity" : 21 },
{ "name" : "Schulz", "city" : "Munich", "country" : "Germany", "quantity" : 2 }
] )
A list of object IDs for each individual document will be displayed.
How to use $match
To illustrate the possibilities of aggregation in MongoDB, we’ll first apply the $match stage to our “customers” collection. With an empty filter ({ }), this would simply output the complete list of customer data listed above.
In the following example, however, we’ve instructed it to only show us customers from Italy. Here’s the command:
db.customers.aggregate ( [
{ $match : { "country" : "Italy" } }
] )
You’ll now only be shown the object IDs and information of the two customers from Italy.
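A $match stage can also combine several conditions and use query operators. The following sketch, based on the sample “customers” collection above, keeps only customers from Italy or Germany who bought at least 15 units. The variable name `matchPipeline` is chosen for illustration only.

```javascript
// Customers from Italy or Germany with a quantity of 15 or more.
const matchPipeline = [
  {
    $match: {
      country: { $in: ["Italy", "Germany"] }, // either of the two countries
      quantity: { $gte: 15 },                 // greater than or equal to 15
    },
  },
];

// "db" only exists inside the MongoDB shell.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(matchPipeline).toArray());
}
```

With the ten sample documents, this should return the entries for Meyer (26 units) and Mancini (21 units), since Rossi and Schulz fall below the threshold.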
Use $sort for a better overview
If you want to organize your customer database, you can use the $sort stage. In the following example, we instruct the system to sort all customer data according to the number of units purchased, starting with the highest number. The input looks like this:
db.customers.aggregate ( [
{ $sort : { "quantity" : -1 } }
] )
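Sorting is often paired with $limit, for example to show only the best customers. The sketch below assumes the sample “customers” collection from above. The second sort key, `name`, is an addition for illustration so that customers with the same quantity come out in a predictable order.

```javascript
// Top three customers by units purchased.
const topThreePipeline = [
  { $sort: { quantity: -1, name: 1 } }, // highest quantity first, then name A to Z
  { $limit: 3 },                        // keep only the first three documents
];

// "db" only exists inside the MongoDB shell.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(topThreePipeline).toArray());
}
```

With the sample data, the three documents returned should be those of Meyer (26), Mancini (21) and Rodriguez (19).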
Limit the output with $project
With the stages used so far, you’ll see that the output is relatively extensive. For example, in addition to the actual information within the documents, the object ID is also always output. You can use $project in the MongoDB aggregation pipeline to determine which information should be output. To do this, we set the value 1 for the fields we want to see; all other fields are then left out automatically. The _id field is the exception: it is included by default and has to be switched off explicitly with 0. Apart from _id, MongoDB does not allow 1 and 0 to be mixed within the same $project stage. In our example, we only want to see the customer name and the number of products purchased. To do this, we enter the following:
db.customers.aggregate ( [
{ $project : { _id : 0, name : 1, quantity : 1 } }
] )
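The same output can also be reached from the other direction by naming the fields to drop instead of the fields to keep. The following sketch uses the $unset stage (available in MongoDB 4.2 and later) on the sample “customers” collection; the variable name `unsetPipeline` is illustrative only.

```javascript
// Remove _id, city and country; name and quantity remain.
const unsetPipeline = [
  { $unset: ["_id", "city", "country"] },
];

// "db" only exists inside the MongoDB shell.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(unsetPipeline).toArray());
}
```

Listing exclusions is usually the shorter option when documents have many fields and only a few should be hidden, while an inclusion list with $project is shorter when only a few fields are needed.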
Combine multiple stages with aggregation in MongoDB
MongoDB aggregation also gives you the option of applying several stages in succession. These are then run through one after the other, and at the end there is an output that takes all the desired parameters into account. For example, if you only want to display the names and purchases of U.S. customers in descending order, you can use the stages described above as follows:
db.customers.aggregate ( [
{ $match : { "country" : "United States" } },
{ $project : { _id : 0, name : 1, quantity : 1 } },
{ $sort : { "quantity" : -1 } }
] )
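Because the sample collection contains only one customer from the United States, the effect of the sorting is not visible in that output. As a variation, the following sketch runs the same three kinds of stages for Italy, which has two customers in the sample data. The variable name `italyPipeline` is chosen for illustration.

```javascript
// Names and quantities of Italian customers, highest quantity first.
const italyPipeline = [
  { $match: { country: "Italy" } },               // 1. filter early
  { $sort: { quantity: -1 } },                    // 2. order the remaining documents
  { $project: { _id: 0, name: 1, quantity: 1 } }, // 3. shape the output
];

// "db" only exists inside the MongoDB shell.
if (typeof db !== "undefined") {
  printjson(db.customers.aggregate(italyPipeline).toArray());
  // With the sample data: Mancini (21) first, then Rossi (10).
}
```

Placing $match first is a common choice, because every later stage then has fewer documents to process.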
Want to find out more about MongoDB? We’ve got a lot more information in our Digital Guide. For example, you can read about how the list databases command works or how you can use MongoDB Sort to specify the order of your data output.