Blog
17

The Aggregation Framework in MongoDB

An introductory tutorial illustrating the flexible and versatile, multi-stage pipeline provided by the Aggregation Framework in MongoDB.

  • Pradyumn Sharma
  • October 10, 2017

Tags:

MongoDB’s find() method is useful for simple queries from a collection, but it does not have an equivalent to the “group by” clause of SQL, for aggregating data.
However, MongoDB provides an aggregation framework that is far more powerful and flexible a way to aggregate and analyze data. The Aggregation Framework is essentially a multi-stage pipeline of data processing operations. Click this for more info https://teamtraceur.com/

Pipeline is a common concept in many platforms, such as Linux. For example, consider the following statement in Linux:

cat ‘filename’ | more

The output of one command (in this case, cat) becomes the input to the next command (more) in a pipeline. No command is aware of its predecessor or successor.

 
Let’s build an example to illustrate some of the features of MongoDB’s aggregation framework, in small, incremental steps.
 
The Dataset
We’ll use a public domain dataset of the ZIP codes in USA, provided by MongoDB on their website, available for download here. The dataset looks like this:

{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA" }
{ "_id" : "01005", "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA" }
{ "_id" : "01007", "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, 42.275103 ], "pop" : 10579, "state" : "MA" }
{ "_id" : "01008", "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA" }
{ "_id" : "01010", "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA" }
{ "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
{ "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
{ "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
{ "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }
{ "_id" : "01022", "city" : "WESTOVER AFB", "loc" : [ -72.558657, 42.196672 ], "pop" : 1764, "state" : "MA" }
{ "_id" : "01026", "city" : "CUMMINGTON", "loc" : [ -72.905767, 42.435296 ], "pop" : 1484, "state" : "MA" }
{ "_id" : "01027", "city" : "MOUNT TOM", "loc" : [ -72.67992099999999, 42.264319 ], "pop" : 16864, "state" : "MA" }

 
Download the dataset to your computer, and then use the mongoimport command to load the data in your database (replacing the path of the file with an appropriate value):

mongoimport –db tutorial –collection zips –file “c:\downloads\zips.json”

If the command is entered correctly by you, and is successfully executed, you should get the messages as shown below:

2017-07-30T17:10:45.419+0530 connected to: localhost
2017-07-30T17:10:48.036+0530 imported 29353 documents

 
Accessing the Dataset in MongoDB
In the “mongo” client window, verify that the dataset is properly imported, by issuing the following command:

db.zips.count()

The output should be 29353.
Have a look at the data, by issuing the following command:

db.zips.count()

The output should be like this:

> db.zips.find()
{ "_id" : "01001", "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA" }
{ "_id" : "01002", "city" : "CUSHMAN", "loc" : [ -72.51565, 42.377017 ], "pop" : 36963, "state" : "MA" }
{ "_id" : "01005", "city" : "BARRE", "loc" : [ -72.108354, 42.409698 ], "pop" : 4546, "state" : "MA" }
{ "_id" : "01007", "city" : "BELCHERTOWN", "loc" : [ -72.410953, 42.275103 ], "pop" : 10579, "state" : "MA" }
{ "_id" : "01008", "city" : "BLANDFORD", "loc" : [ -72.936114, 42.182949 ], "pop" : 1240, "state" : "MA" }
{ "_id" : "01010", "city" : "BRIMFIELD", "loc" : [ -72.188455, 42.116543 ], "pop" : 3706, "state" : "MA" }
{ "_id" : "01011", "city" : "CHESTER", "loc" : [ -72.988761, 42.279421 ], "pop" : 1688, "state" : "MA" }
{ "_id" : "01012", "city" : "CHESTERFIELD", "loc" : [ -72.833309, 42.38167 ], "pop" : 177, "state" : "MA" }
{ "_id" : "01013", "city" : "CHICOPEE", "loc" : [ -72.607962, 42.162046 ], "pop" : 23396, "state" : "MA" }
{ "_id" : "01020", "city" : "CHICOPEE", "loc" : [ -72.576142, 42.176443 ], "pop" : 31495, "state" : "MA" }
{ "_id" : "01022", "city" : "WESTOVER AFB", "loc" : [ -72.558657, 42.196672 ], "pop" : 1764, "state" : "MA" }
{ "_id" : "01026", "city" : "CUMMINGTON", "loc" : [ -72.905767, 42.435296 ], "pop" : 1484, "state" : "MA" }
{ "_id" : "01027", "city" : "MOUNT TOM", "loc" : [ -72.679921, 42.264319 ], "pop" : 16864, "state" : "MA" }
{ "_id" : "01028", "city" : "EAST LONGMEADOW", "loc" : [ -72.505565, 42.067203 ], "pop" : 13367, "state" : "MA" }
{ "_id" : "01030", "city" : "FEEDING HILLS", "loc" : [ -72.675077, 42.07182 ], "pop" : 11985, "state" : "MA" }
{ "_id" : "01031", "city" : "GILBERTVILLE", "loc" : [ -72.198585, 42.332194 ], "pop" : 2385, "state" : "MA" }
{ "_id" : "01032", "city" : "GOSHEN", "loc" : [ -72.844092, 42.466234 ], "pop" : 122, "state" : "MA" }
{ "_id" : "01033", "city" : "GRANBY", "loc" : [ -72.520001, 42.255704 ], "pop" : 5526, "state" : "MA" }
{ "_id" : "01034", "city" : "TOLLAND", "loc" : [ -72.908793, 42.070234 ], "pop" : 1652, "state" : "MA" }
{ "_id" : "01035", "city" : "HADLEY", "loc" : [ -72.571499, 42.36062 ], "pop" : 4231, "state" : "MA" }

 
Example 1: State-wise total population
Let’s start with a simple example to find out state-wise total population. This is the command we write:

db.zips.aggregate (
	{$group:
		{
			_id: '$state',
			population: {$sum: '$pop'}
		}
	}
)

 
First (a part of) the output, before the explanation:

{ "_id" : "AK", "population" : 544698 }
{ "_id" : "AZ", "population" : 3665228 }
{ "_id" : "UT", "population" : 1722850 }
{ "_id" : "WY", "population" : 453528 }
{ "_id" : "CO", "population" : 3293755 }
{ "_id" : "ID", "population" : 1006749 }
{ "_id" : "OK", "population" : 3145585 }
{ "_id" : "AR", "population" : 2350725 }
{ "_id" : "LA", "population" : 4217595 }
{ "_id" : "NV", "population" : 1201833 }
{ "_id" : "NE", "population" : 1578139 }
{ "_id" : "KS", "population" : 2475285 }
{ "_id" : "MO", "population" : 5110648 }
{ "_id" : "IL", "population" : 11427576 }
{ "_id" : "OH", "population" : 10846517 }
{ "_id" : "KY", "population" : 3675484 }
{ "_id" : "IN", "population" : 5544136 }
{ "_id" : "WI", "population" : 4891769 }
{ "_id" : "HI", "population" : 1108229 }
{ "_id" : "MS", "population" : 2573216 }

 
We invoke the aggregation framework by calling the db.collectionname.aggregate() function. This function takes one or more JSON documents as pipeline parameters. Each JSON document represents one stage in the aggregation pipeline. In the above example, there is only one JSON document ($group). In other words, there is only one stage in this pipeline.
The $group stage is equivalent to the GROUP BY clause in a SQL SELECT query. It groups documents from the input collection based on what you specify as the value of the _id field. In the above example, we are grouping the documents based on the state field of the input document, which needs to be preceded by the $ sign and enclosed within quotes. The _id field is mandatory, and becomes part of the conceptual collection that is the output of the $group stage. Visit  kicrestoration.com.
The additional fields that you specify in the $group stage become part of the output of this stage. In the above example, we store the sum of pop field from the input collection (specified as ‘$pop’) in a new field called population.
 
Example 2: City-wise total population
Let us now query the collection for total population by cities. Since some city names exist in more than one state, we cannot use the city field alone to query city-wise total population. We now use a compound expression, consisting of city and state fields from the zips collection, in the form of a nested JSON document as the value of the _id field. Visit sandiegodowntown.com.
Here is the query:

db.zips.aggregate (
	{$group:
		{
			_id: {city: '$city', state: '$state'},
			population: {$sum: '$pop'}
		}
	}
)

 
And here is a part of the output:

{ "_id" : { "city" : "POINT BAKER", "state" : "AK" }, "population" : 426 }
{ "_id" : { "city" : "CRAIG", "state" : "AK" }, "population" : 1398 }
{ "_id" : { "city" : "HYDABURG", "state" : "AK" }, "population" : 891 }
{ "_id" : { "city" : "KETCHIKAN", "state" : "AK" }, "population" : 14308 }
{ "_id" : { "city" : "PETERSBURG", "state" : "AK" }, "population" : 4253 }
{ "_id" : { "city" : "SITKA", "state" : "AK" }, "population" : 8638 }
{ "_id" : { "city" : "GUSTAVUS", "state" : "AK" }, "population" : 258 }
{ "_id" : { "city" : "ANGOON", "state" : "AK" }, "population" : 1002 }
{ "_id" : { "city" : "JUNEAU", "state" : "AK" }, "population" : 24947 }
{ "_id" : { "city" : "NUIQSUT", "state" : "AK" }, "population" : 354 }
{ "_id" : { "city" : "CHALKYITSIK", "state" : "AK" }, "population" : 99 }
{ "_id" : { "city" : "WHITE MOUNTAIN", "state" : "AK" }, "population" : 194 }
{ "_id" : { "city" : "AMBLER", "state" : "AK" }, "population" : 8 }
{ "_id" : { "city" : "TELLER", "state" : "AK" }, "population" : 260 }
{ "_id" : { "city" : "SHAKTOOLIK", "state" : "AK" }, "population" : 183 }
{ "_id" : { "city" : "SHUNGNAK", "state" : "AK" }, "population" : 0 }
{ "_id" : { "city" : "RUBY", "state" : "AK" }, "population" : 172 }
{ "_id" : { "city" : "NULATO", "state" : "AK" }, "population" : 492 }
{ "_id" : { "city" : "NOATAK", "state" : "AK" }, "population" : 395 }
{ "_id" : { "city" : "POINT LAY", "state" : "AK" }, "population" : 139 }

 
Example 3: Sort by Population
Let us sort the above output in the descending order of population. For this purpose, we use the $sort stage in the aggregation pipeline, as follows:

db.zips.aggregate (
	{$group:
		{
			_id: {city: '$city', state: '$state'},
			population: {$sum: '$pop'}
		}
	},
	{$sort:
		{population: -1}
	}
)

 
And here is the output:

{ "_id" : { "city" : "CHICAGO", "state" : "IL" }, "population" : 2452177 }
{ "_id" : { "city" : "BROOKLYN", "state" : "NY" }, "population" : 2300504 }
{ "_id" : { "city" : "LOS ANGELES", "state" : "CA" }, "population" : 2102295 }
{ "_id" : { "city" : "HOUSTON", "state" : "TX" }, "population" : 2095918 }
{ "_id" : { "city" : "PHILADELPHIA", "state" : "PA" }, "population" : 1610956 }
{ "_id" : { "city" : "NEW YORK", "state" : "NY" }, "population" : 1476790 }
{ "_id" : { "city" : "BRONX", "state" : "NY" }, "population" : 1209548 }
{ "_id" : { "city" : "SAN DIEGO", "state" : "CA" }, "population" : 1049298 }
{ "_id" : { "city" : "DETROIT", "state" : "MI" }, "population" : 963243 }
{ "_id" : { "city" : "DALLAS", "state" : "TX" }, "population" : 940191 }
{ "_id" : { "city" : "PHOENIX", "state" : "AZ" }, "population" : 890853 }
{ "_id" : { "city" : "MIAMI", "state" : "FL" }, "population" : 825232 }
{ "_id" : { "city" : "SAN JOSE", "state" : "CA" }, "population" : 816653 }
{ "_id" : { "city" : "SAN ANTONIO", "state" : "TX" }, "population" : 811792 }
{ "_id" : { "city" : "BALTIMORE", "state" : "MD" }, "population" : 733081 }
{ "_id" : { "city" : "SAN FRANCISCO", "state" : "CA" }, "population" : 723993 }
{ "_id" : { "city" : "MEMPHIS", "state" : "TN" }, "population" : 632837 }
{ "_id" : { "city" : "SACRAMENTO", "state" : "CA" }, "population" : 628279 }
{ "_id" : { "city" : "JACKSONVILLE", "state" : "FL" }, "population" : 610160 }
{ "_id" : { "city" : "ATLANTA", "state" : "GA" }, "population" : 609591 }

 
The $sort stage takes a JSON document that specifies one or more fields to sort the input collection by, and the sort order (1 being ascending order, and -1 descending order).
 
Example 4: Count of Zip Codes for Cities
How many zip codes do various cities have? Let’s see the output in the descending order of such count.
Well, surprisingly, MongoDB does not have a $count stage, but we can use {$sum: 1} to get the same result. Here is the query:

db.zips.aggregate (
	{$group:
		{
			_id: {city: '$city', state: '$state'},
			zips: {$sum: 1}
		}
	},
	{$sort:
		{zips: -1}
	}
)

 
And here is the output:

{ "_id" : { "city" : "HOUSTON", "state" : "TX" }, "zips" : 93 }
{ "_id" : { "city" : "LOS ANGELES", "state" : "CA" }, "zips" : 56 }
{ "_id" : { "city" : "PHILADELPHIA", "state" : "PA" }, "zips" : 48 }
{ "_id" : { "city" : "CHICAGO", "state" : "IL" }, "zips" : 47 }
{ "_id" : { "city" : "SAN ANTONIO", "state" : "TX" }, "zips" : 45 }
{ "_id" : { "city" : "DALLAS", "state" : "TX" }, "zips" : 44 }
{ "_id" : { "city" : "KANSAS CITY", "state" : "MO" }, "zips" : 41 }
{ "_id" : { "city" : "AUSTIN", "state" : "TX" }, "zips" : 40 }
{ "_id" : { "city" : "NEW YORK", "state" : "NY" }, "zips" : 40 }
{ "_id" : { "city" : "BROOKLYN", "state" : "NY" }, "zips" : 37 }
{ "_id" : { "city" : "SAN DIEGO", "state" : "CA" }, "zips" : 34 }
{ "_id" : { "city" : "MIAMI", "state" : "FL" }, "zips" : 34 }
{ "_id" : { "city" : "OKLAHOMA CITY", "state" : "OK" }, "zips" : 33 }
{ "_id" : { "city" : "PHOENIX", "state" : "AZ" }, "zips" : 33 }
{ "_id" : { "city" : "ATLANTA", "state" : "GA" }, "zips" : 31 }
{ "_id" : { "city" : "SAN JOSE", "state" : "CA" }, "zips" : 29 }
{ "_id" : { "city" : "TULSA", "state" : "OK" }, "zips" : 28 }
{ "_id" : { "city" : "SACRAMENTO", "state" : "CA" }, "zips" : 28 }
{ "_id" : { "city" : "OMAHA", "state" : "NE" }, "zips" : 27 }
{ "_id" : { "city" : "PORTLAND", "state" : "OR" }, "zips" : 26 }

 
Example 5: Limit the Query to a Specific State
Suppose we want to run the above query, only for the state of Illinois. We can use the $match stage to filter the documents that match a given condition, as shown below:

db.zips.aggregate (
	{$match:
		{state: 'IL'}
	},
	{$group:
		{
			_id: '$city',
			zips: {$sum: 1}
		}
	},
	{$sort:
		{zips: -1}
	}
)

 
Here is the output:

{ "_id" : "CHICAGO", "zips" : 47 }
{ "_id" : "ROCKFORD", "zips" : 8 }
{ "_id" : "PEORIA", "zips" : 5 }
{ "_id" : "NAPERVILLE", "zips" : 4 }
{ "_id" : "DECATUR", "zips" : 3 }
{ "_id" : "BELLEVILLE", "zips" : 3 }
{ "_id" : "JOLIET", "zips" : 3 }
{ "_id" : "OAK PARK", "zips" : 3 }
{ "_id" : "AURORA", "zips" : 3 }
{ "_id" : "LINCOLNWOOD", "zips" : 3 }
{ "_id" : "EVANSTON", "zips" : 3 }
{ "_id" : "EAST SAINT LOUIS", "zips" : 2 }
{ "_id" : "CHAMPAIGN", "zips" : 2 }
{ "_id" : "PEORIA HEIGHTS", "zips" : 2 }
{ "_id" : "RANTOUL", "zips" : 2 }
{ "_id" : "DOWNERS GROVE", "zips" : 2 }
{ "_id" : "SAINT CHARLES", "zips" : 2 }
{ "_id" : "SCHAUMBURG", "zips" : 2 }
{ "_id" : "HILLSIDE", "zips" : 2 }
{ "_id" : "HOFFMAN ESTATES", "zips" : 2 }

 
The $match stage takes a JSON document specifying the filter condition, with the same syntax as in the find() function of MongoDB. Since the output collection of this stage (and therefore the input collection of the next, $group stage) is limited to the {state: ‘IL’}, we don’t need to worry about including the state field in _id field of the $group stage.
 
Closing Remarks
The pipeline structure of the aggregation framework allows various operations to be applied, wherein the output of one stage is a conceptual collection that becomes the input to the next stage.
This article is only an introduction to a few features of the aggregation framework of MongoDB. There is much more to it, which I may cover in one or more follow-up articles some other time.

26 responses to “The Aggregation Framework in MongoDB”

  1. prony bony says:

    HxMDaQ Thanks a lot for the blog post. Fantastic.

  2. This site truly has all of the information and facts I needed about this subject and didn at know who to ask.

  3. I truly appreciate this post. I ave been looking everywhere for this! Thank God I found it on Google. You have made my day! Thank you again

  4. In truth, your creative writing abilities has inspired me to get my very own site now

  5. Wow! This can be one particular of the most useful blogs We have ever arrive across on this subject. Basically Wonderful. I am also a specialist in this topic therefore I can understand your effort.

  6. Thanks for every other fantastic post. Where else may anyone get that type of information in such a perfect way of writing? I have a presentation next week, and I am at the search for such info.

  7. You made some good points there. I looked on the internet for the topic and found most persons will go along with with your blog.

  8. Very interesting info !Perfect just what I was searching for! Justice delayed is justice denied. by William Gladstone.

  9. It as hard to come by well-informed people about this topic, but you seem like you know what you are talking about! Thanks

  10. Im obliged for the post.Much thanks again. Fantastic.

  11. Wow, great blog.Really looking forward to read more. Want more.

  12. I really liked your article.Really thank you! Will read on

  13. name says:

    You made some nice points there. I looked on the internet for the issue and found most individuals will consent with your site.

  14. This unique blog is really interesting as well as diverting. I have picked up a lot of handy tips out of this amazing blog. I ad love to return every once in a while. Thanks a lot!

  15. I think this is a real great article post.Really thank you! Will read on

  16. I truly appreciate this post. I have been looking all over for this! Thank goodness I found it on Bing. You ave made my day! Thank you again!

  17. sex toys says:

    Major thankies for the blog.Really looking forward to read more. Great.

  18. sex toys says:

    Well I really liked studying it. This post provided by you is very helpful for proper planning.

  19. lingerie says:

    Pretty nice post. I just stumbled upon your blog and wished to say that I have really enjoyed browsing your blog posts. After all I will be subscribing to your feed and I hope you write again soon!

  20. lingerie nyc says:

    PlаА аЂа•аА аЂаse let me know where аАааБТ“ou got your thаА аЂа•mаА аЂа•.

  21. nyc lingerie says:

    Perfectly written content material, Really enjoyed looking through.

  22. nyc sex toys says:

    You made some nice points there. I looked on the internet for the subject and found most people will consent with your website.

  23. It as hard to come by well-informed people in this particular topic, however, you sound like you know what you are talking about! Thanks

  24. Spot on with this write-up, I actually believe this web site needs a lot more attention.

  25. Wow! Thank you! I permanently needed to write on my blog something like that. Can I take a fragment of your post to my blog?

  26. Dispensary says:

    Say, you got a nice blog.Really looking forward to read more. Want more.

Leave a Reply

Your email address will not be published. Required fields are marked *

© 2017 Pragati Software Pvt. Ltd. All Rights Reserved.

Enquiry

Pragatisoftware