How to Find the Document with the Longest Array in a MongoDB Collection?

When working with MongoDB collections, you may sometimes need to find the document that contains the longest array. For example, you may have a collection that stores shirts, where each document has a "sizes" array listing the sizes in which the shirt is available. In this case, how do you find the shirt with the largest number of sizes? In other words, which shirt model offers the widest choice of sizes?

Let us assume that we use the collection "shirts" to store our shirt models available for sale. Let us populate it with some sample data.

db.shirts.insertMany([
  { _id: 1, name: "Blue Shirt", sizes: ["S", "M", "L"] },
  { _id: 2, name: "Striped Shirt", sizes: ["S", "M", "L", "XL", "XXL"] },
  { _id: 3, name: "Black Shirt", sizes: ["S", "M", "L", "XL", "XXL", "XXXL"] },
  { _id: 4, name: "Grey Shirt", sizes: ["S", "M", "L", "XXL"] },
  { _id: 5, name: "Green Shirt", sizes: ["S", "M", "L"] }
]);

From the data, it is clear that the Black Shirt has the largest number of sizes (six). How do you find this with a MongoDB query? We can use a MongoDB aggregation pipeline to get the document with the longest array. The following script returns the _id of the Black Shirt document together with the length of its "sizes" array.

db.shirts.aggregate( [
  { $unwind : "$sizes" },
  { $group : { _id : "$_id", length : { $sum : 1 } } },
  { $sort : { length : -1 } },
  { $limit : 1 }
] )

How does the above script work? Here is what happens:

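  • The $unwind stage deconstructs the "sizes" array of each input document and outputs one document per array element (by default, documents whose array is empty or missing produce no output)
  • The $group stage groups the unwound documents by _id
  • Within $group, the $sum accumulator adds 1 for every document in a group; since each group holds one document per array element, the total equals the array size
  • The $sort stage sorts the groups in descending order of array length
  • Finally, the $limit stage keeps only the first result: the _id of the document with the longest array, along with the array's length (if several documents tie for the longest array, only one of them is returned)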
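
To make the steps above more concrete, here is a minimal sketch that mimics each stage in plain JavaScript, without a database. It only illustrates the logic and is not how MongoDB implements the pipeline internally. The variable names (unwound, grouped, sorted, longest) are made up for this example, and the sample documents keep only the fields the pipeline uses.

```javascript
// Sample documents mirroring the "shirts" collection (only _id and sizes).
const shirts = [
  { _id: 1, sizes: ["S", "M", "L"] },
  { _id: 2, sizes: ["S", "M", "L", "XL", "XXL"] },
  { _id: 3, sizes: ["S", "M", "L", "XL", "XXL", "XXXL"] },
  { _id: 4, sizes: ["S", "M", "L", "XXL"] },
  { _id: 5, sizes: ["S", "M", "L"] },
];

// $unwind: emit one copy of the document per array element.
const unwound = shirts.flatMap((doc) =>
  doc.sizes.map((size) => ({ ...doc, sizes: size }))
);

// $group with { $sum: 1 }: count the unwound documents per _id.
const counts = new Map();
for (const doc of unwound) {
  counts.set(doc._id, (counts.get(doc._id) || 0) + 1);
}
const grouped = [...counts].map(([_id, length]) => ({ _id, length }));

// $sort: order the groups by length, largest first.
const sorted = [...grouped].sort((a, b) => b.length - a.length);

// $limit: keep only the first result.
const longest = sorted.slice(0, 1);

console.log(longest);
```

Running this with Node.js prints a single result with _id 3 and length 6, matching the Black Shirt document from the sample data.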