parent
e0f9bc8456
commit
9370e262c0
14 changed files with 494 additions and 5 deletions
@ -0,0 +1,21 @@ |
|||||||
|
# Aggregation Concepts |
||||||
|
|
||||||
|
The MongoDB aggregation framework provides a way to process and transform the data stored in MongoDB collections. It lets you perform calculations and return the computed results using aggregation pipelines, single-purpose aggregation methods, or map-reduce functions (deprecated since MongoDB 5.0 in favor of aggregation pipelines). |
||||||
|
|
||||||
|
Here are some of the most important concepts of MongoDB Aggregation: |
||||||
|
|
||||||
|
- **Pipeline:** A pipeline is a series of stages that are executed in order to process the data. Each stage transforms the data in some way and passes it to the next stage. The output of the last stage is the final result of the pipeline. |
||||||
|
- **Stage:** A stage is a single operation that is applied to the data. It can be a simple transformation or a complex aggregation. Each stage has a specific purpose and is responsible for a single task. |
||||||
|
- **Operator:** An operator is a `$`-prefixed keyword used inside a stage to perform a specific operation on the data. It can be an arithmetic operator (such as `$add`), a logical operator (such as `$and`), a comparison operator (such as `$gt`), or an accumulator (such as `$sum`). |
||||||
|
|
||||||
|
Example of a simple aggregation pipeline: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.collection.aggregate([ |
||||||
|
{ $match: { status: 'A' } }, |
||||||
|
{ $group: { _id: '$cust_id', total: { $sum: '$amount' } } }, |
||||||
|
{ $sort: { total: -1 } }, |
||||||
|
]); |
||||||
|
``` |
||||||
|
|
||||||
|
Each item in the pipeline array is a stage. The first stage, `$match`, keeps only the documents whose `status` is `'A'`. The second stage, `$group`, groups those documents by the `cust_id` field and sums the `amount` field into `total`. The third stage, `$sort`, sorts the grouped documents by the `total` field in descending order. |
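To make the flow concrete, here is a minimal plain-JavaScript sketch of what the three stages compute, run against an in-memory array rather than a MongoDB server. The `sampleOrders` data and all variable names are assumptions made for illustration; a real pipeline is executed by the database, not by code like this.

```javascript
// Hypothetical sample documents (not taken from a real collection).
const sampleOrders = [
  { cust_id: 'A123', amount: 500, status: 'A' },
  { cust_id: 'A123', amount: 250, status: 'A' },
  { cust_id: 'B212', amount: 200, status: 'A' },
  { cust_id: 'A123', amount: 300, status: 'D' },
];

// $match: keep only documents whose status is 'A'.
const matched = sampleOrders.filter((doc) => doc.status === 'A');

// $group: one output document per cust_id, summing amount into total.
const totalsByCustomer = new Map();
for (const doc of matched) {
  const runningTotal = totalsByCustomer.get(doc.cust_id) ?? 0;
  totalsByCustomer.set(doc.cust_id, runningTotal + doc.amount);
}
const grouped = [...totalsByCustomer].map(([id, total]) => ({ _id: id, total }));

// $sort: order the grouped documents by total, descending.
const pipelineResult = grouped.sort((a, b) => b.total - a.total);

console.log(pipelineResult);
```

Each step consumes the output of the previous one, which is the same hand-off that happens between stages in a pipeline.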
@ -0,0 +1,61 @@ |
|||||||
|
# $group |
||||||
|
|
||||||
|
The `$group` stage in MongoDB groups the documents in a collection by a specified field or expression and performs operations on each group. These operations range from counting the number of documents in a group, to summing up the values of a particular field, to calculating average values, and more. |
||||||
|
|
||||||
|
#### Basic Usage |
||||||
|
|
||||||
|
The basic syntax for the `$group` operator is as follows: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ |
||||||
|
$group: { |
||||||
|
_id: <expression>, |
||||||
|
<field1>: { <accumulator1> : <expression1> }, |
||||||
|
... |
||||||
|
} |
||||||
|
} |
||||||
|
``` |
||||||
|
|
||||||
|
Here's a quick breakdown of the components: |
||||||
|
|
||||||
|
- `_id`: This field represents the criteria for grouping the documents. It can be a single field name or an expression that returns a value. |
||||||
|
- `<field1>`: This is the name of the field you want to create in the resulting documents, which store the computed values from the group. |
||||||
|
- `<accumulator1>`: This is one of the [accumulators](https://docs.mongodb.com/manual/reference/operator/aggregation/#grp._S_grp) that MongoDB provides (e.g. `$sum`, `$avg`, `$min`, `$max`, `$push`, etc.). They specify the operation to perform on the grouped data. |
||||||
|
- `<expression1>`: This is the field or expression whose value the accumulator is applied to for each document in the group. |
||||||
|
|
||||||
|
Suppose we have a collection called `orders`, which contains documents representing sales data. |
||||||
|
|
||||||
|
```javascript |
||||||
|
[ |
||||||
|
{ "_id": 1, "customer_id": "C1", "amount": 110 }, |
||||||
|
{ "_id": 2, "customer_id": "C2", "amount": 150 }, |
||||||
|
{ "_id": 3, "customer_id": "C1", "amount": 90 }, |
||||||
|
{ "_id": 4, "customer_id": "C3", "amount": 200 }, |
||||||
|
{ "_id": 5, "customer_id": "C2", "amount": 50 } |
||||||
|
] |
||||||
|
``` |
||||||
|
|
||||||
|
Now, let's group the data by `customer_id` and calculate each customer's total spent amount. |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.orders.aggregate([ |
||||||
|
{ |
||||||
|
$group: { |
||||||
|
_id: "$customer_id", |
||||||
|
total_spent: { $sum: "$amount" } |
||||||
|
} |
||||||
|
} |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
This query would result in the following: |
||||||
|
|
||||||
|
```javascript |
||||||
|
[ |
||||||
|
{ "_id": "C1", "total_spent": 200 }, |
||||||
|
{ "_id": "C2", "total_spent": 200 }, |
||||||
|
{ "_id": "C3", "total_spent": 200 } |
||||||
|
] |
||||||
|
``` |
||||||
|
|
||||||
|
Using the `$group` operator, documents in the `orders` collection were grouped by `customer_id`, and the total spent amount for each customer was calculated using the `$sum` accumulator. |
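A single `$group` stage can compute several accumulators at once. The sketch below mimics, in plain JavaScript, what a stage combining `$sum`, a document count, and `$avg` would produce for the sample `orders` documents above. The output field names `order_count` and `avg_amount` are illustrative assumptions, and the code only imitates the result; it is not how MongoDB executes the stage.

```javascript
// The sample orders shown above.
const orders = [
  { "_id": 1, "customer_id": "C1", "amount": 110 },
  { "_id": 2, "customer_id": "C2", "amount": 150 },
  { "_id": 3, "customer_id": "C1", "amount": 90 },
  { "_id": 4, "customer_id": "C3", "amount": 200 },
  { "_id": 5, "customer_id": "C2", "amount": 50 },
];

// Stage being imitated (shown for reference only, not executed here):
// { $group: { _id: "$customer_id",
//             total_spent: { $sum: "$amount" },
//             order_count: { $sum: 1 },
//             avg_amount:  { $avg: "$amount" } } }
const groups = new Map();
for (const { customer_id, amount } of orders) {
  const group = groups.get(customer_id) ?? { _id: customer_id, total_spent: 0, order_count: 0 };
  group.total_spent += amount; // $sum: "$amount"
  group.order_count += 1;      // $sum: 1 counts the documents in the group
  groups.set(customer_id, group);
}

const customerStats = [...groups.values()].map((group) => ({
  ...group,
  avg_amount: group.total_spent / group.order_count, // $avg: "$amount"
}));

console.log(customerStats);
```

Note that `$group` does not guarantee any particular order for its output documents, so add a `$sort` stage afterwards if the order matters.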
@ -0,0 +1,56 @@ |
|||||||
|
# $match |
||||||
|
|
||||||
|
The `$match` stage filters the documents in an aggregation pipeline. It passes only the documents that satisfy the specified condition(s) to the next stage of the pipeline and excludes the rest. |
||||||
|
|
||||||
|
The basic syntax for the `$match` operator is as follows: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ $match: { <query> } } |
||||||
|
``` |
||||||
|
|
||||||
|
Where `<query>` contains the conditions and the fields which the documents should match. |
||||||
|
|
||||||
|
### Examples |
||||||
|
|
||||||
|
Let's take a look at some examples to understand the usage of the `$match` operator. |
||||||
|
|
||||||
|
Suppose you have a collection named `employees` with the following document structure: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ |
||||||
|
"_id": ObjectId("123"), |
||||||
|
"firstName": "John", |
||||||
|
"lastName": "Doe", |
||||||
|
"age": 25, |
||||||
|
"department": "HR" |
||||||
|
} |
||||||
|
``` |
||||||
|
**Example 1:** |
||||||
|
|
||||||
|
You are asked to find employees older than 30. To do this, you can use the `$match` operator as follows: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.employees.aggregate([ |
||||||
|
{ $match: { age: { $gt: 30 } } } |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
This returns all employees with age greater than 30. |
||||||
|
|
||||||
|
**Example 2:** |
||||||
|
|
||||||
|
Now, let's say you also want to filter employees working in the "HR" department. You can chain conditions to the `$match` operator like this: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.employees.aggregate([ |
||||||
|
{ $match: { age: { $gt: 30 }, department: "HR" } } |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
This returns employees who are aged above 30 and working in the "HR" department. |
||||||
|
|
||||||
|
### Important Things to Keep in Mind |
||||||
|
|
||||||
|
- When using multiple conditions in the `$match` query, they work as an implicit `$and` operator. |
||||||
|
- The `$match` stage works best early in the pipeline. Filtering early reduces the number of documents that later stages have to process, and a `$match` placed at the very start of the pipeline can take advantage of indexes, like a regular query. |
||||||
|
- The `$match` operator uses most of the standard query operators, like `$gt`, `$lte`, `$in`, and so on. |
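The implicit `$and` mentioned in the notes above can be illustrated by writing the same filter in both forms and comparing it with an `$or` filter. This is a sketch under stated assumptions: the `employees` array is hypothetical sample data, and the `filter` calls are plain-JavaScript stand-ins for what the stages would select, not MongoDB's own matching logic.

```javascript
// Hypothetical employees (illustrative only).
const employees = [
  { firstName: 'John', age: 25, department: 'HR' },
  { firstName: 'Maria', age: 41, department: 'HR' },
  { firstName: 'Wei', age: 35, department: 'IT' },
];

// Two equivalent $match stages: comma-separated conditions are an implicit $and.
const implicitAnd = { $match: { age: { $gt: 30 }, department: 'HR' } };
const explicitAnd = { $match: { $and: [{ age: { $gt: 30 } }, { department: 'HR' }] } };

// Plain-JavaScript predicate expressing what both stages select.
const olderHrEmployees = employees.filter((e) => e.age > 30 && e.department === 'HR');

// An "either condition" filter has to be written with an explicit $or.
const eitherCondition = { $match: { $or: [{ age: { $gt: 30 } }, { department: 'HR' }] } };
const olderOrHr = employees.filter((e) => e.age > 30 || e.department === 'HR');

console.log(JSON.stringify(implicitAnd), JSON.stringify(explicitAnd));
console.log(olderHrEmployees.map((e) => e.firstName));
console.log(JSON.stringify(eitherCondition));
console.log(olderOrHr.map((e) => e.firstName));
```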
||||||
|
|
||||||
|
In conclusion, the `$match` operator is a powerful and essential tool when working with MongoDB's aggregation pipeline to filter and process datasets based on specific conditions, leading to better performance and more relevant results. |
@ -0,0 +1,34 @@ |
|||||||
|
# $sort |
||||||
|
|
||||||
|
The `$sort` operator is an aggregation operator in MongoDB that sorts the documents that are passed through the pipeline. It takes one or more fields as parameters and sorts the documents in ascending or descending order based on the values in the specified fields. |
||||||
|
|
||||||
|
Here's the syntax for the `$sort` operator: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ $sort: { field1: <sort order>, field2: <sort order>, ... } } |
||||||
|
``` |
||||||
|
|
||||||
|
The `<sort order>` parameter can be either `1` or `-1`, which corresponds to ascending or descending order, respectively. |
||||||
|
|
||||||
|
For example, suppose we have a collection of documents containing information about books, and we want to sort the documents by the book's title in ascending order. We can use the following `$sort` operator: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.books.aggregate([ |
||||||
|
{ $sort : { title : 1 } } |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
This will sort the documents by the `title` field in ascending order. |
||||||
|
|
||||||
|
We can also use the `$sort` operator to sort by multiple fields. For example, suppose we have a collection of documents containing information about students, and we want to sort the documents by the student's age in descending order and then by their name in ascending order. We can use the following `$sort` operator: |
||||||
|
|
||||||
|
|
||||||
|
```javascript |
||||||
|
db.students.aggregate([ |
||||||
|
{ $sort : { age : -1, name : 1 } } |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
This will sort the documents by the `age` field in descending order and then by the `name` field in ascending order. |
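The way a multi-field sort breaks ties can be sketched with an ordinary comparator. The `students` array below is hypothetical, and the comparator is only a plain-JavaScript approximation of `{ $sort: { age: -1, name: 1 } }`: it uses simple code-unit string comparison and does not reproduce MongoDB's full rules for comparing values of different types or collations.

```javascript
// Hypothetical students (illustrative only).
const students = [
  { name: 'Cara', age: 21 },
  { name: 'Abe', age: 23 },
  { name: 'Bea', age: 21 },
  { name: 'Dan', age: 23 },
];

// Simple string comparison, used for the ascending name key.
const compareStrings = (x, y) => (x < y ? -1 : x > y ? 1 : 0);

// Compare by age descending first; fall back to name ascending
// only when the two ages are equal.
const sortedStudents = [...students].sort(
  (a, b) => b.age - a.age || compareStrings(a.name, b.name)
);

console.log(sortedStudents.map((s) => `${s.name} (${s.age})`));
```

The second sort key only matters for documents that tie on the first one, which is why the order of keys in the `$sort` document is significant.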
||||||
|
|
||||||
|
Note that `$sort` can be an expensive operation, especially on large datasets. Where possible, place it after stages that reduce the number of documents, such as `$match`, so that fewer documents have to be sorted. |
@ -0,0 +1,75 @@ |
|||||||
|
# $project |
||||||
|
|
||||||
|
The `$project` operator helps in selecting or controlling the fields in a document by passing only the necessary attributes to the next stage in the pipeline. |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.collection.aggregate([ |
||||||
|
{ |
||||||
|
$project: |
||||||
|
{ |
||||||
|
field1: <1 or 0>, |
||||||
|
field2: <1 or 0>, |
||||||
|
... |
||||||
|
} |
||||||
|
} |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
The value `1` or `0` in the syntax represents whether the field should be included or excluded, respectively. |
||||||
|
|
||||||
|
Let's assume we have the following documents in a `students` collection: |
||||||
|
|
||||||
|
```json |
||||||
|
[ |
||||||
|
{ "_id" : 1, "name" : "John Doe", "age" : 20, "subjects" : [ "Math", "Physics" ] }, |
||||||
|
{ "_id" : 2, "name" : "Jane Smith", "age" : 23, "subjects" : [ "Chemistry", "Biology" ] } |
||||||
|
] |
||||||
|
``` |
||||||
|
|
||||||
|
We can use the `$project` operator to include only the name and age fields, excluding the subjects: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.students.aggregate([ |
||||||
|
{ |
||||||
|
$project: { |
||||||
|
_id: 0, |
||||||
|
name: 1, |
||||||
|
age: 1 |
||||||
|
} |
||||||
|
} |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
Returned documents: |
||||||
|
|
||||||
|
```json |
||||||
|
[ |
||||||
|
{ "name" : "John Doe", "age" : 20 }, |
||||||
|
{ "name" : "Jane Smith", "age" : 23 } |
||||||
|
] |
||||||
|
``` |
||||||
|
|
||||||
|
Notice that the resulting documents do not include the "_id" and "subjects" fields. |
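The inclusion rules can be summarized in a few lines of plain JavaScript. The `projectInclude` helper below is hypothetical and written only for illustration: it handles top-level fields in an inclusion-style specification and reflects the fact that `_id` is kept unless it is explicitly set to `0`. It is not MongoDB's implementation and ignores nested paths and computed fields.

```javascript
// The sample students shown above.
const studentDocs = [
  { _id: 1, name: 'John Doe', age: 20, subjects: ['Math', 'Physics'] },
  { _id: 2, name: 'Jane Smith', age: 23, subjects: ['Chemistry', 'Biology'] },
];

// Hypothetical helper imitating an inclusion-style $project:
// keep fields marked 1, and keep _id unless it is explicitly set to 0.
function projectInclude(doc, spec) {
  const out = {};
  if (spec._id !== 0 && '_id' in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== '_id' && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Same specification as the pipeline above: _id is suppressed.
const projected = studentDocs.map((d) => projectInclude(d, { _id: 0, name: 1, age: 1 }));

// Without _id: 0, the _id field is included by default.
const withId = studentDocs.map((d) => projectInclude(d, { name: 1 }));

console.log(projected);
console.log(withId);
```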
||||||
|
|
||||||
|
In the example below, we'll exclude the "subjects" field: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.students.aggregate([ |
||||||
|
{ |
||||||
|
$project: { |
||||||
|
subjects: 0 |
||||||
|
} |
||||||
|
} |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
Returned documents: |
||||||
|
|
||||||
|
```json |
||||||
|
[ |
||||||
|
{ "_id" : 1, "name" : "John Doe", "age" : 20 }, |
||||||
|
{ "_id" : 2, "name" : "Jane Smith", "age" : 23 } |
||||||
|
] |
||||||
|
``` |
||||||
|
|
||||||
|
Now that you have a basic understanding of the `$project` operator, you can try it out with various scenarios to reshape your MongoDB documents according to your needs. This operator can also be used in conjunction with other operators to perform complex data manipulations within the aggregation pipeline. |
@ -0,0 +1,32 @@ |
|||||||
|
# $skip |
||||||
|
|
||||||
|
The `$skip` stage skips over a specified number of documents and passes the remaining documents to the next stage in the pipeline, which makes it useful for paginating results. It is the aggregation counterpart of the `skip()` cursor method used with `find()`. |
||||||
|
|
||||||
|
The `$skip` stage has the following syntax: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.collection.aggregate([ |
||||||
|
{ |
||||||
|
$skip: <number> |
||||||
|
} |
||||||
|
]); |
||||||
|
``` |
||||||
|
|
||||||
|
Here, `<number>` is the number of documents you want to skip in the collection. |
||||||
|
|
||||||
|
## Example |
||||||
|
|
||||||
|
Let's say we have a collection named `employees` and we want to skip the first 5 documents of the collection (e.g., for paginating results). We can do this using the `$skip` operator: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.employees.aggregate([ |
||||||
|
{ |
||||||
|
$skip: 5 |
||||||
|
} |
||||||
|
]); |
||||||
|
``` |
||||||
|
|
||||||
|
## Important Notes |
||||||
|
|
||||||
|
- `$skip` preserves the order of the documents it receives, but that order is not guaranteed unless an earlier stage sorts them, so use `$sort` before `$skip` when order matters. |
||||||
|
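- For better performance, filter documents (for example with `$match`) before `$skip` so that fewer documents have to be read and discarded. Large skip values are costly because the skipped documents still have to be processed. |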
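For pagination, `$skip` is typically combined with `$sort` and `$limit`. The following sketch builds those three stages for a given page. The helper name `buildPageStages`, its parameters, and the `lastName` field are assumptions introduced for illustration; the code only constructs the stage documents and does not talk to a database.

```javascript
// Hypothetical helper: builds the stages for one page of results.
// pageNumber is 1-based; sortSpec is a $sort specification such as { lastName: 1, _id: 1 }.
function buildPageStages(pageNumber, pageSize, sortSpec) {
  if (!Number.isInteger(pageNumber) || pageNumber < 1) {
    throw new RangeError('pageNumber must be a positive integer');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be a positive integer');
  }
  return [
    { $sort: sortSpec },                    // fix the order first
    { $skip: (pageNumber - 1) * pageSize }, // drop the documents of earlier pages
    { $limit: pageSize },                   // keep a single page
  ];
}

const thirdPage = buildPageStages(3, 5, { lastName: 1, _id: 1 });
console.log(JSON.stringify(thirdPage));
```

In `mongosh`, the resulting array could then be passed to `db.employees.aggregate(thirdPage)`. Including `_id` as the last sort key is a design choice that keeps the order stable when several documents share the same `lastName`.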
@ -0,0 +1,23 @@ |
|||||||
|
# $limit |
||||||
|
|
||||||
|
The `$limit` stage limits the number of documents passed to the next stage in the pipeline. It is useful for capping the number of documents a pipeline returns, as well as for debugging and testing pipelines on a small sample. |
||||||
|
|
||||||
|
Here's the syntax for the `$limit` operator: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ $limit: <number> } |
||||||
|
``` |
||||||
|
|
||||||
|
Here, `<number>` is the number of documents you want to limit the pipeline to. |
||||||
|
|
||||||
|
## Example |
||||||
|
|
||||||
|
Let's say we have a collection named `employees` and we want to limit the number of documents to 5. We can do this using the `$limit` operator: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.employees.aggregate([ |
||||||
|
{ |
||||||
|
$limit: 5 |
||||||
|
} |
||||||
|
]); |
||||||
|
``` |
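A common use of `$limit` is a "top N" query, where it follows a `$sort` stage. The sketch below imitates `[{ $sort: { salary: -1 } }, { $limit: 2 }]` in plain JavaScript. The `staff` data and the `salary` field are hypothetical and not part of the `employees` example above.

```javascript
// Hypothetical documents with an assumed salary field.
const staff = [
  { name: 'Ann', salary: 5200 },
  { name: 'Raj', salary: 6100 },
  { name: 'Lea', salary: 4800 },
  { name: 'Tom', salary: 5900 },
];

// $sort: { salary: -1 } orders by salary descending,
// then $limit: 2 keeps only the first two documents.
const topTwo = [...staff].sort((a, b) => b.salary - a.salary).slice(0, 2);

console.log(topTwo.map((s) => s.name));
```

Without the preceding sort, `$limit` would simply return whichever documents arrive first.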
@ -0,0 +1,63 @@ |
|||||||
|
# $unwind |
||||||
|
|
||||||
|
The `$unwind` operator is a powerful aggregation pipeline stage in MongoDB that allows you to deconstruct an array field from input documents and generate a new document for each element in the array, essentially "unwinding" the array. |
||||||
|
|
||||||
|
This operator is particularly useful when you have documents containing array fields, and you need to perform operations on the individual elements within those arrays. `$unwind` enables you to flatten the array structure and easily manipulate or analyze data within arrays as separate documents. |
||||||
|
|
||||||
|
## Syntax |
||||||
|
|
||||||
|
The general syntax for the `$unwind` operator is: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ |
||||||
|
$unwind: { |
||||||
|
path: <field path>, |
||||||
|
includeArrayIndex: <string>, // Optional |
||||||
|
preserveNullAndEmptyArrays: <boolean> // Optional |
||||||
|
} |
||||||
|
} |
||||||
|
``` |
||||||
|
|
||||||
|
## Parameters |
||||||
|
|
||||||
|
- `path`: A string representing the field path of the array you want to unwind. It must be prefixed with a `$` to indicate referencing a field in the input document. |
||||||
|
- `includeArrayIndex`: (Optional) A string representing the field name for the index of the array element. The output documents will include this field, with the value as the index of the element in the original array. |
||||||
|
- `preserveNullAndEmptyArrays`: (Optional) A boolean value that determines whether to output a document when the `path` field is missing, null, or an empty array in the input document. By default (`false`), such input documents are not included in the output. |
||||||
|
|
||||||
|
## Example |
||||||
|
|
||||||
|
Consider a `sales` collection with the following sample document: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ |
||||||
|
_id: 1, |
||||||
|
item: "itemA", |
||||||
|
orders: [ |
||||||
|
{ quantity: 2, unitPrice: 10 }, |
||||||
|
{ quantity: 3, unitPrice: 20 }, |
||||||
|
{ quantity: 1, unitPrice: 15 } |
||||||
|
] |
||||||
|
} |
||||||
|
``` |
||||||
|
|
||||||
|
If you want to calculate the total revenue for each individual order, you can use the `$unwind` operator to deconstruct the `orders` array: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.sales.aggregate([ |
||||||
|
{ $unwind: { path: "$orders" } } |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
The output will be: |
||||||
|
|
||||||
|
```javascript |
||||||
|
[ |
||||||
|
{ _id: 1, item: "itemA", orders: { quantity: 2, unitPrice: 10 } }, |
||||||
|
{ _id: 1, item: "itemA", orders: { quantity: 3, unitPrice: 20 } }, |
||||||
|
{ _id: 1, item: "itemA", orders: { quantity: 1, unitPrice: 15 } } |
||||||
|
] |
||||||
|
``` |
||||||
|
|
||||||
|
Now each document represents a single order, and you can easily perform further operations like calculating the revenue for each document. |
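As a sketch of that follow-up step, the plain-JavaScript code below imitates the `$unwind` stage and then a `$project` stage that computes a `revenue` field with `$multiply`. The `revenue` field name and the follow-up stage are assumptions added for illustration; the code runs in memory and is not MongoDB's implementation.

```javascript
// The sample document shown above.
const sale = {
  _id: 1,
  item: 'itemA',
  orders: [
    { quantity: 2, unitPrice: 10 },
    { quantity: 3, unitPrice: 20 },
    { quantity: 1, unitPrice: 15 },
  ],
};

// Imitates { $unwind: "$orders" }: one output document per array element,
// with the array replaced by that single element.
const unwound = sale.orders.map((order) => ({ ...sale, orders: order }));

// Imitates a following stage such as:
// { $project: { item: 1, revenue: { $multiply: ["$orders.quantity", "$orders.unitPrice"] } } }
const revenues = unwound.map((doc) => ({
  _id: doc._id,
  item: doc.item,
  revenue: doc.orders.quantity * doc.orders.unitPrice,
}));

console.log(revenues);
```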
||||||
|
|
||||||
|
Remember, the `$unwind` operator is a crucial tool for handling and analyzing array data in MongoDB, enabling you to efficiently work with complex data structures. |
@ -0,0 +1,61 @@ |
|||||||
|
# $lookup |
||||||
|
|
||||||
|
The `$lookup` stage in MongoDB is a powerful aggregation pipeline operator that allows you to perform a left outer join between two collections. It is used for combining data from multiple collections in a single aggregation pipeline operation. |
||||||
|
|
||||||
|
Here's a brief summary of the `$lookup` operator: |
||||||
|
|
||||||
|
## Syntax |
||||||
|
|
||||||
|
The `$lookup` operator uses the following syntax: |
||||||
|
|
||||||
|
```json |
||||||
|
{ |
||||||
|
"$lookup": { |
||||||
|
"from": "<collection_name>", |
||||||
|
"localField": "<field_from_input_documents>", |
||||||
|
"foreignField": "<field_from_documents_of_the_from_collection>", |
||||||
|
"as": "<output_array_field>" |
||||||
|
} |
||||||
|
} |
||||||
|
``` |
||||||
|
|
||||||
|
## Parameters |
||||||
|
|
||||||
|
* `from`: The target collection to perform the join operation with. |
||||||
|
* `localField`: The field from the input collection (i.e., the collection on which the `$lookup` is applied). |
||||||
|
* `foreignField`: The field from the target collection (i.e., the `from` collection). |
||||||
|
* `as`: The name of the output array field that will store the joined documents. |
||||||
|
|
||||||
|
## Example |
||||||
|
|
||||||
|
Suppose you have two collections, `orders` and `products`. The `orders` collection contains documents with the following fields: `orderId`, `productId`, and `quantity`. The `products` collection contains documents with the fields `productId`, `productName`, and `price`. |
||||||
|
|
||||||
|
To calculate the total amount of each order, you can use the `$lookup` operator along with other aggregation stages: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.orders.aggregate([ |
||||||
|
{ |
||||||
|
"$lookup": { |
||||||
|
"from": "products", |
||||||
|
"localField": "productId", |
||||||
|
"foreignField": "productId", |
||||||
|
"as": "productDetails" |
||||||
|
} |
||||||
|
}, |
||||||
|
{ |
||||||
|
"$unwind": "$productDetails" |
||||||
|
}, |
||||||
|
{ |
||||||
|
"$project": { |
||||||
|
"orderId": 1, |
||||||
|
"totalAmount": { |
||||||
|
"$multiply": ["$quantity", "$productDetails.price"] |
||||||
|
} |
||||||
|
} |
||||||
|
} |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
In this example, `$lookup` will join the `orders` and `products` collections based on `productId`. The joined data will be stored in the new `productDetails` array field. Additional aggregation stages (`$unwind` and `$project`) are used to calculate and display the total amount of each order. |
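The join semantics can be imitated in memory to see what each stage contributes. In this sketch, `orderDocs` and `productDocs` are hypothetical sample data, and the code is a plain-JavaScript approximation of the three stages rather than MongoDB's implementation (for example, a real `$project` would also keep `_id` by default).

```javascript
// Hypothetical sample data.
const orderDocs = [
  { orderId: 1, productId: 'P1', quantity: 2 },
  { orderId: 2, productId: 'P2', quantity: 1 },
  { orderId: 3, productId: 'P9', quantity: 4 }, // no matching product
];
const productDocs = [
  { productId: 'P1', productName: 'Pen', price: 3 },
  { productId: 'P2', productName: 'Notebook', price: 5 },
];

// $lookup: left outer join. Every order is kept and receives an array of matches,
// which is empty when no product matches.
const joined = orderDocs.map((order) => ({
  ...order,
  productDetails: productDocs.filter((p) => p.productId === order.productId),
}));

// $unwind with default options: orders whose productDetails array is empty are dropped.
const unwoundOrders = joined.flatMap((order) =>
  order.productDetails.map((product) => ({ ...order, productDetails: product }))
);

// $project: keep orderId and compute totalAmount.
const orderTotals = unwoundOrders.map((order) => ({
  orderId: order.orderId,
  totalAmount: order.quantity * order.productDetails.price,
}));

console.log(joined[2].productDetails);
console.log(orderTotals);
```

Order 3 survives the join with an empty `productDetails` array but disappears after the unwind step. If such orders should be kept, `$unwind` can be given `preserveNullAndEmptyArrays: true`.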
||||||
|
|
||||||
|
So, the `$lookup` operator becomes an essential tool when you need to work with data from multiple collections and perform complex data processing tasks in MongoDB. |
@ -0,0 +1,55 @@ |
|||||||
|
# $sum |
||||||
|
|
||||||
|
The `$sum` operator is a commonly used accumulator in MongoDB, most often found in the `$group` stage of the aggregation pipeline. As the name suggests, it calculates the sum of numeric values, taken either from a specified field or from an expression evaluated for each input document. |
||||||
|
|
||||||
|
## Syntax |
||||||
|
|
||||||
|
The basic syntax for using the `$sum` operator is as follows: |
||||||
|
|
||||||
|
```javascript |
||||||
|
{ $sum: <expression> } |
||||||
|
``` |
||||||
|
|
||||||
|
The `<expression>` can be a field, a number value, or another operator that returns a numeric value. |
||||||
|
|
||||||
|
## Examples |
||||||
|
|
||||||
|
### Calculate Sum of Field Values |
||||||
|
|
||||||
|
Suppose you have a collection of `orders` and you want to calculate the total revenue. You can use the `$sum` operator in combination with the `$group` stage to achieve this: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.orders.aggregate([ |
||||||
|
{ |
||||||
|
$group: { |
||||||
|
_id: null, |
||||||
|
totalRevenue: { $sum: "$price" } |
||||||
|
} |
||||||
|
} |
||||||
|
]) |
||||||
|
``` |
||||||
|
|
||||||
|
### Calculate Sum with Expression |
||||||
|
|
||||||
|
You can also use the `$sum` operator with an expression to perform more complex calculations. For example, if each document in your `orders` collection has `price` and `quantity` fields and you want to calculate the total sales amount, you can use the following aggregation: |
||||||
|
|
||||||
|
```javascript |
||||||
|
db.orders.aggregate([ |
||||||
|
{ |
||||||
|
$group: { |
||||||
|
_id: null, |
||||||
|
totalSales: { $sum: { $multiply: ["$price", "$quantity"] } } |
||||||
|
} |
||||||
|
} |
||||||
|
]); |
||||||
|
``` |
||||||
|
|
||||||
|
In this example, the `$multiply` operator calculates the total price for each order, and `$sum` then adds up those values to return the total sales amount. |
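The arithmetic behind this aggregation can be checked with a small in-memory sketch. The `lineItems` array is hypothetical sample data, and the `reduce` call is a plain-JavaScript stand-in for `$sum` applied to `{ $multiply: ["$price", "$quantity"] }` over a single group (`_id: null`).

```javascript
// Hypothetical order documents with price and quantity fields.
const lineItems = [
  { price: 10, quantity: 2 },
  { price: 4, quantity: 5 },
  { price: 7, quantity: 1 },
];

// For each document, multiply price by quantity ($multiply),
// then add the products together ($sum): 20 + 20 + 7.
const summed = lineItems.reduce((acc, doc) => acc + doc.price * doc.quantity, 0);

console.log(summed);
```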
||||||
|
|
||||||
|
## Caveats |
||||||
|
|
||||||
|
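It's important to note that the `$sum` operator only adds numeric values. Non-numeric values are ignored rather than causing an error, and if no numeric values are found (for example, when the field is missing from every document), `$sum` returns `0`. Keep in mind that an arithmetic expression nested inside `$sum`, such as `$multiply`, evaluates to `null` when one of its operands is `null` or missing, so that document contributes nothing to the total. You can use the `$ifNull` or `$cond` operators to supply default values in such cases. |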
||||||
|
|
||||||
|
## Conclusion |
||||||
|
|
||||||
|
The `$sum` operator is a versatile and essential tool in the aggregation pipeline. By allowing you to calculate the sum of field values or expressions, it helps you efficiently perform aggregate calculations for your MongoDB data. |