MongoDB Connecting data between Collections Cheatsheet

Codylillyw
3 min readDec 16, 2024

--

NOTE: In some cases, if connecting data between collections is a concern you may need to reconsider your schema design.

See Schema Design Anti-Patterns — Part 1 — YouTube for common schema design mistakes.

Data that is accessed together should be stored together.

Here are some options for how to handle connected data:

Embed data

Embed the data directly into the parent document. Nesting objects is better for $project the entirety of the contents of an object and communicating related data.

DO

  • Duplicate data between collections that is not going to change

DONT

  • Create Massive unbounded arrays (16+MB) —Try to Invert the collection to be based on array items or if frequently updated, instead link by ids.

Choosing to create new collections instead of embed the data can quickly cause your queries to lose performance the more collections you need to connect.

Extended Reference Pattern

Store some of the data in the collection that is unlikely to change and store the rest in a separate collection.

Endpoint that Fetch and return data that is stored in other collections

Use this to access inexpensive calculated values not stored in a document. If it is not frequently accessed together or the value changes frequently separate from the document this may be a good option.

Computed Pattern

Calculate and store expensive values on update rather than calculating on read or if needed based on a time increment to avoid expensive reads.

Approximation Pattern

Perform write operations only when a specified threshold is reached. Use this when having the exact value is not important.

Reference collections

Use this to access data without mixing data. If the data is only sometimes accessed together this may be a good option. Add a ref to a document in another collection and populate that value.

const student = new Schema(
{
...
school_id: {
type: ObjectId,
ref: 'School'
},
...
}
const user = await Student.findOne(...).populate('school_id');

This results in data being stored in school_id ie. school_id.name. So if you need to do this you may want to name the id school instead of school_id.

Mongoose Virtual

Another way to do this is to use a virtual. Unlike populate a virtual can be told a new name/place for the value of the ref. A ref will store the looked up document within where the id was if a populate was used instead of a virtual.

userSchema.virtual('school', {
ref: 'School',
localField: 'school_id',
foreignField: '_id'
});

Aggregate and use $lookup stage

If you need to access ref values during aggregation you would instead do a $lookup.

students.aggregate([
{
$lookup: {
from: "School",
localField: "school_id",
foreignField: "_id",
as: "school"
}
}
]);

Create a view

Views are a good choice when you need to access combined data from multiple other collections to handle frequent read-only use cases. This makes the most sense when you would need to do excessive lookups to access all the data needed for an endpoint. This avoids duplicating frequently changed data. Unlike a materialized view, it executes its underlying query every time you access it, ensuring that the data is always up to date.

db.createView("schools", "students", [
{
$lookup: {
from: "students",
localField: "_id",
foreignField: "school_id",
as: "school"
}
}
]);

On-Demand Materialized Views

On-demand materialized views provide better read performance than standard views because they are read from disk instead of computed as part of the query. This performance benefit increases based on the complexity of the pipeline and size of the data being aggregated.

It requires manual or scheduled updates to refresh the data.

Use materialized views if avoiding repeat calculations every read is more important that maintaining up to date information or you want to handle maintaining the information separate from reading.

db.bakesales.aggregate( [
{ $match: { date: { $gte: startDate } } },
{ $group: { _id: { $dateToString: { format: "%Y-%m", date: "$date" } }, sales_quantity: { $sum: "$quantity"}, sales_amount: { $sum: "$amount" } } },
{ $merge: { into: "monthlybakesales", whenMatched: "replace" } }
] );

For more information on MongoDB patterns see: Building with Patterns: A Summary | MongoDB

--

--

Codylillyw
Codylillyw

Written by Codylillyw

I am a Software engineering student in my senior year with most of my experience in web development and related technology.

No responses yet