Content Index
- SQL vs. NoSQL
- Advantages and Disadvantages
- High Performance: FastAPI + MongoDB
- Flexibility vs. Consistency
- The Consistency Problem
- Key Concepts and Equivalencies
- Installation on Windows
- Installation on macOS (using Homebrew)
- Prerequisites
- Compatible macOS
- Homebrew Installation
- Necessary software to install MongoDB
- Install Command Line Tools for Xcode
- Install Homebrew
- What is a Homebrew tap?
- Command to add the tap
- Choosing the MongoDB version
- Install the MongoDB Homebrew Tap
- Graphical Interface: MongoDB Compass
- Checking the MongoDB Installation
- Starting the MongoDB Process
- Starting the Service
- Stopping or Restarting MongoDB
- Common Errors and How to Resolve Them
- Connection error on localhost
- Incompatible Version Issues
- Motor: The Asynchronous Driver for MongoDB
- Installation of Dependencies
- First Connection to MongoDB with Motor
- Client Configuration
- Cleaning up SQLAlchemy in FastAPI
- Life Cycle with Lifespan in FastAPI
- Obtaining the MongoDB Client
- Router and Schema Implementation
- From ORM to Native Driver
- CRUD Operations Step by Step
- 1. Create Task (POST)
- _id and the ObjectId
- 2. Read All Tasks (GET)
- 3. Read a Specific Task (GET by ID)
- 4. Update and Delete (PUT / DELETE)
- Verification in Mongo Compass
- Example of the Relational Schema in MongoDB
- Adding and Removing Tags
- 1. Adding Tags ($addToSet Operator)
- $addToSet + $each
- 2. Removing Tags ($pull Operator)
- Schema Flexibility: Where is the tags table?
- How tags are stored in MongoDB
- Relationships
- Normalized vs. Denormalized Schemas
- 1. Denormalized Schema (Embedded)
- 2. Normalized Schema (Referenced)
- When to use each? (1:1, 1:N and N:N)
- Mass Operations: update_one vs update_many
- Conclusion and Practice
The next step is to configure our project to use MongoDB, reusing the automated CRUD implementation and the database project structure we built earlier with FastAPI.
We will practice using a NoSQL database, specifically MongoDB, in FastAPI. Before starting, we will compare MongoDB with traditional relational (SQL) databases.
After that, we will install MongoDB. On macOS we will assume that you have Homebrew available, which is simply a package manager for macOS and Linux.
SQL vs. NoSQL
- Relational Databases (SQL): As we have seen so far, they are structured. They work like Excel tables linked to each other. They have a fixed schema; for example, if you have a "tasks" table with ID and Name and later want to save a Description, you must modify the database schema first.
- NoSQL Databases (MongoDB): These are unstructured (or semi-structured) databases. Data is stored flexibly, generally in JSON-like formats. This allows the structure to change without a prior schema migration. For example, we can add a "category" field directly to a "task" document.
Advantages and Disadvantages
Advantages:
- Flexibility: Ideal for rapid prototyping and changing schemas.
- Massive scalability: Designed to handle gigantic volumes of data.
- Speed: They tend to be more efficient for simple read/write operations.
Disadvantages:
- Less consistency: By not having a fixed schema, they can become a mess if not managed well.
- Complex queries: It is more difficult to perform joins or very intricate queries.
- Maturity: Although they are popular, the SQL ecosystem has decades more of support and stability.
High Performance: FastAPI + MongoDB
The main focus is to help you break away from the relational SQL mindset and enter the NoSQL world.
- FastAPI is a framework recognized for its extremely high performance and speed.
- MongoDB is a database designed precisely for high performance and scalability.
This combination is ideal for high-demand projects where response speed is critical.
Flexibility vs. Consistency
One of the points I will repeat most often is the trade-off of lower data consistency in favor of greater flexibility. In MongoDB, everything is a JSON (technically BSON), which gives us total freedom, but also risks.
The Consistency Problem
Imagine we have a task with a category_id field. Due to MongoDB's flexibility, you could encounter:
- A task that has the category correctly defined.
- Another task that, due to a CRUD error, does not have the category field.
- A task with directly embedded tags, without an external relational table.
If you don't manage your logic well from the code, when trying to query the category of a record that doesn't have it, your application could throw a 500 error. In MongoDB, the responsibility for maintaining data integrity falls much more on the developer and how they program their CRUD.
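As a minimal sketch of that defensive handling, assume two documents read from the same collection, one of them without category_id. The sample documents and the helper name get_category_id are hypothetical; the point is only that the code, not the database, decides what a missing field means.

```python
from typing import Any, Optional

# Two documents from the same hypothetical "tasks" collection.
# The second one is missing category_id, as in the scenario above.
tasks = [
    {"name": "Task 1", "category_id": 1},
    {"name": "Task 2"},
]


def get_category_id(task: dict[str, Any]) -> Optional[int]:
    """Return the task's category_id, or None when the field is absent."""
    # task["category_id"] would raise KeyError on the second document;
    # dict.get returns None instead, so the caller can decide what to do.
    return task.get("category_id")


category_ids = [get_category_id(task) for task in tasks]
print(category_ids)
```

Returning None (or a default category) keeps the endpoint from failing with an unhandled exception when a document has an unexpected shape.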
Key Concepts and Equivalencies
Before starting our small project, it is essential that we speak the same language. If you come from the SQL world, here are the basic equivalencies:
SQL Concept -> MongoDB Equivalent
- Table -> Collection
- Record/Row -> Document
- Column -> Field
Installation on Windows
Installation on Windows is very simple:
- Search Google for MongoDB Community Server.
- Download the installer and follow the typical steps (Next, Next, Finish).
- Environment variables configuration: You will likely need to add the installation path (usually the bin folder) to the system environment variables.
- Tip: Right-click on "This PC" -> Properties -> Advanced system settings -> Environment Variables -> Path -> Add the bin folder path.
- Restart your computer and you will be able to use the mongosh command.
Installation on macOS (using Homebrew)
Installing MongoDB on macOS might seem complicated the first time, but using Homebrew makes the process much simpler and cleaner. In this guide, I explain step-by-step how to install MongoDB on macOS with Homebrew, how to start it correctly, and how to begin working with the database by performing basic CRUD operations.
This flow is the one I always use whenever I set up a new development environment on Mac, and it avoids most of the typical errors that usually appear when starting MongoDB for the first time.
With our package manager ready, nothing could be easier; the first thing we need to do is add the MongoDB repository to our package manager.
Prerequisites
Before installing MongoDB, it is important to ensure that the system has everything it needs.
Compatible macOS
MongoDB works correctly on modern versions of macOS (Catalina onwards). If you are using a very old version, it is recommended to update the system or install a compatible version of MongoDB.
Homebrew Installation
macOS does not include Homebrew by default, and it is one of the most important tools for development on Mac. Homebrew is a package manager that allows you to install software from the terminal easily.
To install it, follow the official instructions on their website: https://brew.sh/#install
Necessary software to install MongoDB
Install Command Line Tools for Xcode
Most likely, the first time you run a brew command, macOS will ask you to install the Command Line Tools for Xcode.
Accept the message and let them install, as they are necessary to compile and run many dependencies.
Install Homebrew
macOS does not include Homebrew by default; therefore, you have to install it as indicated on the official page: https://brew.sh/#install
Homebrew lets you install the software you need on macOS from the terminal easily.
What is a Homebrew tap?
A tap is simply an additional repository that Homebrew uses to find packages. MongoDB maintains its official tap, which is important to avoid unsupported installations.
Command to add the tap
$ brew tap mongodb/brew
This step is key; many errors come from trying to install MongoDB without using the official tap.
Choosing the MongoDB version
MongoDB publishes several versions. In this case, we are going to install a specific stable version, which is the one that has given me the best results on macOS:
Install the MongoDB Homebrew Tap
Run the following in the terminal to add the official MongoDB Homebrew tap: https://github.com/mongodb/homebrew-brew
This is a custom Homebrew tap (package repository) for official MongoDB software.
$ brew tap mongodb/brew
After this, we install the latest version, which at the time of writing is:
$ brew install mongodb-community@8.2
Or you can install a specific version:
$ brew install mongodb-community@8.0
$ brew install mongodb-community@7.0
Installing a specific version avoids incompatibilities with libraries or the operating system, something that has already saved me more than one headache.
Graphical Interface: MongoDB Compass
To work in a more pleasant way and not depend only on the terminal, we will install MongoDB Compass, the official graphical interface tool.
- On Windows: It can be selected during the server installation or downloaded separately from the official website.
- On macOS (via Homebrew):
$ brew install --cask mongodb-compass
(The --cask parameter indicates that we are installing an application with a graphical interface.)
Once installed, you will find it in your Applications folder. Open it, connect to the local server, and you will be ready to manage your data collections.
You can also install it using the installer on macOS and Windows:
https://www.mongodb.com/try/download/compass
Checking the MongoDB Installation
Once the process is finished, MongoDB will be installed on your computer, but it will not be running yet.
Starting the MongoDB Process
Now that we have MongoDB on our computer, the next step is to start the process from the terminal:
$ brew services start mongodb-community
This command is fundamental. If you don't start the service and type mongo in the terminal, you will see a connection error like the following:
MongoDB shell version v8.0.2
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:372:17
This happens because MongoDB is not listening on port 27017.
Once the service has started successfully, the command:
$ mongo
will allow you to access the shell without problems. On MongoDB 6.0 and later the legacy mongo shell has been removed, so use mongosh instead. You can also check the installed server version:
$ mongod --version
Starting the Service
Just like with services such as MySQL, we must start the service before we can use it. If we try to open the Mongo shell without starting the service:
$ mongosh
We will see an error like the following:
Current Mongosh Log ID: 699c28e47c1b4855cf41cae5
Connecting to: mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.7.0
MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27017
This says that the shell tried to connect but the MongoDB server did not respond. We start the service:
$ brew services start mongodb-community@8.2
And now, if you run:
$ mongosh
It should greet you with the prompt:
test>
Stopping or Restarting MongoDB
Some useful commands I use often:
$ brew services stop mongodb-community
$ brew services restart mongodb-community
Common Errors and How to Resolve Them
Connection error on localhost
It is almost always because the service is not started. Verify with:
$ brew services list
Incompatible Version Issues
If you changed your macOS version or updated MongoDB, it might be necessary to reinstall the correct version or clean up old services.
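For the connection error on localhost described above, a quick check from Python can complement brew services list. This is a minimal sketch using only the standard library; the helper name is_port_open is hypothetical, and it only tells you whether something accepts TCP connections on the port, not that it is actually MongoDB.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (for example ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Port 27017 reachable:", is_port_open())
```

If this prints False, the service is most likely stopped and needs to be started before connecting.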
Motor: The Asynchronous Driver for MongoDB
To work with MongoDB in Python, many connectors exist, but we are going to use Motor.
Motor is a driver specifically designed to work asynchronously. As we mentioned previously, MongoDB is designed for large data volumes and high concurrency. Combined with FastAPI's asynchronous model, this lets us handle requests much more efficiently. It makes sense to use an asynchronous driver for high-load applications, which is precisely the kind of workload MongoDB is aimed at.
Installation of Dependencies
To install the tool, the process is the standard one using pip. Remember to have your virtual environment active before running the command.
Keep in mind that as we move forward, we will be deleting the dependencies we no longer need, such as everything related to SQLite and SQLAlchemy, since MongoDB does not require these libraries.
To install Motor, run:
$ pip install motor
First Connection to MongoDB with Motor
We are going to make our first connection to MongoDB. We create a file named db_connection.py. In it, we import the Motor client to manage the asynchronous connection:
db_connection.py
import logging
from motor.motor_asyncio import AsyncIOMotorClient

# Reuse uvicorn's logger so the messages show up in the server console.
logger = logging.getLogger("uvicorn.error")

# A single client shared by the whole application.
mongo_client = AsyncIOMotorClient(
    "mongodb://localhost:27017"
)

async def ping_mongo_db_server():
    try:
        await mongo_client.admin.command("ping")
        logger.info("Connected to MongoDB")
    except Exception as e:
        logger.error(
            f"Error connecting to MongoDB: {e}"
        )
        # Re-raise so the application does not start without a database.
        raise
Client Configuration
Before starting, make sure the MongoDB service is active (as we saw in the first video). If typing mongosh in your terminal gives you the welcome message, everything is in order; otherwise, you must start it.
We define the connection URL, which by default uses port 27017 (similar to how MySQL uses 3306).
We create a function to "ping" the server. If the attempt is successful, we will show a message indicating that we are connected; otherwise, we will throw an exception to warn that there are connection problems.
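If you prefer not to hard-code the connection URL, one common option is to read it from an environment variable and fall back to the local default. This is only a sketch: the variable name MONGO_URL and the helper get_mongo_url are assumptions for illustration, not part of the project.

```python
import os

# Local default used throughout this section (MongoDB's default port is 27017).
DEFAULT_MONGO_URL = "mongodb://localhost:27017"


def get_mongo_url() -> str:
    """Return the URL from MONGO_URL (hypothetical name), or the local default."""
    return os.environ.get("MONGO_URL", DEFAULT_MONGO_URL)


print(get_mongo_url())
```

The returned string could then be passed to AsyncIOMotorClient in place of the literal URL.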
Cleaning up SQLAlchemy in FastAPI
In the main file of your API, you should comment out or delete everything related to SQLAlchemy and relational databases, since we will now use MongoDB.
- Imports: Comment out the SQLAlchemy lines and the routes that depend on it.
- Dependencies: You can delete the function that managed the relational database session.
- Models: We will no longer need to create tables at startup, as MongoDB does not require a fixed predefined schema.
Life Cycle with Lifespan in FastAPI
To manage the connection efficiently, we will use the Lifespan event handler. If you weren't familiar with it, it is a simple way to control application life cycles.
- Before the yield: Everything you place here will run before the application starts receiving requests. It is the ideal place to initialize the MongoDB client.
- After the yield: Here we will place the logic to close the connection or clean up resources when the application stops.
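The ordering of those two phases can be sketched with the standard library alone. Here FakeClient is a hypothetical stand-in for the Motor client, and simulate_app_run plays the role FastAPI plays when it enters and exits the lifespan; in the real application the shutdown step would call close() on the Motor client instead.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


class FakeClient:
    """Stand-in for the Motor client so the sketch runs without a database."""

    def close(self) -> None:
        events.append("client closed")


client = FakeClient()


@asynccontextmanager
async def lifespan(app):
    # Before the yield: runs once, before the first request is served.
    events.append("startup: ping database")
    yield
    # After the yield: runs once, when the application shuts down.
    client.close()


async def simulate_app_run() -> None:
    # FastAPI enters and exits the context manager for us; here we do it by hand.
    async with lifespan(app=None):
        events.append("serving requests")


asyncio.run(simulate_app_run())
print(events)
```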
Finally, we configure this lifespan when creating the FastAPI instance:
api.py
from fastapi import FastAPI, Depends, APIRouter, Query, Path
from contextlib import asynccontextmanager
from db_connection import ping_mongo_db_server

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs before the application starts receiving requests.
    await ping_mongo_db_server()
    yield

app = FastAPI(lifespan=lifespan)
Upon starting the server, you should see the message "Connected to MongoDB" in the console. This confirms that the operation was successful and we are ready to start working with collections and documents.
Obtaining the MongoDB Client
Now, we are going to implement a service responsible for managing the connection. This service returns the database instance ready to use:
mongo_db.py
from db_connection import mongo_client

# Define the database that will contain all the collections of our application.
# MongoDB creates it lazily, on the first write, if it does not exist.
database = mongo_client.task_manager

def get_mongo_database():
    """Returns the database to be used as a dependency."""
    return database
Router and Schema Implementation
In the API file, we configure the routing under the Mongo Tasks tag:
api.py
from mongo_task import mongo_task_router
***
app.include_router(mongo_task_router, prefix="/mongo/tasks", tags=["Mongo Tasks"])
You will notice that, although the structure is similar to the one used with SQLAlchemy, there are key differences in the methods and how we handle data.
From ORM to Native Driver
Previously, we used an ORM for relational databases. In MongoDB, being a document-oriented database, the nomenclature changes:
- Instead of traditional SQL methods, we use functions like insert_one, find, update_one, or delete_one.
- Data structure: MongoDB works natively with JSON-like structures. In the case of Python, this translates into the constant use of dictionaries.
CRUD Operations Step by Step
Let's start with the initial imports:
from fastapi import APIRouter, Body, Depends, HTTPException, status, Path
# Database is used only as a type hint; at runtime Motor supplies its own asynchronous database object.
from pymongo.database import Database
from bson import ObjectId
from mongo_db import get_mongo_database
from schemes import TaskWrite

mongo_task_router = APIRouter()
1. Create Task (POST)
We convert the model to a dictionary and insert the record. It is an asynchronous operation that returns the generated ID.
mongo_task.py
# CREATE
@mongo_task_router.post("/", status_code=status.HTTP_201_CREATED, summary="Create a new task")
async def add_task(
    task: TaskWrite = Body(...),
    db: Database = Depends(get_mongo_database),
):
    """
    Creates a new task in the database.
    """
    # task_dict = task.dict()
    task_dict = task.model_dump()
    insert_result = await db.tasks.insert_one(task_dict)
    return {
        "message": "Task added successfully",
        "id": str(insert_result.inserted_id),
    }
_id and the ObjectId
When inserting your first task, you will notice that the identifier is not an incremental number (1, 2, 3...), but a hexadecimal string called an ObjectId.
Why isn't it a sequential number?
Relational databases are usually centralized, which makes it easy to keep an exact count. MongoDB, however, is designed to be distributed.
If we had several MongoDB servers running in parallel, two servers might try to assign the ID "5" at the same time, generating a conflict. The ObjectId solves this by combining three parts in 12 bytes:
- Timestamp (4 bytes): The creation time in seconds since the Unix epoch. It makes ObjectIds roughly sortable by creation time, but it does not guarantee uniqueness on its own.
- Random value (5 bytes): Generated once per process, so that different machines and processes do not collide.
- Counter (3 bytes): An incrementing counter that keeps IDs unique even when the same process creates several documents in the same second.
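To make the layout concrete, here is a minimal sketch that splits an ObjectId's 24-character hexadecimal string into a 4-byte timestamp, a 5-byte random value, and a 3-byte counter, using only the standard library. The helper name split_object_id is hypothetical, and the sample value is one of the _id strings that appears later in this section; with the bson package installed, ObjectId(...).generation_time gives the timestamp part directly.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    # Bytes 0-3: creation time in seconds since the Unix epoch (big-endian).
    seconds = int.from_bytes(raw[0:4], "big")
    return {
        "created_at": datetime.fromtimestamp(seconds, tz=timezone.utc),
        # Bytes 4-8: random value generated once per process.
        "random": raw[4:9].hex(),
        # Bytes 9-11: incrementing counter.
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = split_object_id("699c6024ba20f652d828f93c")
print(parts["created_at"].isoformat(), parts["random"], parts["counter"])
```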
2. Read All Tasks (GET)
We use the find() method. It is important to convert the cursor returned by Mongo to a list using to_list() so that FastAPI can return it as a JSON.
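One detail to watch when returning documents: each one carries an _id of type ObjectId, and depending on the FastAPI and Pydantic versions in use, that value may not be JSON-serializable as-is. A minimal sketch of one way around it is to convert _id to a string before returning. The helper serialize_document is hypothetical, and FakeObjectId is a stand-in so the sketch runs without pymongo installed.

```python
import json
from typing import Any


def serialize_document(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the document with _id converted to a plain string."""
    serialized = dict(doc)
    if "_id" in serialized:
        serialized["_id"] = str(serialized["_id"])
    return serialized


class FakeObjectId:
    """Stand-in for bson.ObjectId; str() on the real class also yields the hex string."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __str__(self) -> str:
        return self.value


docs = [{"_id": FakeObjectId("699c6024ba20f652d828f93c"), "name": "Task 1"}]
serialized_docs = [serialize_document(doc) for doc in docs]
print(json.dumps(serialized_docs))
```

In a route, the same helper could be applied to every item of the list produced by to_list() and to the result of find_one().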
mongo_task.py
# READ ALL
@mongo_task_router.get("/", summary="Get all tasks")
async def get_all_tasks(db: Database = Depends(get_mongo_database)):
    """
    Gets all tasks from the 'tasks' collection.
    """
    tasks_cursor = db.tasks.find()
    return await tasks_cursor.to_list(length=None)
3. Read a Specific Task (GET by ID)
Here we apply a double validation:
- Format validation: We check if the received string is a valid ObjectId. If it is not, we return a 400 error immediately to save resources.
- Search: If the format is correct but the record does not exist, we return a 404.
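The decision flow of this double validation can be sketched without a database. Everything here is a simplification for illustration: looks_like_object_id approximates the string case of ObjectId.is_valid (24 hexadecimal characters), and lookup_status with its in-memory dictionary is a hypothetical stand-in for the query.

```python
import re

# A string ObjectId is 24 hexadecimal characters.
_OBJECT_ID_PATTERN = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value: str) -> bool:
    """Approximate ObjectId.is_valid for string input."""
    return _OBJECT_ID_PATTERN.fullmatch(value) is not None


def lookup_status(task_id: str, stored: dict[str, dict]) -> int:
    """Return the HTTP status the endpoint would answer with."""
    if not looks_like_object_id(task_id):
        return 400  # malformed id: reject before touching the database
    if task_id not in stored:
        return 404  # well-formed id, but no such document
    return 200


stored_tasks = {"699c6024ba20f652d828f93c": {"name": "Task 1"}}
print(lookup_status("not-an-id", stored_tasks))
print(lookup_status("699d86e1701f772d79b49f03", stored_tasks))
print(lookup_status("699c6024ba20f652d828f93c", stored_tasks))
```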
mongo_task.py
# READ ONE
@mongo_task_router.get("/{task_id}", summary="Get a task")
async def get_task(
    task_id: str = Path(..., description="The ID of the task to retrieve"),
    db: Database = Depends(get_mongo_database),
):
    """
    Gets a single task by its ID.
    """
    if not ObjectId.is_valid(task_id):
        raise HTTPException(status_code=400, detail=f"Invalid ObjectId: {task_id}")
    task = await db.tasks.find_one({"_id": ObjectId(task_id)})
    if not task:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Task with id {task_id} not found"
        )
    return task
4. Update and Delete (PUT / DELETE)
In the update, we send only the fields we want to modify. For deletion, we simply search by the _id and execute delete_one. If the affected document count is zero, we report that no action was taken.
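The idea of sending only the fields to modify boils down to building a $set document that leaves out empty values. A minimal sketch, where build_set_update is a hypothetical helper that mirrors what exclude_none does on the Pydantic model before the update is sent:

```python
from typing import Any


def build_set_update(fields: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Build a $set update containing only the fields that carry a value."""
    changes = {key: value for key, value in fields.items() if value is not None}
    if not changes:
        # Mirrors the 400 "No data provided for update" case in the endpoint.
        raise ValueError("No data provided for update")
    return {"$set": changes}


update = build_set_update({"name": "Task 1 (edited)", "description": None, "status": "done"})
print(update)
```

Because $set only touches the listed keys, fields that are left out of the update keep their current value in the stored document.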
mongo_task.py
# UPDATE
@mongo_task_router.put("/{task_id}", summary="Update a task")
async def update_task(
    task_id: str = Path(..., description="The ID of the task to update"),
    task: TaskWrite = Body(...),
    db: Database = Depends(get_mongo_database),
):
    """
    Updates the fields of a task.
    """
    if not ObjectId.is_valid(task_id):
        raise HTTPException(status_code=400, detail=f"Invalid ObjectId: {task_id}")
    # update_data = task.dict(exclude_unset=True)
    update_data = task.model_dump(exclude_none=True)
    if not update_data:
        raise HTTPException(status_code=400, detail="No data provided for update")
    result = await db.tasks.update_one({"_id": ObjectId(task_id)}, {"$set": update_data})
    if result.matched_count == 0:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Task with id {task_id} not found")
    if result.modified_count == 1:
        updated_task = await db.tasks.find_one({"_id": ObjectId(task_id)})
        return updated_task
    return {"message": "The task data was the same, no update was performed."}
# DELETE
@mongo_task_router.delete("/{task_id}", status_code=status.HTTP_204_NO_CONTENT, summary="Delete a task")
async def delete_task(
    task_id: str = Path(..., description="The ID of the task to delete"),
    db: Database = Depends(get_mongo_database),
):
    """
    Deletes a task from the database.
    """
    if not ObjectId.is_valid(task_id):
        raise HTTPException(status_code=400, detail=f"Invalid ObjectId: {task_id}")
    result = await db.tasks.delete_one({"_id": ObjectId(task_id)})
    if result.deleted_count == 0:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Task with id {task_id} not found")
    return
Verification in Mongo Compass
Once the tests are executed from the FastAPI interactive documentation (Swagger UI), you can refresh Mongo Compass. You will see how the documents are stored with their flexible JSON-type structure.
This workflow demonstrates how transparent switching from a relational schema to a NoSQL one can be if you have a good architecture.
Example of the Relational Schema in MongoDB
What do we want to do? Currently, we have the Task entity, but now I want to add a list of tags to it. In this case, it is a list of strings, although it could be anything else. Here is where the "weird" part begins: if this were a pure relational schema, the relationship wouldn't be so direct. Usually, we would define a property equal to an ID found in another entity. However, I didn't do it that way here because I want the tags to be simply embedded text.
Regarding the data model (Pydantic), we simply add the tags field, which is a list. We don't have an additional table for tags, and this is where I want you to reflect:
schemes.py
class Task(BaseModel):
    name: str
    description: Optional[str] = Field("No description", min_length=5)
    status: StatusType
    tags: List[str] = []
***
class TagsUpdate(BaseModel):
    tags: List[str]
Where are we going to store those tags if there isn't an independent table like in a relational schema?
In a relational model, we would necessarily have a table for tasks and another for tags, probably with an intermediate table.
In MongoDB, no. Here we break away from that traditional schema.
MongoDB works with JSON documents. And a JSON can contain an array. That array is precisely the tags field.
Adding and Removing Tags
In the mongo_task.py file is where the main changes are. The initial part (getting data and inserting) remains the same. To make it easier to read, I will focus on the tag manipulation part, which is a bit more abstract.
mongo_task.py
# ADD TAGS
@mongo_task_router.put("/{task_id}/tags/add", summary="Add tags to a task")
async def add_tags_to_task(
    task_id: str = Path(..., description="The ID of the task to update"),
    tags_update: TagsUpdate = Body(..., example={"tags": ["new_tag_1", "new_tag_2"]}),
    db: Database = Depends(get_mongo_database),
):
    """
    Adds one or more tags to an existing task.
    Uses $addToSet to avoid duplicates in the tags array.
    """
    if not ObjectId.is_valid(task_id):
        raise HTTPException(status_code=400, detail=f"Invalid ObjectId: {task_id}")
    result = await db.tasks.update_one(
        {"_id": ObjectId(task_id)},
        {"$addToSet": {"tags": {"$each": tags_update.tags}}}
    )
    if result.matched_count == 0:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Task with id {task_id} not found")
    updated_task = await db.tasks.find_one({"_id": ObjectId(task_id)})
    return updated_task
# REMOVE TAGS
@mongo_task_router.put("/{task_id}/tags/remove", summary="Remove tags from a task")
async def remove_tags_from_task(
    task_id: str = Path(..., description="The ID of the task to update"),
    tags_update: TagsUpdate = Body(..., example={"tags": ["tag_to_remove_1", "tag_to_remove_2"]}),
    db: Database = Depends(get_mongo_database),
):
    """
    Removes one or more tags from an existing task.
    Uses $pull to remove instances of the specified tags.
    """
    if not ObjectId.is_valid(task_id):
        raise HTTPException(status_code=400, detail=f"Invalid ObjectId: {task_id}")
    result = await db.tasks.update_one(
        {"_id": ObjectId(task_id)},
        {"$pull": {"tags": {"$in": tags_update.tags}}}
    )
    if result.matched_count == 0:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Task with id {task_id} not found")
    updated_task = await db.tasks.find_one({"_id": ObjectId(task_id)})
    return updated_task
1. Adding Tags ($addToSet Operator)
I have added methods to manipulate the tags of a task using its task_id. Note that we receive an array of data all at once. In a relational schema, if you wanted to add 10 tags, you would have to do 10 insertions or a complex operation between tables. Not here; everything is done in a single operation.
To update, we use update_one with an operator called $addToSet.
Why $addToSet? Because MongoDB works with JSON format, and a JSON can have an embedded JSONArray. This operator allows us to add elements to that array while ensuring they are not repeated, and it does so atomically.
To avoid manually iterating with a 'for' loop in our code (which would be inefficient), we use the $each modifier. This allows MongoDB to iterate internally and add all values at once.
$addToSet + $each
We use the $addToSet operator along with the $each modifier.
- $addToSet adds values without duplicating them.
- $each allows internal iteration over the received array.
This avoids having to:
- Make 10 API requests.
- Manually iterate through values.
- Execute multiple operations in the database.
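To see what $addToSet with $each does to the array, here is a pure-Python simulation of its effect on a single document's tags. It is only an illustration of the semantics: add_to_set_each is a hypothetical helper, and in MongoDB the whole operation happens on the server in one update.

```python
def add_to_set_each(current: list[str], new_values: list[str]) -> list[str]:
    """Simulate {"$addToSet": {"tags": {"$each": new_values}}} on one array."""
    result = list(current)  # existing elements are left untouched
    for value in new_values:
        if value not in result:  # only values that are not present yet get appended
            result.append(value)
    return result


tags_after_add = add_to_set_each(["Tag 1", "Tag 3"], ["Tag 3", "Tag 4", "Tag 4"])
print(tags_after_add)
```

"Tag 3" is skipped because it is already present, and the repeated "Tag 4" is added only once.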
2. Removing Tags ($pull Operator)
To remove tags, the logic is similar but we use the $pull operator.
- How does it work? We receive an array with the elements to remove.
- The $in operator: It matches every element of the tags array that equals any of the values we sent.
- The $pull operator: It removes the matched elements from the array.
Internally we use $pull together with $in, which lets us compare against multiple values in one operation.
It is very flexible: if you send a tag that does not exist in the task, MongoDB simply ignores it without throwing errors, like a silent conditional.
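The same kind of pure-Python simulation shows the effect of $pull combined with $in on one document's tags. Again, pull_in is a hypothetical helper used only to illustrate the semantics; MongoDB performs the real operation on the server.

```python
def pull_in(current: list[str], to_remove: list[str]) -> list[str]:
    """Simulate {"$pull": {"tags": {"$in": to_remove}}} on one array."""
    unwanted = set(to_remove)
    # Every element equal to one of the given values is dropped;
    # values that are not in the array are simply ignored.
    return [value for value in current if value not in unwanted]


tags_after_pull = pull_in(["Tag 1", "Tag 2", "Tag 3", "Tag 4"], ["Tag 2", "Tag 9"])
print(tags_after_pull)
```

"Tag 9" was never in the array, so it has no effect and no error is raised.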
And if we check the database after performing some operations:
tasks (collection)
{
    "_id": {
        "$oid": "699c6024ba20f652d828f93c"
    },
    "name": "Task 1",
    "description": "No description",
    "status": "done",
    "category_id": 1,
    "user_id": 0,
    "id": 0,
    "tags": [
        "Tag 1",
        "Tag 3",
        "Tag 4"
    ]
}
{
    "_id": {
        "$oid": "699d86e1701f772d79b49f03"
    },
    "name": "string",
    "description": "No description",
    "status": "done",
    "tags": [
        "Tag 2",
        "Tag 3"
    ],
    "id": "string"
}
Schema Flexibility: Where is the tags table?
This is where I want you to ask yourself: Where are the tags? In the relational world, you would have a Tags table and perhaps a pivot table. Here they don't exist. The tags are embedded within the task's JSON itself.
This has advantages and disadvantages:
- The good: When you query a task, its tags come back with it in a single read, without needing to perform a JOIN. That makes reading a task together with its tags faster.
- The bad: There is no strict migration system. As you saw in the exercise, I modified the structure by adding the tags field and MongoDB didn't care; it simply started saving the new field in new or updated documents, while documents that are not touched keep their old shape.
How tags are stored in MongoDB
Remember that MongoDB stores documents in a JSON-like format (BSON):
{
    "id": 1,
    "title": "Task 1",
    "tags": ["tag1", "tag2"]
}
Here the tags are embedded within the task document itself. There is no separate table.
That has a big advantage: when we query the task, we already get all its tags in a single operation, without the need to JOIN.
This can be good or bad, depending on the use case, but in terms of performance and simplicity, it is quite efficient.
Relationships
Let's summarize quickly to reinforce the most important points. In the previous class we saw how to handle relationships in MongoDB and, although I didn't mention it explicitly, we are working with a Many-to-Many (N:N) relationship.
Why is it Many-to-Many? Let's see it in practice:
We have a task with tags 1, 3, and 4, and another task that has nothing. If we add "Tag 3" to that second task, now both share the same tag.
I know you might be wondering: "Isn't this a mess? There is no foreign key or relational index." This is where you must open your mind: MongoDB is not a relational database. You have to let go of the "table" mindset and accept that everything is a JSON document. MongoDB is, essentially, a manager that allows us to manipulate those documents with great flexibility. Even if the link is just text (a string), if the value "Tag 3" is identical in both documents, a functional relationship exists.
Normalized vs. Denormalized Schemas
In MongoDB we can follow two paths to structure data:
1. Denormalized Schema (Embedded)
It is the one we are using. We save the value directly (the tag text) inside the task.
- Advantage: You don't need pivot tables or joins. When bringing the task, you already have all the information "in one fell swoop."
- Link: The value itself is the link. It is ideal if the data does not change frequently.
2. Normalized Schema (Referenced)
It is the equivalent of the relational schema. Instead of saving the text "Tag 3", we save the ID (or ObjectId) that references a document in a separate collection called tags.
- Structure: You would have an array of IDs called tag_ids.
- Usage: It is recommended when the related entity (the tag or the user) undergoes many updates. If you change the name of a tag in its own collection, the change is reflected everywhere because the tasks only point to the ID.
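A small in-memory sketch of the referenced approach: tasks hold tag_ids, tag documents live in their own collection, and a rename in that collection is visible from every task. The dictionaries, the integer ids, and the helper tag_names are hypothetical stand-ins for two MongoDB collections; in a real database the ids would typically be ObjectIds and the lookup a second query.

```python
# Stand-in for a separate "tags" collection, keyed by id.
tags_by_id = {
    1: {"name": "Tag 1"},
    3: {"name": "Tag 3"},
}

# Stand-in for the "tasks" collection: tasks store references, not text.
tasks = [
    {"name": "Task 1", "tag_ids": [1, 3]},
    {"name": "Task 2", "tag_ids": [3]},
]


def tag_names(task: dict, tags: dict) -> list[str]:
    """Resolve a task's tag_ids into tag names."""
    return [tags[tag_id]["name"] for tag_id in task["tag_ids"]]


names_before = [tag_names(task, tags_by_id) for task in tasks]

# Renaming the tag once, in its own collection...
tags_by_id[3]["name"] = "Urgent"

# ...is reflected in every task that references it.
names_after = [tag_names(task, tags_by_id) for task in tasks]
print(names_before, names_after)
```

The price of this layout is the extra lookup needed to turn ids into names, which the embedded version avoids.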
When to use each? (1:1, 1:N and N:N)
Everything depends on your business logic and update frequency:
- One-to-Many Relationship (1:N): Like categories. If a task belongs to a category, instead of an array, you would simply have a category field which can be the name or a reference.
- One-to-One Relationship (1:1): Example: Users and Addresses. Since an address is usually unique to a user, the most logical thing is for the address schema to live embedded (within) the user object. It doesn't make sense to create a separate collection for something that won't be shared.
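As a sketch of the 1:1 case, the address can be modeled as an object nested inside the user, so the whole user is stored and read as one document. The class names, field names, and sample values below are all hypothetical; dataclasses are used only to keep the example free of dependencies, and the same shape could be written with Pydantic models.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    street: str
    city: str


@dataclass
class User:
    name: str
    address: Address  # embedded: lives inside the user document


# asdict converts nested dataclasses recursively into plain dictionaries,
# which is the shape a driver's insert_one expects.
user_document = asdict(
    User(name="Example User", address=Address(street="Example Street 1", city="Example City"))
)
print(user_document)
```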
Mass Operations: update_one vs update_many
If you use the denormalized schema and need to rename a tag in all tasks, you cannot use update_one. For that, update_many exists.
MongoDB offers these methods precisely because of its flexible nature. If you have 1,000 tasks with the tag "Old" and you want them to now say "New", you launch an update_many that looks for that value and replaces it in the entire collection at once.
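One possible starting point for that rename, offered as a sketch rather than the project's implementation: the filter matches every task whose tags array contains the old value, and the positional operator in "tags.$" replaces the first matching element in each document. The helper names are hypothetical, and rename_tag_locally only simulates the effect in memory so the example runs without a database. With Motor, the real call would be along the lines of await db.tasks.update_many(query, update).

```python
def build_rename_tag_update(old: str, new: str) -> tuple[dict, dict]:
    """Build the filter and update documents for update_many."""
    query = {"tags": old}  # matches documents whose tags array contains old
    update = {"$set": {"tags.$": new}}  # "$" is the first element matched by the filter
    return query, update


def rename_tag_locally(docs: list[dict], old: str, new: str) -> int:
    """In-memory simulation of the update; returns how many documents changed."""
    modified = 0
    for doc in docs:
        tags = doc.get("tags", [])
        if old in tags:
            tags[tags.index(old)] = new  # first occurrence only, like "tags.$"
            modified += 1
    return modified


query, update = build_rename_tag_update("Old", "New")
sample_docs = [{"tags": ["Old", "Tag 1"]}, {"tags": ["Tag 2"]}, {"name": "no tags"}]
modified_count = rename_tag_locally(sample_docs, "Old", "New")
print(query, update, modified_count)
```

Note that if a task already has the tag "New", this rename leaves it with a duplicate, so a real implementation may need an extra cleanup step.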
Conclusion and Practice
The best way to understand this is by breaking the mental schema of Excel tables. I leave it to you as a task to research or ask your assistant to generate the code for an update_many following our schema. You have the source code in the repository to compare.
Try creating a model where addresses are an embedded object or try to simulate a 1:N relationship with categories. Only by practicing will you understand when the speed of denormalized data or the integrity of normalized data suits you better.
Source code:
https://github.com/libredesarrollo/fastapi-book-course-mongodb