What is Data Modelling?
Data modelling is the process of defining how your application data is structured, stored, and related in the database. A well-designed data model directly affects your application's performance, scalability, and ease of development. Poor data modelling leads to slow queries, data inconsistencies, and complex code that is hard to maintain.
In MongoDB, data modelling involves deciding:
- What schemas (document shapes) to define for each entity
- How entities relate to each other (embedding vs referencing)
- What indexes to add for fast queries
- What validation rules to enforce at the database level
---
Normalisation vs Denormalisation
| Concept | Normalisation | Denormalisation |
|---|---|---|
| Definition | Split data into separate collections, use references | Store related data together in one document |
| Storage | Less duplication | More duplication |
| Query speed | Needs $lookup (joins) — slower | Fast single-document reads |
| Update complexity | Update one place, consistent | Must update multiple places |
| Use case | Frequently updated data | Frequently read, rarely updated data |
| MongoDB example | Post with author: ObjectId | Post with author: { name, avatar } embedded |
MongoDB favours denormalisation for data that is read together frequently. Reference (normalise) data that changes often or is shared across many documents.
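The update-cost trade-off in the table can be sketched in plain JavaScript, with no database involved. The arrays below are hypothetical stand-ins for a users collection and two versions of a posts collection; the names and data are assumptions made only for illustration.

```javascript
// Hypothetical in-memory data standing in for MongoDB collections.
const users = [{ _id: 'u1', name: 'Asha', avatar: 'a.png' }];

// Normalised: each post stores only the author's id.
const referencedPosts = [
  { _id: 'p1', title: 'First', author: 'u1' },
  { _id: 'p2', title: 'Second', author: 'u1' },
];

// Denormalised: each post carries its own copy of the author's name and avatar.
const embeddedPosts = [
  { _id: 'p1', title: 'First', author: { _id: 'u1', name: 'Asha', avatar: 'a.png' } },
  { _id: 'p2', title: 'Second', author: { _id: 'u1', name: 'Asha', avatar: 'a.png' } },
];

// Renaming with references: one write to the user record, posts are untouched.
function renameReferenced(userId, newName) {
  const user = users.find((u) => u._id === userId);
  user.name = newName;
  return 1; // number of records written
}

// Renaming with embedded copies: every post by that author must be rewritten.
function renameEmbedded(userId, newName) {
  let writes = 0;
  for (const post of embeddedPosts) {
    if (post.author._id === userId) {
      post.author.name = newName;
      writes += 1;
    }
  }
  return writes;
}

const referencedWrites = renameReferenced('u1', 'Asha K');
const embeddedWrites = renameEmbedded('u1', 'Asha K');
console.log({ referencedWrites, embeddedWrites }); // { referencedWrites: 1, embeddedWrites: 2 }
```

The embedded version needs one write per post, so the cost of a rename grows with the number of copies. That is why data that changes often is usually better referenced.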
---
Mongoose Schema and Model
A Schema defines the shape of documents in a collection: fields, types, validation, and defaults. A Model is a JavaScript class compiled from a Schema. It provides methods to query and manipulate documents, such as Model.find(), Model.create(), and Model.findByIdAndUpdate().
import mongoose from 'mongoose';
const { Schema, model } = mongoose;
const userSchema = new Schema({ name: String }, { timestamps: true });
const User = model('User', userSchema);
// Collection name: 'users' (Mongoose pluralises and lowercases automatically)
---
Field Types in Mongoose
| Type | Usage | Example |
|---|---|---|
| String | Text data | name: String |
| Number | Integers and floats | price: Number |
| Boolean | true/false | isPublished: Boolean |
| Date | Date/time values | createdAt: Date |
| mongoose.Schema.Types.ObjectId | References to other documents | owner: Schema.Types.ObjectId |
| Array | List of values or subdocuments | tags: [String] |
| Mixed | Any type (use sparingly) | metadata: Schema.Types.Mixed |
| Buffer | Binary data | file: Buffer |
| Map | Key-value pairs | preferences: Map |
---
Validators in Mongoose
const productSchema = new Schema({
  name: {
    type: String,
    required: [true, 'Product name is required'],
    trim: true,
    minlength: [3, 'Name must be at least 3 characters'],
    maxlength: [100, 'Name cannot exceed 100 characters'],
  },
  price: {
    type: Number,
    required: true,
    min: [0, 'Price cannot be negative'],
  },
  category: {
    type: String,
    enum: {
      values: ['electronics', 'clothing', 'food', 'books'],
      message: '{VALUE} is not a valid category',
    },
  },
  email: {
    type: String,
    match: [/^[\w.-]+@[\w.-]+\.\w+$/, 'Invalid email format'],
  },
  rating: {
    type: Number,
    default: 0,
    min: 0,
    max: 5,
  },
});
---
Relationships: Embedded vs Referenced
Embedded Documents (Denormalised):
// Address embedded inside User — good for one-to-one stable data
const userSchema = new Schema({
  name: String,
  address: {
    street: String,
    city: String,
    state: String,
    pincode: String,
  },
});
Referenced Documents (Normalised):
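// Post references User: good for one-to-many data that is updated independently
const postSchema = new Schema({
  title: String,
  content: String,
  author: { type: Schema.Types.ObjectId, ref: 'User' },
});
const Post = model('Post', postSchema);
// Populate replaces the stored ObjectId with the author document,
// limited here to the name, email and avatar fields (`id` is the post's _id):
const post = await Post.findById(id).populate('author', 'name email avatar');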
| Criterion | Embed | Reference |
|---|---|---|
| Data changes rarely | ✅ Good | OK |
| Data is always accessed together | ✅ Ideal | Needs populate() |
| Many-to-many relationships | ❌ Avoid | ✅ Use references |
| Sub-document count can grow without bound | ❌ 16 MB document limit | ✅ Safe |
| Query single entity frequently | ✅ Fast | Needs join |
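What following a reference costs can be sketched in plain JavaScript: the stored id is resolved with a second lookup and only the requested fields are kept. This is a conceptual illustration and not Mongoose's implementation; `resolveReference`, `usersById` and the sample data are hypothetical names introduced for the example.

```javascript
// Hypothetical in-memory stand-ins for the users and posts collections.
const usersById = new Map([
  ['u1', { _id: 'u1', name: 'Asha', email: 'asha@example.com', avatar: 'a.png', passwordHash: 'x' }],
]);
const post = { _id: 'p1', title: 'Hello', content: 'First post', author: 'u1' };

// Return a copy of `doc` where the id stored at `path` is replaced by the
// referenced document, keeping only the requested fields (plus _id).
function resolveReference(doc, path, lookup, fields) {
  const target = lookup.get(doc[path]); // the extra lookup a reference costs
  if (!target) return { ...doc, [path]: null }; // dangling reference
  const selected = { _id: target._id };
  for (const field of fields) {
    if (field in target) selected[field] = target[field];
  }
  return { ...doc, [path]: selected };
}

const populated = resolveReference(post, 'author', usersById, ['name', 'email', 'avatar']);
console.log(populated.author);
// { _id: 'u1', name: 'Asha', email: 'asha@example.com', avatar: 'a.png' }
```

An embedded author would already be inside the post, so the read would need no second lookup. That difference is what the "Needs populate()" and "Needs join" entries in the table refer to.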
---
Indexing
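Indexes speed up queries that filter or sort on the indexed fields, but they slow down writes, because every insert, update, and delete must also update each affected index. Add indexes on the fields you query or sort by frequently, and avoid indexing fields you rarely filter on.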
// Single field index
userSchema.index({ email: 1 }); // Ascending
userSchema.index({ username: 1 }, { unique: true });
// Compound index
postSchema.index({ author: 1, createdAt: -1 }); // Posts by author, newest first
// Text index for search
productSchema.index({ name: 'text', description: 'text' });
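The read/write trade-off can be illustrated with a hand-maintained lookup table in plain JavaScript. This is only an analogy: MongoDB indexes are B-tree structures, not hash maps, and the collection, function names and data below are hypothetical.

```javascript
// Hypothetical in-memory collection with one hand-maintained index on `email`.
const documents = [];
const emailIndex = new Map(); // email -> position in `documents`

// Every insert pays an extra step to keep the index in sync.
function insert(doc) {
  if (emailIndex.has(doc.email)) {
    throw new Error(`Duplicate email: ${doc.email}`); // behaves like a unique index
  }
  documents.push(doc);
  emailIndex.set(doc.email, documents.length - 1);
}

// Without an index: inspect documents one by one until a match is found.
function findByEmailScan(email) {
  let examined = 0;
  for (const doc of documents) {
    examined += 1;
    if (doc.email === email) return { doc, examined };
  }
  return { doc: null, examined };
}

// With the index: jump straight to the matching document.
function findByEmailIndexed(email) {
  const position = emailIndex.get(email);
  return position === undefined
    ? { doc: null, examined: 0 }
    : { doc: documents[position], examined: 1 };
}

for (let i = 0; i < 1000; i += 1) {
  insert({ email: `user${i}@example.com`, name: `User ${i}` });
}

const scan = findByEmailScan('user999@example.com');
const indexed = findByEmailIndexed('user999@example.com');
console.log(scan.examined, indexed.examined); // 1000 1
```

The scan touches all 1000 documents to find the last one, while the indexed lookup touches one. The price is the extra bookkeeping inside `insert`, which is the write overhead described above.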
---
Virtuals and Timestamps
Virtuals are computed properties not stored in MongoDB:
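// Assumes the schema defines firstName and lastName.
// Use a regular function, not an arrow function, so `this` is the document.
userSchema.virtual('fullName').get(function () {
  return `${this.firstName} ${this.lastName}`;
});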
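A plain JavaScript getter behaves in a similar way and may help build intuition: the value is computed on access and is not part of the stored data. The `UserRecord` class is a hypothetical stand-in for this sketch, not a Mongoose model.

```javascript
// Hypothetical class standing in for a document with two stored fields.
class UserRecord {
  constructor(firstName, lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
  }

  // Computed on every access, never stored on the instance.
  get fullName() {
    return `${this.firstName} ${this.lastName}`;
  }
}

const user = new UserRecord('Ada', 'Lovelace');
console.log(user.fullName); // Ada Lovelace
console.log(JSON.stringify(user)); // {"firstName":"Ada","lastName":"Lovelace"}
```

Mongoose virtuals are likewise left out of toJSON() and toObject() output unless the schema enables them with the `virtuals: true` option.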
Timestamps automatically add createdAt and updatedAt fields:
const schema = new Schema({ /* field definitions */ }, { timestamps: true });
---
Pre/Post Hooks
Hooks (middleware) run before or after Mongoose operations:
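// Hash password before saving (assumes: import bcrypt from 'bcrypt')
userSchema.pre('save', async function () {
  // Skip re-hashing when the password field has not changed
  if (!this.isModified('password')) return;
  // An async hook signals completion by resolving, so next() is not needed
  this.password = await bcrypt.hash(this.password, 10);
});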
// Log after deletion (doc is null when no document matched the query)
userSchema.post('findOneAndDelete', function (doc) {
  if (doc) console.log(`User ${doc.email} was deleted`);
});