Database Design Best Practices: From Schema to Performance
Introduction
Database design is foundational to application performance and scalability. Poor database design leads to slow queries, data inconsistencies, and scaling challenges. Here's how to design databases right.
Schema Design Principles
1. Normalization
What is Normalization? Organizing data to reduce redundancy and improve data integrity.
Normal Forms:
- 1NF: Eliminate duplicate columns
- 2NF: Remove partial dependencies
- 3NF: Remove transitive dependencies
When to Denormalize:
- Read-heavy workloads
- Performance critical queries
- Reporting/analytics
2. Primary Keys
Best Practices:
- Use surrogate keys (auto-increment, UUID)
- Keep primary keys simple
- Avoid composite keys when possible
- Use UUIDs for distributed systems
3. Foreign Keys
Always Use Foreign Keys:
- Maintain referential integrity
- Prevent orphaned records
- Database-level constraints
Cascade Rules:
- CASCADE: Delete related records
- SET NULL: Set foreign key to null
- RESTRICT: Prevent deletion
Indexing Strategy
Types of Indexes
1. Primary Index
- Automatically created for primary key
- Unique and clustered
2. Secondary Index
- Created on frequently queried columns
- Improves SELECT performance
- Slows down INSERT/UPDATE
3. Composite Index
- Multiple columns
- Order matters (leftmost prefix)
4. Unique Index
- Enforces uniqueness
- Can be on multiple columns
Index Best Practices
Index These:
- Foreign keys
- Frequently filtered columns
- Columns in WHERE clauses
- Columns in JOIN conditions
- Columns in ORDER BY
Don't Over-Index:
- Each index slows writes
- Takes storage space
- Requires maintenance
Example:
-- Good: Index on foreign key
CREATE INDEX idx_user_id ON orders(user_id);
-- Good: Composite index for common query pattern
CREATE INDEX idx_status_date ON orders(status, created_at);
-- Bad: Index on rarely queried column
CREATE INDEX idx_color ON products(color); -- if color is rarely filtered
Query Optimization
1. Avoid SELECT *
Bad:
SELECT * FROM users;
Good:
SELECT id, name, email FROM users;
2. Use EXPLAIN
Analyze Query Plans:
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
Look For:
- Full table scans
- Missing indexes
- Expensive operations
3. Avoid N+1 Queries
Bad:
const users = await db.users.findAll();
for (const user of users) {
const orders = await db.orders.findAll({ where: { userId: user.id } });
}
Good:
const users = await db.users.findAll({
include: [{ model: Order }]
});
4. Use JOINs Efficiently
Prefer JOINs over Subqueries:
-- Good
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id;
-- Less efficient
SELECT name, (SELECT total FROM orders WHERE user_id = u.id)
FROM users u;
Performance Tuning
1. Connection Pooling
Why?
- Creating connections is expensive
- Reuse connections efficiently
Configuration:
const pool = new Pool({
max: 20, // Maximum connections
min: 5, // Minimum connections
idleTimeoutMillis: 30000
});
2. Query Caching
When to Cache:
- Expensive queries
- Frequently accessed data
- Data that doesn't change often
Implementation:
- Application-level caching (Redis)
- Database query cache
- CDN for static data
3. Partitioning
When to Partition:
- Very large tables
- Time-based data
- Geographic data
Types:
- Range partitioning
- Hash partitioning
- List partitioning
4. Read Replicas
Benefits:
- Distribute read load
- Improve performance
- Geographic distribution
Use Cases:
- Read-heavy workloads
- Reporting/analytics
- Backup queries
Data Types
Choose Appropriate Types:
- Use smallest type that fits
- Use INT instead of BIGINT when possible
- Use VARCHAR with appropriate length
- Use DATE/TIME types, not strings
Example:
-- Good
created_at TIMESTAMP
status ENUM('active', 'inactive')
price DECIMAL(10, 2)
-- Bad
created_at VARCHAR(50)
status VARCHAR(20)
price VARCHAR(20)
Migration Strategy
Best Practices:
- Version control migrations
- Test migrations in staging
- Backward compatible changes
- Rollback strategy
- Zero-downtime migrations
Tools:
- Liquibase
- Flyway
- Rails migrations
- Custom scripts
Monitoring
Key Metrics:
- Query execution time
- Slow query log
- Connection pool usage
- Database size
- Index usage
Tools:
- Database monitoring tools
- APM solutions
- Custom dashboards
Conclusion
Good database design is crucial for application performance and scalability. Focus on proper normalization, strategic indexing, query optimization, and continuous monitoring. Remember: premature optimization is the root of all evil, but good design prevents most problems.
*Need help optimizing your database? [Contact us](/schedule-appointment) for a database audit.*