Databases and Storage

Relational vs NoSQL Databases
Database Design Principles
CAP Theorem

Relational vs NoSQL Databases

Relational Databases

Structured Data: Tables with rows, columns, and fixed schema

Examples: MySQL, PostgreSQL

ACID Properties: Ensuring data integrity in transactions

Atomicity: All or nothing
Consistency: Data follows rules
Isolation: Transactions don't interfere
Durability: Data persists after commit

SQL: Uses SQL for queries with joins and functions

Vertical Scaling: Usually scaled by giving a single server more CPU, RAM, or storage; adding more servers (horizontal scaling) is possible but harder

Use Cases: Structured data, complex relationships, transactional consistency

NoSQL Databases

Flexible Schema: Schema-less or dynamic schema, useful for semi-structured or unstructured data

Examples: MongoDB, DynamoDB, Redis, Neo4j, ClickHouse

Horizontal Scalability: Easy to distribute across multiple servers

Eventual Consistency: Many NoSQL databases favor eventual consistency over strict ACID guarantees. This improves scalability and performance, but reads can briefly return stale data until the replicas converge

Query Flexibility: The query interface depends on the type of database

Document DBs use JSON-like queries
Graph DBs use traversal queries

Use Cases: Rapidly changing data, unstructured data

Comparison Table

Aspect	Relational Databases	NoSQL Databases
Data Model	Tabular (Rows & Columns)	Document, Key-Value, Wide-Column, Graph
Schema	Fixed/Predefined Schema	Dynamic/Schema-less
Scalability	Vertical Scaling (Upgrading hardware)	Horizontal Scaling (Adding more nodes)
Query Language	SQL (Structured Query Language)	Varies (e.g., MongoDB query language, CQL for Cassandra)
Consistency	Strong consistency with ACID transactions	Eventual consistency (in many cases)
Ideal For	Structured data and complex relationships	Big data, unstructured data, and high throughput scenarios

Example 1: E-commerce Platform using Relational Database

Scenario

You have an e-commerce website where users view products, place orders, and make payment transactions. Here data is structured and transactions are critical.

Why Relational?

ACID Transactions:

Order placement, payment processing, and inventory updates need to be atomic
If payment process fails, the entire transaction gets rolled back

Structured Data & Joins:

Clear relationships exist between Users, Orders, Products, and Payments
Complex joins help you easily retrieve user order history or product details

Design Details

Tables:

Users: UserID, Name, Email, Address, etc.
Products: ProductID, Name, Description, Price, Stock, etc.
Orders: OrderID, UserID (Foreign Key), OrderDate, Status, etc.
Order_Items: OrderItemID, OrderID (Foreign Key), ProductID (Foreign Key), Quantity, Price, etc.
Payments: PaymentID, OrderID (Foreign Key), PaymentMethod, Amount, PaymentStatus, etc.

Transaction Example: When a user places an order:

A new order record is created
Order_Items are inserted
Payment process is initiated
Inventory is updated

All of these steps run in a single transaction, so either every change is committed or none is, and the data stays consistent.
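The four steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration rather than a production design: the trimmed-down columns, the `make_db` and `place_order` helpers, and the `payment_ok` flag (a stand-in for a real payment gateway call) are all assumptions made for the example.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a cut-down version of the tables above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price REAL, Stock INTEGER);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, UserID INTEGER, Status TEXT);
        CREATE TABLE Order_Items (OrderItemID INTEGER PRIMARY KEY, OrderID INTEGER,
                                  ProductID INTEGER, Quantity INTEGER, Price REAL);
        CREATE TABLE Payments (PaymentID INTEGER PRIMARY KEY, OrderID INTEGER,
                               Amount REAL, PaymentStatus TEXT);
        INSERT INTO Products VALUES (1, 'Smartphone X', 699, 5);
    """)
    return conn

def place_order(conn, user_id, product_id, quantity, payment_ok):
    """Run all order steps in one transaction; return the OrderID, or None on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            price, stock = conn.execute(
                "SELECT Price, Stock FROM Products WHERE ProductID = ?", (product_id,)
            ).fetchone()
            if stock < quantity:
                raise ValueError("insufficient stock")
            order_id = conn.execute(
                "INSERT INTO Orders (UserID, Status) VALUES (?, 'PLACED')", (user_id,)
            ).lastrowid
            conn.execute(
                "INSERT INTO Order_Items (OrderID, ProductID, Quantity, Price) "
                "VALUES (?, ?, ?, ?)",
                (order_id, product_id, quantity, price),
            )
            if not payment_ok:  # hypothetical stand-in for a payment gateway failure
                raise RuntimeError("payment failed")
            conn.execute(
                "INSERT INTO Payments (OrderID, Amount, PaymentStatus) VALUES (?, ?, 'PAID')",
                (order_id, price * quantity),
            )
            conn.execute(
                "UPDATE Products SET Stock = Stock - ? WHERE ProductID = ?",
                (quantity, product_id),
            )
            return order_id
    except (ValueError, RuntimeError):
        return None  # the rollback already undid the partial inserts
```

Calling `place_order(conn, 1, 1, 2, payment_ok=False)` leaves no order row behind and the stock unchanged, which is the atomicity property in practice.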

Example 2: Social Media Feed using NoSQL Database

Scenario

You have a social media platform where users share posts, make comments, and give likes. Data is semi-structured and changes rapidly.

Why NoSQL?

Flexible Schema:

Each post can have different information (images, videos, text, comments)
Schema-less nature allows you to easily make changes and updates without rigid structure

Horizontal Scalability:

Millions of users continuously write and read data
NoSQL databases (like MongoDB, Cassandra) easily distribute data across multiple nodes

Eventual Consistency:

Small delays in real-time feeds are acceptable
If a like or comment doesn't update immediately, it doesn't majorly impact the system

Design Details

Document Store Example (MongoDB):

Collection: Posts

{
  "_id": "postId123",
  "userId": "user456",
  "content": "This is a post",
  "media": ["image1.jpg", "video1.mp4"],
  "comments": [
    {
      "userId": "user789",
      "comment": "Nice post!",
      "timestamp": "2025-02-20T10:00:00Z"
    }
    // ... more comments
  ],
  "likes": 125,
  "timestamp": "2025-02-20T09:45:00Z"
}

Key Points:

Each post can have embedded comments inside it
Data structure is dynamic; you can easily add extra fields in the future
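To make the two key points concrete, here is a small sketch that treats the post as a plain Python dictionary instead of talking to a real MongoDB server. The `add_comment` helper and the `hashtags` field are hypothetical and exist only for illustration.

```python
import json
from datetime import datetime, timezone

post = {
    "_id": "postId123",
    "userId": "user456",
    "content": "This is a post",
    "media": ["image1.jpg", "video1.mp4"],
    "comments": [],
    "likes": 125,
}

def add_comment(doc, user_id, text, when=None):
    """Append an embedded comment to the post document in place."""
    when = when or datetime.now(timezone.utc)
    doc["comments"].append(
        {"userId": user_id, "comment": text, "timestamp": when.isoformat()}
    )

add_comment(post, "user789", "Nice post!")

# A later feature adds a new field to new posts; old documents need no migration.
post["hashtags"] = ["launch"]  # hypothetical field, not part of the original schema

# Readers must tolerate older documents that predate the field.
old_post = {"_id": "postId001", "content": "An older post"}
tags = old_post.get("hashtags", [])

serialized = json.dumps(post)  # the document round-trips as JSON
```

The flexibility has a cost: because the database does not enforce the shape, the application code has to handle missing fields, as the `.get` call does here.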

Scalability & Caching:

Horizontal Sharding: Distribute data across different shards to handle large volumes
Caching Layer (Redis): Cache popular posts and user feeds to improve real-time performance
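A rough sketch of how sharding and a cache can fit together is shown below. It uses in-memory dictionaries in place of real database shards and Redis, and the shard count, the `shard_for` function, and the `PostStore` class are assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the sketch

def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard using a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class PostStore:
    """Dictionaries stand in for the database shards and for the Redis cache."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [{} for _ in range(num_shards)]
        self.cache = {}
        self.db_reads = 0  # counts how often a read had to go to a shard

    def put(self, post_id, doc):
        self.shards[shard_for(post_id, len(self.shards))][post_id] = doc
        self.cache.pop(post_id, None)  # invalidate any stale cached copy

    def get(self, post_id):
        # Cache-aside: try the cache first, fall back to the owning shard.
        if post_id in self.cache:
            return self.cache[post_id]
        self.db_reads += 1
        doc = self.shards[shard_for(post_id, len(self.shards))].get(post_id)
        if doc is not None:
            self.cache[post_id] = doc
        return doc
```

Simple modulo hashing moves most keys when the shard count changes, which is why real deployments often use consistent hashing or range-based sharding instead.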

Real-Time Feed Generation:

Quickly aggregate users' follower graphs and recent activities to generate personalized feeds
NoSQL's flexible data model and high write throughput are ideal for this use case

Database Design Principles

Overview

Three key principles: Normalization, Denormalization, and Indexing.

Refer to SQL notes → sql101 for better understanding.

Normalization

Purpose: Reduce data redundancy and maintain data integrity

Types: 1NF, 2NF, 3NF, Boyce-Codd NF

Denormalization

Purpose: The opposite of normalization. Redundant copies of data are stored deliberately, trading extra storage and more complex writes for simpler queries and faster reads that avoid joins.

Indexing

Definition: An index is an auxiliary data structure that lets the database locate rows without scanning the whole table. It speeds up retrieval by mapping column values to pointers to the matching rows.

Trade-offs:

Pros:

Effective for queries that use WHERE clause or JOIN operations

Cons:

Inserts, updates, and deletes become slower because every affected index must be updated too, and each index takes extra storage

Types of Indexes:

B-tree Index: Balanced tree structure
Hash Index: Exact match queries
Composite Index: Indexing multiple columns
Unique Index: Does not allow duplicate values

1. Normalization in E-commerce Order Management

Scenario: Your e-commerce platform manages orders, customers, products, and payments. Data integrity and consistency are most important because any inconsistency can cause financial loss or customer dissatisfaction.

Implementation Details:

Tables Structure (Normalized):

Customers Table: CustomerID, Name, Email, etc.
Addresses Table: AddressID, CustomerID (foreign key), Street, City, State, etc.
Products Table: ProductID, Name, Description, Price, Stock, etc.
Orders Table: OrderID, CustomerID (foreign key), OrderDate, TotalAmount, etc.
OrderItems Table: OrderItemID, OrderID (foreign key), ProductID (foreign key), Quantity, UnitPrice, etc.
Payments Table: PaymentID, OrderID (foreign key), PaymentMethod, PaymentStatus, etc.

Benefits:

Reduced Redundancy: Customer details and addresses are stored in separate tables, avoiding duplicate data
Data Integrity: When updating (like address change), only one table needs to be updated, ensuring consistency
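The address-change benefit can be demonstrated with a short sqlite3 sketch. The sample rows, the cut-down column lists, and the assumption of one address per customer are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
    CREATE TABLE Addresses (
        AddressID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        City TEXT
    );
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
        OrderDate TEXT,
        TotalAmount REAL
    );
    INSERT INTO Customers VALUES (1, 'Asha', 'asha@example.com');
    INSERT INTO Addresses VALUES (10, 1, 'Mumbai');
    INSERT INTO Orders VALUES (100, 1, '2025-02-01', 699.0);
    INSERT INTO Orders VALUES (101, 1, '2025-02-15', 49.5);
""")

# The address lives in exactly one row, so one UPDATE is enough.
conn.execute("UPDATE Addresses SET City = ? WHERE CustomerID = ?", ("Pune", 1))

# Every order sees the new city, because orders store only the CustomerID.
rows = conn.execute("""
    SELECT o.OrderID, a.City
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID
    JOIN Addresses AS a ON a.CustomerID = c.CustomerID
    ORDER BY o.OrderID
""").fetchall()
```

Had the city been copied into each order row, the same change would need one update per order, and a missed row would leave the data inconsistent.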

2. Denormalization for Read-Heavy Features (Product Recommendations)

Scenario: E-commerce platforms have high read operations—like personalized product recommendations, fast search results, and product reviews. Here multiple joins (from normalized structure) can slow down query performance.

Implementation Details:

Denormalized Data Structure:

Product Catalog Cache Document (NoSQL):

{
  "ProductID": "P123",
  "Name": "Smartphone X",
  "Price": 699,
  "Category": "Electronics",
  "AverageRating": 4.5,
  "Reviews": [
    {"UserID": "U1", "Review": "Great phone!", "Rating": 5},
    {"UserID": "U2", "Review": "Worth the price.", "Rating": 4}
  ],
  "Recommendations": ["P124", "P125"]
}

Benefits:

Fast Reads: All product details, reviews, and recommendations come from one document—no joins needed
Performance Optimization: Pre-aggregated data for heavy read operations improves response time
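One way such a document could be produced is by pre-joining normalized records ahead of time. The sketch below does this with in-memory Python data; the `build_catalog_doc` function and the sample records are hypothetical.

```python
from statistics import mean

# Normalized source data, as it might come out of separate tables.
products = {"P123": {"Name": "Smartphone X", "Price": 699, "Category": "Electronics"}}
reviews = [
    {"ProductID": "P123", "UserID": "U1", "Review": "Great phone!", "Rating": 5},
    {"ProductID": "P123", "UserID": "U2", "Review": "Worth the price.", "Rating": 4},
]
recommendations = {"P123": ["P124", "P125"]}

def build_catalog_doc(product_id):
    """Pre-join and pre-aggregate everything a product page needs into one document."""
    product_reviews = [r for r in reviews if r["ProductID"] == product_id]
    ratings = [r["Rating"] for r in product_reviews]
    return {
        "ProductID": product_id,
        **products[product_id],
        "AverageRating": round(mean(ratings), 1) if ratings else None,
        "Reviews": [
            {k: r[k] for k in ("UserID", "Review", "Rating")} for r in product_reviews
        ],
        "Recommendations": recommendations.get(product_id, []),
    }

doc = build_catalog_doc("P123")
```

The read path becomes a single lookup, but the document has to be rebuilt or patched whenever a review, price, or recommendation changes, which is the write-side cost of denormalization.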

3. Indexing for Faster Query Performance

Scenario: E-commerce website users frequently do product search, order history lookup, and filtering operations. Indexes are critical for fast retrieval.

Implementation Details:

Examples of Indexing:

B-Tree Index on OrderDate:
- If queries on the Orders table sort with ORDER BY OrderDate or filter on a date range, an index on this column lets the database find the matching rows without scanning the whole table
Composite Index on (Category, Price):
- If users filter the Products table by category and then by a price range, a composite index on both columns speeds up the query
Unique Index on Email:
- A unique index on Email ensures that no duplicate email address can be registered in the Customers table

Benefits:

Quick Data Access: Indexes provide pointers directly to data rows, making search and filter operations very fast
Optimized Query Execution: Using indexes in joins and WHERE clauses improves overall performance
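The composite and unique indexes described above can be tried out with sqlite3. This is only a sketch: the index names are made up, and other database engines report query plans in a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Email TEXT);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT,
                           Category TEXT, Price REAL);
    CREATE UNIQUE INDEX idx_customers_email ON Customers (Email);
    CREATE INDEX idx_products_category_price ON Products (Category, Price);
    INSERT INTO Customers (Email) VALUES ('a@example.com');
""")

# Ask SQLite how it would run a category + price-range filter.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT Name FROM Products WHERE Category = ? AND Price BETWEEN ? AND ?",
    ("Electronics", 100, 700),
).fetchall()
plan_text = " ".join(row[3] for row in plan)  # the fourth column holds the description

# The unique index rejects a second registration with the same email.
try:
    conn.execute("INSERT INTO Customers (Email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Printing `plan_text` shows whether the planner chose the composite index; column order matters, since an index on (Category, Price) does not help a query that filters on Price alone nearly as much.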

CAP Theorem

The CAP theorem is a fundamental principle of distributed systems design. It states that a distributed data store cannot guarantee all three of Consistency, Availability, and Partition Tolerance at the same time. In practice this means that when a network partition occurs, the system must give up either consistency or availability.

The Three Properties

Consistency (C):

Every read operation should return the latest write value or an error
If data is updated, it should immediately reflect on every node

Availability (A):

Every request should get a response regardless of whether it's latest data or stale data
Every node that is still running should answer requests instead of timing out or returning an error

Partition Tolerance (P):

System should work regardless of network partitions or communication failures
If any nodes lose network connectivity, the overall system should not be affected

Scenario: Online Retail Platform During a Major Sale Event

Context: Think of an online retailer (like Amazon) that has multiple data centers across different regions. This platform serves millions of users during major sale events (e.g., Black Friday).

Network Partition Situation: Suppose an unexpected network issue temporarily breaks connectivity between two data centers (or their clusters). Now the system has to decide:

1. Consistency over Availability:

What Happens:

System ensures all data centers have the same accurate data (like inventory count) before processing any orders
If data centers can't sync, some regions won't process orders or will have delays

Pros:

Data integrity is maintained; no overselling or inventory mismatch

Cons:

Customers face delays or errors completing transactions, which can hurt the sale experience

2. Availability over Consistency:

What Happens:

System keeps service available and continues processing orders, even if some data (like real-time inventory) is slightly outdated
Data eventually synchronizes once the network partition is resolved

Pros:

Customers can shop without interruption and place orders, improving user experience

Cons:

Data inconsistency can happen—for example, a product's inventory count might be temporarily incorrect, risking overselling
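The two choices can be compared with a toy simulation of one inventory counter replicated in two regions. The `Store` and `Replica` classes, the region names, and the merge rule in `heal` are assumptions made for the sketch; real systems use quorums, consensus protocols, or conflict-resolution schemes that are far more involved.

```python
class Replica:
    def __init__(self, stock):
        self.stock = stock
        self.pending = 0  # units sold locally that the other replica has not seen

class Store:
    """Two replicas of one inventory counter; mode is 'CP' or 'AP'."""

    def __init__(self, stock, mode):
        self.mode = mode
        self.replicas = {"east": Replica(stock), "west": Replica(stock)}
        self.partitioned = False

    def order(self, region, qty=1):
        local = self.replicas[region]
        if self.partitioned:
            if self.mode == "CP":
                return False  # refuse the order rather than let the counts diverge
            if local.stock < qty:
                return False
            local.stock -= qty  # AP: accept using the local, possibly stale, count
            local.pending += qty
            return True
        if local.stock < qty:
            return False
        for replica in self.replicas.values():  # healthy network: update both sides
            replica.stock -= qty
        return True

    def heal(self):
        """Reconnect and apply each side's unsynced sales to the other side."""
        self.partitioned = False
        east, west = self.replicas["east"], self.replicas["west"]
        east.stock -= west.pending
        west.stock -= east.pending
        east.pending = west.pending = 0
        return east.stock  # a negative value means units were oversold

cp = Store(stock=1, mode="CP")
cp.partitioned = True
cp_accepted = cp.order("east")

ap = Store(stock=1, mode="AP")
ap.partitioned = True
ap_accepted = [ap.order("east"), ap.order("west")]
stock_after_heal = ap.heal()
```

With one unit in stock, the CP store turns the customer away during the partition, while the AP store accepts an order in each region and ends up one unit oversold after the replicas reconcile.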

Real-World Implications

Generally, network partitions are unpredictable, so practically you can't compromise partition tolerance. Therefore, when designing distributed systems, you often have to choose a trade-off between consistency and availability.

Consistency Choice:

Banking systems and financial transactions treat consistency as critical. In online retail, too, some delay can be acceptable when it keeps data such as inventory accurate

Availability Choice:

E-commerce platforms generally prioritize availability during high traffic events, so customer experience isn't impacted. But in this case, some inconsistency (like delayed inventory updates) is tolerated, with the expectation that data eventually becomes consistent
