Chapter 4 — NoSQL Databases

Lab Assignments

5 Labs on Real-World Use Cases

Each lab is grounded in a concrete Algerian business context. You are not implementing abstract examples — you are building real architectures.

⚡

Lab 1 · Redis

E-Commerce Caching System

redis:7-alpine

🛒 ShopFast DZ — Speed up an Algerian e-commerce platform where product pages take 3–4 seconds to load.

Hash, List, Set, Sorted Set for products & carts
Cache-Aside pattern with configurable TTL
User sessions with sliding expiration
Real-time best-seller leaderboard
Pipelines & MULTI/EXEC transactions

🏥

Lab 2 · MongoDB

Digital Medical Records

mongo:6.0

🏥 HealthCare DZ — Replace 12 relational tables with flexible patient document records.

Embedding vs Referencing modelling decisions
$jsonSchema validation & constraints
Advanced queries with projection
Medical aggregation pipelines
Compound indexes & explain()

⚙️

Lab 3 · Cassandra

IoT Electric Grid Sensors

cassandra:4.1

⚡ SmartGrid DZ — Ingest 10,000 measurements per minute from sensors monitoring Algeria's power grid.

Partition Key & Clustering Key for time series
Batch ingestion: 50,000 rows with Python
CQL: advanced time-range queries
TimeWindowCompactionStrategy (TWCS)
Per-row TTL for automatic expiry of old measurements

🎓

Lab 4 · Neo4j

University Social Network

neo4j:5.13 + GDS

🎓 UniConnect DZ — Connect students across Algerian universities (USTHB, UMBB, USTO…).

Graph model: Student, Course, Club, Skill
Cypher: MATCH, MERGE, UNWIND, WITH
Shortest path with shortestPath()
GDS algorithms: Louvain, PageRank, Centrality
Contact recommendations (Jaccard similarity)

📊

Lab 5 · Benchmark

Comparative Performance Analysis

Redis + MongoDB + Cassandra

📊 Produce a decision-support report: which database should you choose for which workload?

Write benchmark: 100,000 records × 3 databases
Read latencies P50, P95, P99
Load test: 50 concurrent clients
Argued recommendation table

Theory Review

The 4 NoSQL Families

Each family solves a problem that relational databases handle poorly. The right choice depends on your data structure and your target query patterns.

⚡ Key-Value

The simplest structure. O(1) access by key. Ideal for caching, sessions, counters and leaderboards. Redis adds rich data structures: Hash, List, Set, Sorted Set.

🏥 Document

Data stored as flexible JSON documents. No rigid schema. Perfect for complex entities with variable attributes (medical records, user profiles, product catalogs).

⚙️ Column-Family

Rows are grouped into partitions, and each row can hold its own set of columns. Writes are spread across nodes in parallel with no single contention point. The golden rule: model tables around queries, not entities.

🎓 Graph

Relationships are first-class citizens. Multi-hop traversals (friends of friends, shortest path) are native and efficient — where SQL would require recursive JOINs.

Cache-Aside Pattern in practice (Lab 1)

Python · ex3_cache.py

import json

def get_product_cached(r, product_id, ttl=600):
    """Cache-Aside: try Redis first, fall back to the database on a miss."""
    cache_key = f"product_cache:{product_id}"
    cached = r.get(cache_key)

    if cached is not None:  # Cache HIT → instant return (<1ms)
        return json.loads(cached)

    # Cache MISS → slow DB query (2s)
    product = slow_db_get_product(product_id)

    if product:  # Store in Redis with a TTL in seconds; unknown products are not cached
        r.setex(cache_key, ttl, json.dumps(product))

    return product

MongoDB Aggregation Pipeline (Lab 2)

MongoDB · ex3_aggregation.js

db.patients.aggregate([
  // Filter first: only Algiers patients are unwound, and an index on address.city can be used
  { $match: { "address.city": "Algiers" } },
  { $unwind: "$consultations" },
  { $group: {
      _id: "$consultations.diagnosis",
      count: { $sum: 1 },
      doctors: { $addToSet: "$consultations.doctor.name" }
  }},
  { $sort: { count: -1 } },
  { $limit: 10 }
])
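
To make the role of each stage concrete, here is a plain-Python sketch of what the pipeline computes. It does not talk to MongoDB; the three sample patient documents and the function name `top_diagnoses` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents shaped like the patients collection.
patients = [
    {"address": {"city": "Algiers"}, "consultations": [
        {"diagnosis": "Hypertension", "doctor": {"name": "Dr. A"}},
        {"diagnosis": "Diabetes", "doctor": {"name": "Dr. B"}},
    ]},
    {"address": {"city": "Algiers"}, "consultations": [
        {"diagnosis": "Hypertension", "doctor": {"name": "Dr. B"}},
    ]},
    {"address": {"city": "Oran"}, "consultations": [
        {"diagnosis": "Hypertension", "doctor": {"name": "Dr. C"}},
    ]},
]


def top_diagnoses(docs, city, limit=10):
    groups = defaultdict(lambda: {"count": 0, "doctors": set()})
    for doc in docs:
        if doc["address"]["city"] != city:              # $match
            continue
        for consultation in doc.get("consultations", []):  # $unwind
            group = groups[consultation["diagnosis"]]      # $group on diagnosis
            group["count"] += 1                             # $sum: 1
            group["doctors"].add(consultation["doctor"]["name"])  # $addToSet
    ranked = sorted(groups.items(),
                    key=lambda item: item[1]["count"],
                    reverse=True)                           # $sort: { count: -1 }
    return [{"_id": diagnosis, **group}
            for diagnosis, group in ranked[:limit]]         # $limit


result = top_diagnoses(patients, "Algiers")
for row in result:
    print(row["_id"], row["count"], sorted(row["doctors"]))
```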

Neo4j Cypher Query (Lab 4)

Cypher · ex3_graph_algorithms.cypher

// Shortest path between two students
MATCH p = shortestPath(
  (a:Student {name: "Ahmed"})-[:KNOWS*..10]-(b:Student {name: "Yasmina"})
)
RETURN [n IN nodes(p) | n.name + " (" + n.university + ")"] AS path,
       length(p) AS hops

Decision Guide

Which Database Should You Choose?

The choice of a NoSQL database depends on data access patterns, not on technology popularity.

Technology	Model	Strength	Ideal Use Case	Avoid When…
Redis	Key-Value	Sub-millisecond latency, rich in-memory structures	Cache, sessions, leaderboards, rate limiting	Large volumes of persistent data
MongoDB	Document	Flexible schema, powerful aggregation	Profiles, catalogs, complex records	Critical multi-collection transactions
Cassandra	Column-Family	Massively scalable writes, no SPOF	IoT, logs, time series, analytics	Ad-hoc queries (ALLOW FILTERING)
Neo4j	Graph	Native multi-hop traversals	Social networks, recommendations, fraud	Tabular data with no complex relationships

Frequently Asked Questions

Lab FAQ

The most common questions about the labs and their theoretical background.

A Redis Sorted Set automatically keeps elements sorted by score. Incrementing a product's score (ZINCRBY) is O(log N), and retrieving the top 10 (ZREVRANGE) is O(log N + M), where M = 10 is the number of elements returned. Doing the same in SQL would require a table, an index, a COUNT, and an ORDER BY, which is far more expensive on writes.
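
A minimal sketch of such a leaderboard follows. `record_sale` and `top_products` use the redis-py method names `zincrby(name, amount, value)` and `zrevrange(name, start, end, withscores=...)`; the key name `bestsellers`, the product IDs, and the `FakeSortedSet` class are assumptions made so the example runs without a Redis server.

```python
class FakeSortedSet:
    """Hypothetical in-memory stand-in mimicking zincrby() and zrevrange()."""

    def __init__(self):
        self._scores = {}  # (set name, member) -> score

    def zincrby(self, name, amount, value):
        key = (name, value)
        self._scores[key] = self._scores.get(key, 0.0) + amount
        return self._scores[key]

    def zrevrange(self, name, start, end, withscores=False):
        members = [(m, s) for (n, m), s in self._scores.items() if n == name]
        # Highest score first; ties broken by member name, also descending.
        members.sort(key=lambda ms: (ms[1], ms[0]), reverse=True)
        stop = None if end == -1 else end + 1  # Redis ranges are inclusive
        window = members[start:stop]
        return window if withscores else [m for m, _ in window]


LEADERBOARD = "bestsellers"  # hypothetical key name


def record_sale(r, product_id, quantity=1):
    """Add the sold quantity to the product's score."""
    return r.zincrby(LEADERBOARD, quantity, product_id)


def top_products(r, n=10):
    """Return the n best-selling products with their scores."""
    return r.zrevrange(LEADERBOARD, 0, n - 1, withscores=True)


r = FakeSortedSet()
for product_id, qty in [("p:1", 3), ("p:2", 5), ("p:1", 4), ("p:3", 1)]:
    record_sale(r, product_id, qty)

top2 = top_products(r, 2)
print(top2)
```

With a real client, the same two functions should work unchanged when `r` is a `redis.Redis` instance created with `decode_responses=True`.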

The general rule for choosing between embedding and referencing in MongoDB:

Embed when data is always accessed together, the sub-document has a bounded size (no unbounded growth), and atomicity is required.
Reference when the sub-document can grow indefinitely (e.g. all orders for a customer), data is accessed independently, or it is shared across multiple parent documents.

In HealthCare DZ: recent consultations are embedded (frequent access), but lab analyses are referenced (potentially unlimited volume).
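
The two shapes can be sketched with plain Python dictionaries. All field names, IDs and values below are hypothetical and are not taken from the lab's actual schema; the point is only the difference between reading an embedded array and following a reference.

```python
# Embedded: recent consultations live inside the patient document.
patient = {
    "_id": "patient-001",
    "name": "Sample Patient",
    "consultations": [
        {"date": "2024-01-15", "diagnosis": "Hypertension"},
        {"date": "2024-03-02", "diagnosis": "Follow-up"},
    ],
}

# Referenced: lab analyses sit in their own collection and point back
# to the patient through patient_id, so the list can grow without limit.
lab_analyses = [
    {"_id": "lab-1", "patient_id": "patient-001", "test": "Blood glucose"},
    {"_id": "lab-2", "patient_id": "patient-001", "test": "Lipid panel"},
    {"_id": "lab-3", "patient_id": "patient-002", "test": "Blood glucose"},
]


def analyses_for(patient_id, collection):
    """Second lookup needed to resolve the reference."""
    return [doc for doc in collection if doc["patient_id"] == patient_id]


latest = patient["consultations"][-1]["diagnosis"]  # one document read
labs = analyses_for(patient["_id"], lab_analyses)   # separate query
print(latest, [doc["test"] for doc in labs])
```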

When a query does not restrict the Partition Key, ALLOW FILTERING forces Cassandra to scan ALL partitions in the cluster to find matching rows: it is a distributed full table scan. Over 10 million rows spread across 10 nodes, it reads 100% of the data to perhaps return 10 rows. In production, this saturates nodes and degrades performance for all concurrent queries.

The solution: create a dedicated table for each frequent query, with the correct Partition Key.
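
The difference can be illustrated with a plain-Python sketch in which each "table" is a dictionary keyed by its partition key. This is only a model of the idea, not Cassandra code; the column names `sensor_id` and `region`, the data, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical readings: 50 sensors spread over 10 regions.
readings = [
    {"sensor_id": f"s{i % 50}", "region": f"r{i % 10}", "value": i}
    for i in range(1000)
]

by_sensor = defaultdict(list)  # "table" partitioned by sensor_id
by_region = defaultdict(list)  # same data, partitioned by region
for row in readings:
    by_sensor[row["sensor_id"]].append(row)
    by_region[row["region"]].append(row)


def scan_with_filter(table, predicate):
    """Model of ALLOW FILTERING: every row of every partition is examined."""
    examined, matches = 0, []
    for partition in table.values():
        for row in partition:
            examined += 1
            if predicate(row):
                matches.append(row)
    return matches, examined


def read_partition(table, key):
    """Model of a query restricted by partition key: one partition is read."""
    rows = table.get(key, [])
    return rows, len(rows)


slow_rows, slow_examined = scan_with_filter(
    by_sensor, lambda row: row["region"] == "r3")
fast_rows, fast_examined = read_partition(by_region, "r3")
print(slow_examined, fast_examined)
```

Both calls return the same rows, but the dedicated table examines only the partition that holds them.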

In SQL, finding "friends of friends" requires 2 JOINs on the same table. For "friends within 6 degrees", that's 6 self-JOINs or a recursive query, and the intermediate result can grow exponentially with each hop. Neo4j stores relationships as direct pointers between nodes, so the cost of following one relationship does not depend on the total size of the graph. Neo4j's shortestPath() uses a bidirectional BFS that stops as soon as a shortest path is found instead of exploring the whole graph.
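
For intuition, the following is a minimal single-direction BFS over an adjacency dictionary. It is a simplified illustration, not Neo4j's implementation; the `knows` graph and every name in it other than Ahmed and Yasmina are hypothetical.

```python
from collections import deque

# Hypothetical undirected KNOWS graph stored as adjacency lists.
knows = {
    "Ahmed": ["Karim", "Lina"],
    "Karim": ["Ahmed", "Sofiane"],
    "Lina": ["Ahmed", "Yasmina"],
    "Sofiane": ["Karim", "Yasmina"],
    "Yasmina": ["Lina", "Sofiane"],
    "Nadia": [],
}


def shortest_path(graph, start, goal, max_hops=10):
    """Return one shortest path as a list of names, or None if unreachable."""
    if start == goal:
        return [start]
    parents = {start: None}        # also serves as the visited set
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:      # do not expand beyond the hop limit
            continue
        for neighbour in graph.get(node, []):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:  # rebuild the path by walking back
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((neighbour, depth + 1))
    return None


path = shortest_path(knows, "Ahmed", "Yasmina")
hops = len(path) - 1
print(path, hops)
```

Each node is visited at most once, and the search stops at the first level where the target appears.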

A hot partition receives far more traffic than others. If you choose only "city" as the Partition Key, the node hosting the "Algiers" partition handles 3× more traffic than "Tamanrasset". That node becomes a bottleneck and negates the benefit of horizontal distribution.

Solution in SmartGrid DZ: use (sensor_id, date) as the Partition Key — data is distributed evenly across all nodes.
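
The effect of the key choice can be estimated with a small simulation. This is a rough sketch under stated assumptions: a 4-node cluster, a hypothetical workload of 1,000 sensors (60% of them in Algiers) over three days, and MD5 used purely as a stand-in for Cassandra's default Murmur3 partitioner.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key):
    """Map a partition key to a node (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(repr(partition_key).encode()).hexdigest()
    return int(digest, 16) % NODES


def city_of(sensor_index):
    """Hypothetical skew: Algiers hosts 3x as many sensors as Tamanrasset."""
    if sensor_index < 600:
        return "Algiers"
    if sensor_index < 800:
        return "Oran"
    return "Tamanrasset"


measurements = [
    {"sensor_id": f"sensor-{i}", "city": city_of(i), "date": f"2024-01-{day:02d}"}
    for i in range(1000)
    for day in range(1, 4)
]


def max_node_share(rows, key_fn):
    """Fraction of all rows that lands on the busiest node."""
    load = Counter(node_for(key_fn(row)) for row in rows)
    return max(load.values()) / len(rows)


share_city = max_node_share(measurements, lambda r: r["city"])
share_composite = max_node_share(
    measurements, lambda r: (r["sensor_id"], r["date"]))
print(f"city key: {share_city:.0%}, (sensor_id, date) key: {share_composite:.0%}")
```

With the city key, the node holding "Algiers" receives at least 60% of the rows; with the composite key, the busiest node stays close to an even quarter.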

Key metrics to understand:

P50 (median): typical latency of a normal request
P95: 95% of requests are faster than this value
P99: the experience of your unluckiest users
Throughput (req/s): maximum capacity under load

A P99 of 500ms while P50 is 5ms indicates a long latency tail, often caused by GC pauses, compaction, or hot spots. This is what you will analyse in your REPORT.md.
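
As a minimal sketch, the percentiles can be computed from raw latency samples with the nearest-rank method. The function name `percentile` and the sample values are hypothetical; other tools may interpolate between samples and report slightly different numbers.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [5] * 95 + [20, 40, 120, 500, 900]

summary = {f"P{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

The median and P95 are both 5 ms here, while P99 reaches 500 ms: the tail is invisible unless the high percentiles are reported.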

Getting Started

Up and Running in 4 Steps

The entire environment is containerised. Apart from Git and Docker (with Docker Compose), nothing needs to be installed manually.

Accept the GitHub Classroom assignment

Click the link provided by your instructor. GitHub Classroom automatically creates your personal repository with the starter code.

Clone and launch the environment

bash

git clone https://github.com/YOUR_ORG/YOUR_REPO.git
cd YOUR_REPO  # the directory created by git clone
docker compose up -d
docker compose ps  # Verify all services are "Up"

Verify connections

Redis UI → localhost:8001 · MongoDB UI → localhost:8081 · Neo4j Browser → localhost:7474 · Cassandra via cqlsh inside the container.
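
A quick way to confirm that the web interfaces are listening is a TCP check from Python. This is a minimal sketch using the ports listed above; the function names are hypothetical, and an open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Ports taken from the list above.
SERVICES = {
    "Redis UI": 8001,
    "MongoDB UI": 8081,
    "Neo4j Browser": 7474,
}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost"):
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```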

Start with Lab 1 and run the tests