Chapter 4 · GitHub Classroom

NoSQL Databases

5 hands-on lab assignments built around real Algerian use cases — Redis, MongoDB, Cassandra, Neo4j and a comparative performance benchmark.

5 Lab Assignments · 100 Total Points · 4 Technologies · 13h Estimated Time
Lab Assignments

5 Labs on Real-World Use Cases

Each lab is grounded in a concrete Algerian business context. You are not implementing abstract examples — you are building real architectures.

Lab 1 · Redis
E-Commerce Caching System
redis:7-alpine

🛒 ShopFast DZ — Speed up an Algerian e-commerce platform where product pages take 3–4 seconds to load.

  • Hash, List, Set, Sorted Set for products & carts
  • Cache-Aside pattern with configurable TTL
  • User sessions with sliding expiration
  • Real-time best-seller leaderboard
  • Pipelines & MULTI/EXEC transactions
🏥
Lab 2 · MongoDB
Digital Medical Records
mongo:6.0

🏥 HealthCare DZ — Replace 12 relational tables with flexible patient document records.

  • Embedding vs Referencing modelling decisions
  • $jsonSchema validation & constraints
  • Advanced queries with projection
  • Medical aggregation pipelines
  • Compound indexes & explain()
⚙️
Lab 3 · Cassandra
IoT Electric Grid Sensors
cassandra:4.1

⚡ SmartGrid DZ — Ingest 10,000 measurements per minute from sensors monitoring Algeria's power grid.

  • Partition Key & Clustering Key for time series
  • Batch ingestion: 50,000 rows with Python
  • CQL: advanced time-range queries
  • TimeWindowCompactionStrategy (TWCS)
  • Per-row TTL for automatic expiry of old measurements
🎓
Lab 4 · Neo4j
University Social Network
neo4j:5.13 + GDS

🎓 UniConnect DZ — Connect students across Algerian universities (USTHB, UMBB, USTO…).

  • Graph model: Student, Course, Club, Skill
  • Cypher: MATCH, MERGE, UNWIND, WITH
  • Shortest path with shortestPath()
  • GDS algorithms: Louvain, PageRank, Centrality
  • Contact recommendations (Jaccard similarity)
📊
Lab 5 · Benchmark
Comparative Performance Analysis
Redis + MongoDB + Cassandra

📊 Produce a decision-support report: which database should you choose for which workload?

  • Write benchmark: 100,000 records × 3 databases
  • Read latencies P50, P95, P99
  • Load test: 50 concurrent clients
  • Argued recommendation table
Theory Review

The 4 NoSQL Families

Each family solves a problem that relational databases handle poorly. The right choice depends on your data structure and your target query patterns.

⚡ Key-Value

The simplest structure. O(1) access by key. Ideal for caching, sessions, counters and leaderboards. Redis adds rich data structures: Hash, List, Set, Sorted Set.

🏥 Document

Data stored as flexible JSON documents. No rigid schema. Perfect for complex entities with variable attributes (medical records, user profiles, product catalogs).

⚙️ Column-Family

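Rows are grouped into partitions by a partition key, and partitions are distributed across nodes; within a partition, rows can carry a variable set of columns. Writes are massively parallel with no single contention point. The golden rule: model tables around queries, not entities.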

🎓 Graph

Relationships are first-class citizens. Multi-hop traversals (friends of friends, shortest path) are native and efficient — where SQL would require recursive JOINs.

Cache-Aside Pattern in practice (Lab 1)

Python · ex3_cache.py
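import json


def get_product_cached(r, product_id, ttl=600):
    """Cache-Aside: read from Redis first, fall back to the database on a miss."""
    cache_key = f"product_cache:{product_id}"
    cached = r.get(cache_key)

    if cached is not None:  # Cache HIT → instant return (<1ms)
        return json.loads(cached)

    # Cache MISS → slow DB query (2s); slow_db_get_product is defined elsewhere (not shown here)
    product = slow_db_get_product(product_id)

    if product:  # Store in Redis with a TTL in seconds; missing products are not cached
        r.setex(cache_key, ttl, json.dumps(product))

    return product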

MongoDB Aggregation Pipeline (Lab 2)

MongoDB · ex3_aggregation.js
db.patients.aggregate([
  // Filter first: only Algiers patients are unwound, and an index on address.city can be used
  { $match: { "address.city": "Algiers" } },
  { $unwind: "$consultations" },
  { $group: {
      _id: "$consultations.diagnosis",
      count: { $sum: 1 },
      doctors: { $addToSet: "$consultations.doctor.name" }
  }},
  { $sort: { count: -1 } },
  { $limit: 10 }
])

Neo4j Cypher Query (Lab 4)

Cypher · ex3_graph_algorithms.cypher
// Shortest path between two students
MATCH p = shortestPath(
  (a:Student {name: "Ahmed"})-[:KNOWS*..10]-(b:Student {name: "Yasmina"})
)
RETURN [n IN nodes(p) | n.name + " (" + n.university + ")"] AS path,
       length(p) AS hops
Decision Guide

Which Database Should You Choose?

The choice of a NoSQL database depends on data access patterns, not on technology popularity.

| Technology | Model | Strength | Ideal Use Case | Avoid When… |
|---|---|---|---|---|
| Redis | Key-Value | Sub-millisecond latency, rich in-memory structures | Cache, sessions, leaderboards, rate limiting | Large volumes of persistent data |
| MongoDB | Document | Flexible schema, powerful aggregation | Profiles, catalogs, complex records | Critical multi-collection transactions |
| Cassandra | Column-Family | Massively scalable writes, no SPOF | IoT, logs, time series, analytics | Ad-hoc queries (ALLOW FILTERING) |
| Neo4j | Graph | Native multi-hop traversals | Social networks, recommendations, fraud | Tabular data with no complex relationships |
Frequently Asked Questions

Lab FAQ

The most common questions about the labs and their theoretical background.

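A Redis Sorted Set automatically keeps elements sorted by score. Incrementing a product's score (ZINCRBY) is O(log N), and retrieving the top 10 (ZREVRANGE) is O(log N + M), where M is the number of elements returned (here 10). Since Redis 6.2, ZREVRANGE is deprecated in favour of ZRANGE with the REV option, but it still works. Doing the same in SQL would require a table, an index, a COUNT, and an ORDER BY, which is far more expensive on writes.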
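As a rough sketch of the leaderboard described above, the two commands could be wrapped as follows. The key name `leaderboard:sales`, the function names and the `InMemorySortedSets` class are assumptions made for illustration, not part of the lab's starter code. The class is only a stand-in that mimics the two redis-py methods used here, so the sketch runs without a server.

```python
from collections import defaultdict

LEADERBOARD_KEY = "leaderboard:sales"  # hypothetical key name


def record_sale(r, product_id, quantity=1):
    """Add `quantity` to the product's score (ZINCRBY)."""
    return r.zincrby(LEADERBOARD_KEY, quantity, product_id)


def top_products(r, n=10):
    """Return the n best sellers as (product_id, score), highest score first."""
    return r.zrevrange(LEADERBOARD_KEY, 0, n - 1, withscores=True)


class InMemorySortedSets:
    """Stand-in for a Redis client; implements only zincrby and zrevrange."""

    def __init__(self):
        self._sets = defaultdict(dict)

    def zincrby(self, name, amount, value):
        scores = self._sets[name]
        scores[value] = scores.get(value, 0.0) + float(amount)
        return scores[value]

    def zrevrange(self, name, start, end, withscores=False):
        # Order by score, then by member, and reverse for highest-first.
        ranked = sorted(self._sets[name].items(),
                        key=lambda kv: (kv[1], kv[0]), reverse=True)
        stop = None if end == -1 else end + 1
        page = ranked[start:stop]
        return page if withscores else [member for member, _ in page]


r = InMemorySortedSets()  # with a server: redis.Redis(decode_responses=True)
record_sale(r, "p:42", 3)
record_sale(r, "p:7")
record_sale(r, "p:42", 2)
print(top_products(r, n=2))
```

Because both functions only take a client object `r`, the same code should work unchanged against a real Redis connection.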

The general rule for choosing between embedding and referencing in MongoDB:

  • Embedding when data is always accessed together, the sub-document has a bounded size (no unbounded growth), and atomicity is required.
  • Referencing when the sub-document can grow indefinitely (e.g. all orders for a customer), data is accessed independently, or it is shared across multiple parent documents.

In HealthCare DZ: recent consultations are embedded (frequent access), but lab analyses are referenced (potentially unlimited volume).
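The two modelling choices can be illustrated with plain Python dictionaries standing in for MongoDB documents. All field values, the cap of 5 embedded consultations and the helper names are hypothetical. In MongoDB itself a similar cap is usually expressed with `$push` combined with `$each` and `$slice`.

```python
MAX_EMBEDDED_CONSULTATIONS = 5  # hypothetical cap, chosen for illustration

# Embedded: consultations live inside the patient document and stay bounded.
patient = {
    "_id": "patient-001",
    "address": {"city": "Algiers"},
    "consultations": [],
}

# Referenced: each lab analysis is its own document pointing back to a patient.
lab_analyses = [
    {"_id": "lab-1", "patient_id": "patient-001", "test": "glucose"},
    {"_id": "lab-2", "patient_id": "patient-001", "test": "cholesterol"},
    {"_id": "lab-3", "patient_id": "patient-002", "test": "glucose"},
]


def add_consultation(patient_doc, consultation, limit=MAX_EMBEDDED_CONSULTATIONS):
    """Append a consultation and keep only the `limit` most recent ones."""
    recent = patient_doc["consultations"] + [consultation]
    patient_doc["consultations"] = recent[-limit:]
    return patient_doc


def analyses_for(patient_id, analyses):
    """Resolve the reference: a second lookup, as a separate query would do."""
    return [a for a in analyses if a["patient_id"] == patient_id]


for day in range(1, 8):
    add_consultation(patient, {"date": f"2024-01-{day:02d}", "diagnosis": "checkup"})

print(len(patient["consultations"]))
print([a["_id"] for a in analyses_for("patient-001", lab_analyses)])
```

The embedded array never grows past the cap, while the referenced collection can grow without affecting the size of the patient document.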

Without a restriction on the partition key, ALLOW FILTERING forces Cassandra to scan every partition in the cluster to find matching rows: it is a distributed full table scan. With 10 million rows spread across 10 nodes, it may read 100% of the data to return 10 rows. In production, this saturates nodes and degrades performance for all concurrent queries.

The solution: create a dedicated table for each frequent query, with the correct Partition Key.
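The difference between a filtered scan and a dedicated query table can be sketched in pure Python by counting how many rows each approach reads. This is a toy model, not Cassandra code: the `sensor_id` and `substation_id` columns, the row counts and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Synthetic data: 100 sensors, 20 substations, 10,000 measurements.
rows = [
    {
        "sensor_id": f"s{i % 100}",
        "substation_id": f"sub{(i % 100) // 5}",
        "value": float(i),
    }
    for i in range(10_000)
]


def build_table(all_rows, partition_key):
    """Group rows into partitions, as a table keyed on `partition_key` would."""
    table = defaultdict(list)
    for row in all_rows:
        table[row[partition_key]].append(row)
    return table


def scan_with_filter(table, column, wanted):
    """Model of an unrestricted ALLOW FILTERING query: read everything."""
    rows_read, matches = 0, []
    for partition in table.values():
        for row in partition:
            rows_read += 1
            if row[column] == wanted:
                matches.append(row)
    return matches, rows_read


def read_partition(table, key):
    """Model of a query restricted by partition key: read one partition."""
    partition = table.get(key, [])
    return partition, len(partition)


by_sensor = build_table(rows, "sensor_id")          # original table
by_substation = build_table(rows, "substation_id")  # dedicated query table

slow, slow_read = scan_with_filter(by_sensor, "substation_id", "sub3")
fast, fast_read = read_partition(by_substation, "sub3")
print(f"filtered scan: {len(slow)} matches, {slow_read} rows read")
print(f"query table:   {len(fast)} matches, {fast_read} rows read")
```

Both approaches return the same rows, but the dedicated table reads only the partition it needs, at the price of storing the data twice.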

In SQL, finding "friends of friends" requires 2 self-JOINs on the same table. "Friends within 6 degrees" needs 6 chained JOINs or a recursive query, and the intermediate result can grow exponentially with depth. Neo4j stores relationships as direct pointers between nodes, so the cost of each hop does not depend on the total size of the graph, only on the number of relationships actually traversed. Neo4j's shortestPath() uses an optimised breadth-first search that stops as soon as a path to the target is found.
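To make the traversal idea concrete, here is a minimal breadth-first search over an adjacency dictionary. It is a plain, single-direction BFS written for illustration, not Neo4j's implementation, and the sample graph (including the names Karim, Lina and Sofiane) is invented.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk back from the goal to the start to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical undirected KNOWS relationships.
knows = {
    "Ahmed": ["Karim", "Lina"],
    "Karim": ["Ahmed", "Yasmina"],
    "Lina": ["Ahmed", "Sofiane"],
    "Sofiane": ["Lina", "Yasmina"],
    "Yasmina": ["Karim", "Sofiane"],
}

path = shortest_path(knows, "Ahmed", "Yasmina")
print(path, "hops:", len(path) - 1)
```

Each step follows the neighbour list of the current node directly, which is the in-memory analogue of following relationship pointers instead of joining tables.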

A hot partition receives far more traffic than others. If you choose only "city" as the Partition Key, the node hosting the "Algiers" partition handles 3× more traffic than "Tamanrasset". That node becomes a bottleneck and negates the benefit of horizontal distribution.

Solution in SmartGrid DZ: use (sensor_id, date) as a composite Partition Key. Each sensor-day becomes its own partition, so data is distributed evenly across all nodes and no partition grows without bound.
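The effect of the partition key on load distribution can be approximated with a small simulation. It is a deliberate simplification: SHA-256 and a modulo stand in for Cassandra's partitioner and token ranges, and the cluster size, sensor counts and city split are invented for illustration.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size


def node_for(partition_key):
    """Map a partition key to a node (simplified stand-in for the partitioner)."""
    digest = hashlib.sha256(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


# Synthetic measurements: 200 sensors over 30 days, 60% of sensors in one city.
measurements = [
    {
        "sensor_id": f"sensor-{s}",
        "city": "Algiers" if s % 5 < 3 else f"city-{s % 7}",
        "date": f"2024-01-{d:02d}",
    }
    for s in range(200)
    for d in range(1, 31)
]


def busiest_share(rows, key_fn):
    """Fraction of all rows that land on the most loaded node."""
    load = Counter(node_for(key_fn(row)) for row in rows)
    return max(load.values()) / len(rows)


by_city = busiest_share(measurements, lambda r: r["city"])
by_sensor_day = busiest_share(measurements, lambda r: (r["sensor_id"], r["date"]))
print(f"busiest node with city key:              {by_city:.0%}")
print(f"busiest node with (sensor_id, date) key: {by_sensor_day:.0%}")
```

With the city key, every "Algiers" row maps to the same node, so that node carries at least 60% of the load. The composite key yields thousands of small partitions that spread close to evenly.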

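Key metrics to understand when reading benchmark results:

  • P50 (median): half of all requests are faster than this value; the typical latency of a normal request
  • P95: 95% of requests are faster than this value
  • P99: 99% of requests are faster than this value; the slowest 1% are the experience of your unluckiest users
  • Throughput (req/s): requests completed per second at a given load; its maximum is the system's capacity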

A P99 of 500ms while P50 is 5ms indicates a heavy latency tail, often caused by GC pauses, compaction, or hot spots. This is what you will analyse in your REPORT.md.
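As one possible way to compute these percentiles from raw measurements, the sketch below uses the nearest-rank definition. Other definitions interpolate between samples and give slightly different values. The latency samples are synthetic and the function name is an assumption.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [5.0] * 97 + [40.0, 480.0, 520.0]

summary = {f"P{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

Here P50 and P95 are both 5.0 ms while P99 is 480.0 ms, the kind of gap between median and tail that the paragraph above describes.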

Getting Started

Up and Running in 4 Steps

The whole database environment is containerised: no database needs to be installed manually. You only need Docker, plus Python and pip to run the tests.

1. Accept the GitHub Classroom assignment

Click the link provided by your instructor. GitHub Classroom automatically creates your personal repository with the starter code.

2. Clone and launch the environment

bash
git clone https://github.com/YOUR_ORG/YOUR_REPO.git
cd YOUR_REPO  # the directory name matches your repository name
docker compose up -d
docker compose ps  # Verify all services are "Up"
3. Verify connections

Redis UI → localhost:8001 · MongoDB UI → localhost:8081 · Neo4j Browser → localhost:7474 · Cassandra via cqlsh inside the container.

4. Start with Lab 1 and run the tests

bash
cd TP1_KeyValue/starter
pip install redis pytest
# Implement ex1_structures.py
pytest tests/test_ex1.py -v

📋 Deliverables reminder

For each lab, you must submit: completed code inside starter/, a REPORT.md file with your analysis, and regular commits with meaningful messages. Automated tests run on every push via GitHub Actions.