5 hands-on lab assignments built around real Algerian use cases — Redis, MongoDB, Cassandra, Neo4j and a comparative performance benchmark.
Each lab is grounded in a concrete Algerian business context. You are not implementing abstract examples — you are building real architectures.
🛒 ShopFast DZ — Speed up an Algerian e-commerce platform where product pages take 3–4 seconds to load.
🏥 HealthCare DZ — Replace 12 relational tables with flexible patient document records.
⚡ SmartGrid DZ — Ingest 10,000 measurements per minute from sensors monitoring Algeria's power grid.
🎓 UniConnect DZ — Connect students across Algerian universities (USTHB, UMBB, USTO…).
📊 Produce a decision-support report: which database should you choose for which workload?
Each family solves a problem that relational databases handle poorly. The right choice depends on your data structure and your target query patterns.
The simplest structure. O(1) access by key. Ideal for caching, sessions, counters and leaderboards. Redis adds rich data structures: Hash, List, Set, Sorted Set.
Data stored as flexible JSON documents. No rigid schema. Perfect for complex entities with variable attributes (medical records, user profiles, product catalogs).
Organized by columns rather than rows. Massively parallel writes with no contention point. The golden rule: model tables around queries, not entities.
Relationships are first-class citizens. Multi-hop traversals (friends of friends, shortest path) are native and efficient — where SQL would require recursive JOINs.
import json

def get_product_cached(r, product_id, ttl=600):
    """Cache-aside read: try Redis first, fall back to the database."""
    cache_key = f"product_cache:{product_id}"
    cached = r.get(cache_key)
    if cached:
        # Cache HIT → instant return (<1ms)
        return json.loads(cached)
    # Cache MISS → slow DB query (2s); slow_db_get_product is defined elsewhere
    product = slow_db_get_product(product_id)
    if product:
        # Store in Redis with a TTL so stale entries expire on their own
        r.setex(cache_key, ttl, json.dumps(product))
    return product
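db.patients.aggregate([
  // Filter first: only Algiers patients are unwound, and an index on address.city can be used
  { $match: { "address.city": "Algiers" } },
  // One output document per consultation
  { $unwind: "$consultations" },
  { $group: {
      _id: "$consultations.diagnosis",
      count: { $sum: 1 },
      doctors: { $addToSet: "$consultations.doctor.name" }
  }},
  { $sort: { count: -1 } },
  { $limit: 10 }
])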
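To see what each stage contributes, the same computation can be mirrored in plain Python over a list of dictionaries. This is only an illustrative sketch: the sample patients and the `top_diagnoses` helper are invented for the example, and a real run would execute on the MongoDB server rather than in application code.

```python
from collections import defaultdict

# Hypothetical sample data; field names follow the aggregation pipeline.
patients = [
    {"address": {"city": "Algiers"},
     "consultations": [
         {"diagnosis": "Hypertension", "doctor": {"name": "Dr. A"}},
         {"diagnosis": "Diabetes", "doctor": {"name": "Dr. B"}},
     ]},
    {"address": {"city": "Algiers"},
     "consultations": [
         {"diagnosis": "Hypertension", "doctor": {"name": "Dr. C"}},
     ]},
    {"address": {"city": "Oran"},
     "consultations": [
         {"diagnosis": "Hypertension", "doctor": {"name": "Dr. D"}},
     ]},
]


def top_diagnoses(docs, city, limit=10):
    """Plain-Python equivalent of the $match/$unwind/$group/$sort/$limit stages."""
    groups = defaultdict(lambda: {"count": 0, "doctors": set()})
    for doc in docs:
        if doc["address"]["city"] != city:            # $match
            continue
        for consultation in doc.get("consultations", []):  # $unwind
            group = groups[consultation["diagnosis"]]       # $group key
            group["count"] += 1                             # $sum: 1
            group["doctors"].add(consultation["doctor"]["name"])  # $addToSet
    ranked = sorted(groups.items(),
                    key=lambda item: item[1]["count"],
                    reverse=True)                           # $sort
    return [{"_id": diagnosis, **group}
            for diagnosis, group in ranked[:limit]]         # $limit


result = top_diagnoses(patients, "Algiers")
```

With this sample data, the Oran patient is filtered out and the two Algiers patients contribute three consultations, grouped into two diagnoses.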
// Shortest path between two students
MATCH p = shortestPath(
  (a:Student {name: "Ahmed"})-[:KNOWS*..10]-(b:Student {name: "Yasmina"})
)
RETURN [n IN nodes(p) | n.name + " (" + n.university + ")"] AS path,
       length(p) AS hops
The choice of a NoSQL database depends on data access patterns, not on technology popularity.
| Technology | Model | Strength | Ideal Use Case | Avoid When… |
|---|---|---|---|---|
| Redis | Key-Value | Sub-millisecond latency, rich in-memory structures | Cache, sessions, leaderboards, rate limiting | Large volumes of persistent data |
| MongoDB | Document | Flexible schema, powerful aggregation | Profiles, catalogs, complex records | Critical multi-collection transactions |
| Cassandra | Column-Family | Massively scalable writes, no SPOF | IoT, logs, time series, analytics | Ad-hoc queries (ALLOW FILTERING) |
| Neo4j | Graph | Native multi-hop traversals | Social networks, recommendations, fraud | Tabular data with no complex relationships |
The most common questions about the labs and their theoretical background.
A Redis Sorted Set automatically keeps elements sorted by score. Incrementing a product's score (ZINCRBY) is O(log N), and retrieving the top 10 (ZREVRANGE) is O(log N + M), where M is the number of elements returned (here 10). Doing the same in SQL would require a table, an index, a COUNT, and an ORDER BY, which is far more expensive on writes.
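A minimal sketch of such a leaderboard with redis-py might look like the following. The key name `product_views` and the helper names `record_view` and `top_products` are hypothetical; `r` is assumed to be a redis-py client, as in `get_product_cached`.

```python
LEADERBOARD_KEY = "product_views"  # hypothetical key name


def record_view(r, product_id, amount=1):
    """Add `amount` to the product's score (ZINCRBY) and return the new score."""
    return r.zincrby(LEADERBOARD_KEY, amount, product_id)


def top_products(r, n=10):
    """Return the n highest-scoring products as (member, score) pairs (ZREVRANGE)."""
    return r.zrevrange(LEADERBOARD_KEY, 0, n - 1, withscores=True)
```

Because Redis keeps the set ordered on every write, reading the top 10 never requires sorting the whole collection at query time.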
The general rule:
In HealthCare DZ: recent consultations are embedded (frequent access), but lab analyses are referenced (potentially unlimited volume).
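The HealthCare DZ split between embedded and referenced data could be sketched as Python dictionaries, in the shape a driver such as pymongo would accept. The identifiers, the `lab_analyses` collection name, the `patient_id` and `type` fields, and all sample values are hypothetical; only `address.city`, `consultations`, `diagnosis` and `doctor.name` come from the aggregation example.

```python
# Patient document: consultations are embedded because they are read
# together with the patient.
patient = {
    "_id": "P-0001",
    "address": {"city": "Algiers"},
    "consultations": [
        {"date": "2024-03-12",
         "diagnosis": "Hypertension",
         "doctor": {"name": "Dr. Example"}},
    ],
}

# Lab analyses live in their own collection and point back to the patient,
# so the patient document does not grow without bound.
lab_analyses = [
    {"_id": "LA-1001", "patient_id": "P-0001", "type": "blood panel"},
    {"_id": "LA-1002", "patient_id": "P-0001", "type": "ECG"},
    {"_id": "LA-2001", "patient_id": "P-0002", "type": "blood panel"},
]


def analyses_for(patient_doc, analyses):
    """Resolve the reference with a second lookup keyed on patient_id."""
    return [a for a in analyses if a["patient_id"] == patient_doc["_id"]]


patient_analyses = analyses_for(patient, lab_analyses)
```

The reference is stored on the analysis side only, so adding the thousandth analysis never rewrites the patient document.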
When a query does not restrict the Partition Key, ALLOW FILTERING forces Cassandra to scan ALL partitions in the cluster to find matching rows: it is a distributed full table scan. With 10 million rows spread across 10 nodes, it reads 100% of the data to return perhaps 10 rows. In production, this saturates nodes and degrades performance for all concurrent queries.
The solution: create a dedicated table for each frequent query, with the correct Partition Key.
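The difference in rows read can be illustrated with a toy model in plain Python. This is not Cassandra: two dictionaries keyed by partition key stand in for two tables, and the sensor names, regions and the `(region, date)` query are hypothetical.

```python
from collections import defaultdict

# Two "tables" holding the same rows under different partition keys.
readings_by_sensor_day = defaultdict(list)  # partition key: (sensor_id, date)
readings_by_region_day = defaultdict(list)  # partition key: (region, date)


def write(sensor_id, region, date, value):
    """Write the row to both tables: extra write cost buys cheap reads."""
    row = {"sensor_id": sensor_id, "region": region,
           "date": date, "value": value}
    readings_by_sensor_day[(sensor_id, date)].append(row)
    readings_by_region_day[(region, date)].append(row)


def region_day_with_filtering(region, date):
    """Like ALLOW FILTERING: touch every row of every partition."""
    rows_read, matches = 0, []
    for partition in readings_by_sensor_day.values():
        for row in partition:
            rows_read += 1
            if row["region"] == region and row["date"] == date:
                matches.append(row)
    return matches, rows_read


def region_day_from_dedicated_table(region, date):
    """Single-partition read on the table modelled for this query."""
    matches = readings_by_region_day.get((region, date), [])
    return matches, len(matches)


regions = ["Algiers", "Oran", "Constantine", "Tamanrasset"]
dates = [f"2024-01-0{d}" for d in range(1, 6)]
for date in dates:
    for i in range(40):
        write(f"sensor-{i}", regions[i % 4], date, float(i))

slow, slow_reads = region_day_with_filtering("Tamanrasset", "2024-01-03")
fast, fast_reads = region_day_from_dedicated_table("Tamanrasset", "2024-01-03")
```

Both functions return the same rows, but the scan reads every row in the table while the dedicated table reads only the matching partition.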
In SQL, finding "friends of friends" requires 2 self-JOINs on the same table. "Friends within 6 degrees" needs 6 chained JOINs or a recursive query, and the intermediate result set can grow exponentially with depth. Neo4j stores relationships as direct pointers between nodes, so the cost of each hop depends on the relationships actually traversed, not on the total size of the graph. Neo4j's shortestPath() uses an optimised BFS that only visits the necessary nodes.
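The idea behind a breadth-first shortest-path search can be sketched in a few lines of Python over an adjacency dictionary. This is a plain textbook BFS, not Neo4j's implementation, and the `knows` graph below is invented for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == goal:
        return [start]
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical KNOWS graph (undirected, so each edge is listed both ways).
knows = {
    "Ahmed": ["Karim", "Lina"],
    "Karim": ["Ahmed", "Sofiane"],
    "Lina": ["Ahmed", "Yasmina"],
    "Sofiane": ["Karim"],
    "Yasmina": ["Lina"],
}

path = shortest_path(knows, "Ahmed", "Yasmina")
hops = len(path) - 1
```

Each step only follows the neighbours of nodes already reached, which is why the search never touches parts of the graph unrelated to the two students.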
A hot partition receives far more traffic than others. If you choose only "city" as the Partition Key, the node hosting the "Algiers" partition handles 3× more traffic than "Tamanrasset". That node becomes a bottleneck and negates the benefit of horizontal distribution.
Solution in SmartGrid DZ: use (sensor_id, date) as the Partition Key — data is distributed evenly across all nodes.
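The effect of the Partition Key on load distribution can be approximated with a small simulation. It rests on loud assumptions: keys are mapped to nodes with MD5 modulo the node count, whereas Cassandra's default partitioner uses Murmur3 and token ranges, and the workload (Algiers producing 3 times the writes of each other city, 600 sensors, a single date) is invented.

```python
import hashlib
from collections import Counter

NODES = 10  # hypothetical cluster size


def node_for(partition_key):
    """Map a partition key to a node by hashing (simplified stand-in for a partitioner)."""
    digest = hashlib.md5(repr(partition_key).encode()).hexdigest()
    return int(digest, 16) % NODES


def max_node_share(partition_keys):
    """Fraction of all writes that land on the busiest node."""
    load = Counter(node_for(key) for key in partition_keys)
    return max(load.values()) / len(partition_keys)


# One entry per write; the value is the partition key that write would use.
by_city = (["Algiers"] * 3000 + ["Oran"] * 1000
           + ["Constantine"] * 1000 + ["Tamanrasset"] * 1000)
by_sensor_day = [(f"sensor-{i % 600}", "2024-01-15")
                 for i in range(len(by_city))]

city_share = max_node_share(by_city)
sensor_share = max_node_share(by_sensor_day)
print(f"busiest node with city key: {city_share:.0%}")
print(f"busiest node with (sensor_id, date) key: {sensor_share:.0%}")
```

With only four distinct city values, at most four nodes receive any writes and the one holding "Algiers" takes at least half of them; the composite key produces hundreds of partitions that spread across all nodes.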
Key metrics to understand:
A P99 of 500ms while P50 is 5ms indicates tail latencies — often caused by GC pauses, compaction, or hot spots. This is what you will analyse in your REPORT.md.
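Percentiles can be computed from raw latency samples with the nearest-rank method, sketched below. The `percentile` helper and the sample latencies are illustrative assumptions, not part of the lab's benchmark harness.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in ms: 98 fast requests and 2 slow outliers.
latencies_ms = [5.0] * 98 + [500.0] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
```

Here the median stays at 5 ms while P99 jumps to 500 ms, and the mean (14.9 ms) describes neither the typical request nor the slow ones, which is why reports usually quote percentiles rather than averages.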
The entire environment is containerised. Nothing needs to be installed manually.
Click the link provided by your instructor. GitHub Classroom automatically creates your personal repository with the starter code.
git clone https://github.com/YOUR_ORG/YOUR_REPO.git
cd YOUR_REPO
docker compose up -d
docker compose ps # Verify all services are "Up"
Redis UI → localhost:8001 · MongoDB UI → localhost:8081 · Neo4j Browser → localhost:7474 · Cassandra via cqlsh inside the container.
cd TP1_KeyValue/starter
pip install redis pytest
# Implement ex1_structures.py
pytest tests/test_ex1.py -v
For each lab, you must submit: completed code inside starter/, a REPORT.md file with your analysis, and regular commits with meaningful messages. Automated tests run on every push via GitHub Actions.