The Concept: What is a Cache Stampede?
A Cache Stampede (also known as dog-piling) occurs when a frequently accessed cache key (transient) expires, and multiple concurrent requests simultaneously attempt to regenerate the missing data.
The Failure Mode
In a low-traffic environment, get_transient() returns false, one PHP worker regenerates the data, calls set_transient(), and life goes on.
At Scale (e.g., 500 requests/second):
- T=0.00s: Transient top_posts_query expires.
- T=0.00–0.10s: About 50 incoming requests (500 req/s × 100 ms) call get_transient('top_posts_query'). All return false.
- T=0.00–0.10s: Each of those 50 PHP workers triggers the same heavy SQL query (SELECT * FROM wp_posts...) as soon as it sees the miss.
- T=0.50s: The database CPU spikes to 100% due to identical parallel queries.
- T=1.00s: Database locks up; new connections time out (504 Gateway Timeout).
- Result: Site crashes precisely when you need it most.
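The size of the pile-up follows from simple arithmetic. Every request that arrives before the first worker has refilled the cache sees the same miss, so misses ≈ request rate × regeneration time. Below is a minimal PHP sketch of that estimate. expected_misses is a hypothetical helper, and it deliberately ignores queueing, PHP-FPM pool limits, and the extra slowdown the stampede itself causes.

```php
<?php
/**
 * Hypothetical helper: rough count of requests that hit the same cache miss.
 * Every request arriving before the first worker refills the cache misses,
 * so misses ≈ arrival rate × regeneration time.
 * Ignores queueing, worker-pool limits, and the stampede's own slowdown.
 */
function expected_misses(int $requests_per_second, int $regeneration_ms): int {
    // Ceiling division in integer milliseconds avoids float rounding surprises
    return intdiv($requests_per_second * $regeneration_ms + 999, 1000);
}

echo expected_misses(500, 100), PHP_EOL;  // prints 50   (100 ms regeneration)
echo expected_misses(500, 2000), PHP_EOL; // prints 1000 (a 2-second query)
```

At 500 requests/second, a 100 ms regeneration already means roughly 50 duplicate queries. A 2-second query pushes that toward 1,000, capped in practice by how many PHP workers exist.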
The Standard Failure: set_transient
Each individual get_transient() or set_transient() call is a single, self-contained operation. However, the native WordPress transient API has no protection for the read-then-regenerate sequence as a whole.
```php
// ❌ VULNERABLE CODE
global $wpdb; // Required when this runs inside a function
$data = get_transient('heavy_query');
if (false === $data) {
    // ⚠️ 50 requests enter here simultaneously
    $data = $wpdb->get_results("SELECT SLEEP(2)..."); // Heavy calculation
    set_transient('heavy_query', $data, 3600);
}
```
This race condition exists because there is no coordination between the PHP workers. They all see the cache miss before any single worker has finished refilling it.
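The interleaving can be replayed deterministically in a single PHP process. This is a sketch, not WordPress code. simulate_naive_stampede is a hypothetical name, and a plain array stands in for the transient store. All reads run before any write, which is exactly the ordering that concurrent workers can produce.

```php
<?php
/**
 * Hypothetical single-process replay of the vulnerable get/set pattern.
 * Under real concurrency every worker's read can land before any worker's write,
 * so this demo runs all reads first (phase 1), then all regenerations (phase 2).
 * $cache is a plain array standing in for the transient store.
 */
function simulate_naive_stampede(int $workers): int {
    $cache = [];      // The key has just expired, so the store is empty
    $db_queries = 0;  // How many times the heavy query ran

    // Phase 1: every worker checks the cache before anyone has refilled it
    $saw_miss = [];
    for ($i = 0; $i < $workers; $i++) {
        $saw_miss[$i] = !array_key_exists('heavy_query', $cache);
    }

    // Phase 2: every worker that saw the miss runs the query and writes the result
    foreach ($saw_miss as $missed) {
        if ($missed) {
            $db_queries++;
            $cache['heavy_query'] = 'result';
        }
    }
    return $db_queries;
}

echo simulate_naive_stampede(50), PHP_EOL; // prints 50: one heavy query per worker
```

Fifty workers produce fifty executions of the heavy query. The extra writes merely overwrite the same value, so all of that database work is wasted.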
The Solution: Stale-While-Revalidate & Locking
To prevent stampedes, we must ensure only one process regenerates the data while others either:
- Wait (Locking)
- Serve stale data (Stale-While-Revalidate)
1. The Locking Pattern (Mutex)
Uses a lightweight “lock” key in Redis.
- Worker A sees cache miss → Acquires Lock → Regenerates → Sets Cache → Releases Lock.
- Worker B sees cache miss → Checks Lock → Exists? → Waits (or returns fallback) → Retries.
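Below is a minimal sketch of the mutex flow, replaying a stampede interleaving in one process. FakeCache and simulate_mutex are hypothetical stand-ins, not a Redis client. FakeCache::add() mimics the set-if-not-exists semantics of Redis SET NX (and of wp_cache_add() on a Redis drop-in). On a real server, that atomicity is what makes the lock safe across workers.

```php
<?php
/**
 * Hypothetical in-memory stand-in for a shared cache (NOT a real Redis client).
 * add() only writes when the key is absent, mimicking SET NX / wp_cache_add().
 * It is trivially atomic here because this demo runs in a single process.
 */
final class FakeCache {
    private array $data = [];

    public function get(string $key): mixed {
        return $this->data[$key] ?? false;
    }

    public function set(string $key, mixed $value): void {
        $this->data[$key] = $value;
    }

    public function add(string $key, mixed $value): bool {
        if (array_key_exists($key, $this->data)) {
            return false; // Lock already held by another worker
        }
        $this->data[$key] = $value;
        return true;
    }

    public function delete(string $key): void {
        unset($this->data[$key]);
    }
}

/** Returns [heavy queries run, waiting workers later served from cache]. */
function simulate_mutex(int $workers): array {
    $cache = new FakeCache();
    $winner = null;
    $waiting = 0;

    // Phase 1: every worker sees the miss and races for the lock before anyone refills
    for ($i = 0; $i < $workers; $i++) {
        if (false === $cache->get('heavy_query')) {
            if ($cache->add('lock_heavy_query', $i)) {
                $winner = $i;  // Worker A: acquired the lock
            } else {
                $waiting++;    // Worker B..N: lock exists, so wait instead of querying
            }
        }
    }

    // Phase 2: only the lock holder regenerates, sets the cache, releases the lock
    $queries = 0;
    if (null !== $winner) {
        $queries++;
        $cache->set('heavy_query', 'result');
        $cache->delete('lock_heavy_query');
    }

    // Phase 3: waiting workers retry and now hit the refilled cache
    $hits = 0;
    for ($j = 0; $j < $waiting; $j++) {
        if (false !== $cache->get('heavy_query')) {
            $hits++;
        }
    }
    return [$queries, $hits];
}

[$queries, $hits] = simulate_mutex(50);
echo "$queries heavy query, $hits retries served from cache", PHP_EOL;
// prints: 1 heavy query, 49 retries served from cache
```

The 49 losing workers never touch the database. In production, they would sleep briefly before retrying, or return a fallback.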
2. The Stale-While-Revalidate Pattern (Preferred)
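We don't let the actual data expire in Redis when its logical TTL passes. Instead, the physical key is kept well beyond that point (a grace period), and a “soft expiration” timestamp stored inside the data object decides when the data counts as stale.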
- Logic: Is current_time > soft_expiry?
  - Yes: Return the STALE data immediately (fast!) AND try to acquire a non-blocking lock, so that exactly one worker regenerates the data (ideally in the background).
  - No: Return fresh data.
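A single-process replay shows what Stale-While-Revalidate changes: nobody waits. simulate_swr is a hypothetical function, and the $store array stands in for Redis. The entry has passed its soft expiry but is still physically present. The isset()-then-assign lock is only safe because the demo is single-threaded; real code needs one atomic call such as wp_cache_add().

```php
<?php
/**
 * Hypothetical single-process replay of Stale-While-Revalidate (not WordPress code).
 * Returns how many regenerations ran and how many stale/fresh responses were served.
 */
function simulate_swr(int $workers, int $now, int $ttl): array {
    // Logically expired 5 seconds ago, still physically present
    $store  = ['top_posts' => ['data' => 'v1', 'soft_expiry' => $now - 5]];
    $served = [];
    $winner = null;

    // Phase 1 (concurrent): every worker reads the same wrapper before any refresh lands
    for ($i = 0; $i < $workers; $i++) {
        $wrapper  = $store['top_posts'];
        $is_stale = $now > $wrapper['soft_expiry'];
        // Check-then-set is only safe here because the demo is single-threaded
        if ($is_stale && !isset($store['lock_top_posts'])) {
            $store['lock_top_posts'] = $i; // Non-blocking lock acquired by this worker
            $winner = $i;
            continue;
        }
        $served[$i] = $wrapper['data'];    // Everyone else returns immediately
    }

    // Phase 2: the lock winner regenerates, moves the soft expiry forward, unlocks
    $regenerations = 0;
    if (null !== $winner) {
        $regenerations++;
        $store['top_posts'] = ['data' => 'v2', 'soft_expiry' => $now + $ttl];
        unset($store['lock_top_posts']);
        $served[$winner] = $store['top_posts']['data'];
    }

    return [
        'regenerations' => $regenerations,
        'stale'         => count(array_keys($served, 'v1', true)),
        'fresh'         => count(array_keys($served, 'v2', true)),
    ];
}

$r = simulate_swr(50, time(), 300);
echo "{$r['regenerations']} regeneration, {$r['stale']} stale, {$r['fresh']} fresh", PHP_EOL;
// prints: 1 regeneration, 49 stale, 1 fresh
```

Compared with plain locking, the 49 non-winning workers get an answer immediately. The price is that they see data that is a few seconds out of date.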
PHP Implementation: The “Stampede-Proof” Helper
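This helper relies on the object cache API (wp_cache_*) and assumes a Redis object cache drop-in is active. Under that assumption, wp_cache_add() acts as an atomic set-if-not-exists across all PHP workers, which is what makes the non-blocking lock work. With WordPress's default non-persistent object cache, the lock exists only within a single request and provides no protection.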
```php
<?php
/**
 * Safe transient retrieval with "Stale-While-Revalidate" protection.
 *
 * Requires a persistent object cache drop-in (e.g. Redis). With WordPress's
 * default per-request cache, the lock below protects nothing.
 *
 * @param string   $key      Cache key.
 * @param callable $callback Function to regenerate data.
 * @param int      $ttl      Logical ("soft") time to live in seconds.
 * @param int      $retries  Cold-start retries left before giving up.
 * @return mixed Cached data, or false if the cold-start lock was never obtained.
 */
function get_transient_stampede_proof(string $key, callable $callback, int $ttl, int $retries = 10) {
    $group    = 'transients';
    $lock_key = 'lock_' . $key;
    $grace    = 3600; // Keep the key in Redis 1h longer than the logical TTL
    $lock_ttl = 30;   // Lock safety: auto-expires if a worker dies mid-regeneration

    // 1. Fetch the raw object (we expect a wrapper array)
    $cached_object = wp_cache_get($key, $group);

    // 2. Cache Miss (Cold Start): Must block and generate
    if (false === $cached_object) {
        // wp_cache_add() only writes if the key is absent; Redis drop-ins typically
        // implement it as an atomic SET NX, so only one worker acquires the mutex.
        if (wp_cache_add($lock_key, 1, $group, $lock_ttl)) {
            try {
                $data = $callback();
                $now  = time();
                // Store with "soft" expiry (logical TTL) and "hard" expiry (physical key life)
                $wrapper = [
                    'data'        => $data,
                    'soft_expiry' => $now + $ttl,
                    'hard_expiry' => $now + $ttl + $grace,
                ];
                // $expire is a duration in seconds, NOT a Unix timestamp
                wp_cache_set($key, $wrapper, $group, $ttl + $grace);
            } finally {
                wp_cache_delete($lock_key, $group); // Release even if $callback throws
            }
            return $data;
        }
        // Lock held by another worker: wait briefly, then retry a bounded number of times
        if ($retries <= 0) {
            return false; // Caller decides the fallback (empty list, placeholder, etc.)
        }
        usleep(200000); // Wait 200ms
        return get_transient_stampede_proof($key, $callback, $ttl, $retries - 1);
    }

    // 3. Cache Hit: Check Soft Expiry (The Magic)
    if ($cached_object['soft_expiry'] < time()) {
        // Data is "stale". Non-blocking lock: only ONE worker enters this block.
        if (wp_cache_add($lock_key, 1, $group, $lock_ttl)) {
            try {
                // Regeneration happens inline here (could be offloaded to a background job)
                $cached_object['data']        = $callback();
                $now                          = time();
                $cached_object['soft_expiry'] = $now + $ttl;
                $cached_object['hard_expiry'] = $now + $ttl + $grace;
                // Extend the physical Redis key life (duration in seconds)
                wp_cache_set($key, $cached_object, $group, $ttl + $grace);
            } finally {
                wp_cache_delete($lock_key, $group);
            }
        }
        // The lock holder returns the fresh data; everyone else returns the stale
        // copy immediately. No DB spike. Zero blocking.
    }
    return $cached_object['data'];
}

// Usage Example:
$top_posts = get_transient_stampede_proof('top_posts_list', function () {
    global $wpdb;
    return $wpdb->get_results("SELECT * FROM {$wpdb->posts} WHERE post_status = 'publish' LIMIT 10");
}, 300); // 5 minutes logical TTL
```
Redis vs. Memcached: The Persistence Factor
While both are in-memory stores, their behavior during a restart affects stampedes differently.
Memcached (Pure Cache)
- Architecture: LRU (Least Recently Used) cache.
- Persistence: None by default. If the service restarts, the cache is empty (0 hits).
- Risk: A Memcached restart causes a global cache stampede across the entire site immediately. Every query hits the DB at once.
Redis (Data Structure Store)
- Architecture: Supports persistence (RDB snapshots or AOF logs).
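- Persistence: Can reload data from disk after a restart, but only if RDB or AOF persistence is enabled on the server.
- Advantage: If Redis restarts with persistence enabled, it restores the cache keys from the last RDB snapshot or AOF log. Most of the cache is warm immediately, so MySQL only has to rebuild keys written since that snapshot, instead of the entire cache.
- Verdict: For high availability, Redis with persistence enabled is the safer choice, because reloading from disk prevents “Cold Cache” death spirals.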
Summary:
To survive scale, stop trusting set_transient blindly. Use the Stale-While-Revalidate pattern to decouple data expiration from data regeneration. Always prefer Redis (with RDB or AOF persistence enabled) for its ability to survive service restarts without nuking your database.