The Concept: What is a Cache Stampede?

Cache Stampede (also known as dog-piling) occurs when a frequently accessed cache key (transient) expires, and multiple concurrent requests simultaneously attempt to regenerate the missing data.

The Failure Mode

In a low-traffic environment, get_transient() returns false, one PHP worker regenerates the data, calls set_transient(), and life goes on.

At Scale (e.g., 500 requests/second):

  1. T=0.0s: Transient top_posts_query expires.
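  2. T=0.0–0.1s: At 500 requests/second, about 50 requests call get_transient('top_posts_query'). All return false, because no worker has finished regenerating the value yet.
  3. T=0.1s: All 50 PHP workers are now running the same heavy SQL query (SELECT * FROM wp_posts...) in parallel.
  4. T=0.50s: The database CPU spikes to 100% under dozens of identical queries. Each query runs slower, so the window in which new requests also miss keeps growing.
  5. T=1.00s: Database connections saturate; PHP workers block waiting on MySQL, the worker pool runs out, and the web server starts returning 504 Gateway Timeout.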
  6. Result: Site crashes precisely when you need it most.
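The size of the pile-up is roughly the request rate multiplied by how long the first regeneration takes, because every request that arrives in that window sees the same miss. A minimal sketch of that estimate (the function name is hypothetical, for illustration only):

```php
<?php

/**
 * Hypothetical helper: estimate how many requests hit a cold key before the
 * first worker finishes regenerating it (request rate × regeneration window).
 */
function estimate_stampede_size(float $requests_per_second, float $regen_seconds): int
{
    return (int) round($requests_per_second * $regen_seconds);
}

echo estimate_stampede_size(500, 0.1), "\n"; // 50: misses in the first 100 ms
echo estimate_stampede_size(500, 2.0), "\n"; // 1000: misses if the query takes 2 s
```

The second line is the uncomfortable one: a query that takes 2 s turns one expiry into a thousand identical database queries.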

The Standard Failure: set_transient

The native WordPress transient API stores and reads values, but it offers no coordination between reading a missing value and regenerating it.

```php
// ❌ VULNERABLE CODE
global $wpdb;
$data = get_transient('heavy_query');

if (false === $data) {
    // ⚠️ 50 requests enter here simultaneously
    $data = $wpdb->get_results("SELECT SLEEP(2)..."); // Heavy calculation
    set_transient('heavy_query', $data, 3600);
}
```

This race condition exists because there is no coordination between the PHP workers. They all see the cache miss before any single worker has finished refilling it.
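PHP workers do not share memory, so the interleaving can only be simulated in a single script. The sketch below forces the worst-case ordering (every worker reads before any worker writes) and uses a plain array as a hypothetical stand-in for the cache; no WordPress functions are involved.

```php
<?php

// Hypothetical in-memory stand-in for the cache.
$cache = [];
$db_queries = 0;

// Stands in for the heavy SQL query; counts how often the database is hit.
$regenerate = function () use (&$db_queries) {
    $db_queries++;
    return ['post_1', 'post_2'];
};

$workers = 50;

// Phase 1: every worker checks the cache before any of them has refilled it.
$misses = [];
for ($w = 0; $w < $workers; $w++) {
    $misses[$w] = !array_key_exists('heavy_query', $cache);
}

// Phase 2: each worker that saw a miss regenerates and writes the value.
for ($w = 0; $w < $workers; $w++) {
    if ($misses[$w]) {
        $cache['heavy_query'] = $regenerate();
    }
}

echo "Database queries: {$db_queries}\n"; // Database queries: 50
```

Only one write was needed, yet the database served fifty identical queries.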


The Solution: Stale-While-Revalidate & Locking

To prevent stampedes, we must ensure only one process regenerates the data while others either:

  1. Wait (Locking)
  2. Serve stale data (Stale-While-Revalidate)

1. The Locking Pattern (Mutex)

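A lightweight “lock” key in Redis acts as a mutex. It must be created atomically (set only if absent) and carry a short expiry, so a worker that crashes mid-regeneration cannot hold the lock forever.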

  • Worker A sees cache miss → Acquires Lock → Regenerates → Sets Cache → Releases Lock.
  • Worker B sees cache miss → Checks Lock → Exists? → Waits (or returns fallback) → Retries.
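The two bullets above can be sketched with a hypothetical in-memory cache whose add() stores a key only if it is absent. In production that guarantee has to come from the shared store (for example Redis SET with the NX option) and the lock needs an expiry; a PHP array inside one script only illustrates the control flow.

```php
<?php

/**
 * Hypothetical in-memory cache with add() semantics: add() succeeds only if
 * the key does not exist yet. Lock expiry is omitted for brevity.
 */
final class FakeCache
{
    private array $items = [];

    public function add(string $key, $value): bool
    {
        if (array_key_exists($key, $this->items)) {
            return false; // Key already present: lock is held by someone else.
        }
        $this->items[$key] = $value;
        return true;
    }

    public function get(string $key)
    {
        return $this->items[$key] ?? false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

$cache = new FakeCache();

// Worker A: sees a miss and acquires the lock.
$a_locked = $cache->add('lock_top_posts', 1);

// Worker B: sees the same miss while A is still working; the lock already exists.
$b_locked = $cache->add('lock_top_posts', 1);

// Worker A regenerates, fills the cache, and releases the lock.
if ($a_locked) {
    $cache->set('top_posts', ['post_1', 'post_2']);
    $cache->delete('lock_top_posts');
}

// Worker B, after waiting, retries the read and now finds the data.
$b_result = $b_locked ? null : $cache->get('top_posts');

printf("A locked: %s, B locked: %s\n", $a_locked ? 'yes' : 'no', $b_locked ? 'yes' : 'no');
```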

2. The Stale-While-Revalidate Pattern (Preferred)

We keep the actual data in Redis well beyond its logical lifetime, and store a “soft expiration” timestamp inside the cached object to decide when it is stale.

  • Logic:
    • Is current_time > soft_expiry?
    • Yes: Try a non-blocking lock. The single worker that wins it regenerates the data (inline, or offloaded to a background job); every other worker returns the STALE data immediately (fast!).
    • No: Return fresh data.
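The per-request decision can be isolated as a small pure function. This is a hypothetical sketch: the wrapper shape (`data`, `soft_expiry`) matches the helper later in this article, and `$try_lock` stands in for any non-blocking lock attempt such as an atomic add.

```php
<?php

/**
 * Hypothetical stale-while-revalidate decision for one request.
 * Returns [data to serve, whether this request should regenerate].
 */
function swr_decide(array $wrapper, int $now, callable $try_lock): array
{
    if ($now <= $wrapper['soft_expiry']) {
        return [$wrapper['data'], false]; // Fresh: serve as-is, no lock attempt.
    }
    // Stale: everyone can serve the old data; only the lock winner regenerates.
    return [$wrapper['data'], $try_lock()];
}

$wrapper = ['data' => 'old list', 'soft_expiry' => 1000];

// Fake non-blocking lock: the first caller wins, later callers fail.
$lock_taken = false;
$try_lock = function () use (&$lock_taken): bool {
    if ($lock_taken) {
        return false;
    }
    $lock_taken = true;
    return true;
};

foreach ([999, 1001, 1002] as $now) {
    [$data, $refresh] = swr_decide($wrapper, $now, $try_lock);
    printf("t=%d serve=%s refresh=%s\n", $now, $data, $refresh ? 'yes' : 'no');
}
// t=999 serve=old list refresh=no
// t=1001 serve=old list refresh=yes
// t=1002 serve=old list refresh=no
```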

PHP Implementation: The “Stampede-Proof” Helper

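This helper builds its lock on wp_cache_add(), which only succeeds when the key does not already exist. It assumes a persistent Redis object cache drop-in is active: without one, wp_cache_* data lives only for the current request, so workers cannot see each other's locks.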

```php
<?php

/**
 * Cache retrieval with "Stale-While-Revalidate" stampede protection.
 *
 * Assumes a persistent object cache drop-in (e.g. Redis) is active. Without
 * one, wp_cache_* only lives for the current request and the lock is useless.
 *
 * @param string   $key      Cache key.
 * @param callable $callback Function to regenerate data.
 * @param int      $ttl      Logical (soft) time to live in seconds.
 * @return mixed Cached data.
 */
function get_transient_stampede_proof(string $key, callable $callback, int $ttl) {
    $group    = 'transients';
    $lock_key = 'lock_' . $key;
    $lock_ttl = 30;   // Lock auto-expires if the holder dies mid-regeneration.
    $grace    = 3600; // Keep the physical key this long past the soft expiry.

    // Regenerate, store the wrapper, and always release the lock, even if $callback throws.
    $refresh = function () use ($key, $callback, $ttl, $group, $lock_key, $grace) {
        try {
            $data    = $callback();
            $wrapper = [
                'data'        => $data,
                'soft_expiry' => time() + $ttl, // Logical expiry, checked on every read
            ];
            // The 4th argument is a relative TTL in seconds, not a Unix timestamp.
            wp_cache_set($key, $wrapper, $group, $ttl + $grace);
            return $data;
        } finally {
            wp_cache_delete($lock_key, $group);
        }
    };

    // Bounded retries instead of unbounded recursion while another worker holds the lock.
    for ($attempt = 0; $attempt < 10; $attempt++) {
        // 1. Fetch the raw object (we expect a wrapper array).
        $cached_object = wp_cache_get($key, $group);

        // 2. Cache miss (cold start): only the lock holder regenerates.
        if (false === $cached_object) {
            // wp_cache_add() stores the key only if it does not exist yet;
            // Redis drop-ins typically map this to an atomic SET ... NX.
            if (wp_cache_add($lock_key, 1, $group, $lock_ttl)) {
                return $refresh();
            }
            usleep(200000); // Another worker is regenerating: wait 200 ms, then re-check.
            continue;
        }

        // 3. Cache hit: check soft expiry (the magic).
        // If stale, try a non-blocking lock; only ONE worker wins and refreshes inline
        // (this step could be offloaded to a background job instead).
        if ($cached_object['soft_expiry'] < time() && wp_cache_add($lock_key, 1, $group, $lock_ttl)) {
            return $refresh();
        }

        // Fresh data, or stale data while another worker refreshes it. No DB spike, no blocking.
        return $cached_object['data'];
    }

    // The lock holder is taking too long (about 2 s of waiting): compute directly rather than hang.
    return $callback();
}

// Usage Example:
$top_posts = get_transient_stampede_proof('top_posts_list', function () {
    global $wpdb;
    return $wpdb->get_results("SELECT * FROM {$wpdb->posts} WHERE post_status='publish' LIMIT 10");
}, 300); // 5 minutes logical TTL
```
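The expiry argument of wp_cache_set() is a duration in seconds, not a Unix timestamp, so passing something like time() + $ttl + 3600 asks for a TTL of roughly five decades. How a drop-in treats such a value is implementation-specific (the Memcached protocol, for instance, interprets expirations above 30 days as absolute timestamps). A quick check of the arithmetic, using a hypothetical fixed clock:

```php
<?php

$now   = 1700000000; // Hypothetical fixed "current time" for reproducibility.
$ttl   = 300;        // Logical TTL from the usage example.
$grace = 3600;       // Extra lifetime past the soft expiry.

$relative = $ttl + $grace;        // What wp_cache_set() expects: seconds from now.
$mistaken = $now + $ttl + $grace; // An absolute timestamp passed as a duration.

$years = $mistaken / (365 * 24 * 3600);

echo $relative, "\n";           // 3900
printf("%.1f years\n", $years); // 53.9 years
```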

Redis vs. Memcached: The Persistence Factor

While both are in-memory stores, their behavior during a restart affects stampedes differently.

Memcached (Pure Cache)

  • Architecture: LRU (Least Recently Used) cache.
  • Persistence: None. If the service restarts, cache is empty (0 hits).
  • Risk: A Memcached restart triggers an immediate, site-wide cache stampede: every cached query misses and falls through to the database at once.

Redis (Data Structure Store)

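  • Architecture: Supports optional persistence (RDB snapshots or AOF logs).
  • Persistence: When persistence is enabled, Redis reloads its dataset from disk after a restart.
  • Advantage: Cache keys survive a restart (minus any writes made since the last snapshot or AOF sync), so once Redis is back, most requests hit the cache instead of falling through to MySQL.
  • Verdict: For high availability, prefer Redis with persistence enabled; it avoids the “Cold Cache” death spiral that a restart of a pure cache triggers. Persistence does not help with a brand-new node or a flushed cache, so stampede protection is still needed.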

Summary:
To survive scale, stop trusting set_transient blindly. Use the Stale-While-Revalidate pattern to decouple data expiration from data regeneration. Prefer Redis with persistence enabled for its ability to survive service restarts without nuking your database.