retry storms
definition: occurs when retries amplify load instead of recovering from failure, often cascading across multiple agents.
- symptoms: exponential request growth, correlated failures across agents, degraded stability.
- detection: correlates retry attempts, error rates, and temporal clustering of identical actions.
- mitigation: exponential backoff, circuit breakers, and upstream retry limits.