Back to Manual
🔍failure detectors/retry storms

retry storms

definition: occurs when retries amplify load instead of recovering from failure, often cascading across multiple agents.

  • symptoms: exponential request growth, correlated failures across agents, degraded stability.
  • detection: correlates retry attempts, error rates, and temporal clustering of identical actions.
  • mitigation: exponential backoff, circuit breakers, and upstream retry limits.