Back to Manual
🔍failure detectors/token starvation

token starvation

definition: occurs when long-lived token streams (llm responses) are degraded by background system activity.

  • symptoms: stalled streams, increased time-to-first-byte, interference from heavy tool calls.
  • detection: heuristics analysing mixed traffic patterns and stream latency during high concurrency.
  • mitigation: prioritizing stream traffic, rate-limiting background workers.