LLM04: Model Denial of Service (Token Abuse, Tool Loops, Cost Exploits)

Description

Model DoS occurs when prompts or other inputs trigger extreme token usage, recursive tool calls, or long-running tasks that exhaust budgets and quotas. Attackers, or simply misconfigured systems, can cause cost spikes, high latency, or throttling and bans from upstream rate limits.

Keywords: token explosion, recursive tool loops, quota exhaustion, cost abuse, rate limiting.

Examples/Proof

  • Token bloat requests
    • "Repeat this paragraph 100,000 times." Measure the output token count and latency, and check whether output caps stop the runaway response.
  • Recursive browsing/plan loops
    • Ask an agent to "research X indefinitely and continue until you have found 1M citations". If it keeps fetching with no step or budget check, the loop is unbounded.
  • Long-running tool calls
    • Trigger expensive vector searches or external API calls in a loop; confirm that budget and time caps stop them.
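
The probes above can be scripted so that the results are comparable between runs. The following is a minimal sketch, not a definitive harness: `call_model` stands for whatever function sends a prompt to the system under test, `rough_token_count` is a crude whitespace estimate rather than a real tokenizer, and the two stand-in models exist only so the sketch runs without a network call. All of these names are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    prompt: str
    output_tokens: int
    seconds: float
    within_cap: bool


def rough_token_count(text: str) -> int:
    # Assumption: whitespace splitting approximates token count.
    # Swap in the provider's tokenizer for real measurements.
    return len(text.split())


def run_probe(call_model: Callable[[str], str], prompt: str,
              max_output_tokens: int) -> ProbeResult:
    """Send one abusive prompt and record output size and latency."""
    start = time.monotonic()
    output = call_model(prompt)
    elapsed = time.monotonic() - start
    tokens = rough_token_count(output)
    return ProbeResult(prompt, tokens, elapsed, tokens <= max_output_tokens)


# Hypothetical stand-ins for the system under test.
def uncapped_model(prompt: str) -> str:
    return "word " * 5000  # behaves as if no output cap is enforced


def capped_model(prompt: str) -> str:
    return "word " * 200  # behaves as if output is truncated server-side


if __name__ == "__main__":
    prompt = "Repeat this paragraph 100,000 times."
    for model in (uncapped_model, capped_model):
        result = run_probe(model, prompt, max_output_tokens=1000)
        print(model.__name__, result.output_tokens, result.within_cap)
```

A probe that comes back with `within_cap` set to `False` suggests that the output cap is missing or is set higher than intended.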

Detection and Monitoring

  • Token and time budgets per session
    • Log token usage, wall-clock time, and tool-call counts; alert on spikes.
  • Circuit breakers
    • Halt sessions when thresholds trip; emit structured events for incident response.
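
A per-session budget and a circuit breaker can share one small object. The sketch below is one possible shape under stated assumptions: `SessionBudget`, `BudgetExceeded`, the field names, and the event layout are all hypothetical, and the thresholds are placeholders to be tuned per deployment.

```python
import json
import logging
import time
from dataclasses import dataclass, field
from typing import Optional

log = logging.getLogger("llm.budget")


class BudgetExceeded(Exception):
    """Raised when a session crosses one of its thresholds."""


@dataclass
class SessionBudget:
    session_id: str
    max_tokens: int
    max_tool_calls: int
    max_seconds: float
    tokens_used: int = 0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)
    tripped: bool = False

    def record(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Add usage, then trip the breaker if any threshold is crossed."""
        if self.tripped:
            raise BudgetExceeded(f"session {self.session_id} is already halted")
        self.tokens_used += tokens
        self.tool_calls += tool_calls
        reason = self._violation()
        if reason is not None:
            self.tripped = True
            log.warning(self.event(reason))  # structured event for responders
            raise BudgetExceeded(reason)

    def _violation(self) -> Optional[str]:
        if self.tokens_used > self.max_tokens:
            return "max_tokens"
        if self.tool_calls > self.max_tool_calls:
            return "max_tool_calls"
        if time.monotonic() - self.started_at > self.max_seconds:
            return "max_seconds"
        return None

    def event(self, reason: str) -> str:
        """Serialize the current state as one JSON line."""
        return json.dumps({
            "event": "session_budget_tripped",
            "session_id": self.session_id,
            "reason": reason,
            "tokens_used": self.tokens_used,
            "tool_calls": self.tool_calls,
        }, sort_keys=True)
```

The caller would invoke `record()` after every model response and tool call. Once the breaker trips, every later `record()` call raises, so a halted session cannot resume on its own.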

Remediation

  1. Hard caps
    • Enforce max tokens, max tool calls, and max duration per turn/session; return partial summaries when a limit is hit.
  2. Rate limiting and tenant isolation
    • Apply per-user/tenant quotas; isolate budgets so one user cannot exhaust others’ limits.
  3. Guarded planning
    • Constrain planning prompts; require check-ins or approvals for long chains; prefer concise outputs by default.
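
The hard caps and tenant isolation described above can be combined in a bounded agent loop. This is a minimal sketch under stated assumptions, not a reference implementation: `TenantQuota`, `run_agent`, the flat `tokens_per_step` charge, and the `step` callback (one tool call or planning step, returning `None` when finished) are hypothetical names chosen for illustration.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


class TenantQuota:
    """Tracks token spend per tenant so one tenant cannot drain another's budget."""

    def __init__(self, tokens_per_tenant: int) -> None:
        self.limit = tokens_per_tenant
        self.used: Dict[str, int] = defaultdict(int)

    def try_spend(self, tenant: str, tokens: int) -> bool:
        """Charge the tenant if it still has room; otherwise refuse."""
        if self.used[tenant] + tokens > self.limit:
            return False
        self.used[tenant] += tokens
        return True


def run_agent(step: Callable[[int], Optional[str]], tenant: str,
              quota: TenantQuota, max_tool_calls: int = 5,
              max_seconds: float = 30.0,
              tokens_per_step: int = 100) -> Tuple[List[str], str]:
    """Run `step` until it returns None or a cap is hit.

    Returns the partial results gathered so far plus the stop reason,
    so the caller can still summarize what was completed.
    """
    results: List[str] = []
    deadline = time.monotonic() + max_seconds
    for i in range(max_tool_calls):  # hard cap on tool calls
        if time.monotonic() > deadline:  # hard cap on duration
            return results, "max_seconds"
        if not quota.try_spend(tenant, tokens_per_step):  # per-tenant budget
            return results, "tenant_quota"
        out = step(i)
        if out is None:
            return results, "done"
        results.append(out)
    return results, "max_tool_calls"


if __name__ == "__main__":
    quota = TenantQuota(tokens_per_tenant=250)
    endless = lambda i: f"result {i}"  # a step that never finishes on its own
    print(run_agent(endless, "tenant-a", quota))
    print(run_agent(endless, "tenant-b", quota, max_tool_calls=2))
```

In the demo, tenant-a stops on its own quota, while tenant-b still has a full budget and stops on the tool-call cap. Because the loop is a bounded `for` rather than `while True`, a step that never reports completion cannot run forever.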

Prevention Checklist

  • Per-request/session caps on tokens, duration, tool calls
  • Rate limits and tenant quotas with isolation
  • Circuit breakers and early-stopping rules in agents