LLM04: Model Denial of Service (Token Abuse, Tool Loops, Cost Exploits)
Description
Model denial of service (DoS) occurs when prompts or inputs trigger extreme token usage, recursive tool calls, or long-running tasks that exhaust budgets and quotas. Attackers (or misconfigured systems) can cause cost spikes, high latency, or rate-limit bans.
Keywords: token explosion, recursive tool loops, quota exhaustion, cost abuse, rate limiting.
Examples/Proof
- Token bloat requests
- "Repeat this paragraph 100,000 times." Observe token count and latency; check if caps prevent runaway outputs.
- Recursive browsing/plan loops
- Ask an agent to "research X indefinitely and continue until you have found 1M citations". If it keeps fetching without a step limit or budget check, the loop is unbounded.
- Long-running tool calls
- Trigger expensive vector searches or external APIs in a loop; verify that budget and time caps stop the run.
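To see why the token bloat probe matters, it can help to work through the arithmetic. The sketch below is illustrative only: the paragraph size, the output cap, and the price per million tokens are assumptions chosen for the example, not figures from any provider.

```python
# Illustrative arithmetic only: every constant except REPETITIONS is an assumption.

def projected_output_tokens(paragraph_tokens: int, repetitions: int) -> int:
    """Tokens the model would emit if it obeyed the request literally."""
    return paragraph_tokens * repetitions


def billable_output_tokens(projected: int, max_output_tokens: int) -> int:
    """Tokens actually generated once a hard output cap is enforced."""
    return min(projected, max_output_tokens)


def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of generating `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


PARAGRAPH_TOKENS = 80        # assumed size of the paragraph
REPETITIONS = 100_000        # taken from the example prompt
MAX_OUTPUT_TOKENS = 4_096    # assumed per-response cap
PRICE = 10.0                 # hypothetical USD per 1M output tokens

uncapped = projected_output_tokens(PARAGRAPH_TOKENS, REPETITIONS)
capped = billable_output_tokens(uncapped, MAX_OUTPUT_TOKENS)

print(f"uncapped: {uncapped} tokens, ~${cost_usd(uncapped, PRICE):.2f}")
print(f"capped:   {capped} tokens, ~${cost_usd(capped, PRICE):.4f}")
```

Under these assumptions a single uncapped request would cost several orders of magnitude more than a capped one, which is the gap the probe is meant to expose.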
Detection and Monitoring
- Token and time budgets per session
- Log token usage, wall-clock times, and tool counts; alert on spikes.
- Circuit breakers
- Halt sessions when thresholds trip; emit structured events for incident response.
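The budget tracking and circuit breaker described above could be sketched roughly as follows. The class name `SessionBudget`, the threshold values, and the event fields are all hypothetical; a real deployment would send the event to its own logging or alerting pipeline rather than printing it.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionBudget:
    """Tracks usage for one session and trips when any threshold is exceeded."""
    session_id: str
    max_tokens: int = 50_000       # assumed threshold
    max_tool_calls: int = 20       # assumed threshold
    max_seconds: float = 120.0     # assumed threshold
    tokens: int = 0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)
    tripped: bool = False

    def record(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Add usage from one model call or tool call."""
        self.tokens += tokens
        self.tool_calls += tool_calls

    def check(self) -> Optional[dict]:
        """Return a structured event if a threshold tripped, else None."""
        elapsed = time.monotonic() - self.started_at
        if self.tokens > self.max_tokens:
            reason = "token_budget_exceeded"
        elif self.tool_calls > self.max_tool_calls:
            reason = "tool_call_budget_exceeded"
        elif elapsed > self.max_seconds:
            reason = "time_budget_exceeded"
        else:
            return None
        self.tripped = True
        return {
            "event": "circuit_breaker_tripped",
            "session_id": self.session_id,
            "reason": reason,
            "tokens": self.tokens,
            "tool_calls": self.tool_calls,
            "elapsed_seconds": round(elapsed, 3),
        }


# Usage: record usage after each call, then halt the session if check() fires.
budget = SessionBudget(session_id="demo-session", max_tokens=1_000)
budget.record(tokens=600, tool_calls=1)
budget.record(tokens=700, tool_calls=1)
event = budget.check()
if event is not None:
    print(json.dumps(event))  # stand-in for a real log or alert sink
```

Keeping the check separate from the recording step lets the caller decide where in the request cycle to halt, for example before issuing the next tool call.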
Remediation
- Hard caps
- Enforce max tokens, max tool calls, and max duration per turn/session; return partial summaries when limits are hit.
- Rate limiting and tenant isolation
- Apply per-user/tenant quotas; isolate budgets so one user cannot exhaust others’ limits.
- Guarded planning
- Constrain planning prompts; require check-ins or approvals for long chains; prefer concise outputs by default.
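One way to apply the hard caps above to an agent loop is sketched below. The function `run_bounded` and its `next_step` callback are hypothetical stand-ins for whatever planner and tool executor a system actually uses, and the default limits are arbitrary. The duration check runs between steps only, so it does not interrupt a tool call that is already in flight.

```python
import time
from typing import Callable, List, Optional, Tuple


def run_bounded(
    next_step: Callable[[List[str]], Optional[str]],
    max_tool_calls: int = 10,      # assumed cap
    max_seconds: float = 30.0,     # assumed cap
) -> Tuple[List[str], str]:
    """Run steps until the planner finishes or a hard cap is hit.

    Returns the results gathered so far plus a stop reason, so the caller
    can present a partial summary instead of failing outright.
    """
    results: List[str] = []
    deadline = time.monotonic() + max_seconds
    while True:
        # Caps are checked before every step, never after the fact.
        if len(results) >= max_tool_calls:
            return results, "max_tool_calls"
        if time.monotonic() >= deadline:
            return results, "max_duration"
        step = next_step(results)  # None means the planner is done
        if step is None:
            return results, "completed"
        results.append(step)


# Usage: a planner that never stops on its own is cut off by the cap.
def endless_research(so_far: List[str]) -> Optional[str]:
    return f"citation {len(so_far) + 1}"


partial, reason = run_bounded(endless_research, max_tool_calls=5)
print(len(partial), reason)
```

Returning the stop reason alongside the partial results makes it straightforward to tell the user that the answer was truncated and why.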
Prevention Checklist
- Per-request/session caps on tokens, duration, tool calls
- Rate limits and tenant quotas with isolation
- Circuit breakers and early-stopping rules in agents
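As an illustration of the tenant quota item in the checklist, the following minimal sketch keeps a separate fixed-window token counter per tenant. The class `TenantTokenQuota`, the window length, and the in-memory storage are assumptions for the example; a multi-process service would need shared storage and would likely prefer a sliding window or token bucket.

```python
import time
from collections import defaultdict
from typing import Callable


class TenantTokenQuota:
    """Fixed-window token quota tracked separately for each tenant."""

    def __init__(
        self,
        tokens_per_window: int,
        window_seconds: float = 60.0,              # assumed window length
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.tokens_per_window = tokens_per_window
        self.window_seconds = window_seconds
        self.clock = clock
        # tenant_id -> [window_start, tokens_used]
        self._usage = defaultdict(lambda: [self.clock(), 0])

    def try_consume(self, tenant_id: str, tokens: int) -> bool:
        """Reserve tokens for a tenant; return False if its quota is spent."""
        now = self.clock()
        window = self._usage[tenant_id]
        if now - window[0] >= self.window_seconds:
            window[0], window[1] = now, 0  # start a fresh window
        if window[1] + tokens > self.tokens_per_window:
            return False
        window[1] += tokens
        return True


# Usage: tenant "a" exhausting its quota does not affect tenant "b".
quota = TenantTokenQuota(tokens_per_window=100)
print(quota.try_consume("a", 80))   # within quota
print(quota.try_consume("a", 30))   # would exceed tenant a's quota
print(quota.try_consume("b", 100))  # tenant b has its own budget
```

Because each tenant has its own counter, a burst from one tenant is rejected without consuming capacity reserved for others.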