LLM04: Model Denial of Service (Token Abuse, Tool Loops, Cost Exploits)

Description

Model denial of service (DoS) occurs when prompts or inputs trigger extreme token usage, recursive tool calls, or long-running tasks that exhaust budgets and quotas. Attackers, or simply misconfigured systems, can cause cost spikes, high latency, or rate-limit bans.

Keywords: token explosion, recursive tool loops, quota exhaustion, cost abuse, rate limiting.

Examples/Proof

Token bloat requests
- "Repeat this paragraph 100,000 times." Observe token count and latency; check if caps prevent runaway outputs.
Recursive browsing/plan loops
- Ask an agent to "research X indefinitely and continue until you have found 1M citations". If it keeps fetching with no iteration, time, or budget check, the loop is unbounded.
Long-running tool calls
- Trigger expensive vector searches or external API calls in a loop; check whether budget or time caps stop them.

Detection and Monitoring

Token and time budgets per session
- Log token usage, wall-clock time, and tool-call counts per session; alert on spikes.
Circuit breakers
- Halt sessions when thresholds trip; emit structured events for incident response.
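
The per-session budget and circuit breaker described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `SessionBudget` class, its default thresholds, and the event field names are all hypothetical and would need tuning for a real deployment.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionBudget:
    """Tracks usage for one session and trips when a threshold is exceeded."""
    max_tokens: int = 50_000      # assumed limit, tune per deployment
    max_tool_calls: int = 25      # assumed limit
    max_seconds: float = 120.0    # assumed limit
    tokens: int = 0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)
    tripped: bool = False

    def record(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Add usage reported by the model or tool layer."""
        self.tokens += tokens
        self.tool_calls += tool_calls

    def check(self, session_id: str) -> Optional[dict]:
        """Return a structured event if any threshold tripped, else None."""
        elapsed = time.monotonic() - self.started
        reasons = []
        if self.tokens > self.max_tokens:
            reasons.append("tokens")
        if self.tool_calls > self.max_tool_calls:
            reasons.append("tool_calls")
        if elapsed > self.max_seconds:
            reasons.append("duration")
        if not reasons:
            return None
        self.tripped = True  # caller should halt the session from here on
        return {
            "event": "llm_circuit_breaker_tripped",  # hypothetical event name
            "session_id": session_id,
            "reasons": reasons,
            "tokens": self.tokens,
            "tool_calls": self.tool_calls,
            "elapsed_seconds": round(elapsed, 3),
        }


if __name__ == "__main__":
    budget = SessionBudget(max_tokens=100, max_tool_calls=2)
    budget.record(tokens=150, tool_calls=1)
    event = budget.check("session-123")
    if event is not None:
        print(json.dumps(event))  # ship to the logging/alerting pipeline
```

The caller is expected to invoke `check` after every model or tool call and stop the session once it returns an event; the JSON form is meant to be easy to route into an existing alerting pipeline.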

Remediation

Hard caps
- Enforce max tokens, max tool calls, and max duration per turn and per session; return a partial summary when a limit is hit.
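
One way to picture hard caps is a bounded agent loop that stops on whichever limit is reached first and hands back whatever it has so far. The sketch below assumes a hypothetical `step` callable that receives the remaining token budget and returns `(text, tokens_used, done)`; in practice that callable would wrap the model or tool call and pass the remaining budget on as its output limit.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical step signature: remaining_tokens -> (text, tokens_used, done)
Step = Callable[[int], Tuple[str, int, bool]]


def run_with_caps(
    step: Step,
    max_tokens: int = 4_000,     # assumed cap
    max_steps: int = 8,          # assumed cap on tool/model calls
    max_seconds: float = 30.0,   # assumed cap on wall-clock time
) -> dict:
    """Run `step` repeatedly until it reports done or a hard cap is hit."""
    start = time.monotonic()
    used = 0
    parts: List[str] = []
    stop_reason = "max_steps"  # reported if the loop runs out of iterations

    for _ in range(max_steps):
        if time.monotonic() - start >= max_seconds:
            stop_reason = "max_seconds"
            break
        remaining = max_tokens - used
        if remaining <= 0:
            stop_reason = "max_tokens"
            break
        # The step is expected to honor `remaining` as its own output limit.
        text, tokens, done = step(remaining)
        used += tokens
        parts.append(text)
        if done:
            stop_reason = "completed"
            break

    return {
        "output": parts,                      # partial results if a cap was hit
        "tokens_used": used,
        "stop_reason": stop_reason,
        "partial": stop_reason != "completed",
    }
```

Returning `partial` and `stop_reason` alongside the output lets the caller tell the user that the answer was truncated, rather than failing silently or retrying into the same limit.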
Rate limiting and tenant isolation
- Apply per-user/tenant quotas; isolate budgets so one user cannot exhaust others’ limits.
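
As an illustration of isolated per-tenant quotas, the sketch below keeps a separate token bucket for each tenant, so heavy use by one tenant never draws down another's allowance. The `TenantQuota` class, its capacity, and its refill rate are hypothetical; the state is in-memory and not thread-safe, whereas a production system would more likely keep it in a shared store.

```python
import time
from typing import Callable, Dict, Tuple


class TenantQuota:
    """Per-tenant token bucket; each tenant's budget refills independently."""

    def __init__(
        self,
        capacity: int = 10_000,             # assumed max tokens per tenant
        refill_per_second: float = 50.0,    # assumed refill rate
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # tenant_id -> (tokens currently available, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def try_spend(self, tenant_id: str, tokens: int) -> bool:
        """Deduct `tokens` from the tenant's bucket, or refuse if short."""
        now = self.clock()
        available, last = self._buckets.get(
            tenant_id, (float(self.capacity), now)
        )
        # Refill for the elapsed time, never above capacity.
        available = min(
            float(self.capacity),
            available + (now - last) * self.refill_per_second,
        )
        if tokens > available:
            self._buckets[tenant_id] = (available, now)
            return False  # caller should reject or queue the request
        self._buckets[tenant_id] = (available - tokens, now)
        return True


if __name__ == "__main__":
    quota = TenantQuota(capacity=100, refill_per_second=10.0)
    print(quota.try_spend("tenant-a", 80))   # within tenant-a's budget
    print(quota.try_spend("tenant-a", 30))   # exceeds what tenant-a has left
    print(quota.try_spend("tenant-b", 100))  # tenant-b is unaffected
```

The injectable `clock` is there only to make the refill logic easy to test; a fixed-window counter would be a simpler alternative if smooth refill is not needed.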
Guarded planning
- Constrain planning prompts; require check-ins or approvals for long chains; prefer concise outputs by default.

Prevention Checklist

Per-request/session caps on tokens, duration, tool calls
Rate limits and tenant quotas with isolation
Circuit breakers and early-stopping rules in agents

Haxoris Wiki
