A New Cyber Threat to AI: How Denial-of-Wallet (DoW) Attacks Quietly Inflate the Cost of RAG Systems

June 8, 2026

When people hear the word "cyberattack," most picture stolen data or servers knocked offline by a DDoS (Distributed Denial-of-Service) attack. But with the massive shift to the cloud and serverless architectures, a new, financially destructive threat has emerged: Denial-of-Wallet (DoW). In a DoW attack, the attacker does not take your system down. Instead, they abuse its automatic scaling and generate large volumes of artificial traffic, which you then pay for in the form of astronomical cloud infrastructure bills.

Today, this nightmare is moving into the world of artificial intelligence and large language models (LLMs). Attackers are beginning to target what makes these systems expensive to run: API token consumption and GPU time. And the latest research reveals that popular AI applications are extraordinarily vulnerable to this threat.

What is RA-ICA and why does it endanger your RAG systems?

In a recent scientific study titled Inference Cost Attacks for Retrieval-Augmented Large Language Models from 2026, researchers from The Hong Kong Polytechnic University introduced a new kind of critical vulnerability: Retrieval-Augmented Inference Cost Attack (RA-ICA).

Previous attempts to inflate the cost of LLMs (so-called Inference Cost Attacks) required direct manipulation of the user's query (the prompt). In practice and in production environments, however, this is very difficult to pull off. The new RA-ICA attack instead exploits a weakness directly within RAG systems (Retrieval-Augmented Generation), which modern AI applications use to look up current information from external websites and databases.

The attacker does not need to overcome your application security at all. It is enough to "poison" public data on the internet with a specially crafted document. When a customer asks a routine question in your application, the RAG system retrieves this malicious text in good faith on its own, instantly springing the financial trap.

Figure: Comparison of a conventional Inference Cost Attack and a Retrieval-Augmented Inference Cost Attack (RA-ICA) on RAG systems

The CREEP framework: 3 strategies for how one AI hacks another

The researchers built an automated attack tool called CREEP (Computational Resource Exhaustion via External Poisoning). It uses its own LLM agents to generate texts that are semantically highly relevant to RAG retrieval, yet impose an enormous computational burden on your language model.

The CREEP system uses three main tactics to deceive the artificial intelligence:

Decoy Injection: The agent hides logical puzzles or complex mathematical or planning tasks within the document. When your RAG system loads them, the model unknowingly starts solving them during its reasoning, needlessly burning through an enormous number of tokens.
Contradiction Injection: The malicious text contains facts that contradict one another. The LLM is forced to analyze these contradictions (triggering so-called overthinking), which dramatically extends the response generation time and GPU usage.
Task-Oriented Manipulation: The attacking AI directly optimizes the text to maximize your system's computational costs, while taking extreme care to make the text appear inconspicuous and evade detection.

This entire process is powered by an innovative reinforcement learning algorithm, MA-GRPO (Memory-Augmented Group Relative Policy Optimization), which stores the most successful historical attacks in memory and continuously refines their effectiveness.
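The two ideas named in MA-GRPO can be illustrated with a minimal sketch. It assumes the standard group-relative advantage from GRPO (each reward is normalized against the mean and standard deviation of its own sampling group) and a simple top-k buffer standing in for the "memory" of successful samples. The class name TopKMemory, the buffer size, and the way the two pieces would be combined are illustrative assumptions, not details taken from the study.

```python
import heapq
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group (standard GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class TopKMemory:
    """Hypothetical memory: keeps the k highest-reward samples seen so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []    # min-heap of (reward, insertion_order, sample)
        self._counter = 0  # tie-breaker so samples are never compared directly

    def add(self, reward, sample):
        entry = (reward, self._counter, sample)
        self._counter += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        else:
            # Push the new entry and drop the lowest-reward one in a single step.
            heapq.heappushpop(self._heap, entry)

    def best(self):
        """Return stored samples, highest reward first."""
        return [sample for _, _, sample in sorted(self._heap, reverse=True)]


# Placeholder rewards and sample labels, for illustration only.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
memory = TopKMemory(k=2)
for reward, sample in [(1.0, "a"), (3.0, "b"), (2.0, "c")]:
    memory.add(reward, sample)
```

Samples that score above their group average get a positive advantage and are reinforced, while the memory keeps the strongest historical samples available for later training rounds.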

Figure: The CREEP framework: Decoy Injection, Contradiction Injection, and Task-Oriented Manipulation with MA-GRPO training

Shocking statistics: API token consumption up to 13 times higher

Testing the RA-ICA attack against today's top models (such as GPT-5, Claude-Sonnet-4, and DeepSeek-R1) across benchmark datasets (Natural Questions, HotpotQA) produced alarming results for corporate finances:

A dramatic cost increase: The optimized attack increased token consumption by up to 13.12 times, which is roughly 1,200% above the baseline.
Extreme success rate: The RAG system retrieved the attack documents with a success rate of more than 90%.
Perfect camouflage (stealth mode): The attack is practically invisible. It does not degrade the correctness of the final answer for the user, so your usual AI security filters detect nothing suspicious. The customer is satisfied, but your wallet is bleeding.
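To put the 13.12x multiplier in financial terms, here is a small worked calculation. The query volume, tokens per query, and price per million tokens are hypothetical values chosen only for illustration, and the calculation assumes the worst case in which every query hits a poisoned document, so treat the result as an upper bound.

```python
def monthly_cost(queries_per_day, tokens_per_query, usd_per_million_tokens, days=30):
    """Return the monthly token bill in USD for a fixed per-query token count."""
    total_tokens = queries_per_day * tokens_per_query * days
    return total_tokens * usd_per_million_tokens / 1_000_000


# Hypothetical workload and pricing (assumptions, not figures from the study).
QUERIES_PER_DAY = 10_000
TOKENS_PER_QUERY = 800
USD_PER_MILLION_TOKENS = 10.0

# Worst-case multiplier reported in the study.
ATTACK_MULTIPLIER = 13.12

baseline = monthly_cost(QUERIES_PER_DAY, TOKENS_PER_QUERY, USD_PER_MILLION_TOKENS)
attacked = monthly_cost(
    QUERIES_PER_DAY, TOKENS_PER_QUERY * ATTACK_MULTIPLIER, USD_PER_MILLION_TOKENS
)
increase_pct = (attacked / baseline - 1) * 100

print(f"baseline: ${baseline:,.2f}  attacked: ${attacked:,.2f}  increase: {increase_pct:.0f}%")
```

Under these assumptions the monthly bill rises from about $2,400 to about $31,488. A 13.12x multiplier corresponds to an increase of roughly 1,212%.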

How to protect corporate LLMs from financial exhaustion?

Securing AI applications can no longer focus exclusively on preventing data leaks and fighting model hallucinations. This research shows that the economic protection of infrastructure is a new front in cybersecurity.

If your application uses RAG to pull data from the open internet, it is a potential target for Denial-of-Wallet attacks. Developers and architects should begin implementing strict sanitization and validation of external documents before passing them to the language model. Protecting tokens is now just as important as protecting the data itself.
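As a rough illustration of what document screening could look like, the sketch below caps the length of each retrieved document and flags text that appears to ask the model for extra work. The function name screen_document, the length cap, and the regular expressions are all hypothetical. Because the study reports that attack documents are optimized to look inconspicuous, pattern matching like this should be seen as one layer alongside hard token budgets, not as a complete defense.

```python
import re

MAX_CHARS = 4000  # hypothetical per-document cap before text enters the context

# Hypothetical patterns; a real deployment would tune these on its own data.
SUSPICIOUS_PATTERNS = {
    "embedded_task": re.compile(
        r"\b(solve|compute|prove|calculate)\b.*\b(puzzle|riddle|equation|step by step)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    "instruction_override": re.compile(
        r"\b(ignore|disregard)\b.*\b(previous|above)\b.*\binstructions\b",
        re.IGNORECASE | re.DOTALL,
    ),
    "pre_answer_demand": re.compile(r"\bbefore (answering|you answer)\b", re.IGNORECASE),
}


def screen_document(text):
    """Return (text_for_context, flags).

    The text is truncated to MAX_CHARS. A non-empty flags list means the
    document should be dropped or routed to review instead of the model.
    """
    flags = [name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)]
    return text[:MAX_CHARS], flags


clean_text, clean_flags = screen_document("Paris is the capital of France.")
_, decoy_flags = screen_document("Before answering, solve this puzzle step by step: ...")
```

In this sketch the first document passes with no flags, while the second is flagged for both an embedded task and a pre-answer demand.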

Among the specific measures we recommend are:

Sanitization and validation of external documents before they even reach the model's context.
Limits on reasoning length and token count (reasoning and output budget) for each individual query.
Real-time monitoring of token consumption and costs with automatic alerts for anomalies.
Assessing the trustworthiness of the sources the RAG system draws from, and prioritizing verified databases over the open internet.
Rate limiting and input control across the entire RAG pipeline.
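Two of these measures, per-query token limits and real-time anomaly alerts, can be sketched in a few lines. The class TokenBudgetMonitor, its default thresholds, and the rolling-window approach are assumptions made for illustration. In a real system the clamped value would be passed to whatever maximum-output-tokens setting your model provider exposes, and the alert would feed your existing monitoring stack.

```python
from collections import deque
from statistics import mean, pstdev


class TokenBudgetMonitor:
    """Hypothetical helper: caps per-query tokens and flags unusual consumption."""

    def __init__(self, max_tokens_per_query=2000, window=100, z_threshold=3.0, min_samples=10):
        self.max_tokens_per_query = max_tokens_per_query
        self.z_threshold = z_threshold
        self.min_samples = min_samples
        self.history = deque(maxlen=window)  # recent normal usage only

    def clamp(self, requested_max_tokens):
        """Never let a single query request more than the configured budget."""
        return min(requested_max_tokens, self.max_tokens_per_query)

    def record(self, tokens_used):
        """Record usage for one query; return True if it looks anomalous."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            # Floor the deviation at 1 token so a perfectly flat history
            # does not turn every tiny fluctuation into an alert.
            sigma = max(pstdev(self.history), 1.0)
            anomalous = tokens_used > mu + self.z_threshold * sigma
        if not anomalous:
            # Keep outliers out of the baseline so an attack cannot shift it.
            self.history.append(tokens_used)
        return anomalous


monitor = TokenBudgetMonitor()
for i in range(20):
    monitor.record(500 + (i % 5) * 10)  # simulated normal traffic: 500 to 540 tokens

spike_detected = monitor.record(6000)   # simulated inflated query
normal_detected = monitor.record(510)
```

The hard cap bounds the damage of any single query even when detection fails, while the rolling baseline catches sustained inflation that stays under the cap.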

How Haxoris can help

To keep this threat from becoming an expensive reality, Haxoris can help in ways such as:

Penetration testing of LLMs and AI integrations.
Security audits of the RAG pipeline and external data sources.
Red teaming of AI applications with a focus on resource and cost abuse.
Designing controls for token consumption and protecting operational costs.

Conclusion

Denial-of-Wallet and RA-ICA-style attacks show that AI security is no longer only about data and hallucinations. Your wallet has become a full-fledged target of attack.

An attack that does not spoil the answer for the customer in any way, yet quietly multiplies your API and GPU bill, is exactly the type of threat that companies notice only when the invoice arrives. That is why it is better to test your RAG systems before an attacker does it for you.

Source

Inference Cost Attacks for Retrieval-Augmented Large Language Models (2026), The Hong Kong Polytechnic University.