LLM10: Model Theft (Weight Exfiltration, API Extraction, Knockoff Nets)
Description
Model theft includes direct exfiltration of proprietary weights/checkpoints and indirect extraction via high-volume API sampling to train a copycat. Risks arise from exposed storage, permissive CI/CD, third-party hosts, or insufficient API protections.
Keywords: model exfiltration, checkpoint leaks, API scraping, watermarking, inference rate limiting.
Examples/Proof
- Artifact exposure
- Scan storage/registries for public access to model files (e.g., .bin, .safetensors). If accessible, they can be copied (a scan sketch follows this list).
- API extraction
- Simulate high-rate queries to collect input-output pairs; if rate limits don't throttle and watermarking is absent, approximating the model is feasible (an extraction sketch follows this list).
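A minimal scan sketch for the artifact-exposure check above. The URLs, host names, and file paths are illustrative placeholders, not real endpoints; substitute your own storage or registry locations.

```python
import requests

# Hypothetical artifact locations to probe; names are illustrative only.
CANDIDATE_URLS = [
    "https://storage.example.com/models/prod/model.safetensors",
    "https://registry.example.com/acme/llm-v3/pytorch_model.bin",
]

def probe_public_access(urls):
    """Flag model artifacts reachable without credentials."""
    exposed = []
    for url in urls:
        try:
            # HEAD avoids downloading multi-GB weights just to test access.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code == 200:
                size = resp.headers.get("Content-Length", "unknown")
                exposed.append((url, size))
        except requests.RequestException:
            pass  # unreachable hosts are not a finding
    return exposed

for url, size in probe_public_access(CANDIDATE_URLS):
    print(f"PUBLICLY READABLE: {url} ({size} bytes)")
```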
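An extraction sketch for the API item above, assuming a hypothetical JSON endpoint and payload shape (`API_URL`, `prompt`, `text` are placeholders). Run it only against systems you are authorized to test; if the loop is never throttled and outputs carry no watermark, the harvested pairs can train a copycat.

```python
import json
import requests

# Hypothetical inference endpoint; adjust to the target API under test.
API_URL = "https://api.example.com/v1/generate"

def collect_pairs(prompts, out_path="distill_pairs.jsonl"):
    """Harvest input-output pairs usable to fine-tune a surrogate model."""
    with open(out_path, "a") as f:
        for prompt in prompts:
            resp = requests.post(API_URL, json={"prompt": prompt}, timeout=30)
            if resp.status_code == 429:
                print("Rate limited: extraction is being throttled")
                break
            record = {"prompt": prompt, "completion": resp.json().get("text")}
            f.write(json.dumps(record) + "\n")
```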
Detection and Monitoring
- Access logs and anomaly detection
- Monitor unusual download volumes or source IPs; detect scraping patterns on inference APIs (a log-analysis sketch follows this list).
- Watermark/trace
- Embed statistical watermarks or response signatures; check for misuse in the wild (a detector sketch follows this list).
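A log-analysis sketch for the scraping-detection item: a sliding-window request counter over parsed access-log events. The window size and threshold are illustrative assumptions; tune them to your traffic baseline.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60          # illustrative values; tune to your baseline
MAX_REQUESTS_PER_WINDOW = 300

def find_scrapers(events):
    """events: time-ordered iterable of (client_id, unix_timestamp)."""
    windows = defaultdict(deque)
    flagged = set()
    for client, ts in events:
        win = windows[client]
        win.append(ts)
        # Drop timestamps that have aged out of the window.
        while win and ts - win[0] > WINDOW_SECONDS:
            win.popleft()
        if len(win) > MAX_REQUESTS_PER_WINDOW:
            flagged.add(client)
    return flagged
```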
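A detector sketch for the watermark item, following the published green-list approach (bias sampling toward a secretly chosen half of the vocabulary, then z-test suspect text). The key and the even/odd split are assumptions for illustration.

```python
import hashlib
import math

SECRET_KEY = b"example-watermark-key"  # hypothetical; keep server-side

def is_green(prev_token: str, token: str) -> bool:
    # Secretly partition the vocabulary based on the preceding token;
    # watermarked decoding nudges sampling toward the "green" half.
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode()).digest()
    return digest[0] % 2 == 0

def watermark_z_score(tokens: list[str]) -> float:
    """Unwatermarked text hits ~50% green tokens; a high z-score over a
    long sample suggests output from the watermarked model."""
    n = len(tokens) - 1
    if n < 1:
        return 0.0
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)
```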
Remediation
- Protect weights and artifacts
- Encrypt and restrict storage; sign releases; use access gates and short-lived URLs (a presigned-URL sketch follows this list).
- API protections
- Rate limit; require authenticated clients; detect scraping; watermark outputs (a token-bucket sketch follows this list).
- Contractual controls
- Enforce license/ToS; monitor marketplaces and repos for leaked or cloned models.
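A sketch of the short-lived URL control using S3 presigned URLs via boto3; the bucket and key names are placeholders, while the `generate_presigned_url` call itself is standard boto3.

```python
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are configured

def short_lived_model_url(bucket="models-internal", key="llm/v3/model.safetensors"):
    """Issue a download URL that expires in 5 minutes rather than
    exposing the object publicly."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=300,
    )
```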
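A minimal token-bucket sketch for the API rate-limit control; the refill rate and capacity are illustrative, and the per-key lookup assumes a single-process service.

```python
import time

class TokenBucket:
    """Per-client token bucket: refills at `rate` tokens/sec up to
    `capacity`; a request is served only if a token is available."""

    def __init__(self, rate: float = 2.0, capacity: int = 20):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429

buckets = {}  # api_key -> TokenBucket

def allow_request(api_key: str) -> bool:
    return buckets.setdefault(api_key, TokenBucket()).allow()
```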
Prevention Checklist
- Private, access-controlled storage; signed artifacts; short-lived download URLs
- API rate limits, authentication, and watermarking
- Monitoring for leak indicators and takedown workflows