LLM10: Model Theft (Weight Exfiltration, API Extraction, Knockoff Nets)
Description
Model theft includes direct exfiltration of proprietary weights/checkpoints and indirect extraction via high-volume API sampling to train a copycat. Risks arise from exposed storage, permissive CI/CD, third-party hosts, or insufficient API protections.
Keywords: model exfiltration, checkpoint leaks, API scraping, watermarking, inference rate limiting.
Examples/Proof
- Artifact exposure
- Scan storage/registries for public access to model files (e.g., .bin, .safetensors). If accessible, they can be copied (a scan sketch follows this list).
- API extraction
- Simulate high-rate queries to collect input-output pairs; if rate limits don't throttle and watermarking is absent, approximating the model is feasible (an extraction sketch follows this list).
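A minimal scan sketch for the artifact-exposure check above. The URLs, host names, and file paths are illustrative placeholders, not real endpoints; substitute your own storage or registry locations.

```python
import requests

# Hypothetical artifact locations to probe; names are illustrative only.
CANDIDATE_URLS = [
    "https://storage.example.com/models/prod/model.safetensors",
    "https://registry.example.com/acme/llm-v3/pytorch_model.bin",
]

def probe_public_access(urls):
    """Flag model artifacts reachable without credentials."""
    exposed = []
    for url in urls:
        try:
            # HEAD avoids downloading multi-GB weights just to test access.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code == 200:
                size = resp.headers.get("Content-Length", "unknown")
                exposed.append((url, size))
        except requests.RequestException:
            pass  # unreachable hosts are not a finding
    return exposed

for url, size in probe_public_access(CANDIDATE_URLS):
    print(f"PUBLICLY READABLE: {url} ({size} bytes)")
```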
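An extraction sketch for the API item above, assuming a hypothetical JSON endpoint and payload shape (`API_URL`, `prompt`, `text` are placeholders). Run it only against systems you are authorized to test; if the loop is never throttled and outputs carry no watermark, the harvested pairs can train a copycat.

```python
import json
import requests

# Hypothetical inference endpoint; adjust to the target API under test.
API_URL = "https://api.example.com/v1/generate"

def collect_pairs(prompts, out_path="distill_pairs.jsonl"):
    """Harvest input-output pairs usable to fine-tune a surrogate model."""
    with open(out_path, "a") as f:
        for prompt in prompts:
            resp = requests.post(API_URL, json={"prompt": prompt}, timeout=30)
            if resp.status_code == 429:
                print("Rate limited: extraction is being throttled")
                break
            record = {"prompt": prompt, "completion": resp.json().get("text")}
            f.write(json.dumps(record) + "\n")
```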
Detection and Monitoring
- Access logs and anomaly detection
- Monitor unusual download volumes or source IPs; detect scraping patterns on inference APIs (a log-analysis sketch follows this list).
- Watermark/trace
- Embed statistical watermarks or response signatures; check for misuse in the wild (a detector sketch follows this list).
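A log-analysis sketch for the scraping-detection item: a sliding-window request counter over parsed access-log events. The window size and threshold are illustrative assumptions; tune them to your traffic baseline.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60          # illustrative values; tune to your baseline
MAX_REQUESTS_PER_WINDOW = 300

def find_scrapers(events):
    """events: time-ordered iterable of (client_id, unix_timestamp)."""
    windows = defaultdict(deque)
    flagged = set()
    for client, ts in events:
        win = windows[client]
        win.append(ts)
        # Drop timestamps that have aged out of the window.
        while win and ts - win[0] > WINDOW_SECONDS:
            win.popleft()
        if len(win) > MAX_REQUESTS_PER_WINDOW:
            flagged.add(client)
    return flagged
```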
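A detector sketch for the watermark item, following the published green-list approach (bias sampling toward a secretly chosen half of the vocabulary, then z-test suspect text). The key and the even/odd split are assumptions for illustration.

```python
import hashlib
import math

SECRET_KEY = b"example-watermark-key"  # hypothetical; keep server-side

def is_green(prev_token: str, token: str) -> bool:
    # Secretly partition the vocabulary based on the preceding token;
    # watermarked decoding nudges sampling toward the "green" half.
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode()).digest()
    return digest[0] % 2 == 0

def watermark_z_score(tokens: list[str]) -> float:
    """Unwatermarked text hits ~50% green tokens; a high z-score over a
    long sample suggests output from the watermarked model."""
    n = len(tokens) - 1
    if n < 1:
        return 0.0
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)
```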
Remediation
- Protect weights and artifacts
- Encrypt and restrict storage; sign releases; use access gates and short-lived URLs (a presigned-URL sketch follows this list).
- API protections
- Rate limit; require authenticated clients; detect scraping; watermark outputs (a token-bucket sketch follows this list).
- Contractual controls
- Enforce license/ToS; monitor marketplaces and repos for leaked or cloned models.
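A sketch of the short-lived URL control using S3 presigned URLs via boto3; the bucket and key names are placeholders, while the `generate_presigned_url` call itself is standard boto3.

```python
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are configured

def short_lived_model_url(bucket="models-internal", key="llm/v3/model.safetensors"):
    """Issue a download URL that expires in 5 minutes rather than
    exposing the object publicly."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=300,
    )
```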
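A minimal token-bucket sketch for the API rate-limit control; the refill rate and capacity are illustrative, and the per-key lookup assumes a single-process service.

```python
import time

class TokenBucket:
    """Per-client token bucket: refills at `rate` tokens/sec up to
    `capacity`; a request is served only if a token is available."""

    def __init__(self, rate: float = 2.0, capacity: int = 20):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with HTTP 429

buckets = {}  # api_key -> TokenBucket

def allow_request(api_key: str) -> bool:
    return buckets.setdefault(api_key, TokenBucket()).allow()
```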
Prevention Checklist
- Private, access-controlled storage; signed artifacts; short-lived download URLs
- API rate limits, authentication, and watermarking
- Monitoring for leak indicators and takedown workflows