LLM10: Model Theft (Weight Exfiltration, API Extraction, Knockoff Nets)

Description

Model theft covers direct exfiltration of proprietary weights/checkpoints and indirect extraction via high-volume API sampling to train a copycat ("knockoff") model. Risks arise from exposed storage, permissive CI/CD pipelines, third-party hosting, or insufficient API protections.

Keywords: model exfiltration, checkpoint leaks, API scraping, watermarking, inference rate limiting.

Examples/Proof

  • Artifact exposure

    • Scan storage/registries for public access to model files (e.g., .bin, .safetensors). If any are publicly readable, they can be copied wholesale (see the exposure-check sketch after this list).
  • API extraction

    • Simulate high-rate queries to collect input-output pairs; if rate limits don’t throttle and outputs are unwatermarked, a copycat can be distilled from the collected pairs (see the extraction probe below).
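
A minimal sketch of the exposure check, assuming a list of candidate artifact URLs (the host and paths below are hypothetical): it issues unauthenticated HEAD requests and flags any model file that answers 200.

```python
import requests

# Hypothetical candidate locations; in practice these would come from
# enumeration of storage buckets, registries, or CI artifact stores.
CANDIDATE_URLS = [
    "https://storage.example.com/models/prod/model.safetensors",
    "https://storage.example.com/models/prod/pytorch_model.bin",
]

def check_public_exposure(urls):
    """Flag model artifacts that respond to unauthenticated requests."""
    exposed = []
    for url in urls:
        try:
            # HEAD avoids downloading the (potentially large) weight file.
            resp = requests.head(url, timeout=10, allow_redirects=True)
        except requests.RequestException:
            continue  # an unreachable host is not evidence of exposure
        if resp.status_code == 200:
            size = resp.headers.get("Content-Length", "unknown")
            exposed.append((url, size))
    return exposed

if __name__ == "__main__":
    for url, size in check_public_exposure(CANDIDATE_URLS):
        print(f"PUBLICLY READABLE: {url} ({size} bytes)")
```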
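
And a sketch of the extraction-feasibility probe, assuming a hypothetical inference endpoint that accepts a JSON prompt: it sends a burst of queries, records input-output pairs, and reports whether the API ever throttled (HTTP 429).

```python
import json
import requests

ENDPOINT = "https://api.example.com/v1/generate"  # hypothetical endpoint
API_KEY = "test-key"                              # hypothetical credential

def extraction_probe(prompts):
    """Collect input-output pairs and check whether rate limiting engages."""
    pairs, throttled = [], 0
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {API_KEY}"})
    for prompt in prompts:
        resp = session.post(ENDPOINT, json={"prompt": prompt}, timeout=30)
        if resp.status_code == 429:  # throttled: a good sign for defenders
            throttled += 1
            continue
        resp.raise_for_status()
        pairs.append({"input": prompt, "output": resp.json().get("text")})
    return pairs, throttled

if __name__ == "__main__":
    probes = [f"Classify sentiment: sample {i}" for i in range(500)]
    pairs, throttled = extraction_probe(probes)
    print(f"collected {len(pairs)} pairs; throttled {throttled} times")
    # Collected pairs are exactly the training data a knockoff would use.
    with open("distillation_pairs.jsonl", "w") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")
```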

Detection and Monitoring

  • Access logs and anomaly detection
    • Monitor for unusual download volumes or source IPs; detect scraping patterns on inference APIs (see the log-analysis sketch below).
  • Watermark/trace
    • Embed statistical watermarks or response signatures in outputs, then check for them in suspect models and text found in the wild (see the watermark-detection sketch below).
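
A minimal sketch of scraping detection over access logs, assuming records have already been parsed into (timestamp, client_ip) tuples: it counts requests per IP in a sliding window and flags clients exceeding a threshold (window and threshold are assumptions to tune against normal traffic).

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 300  # requests per window; assumption, tune to your baseline

def flag_scrapers(events):
    """events: iterable of (unix_timestamp, client_ip), sorted by time."""
    windows = defaultdict(deque)
    flagged = set()
    for ts, ip in events:
        q = windows[ip]
        q.append(ts)
        # Drop requests that fell out of the sliding window.
        while q and q[0] < ts - WINDOW_SECONDS:
            q.popleft()
        if len(q) > THRESHOLD:
            flagged.add(ip)
    return flagged

# Example: a client issuing 10 requests/second trips the threshold.
events = sorted((t / 10, "203.0.113.7") for t in range(6000))
print(flag_scrapers(events))  # {'203.0.113.7'}
```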
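
For statistical watermarking, a sketch in the style of green-list token biasing (Kirchenbauer et al.): generation favors a keyed "green" subset of the vocabulary, and detection computes a z-score on the fraction of green tokens in suspect text. The key, the whitespace tokenization, and the context-free hashing are toy simplifications.

```python
import hashlib
import math

SECRET_KEY = b"watermark-key"  # assumption: shared by generator and detector
GREEN_FRACTION = 0.5           # expected green rate in unwatermarked text

def is_green(token: str) -> bool:
    """Keyed hash partitions the vocabulary into green/red halves."""
    digest = hashlib.sha256(SECRET_KEY + token.encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(tokens):
    """z-score of the observed green count vs. the unwatermarked expectation."""
    n = len(tokens)
    greens = sum(is_green(t) for t in tokens)
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (greens - expected) / std

# A z-score well above ~4 suggests the text came from the watermarked model.
suspect = "the model output split into tokens somehow".split()
print(f"z = {watermark_z_score(suspect):.2f}")
```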

Remediation

  1. Protect weights and artifacts
    • Encrypt and restrict storage; sign releases; gate downloads behind access checks and short-lived URLs (see the presigned-URL and signing sketch after this list).
  2. API protections
    • Rate limit per client; require authenticated clients; detect scraping; watermark outputs (see the token-bucket sketch below).
  3. Contractual controls
    • Enforce license/ToS terms; monitor marketplaces and repos for leaked or cloned models (see the fingerprinting sketch below).
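
A sketch of short-lived download URLs and artifact signing, assuming the weights live in an S3 bucket (bucket and key names are hypothetical). boto3's presigned URLs expire after ExpiresIn seconds; an HMAC-SHA256 over the file stands in for a full release-signing pipeline, with the key ideally held in a KMS/HSM.

```python
import hashlib
import hmac

import boto3

SIGNING_KEY = b"release-signing-key"  # assumption: stored in a KMS/HSM in practice

def short_lived_url(bucket: str, key: str, ttl: int = 300) -> str:
    """Presigned GET that expires after `ttl` seconds."""
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=ttl
    )

def sign_artifact(path: str) -> str:
    """HMAC-SHA256 over the artifact; verify before loading the weights."""
    h = hmac.new(SIGNING_KEY, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_artifact(path: str, expected: str) -> bool:
    return hmac.compare_digest(sign_artifact(path), expected)

# url = short_lived_url("models-prod", "llm/v3/model.safetensors")  # hypothetical
```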
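
For the API side, a token-bucket rate limiter sketch, one bucket per authenticated client (capacity and refill rate are assumptions to tune):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `refill_rate` req/s."""
    def __init__(self, capacity: float = 20.0, refill_rate: float = 5.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # deny: respond with HTTP 429

buckets: dict[str, TokenBucket] = {}

def check_request(client_id: str) -> bool:
    return buckets.setdefault(client_id, TokenBucket()).allow()
```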
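
For monitoring suspected clones, a behavioral-fingerprinting sketch: query a suspect endpoint with a fixed set of probe prompts and measure agreement with the original model's recorded answers. The endpoint and probes are hypothetical, and a high match rate is a signal for legal follow-up, not proof of theft.

```python
import requests

SUSPECT_ENDPOINT = "https://suspect.example.com/generate"  # hypothetical

# Probe prompts paired with the original model's recorded greedy completions;
# in practice, use dozens of unusual, low-ambiguity probes.
REFERENCE = {
    "Q: capital of France? A:": "Paris",
    "Translate 'cat' to German:": "Katze",
}

def clone_match_rate(endpoint: str) -> float:
    """Fraction of probe prompts where the suspect matches the reference."""
    matches = 0
    for prompt, expected in REFERENCE.items():
        resp = requests.post(endpoint, json={"prompt": prompt}, timeout=30)
        resp.raise_for_status()
        output = resp.json().get("text", "")
        matches += output.strip() == expected
    return matches / len(REFERENCE)

# rate = clone_match_rate(SUSPECT_ENDPOINT)
# print(f"agreement with reference model: {rate:.0%}")
```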

Prevention Checklist

  • Private, access-controlled storage; signed artifacts; short-lived download URLs
  • API rate limits, authentication, and watermarking
  • Monitoring for leak indicators and takedown workflows