How does air-gapped AI work?
Air-gapped AI removes every network path between your models and the public internet. Instead of calling a hosted API such as OpenAI's or Anthropic's, you run the models on hardware or in a cloud environment you control, and all traffic stays inside a private boundary.
In practice that means three things. First, the model weights are downloaded once and stored privately (often served with Ollama, vLLM, or NVIDIA NIM). Second, inference runs on private compute with no egress. Third, supporting services such as databases, queues, and registries are reached only through private networking. On AWS, this is typically done with a VPC that has no internet gateway or NAT gateway. Every AWS service the workload calls is reached through a VPC endpoint (interface endpoints, plus gateway endpoints for S3 and DynamoDB). A private container registry such as Amazon ECR ensures that no image is ever pulled from a public source.
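One way to make the "private registry only" and "private networking only" rules checkable is a preflight script run before each deployment. The sketch below is illustrative, not a standard tool. `PRIVATE_REGISTRY`, the image names, and the endpoint names are hypothetical. It also assumes endpoint addresses have already been resolved from inside the VPC. Registry detection approximates Docker's image-reference rule, under which an image with no explicit registry host resolves to Docker Hub.

```python
"""Preflight check for an air-gapped deployment (illustrative sketch).

Assumptions, not taken from any specific product:
- PRIVATE_REGISTRY is a hypothetical hostname for your private registry.
- Endpoint addresses are IPs already resolved from inside the VPC.
"""
import ipaddress

# Hypothetical registry host; substitute your own private registry.
PRIVATE_REGISTRY = "123456789012.dkr.ecr.eu-west-1.amazonaws.com"


def registry_host(image_ref: str) -> str:
    """Return the registry host an image reference would be pulled from.

    Approximates Docker's rule: the first path component is a registry host
    only if it contains '.' or ':' or is 'localhost'; otherwise the image is
    resolved against Docker Hub (docker.io).
    """
    first, sep, _ = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_private_endpoint(address: str) -> bool:
    """True if Python's ipaddress module classifies the IP as private
    (RFC 1918, loopback, link-local, IPv6 ULA and other reserved ranges)."""
    return ipaddress.ip_address(address).is_private


def preflight(images: list[str], endpoints: dict[str, str]) -> list[str]:
    """Collect violations; an empty list means every check passed."""
    problems = []
    for image in images:
        host = registry_host(image)
        if host != PRIVATE_REGISTRY:
            problems.append(f"image {image!r} would be pulled from {host}")
    for name, address in endpoints.items():
        if not is_private_endpoint(address):
            problems.append(f"endpoint {name!r} ({address}) is not a private address")
    return problems


if __name__ == "__main__":
    images = [
        f"{PRIVATE_REGISTRY}/vllm-openai:latest",  # hypothetical mirrored image
        "ollama/ollama:latest",  # no registry host, so this resolves to Docker Hub
    ]
    endpoints = {
        "vllm-inference": "10.0.1.25",
        "sts-interface-endpoint": "10.0.2.14",
        "telemetry": "8.8.8.8",  # a public address, so it is flagged
    }
    for problem in preflight(images, endpoints):
        print("FAIL:", problem)
```

A static check like this catches configuration drift early. It only reports problems, though. The boundary itself still has to be enforced at the network layer.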
The result is a system where regulated or confidential data is processed by AI without that data, or the prompts and responses derived from it, ever leaving the network.
Air-gapped AI vs on-prem vs private cloud
These terms overlap but are not identical. The deciding factor is where the network boundary sits and how much internet access is allowed.
| Deployment model | Where it runs | Internet egress | Typical use |
|---|---|---|---|
| Air-gapped AI | Private network or isolated cloud VPC | None | Regulated finance, healthcare, defence, government |
| On-prem AI | Your own data centre and hardware | Often partial | Data-residency rules, existing hardware investment |
| Private cloud AI | Dedicated cloud tenancy | Usually allowed through a controlled gateway | Enterprises wanting control without owning hardware |
Air-gapped is the strictest of the three: zero egress is the defining constraint. On-prem describes ownership of the hardware, not necessarily isolation, and a private cloud deployment can still reach the internet through a controlled gateway.
When do you need air-gapped AI?
Air-gapped AI is worth its added complexity when a data-leak path is unacceptable, not merely undesirable. The common triggers are:
- Regulatory mandate: HIPAA, PCI-DSS, GDPR data-transfer restrictions, or financial-services rules that restrict sending data to third-party APIs.
- Confidential or proprietary data: source code, trading models, patient records, or classified material that must not transit a public network.
- Document processing on sensitive files: extraction pipelines (for example, PaddleOCR combined with a vision-language model such as Qwen2.5-VL) that must run without uploading documents anywhere.
- Contractual isolation: enterprise customers who require, in writing, that their data never leaves a defined boundary.
If none of these apply, a private cloud deployment usually delivers most of the control at a lower operational cost.
What are the trade-offs?
Air-gapping trades convenience for control. You give up frontier hosted models (GPT-5, Claude) and managed scaling. In return you take on serving open-weight models yourself, sizing GPU capacity, and patching an environment that can't pull updates from the internet on demand. Instead, new model weights, container images, and software patches arrive through a controlled ingestion process.
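What the ingestion process looks like varies by organisation. One common building block is checking that artifacts carried across the boundary match checksums recorded before transfer. The sketch below assumes a manifest in the style `sha256sum` produces: one `<hex digest>  <relative path>` line per file, with an optional `*` before the path for binary mode. The manifest layout and file names are assumptions for illustration.

```python
"""Verify ingested artifacts against a SHA-256 manifest (illustrative sketch).

Assumed manifest format (sha256sum style), one entry per line:
    <hex sha256>  <relative path>      or      <hex sha256> *<relative path>
"""
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so multi-gigabyte weights never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> list[str]:
    """Return entries that are malformed, missing, outside the manifest's
    directory, or whose hash does not match. An empty list means all passed."""
    root = manifest.parent.resolve()
    failures = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            failures.append(line)  # malformed entry
            continue
        expected, rel = parts[0].lower(), parts[1].removeprefix("*")
        target = (root / rel).resolve()
        # Reject paths that escape the ingestion directory (e.g. "../x").
        if not target.is_relative_to(root) or not target.is_file():
            failures.append(rel)
        elif sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

A checksum only proves that the files match the manifest. In practice the manifest itself would also need to be signed or delivered over a separate trusted channel.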
In exchange you get a hard guarantee: data and inference never leave your boundary. For organisations where that guarantee is the requirement, the trade is straightforward. Current open-weight models are also strong enough that the quality gap is small for many production tasks.