Air-Gapped & Private LLM

What Is Air-Gapped AI? Private Model Deployment Explained

Air-gapped AI is the practice of running AI models — including large language models — on infrastructure with no inbound or outbound internet connection. Data, model weights, and inference all stay inside a private network or isolated cloud environment, so sensitive information never crosses the organisation's security boundary.

Dishant Sethi ·Updated Jun 15, 2026

How does air-gapped AI work?

Air-gapped AI removes every path between your models and the public internet. Instead of calling a hosted API like OpenAI or Anthropic over the network, the models run on hardware or in a cloud environment you control, and all traffic stays inside a private boundary.

In practice that means three things: the model weights are downloaded once and stored privately (often served with Ollama, vLLM, or NVIDIA NIM); inference happens on private compute with no egress; and supporting services — databases, queues, registries — are reached only through private networking. On AWS, this is typically done with a private VPC, VPC interface endpoints for every AWS service call, and a private container registry so no image is ever pulled from a public source.

The result is a system where regulated or confidential data is processed by AI without that data, or the prompts and responses derived from it, ever leaving the network.

Air-gapped AI vs on-prem vs private cloud

These terms overlap but are not identical. The deciding factor is where the network boundary sits and how much internet access is allowed.

ModelWhere it runsInternet egressTypical use
Air-gapped AIPrivate network or isolated cloud VPCNoneRegulated finance, healthcare, defence, government
On-prem AIYour own data centre / hardwareOften partialData-residency rules, existing hardware investment
Private cloud AIDedicated cloud tenancyUsually allowed, controlledEnterprises wanting control without owning hardware

Air-gapped is the strictest of the three: zero egress is the defining constraint. On-prem describes ownership of the hardware, not necessarily isolation, and a private cloud deployment can still reach the internet through a controlled gateway.

When do you need air-gapped AI?

Air-gapped AI is worth its added complexity when a data-leak path is unacceptable, not merely undesirable. The common triggers are:

  • Regulatory mandate — HIPAA, PCI-DSS, GDPR data-residency, or financial-services rules that prohibit sending data to third-party APIs.
  • Confidential or proprietary data — source code, trading models, patient records, or classified material that must not transit a public network.
  • Document processing on sensitive files — extraction pipelines (for example, PaddleOCR with a vision-language model such as Qwen2.5-VL) that must run without uploading documents anywhere.
  • Contractual isolation — enterprise customers who require, in writing, that their data never leaves a defined boundary.

If none of these apply, a private cloud deployment usually delivers most of the control at a lower operational cost.

What are the trade-offs?

Air-gapping trades convenience for control. You give up frontier hosted models (GPT-5, Claude) and managed scaling, and you take on serving open-weight models yourself, sizing GPU capacity, and patching an environment that can't pull updates from the internet on demand. Updates arrive through a controlled ingestion process instead.

In exchange you get a hard guarantee: data and inference never leave your boundary. For organisations where that guarantee is the requirement, the trade is straightforward — and modern open-weight models are now strong enough that the quality gap for most production tasks is small.

Frequently Asked Questions

No. On-premise refers to running on hardware you own, which may still have internet access. Air-gapped specifically means no inbound or outbound internet connection — the air gap is the absence of any network path to the outside world. You can run air-gapped AI in an isolated cloud VPC without owning any hardware.

Yes. Open-weight models such as Llama, Qwen, and Mistral can be downloaded once and served entirely offline using runtimes like Ollama, vLLM, or NVIDIA NIM. The main constraint is GPU capacity, since you provision and scale the serving infrastructure yourself rather than relying on a hosted API.

It depends on scope, but a production-grade air-gapped environment can be delivered quickly with the right architecture. Prodinit deployed a fully air-gapped AWS EKS platform — zero internet egress, private registry, and complete CI/CD — for a regulated fintech in four weeks.

Updates flow through a controlled ingestion process: new model weights or container images are scanned and brought into the private environment deliberately, rather than pulled automatically from a public registry. This keeps the air gap intact while still allowing the system to be patched and upgraded.

How Prodinit does this in productionHow we deployed a fully air-gapped AI platform for a regulated fintech in 4 weeks Read the case study

Stay ahead in AI engineering.

Get the latest insights on building production AI systems, be the first to explore approaches that actually work beyond the demo.

Start a Project →