GPU TEE Inference
Run AI models with hardware-secured privacy and verifiable outputs. No code changes required.
Features
Drop-in Replacement for LLM API
Get privacy and verifiability at zero extra cost. Our API is fully OpenAI-compatible: just switch the endpoint and you're ready to go.
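As a sketch of what the switch looks like, assuming the openai Python SDK and placeholder values for the base URL, API key, and model name (take the real ones from the Phala Cloud docs):

```python
from openai import OpenAI

# Placeholder endpoint, key, and model name for illustration only;
# use the values from your Phala Cloud dashboard instead.
client = OpenAI(
    base_url="https://api.phala.example/v1",  # hypothetical base URL
    api_key="YOUR_PHALA_API_KEY",
)

response = client.chat.completions.create(
    model="your-model-name",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Hello from inside a GPU TEE!"}],
)
print(response.choices[0].message.content)
```

The only change from a stock OpenAI setup is the base_url argument and the API key.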
Don't Trust, Verify
Every inference runs inside a TEE: data stays private and outputs are tamper-proof. Each result comes with a remote attestation you can use to cryptographically verify its trustworthiness.
NVIDIA Confidential Computing & Open Infra
Built on NVIDIA Confidential Computing and the open-source private-ml-sdk and dstack, the stack gives you hardware-backed security, open code, full transparency, and full control.
Frequently Asked Questions
What is a TEE?
A Trusted Execution Environment (TEE) is a hardware-isolated environment that protects code and data from the host system. Neither the infrastructure vendor nor the host operating system can read application data or interfere with code execution. This makes computation both encrypted and verifiable.
What is GPU TEE Inference?
GPU TEE Inference runs AI models inside GPU-based TEEs, providing hardware-level privacy and verifiability. It ensures that both your input data and model outputs are protected from cloud providers and third parties.
How is this different from using OpenAI or Hugging Face APIs?
Unlike traditional APIs, GPU TEE Inference provides confidentiality and proof of execution. Each call runs inside a secure NVIDIA GPU TEE and includes a remote attestation, so you can verify that the inference was executed in a secure environment.
Is the API really OpenAI-compatible?
Yes. Our API is fully compatible with OpenAI and OpenRouter standards. Just update the base URL and reuse your existing code and tools. No rewrites required.
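For example, standard SDK features such as streaming work without modification; this sketch again uses placeholder endpoint and model values:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.phala.example/v1",  # hypothetical base URL
    api_key="YOUR_PHALA_API_KEY",
)

# Streaming goes through the standard OpenAI SDK interface; nothing
# TEE-specific is required on the client side.
stream = client.chat.completions.create(
    model="your-model-name",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Explain TEEs in one sentence."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```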
How can I verify that an inference was private and secure?
Every inference includes a cryptographic proof that it ran inside a GPU TEE. You can verify both the platform's integrity and the result itself, so your users can trust the output rather than take it on faith.
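A minimal sketch of what fetching that proof can look like, assuming a hypothetical attestation endpoint (the actual path and report format are defined in the Phala Cloud and private-ml-sdk documentation):

```python
import requests

# Hypothetical attestation endpoint for illustration only; the real
# path and report format are defined in the Phala Cloud docs.
ATTESTATION_URL = "https://api.phala.example/v1/attestation/report"

resp = requests.get(
    ATTESTATION_URL,
    headers={"Authorization": "Bearer YOUR_PHALA_API_KEY"},
    timeout=30,
)
resp.raise_for_status()
report = resp.json()

# A report of this kind would typically contain quotes from the CPU TEE
# (Intel TDX) and the GPU TEE (NVIDIA Confidential Computing); those
# quotes can be submitted to the vendors' verification services to
# confirm the platform's integrity.
print(report)
```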
What hardware does Phala Cloud use for GPU TEE Inference?
Phala Cloud runs AI inference on TEE-enabled hardware such as Intel Xeon CPUs (for Intel TDX) and NVIDIA H100/H200 GPUs (for NVIDIA Confidential Computing).
How much does it cost to use GPU TEE Inference?
Our pricing is fair, transparent, and in line with other AI inference providers. The difference is that GPU TEE Inference adds much stronger privacy and verifiable security at no additional cost.