3rd World Congress on Smart Computing
(WCSC2026)

Organized by  

Soft Computing Research Society

in Association with 

International Auditors for Digital and Data Management Association, Bangkok, Thailand

Venue

Novotel Bangkok on Siam Square, Bangkok, Thailand

January 10-11, 2026 

The post-conference proceedings of WCSC 2026 will be published in the Springer book series Lecture Notes in Networks and Systems.

Srikanta Datta

Secure Multi-Tenant GPU-as-a-Service on Kubernetes: Architecture, Isolation, and Reliability at Scale

Abstract:

As AI adoption accelerates, GPU clouds must deliver high utilization without compromising tenant isolation. This talk presents a Kubernetes-native GPU-as-a-Service design at scale, covering GPU/CPU SKU modeling, scheduling, network and RDMA isolation, secrets injection, and reliability guardrails with real-world failure-mode mitigations.

The rapid adoption of artificial intelligence across enterprise and cloud-native environments has created unprecedented demand for GPU compute resources. As organizations move from experimental AI workloads to production deployments, the infrastructure challenges multiply: How do you share expensive GPU resources across multiple tenants without compromising isolation? How do you schedule workloads that have constraints beyond CPU and memory? How do you operate a GPU cloud that runs training jobs for days without disruption?

This talk presents a comprehensive architecture for Kubernetes-native GPU-as-a-Service at enterprise scale, drawing from practical experience building and operating such platforms.

GPU and CPU SKU Modeling
Effective GPU cloud management begins with understanding the heterogeneity of hardware. We present approaches to modeling GPU capabilities (compute, memory bandwidth, interconnect), CPU profiles for inference versus training, and how to expose these dimensions to schedulers and users through Kubernetes-native abstractions.
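As a minimal sketch of the SKU-modeling idea, the snippet below represents one GPU SKU along a few of the dimensions named above and renders them as Kubernetes node labels that a scheduler or nodeSelector could match on. The field values, the SKU name, and the label prefix gpu.sku.example.com are illustrative placeholders, not part of any standard or of the platform described in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuSku:
    """One hardware SKU, modeled along dimensions schedulers care about."""
    name: str                # e.g. "a100-80g" (illustrative)
    memory_gib: int          # device memory capacity
    mem_bandwidth_gbps: int  # memory bandwidth
    interconnect: str        # "nvlink", "pcie", ...
    fp16_tflops: int         # compute throughput

    def node_labels(self) -> dict:
        """Render the SKU as node labels so each dimension can be
        matched independently (label prefix is a placeholder)."""
        p = "gpu.sku.example.com"
        return {
            f"{p}/name": self.name,
            f"{p}/memory-gib": str(self.memory_gib),
            f"{p}/interconnect": self.interconnect,
        }

a100 = GpuSku("a100-80g", 80, 2039, "nvlink", 312)
labels = a100.node_labels()
```

Exposing dimensions as separate labels, rather than a single opaque SKU name, lets users constrain only what matters to their workload (for example, interconnect type for training, memory capacity for inference).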

Multi-Dimensional Scheduling
Traditional Kubernetes scheduling optimizes for CPU and memory. GPU workloads require schedulers that consider memory bandwidth limits, NVLink/InfiniBand topology, thermal constraints, and tenant priority policies. We describe an extensible multi-dimensional scheduling approach that addresses these requirements.
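The shape of such a scheduler can be sketched as a hard-constraint filter followed by a weighted score over normalized dimensions, loosely mirroring the Filter/Score split in the Kubernetes scheduling framework. All node fields, weights, and the scoring formula below are simplified assumptions for illustration, not the actual policy from the talk.

```python
def feasible(node, req):
    """Filter step: the node must satisfy every hard requirement."""
    return (node["free_gpu_mem_gib"] >= req["gpu_mem_gib"]
            and node["thermal_headroom"] > 0)

def score(node, req, weights):
    """Score step: weighted sum over normalized dimensions -- bandwidth
    fit, interconnect topology match, thermal headroom, tenant priority."""
    bw = min(node["mem_bandwidth_gbps"] / req["want_bandwidth_gbps"], 1.0)
    topo = 1.0 if node["interconnect"] == req["preferred_interconnect"] else 0.0
    return (weights["bw"] * bw
            + weights["topo"] * topo
            + weights["thermal"] * node["thermal_headroom"]   # 0..1
            + weights["prio"] * req["tenant_priority"])       # 0..1 policy input

def pick(nodes, req, weights):
    """Return the name of the best feasible node, or None."""
    candidates = [n for n in nodes if feasible(n, req)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(n, req, weights))["name"]

nodes = [
    {"name": "n1", "free_gpu_mem_gib": 80, "thermal_headroom": 0.5,
     "mem_bandwidth_gbps": 2000, "interconnect": "nvlink"},
    {"name": "n2", "free_gpu_mem_gib": 40, "thermal_headroom": 0.9,
     "mem_bandwidth_gbps": 900, "interconnect": "pcie"},
]
req = {"gpu_mem_gib": 60, "want_bandwidth_gbps": 1500,
       "preferred_interconnect": "nvlink", "tenant_priority": 0.5}
weights = {"bw": 0.4, "topo": 0.3, "thermal": 0.2, "prio": 0.1}
chosen = pick(nodes, req, weights)
```

The extensibility comes from the dimension list being open-ended: adding a new constraint means adding one feasibility check or one normalized score term, not rewriting the scheduler.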

Network and RDMA Isolation
Multi-tenant GPU environments must isolate not only compute but also network traffic—particularly for workloads using RDMA over InfiniBand. We present network isolation architectures that maintain performance while enforcing tenant boundaries.
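One common building block for this is InfiniBand partition keys (P_Keys), which segment a shared fabric into isolated partitions. The toy admission check below enforces that a pod may only join the partition assigned to its tenant; the tenant names and P_Key values are invented for illustration, and a real system would manage assignments through the subnet manager rather than a static table.

```python
# Illustrative per-tenant InfiniBand partition-key assignments.
TENANT_PKEYS = {"tenant-a": 0x8001, "tenant-b": 0x8002}

def admit_rdma_attachment(tenant: str, requested_pkey: int) -> bool:
    """Admission guardrail: a pod may only attach to the fabric
    partition (P_Key) assigned to its own tenant, so RDMA traffic
    stays inside the tenant boundary even on shared hardware."""
    return TENANT_PKEYS.get(tenant) == requested_pkey

ok = admit_rdma_attachment("tenant-a", 0x8001)
cross_tenant = admit_rdma_attachment("tenant-a", 0x8002)
```

Because the partition is enforced in the fabric hardware, the performance cost of isolation is negligible compared with software overlays.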

Secrets Injection and Security Guardrails
Shared infrastructure requires cryptographic isolation. We describe a secrets fabric architecture that provides tenant-aware key management without adding latency to GPU operations—critical for inference workloads operating at microsecond margins.
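A sketch of the tenant-aware key idea: derive a distinct key-encryption key per tenant from a single root key, so secrets wrapped for one tenant can never be unwrapped with another tenant's key, and the derivation can be done once at admission time rather than on the hot path. The HMAC-based derivation below stands in for a full HKDF/HSM-backed flow and is not the architecture described in the talk.

```python
import hashlib
import hmac
import os

ROOT_KEY = os.urandom(32)  # stands in for an HSM- or KMS-held root key

def tenant_kek(tenant_id: str) -> bytes:
    """Derive a per-tenant key-encryption key from the root key.
    Deterministic per tenant, so it can be computed once and cached
    at pod admission, keeping key lookups off the inference hot path."""
    return hmac.new(ROOT_KEY, tenant_id.encode(), hashlib.sha256).digest()

kek_a = tenant_kek("tenant-a")
kek_b = tenant_kek("tenant-b")
```

Caching the derived key at injection time is what keeps the cryptographic boundary from adding per-request latency.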

Reliability at Scale
GPU training jobs may run for days or weeks. Control plane upgrades cannot disrupt running workloads. We present hitless upgrade patterns, failure-mode analysis, and chaos engineering practices that maintain reliability without sacrificing operational agility.
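One such guardrail can be sketched as a drain gate: before an upgrade drains a node, every non-preemptible job on it must have a recent checkpoint, bounding how much training progress an upgrade can ever destroy. The job fields and the ten-minute threshold are assumptions for illustration only.

```python
def can_drain(node_jobs, max_lost_minutes=10):
    """Upgrade guardrail: permit a node drain only if every running job
    is either preemptible or has checkpointed recently enough that at
    most `max_lost_minutes` of progress could be lost."""
    for job in node_jobs:
        if job["preemptible"]:
            continue
        if job["minutes_since_checkpoint"] > max_lost_minutes:
            return False
    return True

jobs = [
    {"name": "train-llm", "preemptible": False, "minutes_since_checkpoint": 3},
    {"name": "batch-eval", "preemptible": True, "minutes_since_checkpoint": 500},
]
drain_ok = can_drain(jobs)
```

Composing upgrades from such gates, rather than fixed maintenance windows, is what allows the control plane to evolve continuously under multi-day workloads.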


Profile:

Srikanta Datta is a Director of Engineering for AI Infrastructure at Coupang, leading global teams that build Kubernetes-native GPU cloud control planes, service networking, and security guardrails. He is an inventor on 11 filed patents spanning GPU multi-tenancy, scheduling, isolation, secrets management, and reliability, and has authored peer-reviewed cybersecurity and AI systems publications in IEEE and Springer venues. He previously delivered large-scale platforms at Oracle and Cisco.