Reeshav Kumar
Building Resilient AI Infrastructure: Strategic Architecture & Implementation
Abstract:
Modern AI infrastructure represents a fundamental shift in enterprise computing, requiring organizations to navigate complex trade-offs across quality, performance, cost, and sustainability. This presentation examines the strategic frameworks and architectural patterns necessary for building production-grade AI systems that scale effectively while maintaining operational excellence.
Drawing on a comprehensive analysis of AI infrastructure implementations, this session explores a layered architecture model encompassing six critical layers: data governance, storage systems, compute resources, model toolchains, orchestration platforms, and serving infrastructure. Each layer presents unique bottlenecks and optimization opportunities that directly impact system performance and business value.
The presentation addresses key performance dimensions that technology leaders must balance: quality assurance in training and inference, latency requirements ranging from millisecond-level interactive responses to batch processing measured in hours, throughput capacity for handling concurrent workloads, cost optimization across compute and storage resources, and energy efficiency as organizations face increasing pressure to reduce carbon footprints and operational expenses.
Attendees will gain a systematic decision framework for infrastructure selection that emphasizes workload characterization, technical capability assessment, economic analysis including total cost of ownership, and comprehensive risk evaluation covering vendor stability, technology maturity, and security requirements. The session presents proven implementation patterns including incremental deployment strategies, comprehensive observability from project inception, and planning for continuous model evolution through versioning and staged rollout capabilities.
The presentation concludes with emerging trends reshaping AI infrastructure: specialized hardware proliferation beyond general-purpose GPUs, edge AI expansion requiring distributed model deployment and optimization, and sustainability as a first-class requirement driving architectural decisions. Organizations that master these infrastructure considerations position themselves to capture AI's full value potential while avoiding costly implementation mistakes.
Profile:
Reeshav Kumar is a Senior Product Manager at Meta Platforms, leading product development for Instagram's business analytics and next-generation Mixed Reality devices. With over a decade of experience spanning hardware, software, and AI, he shapes how people connect and businesses thrive in the digital ecosystem.
Before Meta, Reeshav spent four years at Apple as a Product Manager and Engineering Project Manager, spearheading computational photography and neural engine capabilities for iPhones and Apple Silicon. His AI/ML optimization work dramatically reduced latency and power consumption, while his automation initiatives delivered over $10 million in cost savings.
His technical foundation includes engineering roles at Apple, Barefoot Networks, and Oracle, where he designed components for 517+ million devices, developed programmable switches for AI data center workloads, and created processors generating over $1 billion in revenue.
Reeshav holds an MBA from UC Berkeley's Haas School of Business and an MS in Electrical Engineering from Texas A&M University, where he also taught. He earned his BE with Honours from BITS Pilani, India, and has published research on IC design and network architectures.
Beyond work, Reeshav supported educational initiatives for 120+ underprivileged children in rural India and pursues underwater exploration as a PADI Master Scuba Diver.