CML2026

Panneer Selvam Viswanathan

Multimodal Conversational AI: From Perceptual Fusion to Collaborative Intelligence

Abstract:

Multimodal conversational AI represents a major shift in artificial intelligence, moving from systems that process a single input modality to architectures capable of reasoning across multiple information channels, including natural language, images, audio signals, and structured data. Modern systems built on transformer-based architectures integrate modality-specific encoders with shared representational layers, enabling cross-modal attention, semantic alignment, and unified reasoning across heterogeneous inputs. Benchmarks such as MMMU evaluate multimodal understanding across a wide range of academic disciplines, including art, business, medicine, and science. They test whether models can apply domain knowledge and multi-step reasoning rather than rely on isolated pattern matching.

Beyond perception, the current generation of multimodal AI is increasingly agentic. These systems now support tool use, function calling, multi-step planning, and extended context windows, which allow them to sustain complex workflows and collaborative problem solving over time. Evaluation environments such as WebArena assess how AI systems perform on realistic web-based tasks involving navigation, information retrieval, and multi-stage reasoning. Frameworks such as ToolLLM show that language models can be trained to use large collections of real-world APIs. In this setting, a model reads API documentation, plans sequences of function calls, and integrates the returned information into coherent solutions.

These capabilities enable applications across high-impact domains including software engineering, data analysis, education, healthcare, and creative design. In these environments, multimodal systems can synthesize heterogeneous artifacts such as code repositories, diagrams, spreadsheets, visualizations, and written documentation within a unified reasoning process. At the same time, significant technical and governance challenges remain, including inference latency, uncertainty calibration, safety across modalities, bias mitigation, and long-term context management.

This talk explores the architectural foundations, agentic capabilities, and real-world applications of multimodal conversational AI while examining key research frontiers including efficient multimodal inference, interpretable reasoning, and responsible governance that will shape the next generation of collaborative AI systems.

Profile:

Panneer Selvam Viswanathan is an accomplished Lead Software Engineer with over 15 years of experience building enterprise-scale applications, with deep specialization in Conversational AI platforms, Generative AI integration, and content management systems. He currently serves as a Technical Lead at Tech Mahindra Americas Inc. (since March 2025). In this role, he leads the development of advanced full-stack solutions that integrate Generative AI technologies into Apple’s Content Authoring platform, improving the authoring experience for content creators.

Before this role, Panneer spent nearly five years at [24]7.ai as a Senior Software Engineer, contributing significantly to the company’s Conversational AI platform. He led the development of solutions integrating OpenAI and other Generative AI technologies into conversational bots. He also engineered Voice Biometrics APIs using Microsoft Speaker Recognition and Azure Cognitive Services, and played a key role in migrating mission-critical systems to Google Cloud Platform. He collaborated closely with Product Management and UX teams to build industry-leading platforms for designing intelligent conversation flows.

Earlier, Panneer worked at Tata Consultancy Services (2015–2020), exclusively supporting Apple Inc. on multiple strategic engineering initiatives. His contributions included developing content authoring and delivery platforms, scalable content distribution systems, web services, authentication frameworks, analytics dashboards, and major platform modernization efforts. He began his career at Cognizant Technology Solutions, where he worked on CRM and contact center applications for Anthem Inc.

Panneer’s excellence has been recognized with prestigious international awards in 2025, including Noble Gold Awards in Artificial Intelligence and IT Innovation, as well as Titan Business and Innovation Gold Awards for AI Automation and Chatbot Technology. He is also an active researcher and published author, with peer-reviewed publications on Agentic AI, Conversational AI, and prompt engineering. He holds a Bachelor of Engineering in Electronics and Communication from Kongu Engineering College, Anna University.