Sayantan Karmakar (Platform DevOps Engineer, Motorola Solutions, 8+ years) and Rishi Nikhilesh (Manager Software Engineering, 12+ years) presented the most concrete MCP production case study at the summit.
The Problem: 45-Minute Manual Investigation
Motorola's mission-critical environment covers Emergency Dispatch, Command Center, Video Management, Analytics, and Communication Services. Infrastructure: OpenShift Clusters, Grafana, Prometheus, AI Operations.
When a fire emergency comes in, the flow is: Citizen Call → Call Taker → CAD Incident Created → Dispatch Service → Fire Units Notified → Responders Arrive → Incident Resolved.
Every minute counts. A 45-minute MTTR for infrastructure incidents is unacceptable when lives depend on dispatch systems.
The Solution: AI Agent with MCP Protocol Layer
The architecture: an SRE or operator sends a natural-language query to an AI Agent running on AWS Bedrock. The agent combines four components (Anomaly Detection, Incident Analysis, Correlation Engine, Recommendation Engine) and talks through an MCP Protocol layer to three servers: the Red Hat Kubernetes MCP Server, the Grafana MCP Server, and a Runbook MCP Server.
Before MCP, a human manually relayed information between Grafana and Kubernetes. After MCP, the AI Agent queries both systems through the MCP layer at the same time.
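The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not Motorola's implementation: `StubMCPServer`, `gather_context`, the tool names, and the canned pod and alert data are all hypothetical stand-ins, and a real agent would use an MCP client library instead of stubs.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class StubMCPServer:
    """Hypothetical stand-in for an MCP server; returns canned data."""
    name: str
    canned: dict

    async def call_tool(self, tool: str, arguments: dict) -> dict:
        await asyncio.sleep(0)  # placeholder for a network round trip
        return {"server": self.name, "tool": tool, "result": self.canned.get(tool)}


async def gather_context(servers: dict, requests: list) -> list:
    """Issue one tool call per (server, tool, arguments) request concurrently."""
    calls = [servers[name].call_tool(tool, args) for name, tool, args in requests]
    # asyncio.gather returns results in the same order as the calls
    return await asyncio.gather(*calls)


# Assumed example data; none of these names come from the talk.
servers = {
    "kubernetes": StubMCPServer("kubernetes", {"list_pods": ["dispatch-api: CrashLoopBackOff"]}),
    "grafana": StubMCPServer("grafana", {"query_alerts": ["DispatchLatencyHigh"]}),
}
requests = [
    ("kubernetes", "list_pods", {"namespace": "dispatch"}),
    ("grafana", "query_alerts", {"state": "firing"}),
]

context = asyncio.run(gather_context(servers, requests))
for item in context:
    print(item["server"], item["result"])
```

The point of the sketch is that the agent, not a person, holds both answers at once and can correlate them in a single step.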
Why MCP Instead of Custom Integrations?
| Traditional | MCP |
|---|---|
| Custom APIs | Standard Protocol |
| Point-to-point | Reusable |
| High Maintenance | Low Maintenance |
| Vendor-specific | Portable |
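To make the "Standard Protocol" row concrete: MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool `name` and `arguments`. The sketch below builds that envelope for two different servers. The helper `tool_call_request` and the tool names `pods_list` and `search_dashboards` are assumptions for illustration, not the actual tools exposed by the servers in the talk.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool names; only the envelope is defined by the protocol.
k8s_req = tool_call_request("pods_list", {"namespace": "dispatch"})
grafana_req = tool_call_request("search_dashboards", {"query": "dispatch"})

print(json.dumps(k8s_req, indent=2))
print(json.dumps(grafana_req, indent=2))
```

Both requests share the same shape, so adding a third server means learning new tool names rather than writing a new client.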
The Results
- Before: 45 minutes manual investigation
- After: 4 minutes with automated correlation
- 91% reduction in Mean Time To Resolve
- 99.99% uptime target achieved
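A quick check of the arithmetic behind these figures. The downtime budget assumes the 99.99% target is measured over a 365-day year, which the talk summary does not specify.

```python
before_min = 45
after_min = 4

# Relative reduction in investigation time
reduction = (before_min - after_min) / before_min
print(f"MTTR reduction: {reduction:.1%}")  # 91.1%

# Assumption: uptime is measured over a 365-day year
uptime_target = 0.9999
minutes_per_year = 365 * 24 * 60
downtime_budget_min = minutes_per_year * (1 - uptime_target)
print(f"Allowed downtime per year: {downtime_budget_min:.1f} minutes")
```

Under that assumption the yearly budget is roughly 53 minutes, so a single 45-minute incident would consume most of it.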
Human-In-The-Loop Design
Critical principle: Control and Oversight. The agent acts within defined bounds, and critical actions require human approval. Scaling, restarting, and deploying are all blocked until a human approves them.
The meme slide said it best: an SRE choosing between "Wait for Human Approval Gate" and "Run it autonomously (NO!)".
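One way such an approval gate could look in code is sketched below. The restricted action kinds (scale, restart, deploy) come from the talk; `Action`, `ApprovalRequired`, `execute`, and the approver callbacks are hypothetical names, and the real system presumably routes approval through a ticket or chat workflow, not a function call.

```python
from dataclasses import dataclass
from typing import Callable

# Action kinds the talk listed as requiring human approval
RESTRICTED_ACTIONS = {"scale", "restart", "deploy"}


@dataclass(frozen=True)
class Action:
    kind: str
    target: str


class ApprovalRequired(Exception):
    """Raised when a restricted action has not been approved by a human."""


def execute(action: Action, approver: Callable[[Action], bool]) -> str:
    """Run read-only actions directly; gate restricted ones on the approver."""
    if action.kind in RESTRICTED_ACTIONS and not approver(action):
        raise ApprovalRequired(f"{action.kind} on {action.target} was not approved")
    return f"executed {action.kind} on {action.target}"  # placeholder for real work


def deny_all(action: Action) -> bool:
    return False


def approve_all(action: Action) -> bool:
    return True


print(execute(Action("describe", "pod/dispatch-api"), deny_all))
print(execute(Action("restart", "deployment/dispatch-api"), approve_all))
try:
    execute(Action("restart", "deployment/dispatch-api"), deny_all)
except ApprovalRequired as exc:
    print("blocked:", exc)
```

Keeping the restricted set as an explicit allow/deny list means the default for any new write action has to be decided deliberately instead of inherited by accident.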
The Agent Workflow
- Incident Alert (Grafana, Sentry)
- Agent Trigger (pre-defined criteria)
- MCP Context Gathering (K8s state)
- Correlation Analysis
- Root Cause Identification
- Human Approval Request
- Automated Action (approved remediation only)
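The steps above can be sketched as a small pipeline. Everything here is a simplified assumption: the `critical:` trigger rule, the canned pod state standing in for MCP calls, and the one-line "correlation" are placeholders for what the real agent does with Bedrock and live cluster data.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Incident:
    alert: str
    context: dict = field(default_factory=dict)
    root_cause: Optional[str] = None
    proposed_action: Optional[str] = None
    log: list = field(default_factory=list)


def should_trigger(alert: str) -> bool:
    # Hypothetical pre-defined criterion: only critical alerts start the agent
    return alert.startswith("critical:")


def gather_context(incident: Incident) -> None:
    # Placeholder for MCP calls that fetch Kubernetes state
    incident.context = {"pods": {"dispatch-api": "CrashLoopBackOff", "cad-db": "Running"}}


def correlate_and_diagnose(incident: Incident) -> None:
    # Toy correlation: the first pod that is not Running is the root cause
    failing = [name for name, state in incident.context["pods"].items() if state != "Running"]
    if failing:
        incident.root_cause = f"{failing[0]} is not running"
        incident.proposed_action = f"restart {failing[0]}"


def run_workflow(alert: str, approve: Callable[[str], bool]) -> Incident:
    incident = Incident(alert)
    if not should_trigger(alert):
        incident.log.append("ignored")
        return incident
    incident.log.append("triggered")
    gather_context(incident)
    incident.log.append("context gathered")
    correlate_and_diagnose(incident)
    if incident.proposed_action is None:
        incident.log.append("no action proposed")
    elif approve(incident.proposed_action):
        incident.log.append(f"executed: {incident.proposed_action}")
    else:
        incident.log.append("awaiting approval")
    return incident


result = run_workflow("critical: dispatch latency high", approve=lambda action: True)
print(result.root_cause)
print(result.log)
```

Passing the approver in as a callback keeps the remediation step unreachable unless a human decision has been supplied.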
This is MCP in production for mission-critical infrastructure. Not a demo. Not a POC. Real public safety systems.