AI Security

Seven Critical Lessons for CISOs from the McKinsey Lilli Hack

10 March 2026 · 8 min read

In early 2026, security researchers using an autonomous offensive AI agent compromised Lilli - McKinsey & Company's internal generative AI platform used by over 43,000 employees. Within two hours, the agent gained full read and write access to the production database containing 46.5 million chat messages, 728,000 files, and 57,000 user accounts.

McKinsey is not a careless organisation. They have world-class security teams, substantial security investment, and mature processes. If this can happen to them, it can happen to you. Here are the critical lessons CISOs must internalise.

1. SQL Injection Is Not Dead - Your AI Systems Just Made It Worse

The vulnerability that exposed everything was SQL injection - a bug class that has existed for three decades. The Lilli platform concatenated JSON keys directly into SQL queries. When attackers controlled the keys, they controlled the database.

The CISO imperative: Your AI systems likely interact with databases in complex, dynamic ways. RAG architectures, vector stores, conversation history, user preferences - these all create new query patterns that traditional input validation may not cover. Audit every database interaction point in your AI stack with the assumption that attackers will find injection opportunities your standard scanners miss.
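The vulnerable pattern described above - attacker-controlled JSON keys concatenated into SQL - and one defensive fix can be sketched as follows. This is a minimal illustration using SQLite; the function and table names are hypothetical, not Lilli's actual code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, body TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'hello')")

# VULNERABLE pattern: JSON *keys* from the request body become SQL verbatim.
# Parameter binding protects the values, but an attacker who controls the
# keys controls the structure of the query itself.
def query_unsafe(filters: dict):
    where = " AND ".join(f"{k} = ?" for k in filters)
    return conn.execute(f"SELECT body FROM messages WHERE {where}",
                        list(filters.values())).fetchall()

# SAFER pattern: allow-list column names so only known identifiers reach
# the query text; everything attacker-controlled stays in bound parameters.
ALLOWED_COLUMNS = {"id", "body"}

def query_safe(filters: dict):
    if not set(filters) <= ALLOWED_COLUMNS:
        raise ValueError("unexpected filter column")
    where = " AND ".join(f"{k} = ?" for k in filters)
    return conn.execute(f"SELECT body FROM messages WHERE {where}",
                        list(filters.values())).fetchall()
```

A caller passing `{"1=1 OR id": 999}` through `query_unsafe` retrieves every row despite matching nothing, because the key rewrote the WHERE clause; `query_safe` rejects the key before any SQL is built.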

2. API Documentation Is a Blueprint for Your Attack Surface

The autonomous agent discovered over 200 API endpoints through publicly exposed documentation. Twenty-two of these endpoints required no authentication. One of them wrote to the database.

The CISO imperative: If you expose API documentation - even for authenticated endpoints - you are giving attackers a roadmap. Conduct a complete inventory of all API documentation, OpenAPI specifications, and developer portals. Require authentication for documentation access where possible. Treat exposed endpoint documentation as if it were source code: valuable to attackers, requiring access controls.
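As a starting point for that inventory, unauthenticated operations can often be flagged mechanically from an OpenAPI specification. A minimal sketch, assuming the spec is already loaded as a dict - the fragment below is illustrative, not McKinsey's:

```python
# Hypothetical OpenAPI fragment: no global security requirement is set,
# and one operation never declares its own.
spec = {
    "security": [],  # empty global security - nothing applies by default
    "paths": {
        "/chats": {"get": {"security": [{"bearerAuth": []}]}},
        "/feedback": {"post": {}},  # inherits the (empty) global security
    },
}

def unauthenticated_ops(spec: dict) -> list:
    """Return (METHOD, path) pairs with no effective security requirement."""
    global_sec = spec.get("security") or []
    flagged = []
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            # An operation-level "security" overrides the global one.
            if not op.get("security", global_sec):
                flagged.append((method.upper(), path))
    return flagged
```

Running this over the fragment flags `POST /feedback` - the kind of unauthenticated write endpoint that proved decisive in the Lilli compromise.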

3. The Prompt Layer Is a Crown Jewel

Perhaps the most concerning finding: the SQL injection provided write access to system prompts. An attacker could have silently modified how Lilli behaved for 43,000 consultants - poisoning financial models, altering strategic recommendations, or creating invisible data exfiltration channels through AI output.

The CISO imperative: System prompts governing AI behaviour are high-value targets. They require the same protection as source code, database credentials, or cryptographic keys. Implement access controls, version history, integrity monitoring, and change detection for prompt storage. Consider prompts as part of your crown jewel assets in threat modelling exercises.
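Integrity monitoring for prompts can start very simply: hash every prompt file and alert on drift from a reviewed baseline. A minimal sketch, assuming prompts live as text files in a directory (the layout is an assumption):

```python
import hashlib
from pathlib import Path

# Hypothetical sketch: treat system prompts like code - fingerprint every
# prompt file and compare against a baseline captured at review time.
def fingerprint(prompt_dir: str) -> dict:
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(prompt_dir).glob("*.txt"))
    }

def drift(baseline: dict, current: dict) -> list:
    # Added, removed, or modified prompt files all count as drift events.
    return sorted(
        name for name in baseline.keys() | current.keys()
        if baseline.get(name) != current.get(name)
    )
```

In practice the baseline would come from version control or a signed manifest, and a drift event would page the on-call rather than just return a list.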

4. Autonomous AI Agents Attack at Machine Speed

The entire compromise - from reconnaissance to full database enumeration - took two hours. No human attacker works that fast. The agent iterated through blind SQL injection attempts, interpreted error messages, and escalated without human direction.

The CISO imperative: Traditional security monitoring designed for human attacker timelines may not detect AI-driven attacks. Review your detection and response capabilities against machine-speed attack chains. Consider that AI agents can conduct reconnaissance, exploitation, and lateral movement in minutes rather than days or weeks.
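One detection primitive suited to machine-speed attacks is a sliding-window rate check per principal - humans rarely generate dozens of errors or novel endpoints a minute, but an iterating agent does. A minimal sketch (thresholds and the event source are assumptions):

```python
from collections import deque

# Hypothetical sketch: flag machine-speed activity by counting suspicious
# events (e.g. 4xx/5xx responses, never-seen endpoints) per principal
# inside a short sliding window.
class BurstDetector:
    def __init__(self, window_seconds: float = 60.0, threshold: int = 30):
        self.window = window_seconds
        self.threshold = threshold
        self.events = {}

    def observe(self, principal: str, ts: float) -> bool:
        q = self.events.setdefault(principal, deque())
        q.append(ts)
        # Drop events that have aged out of the window.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold  # True -> burst worth investigating
```

Thirty failed probes in a minute from one API key is noise to a daily-review process but a clear signal to a window this short.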

5. RAG and Vector Stores Expand Your Attack Surface

The researchers accessed 3.68 million RAG document chunks - decades of proprietary McKinsey research, frameworks, and methodologies. Vector databases feeding AI systems contain your organisation's accumulated knowledge, often with less mature security controls than traditional databases.

The CISO imperative: Vector stores and RAG architectures are relatively new technologies with immature security tooling. Conduct specific security assessments of your vector database implementations. Understand what data is being embedded, how it's being retrieved, and who can access it. Vector stores often contain highly sensitive information with weaker access controls than source systems.
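One concrete control to verify: whether the retrieval layer enforces the caller's entitlements at query time, rather than assuming everything embedded is readable by everyone. A minimal sketch, where the ACL metadata schema and example chunks are assumptions:

```python
# Hypothetical sketch: post-filter vector search hits on document ACL
# metadata so a caller only receives chunks their groups may read.
def authorised_hits(hits, caller_groups):
    """hits: list of (score, chunk) where chunk carries an 'acl' group set."""
    return [
        (score, chunk) for score, chunk in hits
        if chunk["acl"] & set(caller_groups)
    ]

hits = [
    (0.92, {"text": "M&A playbook excerpt", "acl": {"partners"}}),
    (0.88, {"text": "public industry note", "acl": {"partners", "analysts"}}),
]
```

A caller in the `analysts` group sees only the second chunk. Ideally this filter runs inside the vector store itself (most support metadata filters), so unauthorised chunks never leave it at all.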

6. AI Systems Create New Data Aggregation Risks

Lilli didn't just store conversations. It aggregated search histories, document interactions, user preferences, and organisational structures. AI systems naturally centralise sensitive data that previously lived in separate systems.

The CISO imperative: AI platforms become honeypots simply because of the data they aggregate. Review data retention policies for AI systems aggressively - do you really need to store complete conversation histories? Implement data minimisation, automated purging, and strict access controls. Consider whether AI systems in your environment are consolidating data in ways that create attractive targets.
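A retention sweep of this kind is mechanically simple - the hard part is the policy decision, not the code. A minimal sketch, assuming conversations carry a last-activity timestamp (the 30-day window is an illustrative choice, not a recommendation):

```python
import datetime as dt

# Hypothetical sketch: data-minimisation sweep that drops AI conversation
# records older than a retention window instead of keeping history forever.
RETENTION_DAYS = 30

def purge_conversations(conversations, now):
    cutoff = now - dt.timedelta(days=RETENTION_DAYS)
    kept = [c for c in conversations if c["last_active"] >= cutoff]
    return kept, len(conversations) - len(kept)
```

Run on a schedule, a sweep like this caps the blast radius: had Lilli retained thirty days of chat history rather than all of it, the exposed message count would have been a fraction of 46.5 million.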

7. Your Scanners Won't Find These Issues

McKinsey's internal security scanning failed to identify the SQL injection. Standard tools like OWASP ZAP missed it. The vulnerability existed because AI systems create novel code paths that traditional security testing doesn't cover.

The CISO imperative: Supplement automated scanning with adversarial testing specifically designed for AI systems. Consider using autonomous security testing tools that can explore AI-specific attack surfaces. Traditional application security programmes need significant augmentation to address AI-specific risks.
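One reason scanners miss this bug class: they typically fuzz JSON values, while the Lilli flaw lived in the keys. A hedged sketch of generating key-mutation test bodies for your own adversarial testing - the payloads and request shape are illustrative only:

```python
import json

# Hypothetical sketch: mutate JSON *keys*, not values - an input surface
# many scanners never touch - then send each body and diff the responses
# against the baseline for error-signature or timing changes.
BASELINE_BODY = {"filter": "recent"}
KEY_PAYLOADS = ["filter'", 'filter"', "filter`", "1=1 OR filter"]

def key_mutation_bodies(baseline: dict) -> list:
    bodies = [json.dumps(baseline)]  # first entry is the unmodified baseline
    for key in baseline:
        for payload in KEY_PAYLOADS:
            mutated = dict(baseline)
            mutated[payload] = mutated.pop(key)
            bodies.append(json.dumps(mutated))
    return bodies
```

A body that triggers a different error message, status code, or response time than the baseline is exactly the signal the autonomous agent in this incident iterated on.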

The Bigger Picture

This incident illustrates a fundamental shift: AI systems are not just new applications - they're new categories of infrastructure with unique security characteristics. They consolidate sensitive data, expose novel attack surfaces, and enable machine-speed attacks that outpace human defences.

For CISOs, the message is clear: your AI security strategy cannot be an afterthought or an extension of existing application security programmes. It requires dedicated attention, specialised tooling, and recognition that the organisations building AI systems - even sophisticated ones like McKinsey - are still learning where the risks lie.

The question is not whether your AI systems have vulnerabilities. It's whether you find them before autonomous agents do.


Seeking Security Insights for Your Business?

Our fractional CISOs can help you implement the strategies discussed in this article. Book a call to discuss your security needs.

Book a Call