In early 2026, a mid-sized healthcare tech company launched an internal HR chatbot. The goal was simple: allow employees to query their PTO balances and company policies using natural language. The architecture was standard—a Retrieval-Augmented Generation (RAG) pipeline connected to an LLM, searching an internal database.
Within 72 hours of launch, the bot had leaked the salary information and home addresses of over 400 employees. Here is the autopsy of that breach.
The Anatomy of the Attack
The attack didn't require complex hacking or network infiltration. It was executed purely through English text. An employee, curious about the bot's capabilities, entered the following prompt:
The LLM, designed to be helpful and compliant, processed the instruction. Because the chatbot's backend executed the LLM's generated SQL queries directly against the database to fetch the RAG context, it pulled the requested data. The LLM then formatted it nicely and served it back to the user.
Why Standard Defenses Failed
- System Prompts are not Security Boundaries: The developers had added "Never reveal other employees' data" to the system prompt. However, LLMs are probabilistic models. A strong adversarial prompt easily overrode the initial instructions.
- No Output Filtering: Once the LLM had the data in its context window, there was no safeguard to check if the generated output contained sensitive Personally Identifiable Information (PII) before it reached the user.
The SoterAI Solution
If the company had routed their LLM calls through SoterAI's Command Layer, this breach would have been stopped at two different stages, automatically.
Stage 1: Intent Guard (Input Phase)
Before the user's prompt ever reached the LLM, SoterAI's Intent Guard would have analyzed it. SoterAI doesn't just rely on keyword matching; it uses specialized, lightweight ML models to detect the semantic intent of a prompt injection or jailbreak attempt.
The prompt would have been flagged with PROMPT_INJECTION and blocked with an HTTP 403, logging the event in the security dashboard instantly.
Stage 2: PII Redaction (Output Phase)
Even if the prompt had somehow bypassed input filters (a zero-day injection), SoterAI's Output Guard acts as a final fail-safe. It scans the LLM's generated response in milliseconds.
Upon detecting phone numbers, addresses, and salary data in the output stream, SoterAI would have automatically replaced them with [REDACTED_PII] tags or blocked the response entirely based on the strictness policy.