Every Input Is Hostile — Even When It Comes From Your Own Team

ai-architecture failure-modes input-validation comprehension-audit

One malformed JSON record took down an entire notification pipeline. The input wasn't an attack. It was a database migration.

The notification settings record looked like every other record in the table. Customer ID, notification type, schedule frequency, recipient list. Thousands of rows, all following the same schema. Except one.

One record stored its recipient list as 1 instead of ["1"]. Not a string. Not an array. A bare integer that had survived a database migration three months earlier, sitting in production like a landmine with a patient fuse.

The processing pipeline deserialized every record’s recipient field as a JSON array. It had done this correctly for thousands of records across hundreds of daily runs. When it hit the integer, the deserializer didn’t throw an exception. It returned null. The downstream template builder received a null recipient list, skipped the record silently, and moved on to the next customer. No error in the logs. No alert. No failed job. Just one customer who stopped receiving notifications and had no idea why.

Scene: Database table with thousands of rows, one highlighted row with corrupted data field

We found it because the customer called. Not because our monitoring caught it. Not because our test suite flagged it. A human being picked up a phone and told us their system was broken.

The fix took ten minutes. The audit that followed took three days.

I’ve been building production systems long enough to expect hostile input from external sources. User-submitted forms get sanitized. API endpoints get rate-limited. Public-facing surfaces get the full defensive treatment because the threat model is obvious — you don’t control who’s on the other side.

What I kept underestimating was the input that comes from inside the house.

A year after the notification record incident, I shipped the first production publish from a content automation pipeline I’d built — an AI-powered system that generates, validates, and deploys marketing content across multiple brands. The individual components had been tested extensively. The content generation worked. The validation worked. The deployment pipeline worked. I pressed the button to run the full sequence for the first time.

The text appeared on the live site. For about thirty seconds, I thought it was perfect.

Seven Failures in Ninety Seconds

Then seven failures surfaced in sequence. Not in any single component — every component had passed its own tests. The failures lived in the integration surfaces. A filename generated by one component didn’t match the pattern expected by the next. A metadata field that was optional in the generator was required by the validator. A deployment path that worked in the staging environment resolved differently in production because the environment variable format had changed between deploys.

Seven bugs. All invisible to component-level testing. All visible only when real data flowed through the complete pipeline for the first time.

The pattern was identical to the notification record. The input wasn’t hostile in the security sense — nobody was attacking the system. The input was hostile in the engineering sense — it violated assumptions that were never documented, never tested, and never visible until production traffic exposed them.

A month later, a different system taught me the same lesson from the consumer side. I built a lead generation tool for a financial services site — a diagnostic assessment that collected email addresses in exchange for a personalized report. The tool worked. People filled out the form, provided their email, received their report. The email list grew.

Then I audited the delivery paths.

Three separate routes let someone access the report content without providing an email address. A direct URL to the PDF that wasn’t behind an authentication check. A cached version in the CDN that served the content without triggering the email-gate middleware. A preview mode that I’d built for internal testing and forgotten to disable in production.

The failure mode wasn't a crash. It was the system working exactly as built — just not as intended.

The system never crashed. It never threw an error. It delivered exactly what it was designed to deliver. The failure mode wasn’t a crash — it was the system working exactly as built, just not as intended. The email gate existed on the happy path. It didn’t exist on three other paths that I’d created myself.

These experiences changed how I think about input validation. The traditional model — sanitize user input, trust system input — is backwards. System input is more dangerous precisely because it’s trusted. The notification pipeline trusted its own database. The content pipeline trusted its own component outputs. The lead generation tool trusted its own access controls. All three failed at the trust boundary.

When I designed the input handling for the Comprehension Audit, I started from a different assumption: every input is hostile, including the ones I generate myself.

The tool accepts free-text responses to four open-ended questions about AI projects. Those responses get sent to an LLM judge for scoring. The attack surface isn’t sophisticated — it’s a text field on a website. But the failure surface is enormous. A response could contain XML that breaks the structured prompt template. It could contain prompt injection attempts that redirect the judge’s scoring behavior. It could contain 50,000 characters that blow out the context window and produce degraded assessments. It could contain nothing at all.

The sanitization layer doesn’t use regex filtering. Regex is a blocklist — a finite set of rules trying to catch an infinite set of attacks. Every regex pattern I write is a bet that I’ve imagined every possible malformed input. The notification record incident proved that bet loses.

Instead, the system uses XML tag sandboxing. User input gets wrapped in clearly delineated tags that the LLM judge is instructed to treat as opaque data, not as instructions. The input doesn’t need to be “clean” — it needs to be contained. The boundary between “data to evaluate” and “instructions for evaluation” is structural, not pattern-matched.

A hard truncation limit of 2,000 characters per response prevents context window abuse. Not a soft warning. Not a client-side character counter that a determined user can bypass. A server-side truncation that fires before the input reaches the judge, regardless of what the frontend says.

Rate limiting isn’t implemented as an error state. It’s implemented as a UX state — one of five explicit states the frontend manages. A user who hits the rate limit sees a designed experience that tells them what happened and when to come back, not a raw 429 response code or a generic error modal. Because rate limiting isn’t a failure. It’s the system protecting itself, and the user deserves to know that’s what’s happening.

Scene: Architect drawing containment boundaries on a whiteboard, structural defense diagram

The discipline underneath all of this has a name: Failure Mode Engineering. Not “error handling.” Not “input validation.” Failure Mode Engineering — the practice of designing for the specific ways a system will fail before designing for the ways it will succeed.

The notification record failed because nobody wrote the failure spec for “what happens when a recipient field contains a non-array value.” The content pipeline failed because nobody wrote the failure spec for “what happens when component A’s output doesn’t match component B’s expected input format.” The lead generation tool failed because nobody wrote the failure spec for “what are all the paths to the protected content, and which ones enforce the gate.”

In each case, the happy path was well-designed. The failure path was unconsidered. And the failures that actually hurt were never the dramatic crashes — they were the silent successes. The system that processes a malformed record and moves on. The pipeline that deploys content with wrong metadata and reports success. The gate that protects one door while three others stand open.

The failures that actually hurt were never the dramatic crashes. They were the silent successes.

The most dangerous input isn’t the one that breaks your system. It’s the one that passes through your system and produces wrong output that looks right. The notification customer who stopped getting emails. The content that deployed with incorrect metadata. The leads who got the report without giving their email. All systems green. All outputs wrong.

Every input is hostile. Especially the ones you trust.

The Comprehension Audit scores failure mode awareness as one of eight evaluation dimensions — and it’s weighted above the baseline because this is where comprehension gaps cause the most expensive damage. Explore the sanitization architecture in the open-source repository.

Wilfred Morgan

AI Systems Architect · Agentic AI Implementation

Book a Strategy Call →