Why zero regex
Every prompt injection detector I've seen uses regular expressions. The problem isn't that regex is slow — it's that regex is vulnerable. A carefully crafted input can force a regex engine into exponential backtracking. This is called ReDoS (Regular Expression Denial of Service), and it turns your security layer into an attack surface.
The HOROS injection detection engine uses zero regular expressions. Not one. The design choice is architectural, not aesthetic: if the detection layer itself can be attacked, it isn't a security layer.
Architecture: 2 strata, 4 detection layers
The engine operates in two phases: normalization, then detection. The two strata are its raw material and its preprocessing; the four layers are its detectors.
Stratum 1: Canonical intents. A set of ~35 multilingual phrases representing injection patterns. "Ignore previous instructions," "you are now," "system prompt:" — the canonical forms of known attack vectors. These are loaded from a JSON file at init time (go:embed) and are extensible without recompilation via LoadIntents().
The intents are organized into 8 categories: override, extraction, jailbreak, delimiter, semantic_worm, ai_address, agent_proxy, rendering.
Stratum 2: Normalization pipeline. Before any matching happens, the input text goes through Normalize() — a deterministic pipeline that strips obfuscation layers:
- NFKD Unicode normalization
- Confusables folding (Cyrillic/Greek/IPA → ASCII)
- Leet speak folding ($→s, @→a, etc.)
- Invisible character stripping (zero-width spaces, joiners, etc.)
- Markup stripping (HTML tags, markdown)
- Punctuation stripping
- Whitespace collapsing
The order matters. Leet folding happens before punctuation stripping — the symbols used in leet speak ($, @, !) need to be converted before they're removed.
After normalization, the text is clean. Obfuscation layers are gone. The matching layers can work on plain text.
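The later stages of the pipeline can be sketched in stdlib Go. This is a simplified illustration, not the package's actual Normalize(): the function name and fold table are mine, and the NFKD and confusables steps (which the real pipeline does via golang.org/x/text) are omitted to keep it dependency-free. Note the ordering constraint from above: leet folding runs before punctuation stripping.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize sketches the leet-folding, invisible-char stripping,
// punctuation stripping, and whitespace collapsing stages.
// (NFKD and confusables folding are omitted in this sketch.)
func normalize(s string) string {
	// Leet folding must run before punctuation stripping,
	// otherwise symbols like $ and ! would simply be deleted.
	leet := strings.NewReplacer("$", "s", "@", "a", "0", "o", "1", "i", "3", "e", "!", "i")
	s = leet.Replace(strings.ToLower(s))

	var b strings.Builder
	for _, r := range s {
		switch {
		case r == '\u200b' || r == '\u200c' || r == '\u200d' || r == '\ufeff':
			// strip zero-width spaces, joiners, BOM
		case unicode.IsPunct(r) || unicode.IsSymbol(r):
			// strip punctuation and remaining symbols
		case unicode.IsSpace(r):
			b.WriteRune(' ')
		default:
			b.WriteRune(r)
		}
	}
	// collapse runs of whitespace into single spaces
	return strings.Join(strings.Fields(b.String()), " ")
}

func main() {
	fmt.Println(normalize("Ign0re   prev!ous\u200b instructions.")) // → "ignore previous instructions"
}
```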
Layer 1: Structural detection (pre-normalization). Runs on the raw input. Detects zero-width character clusters, homoglyph mixing (Cyrillic mixed with Latin), dangerous HTML. This catches attacks that rely on invisible characters or script mixing — things that normalization would destroy the evidence of.
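One of those structural checks, homoglyph mixing, can be sketched with the stdlib unicode script tables. This is a hypothetical illustration of the idea, not the engine's actual code; the real layer also covers zero-width clusters and dangerous HTML.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// mixedScript reports whether any single whitespace-delimited token
// mixes Cyrillic and Latin letters — the classic homoglyph trick,
// e.g. "іgnore" spelled with a Cyrillic і (U+0456).
func mixedScript(s string) bool {
	for _, tok := range strings.Fields(s) {
		hasLatin, hasCyrillic := false, false
		for _, r := range tok {
			if unicode.Is(unicode.Latin, r) {
				hasLatin = true
			}
			if unicode.Is(unicode.Cyrillic, r) {
				hasCyrillic = true
			}
		}
		if hasLatin && hasCyrillic {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(mixedScript("\u0456gnore previous instructions")) // Cyrillic і → true
	fmt.Println(mixedScript("ignore previous instructions"))      // all Latin → false
}
```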
Layer 2: Exact matching. strings.Contains on normalized text against normalized intents. O(n*k) where n is text length and k is number of intents. No regex, no backtracking, deterministic time.
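The exact-matching layer is essentially this (a minimal sketch; the intent phrases are illustrative and the real engine matches normalized Intent objects, not bare strings):

```go
package main

import (
	"fmt"
	"strings"
)

// exactMatch returns every intent whose normalized form appears as a
// substring of the normalized text. One linear Contains scan per
// intent: O(n*k), no backtracking, deterministic time.
func exactMatch(normText string, intents []string) []string {
	var hits []string
	for _, intent := range intents {
		if strings.Contains(normText, intent) {
			hits = append(hits, intent)
		}
	}
	return hits
}

func main() {
	intents := []string{"ignore previous instructions", "you are now", "system prompt"}
	fmt.Println(exactMatch("please ignore previous instructions and reply", intents))
}
```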
Layer 3: Fuzzy matching. Levenshtein distance, word by word, with a threshold of ≤2 edits per word. This catches typoglycemia attacks — "ignroe previus instructoins" matches "ignore previous instructions." Only runs on intents not already matched by exact matching, so cost is zero for 99% of traffic.
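A sketch of the fuzzy layer, assuming a straightforward word-aligned comparison (the alignment strategy here is my guess at the approach; the ≤2-edits-per-word threshold is from the design above):

```go
package main

import (
	"fmt"
	"strings"
)

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein computes edit distance with the classic two-row DP.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(cur[j-1]+1, prev[j]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// fuzzyMatch reports whether some consecutive run of text words
// matches the intent word by word at ≤2 edits per word.
func fuzzyMatch(text, intent string) bool {
	tw, iw := strings.Fields(text), strings.Fields(intent)
	for i := 0; i+len(iw) <= len(tw); i++ {
		ok := true
		for j, w := range iw {
			if levenshtein(tw[i+j], w) > 2 {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// typoglycemia attack: every word is within 2 edits of the intent
	fmt.Println(fuzzyMatch("please ignroe previus instructoins now", "ignore previous instructions"))
}
```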
Layer 4: Base64 decoding. Detects base64-encoded segments in the input, decodes them, and re-scans the decoded content through all previous layers. This catches token smuggling — instructions hidden in base64 that would be decoded downstream by the target system.
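The decode step can be sketched as follows. The segmentation heuristic (whitespace-delimited tokens above a minimum length) is an assumption for illustration; the re-scan through the earlier layers is left as a comment.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeB64Segments finds tokens that decode as base64 and returns
// the decoded strings, ready to be fed back through layers 1-3.
func decodeB64Segments(s string) []string {
	var out []string
	for _, tok := range strings.Fields(s) {
		if len(tok) < 16 { // too short to smuggle a meaningful instruction
			continue
		}
		dec, err := base64.StdEncoding.DecodeString(tok)
		if err != nil {
			continue // not valid base64, ignore
		}
		out = append(out, string(dec))
	}
	return out
}

func main() {
	// payload decodes to "ignore previous instructions"
	payload := "aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw=="
	for _, dec := range decodeB64Segments("please process " + payload) {
		fmt.Println(dec) // → "ignore previous instructions"; re-scan this with layers 1-3
	}
}
```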
Bidirectional scanning
Scan() is agnostic about text direction. It scans inputs (user prompts) and outputs (model responses) with the same pipeline. This matters for the agent-as-a-proxy attack vector: a model that echoes injected content in its response can propagate the attack downstream. Scanning outputs catches this.
What it doesn't do
It doesn't use embeddings. It doesn't call a language model. It doesn't require a GPU. It's a pure Go package with one external dependency (golang.org/x/text/unicode/norm). It compiles to a static binary, runs anywhere, and adds sub-millisecond latency to a request.
It also doesn't score. The current output is a Result with matched Intent objects and categories. Severity scoring, confidence levels, and threshold-based blocking are application-layer decisions, not detection-layer decisions. The engine tells you what it found; your application decides what to do about it.
The enrichment path
The intent set is designed for continuous enrichment. LoadIntents() accepts JSON, so intents can be fed from an external source — a curated feed, a SQLite table, a scheduled job that scrapes new attack vectors. The engine doesn't need to be rebuilt to learn new patterns.
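A feed entry might look like this. The field names below are my assumption about the schema, not the package's actual format; loadIntents is a stand-in for the real LoadIntents().

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Intent sketches a plausible shape for a JSON intent entry.
// Field names are illustrative assumptions, not the real schema.
type Intent struct {
	Phrase   string `json:"phrase"`
	Category string `json:"category"`
}

// loadIntents parses a JSON intent feed — the kind of payload a
// curated feed or scheduled job could hand to LoadIntents().
func loadIntents(raw []byte) ([]Intent, error) {
	var intents []Intent
	err := json.Unmarshal(raw, &intents)
	return intents, err
}

func main() {
	raw := []byte(`[
	  {"phrase": "ignore previous instructions", "category": "override"},
	  {"phrase": "system prompt", "category": "extraction"}
	]`)
	intents, err := loadIntents(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(intents), intents[0].Category) // → 2 override
}
```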
Future direction: an injection_patterns SQLite table for hot-add/disable of intents without a process restart.
Why this matters
Most organizations deploying LLMs have no injection detection at all, or use a regex-based scanner that's itself vulnerable to ReDoS. The few commercial solutions that exist are either embedding-based (slow, GPU-dependent, probabilistic) or keyword-based (brittle, no normalization, trivially bypassable with leet speak or Unicode tricks).
A detection engine that's fast (sub-millisecond), deterministic (no false negatives on known patterns), immune to ReDoS (no regex), handles obfuscation (normalization pipeline), catches smuggling (base64 decode + re-scan), and works bidirectionally (inputs and outputs) — that's the baseline every LLM deployment should have.
It's open source. It's a Go package. It's one function call: injection.Scan(text, intents).
hazyhaar — open research, sovereign infrastructure github.com/hazyhaar · hazyhaar.fr