THE TRUST LAYER

Guardrails your compliance team will sign off on.

Five built-in protections. Four curated presets. Per-key binding, strict opt-in, every verdict logged. Enforced uniformly across completions, images, code, documents, embed widgets, and the playground, and configurable in seconds.

Built-in policies

Curated presets

Verdict types

100%

Audited

FIVE BUILT-IN POLICIES

Drop in. Configure. Bound per-key.

Each policy ships out of the box. Strict opt-in: nothing runs against your traffic until you bind it to an API key from the dashboard or via the SDK.

PII Redactor

pii_redactor

Catches the canonical NIST PII shapes (email, phone, SSN, credit-card, driver's license) and replaces them with [REDACTED:<kind>] tokens before any model ever sees the prompt.

phases:input

verdicts:redact

Prompt Injection Deny

prompt_injection

Pattern-matches known prompt-injection payloads and adversarial-instruction shapes. Denies the request before it lands on any model. Returns a structured error you can route to a human.

phases:input

verdicts:deny

Profanity Flag

profanity

Flags profanity in either direction without denying the request. The verdict lands in the executions log so you can investigate context without blocking legitimate users.

phases:inputoutput

verdicts:flag

Max-Length Truncate

max_length

Caps input or output tokens at a configurable ceiling (default 4K or 8K). Prevents runaway prompts and runaway responses from blowing your budget or the upstream context window.

phases:inputoutput

verdicts:truncate

JSON Repair

json_repair

When the model emits malformed JSON, runs a one-shot repair through Haiku before the response leaves the gateway. Fenced or unfenced, your agentic workflows get clean structured output.

phases:output

verdicts:repair

FOUR CURATED PRESETS

One-click policies for the most common postures.

Pick a starting point, bind it to a key, customize from there. Presets create a policy row in your account but never auto-bind. You stay in control of which traffic gets governed.

PII-Safe

pii-safe

Redact email / phone / SSN / credit-card / DL before the model sees them.

Sensible default for compliance-sensitive teams. The PII redactor catches the canonical shapes and replaces them with [REDACTED:<kind>] tokens before any model call.

pii_redactor · input · redact

Strict JSON

strict-json

Reject prompt injection on input; repair malformed JSON on output.

For agentic / structured-output workflows. Denies known prompt-injection patterns at input and runs one Haiku repair attempt when the model emits malformed JSON.

prompt_injection · input · deny

json_repair · output · repair

Cost-Conscious

cost-conscious

Cap input + output length so a runaway prompt never blows the budget.

Truncates both input and output to a defensible 16K-character ceiling (≈16K input / 16K output at the default ratio) and flags everything else for review.

max_length · input · truncate (4K)

max_length · output · truncate (4K)

Enterprise Default

enterprise-default

PII redaction + prompt-injection deny + JSON repair + length cap + profanity flag.

The canonical full-stack policy. Layers every builtin in a sane order: PII first, then prompt-injection deny, then JSON repair + length truncation on output, with profanity flagging across both phases.

pii_redactor · input · redact

prompt_injection · input · deny

max_length · input · truncate (8K)

profanity · input · flag

json_repair · output · repair

max_length · output · truncate (8K)

profanity · output · flag

WHERE THEY RUN

Two phases. Every request. Every key.

INPUT PHASE

PII redacted. Prompt-injection denied. Length capped. Profanity flagged.

→ MODEL →

OUTPUT PHASE

JSON repaired if needed. Length capped. Profanity flagged. Verdicts logged.

SECURITY LAYER, ALWAYS ON

The guardrails you do not have to configure.

Six platform-level guards that ship enabled by default for every account on every plan.

Geo-blocking

Block or allow by ISO 3166-1 country code. Keep regulated widgets in their lane without writing a single line of code.

Turnstile bot protection

Invisible bot challenge on public embeds. Stops scripted abuse before it hits your inbox.

Abuse heuristics

Rapid-fire, content repetition, and prompt-length guards. Drains and spam blocked automatically per-key.

Image sanitizer

Strips EXIF / IPTC / XMP / ICC / DICOM metadata before any image touches an AI provider. PHI-aware.

SSRF validator

Every outbound URL is checked for private IPs (RFC 1918/6598), loopback, link-local, cloud metadata endpoints, and embedded credentials.

Rate-limit + idempotency

Per-key rate-limit. Idempotency keys deduplicate concurrent POSTs in a 24-hour replay window. Safe retries by construction.

EVERY VERDICT LOGGED

The executions log.

Every redact, deny, flag, truncate, and repair lands in your account's guardrail-executions stream. Searchable, exportable, hash-anchored.

theo / guardrails → executionsHASH-ANCHORED

redactEnterprise Defaultinput

Detected email + phone in user message

gex_01

denyStrict JSONinput

Prompt-injection pattern "ignore previous instructions"

gex_02

repairStrict JSONoutput

Malformed JSON; fenced repair succeeded in one shot

gex_03

truncateCost-Consciousoutput

Response exceeded 4K-token cap (truncated at 4096)

gex_04

Compliance closes faster with guardrails on the box.

Bind a preset to your key in 30 seconds. Customize from there. Every verdict logged. No surprises.

START FREE →THE PRIVACY STACK OPEN THE GUARDRAILS PANEL