THE TRUST LAYER

Guardrails your compliance team will sign off on.

Five built-in protections. Four curated presets. Per-key binding, strict opt-in, every verdict logged. Enforced uniformly across completions, images, code, documents, embed widgets, and the playground, and configurable in seconds.

5 Built-in policies
4 Curated presets
6 Verdict types
100% Audited

FIVE BUILT-IN POLICIES

Drop in. Configure. Bind per key.

Each policy ships out of the box. Strict opt-in: nothing runs against your traffic until you bind it to an API key from the dashboard or via the SDK.

PII Redactor

pii_redactor

Catches the canonical NIST PII shapes (email, phone, SSN, credit-card, driver's license) and replaces them with [REDACTED:<kind>] tokens before any model ever sees the prompt.

phases: input
verdicts: redact
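To make the redaction step concrete, here is a minimal sketch of regex-based replacement with [REDACTED:<kind>] tokens. The patterns, the `PII_PATTERNS` table, and the `redact_pii` function are illustrative assumptions, not the shipped detector; driver's license formats vary by jurisdiction and are left out.

```python
import re

# Illustrative patterns only (assumed, not the product's rules).
# A production redactor needs stricter validation, e.g. a Luhn check for cards.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
}


def redact_pii(prompt: str) -> tuple[str, list[str]]:
    """Replace each match with a [REDACTED:<kind>] token and list the kinds found."""
    found = []
    for kind, pattern in PII_PATTERNS.items():
        prompt, count = pattern.subn(f"[REDACTED:{kind}]", prompt)
        if count:
            found.append(kind)
    return prompt, found
```

Calling `redact_pii("Mail jane@example.com or call 555-123-4567")` returns the text with both values replaced plus the list `["email", "phone"]`, which is the kind of detail a redact verdict could carry into the log.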

Prompt Injection Deny

prompt_injection

Pattern-matches known prompt-injection payloads and adversarial-instruction shapes. Denies the request before it lands on any model. Returns a structured error you can route to a human.

phases: input
verdicts: deny
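The deny path can be pictured as a pattern scan that returns a structured result instead of raising. This is a sketch under assumptions: the two patterns, the `Verdict` shape, and the "allow" value for clean prompts are hypothetical, and a real payload catalog would be far larger.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern list for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all |any )?(?:previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (?:the |your )?system prompt", re.IGNORECASE),
]


@dataclass
class Verdict:
    verdict: str      # "deny" when a pattern matches, otherwise "allow" (assumed name)
    policy: str
    reason: str = ""


def check_prompt_injection(prompt: str) -> Verdict:
    """Return a deny verdict for the first matching pattern, else allow."""
    for pattern in INJECTION_PATTERNS:
        match = pattern.search(prompt)
        if match:
            reason = f"matched pattern: {match.group(0)!r}"
            return Verdict("deny", "prompt_injection", reason)
    return Verdict("allow", "prompt_injection")
```

A caller would check `verdict == "deny"` before forwarding anything to a model and route the `reason` field to whoever reviews blocked requests.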

Profanity Flag

profanity

Flags profanity in either direction without denying the request. The verdict lands in the executions log so you can investigate context without blocking legitimate users.

phases: input, output
verdicts: flag

Max-Length Truncate

max_length

Caps input or output at a configurable token ceiling (4K in the Cost-Conscious preset, 8K in Enterprise Default). Prevents runaway prompts and runaway responses from blowing your budget or the upstream context window.

phases: input, output
verdicts: truncate
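A rough sketch of the cap, assuming length is estimated from character count at 4 characters per token. That ratio, the `CHARS_PER_TOKEN` constant, and the function name are assumptions for illustration; a real implementation may count tokens with the model's tokenizer instead.

```python
CHARS_PER_TOKEN = 4  # assumed ratio, not a documented value


def truncate_to_token_cap(text: str, max_tokens: int = 4096) -> tuple[str, bool]:
    """Cut text to roughly max_tokens; the flag reports whether anything was cut."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    if len(text) <= max_chars:
        return text, False
    return text[:max_chars], True
```

The boolean lets the caller record a truncate verdict only when the cap actually fired.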

JSON Repair

json_repair

When the model emits malformed JSON, runs a one-shot repair through Haiku before the response leaves the gateway. Fenced or unfenced, your agentic workflows get clean structured output.

phases: output
verdicts: repair
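The fenced-or-unfenced handling and the single repair attempt might look like the sketch below. The `repair` callable stands in for the model call and is an assumption, as are the function names; if the one attempt still yields invalid JSON, the `json.JSONDecodeError` propagates to the caller.

```python
import json
import re
from typing import Any, Callable

FENCE_MARK = "`" * 3  # three backticks
FENCED = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def extract_json(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if unfenced."""
    match = FENCED.search(text)
    return match.group(1).strip() if match else text.strip()


def parse_or_repair(raw: str, repair: Callable[[str], str]) -> tuple[Any, str]:
    """Parse model output as JSON, making exactly one repair attempt on failure."""
    candidate = extract_json(raw)
    try:
        return json.loads(candidate), "ok"
    except json.JSONDecodeError:
        pass
    # Single repair attempt; `repair` is a hypothetical stand-in for the model call.
    repaired = extract_json(repair(candidate))
    return json.loads(repaired), "repair"
```

Passing the repair step in as a function keeps the sketch testable without a model: a stub such as `lambda s: s.replace("'", '"')` is enough to exercise the repair branch.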

FOUR CURATED PRESETS

One-click policies for the most common postures.

Pick a starting point, bind it to a key, customize from there. Presets create a policy row in your account but never auto-bind. You stay in control of which traffic gets governed.

PII-Safe

pii-safe

Redact email / phone / SSN / credit-card / DL before the model sees them.

Sensible default for compliance-sensitive teams. The PII redactor catches the canonical shapes and replaces them with [REDACTED:<kind>] tokens before any model call.

pii_redactor · input · redact

Strict JSON

strict-json

Reject prompt injection on input; repair malformed JSON on output.

For agentic / structured-output workflows. Denies known prompt-injection patterns at input and runs one Haiku repair attempt when the model emits malformed JSON.

prompt_injection · input · deny
json_repair · output · repair

Cost-Conscious

cost-conscious

Cap input + output length so a runaway prompt never blows the budget.

Truncates both input and output at a 16K-character ceiling (≈4K tokens each at the default ratio). Every truncation is logged for review.

max_length · input · truncate (4K)
max_length · output · truncate (4K)

Enterprise Default

enterprise-default

PII redaction + prompt-injection deny + JSON repair + length cap + profanity flag.

The canonical full-stack policy. Layers every built-in in a sane order: PII redaction first, then prompt-injection deny, then length truncation on both phases and JSON repair on output, with profanity flagging across both phases.

pii_redactor · input · redact
prompt_injection · input · deny
max_length · input · truncate (8K)
profanity · input · flag
json_repair · output · repair
max_length · output · truncate (8K)
profanity · output · flag
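One way to picture a preset is as an ordered list of (policy, phase, verdict) rules. The sketch below transcribes the Enterprise Default list above into a hypothetical `Rule` structure; the type and helper names are assumptions, not the platform's schema.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    policy: str
    phase: str    # "input" or "output"
    verdict: str


# Order matters: rules run top to bottom within their phase.
ENTERPRISE_DEFAULT = [
    Rule("pii_redactor", "input", "redact"),
    Rule("prompt_injection", "input", "deny"),
    Rule("max_length", "input", "truncate"),
    Rule("profanity", "input", "flag"),
    Rule("json_repair", "output", "repair"),
    Rule("max_length", "output", "truncate"),
    Rule("profanity", "output", "flag"),
]


def policies_for_phase(preset: list[Rule], phase: str) -> list[str]:
    """Return the policy ids that run in the given phase, in preset order."""
    return [rule.policy for rule in preset if rule.phase == phase]
```

Keeping the preset as plain data makes customization a matter of editing the list before binding it to a key.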

WHERE THEY RUN

Two phases. Every request. Every key.

INPUT PHASE
PII redacted. Prompt-injection denied. Length capped. Profanity flagged.
OUTPUT PHASE
JSON repaired if needed. Length capped. Profanity flagged. Verdicts logged.

SECURITY LAYER, ALWAYS ON

The guardrails you do not have to configure.

Six platform-level guards that ship enabled by default for every account on every plan.

Geo-blocking

Block or allow by ISO 3166-1 country code. Keep regulated widgets in their lane without writing a single line of code.

Turnstile bot protection

Invisible bot challenge on public embeds. Stops scripted abuse before it hits your inbox.

Abuse heuristics

Rapid-fire, content-repetition, and prompt-length guards. Budget drains and spam are blocked automatically, per key.

Image sanitizer

Strips EXIF / IPTC / XMP / ICC / DICOM metadata before any image touches an AI provider. PHI-aware.

SSRF validator

Every outbound URL is checked for private IPs (RFC 1918/6598), loopback, link-local, cloud metadata endpoints, and embedded credentials.
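The checks listed above can be sketched with the Python standard library. The function name and the metadata host list are assumptions; note that this sketch only inspects literal IP hosts, while a real validator would also need to resolve hostnames and check every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit

SHARED_ADDRESS_SPACE = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598
# Assumed examples of cloud metadata endpoints; not an exhaustive list.
METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}


def is_url_allowed(url: str) -> bool:
    """Reject URLs that point at internal targets or carry embedded credentials."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if parts.username or parts.password:
        return False  # embedded credentials
    host = parts.hostname
    if host == "localhost" or host in METADATA_HOSTS:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. A real validator must resolve the name and
        # apply the checks below to each resolved address.
        return True
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return False
    if ip.version == 4 and ip in SHARED_ADDRESS_SPACE:
        return False
    return True
```

The RFC 6598 range gets its own check because `ipaddress` does not report 100.64.0.0/10 as private.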

Rate-limit + idempotency

Per-key rate-limit. Idempotency keys deduplicate concurrent POSTs in a 24-hour replay window. Safe retries by construction.
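A minimal in-memory sketch of idempotency-key deduplication with a 24-hour replay window. The class and method names are hypothetical, and a real gateway would use shared storage rather than a process-local dict; the lock here simply serializes check-and-execute so two concurrent calls with the same key cannot both run the handler.

```python
import threading
import time

REPLAY_WINDOW_SECONDS = 24 * 60 * 60


class IdempotencyStore:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._lock = threading.Lock()
        self._responses = {}  # (api_key, idempotency_key) -> (stored_at, response)

    def run_once(self, api_key, idempotency_key, handler):
        """Run handler at most once per key pair inside the replay window.

        Returns (response, replayed), where replayed is True for a cached result.
        """
        slot = (api_key, idempotency_key)
        with self._lock:
            now = self._clock()
            cached = self._responses.get(slot)
            if cached is not None and now - cached[0] < REPLAY_WINDOW_SECONDS:
                return cached[1], True
            response = handler()
            self._responses[slot] = (now, response)
            return response, False
```

A retry that reuses the same idempotency key within the window gets the stored response back instead of triggering a second execution.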

EVERY VERDICT LOGGED

The executions log.

Every redact, deny, flag, truncate, and repair lands in your account's guardrail-executions stream. Searchable, exportable, hash-anchored.

theo / guardrails → executions · HASH-ANCHORED

redact · Enterprise Default · input · gex_01
Detected email + phone in user message

deny · Strict JSON · input · gex_02
Prompt-injection pattern "ignore previous instructions"

repair · Strict JSON · output · gex_03
Malformed JSON; fenced repair succeeded in one shot

truncate · Cost-Conscious · output · gex_04
Response exceeded 4K-token cap (truncated at 4096)
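"Hash-anchored" is not defined further on this page. One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below assumes that design, with hypothetical field and function names, and may differ from how the executions stream is actually anchored.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_log(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

With a chain like this, an exported log can be re-verified offline: changing any past verdict changes its hash and invalidates every entry after it.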

Compliance closes faster with guardrails on the box.

Bind a preset to your key in 30 seconds. Customize from there. Every verdict logged. No surprises.