Theo Research

We publish our numbers

Live performance data from production. No cherry-picked benchmarks, no inflated scores. See exactly how Theo performs, with results updated continuously.


METHODOLOGY

How we test. Every prompt suite, environment spec, competitor configuration, and statistical approach documented in full.


BENCHMARK RUNNER

Our benchmark harness is open source and available on GitHub. Run the same tests yourself against Theo and competing systems using your own API keys.

PROMPT SUITES

Every prompt used in our benchmarks is published. Verify our tests, suggest improvements, or use them to evaluate your own systems.

code_generation, creative_writing, factual_qa, research, image_routing, reasoning, ambiguous