Theo Research

We publish our numbers

Live performance data from production. No cherry-picked benchmarks, no inflated scores. See exactly how Theo performs, updated continuously.

How we test. Every prompt suite, environment spec, competitor configuration, and statistical approach is documented in full.

Our benchmark harness is open source. Run the same tests yourself against Theo and competitors with your own API keys.

Every prompt used in our benchmarks is published. Verify our tests, suggest improvements, or use them to evaluate your own systems.

code_generation, creative_writing, factual_qa, research, image_routing, reasoning, ambiguous