Live performance data from production. No cherry-picked benchmarks, no inflated scores. See exactly how Theo performs, with results updated continuously.
How we test. Every prompt suite, environment spec, competitor configuration, and statistical approach documented in full.
Read methodology
Our benchmark harness is open source. Run the same tests yourself against Theo and competitors with your own API keys.
View on GitHub
Every prompt used in our benchmarks is published. Verify our tests, suggest improvements, or use them to evaluate your own systems.