LithosAI — Agents That Learn in Prod. Any Model. No Lock-in.
Building agents has never been easier. But the real problems begin the moment they hit production at scale. Capability degrades without clear signal on what's failing or why. Costs scale faster than results when every step hits a frontier model. Latency stacks with every step the agent takes. None of it improves on its own.
Meet Motus. It helps your agents learn in prod. Motus extracts signal from every production trace: failures, latency, cost, task outcomes. It uses that signal to continuously improve your agents. It learns how to route tasks to the right model at the right cost. It rewrites the agent harness based on what's working and what's not. It turns production traffic into a feedback loop that makes your agents faster, cheaper, and more accurate over time. No manual tuning. No guesswork.
Deploy straight from any coding tool you already use: Claude Code, Codex, Cursor. Run motus serve to self-host on your own infrastructure, or motus deploy for managed cloud. No Dockerfiles, no Kubernetes configs, no infrastructure code. Motus agent serving is open source. You choose how it runs.
Any model, open or closed. Motus orchestrates across providers, routing to the model that fits each step's cost and capability profile. When the next frontier model drops, swap it in. Your learned optimizations carry over. No lock-in. Just agents that keep getting better.
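As a rough illustration of cost-aware routing, the sketch below picks the cheapest model that clears a step's capability bar. The model names, prices, capability scores, and the `route` helper are all hypothetical assumptions for illustration, not Motus's actual API or learned policy.

```python
# Illustrative cost-aware routing sketch (NOT the Motus API).
# Model names, per-million-token prices, and capability scores are made up.

CATALOG = {
    "frontier-large": {"cost_per_mtok": 15.0, "capability": 0.95},
    "mid-tier":       {"cost_per_mtok": 3.0,  "capability": 0.80},
    "small-fast":     {"cost_per_mtok": 0.4,  "capability": 0.60},
}

def route(required_capability: float) -> str:
    """Pick the cheapest model that meets the step's capability bar."""
    eligible = [
        (spec["cost_per_mtok"], name)
        for name, spec in CATALOG.items()
        if spec["capability"] >= required_capability
    ]
    cost, name = min(eligible)  # cheapest model that qualifies
    return name

# An easy formatting step routes to the small model;
# a hard reasoning step routes to the frontier model.
print(route(0.5))   # small-fast
print(route(0.9))   # frontier-large
```

Because the catalog is just data, swapping in a newly released frontier model is a one-entry change, which is the "no lock-in" property the paragraph describes.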
Ship your agents today. The tokens are on us during early preview. Continuous learning is live for a limited number of early partners. Join us on Slack and let's build together.
Install plugin
/plugin marketplace add lithos-ai/motus
/plugin install motus
Build your agent, then serve or deploy
motus serve
Motus in action.
Interactive Agents
Terminal-Bench 2.0
Terminal-Bench 2.0 evaluates agent interactions with live terminal environments in a sandbox, executing commands, managing files, and recovering from errors in real time. Getting the most out of these tasks requires jointly optimizing both the agent harness and model orchestration.
Motus continuously optimizes the agent harness, orchestrates models, and learns from agent signals. Starting from Opus 4.6 at 64% accuracy, Motus first optimizes the agent harness to reach 77.5%, then lifts accuracy to 80.1% through model orchestration, at 2.4x lower cost than Opus 4.6 alone.
Terminal-Bench 2.0 — bottom-right is better (higher accuracy, lower cost)
Harness: Terminus 2
Software Engineering
SWE-bench Verified
SWE-bench Verified tests end-to-end software engineering: writing patches, fixing bugs, and resolving real GitHub issues. No single model consistently wins across all task types, and static model choices leave accuracy and cost on the table.
Motus orchestrates models into a single system that outperforms any one alone. Opus 4.6 reaches 75.8% and GPT-5.3-Codex 72.6%. Motus pushes accuracy to 79%, surpassing both frontier models, at 2.3x lower cost than Opus alone.
SWE-bench Verified — bottom-right is better (higher accuracy, lower cost)
Harness: mini-swe-agent-v2
Long Context Memory
LoCoMo
Long-running agents need context memory, but every application has different needs. A coding assistant, a customer support agent, and a research workflow each demand different strategies for what to remember and what to discard. There is no one-size-fits-all solution.
Motus tailors your agent's context memory strategy to your specific workload. On LoCoMo, a long-term conversational memory benchmark, Motus reaches 81% accuracy, a 56% improvement over compaction and 45% over RAG.
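To make the strategy trade-off concrete, here are two toy memory policies: compaction (keep only a recent window) and retrieval (keep everything, fetch by relevance). Both are simplified illustrations under assumed data, not Motus's implementation or the benchmark's actual baselines.

```python
# Toy memory policies (illustrative only, not Motus's implementation).

def compaction_memory(turns: list[str], budget: int) -> list[str]:
    """Keep only the most recent `budget` turns; older context is dropped."""
    return turns[-budget:]

def retrieval_memory(turns: list[str], query: str, k: int = 2) -> list[str]:
    """Keep everything; fetch the k turns with most word overlap with the query."""
    def overlap(turn: str) -> int:
        return len(set(turn.lower().split()) & set(query.lower().split()))
    return sorted(turns, key=overlap, reverse=True)[:k]

turns = [
    "user: my order number is 4417",
    "user: I prefer email over phone",
    "user: the package arrived damaged",
    "user: please process a refund",
]

# Compaction forgets the order number once the window rolls past it...
print(compaction_memory(turns, budget=2))
# ...while retrieval can still surface it for an order-related question.
print(retrieval_memory(turns, query="what is the order number"))
```

A support agent needs the retrieval behavior for stable facts like order numbers, while a coding assistant may prefer aggressive compaction of stale diffs; picking per workload is the point of the paragraph above.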
LoCoMo accuracy — higher is better
Judge: GPT-5.4 mini
Agent Latency
Financial Workflow
Agent latency compounds across multi-step workflows. Sequential tool calls, redundant context, and unoptimized execution ordering turn seconds into minutes. For long-horizon agents, these inefficiencies add up fast.
Motus detects parallelizable steps and reorders execution to cut end-to-end latency. On a deep financial agent benchmark, Motus reduces latency by up to 52%.
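The kind of restructuring described above can be sketched with standard `asyncio`: tool calls with no data dependency on each other run concurrently instead of back-to-back. The step names and timings below are hypothetical stand-ins, not the benchmark's actual workflow.

```python
import asyncio
import time

async def tool_call(name: str, seconds: float) -> str:
    """Stand-in for a network-bound tool call (hypothetical timing)."""
    await asyncio.sleep(seconds)
    return name

async def sequential() -> float:
    """Naive agent: run every step one after another."""
    start = time.monotonic()
    for step in [("fetch_prices", 0.2), ("fetch_filings", 0.2), ("fetch_news", 0.2)]:
        await tool_call(*step)
    return time.monotonic() - start

async def parallel() -> float:
    """The three fetches are independent, so run them concurrently."""
    start = time.monotonic()
    await asyncio.gather(
        tool_call("fetch_prices", 0.2),
        tool_call("fetch_filings", 0.2),
        tool_call("fetch_news", 0.2),
    )
    return time.monotonic() - start

seq = asyncio.run(sequential())   # roughly the sum of the three delays
par = asyncio.run(parallel())     # roughly the longest single delay
print(f"sequential {seq:.2f}s, parallel {par:.2f}s")
```

With three independent 0.2s calls, end-to-end time drops from about 0.6s to about 0.2s; over a long-horizon workflow with many such fan-outs, the savings compound, which is where latency reductions of this scale come from.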
End-to-end latency for a financial workflow — lower is better
About us.
LithosAI was founded by Dimitrios Skarlatos and Zhihao Jia, professors at Carnegie Mellon University, whose award-winning research on systems and machine learning sits at the company's core. Our team brings together CMU and Stanford researchers and engineers who have shipped production infrastructure at AWS, Google, Meta, and NVIDIA. Join us!


