
LithosAI — Agents That Learn in Prod. Any Model. No Lock-in.


Building agents has never been easier. But the real problems begin the moment they hit production at scale. Capability degrades without clear signal on what's failing or why. Costs scale faster than results when every step hits a frontier model. Latency stacks with every step the agent takes. None of it improves on its own.

Meet Motus. It helps your agents learn in prod. Motus extracts signal from every production trace: failures, latency, cost, task outcomes. It uses that signal to continuously improve your agents. It learns how to route tasks to the right model at the right cost. It rewrites the agent harness based on what's working and what's not. It turns production traffic into a feedback loop that makes your agents faster, cheaper, and more accurate over time. No manual tuning. No guesswork.

Deploy straight from any coding tool you already use: Claude Code, Codex, Cursor. Run motus serve to self-host on your own infrastructure, or motus deploy for managed cloud. No Dockerfiles, no Kubernetes configs, no infrastructure code. Motus agent serving is open source. You choose how it runs.

Any model, open or closed. Motus orchestrates across providers, routing to the model that fits each step's cost and capability profile. When the next frontier model drops, swap it in. Your learned optimizations carry over. No lock-in. Just agents that keep getting better.
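The routing idea is simple to sketch. Everything below is illustrative, not the Motus API: the model names, capability scores, prices, and the `route` helper are all hypothetical stand-ins for the kind of cost/capability decision described above.

```python
# Illustrative sketch of cost/capability-aware routing (NOT the Motus API).
# Each candidate model carries a hypothetical capability score and per-step
# price; a step goes to the cheapest model that clears its capability bar.

from dataclasses import dataclass


@dataclass
class Model:
    name: str
    capability: float  # 0..1, higher is more capable (hypothetical score)
    cost: float        # dollars per step (hypothetical price)


CANDIDATES = [
    Model("small-fast", capability=0.55, cost=0.002),
    Model("mid-tier", capability=0.75, cost=0.010),
    Model("frontier", capability=0.95, cost=0.060),
]


def route(required_capability: float) -> Model:
    """Pick the cheapest model that meets the step's capability requirement."""
    viable = [m for m in CANDIDATES if m.capability >= required_capability]
    if not viable:  # nothing clears the bar: fall back to the strongest model
        return max(CANDIDATES, key=lambda m: m.capability)
    return min(viable, key=lambda m: m.cost)


print(route(0.5).name)  # easy step -> cheapest viable model
print(route(0.9).name)  # hard step -> frontier model
```

Because the learned piece is the capability estimate per step, not the model list itself, swapping in a new frontier model only means adding one entry to the candidate pool.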

Ship your agents today. The tokens are on us during early preview. Continuous learning is live for a limited number of early partners. Join us on Slack and let's build together.

Install plugin

/plugin marketplace add lithos-ai/motus
/plugin install motus

Build your agent, then serve or deploy

motus serve

Motus in action.

Interactive Agents

Terminal-Bench 2.0

Terminal-Bench 2.0 evaluates agents in live sandboxed terminal environments: executing commands, managing files, and recovering from errors in real time. Getting the most out of these tasks requires jointly optimizing both the agent harness and model orchestration.

Motus continuously optimizes the agent harness, orchestrates models, and learns from agent signals. Starting from Opus 4.6 at 64% accuracy, Motus first optimizes the agent harness to reach 77.5%, then improves accuracy further to 80.1% through model orchestration, at 2.4x lower cost than Opus 4.6 alone.

[Chart: accuracy vs. cost per task. Series: Motus + multi-model, Motus + Claude Opus 4.6, Claude Opus 4.6, GPT-5.3-Codex, MiniMax M2.7, Kimi K2.5, GPT-5.4 mini. Annotations: continuous agent harness optimization with Motus; continuous model orchestration optimization with Motus.]

Terminal-Bench 2.0 — bottom-right is better (higher accuracy, lower cost)

Harness: Terminus 2

Software Engineering

SWE-bench Verified

SWE-bench Verified tests end-to-end software engineering: writing patches, fixing bugs, and resolving real GitHub issues. No single model consistently wins across all task types, and static model choices leave accuracy and cost on the table.

Motus orchestrates models into a single system that outperforms any one alone. Opus 4.6 reaches 75.8% and GPT-5.3-Codex 72.6%. Motus pushes accuracy to 79%, surpassing both frontier models, at 2.3x lower cost than Opus alone.

[Chart: accuracy vs. cost per task. Series: Motus + multi-model, Claude Opus 4.6, GPT-5.3-Codex, MiniMax M2.7, GPT-5.4 mini, Claude Haiku 4.5.]

SWE-bench Verified — bottom-right is better (higher accuracy, lower cost)

Harness: mini-swe-agent-v2

Long Context Memory

LoCoMo

Long-running agents need context memory, but every application has different needs. A coding assistant, a customer support agent, and a research workflow each demand different strategies for what to remember and what to discard. There is no one-size-fits-all solution.

Motus tailors your agent's context memory strategy to your specific workload. On LoCoMo, a long-term conversational memory benchmark, Motus reaches 81% accuracy, a 56% improvement over compaction and 45% over RAG.
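One way to picture a workload-specific memory strategy is as a pluggable policy behind a common interface. This is an illustrative sketch, not the Motus API: `KeepLastN` and `KeywordRecall` are hypothetical stand-ins for compaction-style and retrieval-style policies.

```python
# Illustrative sketch (NOT the Motus API): context memory as a pluggable
# policy, so different workloads can keep different things in context.

from typing import Protocol


class MemoryStrategy(Protocol):
    def remember(self, turn: str) -> None: ...
    def context(self) -> list[str]: ...


class KeepLastN:
    """Compaction-style policy: keep only the most recent turns."""
    def __init__(self, n: int) -> None:
        self.n, self.turns = n, []

    def remember(self, turn: str) -> None:
        self.turns.append(turn)

    def context(self) -> list[str]:
        return self.turns[-self.n:]


class KeywordRecall:
    """Retrieval-style policy: keep everything, surface matching turns."""
    def __init__(self, query: str) -> None:
        self.query, self.turns = query, []

    def remember(self, turn: str) -> None:
        self.turns.append(turn)

    def context(self) -> list[str]:
        return [t for t in self.turns if self.query in t]


recent = KeepLastN(2)
for turn in ["plan", "edit", "test"]:
    recent.remember(turn)
print(recent.context())  # only the last two turns survive compaction
```

A support agent might want recall keyed to the customer's issue, while a coding assistant mostly needs recent edits; tailoring means choosing (and tuning) the policy per workload rather than fixing one globally.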

Motus: 81%
Mem0: 56%
RAG: 56%
Compaction: 52%

LoCoMo accuracy — higher is better

Judge: GPT-5.4 mini

Agent Latency

Financial Workflow

Agent latency compounds across multi-step workflows. Sequential tool calls, redundant context, and unoptimized execution ordering turn seconds into minutes. For long-horizon agents, these inefficiencies add up fast.

Motus detects parallelizable steps and reorders execution to cut end-to-end latency. On a deep financial agent benchmark, Motus cuts latency by up to 34%; baseline frameworks run up to 52% slower.
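The gain from parallelizing independent steps is easy to see in miniature. This is an illustrative asyncio sketch, not Motus code: `fetch_filings` and `fetch_prices` are hypothetical stand-ins for two independent tool calls in a financial workflow.

```python
# Illustrative sketch (NOT Motus code): two independent tool calls run
# concurrently instead of sequentially, so wall-clock time is roughly the
# slower call rather than the sum of both.

import asyncio
import time


async def fetch_filings() -> str:  # stand-in for a slow tool call
    await asyncio.sleep(0.2)
    return "filings"


async def fetch_prices() -> str:  # independent of fetch_filings
    await asyncio.sleep(0.3)
    return "prices"


async def sequential() -> float:
    start = time.perf_counter()
    await fetch_filings()
    await fetch_prices()
    return time.perf_counter() - start  # ~0.5s: latencies add up


async def parallel() -> float:
    start = time.perf_counter()
    await asyncio.gather(fetch_filings(), fetch_prices())
    return time.perf_counter() - start  # ~0.3s: bounded by the slower call


print(f"sequential: {asyncio.run(sequential()):.2f}s")
print(f"parallel:   {asyncio.run(parallel()):.2f}s")
```

The hard part in a real agent is detecting which steps are actually independent; once that's known, the reordering itself is cheap.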

Motus: 12.8s
Python: 17.2s (+34%)
LangGraph: 18.6s (+45%)
Google ADK: 19.5s (+52%)

End-to-end latency for a financial workflow — lower is better

About us.

Dimitrios Skarlatos


CEO
Carnegie Mellon
Computer Science Professor

Zhihao Jia


CTO
Carnegie Mellon
Computer Science Professor

LithosAI was founded by Dimitrios Skarlatos and Zhihao Jia, professors at Carnegie Mellon University, whose award-winning research on systems and machine learning sits at the company's core. Our team brings together CMU and Stanford researchers and engineers who have shipped production infrastructure at AWS, Google, Meta, and NVIDIA. Join us!

The LithosAI team

Stay in the loop.