What is TRACER?
TRACER is an open-source routing layer that trains a lightweight machine-learning surrogate on your LLM's own production classification traces. It routes the predictable 90% of traffic to the surrogate at near-zero cost and defers only the hard 10% back to the LLM. Available as a Python SDK (pip install tracer-llm) or as a one-click hosted endpoint.
How do I reduce LLM costs?
To reduce LLM costs in production, route only the requests that genuinely need an LLM. Most production LLM workloads are repetitive classification tasks (intent detection, content moderation, support triage, tool selection). TRACER trains a small ML surrogate on your existing LLM traces and routes the predictable 90% of traffic to that surrogate at near-zero cost, deferring only the hard 10% back to the LLM. Typical impact: 5,000× cheaper per call on the routed slice and 80× lower latency, with a parity gate guaranteeing quality stays above your threshold. No fine-tuning, no manual labeling required.
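The routing idea above can be sketched in a few lines of Python. This is an illustrative toy, not the tracer-llm API: the Surrogate class, call_llm function, and confidence threshold are all made up for the example.

```python
# Toy sketch of surrogate-first routing (hypothetical names, not the SDK).

class Surrogate:
    """Stand-in for a lightweight ML classifier trained on LLM traces."""
    def predict(self, text):
        # Pretend card-related requests are the predictable slice.
        return ("card_lost", 0.97) if "card" in text else ("unknown", 0.40)

def call_llm(text):
    """Stand-in for an expensive frontier-LLM classification call."""
    return "fee_dispute"

def route(text, surrogate, threshold=0.9):
    label, confidence = surrogate.predict(text)  # cheap CPU inference
    if confidence >= threshold:
        return label, "surrogate"   # predictable slice: near-zero cost
    return call_llm(text), "llm"    # hard slice: defer to the LLM

print(route("I lost my card", Surrogate()))        # handled by the surrogate
print(route("something odd happened", Surrogate()))  # deferred to the LLM
```

The key design point is the confidence check: only predictions the surrogate is sure about skip the LLM, which is what keeps the routed slice cheap without touching the hard cases.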
What is LLM routing?
LLM routing sends each request to the cheapest model that can answer it correctly, instead of always hitting the same frontier LLM. Most model routers pick which LLM to call (frontier vs smaller LLM). TRACER goes further: it routes predictable requests out of the LLM stack entirely, into a lightweight ML surrogate trained on your own production traces. Routing is gated by measured agreement with your teacher LLM, so quality never silently drops. Available as tracer-llm on PyPI or as a hosted multi-tier routing endpoint.
How much does TRACER reduce LLM cost?
On the Banking77 benchmark with 10,000 daily classification calls, TRACER offloaded 92.2% of traffic to a local ML surrogate at 0.961 teacher agreement, cutting daily cost from $44.50 to $3.47, about $14,976 saved per year. Actual savings depend on your workload's predictability; the more repetitive the traffic, the larger the saving.
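The annual figure is simple arithmetic over the daily cost delta:

```python
# Reproducing the Banking77 numbers quoted above.
llm_only_per_day = 44.50   # all 10,000 daily calls on the LLM
routed_per_day = 3.47      # with 92.2% offloaded to the surrogate

annual_saving = (llm_only_per_day - routed_per_day) * 365
print(round(annual_saving))  # -> 14976
```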
How is TRACER different from a model router or smaller LLM?
Most LLM cost tools keep the request inside the LLM cost structure: caching only works on exact repeats, prompt optimization shaves tokens, smaller LLMs are still orders of magnitude more expensive than CPU-class ML, and model routers only pick which LLM to call. TRACER routes predictable slices out of the LLM stack entirely, gated by measured agreement (parity) with your teacher LLM so quality never silently degrades.
How does TRACER guarantee quality on the routed traffic?
TRACER deploys a parity gate: the surrogate goes live only when its agreement with the teacher LLM exceeds your threshold (for example 0.95) on held-out calibration data. If a workload is too hard, TRACER refuses to route it and everything stays on the LLM. Every routing decision exposes the matched cluster, the per-model accuracy on that cluster, and the confidence bound, so each decision is fully auditable.
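As a rough sketch (the function names here are illustrative, not the SDK's), the gate reduces to measuring agreement on held-out calibration data and refusing to route below the target:

```python
def parity(surrogate_labels, teacher_labels):
    # Fraction of held-out calls where the surrogate matches the teacher LLM.
    matches = sum(s == t for s, t in zip(surrogate_labels, teacher_labels))
    return matches / len(teacher_labels)

def may_route(surrogate_labels, teacher_labels, target=0.95):
    # Go live only when measured agreement clears the quality target;
    # otherwise everything stays on the LLM.
    return parity(surrogate_labels, teacher_labels) >= target

# 20 calibration calls; the surrogate disagrees with the teacher on one.
teacher = ["refund"] * 19 + ["fraud"]
surrogate = ["refund"] * 20

print(parity(surrogate, teacher))     # -> 0.95
print(may_route(surrogate, teacher))  # -> True at the default 0.95 target
```

Raising the target to 0.96 in this example would flip the gate to False, which is the "refuse to route" behavior described above.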
What kinds of workloads does TRACER work for?
TRACER targets repetitive LLM classification workloads: intent classification, content moderation, compliance scanning, support triage, document extraction, eval pipelines, and per-step tool selection in agentic workflows. Anywhere the same kinds of decisions happen many times a day, TRACER finds the predictable slices.
How long does it take to deploy TRACER?
On the hosted version, the setup wizard is six steps: pick your task, point to your traces, choose embeddings, pick your model menu, set a quality target, and get a live HTTPS endpoint. The build runs in the background and takes minutes (not days) depending on dataset size. With the open-source SDK, the equivalent is pip install tracer-llm followed by tracer fit traces.jsonl --target 0.95 and tracer serve.
Is TRACER open source?
Yes. The TRACER routing core is MIT-licensed and available on GitHub at github.com/adrida/tracer and on PyPI as tracer-llm. The hosted version layers managed infrastructure (managed embeddings, hosted endpoint, monitoring, audit dashboard) on top of the same OSS core.
Do I need to label my training data?
No. Every classification call your LLM makes is already a labeled (input, output) pair sitting in your logs. TRACER fits the surrogate directly on these traces with no manual labeling. As traces accumulate the surrogate refits and coverage compounds: 43% on day 1, 98% on day 2, 100% by day 4 in the demo workload.
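For illustration, a trace log might look like the JSONL below; the exact schema is an assumption for this sketch, not the confirmed tracer-llm format. The point is that each logged call is already a usable training pair:

```python
import json

# Two hypothetical logged LLM calls, one JSON object per line (JSONL).
log_lines = [
    '{"input": "my card was stolen", "output": "card_lost"}',
    '{"input": "my card never arrived", "output": "card_arrival"}',
]

# No manual labeling: the LLM's own outputs serve as the labels.
records = [json.loads(line) for line in log_lines]
pairs = [(r["input"], r["output"]) for r in records]
print(pairs[0])  # -> ('my card was stolen', 'card_lost')
```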