tracer · live tool-selection · hour 1 · 1,214 real session traces · BGE-M3 embeddings → PCA-128 → Extra Trees · learn-to-defer @ TA = 0.90
handled locally 0
deferred to gpt-5.2 0
avg decision latency 18 ms
[live panels: incoming queries · tool classes · accuracy on handled · teacher-agreement (— / — correct)]
· what the hell is this?

Every turn, an agent (Hermes) receives a user query. Normally it calls gpt-5.2 to pick the right tool — slow and expensive, and gpt-5.2 ends up answering the same tool-selection questions over and over.

tracer is a learned router that lives between the agent and the LLM. For every turn it decides: can I classify this cheaply myself (handled — no LLM call) or is it weird / uncertain and I should defer (→ gpt-5.2)?
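The handle-or-defer decision boils down to a confidence gate on the local classifier. A minimal sketch (names like `route` and the tool list are illustrative, not from the tracer codebase; the real cutoff is calibrated against teacher agreement, as described below):

```python
import numpy as np

# Hypothetical confidence cutoff; in tracer it is calibrated so that
# handled turns agree with gpt-5.2 at TA = 0.90.
DEFER_THRESHOLD = 0.90

def route(proba: np.ndarray, tools: list[str]) -> str:
    """Return a tool name if the classifier is confident, else 'defer'."""
    best = int(np.argmax(proba))
    if proba[best] >= DEFER_THRESHOLD:
        return tools[best]   # handled locally — no LLM call
    return "defer"           # uncertain → forward to gpt-5.2

# Example class distributions: one confident, one uncertain
tools = ["search", "calculator", "code_exec"]
print(route(np.array([0.95, 0.03, 0.02]), tools))  # search
print(route(np.array([0.40, 0.35, 0.25]), tools))  # defer
```

The only work done per turn is an embedding lookup, a matrix multiply, and an argmax — which is where the ~18 ms decision latency comes from.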

Under the hood it's classic scikit-learn — logistic regression, extra-trees, random forests, a small MLP — fitted on BGE-M3 embeddings → PCA-128, then gated by a learn-to-defer threshold calibrated at TA = 0.90 (teacher agreement: handled turns must match gpt-5.2's tool choice at least 90 % of the time). No fancy deep nets. ~80 % of turns route locally, at roughly 1/50 the cost and 1/60 the latency of calling gpt-5.2 every turn.
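The whole pipeline fits in a few lines of scikit-learn. A sketch under stated assumptions: synthetic 256-dim vectors stand in for the real BGE-M3 embeddings, synthetic labels stand in for gpt-5.2's recorded tool choices, and the calibration loop (pick the smallest confidence cutoff whose handled subset agrees with the teacher at ≥ 0.90) is one simple way to implement the TA gate, not necessarily tracer's exact procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data: embeddings X, teacher (gpt-5.2) tool labels y
X, y = make_classification(n_samples=2000, n_features=256, n_informative=40,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# embeddings → PCA-128 → Extra Trees, as in the header
clf = make_pipeline(PCA(n_components=128, random_state=0),
                    ExtraTreesClassifier(n_estimators=200, random_state=0))
clf.fit(X_tr, y_tr)

# Learn-to-defer calibration: smallest cutoff with teacher agreement >= 0.90
proba = clf.predict_proba(X_val)
conf, pred = proba.max(axis=1), proba.argmax(axis=1)
threshold = 1.0
for t in np.linspace(0.0, 1.0, 101):
    mask = conf >= t
    if mask.any() and (pred[mask] == y_val[mask]).mean() >= 0.90:
        threshold = t
        break
coverage = (conf >= threshold).mean()  # fraction of turns handled locally
print(f"threshold={threshold:.2f} coverage={coverage:.0%}")
```

Everything below the cutoff gets deferred, so accuracy on the handled slice is pinned near the TA target while coverage (the ~80 % local-routing figure) is whatever the data allows.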

github.com/adrida/tracer
SAVED / YEAR $0
vs llm-only $0/year
SAVED / HOUR $0.00 vs llm-only $0.00 · 0 queries