On every turn, an agent (Hermes) receives a user query. Normally it calls gpt-5.2 to pick the right tool: slow, expensive, and gpt-5.2 ends up answering the same tool-selection questions over and over.
tracer is a learned router that sits between the agent and the LLM. On every turn it decides: can I classify this query cheaply myself (handled — no LLM call), or is it unusual or uncertain enough that I should defer (→ gpt-5.2)?
Under the hood it's classic scikit-learn (logistic regression, extra-trees, random forests, a small MLP) fitted on BGE-M3 embeddings reduced to 128 dimensions with PCA, then gated by a learn-to-defer threshold calibrated at TA = 0.90. No fancy deep nets. Roughly 80% of turns are routed locally, at about 1/50 the cost and 1/60 the latency.
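A minimal sketch of the handle-or-defer gate, assuming a logistic-regression head on PCA-128 features. The random vectors stand in for BGE-M3 embeddings (the real pipeline embeds the query text first), the 4-tool label space is hypothetical, and the fixed `TAU = 0.90` is a stand-in for the calibrated learn-to-defer threshold — none of these values come from the repo itself.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Stand-in for precomputed BGE-M3 embeddings (1024-dim) with tool labels.
# In the real router these come from embedding past (query, chosen-tool) turns.
X = rng.normal(size=(500, 1024))
y = rng.integers(0, 4, size=500)  # hypothetical 4-tool label space

# PCA-128 then a cheap linear classifier, as described above.
clf = make_pipeline(PCA(n_components=128), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# Hypothetical confidence threshold; the real one is calibrated offline
# (learn-to-defer) so that handled turns meet the accuracy target.
TAU = 0.90

def route(embedding):
    """Return (tool_index, "handled") if confident, else (None, "defer")."""
    probs = clf.predict_proba(embedding.reshape(1, -1))[0]
    if probs.max() >= TAU:
        return int(probs.argmax()), "handled"   # no LLM call
    return None, "defer"                        # fall back to gpt-5.2
```

Usage: embed the incoming query, call `route(embedding)`, and only invoke gpt-5.2 on the `"defer"` branch; raising `TAU` trades a lower local-routing rate for higher accuracy on handled turns.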
adrida/tracer