Why does cost attribution matter?
For most LLM applications, inference is the largest variable cost, and it scales with usage in ways that are easy to lose track of. Without attribution, you see a single growing bill and can't tell whether it's driven by one expensive feature, a handful of heavy users, retries, or an agent making far more calls than expected. You can't optimise what you can't see.
Cost attribution turns that bill into a map. Once you can say "this feature costs X per request" or "this customer's usage costs Y," you can make decisions: cap or price the expensive path, route it to a cheaper model, cache it, or redesign it. Attribution is the measurement step that every cost-reduction effort depends on.
How do you attribute LLM cost?
The mechanics come down to tagging and logging every model call with the context needed to group spend afterward:
- Token logging per call — record input and output tokens, model, and computed cost for every request, typically via an observability layer like Langfuse.
- Dimensional tags — attach the feature, user or tenant, request type, and (for agents) the step or tool, so spend can be sliced by any of them.
- Agent-step granularity — in multi-agent systems, attribute cost down to individual agents and tool calls, since one user action can fan out into many model calls.
- Aggregation and alerting — roll the tagged data into per-feature and per-customer views, and alert when any dimension spikes.
From attribution to reduction
Attribution is the diagnosis; reduction is the treatment. Once Prodinit could see where a voice AI platform's token spend concentrated, the path was clear — distilling the expensive GPT-4.1 calls into a fine-tuned GPT-4o-mini handled the bulk of traffic at far lower cost, cutting inference spend 70%. The cut was only possible because the cost was attributed first; you can't target what you can't measure.