When should an agent call tools in parallel?
Call tools in parallel when the calls are independent, meaning none needs another's output. Fetching three documents, querying two APIs, or checking several data sources at once are all naturally parallel. Running them concurrently brings total latency close to that of the slowest single call instead of the sum of all the calls.
Keep calls sequential when there is a true dependency: the second call needs the result of the first. Forcing dependent steps to run in parallel just produces calls made with missing information. The skill is recognising which parts of a task are independent — that is where parallelism is safe and valuable.
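As a minimal sketch of that split, the Python below runs three independent fetches concurrently with `asyncio.gather` and only then runs the step that depends on them. The tools `fetch_document` and `summarize` are hypothetical stand-ins (they sleep briefly instead of doing real I/O), not part of any particular agent framework.

```python
import asyncio


async def fetch_document(doc_id: str) -> str:
    # Hypothetical tool: a short sleep stands in for a network call.
    await asyncio.sleep(0.01)
    return f"contents of {doc_id}"


async def summarize(texts: list[str]) -> str:
    # Hypothetical tool that needs the fetched texts as input.
    await asyncio.sleep(0.01)
    return f"summary of {len(texts)} documents"


async def run_task(doc_ids: list[str]) -> str:
    # Independent: no fetch needs another fetch's output, so they run together.
    docs = await asyncio.gather(*(fetch_document(d) for d in doc_ids))
    # Dependent: summarize needs every document, so it waits for the gather.
    return await summarize(list(docs))
```

Calling `asyncio.run(run_task(["a", "b", "c"]))` issues the three fetches at once and then makes a single dependent call. Starting `summarize` alongside the fetches would mean calling it with nothing to summarize.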
How do you handle partial failures?
Parallelism introduces a failure mode sequential code rarely hits: some calls return, others error or time out. The agent must continue with an incomplete picture. Three rules make this robust:
- Never hallucinate the missing result. A failed call's output must be represented as an explicit error in the context, not silently dropped — otherwise the model fills the gap with invented data.
- Decide degrade vs retry vs abort per call. Some failures are retryable (a timeout), some are fatal to the task, and some are safely ignorable. The agent or orchestrator needs an explicit policy for each case, not a blanket crash on the first error.
- Give the model the partial state. Tell it which calls succeeded, which failed, and why, so it can decide whether to proceed, retry, or ask for help.
The anti-pattern is treating a partial failure as a total failure (throwing away good results) or as a total success (proceeding as if nothing failed). Both produce bad agent behaviour.
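The three rules can be sketched roughly as follows, assuming Python with `asyncio`. Every name here (`ToolResult`, `call_with_policy`, `run_parallel`, `decide`, `render_for_model`) is hypothetical and chosen for illustration. The policy shown (retry timeouts, record other errors, abort only when a required call fails) is one possible choice rather than a recommendation.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class ToolResult:
    name: str
    ok: bool
    value: Optional[str] = None
    error: Optional[str] = None  # explicit error text, never a silent gap


async def call_with_policy(
    name: str,
    tool: Callable[[], Awaitable[str]],
    timeout: float = 1.0,
    retries: int = 1,
) -> ToolResult:
    last_error = "unknown error"
    for _ in range(retries + 1):
        try:
            value = await asyncio.wait_for(tool(), timeout)
            return ToolResult(name, True, value=value)
        except asyncio.TimeoutError:
            # Assumed policy: timeouts are retryable.
            last_error = f"timed out after {timeout}s"
        except Exception as exc:
            # Assumed policy: other errors are recorded without a retry.
            return ToolResult(name, False, error=f"{type(exc).__name__}: {exc}")
    return ToolResult(name, False, error=last_error)


async def run_parallel(
    tools: dict[str, Callable[[], Awaitable[str]]],
    timeout: float = 1.0,
    retries: int = 1,
) -> list[ToolResult]:
    # Each call resolves to a ToolResult, so one failure cannot discard the rest.
    calls = (call_with_policy(n, t, timeout, retries) for n, t in tools.items())
    return list(await asyncio.gather(*calls))


def decide(results: list[ToolResult], required: set[str]) -> str:
    failed = {r.name for r in results if not r.ok}
    if failed & required:
        return "abort"  # a call the task cannot do without has failed
    return "degrade" if failed else "proceed"


def render_for_model(results: list[ToolResult]) -> str:
    # Partial state for the model: what succeeded, what failed, and why.
    return "\n".join(
        f"[ok] {r.name}: {r.value}" if r.ok else f"[failed] {r.name}: {r.error}"
        for r in results
    )
```

The design point is that a failure becomes a value: `run_parallel` always returns one `ToolResult` per call, so good results survive and failed ones stay visible to both the policy in `decide` and the text produced by `render_for_model`.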
Why partial-failure handling is a reliability issue
In a multi-agent system, partial failures compound: if a specialist returns incomplete results and the orchestrator doesn't know, the error propagates silently into later steps. Representing failure explicitly — as data the next agent can reason about — is what keeps a large agent system honest about what it does and doesn't know.
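One way to picture failure as data passed between agents is a report type that carries its own gaps. The sketch below is an assumption-laden illustration: `SpecialistReport` and `merge_reports` are hypothetical names, and a real system would likely carry richer error detail.

```python
from dataclasses import dataclass, field


@dataclass
class SpecialistReport:
    agent: str
    findings: list[str]
    # What the specialist could not obtain, and why. Empty means complete.
    missing: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.missing


def merge_reports(reports: list[SpecialistReport]) -> dict:
    # The orchestrator forwards known gaps instead of flattening them away.
    return {
        "findings": [f for r in reports for f in r.findings],
        "known_gaps": [f"{r.agent}: {m}" for r in reports for m in r.missing],
        "complete": all(r.complete for r in reports),
    }
```

Because `known_gaps` travels with the findings, a later step can see that the picture is incomplete and decide whether to retry, narrow its claims, or ask for help.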