- Do you use a framework (LangChain, CrewAI) or roll your own?
- How do you handle agent-to-agent data passing?
- What does your observability look like for agent runs?
- Are you running agents on cron/webhooks or manual-only?
Interested in hearing what's working and what's painful.
The naive approach is stateless. Each reply gets processed independently. This breaks down fast when a prospect says "as I mentioned before" and the agent has no memory of what they mentioned before.
What has worked better: treating the entire conversation thread as the context window, not just the latest message. Every reply, every prior message, the research done on the prospect at the start, all of it gets passed through. The agent always knows where it is in the conversation and what has already been said.
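A minimal sketch of that idea: one state object that accumulates the prospect research and every message on the thread, and renders all of it into the prompt on each turn. Names and structure here are illustrative, not a specific library's API.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Full thread context passed to the agent on every turn (hypothetical shape)."""
    prospect_research: str                                # research gathered at the start
    messages: list[dict] = field(default_factory=list)    # every message, both sides

    def add(self, role: str, text: str) -> None:
        self.messages.append({"role": role, "text": text})

    def to_prompt(self) -> str:
        """Render the whole thread, not just the latest reply."""
        history = "\n".join(f"{m['role']}: {m['text']}" for m in self.messages)
        return (
            f"Prospect research:\n{self.prospect_research}\n\n"
            f"Thread so far:\n{history}"
        )


state = ConversationState(prospect_research="VP Sales at Acme, hiring SDRs")
state.add("agent", "Hi, noticed you're scaling your SDR team...")
state.add("prospect", "As I mentioned before, budget opens in Q3.")
print(state.to_prompt())
```

With this, "as I mentioned before" resolves naturally, because the earlier mention is always in the prompt.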
The second problem is confidence calibration. Multi-agent systems in production need to know when to act autonomously and when to surface something for human review. In sales specifically, the cost of an agent saying something wrong to a real prospect is high. We err toward flagging ambiguous situations rather than guessing.
The pattern that has held up: agents own clearly bounded tasks end to end (research, draft, send, parse reply), with a thin orchestration layer that routes based on reply classification. Classification is the hardest part to get right and the most important to get right.
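The orchestration layer can stay thin precisely because routing is driven entirely by the reply classification. A sketch with illustrative labels and a keyword stand-in where the real classifier (its own agent/LLM call) would go:

```python
def classify_reply(reply: str) -> str:
    """Stand-in classifier; in practice this is its own agent call."""
    text = reply.lower()
    if "unsubscribe" in text or "not interested" in text:
        return "negative"
    if "call" in text or "demo" in text:
        return "interested"
    return "ambiguous"


# Each label maps to a bounded next action owned by one agent.
HANDLERS = {
    "interested": lambda r: "schedule_call",
    "negative": lambda r: "close_out",
}


def orchestrate(reply: str) -> str:
    label = classify_reply(reply)
    # Anything unclassified falls through to human review rather than a guess.
    return HANDLERS.get(label, lambda r: "human_review")(reply)


print(orchestrate("Happy to do a quick call next week"))      # -> schedule_call
print(orchestrate("Can you remind me what this is about?"))   # -> human_review
```

All the difficulty lives in `classify_reply`; the dispatch around it is trivial, which is the point.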
Observability is the part most people underestimate. I log every agent run with input, output, token usage, and latency to a dedicated collection. Simple but it catches failures fast.
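The logging wrapper can be a few lines: one JSON record per run with input, output, token usage, and latency. Field names and the result shape are assumptions here; token accounting depends on what your model client returns.

```python
import json
import time
import uuid


def run_with_logging(agent_name, fn, payload, log_path="agent_runs.jsonl"):
    """Wrap a single agent run and append one structured record.

    Assumes fn returns a dict like {"output": ..., "tokens": int};
    adapt to your client's actual response shape.
    """
    start = time.monotonic()
    result = fn(payload)
    record = {
        "run_id": str(uuid.uuid4()),
        "agent": agent_name,
        "input": payload,
        "output": result.get("output"),
        "tokens": result.get("tokens"),
        "latency_ms": round((time.monotonic() - start) * 1000, 2),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Dummy agent for demonstration.
rec = run_with_logging(
    "draft", lambda p: {"output": f"draft for {p}", "tokens": 42}, "Acme"
)
print(rec["agent"], rec["tokens"])
```

Appending to a JSONL file stands in for the "dedicated collection" above; swap the `open(...)` for whatever store you query.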
How do you handle agent-to-agent data passing? - We have a memory concept scoped to the pipeline we're in.
What does your observability look like for agent runs? - Locally, we use our own test abstraction and evals. For production, we use https://www.wayfound.ai
Are you running agents on cron/webhooks or manual-only? - Webhooks, plus cron when needed.