There's a quiet failure mode in AI sales tools that nobody markets against. The follow-up email, the meeting summary, and the deal-specific one-pager are all written by a language model that was given a prompt and asked to produce something plausible. The result reads well. It can also contain details that were never said on the call.
We've seen drafts that promise to send pricing the rep never mentioned. We've seen meeting summaries that attribute quotes to people who weren't in the meeting. We've seen one-pagers that cite customer wins from companies the seller never worked with. The model isn't malicious. It was trained to be helpful, and 'helpful' in the absence of grounding looks like confident invention.
Why this is worse than a hallucinated paragraph in ChatGPT.
When ChatGPT hallucinates a fact, the user is reading the output directly and has a chance to catch it. When a sales tool hallucinates a customer commitment in a follow-up email, the email gets sent. Reps review at the level of trust they have in the tool: they spot-check the first few drafts, then assume the rest are fine. The hallucinated artifact reaches the prospect with the rep's name on it. By the time someone notices, the relationship with the prospect is already strained.
Hallucinated next steps are the worst version of this. The prospect reads 'I'll send the case study by Wednesday' and waits. Wednesday comes and no case study arrives, because the commitment was invented. The rep doesn't know it was promised. Trust is gone before the rep ever finds out.
Prompt-grounded vs. transcript-grounded.
There are two architectures, and most buyers don't distinguish between them.
- Prompt-grounded: the user types 'write a follow-up for the call I had with Acme.' The model invents the rest. Whatever it produces is plausible-by-default and unverifiable.
- Transcript-grounded: the model is given the actual transcript, plus a structured extraction of who was on the call, what was discussed, and what commitments were made. The model writes from that ground-truth context. If it tries to reference something that wasn't said, the claim has no source in the extraction, so the structure can refuse it.
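To make the transcript-grounded idea concrete, here is a minimal sketch in Python. It assumes the draft is produced as a list of claims that each cite a transcript line, and that a claim is rejected unless its cited line backs one of the extracted commitments. The names `Commitment`, `CallContext`, `DraftClaim`, and `unsupported_claims` are hypothetical and chosen for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commitment:
    """One commitment extracted from the call, tied to a transcript line."""
    owner: str            # who made the commitment
    text: str             # what was committed to
    transcript_line: int  # index of the supporting line in the transcript


@dataclass(frozen=True)
class CallContext:
    """Ground truth for one call: raw transcript plus structured extraction."""
    transcript: tuple[str, ...]
    participants: frozenset[str]
    commitments: tuple[Commitment, ...]


@dataclass(frozen=True)
class DraftClaim:
    """A promise the model wants to put in the email, with its cited source."""
    text: str
    transcript_line: Optional[int]  # None means the model offered no source


def unsupported_claims(claims, context):
    """Return claims whose cited line does not back an extracted commitment."""
    supported_lines = {c.transcript_line for c in context.commitments}
    return [c for c in claims if c.transcript_line not in supported_lines]


# Illustrative data only (assumption: invented for this sketch).
context = CallContext(
    transcript=(
        "Rep: Thanks for joining, Dana.",
        "Dana: Can you share the security overview?",
        "Rep: Yes, I'll send the security overview tomorrow.",
    ),
    participants=frozenset({"Rep", "Dana"}),
    commitments=(Commitment("Rep", "send the security overview", 2),),
)
claims = [
    DraftClaim("I'll send the security overview tomorrow.", 2),
    DraftClaim("I'll send the case study by Wednesday.", None),
]
rejected = unsupported_claims(claims, context)
for claim in rejected:
    print("rejected:", claim.text)
```

In this sketch the invented case-study promise is rejected because it cites no line, while the security overview promise passes because line 2 backs an extracted commitment. A real system would also need to check that the claim text actually matches the cited line, which this sketch does not attempt.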
What it takes to prevent the failure mode.
Transcript grounding isn't enough on its own. The prompt also has to forbid the dangerous moves explicitly. We maintain a rule list in production that includes things like:
- Never reference a customer name unless it appears verbatim on our verified customer list.
- Never offer to send an artifact (case study, deck, pricing) unless that artifact was committed to on the call.
- Never assert whether a prior commitment was fulfilled — we don't have visibility into the rep's outbox.
- Never write as if you were on the call when the sender is a different rep than the original caller.
Each rule maps to a credibility bomb we've seen go off. The list grows as we discover new failure modes in production data.
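Rules like these can also be checked after generation, so a violation is caught even when the model ignores the prompt. The sketch below is a rough illustration, not the production rule list: the regular expressions, the `check_draft` function, and its parameters are assumptions made for this example, and simple pattern matching like this would miss many phrasings.

```python
import re

# Artifact types named in the rule list above.
ARTIFACTS = ("case study", "deck", "pricing")

# Hypothetical patterns; real rules would need far broader coverage.
OFFER = re.compile(r"\b(?:i'll|i will|we'll|we will) (?:send|share|forward)\b", re.IGNORECASE)
FULFILLED = re.compile(
    r"\bas promised\b|\b(?:i|we)(?:'ve| have)? (?:already )?sent\b", re.IGNORECASE
)
ON_CALL = re.compile(r"\b(?:on our call|when we spoke|as i mentioned)\b", re.IGNORECASE)


def check_draft(draft, committed_artifacts, sender, caller):
    """Return a list of rule violations found in the draft text."""
    violations = []
    # Naive sentence split on ., !, or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", draft):
        lowered = sentence.lower()
        # Rule: only offer artifacts that were committed to on the call.
        if OFFER.search(sentence):
            for artifact in ARTIFACTS:
                if artifact in lowered and artifact not in committed_artifacts:
                    violations.append(f"uncommitted artifact offered: {artifact}")
        # Rule: never assert that a prior commitment was fulfilled.
        if FULFILLED.search(sentence):
            violations.append("asserts a prior commitment was fulfilled")
        # Rule: a different sender must not write as if they were on the call.
        if sender != caller and ON_CALL.search(sentence):
            violations.append("writes as if the sender was on the call")
    return violations


draft = (
    "Great talking when we spoke on Tuesday. "
    "I'll send the case study by Wednesday. "
    "As promised, I sent the pricing."
)
violations = check_draft(draft, committed_artifacts={"pricing"}, sender="Lee", caller="Sam")
for violation in violations:
    print(violation)
```

A draft that trips any check could be blocked or routed to the rep for review rather than sent. The point of the sketch is that each prohibition has a matching check outside the model, instead of relying on the prompt alone.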
Why this matters when you're evaluating tools.
Ask any AI follow-up vendor: 'What does your system do when the model tries to reference a customer the rep doesn't have?' If the answer is 'the prompt asks it not to,' you're looking at prompt-grounded with optimism. If the answer is 'it's structurally impossible because the customer list is a hard input,' you're looking at the right architecture.
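One possible reading of 'the customer list is a hard input' is sketched below: the model never writes a customer name as free text. It can only pick an id, and the renderer fills the name in from the verified list or fails. The ids, names, `UnknownCustomerError`, and `render_customer_mention` are all hypothetical and exist only for this illustration.

```python
class UnknownCustomerError(ValueError):
    """Raised when a draft refers to a customer id that is not verified."""


def render_customer_mention(template, customer_id, verified_customers):
    """Fill the {customer} slot from the verified list; raise if the id is not on it."""
    if customer_id not in verified_customers:
        raise UnknownCustomerError(
            f"{customer_id!r} is not on the verified customer list"
        )
    return template.format(customer=verified_customers[customer_id])


# Illustrative data only (assumption: invented for this sketch).
verified = {"cust_001": "Example Co A", "cust_002": "Example Co B"}
template = "Teams like {customer} asked the same question."

rendered = render_customer_mention(template, "cust_001", verified)
print(rendered)

try:
    render_customer_mention(template, "cust_999", verified)
except UnknownCustomerError as err:
    print("blocked:", err)
```

This only protects the slot. Free text elsewhere in the draft could still name a company, so a design like this would likely be paired with a scan of the remaining text.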
Generic AI follow-up tools can be useful at speed. They can also fabricate the commitments that destroy your reputation with the prospect. The difference between the two is architectural, not promotional.