Building the agent whisper without making it creepy

A coach in the agent ear is useful. A coach that talks over the agent is not. The design constraints that kept the whisper helpful instead of intrusive.

Introduction

Real-time agent assist has been the conversation-intelligence demo of choice for four years. The pitch is simple: an AI listens to the call, understands what is happening, and feeds the agent useful information in real time — the right phrase to use, the right policy, the answer to the question the customer just asked. In demos it looks miraculous. In production, more often than not, agents turn it off.

When we started designing our own version of the feature in 2025, we read every published interview we could find with agents who had used the major competing tools. The pattern was consistent. Agents reported three feelings: distracted by the volume of suggestions, second-guessed when the AI suggested something different from what they were about to say, and — most often — talked over. A coach who interrupts is not a coach.

We decided the design problem was not "how do we surface more information to the agent" but "how do we surface less, more carefully, only when it helps."

Why we called it a whisper, and what that constrained

The name we landed on, "agent whisper," was load-bearing. A whisper is quiet, occasional, easy to ignore, and feels intimate rather than authoritative. The name set the bar for the design: anything that did not fit the metaphor of a quiet voice in the agent's ear got cut.

Three design constraints came directly from the metaphor.

One whisper at a time. The agent sees at most one suggestion on screen. New suggestions replace the old one. There is no scrolling feed.
The whisper has to earn the moment. If the model is not at least 80% confident the suggestion would help, no whisper is shown. Most calls receive between zero and three whispers across their entire duration. The median call gets one.
The whisper never speaks aloud. Visual only. The agent cannot accidentally hear the AI talking while the customer is mid-sentence.

The third constraint took the most internal debate. Some on the team argued for an actual audio whisper, with the AI murmuring suggestions into the agent's headset on a side channel. We tested this in a closed pilot. Agents hated it. The cognitive load of parsing two voices simultaneously — even when one is much quieter — was higher than the value of the suggestion. Visual won.

What the whisper actually says

The whisper has three modes, and they were arrived at by watching where agents had been hesitating during calls.

Mode 1: Policy retrieval

The customer asks a specific factual question. "Does my plan include international roaming?" The whisper retrieves the relevant policy excerpt and shows it on the agent's screen. The agent's job is to read it, apply it, and say it in their own words. The whisper does not draft the response.

Mode 2: Compliance reminder

A regulated phrase is missing from the call. On a debt-collection call, the mini-Miranda has not been delivered. On a recorded support call, the recording disclosure was not stated. The whisper fires a single-line prompt: "say the disclosure." It does not suggest exact wording — the agent has been trained on the wording — it just reminds them of the missing element.

Mode 3: Risk signal

The conversation is heading somewhere the model has learned predicts a bad outcome. The customer has used a churn-indicator phrase. The sentiment trajectory is dropping. The whisper does not tell the agent what to say; it just flags the state with a short label: "escalation risk." The agent decides what to do with that information.

None of the modes ever draft full sentences. The choice not to draft sentences was deliberate. A drafted sentence the agent reads aloud feels rehearsed to the customer and, more importantly, feels like a script to the agent. The whisper is a prompt, not a teleprompter.

The suppression rules that made it usable

The hard work of agent assist is not generating suggestions; it is suppressing the bad ones. We built a layered suppression system.

First, the confidence threshold blocks any suggestion below 80% — but this is the obvious part. Second, the recency cooldown: after any whisper is shown, no new whisper can appear for at least 25 seconds. This prevents the "flurry" effect where five back-to-back suggestions arrive during a fast call segment.

Third, the conversation-state suppression: when the agent is in the middle of an utterance the system detects as a complete thought, no new whisper appears until they pause. Agents reported this as the most valuable rule — they were no longer being interrupted mid-sentence by a notification appearing on screen.

Fourth, the per-agent calibration: each agent has a personal "whisper density" setting that the system adjusts based on dismissal rates. An agent who dismisses 60% of whispers within two seconds of seeing them gets fewer whispers. An agent who acts on most of theirs gets more. We do not surface the setting to the agent; we let it adapt quietly in the background.

What we removed during the pilot

Four features were in the early designs and did not survive contact with agents.

Confidence percentages on whispers. Early versions showed "85% confidence" alongside the suggestion. Agents read the percentage as a verdict on their own competence ("the AI is only 85% sure I should say this"), which made them dismiss good suggestions defensively. We removed the percentage display entirely. The threshold still operates internally; the agent does not see it.

Suggestion explanations. "We suggest this because the customer used phrase X." Sounds helpful. In practice, agents found the explanations more distracting than the suggestions themselves. The explanation only matters when the suggestion looks wrong, and a suggestion that looks wrong should be dismissed regardless of its rationale.

Multi-suggestion lists. An early UI showed the top three suggestions ranked. Agents picked the first one regardless of fit. We collapsed to a single suggestion to avoid the ranking-bias problem.

Supervisor-visible adoption metrics. Originally the supervisor dashboard showed "% of whispers accepted" per agent. We pulled it. Agents who knew their acceptance was being measured started accepting whispers performatively to look engaged. The metric corrupted itself.

The supervisor side of the same product

The supervisor view of the whisper is deliberately different from the agent view. A supervisor monitoring a live call sees the whisper, the agent's response to it, and a short trace of why the system surfaced it. The supervisor can also push a manual whisper to the agent — a coaching note in real time — that arrives in the same UI as the AI-generated whispers.

The manual-whisper feature is used less than we expected. Supervisors generally trust the AI whisper for the routine prompts and reserve manual intervention for unusual situations. About 4% of whispers fired in production are supervisor-initiated. The other 96% are model-initiated. We thought it would be closer to 50/50; the gap is informative.

Adoption and the metric we actually track

We deliberately do not track "whisper adoption rate" — the percentage of whispers an agent acted on. Tracking it incentivises the wrong behaviour, as the supervisor-dashboard story above demonstrates.

Instead, we track agent retention against feature-on vs feature-off cohorts, and customer outcome scores on calls where whispers fired vs comparable calls where they did not. After eight months of running both, agents using the whisper had retention 11% above the no-whisper cohort, and the calls where whispers fired had outcome scores measurably above matched controls. Neither number is the kind of headline-friendly chart that goes in a marketing slide, but they are the numbers that tell us the feature is working without distorting agent behaviour.

If we have one rule about agent assist, it is this: never measure the feature on a metric the agent can influence by performing for the measurement. Measure it on a metric the agent does not know is being measured. That is the only way the data stays honest.

Run your voice on Ajoxi.

AI receptionists, wholesale routes, virtual numbers — built on one platform with transparent pricing and a 24/7 NOC.

See pricing Talk to us

Building the agent whisper without making it creepy

A coach in the agent ear is useful. A coach that talks over the agent is not. The design constraints that kept the whisper helpful instead of intrusive.

Introduction

We decided the design problem was not "how do we surface more information to the agent" but "how do we surface less, more carefully, only when it helps."

Why we called it a whisper, and what that constrained

Three design constraints came directly from the metaphor.

One whisper at a time. The agent sees at most one suggestion on screen. New suggestions replace the old one. There is no scrolling feed.
The whisper has to earn the moment. If the model is not at least 80% confident the suggestion would help, no whisper is shown. Most calls receive between zero and three whispers across their entire duration. The median call gets one.
The whisper never speaks aloud. Visual only. The agent cannot accidentally hear the AI talking while the customer is mid-sentence.

What the whisper actually says

The whisper has three modes, and they were arrived at by watching where agents had been hesitating during calls.

Mode 1: Policy retrieval

Mode 2: Compliance reminder

Mode 3: Risk signal

The suppression rules that made it usable

The hard work of agent assist is not generating suggestions; it is suppressing the bad ones. We built a layered suppression system.

What we removed during the pilot

Four features were in the early designs and did not survive contact with agents.

Multi-suggestion lists. An early UI showed the top three suggestions ranked. Agents picked the first one regardless of fit. We collapsed to a single suggestion to avoid the ranking-bias problem.

The supervisor side of the same product

Adoption and the metric we actually track

Run your voice on Ajoxi.

AI receptionists, wholesale routes, virtual numbers — built on one platform with transparent pricing and a 24/7 NOC.

See pricing Talk to us

Core Capabilities

By Industry & Team

Native Sync

Learn

Build

Trust

Building the agent whisper without making it creepy

Introduction

Why we called it a whisper, and what that constrained

What the whisper actually says

Mode 1: Policy retrieval

Mode 2: Compliance reminder

Mode 3: Risk signal

The suppression rules that made it usable

What we removed during the pilot

The supervisor side of the same product

Adoption and the metric we actually track

Run your voice on Ajoxi.

Building the agent whisper without making it creepy

Introduction

Why we called it a whisper, and what that constrained

What the whisper actually says

Mode 1: Policy retrieval

Mode 2: Compliance reminder

Mode 3: Risk signal

The suppression rules that made it usable

What we removed during the pilot

The supervisor side of the same product

Adoption and the metric we actually track

Run your voice on Ajoxi.

Introduction

Why we called it a whisper, and what that constrained

What the whisper actually says

Mode 1: Policy retrieval

Mode 2: Compliance reminder

Mode 3: Risk signal

The suppression rules that made it usable

What we removed during the pilot

The supervisor side of the same product

Adoption and the metric we actually track

Run your voice on Ajoxi.

Related reading

We measured AI receptionist accuracy across 8,400 real calls

Why we ship STIR/SHAKEN attestation on day one

The case for ranking calls, not sampling them

Introduction

Why we called it a whisper, and what that constrained

What the whisper actually says

Mode 1: Policy retrieval

Mode 2: Compliance reminder

Mode 3: Risk signal

The suppression rules that made it usable

What we removed during the pilot

The supervisor side of the same product

Adoption and the metric we actually track

Run your voice on Ajoxi.

Related reading

We measured AI receptionist accuracy across 8,400 real calls

Why we ship STIR/SHAKEN attestation on day one

The case for ranking calls, not sampling them