
Building the agent whisper without making it creepy

A coach in the agent's ear is useful. A coach that talks over the agent is not. These are the design constraints that kept the whisper helpful instead of intrusive.

Table of Contents
  • 1. Introduction
  • 2. The whisper modality
  • 3. What the whisper actually says
  • 4. The suppression rules
  • 5. What we removed mid-pilot
  • 6. The supervisor side
  • 7. Adoption and the metric we use

Introduction

Real-time agent assist has been the conversation-intelligence demo of choice for four years. The pitch is simple: an AI listens to the call, understands what is happening, and feeds the agent useful information in real time — the right phrase to use, the right policy, the answer to the question the customer just asked. In demos it looks miraculous. In production, more often than not, agents turn it off.

When we started designing our own version of the feature in 2025, we read every published interview we could find with agents who had used the major competing tools. The pattern was consistent. Agents reported three feelings: distracted by the volume of suggestions, second-guessed when the AI suggested something different from what they were about to say, and — most often — talked over. A coach who interrupts is not a coach.

We decided the design problem was not "how do we surface more information to the agent" but "how do we surface less, more carefully, only when it helps."

Why we called it a whisper, and what that constrained

The name we landed on, "agent whisper," was load-bearing. A whisper is quiet, occasional, easy to ignore, and feels intimate rather than authoritative. The name set the bar for the design: anything that did not fit the metaphor of a quiet voice in the agent's ear got cut.

Three design constraints came directly from the metaphor.

  • One whisper at a time. The agent sees at most one suggestion on screen. New suggestions replace the old one. There is no scrolling feed.
  • The whisper has to earn the moment. If the model is not at least 80% confident the suggestion would help, no whisper is shown. Most calls receive between zero and three whispers across their entire duration. The median call gets one.
  • The whisper never speaks aloud. Visual only. The agent cannot accidentally hear the AI talking while the customer is mid-sentence.
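
The first two constraints can be sketched in a few lines. This is a minimal illustration, not the production implementation: the `Whisper` and `WhisperSlot` names are hypothetical, and only the 80% threshold and the replace-don't-append behaviour come from the constraints above.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.80  # from the post: below 80%, no whisper is shown


@dataclass(frozen=True)
class Whisper:
    text: str
    confidence: float  # model's estimate that the suggestion helps, in [0, 1]


class WhisperSlot:
    """Hypothetical holder for the single whisper visible on the agent's screen."""

    def __init__(self) -> None:
        self.current: Optional[Whisper] = None

    def offer(self, whisper: Whisper) -> bool:
        """Show the whisper if it clears the threshold; return True if shown."""
        if whisper.confidence < CONFIDENCE_THRESHOLD:
            return False  # did not earn the moment; the screen is unchanged
        self.current = whisper  # replace, never append: there is no feed
        return True

    def dismiss(self) -> None:
        self.current = None


slot = WhisperSlot()
slot.offer(Whisper("escalation risk", 0.91))          # shown
slot.offer(Whisper("say the disclosure", 0.62))       # suppressed, below threshold
slot.offer(Whisper("roaming policy excerpt", 0.85))   # shown, replaces the first
```

Because the slot stores one value rather than a list, a scrolling feed cannot creep back in by accident.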

The third constraint took the most internal debate. Some on the team argued for an actual audio whisper, with the AI murmuring suggestions into the agent's headset on a side channel. We tested this in a closed pilot. Agents hated it. The cognitive load of parsing two voices simultaneously — even when one is much quieter — was higher than the value of the suggestion. Visual won.

What the whisper actually says

The whisper has three modes. We arrived at them by watching where agents hesitated during calls.

Mode 1: Policy retrieval

The customer asks a specific factual question. "Does my plan include international roaming?" The whisper retrieves the relevant policy excerpt and shows it on the agent's screen. The agent's job is to read it, apply it, and say it in their own words. The whisper does not draft the response.

Mode 2: Compliance reminder

A regulated phrase is missing from the call. On a debt-collection call, the mini-Miranda has not been delivered. On a recorded support call, the recording disclosure has not been stated. The whisper fires a single-line prompt: "say the disclosure." It does not suggest exact wording, because the agent has already been trained on it; it just reminds them of the missing element.

Mode 3: Risk signal

The conversation is heading somewhere the model has learned predicts a bad outcome. The customer has used a churn-indicator phrase. The sentiment trajectory is dropping. The whisper does not tell the agent what to say; it just flags the state with a short label: "escalation risk." The agent decides what to do with that information.

None of the modes ever drafts a full sentence, and that was deliberate. A drafted sentence the agent reads aloud feels rehearsed to the customer and, more importantly, feels like a script to the agent. The whisper is a prompt, not a teleprompter.
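
One way to keep the "prompt, not a teleprompter" rule from eroding is to encode it in the data type. The sketch below is illustrative only: the `Mode` enum, the `Whisper` class, and the four-word cap are assumptions made for the example, not details of the shipped product.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    POLICY = "policy retrieval"
    COMPLIANCE = "compliance reminder"
    RISK = "risk signal"


MAX_LABEL_WORDS = 4  # assumed cap for reminders and risk flags


@dataclass(frozen=True)
class Whisper:
    mode: Mode
    body: str

    def __post_init__(self) -> None:
        # Policy whispers quote an existing excerpt, so no length cap applies.
        if self.mode is Mode.POLICY:
            return
        if len(self.body.split()) > MAX_LABEL_WORDS:
            raise ValueError("reminders and risk flags are labels, not sentences")


reminder = Whisper(Mode.COMPLIANCE, "say the disclosure")
flag = Whisper(Mode.RISK, "escalation risk")
```

A compliance or risk whisper that reads like a script fails at construction time, before it can reach an agent's screen.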

The suppression rules that made it usable

The hard work of agent assist is not generating suggestions; it is suppressing the bad ones. We built a layered suppression system.

First, the confidence threshold blocks any suggestion below 80% — but this is the obvious part. Second, the recency cooldown: after any whisper is shown, no new whisper can appear for at least 25 seconds. This prevents the "flurry" effect where five back-to-back suggestions arrive during a fast call segment.
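
The first two layers compose into a small gate. The following is a sketch under stated assumptions: the `CooldownGate` class and its method names are hypothetical, and timestamps are plain seconds passed in by the caller so the example stays deterministic. Only the 80% threshold and the 25-second cooldown come from the text.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.80  # from the post
COOLDOWN_SECONDS = 25.0      # from the post


class CooldownGate:
    """Hypothetical gate combining the confidence and recency layers."""

    def __init__(self) -> None:
        self._last_shown_at: Optional[float] = None

    def allow(self, confidence: float, now: float) -> bool:
        if confidence < CONFIDENCE_THRESHOLD:
            return False
        if (self._last_shown_at is not None
                and now - self._last_shown_at < COOLDOWN_SECONDS):
            return False
        # Only a whisper that is actually shown restarts the cooldown.
        self._last_shown_at = now
        return True


gate = CooldownGate()
decisions = [gate.allow(0.9, t) for t in (0.0, 5.0, 24.9, 25.0, 30.0)]
# decisions == [True, False, False, True, False]
```

Note that a suppressed suggestion does not reset the timer; otherwise a burst of rejected candidates could silence the whisper indefinitely.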

Third, the conversation-state suppression: while the agent is mid-utterance, no new whisper appears until the system detects a complete thought and the agent pauses. Agents reported this as the most valuable rule, because they were no longer interrupted mid-sentence by a notification appearing on screen.
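
A sketch of this rule, assuming an upstream voice-activity detector that emits speech-start and pause events (the `SpeechStateGate` class and its callbacks are hypothetical). The text does not say what happens to whispers that arrive mid-utterance; this example assumes only the newest one is held and released at the pause.

```python
from typing import Optional


class SpeechStateGate:
    """Holds a whisper while the agent is speaking; releases it on a pause."""

    def __init__(self) -> None:
        self.agent_speaking = False
        self._pending: Optional[str] = None

    def on_agent_speech_start(self) -> None:
        self.agent_speaking = True

    def submit(self, whisper: str) -> Optional[str]:
        """Return the whisper if it can be shown now, else hold it."""
        if self.agent_speaking:
            self._pending = whisper  # assumption: keep only the newest
            return None
        return whisper

    def on_agent_pause(self) -> Optional[str]:
        """Called when the agent pauses; returns a held whisper, if any."""
        self.agent_speaking = False
        released, self._pending = self._pending, None
        return released


gate = SpeechStateGate()
gate.on_agent_speech_start()
shown_mid_sentence = gate.submit("escalation risk")  # None: agent is talking
shown_at_pause = gate.on_agent_pause()               # "escalation risk"
```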

Fourth, the per-agent calibration: each agent has a personal "whisper density" setting that the system adjusts based on dismissal rates. An agent who dismisses 60% of whispers within two seconds of seeing them gets fewer whispers. An agent who acts on most of theirs gets more. We do not surface the setting to the agent; we let it adapt quietly in the background.
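
The calibration loop might look something like the sketch below. Only the 60% quick-dismissal figure comes from the text; the function name, the step size, the "more than half acted on" cutoff, and the 0.2 to 1.0 range are assumptions chosen for illustration.

```python
def adjust_density(density: float,
                   quick_dismiss_rate: float,
                   acted_on_rate: float,
                   step: float = 0.1) -> float:
    """Return the agent's new whisper density, clamped to [0.2, 1.0].

    quick_dismiss_rate: share of whispers dismissed within two seconds.
    acted_on_rate: share of whispers the agent acted on.
    """
    if quick_dismiss_rate >= 0.60:
        density -= step   # agent is swatting whispers away: show fewer
    elif acted_on_rate > 0.50:
        density += step   # agent acts on most whispers: show more
    # Clamp so no agent is cut off entirely or flooded; round to avoid drift.
    return round(min(1.0, max(0.2, density)), 2)


fewer = adjust_density(0.5, quick_dismiss_rate=0.65, acted_on_rate=0.10)
more = adjust_density(0.5, quick_dismiss_rate=0.10, acted_on_rate=0.70)
```

The density value would then scale how many whispers that agent is eligible to receive, for example by raising or lowering their effective confidence threshold.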

What we removed during the pilot

Four features were in the early designs and did not survive contact with agents.

Confidence percentages on whispers. Early versions showed "85% confidence" alongside the suggestion. Agents read the percentage as a verdict on their own competence ("the AI is only 85% sure I should say this"), which made them dismiss good suggestions defensively. We removed the percentage display entirely. The threshold still operates internally; the agent does not see it.

Suggestion explanations. "We suggest this because the customer used phrase X." Sounds helpful. In practice, agents found the explanations more distracting than the suggestions themselves. The explanation only matters when the suggestion looks wrong, and a suggestion that looks wrong should be dismissed regardless of its rationale.

Multi-suggestion lists. An early UI showed the top three suggestions ranked. Agents picked the first one regardless of fit. We collapsed to a single suggestion to avoid the ranking-bias problem.

Supervisor-visible adoption metrics. Originally the supervisor dashboard showed "% of whispers accepted" per agent. We pulled it. Agents who knew their acceptance was being measured started accepting whispers performatively to look engaged. The metric corrupted itself.

The supervisor side of the same product

The supervisor view of the whisper is deliberately different from the agent view. A supervisor monitoring a live call sees the whisper, the agent's response to it, and a short trace of why the system surfaced it. The supervisor can also push a manual whisper to the agent — a coaching note in real time — that arrives in the same UI as the AI-generated whispers.

The manual-whisper feature is used less than we expected. Supervisors generally trust the AI whisper for the routine prompts and reserve manual intervention for unusual situations. About 4% of whispers fired in production are supervisor-initiated. The other 96% are model-initiated. We thought it would be closer to 50/50; the gap is informative.

Adoption and the metric we actually track

We deliberately do not track "whisper adoption rate" — the percentage of whispers an agent acted on. Tracking it incentivises the wrong behaviour, as the supervisor-dashboard story above demonstrates.

Instead, we track agent retention across feature-on and feature-off cohorts, and customer outcome scores on calls where whispers fired versus comparable calls where they did not. After eight months of running both, agents using the whisper had retention 11% above the no-whisper cohort, and the calls where whispers fired had outcome scores measurably above matched controls. Neither number makes a headline-friendly chart for a marketing slide, but both tell us the feature is working without distorting agent behaviour.
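
For readers setting up a similar cohort comparison, the arithmetic is simple but worth writing down, because a retention gap can be quoted in percentage points or as a relative lift. The sketch below computes both. The function names are hypothetical and the cohort counts are made-up illustrative inputs, not the figures behind the numbers above.

```python
def retention_rate(retained: int, total: int) -> float:
    """Share of a cohort still employed at the end of the window."""
    if total <= 0:
        raise ValueError("cohort must not be empty")
    return retained / total


def retention_lift(treated: float, control: float) -> dict:
    """Compare two retention rates in both common ways."""
    return {
        "absolute_points": (treated - control) * 100,   # percentage points
        "relative_pct": (treated / control - 1) * 100,  # percent above control
    }


# Illustrative inputs only: 88 of 100 retained with the feature, 80 of 100 without.
with_whisper = retention_rate(88, 100)
without_whisper = retention_rate(80, 100)
lift = retention_lift(with_whisper, without_whisper)
# roughly 8 percentage points absolute, roughly 10% relative
```

Whichever form is reported, stating it explicitly avoids the two being confused.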

If we have one rule about agent assist, it is this: never measure the feature on a metric the agent can influence by performing for the measurement. Measure it on a metric the agent does not know is being measured. That is the only way the data stays honest.
