Last month I called a restaurant to move my reservation and add a person to the booking. I was helped brilliantly. Friendly, smooth, no wait. Only after hanging up did I realise I had spoken to an AI.
That unsettled me. Not because it went badly, but precisely because it went so well.
I build with AI myself, I work in it every day, and still I did not notice. That says something. Not about me, but about where voice AI now stands. The voice on the other end of the line was warm and patient, and it lightened the mood at exactly the right moment. Everything you want when you call a restaurant on a busy Friday. And none of it was actually felt.
In short
- Voice AI can now convincingly mimic warmth, empathy and a regional dialect, with response times fast enough to feel like a real conversation.
- That gives your customer service three new dials: an empathetic voice for claims, a hunter for sales, a firm-but-fair tone for debt collection.
- That emotion is programmed, not felt. The performance is more consistent than a human, more patient, never grumpy.
- The question is no longer whether it can be done. The moment a customer is vulnerable, the question is how far you want to go.
What voice AI can really do now
A few years ago you could hear a computer voice within three words. Flat, mechanical, with that telltale stress on the wrong syllable. That is over. The new generation of voice models puts emotion into speech: warmth, hesitation, a smile you can hear, a brief pause before bad news comes.
Take Miso One, a recently released voice model from Miso Labs. It is built for expressive speech in conversations, but the interesting part is not in the technical specs. It is in what the model can take on: a character. You can instruct Miso One to speak like a friend, like a therapist, like a YouTuber or like a teacher. And that is not a cosmetic layer over a robot voice. The model adapts its word choice, its rhythm, its tone and its way of responding to that role.
That is a different order of magnitude than "the voice sounds real". A voice that sounds real is impressive. A voice that behaves like a therapist the moment someone files a claim, or like a driven salesperson the moment you discuss a quote: that is a different category.
Miso One is English-only for now and not a ready-made customer service bot. But that is not the point. The point is what it proves. The bar for synthetic speech with a convincing character is now so low that it is a matter of months, not years, before this runs on your customer service in Dutch.
And the market is already standing by. In the Netherlands, players like Voicelabs, Kollie and Klusio already offer voice agents that take phone calls, answer questions and schedule appointments. The step still missing is emotion. That step is being taken now.
Inbound, outbound, and a voice per situation
Once a voice can carry emotion, that emotion becomes a setting. A dial. You choose per situation which tone your customer gets.
Walk through the conversations your business has every day.
Inbound, a claim notification. Someone calls because their car is a write-off, or worse. There you want a voice that stays calm, that gives space, that is empathetic. A human employee can do that, but not always. They too have a bad day, a full queue, or a previous conversation that lingered. A voice agent never does. It is just as patient at nine in the morning as at five in the afternoon after two hundred calls.
Outbound, a sales call. Here you want a different type of voice. Energetic, focused, a hunter that lands the appointment. Not soft and understanding, but persuasive and goal-oriented. The same technology, a different setting.
Debt collection. Yet another tone. Firm but fair, as the saying goes. Friendly but clear, without the irritation or discomfort a human feels when having to call someone for the third time about an outstanding invoice.
One technology, three completely different personalities, all on demand. That is not a distant future. The building blocks are here now.
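To make the idea of a dial concrete, here is a minimal sketch of what such a per-situation setting could look like. Everything in it is an assumption made for illustration: the names `Situation`, `VoiceProfile` and `profile_for`, the 0 to 10 scales and the persona texts do not come from any vendor's product.

```python
from dataclasses import dataclass
from enum import Enum


class Situation(Enum):
    """The three conversation types discussed above (hypothetical labels)."""
    CLAIM = "inbound_claim"
    SALES = "outbound_sales"
    COLLECTION = "debt_collection"


@dataclass(frozen=True)
class VoiceProfile:
    """One position of the dials. All fields are illustrative assumptions."""
    persona: str  # free-text role instruction for the voice model
    empathy: int  # 0 (neutral) to 10 (maximum warmth)
    energy: int   # 0 (calm) to 10 (driven)


# One profile per situation: same technology, different setting.
PROFILES = {
    Situation.CLAIM: VoiceProfile("calm, gives space, empathetic", empathy=9, energy=3),
    Situation.SALES: VoiceProfile("energetic, focused, persuasive", empathy=4, energy=9),
    Situation.COLLECTION: VoiceProfile("firm but fair, friendly but clear", empathy=5, energy=5),
}


def profile_for(situation: Situation) -> VoiceProfile:
    """Look up the voice profile configured for a given type of conversation."""
    return PROFILES[situation]


if __name__ == "__main__":
    for situation in Situation:
        print(situation.value, profile_for(situation))
```

The point of the sketch is how little there is to it. The tone a customer hears comes down to a few numbers and a line of text that someone in your organisation has to choose.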
The agent that speaks your dialect
And then the layer that makes it truly personal. Suppose you link the voice to the caller's hometown. Someone from Brabant gets a soft "g" and a warm, slightly slower tone. Someone from Friesland hears the sound of their own region. Someone from Amsterdam gets a more direct, faster voice.
Technically that is no longer science fiction. If a model can mimic emotion and prosody, it can also mimic a regional accent. In Dutch this does not yet exist ready-made at this level, but the direction is clear. And the effect is powerful, because nothing disarms a person faster than a voice that sounds like home.
That is where it starts to chafe.
Because why would you do that? Not because it informs your customer better. A soft "g" does not make your claims process faster. You do it because a familiar sound puts the caller at ease. You tune the voice to whatever removes resistance, at precisely the moment that caller needs something from you: a payout, an extension, a solution.
A voice that never has a bad day
I took part in a pilot with a voice agent for customer contact myself. We had loaded the company's full FAQ into the system, and the bot effortlessly gave the right answers. Even when we were deliberately vague, it correctly guessed the question behind the question. I had not expected that. The previous generation of speech bots broke down as soon as a sentence differed even slightly from the standard.
And yet.
A human employee who is empathetic feels something. Maybe not always deeply, maybe worn thin after a long day, but there is a human behind that warmth. A voice agent feels nothing. The empathy is programmed. It is a performance of empathy, not empathy itself. And the uncomfortable part is: the performance is often better than the original. More consistent, more patient, never grumpy.
A human has a bad day, and that is honest. You hear it, you feel it, you know where you stand. A voice that never has a bad day is not warmer than a human. It is more convincing at something it does not feel.
That may sound like philosophy, but it is a hard business question. Because you are going to decide whether you switch this on. And if you switch it on, you also decide how far you go. Do you turn the empathy dial up to ten on a claim? Do you give your sales agent the Brabant voice because it converts better? Those are not technical choices. Those are choices about how you treat your customers at the moments they are vulnerable.
What the market does not tell you about this
The vendors of voice AI talk about efficiency. Available 24/7, shorter wait times, lower cost per call. That is all true, and it is a serious reason to look into this. Especially as an SME entrepreneur without a large service team.
What I see in practice aligns with what McKinsey calculated in February 2026: generative AI can add between 50 and 70 billion dollars of value in the insurance sector, with the greatest impact on marketing, sales and customer contact. But the most interesting thing in that report is not in the headline. AI reshapes existing models, McKinsey says, it does not replace them. The human stays. Their work changes. You can read it at McKinsey.
What that means for customer contact: the voice agent does not replace your employee, but it does change what your employee does. The simple, repetitive conversations go to the AI. The complex, the emotional, the conversations where something is genuinely at stake stay with the human. If you choose that, at least. Because technically the AI will soon handle those emotional conversations too. The question is whether you want it to.
I wrote earlier about everyone being able to build now, but not everyone having something to say. The same applies here. Soon everyone can switch on an empathetic voice agent. The question is whether you have thought about what you are saying to your customer by doing so.
The question you have to ask tomorrow
Regulation is coming too. The EU AI Act sets transparency requirements for AI that communicates with people. In many cases a customer has the right to know they are talking to a machine. That is not a detail in the fine print. It touches the core of what you are building here.
Because suppose you have to disclose it. Suppose every conversation opens with: "You are speaking with a digital assistant." Does that change anything about the warmth the customer experiences? About the trust? I think it does. And that is precisely the test of whether you are building something honest, or something that only works as long as the customer does not know.
My answer to that question is: I would do it, and I would do it in the customer's interest. My experience, certainly with the generation under 40, is that they simply want to be helped. Quickly and well. 24/7. Whether that happens via a chatbot or a voice agent matters little to them, as long as the question gets answered. The ethics debate about feigned empathy plays out mainly in boardrooms and on LinkedIn. The customer on the line is not thinking about that. They want to know when their claim will be paid out.
That is no reason to close your eyes to the governance question. But it is the reason you must answer that question quickly. Because your customer is not waiting for the debate.
I was helped brilliantly by that restaurant. Truly, nothing to complain about. And yet I hung up with a strange feeling, because the warmth was so convincing that I had fallen for it. A restaurant moving a reservation, fine. But the same technology will soon handle your insurance claim, discuss your payment arrears, reassure you in your own dialect at a moment when you are struggling.
That it can be done is certain. Whether you should want it, that is for you to decide.
