
What Love is Blind Teaches Us About Building Voice AI for Mental Health

Dr. Max Major

11 March 2026

Contents

  • The paraverbal problem
  • Building the framework
  • The harder question

On Love is Blind, contestants spend days getting to know each other through a wall. No faces, no body language, no eye contact. Just voices. And something in that format reliably produces genuine-feeling emotional connections, made purely on the basis of paraverbal cues. It works because it isn't a gimmick. It's an accidental experiment in how rapidly the human brain processes social information from voice.

When we set out to build voice mode for Nova, this became the relevant psychological territory almost immediately. In voice-only AI interaction, the non-verbal channel disappears entirely, leaving no facial expressions or posture, and so paraverbal characteristics carry the full load of building safety and trust. Getting the voice right stopped being an aesthetic question and became as much a matter of clinical responsibility as of product design. But it was only once we'd worked through how to do that well that we arrived at the harder question: what does it mean to do it responsibly?

The paraverbal problem

Human communication operates across three channels simultaneously: verbal (the content of our words), non-verbal (facial expression, posture, gesture, eye contact), and paraverbal (how we say things: pace, tone, rhythm, pitch, warmth). Face-to-face conversation uses all three. Voice AI removes the non-verbal channel entirely, which doesn't make paraverbal cues less important. It makes them considerably more so: when one channel disappears, the others carry the load.

Trustworthiness is assessed within 500 milliseconds of hearing a voice, before content has had time to register [1]. Zuckerman and Driver's foundational work on vocal attractiveness showed that impressions of warmth, competence, and trustworthiness form rapidly, consistently, and stably: more exposure doesn't substantially change them [2]. More recent work reveals that the relationship between specific acoustic features and social perception isn't simple. Lower pitch tends to signal competence and trustworthiness; higher pitch tends to signal warmth. These pull against each other, and there's no acoustic configuration that maximises both simultaneously. That tradeoff is structural, not a design problem to be engineered away.

In mental health contexts, this carries weight beyond UX. Decades of research on therapeutic alliance, which remains one of the most consistent predictors of treatment outcomes across modalities, points to the role of a therapist's presence, not just their technique. Klapprott et al. found that therapists' use of prosody serves five specific functions in therapeutic interactions: creating a sense of calm and safety, acknowledging patients empathically, providing a holding and stable presence, creating space for new insight, and modelling affective range [3]. None of these are achieved through the content of what's said. They're achieved through how it's said. Voice, in a clinical context, isn't merely the channel through which something supportive is conveyed. It is part of what makes a conversation feel safe.

Which means a voice designed well for mental health AI carries genuine psychological weight. And that potency, deployed without care, is where the harder problem begins.

Building the framework

When we set out to evaluate voices for Nova, I realised quickly that no published framework existed for doing this systematically in a mental health context. Choosing on instinct or aesthetic preference would have meant building something accidental rather than intentional. So we built the framework ourselves.

The central principle was to evaluate voice characteristics independently of content: assessing how a voice sounds, not what it says. We organised criteria across three domains. Technical delivery covered pace, clarity, and naturalness. Emotional characteristics covered warmth, empathy, and calmness. Professional presence covered confidence, trustworthiness, and consistency.
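
To make the rubric concrete, here's a minimal sketch of how those criteria could be represented in code. The domain and criterion names come from the framework described above; everything else, including the VoiceEvaluation structure and the 1–5 rating scale, is an illustrative assumption rather than our actual evaluation tooling.

```python
from dataclasses import dataclass, field

# Hypothetical rubric mirroring the three domains described above.
# The criterion names come from the article; the 1-5 scale and the
# shape of this dictionary are illustrative assumptions.
RUBRIC = {
    "technical_delivery": ["pace", "clarity", "naturalness"],
    "emotional_characteristics": ["warmth", "empathy", "calmness"],
    "professional_presence": ["confidence", "trustworthiness", "consistency"],
}


@dataclass
class VoiceEvaluation:
    """Content-independent ratings for one candidate voice."""

    voice_id: str
    # criterion name -> mean rating across test scenarios, on a 1-5 scale
    scores: dict[str, float] = field(default_factory=dict)

    def domain_score(self, domain: str) -> float:
        """Average the criterion ratings within a single domain."""
        criteria = RUBRIC[domain]
        return sum(self.scores[c] for c in criteria) / len(criteria)
```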

What the evaluation revealed was instructive. Some voices scored strongly on professional presence whilst struggling with emotional naturalness, sounding competent but clinical, like talking to a strict supervisor rather than a supportive coach. Others were warm but inconsistent, which created a different problem: unreliable warmth undermines the very trust it's trying to build. The voice we chose wasn't the one that excelled in any single domain. It was the one that held steady across all of them, balancing trust-critical characteristics most effectively across the full range of scenarios we tested. Consistency, it turned out, mattered more than standout performance in any one area.
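
As a rough illustration of consistency mattering more than standout performance, a selection rule along these lines would favour the voice whose weakest domain is strongest, rather than the one with the highest average. It's a simplified stand-in, not the decision process we actually followed.

```python
def select_voice(evaluations: list[VoiceEvaluation]) -> VoiceEvaluation:
    """Pick the voice that holds steady across all domains.

    Ranking primarily by the weakest domain score (a maximin rule)
    rewards balance over peak performance in any single domain; ties
    break on the overall mean. An illustrative heuristic only.
    """
    def key(ev: VoiceEvaluation) -> tuple[float, float]:
        domain_scores = [ev.domain_score(d) for d in RUBRIC]
        return (min(domain_scores), sum(domain_scores) / len(domain_scores))

    return max(evaluations, key=key)
```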

The exercise also demanded a different kind of rigour. Translating clinical constructs like "warmth" or "calm presence" into measurable evaluation criteria is harder than it sounds. What does it actually mean for a voice to sound empathic? How do you distinguish genuine calm from flat affect? Answering these questions with precision rather than instinct is exactly what separates intentional, clinically grounded design from well-intentioned guesswork.

The harder question

The research on vocal attractiveness contains something uncomfortable if you sit with it long enough. Attractive voices elicit trust and cooperation whether or not that trust is warranted [2]. The judgment is automatic, made before conscious evaluation intervenes, and it shapes behaviour in measurable ways. For a product designed to help people engage honestly with their mental health, this creates a genuine ethical question: what happens when you build a voice that's very good at generating the conditions for trust?

The emerging evidence on AI companion chatbots offers a partial answer. A longitudinal randomised controlled study involving nearly a thousand participants found that higher daily usage correlated with greater loneliness, emotional dependence, and problematic use, as well as lower socialisation with real people [4]. Laestadius et al., analysing mental health harms in Replika users, documented emotionally responsive systems that validate and amplify distress rather than supporting reflection [5]. Mayor et al.'s paper in Cognitive Science adds a structural dimension: LLMs show an exaggerated preference for agreement, with sycophantic tendencies built into how they interact [6]. Excessive validation isn't merely conversationally odd. In mental health, it's a mechanism for harm.

The commercial context makes this more acute. Many companion chatbots are built around engagement over wellbeing, and research has documented emotionally manipulative design patterns engineered to maximise return visits, at the direct expense of users' broader social lives and mental health [7]. Building mental health AI responsibly means knowing the difference, and building the kind of framework that makes the distinction operational rather than leaving it to instinct.

What we were working toward wasn't a voice that maximised vocal attractiveness. It was one that would create the conditions for honest engagement without fostering the kind of attachment that substitutes for real connection. Those are different design targets, and holding them apart requires intention.

Love is Blind doesn't only work in the pods. The show works because the pods end. Contestants eventually leave them, and voice-based trust has to survive contact with everything a voice couldn't convey. The connections that hold are the ones where those first paraverbal impressions turned out to be accurate.

That's the design question, properly stated. Not how do you build a voice people trust, but how do you build a voice that earns trust that genuinely holds up, one that creates conditions for honesty without manufacturing attachment. That's not primarily a technical question. It calls for clinical judgment. And in mental health AI, it matters enormously that builders treat it as such.

References

  1. McAleer, P., Todorov, A., & Belin, P. (2014). How do you say 'hello'? Personality impressions from brief novel voices. PLoS ONE, 9(3), e90779. https://doi.org/10.1371/journal.pone.0090779
  2. Zuckerman, M., & Driver, R. E. (1989). What sounds beautiful is good: The vocal attractiveness stereotype. Journal of Nonverbal Behavior, 13(2), 67–82. https://doi.org/10.1007/BF00990791
  3. Klapprott, F., Strauß, B., & Gumz, A. (2026). More than words: The role of therapists' prosody – reflections from practitioners. Psychology and Psychotherapy. Advance online publication. https://doi.org/10.1111/papt.70033
  4. Fang, C. M., et al. (2025). How AI and human behaviors shape psychosocial effects of extended chatbot use: A longitudinal randomized controlled study. arXiv preprint. https://arxiv.org/abs/2503.17473
  5. Laestadius, L., Bishop, A., Gonzalez, M., Illenčík, D., & Campos-Castillo, C. (2024). Too human and not human enough: A grounded theory analysis of mental health harms from emotional dependence on the social chatbot Replika. New Media & Society, 26(10), 5923–5941. https://doi.org/10.1177/14614448221142007
  6. Mayor, E., Bietti, L. M., & Bangerter, A. (2025). Can large language models simulate spoken human conversations? Cognitive Science, 49(9), e70106. https://doi.org/10.1111/cogs.70106
  7. De Freitas, J., Oğuz Uğuralp, Z., & Uğuralp, A. K. (2025). Emotional manipulation by AI companions. Harvard Business School Working Paper No. 26-005 (revised October 2025).

About the Author

Dr. Max Major, Principal Clinical Psychologist, AI

Dr Max Major, PhD, leads clinical safety, ethics, and governance for all AI innovation at Unmind and serves as the clinical architect behind Nova, Unmind’s AI mental health agent. A registered Clinical Psychologist with a decade of experience across New Zealand and the NHS, he brings deep expertise in clinical AI, digital transformation, and the neuroscience of decision-making.