Voice variation types: Normal, Whisper, and Ecstasy

A Voice Clone built from a single tone of voice is limited. If every message your chatters send sounds the same — same energy, same mood, same intimacy level — fans start to notice. Deliah addresses this by training your clone on three distinct emotional variations, each designed for a different kind of fan interaction. Recording all three variations gives your clone the emotional range to match the moment: a warm morning check-in lands differently than a late-night intimate message, and both land better when they sound genuinely in character.

Why variations matter

Fans interact with you across a spectrum of moods and contexts. Some messages are casual and friendly. Others are intimate and personal. Others are intensely passionate. Your clone needs to match each of these registers convincingly, and it can only do that if you give Deliah recordings that demonstrate those registers in your real voice.

You can also supplement your variation recordings with existing content:

Existing videos — Deliah extracts the audio and uses it as training material
Existing audio files — submitted directly alongside your new recordings
Voice messages from real fan chats — authentic, unscripted material that is often the most natural-sounding input of all

The more supplementary content you include alongside your recorded variations, the more natural and versatile your final clone will be. Real, unscripted audio is especially valuable because it captures how you actually speak, not how you think you should sound.

The three variations

Normal
Whisper
Ecstasy

Goal: natural, warm, and authentic

The Normal variation captures your everyday conversational voice: the tone you use when you are relaxed, happy to hear from someone, and just talking like yourself.

Scenario to have in mind when recording: It is morning. A fan has just messaged asking what you have planned for the day. You are sending them a friendly, personal audio reply. You are not performing. You are just chatting.

What to aim for:

Relaxed and natural pace — not rushed, not overly careful
Warm and personal, as if you are genuinely pleased they asked
Conversational energy, with natural variation in your tone

Tips for this variation:

Talk through your actual morning plans, or improvise a believable scenario
Let yourself smile while you speak — it comes through in the audio
Do not try to be too polished; natural is the goal
Aim for 1–2 minutes of continuous, comfortable speech

This is the foundation of your clone. If you only record one variation, make it this one — but all three together produce a far more capable model.

Recording all three variations

You do not have to record all three variations in a single session. In fact, recording them separately — when you are in the right headspace for each — tends to produce better results than trying to move through all three back-to-back. See the recording guide for full technical guidance on environment, equipment, and delivery, and the tips page for advice on how to get the most out of each session.
