A Voice Clone built from a single tone of voice is limited. If every message your chatters send sounds the same — same energy, same mood, same intimacy level — fans start to notice. Deliah addresses this by training your clone on three distinct emotional variations, each designed for a different kind of fan interaction.
Recording all three variations gives your clone the emotional range to match the moment: a warm morning check-in lands differently than a late-night intimate message, and both land better when they sound genuinely in character.
Why variations matter
Fans interact with you across a spectrum of moods and contexts. Some messages are casual and friendly. Others are intimate and personal. Others are intensely passionate. Your clone needs to be able to match each of these registers convincingly — and it can only do that if you give Deliah recordings that demonstrate those registers in your real voice.
You can also supplement your variation recordings with existing content:
- Existing videos — Deliah extracts the audio and uses it as training material
- Existing audio files — submitted directly alongside your new recordings
- Voice messages from real fan chats — authentic, unscripted material that is often the most natural-sounding input of all
The more supplementary content you include alongside your recorded variations, the more natural and versatile your final clone will be. Real, unscripted audio is especially valuable because it captures how you actually speak, not how you think you should sound.
The three variations
Goal: natural, warm, and authentic
The Normal variation captures your everyday conversational voice: the tone you use when you are relaxed, happy to hear from someone, and just talking like yourself.
Scenario to have in mind when recording:
It is morning. A fan has just messaged asking what you have planned for the day. You are sending them a friendly, personal audio reply. You are not performing. You are just chatting.
What to aim for:
- Relaxed and natural pace — not rushed, not overly careful
- Warm and personal, as if you are genuinely pleased they asked
- Conversational energy, with natural variation in your tone
Tips for this variation:
- Talk through your actual morning plans, or improvise a believable scenario
- Let yourself smile while you speak — it comes through in the audio
- Do not try to be too polished; natural is the goal
- Aim for 1–2 minutes of continuous, comfortable speech
This is the foundation of your clone. If you only record one variation, make it this one, but all three together produce a far more capable model.
Goal: quiet, intimate, and seductive
The Whisper variation trains your clone on the softer, more private register of your voice: the tone you use for late-night, one-on-one intimacy.
Scenario to have in mind when recording:
It is nighttime. You are sending a fan an intimate voice message in a sexy whisper, describing what you are wearing and what you are doing, and making the message feel private and personal, as if it is just for them.
What to aim for:
- Genuinely quiet — a real whisper or near-whisper, not just a softer version of your normal voice
- Intimate and unhurried, as if you have all the time in the world
- Seductive and personal, as if the listener is the only one who gets to hear this
Tips for this variation:
- Record this one in a very quiet environment — whispered audio picks up background noise more easily than full-volume speech
- Speak slowly and with intention; pauses and breath are part of the mood
- Stay in character throughout; inconsistency in tone is harder for the model to work with
- Keep the microphone at a consistent distance: closer than for your Normal recording, but not so close that you get distortion
Whisper recordings are more sensitive to background noise than other variation types. Make sure your environment is completely silent before you start. Even a small ambient sound will be far more audible in a quiet whisper recording than in a normal-volume one.
Goal: emotional, intense, and passionate
The Ecstasy variation captures the most emotionally heightened register of your voice: passionate, uninhibited, and fully present in the moment.
Scenario to have in mind when recording:
You are close to climax. You are describing your feelings and desires. The message is intense and completely honest about what you are experiencing and wanting.
What to aim for:
- Genuine emotional intensity — not performed, but felt
- Passionate and expressive, with natural variation in pace and volume
- Unfiltered and present, without self-consciousness
Tips for this variation:
- This is the hardest variation to record authentically, because it requires you to genuinely let go of your inhibitions
- Do not rush through it to get it done — take your time and actually inhabit the scenario
- Variation in breath and vocal texture is natural and valuable here; the model will learn from it
- Do not censor or smooth out intensity — the whole point is emotional range
The Ecstasy variation gives your clone the ability to generate messages that feel genuinely passionate rather than scripted. It is the variation that most separates a flat clone from one that fans respond to emotionally.
Recording all three variations
You do not have to record all three variations in a single session. In fact, recording them separately — when you are in the right headspace for each — tends to produce better results than trying to move through all three back-to-back.
See the recording guide for full technical guidance on environment, equipment, and delivery, and the tips page for advice on how to get the most out of each session.