Getting a great Voice Clone is less about technical perfection and more about consistently making good decisions during the recording and submission process. The tips below address the most common ways creators undermine their results, and how to avoid them.
The quality of your clone is a direct reflection of the quality of what you submit. Better input produces a better clone. There is no processing step that rescues noisy, flat, or minimal recordings — the model learns exactly from what you give it.
Record in multiple sessions rather than all at once
You do not have to submit everything in a single sitting. Recording across multiple sessions often produces better results, because your voice sounds more natural when you are relaxed and not rushing to finish a task.
Schedule short sessions of 10 to 15 minutes each across a few days. Record one variation per session if that helps you stay in the right headspace, and submit each batch of recordings as you go.
Consistency matters across sessions: record in the same environment and at roughly the same distance from the microphone, so the model receives audio with a consistent acoustic character.
Always listen back before you submit
Playing back your recording before uploading takes two minutes and can save you from submitting unusable audio. Listen for:
- Any background noise that crept in (traffic, AC hum, a washing machine starting up)
- Muffling or distortion from holding the microphone too close or moving it
- Sections where your voice drops off or becomes inconsistent
- Any other voice or sound overlapping with yours
Submit more than the minimum
The minimum accepted recording length is 30 seconds, but submitting only the minimum produces a clone with limited range and naturalness. Treat the minimum as the floor, not the target.
Aim for at least 1–3 minutes of clean audio per variation, and submit more across multiple sessions if you can. The more data your clone has to learn from, the more naturally it will handle a wide range of text: different sentence lengths, emotional tones, pacing, and phrasing.
Include all three variation types
Recording only the Normal variation produces a clone that can handle everyday conversational messages, but not much else. Adding Whisper and Ecstasy recordings gives your clone the emotional range to generate messages that match whatever the moment calls for.
Fans interact with creators across a wide emotional spectrum. A clone that can shift convincingly between warm and casual, intimate and whispering, and passionate and intense keeps fans engaged in a way that a single-register clone cannot.
See the audio variations guide for full details on how to record each type.
Submit supplementary content alongside your recordings
You can supplement your dedicated recordings with existing content that features your voice:
- Existing videos — Deliah extracts the audio automatically
- Existing audio files — submitted directly
- Voice messages from real fan chats — some of the most useful material because it is completely unscripted
Speak naturally — do not perform
The most common mistake creators make is trying to sound good instead of sounding like themselves. If you approach the recording as a performance (more careful, more polished, more “on” than usual), the clone learns that version of your voice, not the real one.
Your fans know what you sound like when you are relaxed and genuine, and that is the version they want to hear. Speak as if you were sending a voice message to someone you actually like. Let yourself meander a little. Let your energy drop naturally at the end of sentences. Let a small laugh slip in if something amuses you.
Natural imperfections are not flaws in your recording; they are exactly what makes the clone sound human.
Do not record when your voice is not at its best
Your clone learns from whatever audio you submit. If you record while sick, hoarse, congested, or otherwise not sounding like yourself, the model will incorporate that atypical vocal state, and your clone will sound slightly off.
Wait until your voice is back to normal. If you have a session scheduled and are not feeling well, reschedule. This also applies to unusual vocal states beyond illness: extreme fatigue, a sore throat after a long night out, or a day when your voice is just not sitting right.
The few days you save by recording anyway are not worth the permanent effect on your clone’s quality.
Be consistent across your recording environment
If you record your Normal variation in a bedroom with carpets and soft furnishings and your Whisper variation in a tiled bathroom, the two recordings will have a noticeably different acoustic character. That inconsistency makes it harder for the model to build a coherent representation of your voice.
Try to record all your variations in the same room, with the same setup, at the same distance from the microphone. A consistent acoustic environment gives the model a unified picture of your voice rather than several conflicting ones.