Getting a great Voice Clone is less about technical perfection and more about consistently making good decisions during the recording and submission process. The tips below address the most common ways creators undermine their results — and how to avoid them.
The quality of your clone is a direct reflection of the quality of what you submit. Better input produces a better clone. There is no processing step that rescues noisy, flat, or minimal recordings: the model learns from exactly what you give it.
You do not have to submit everything in a single sitting. Recording across multiple sessions often produces better results because your voice sounds more natural when you are relaxed and not rushing to complete a task.
Schedule short sessions of 10 to 15 minutes each across a few days. Record one variation per session if that helps you stay in the right headspace. Submit each batch of recordings as you go.
Consistency matters across sessions: try to record in the same environment, at roughly the same distance from the microphone, so the model receives audio with a consistent acoustic character.
Playing back your recording before uploading takes two minutes and can save you from submitting unusable audio. Listen for:
  • Any background noise that crept in (traffic, AC hum, a washing machine starting up)
  • Muffling or distortion from holding the microphone too close or moving it
  • Sections where your voice drops off or becomes inconsistent
  • Any other voice or sound overlapping with yours
If you hear any of these issues, re-record. A shorter, clean recording is always more valuable than a longer one with problems.
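Listening is still the real test, but a rough numeric pass can flag two of the items in the list above before you play the file back: distortion from clipping, and stretches where the level drops off. The sketch below is only an illustration, not part of Deliah. It assumes a 16-bit mono WAV file, and the function names and thresholds (`rms_floor`, `window_seconds`) are hypothetical values you would tune for your own setup.

```python
import array
import math
import sys
import wave


def read_mono_16bit(path):
    """Read a 16-bit mono WAV file and return (samples, frame_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        frame_rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, frame_rate


def clipped_fraction(samples, limit=32767):
    """Fraction of samples at full scale, a rough sign of distortion."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if abs(s) >= limit)
    return hits / len(samples)


def quiet_windows(samples, frame_rate, window_seconds=0.5, rms_floor=200):
    """Start times (in seconds) of windows whose RMS level is below rms_floor.

    rms_floor is an assumed threshold; natural pauses will also be flagged.
    """
    size = int(frame_rate * window_seconds)
    if size <= 0:
        return []
    starts = []
    for i in range(0, len(samples) - size + 1, size):
        window = samples[i:i + size]
        rms = math.sqrt(sum(s * s for s in window) / size)
        if rms < rms_floor:
            starts.append(i / frame_rate)
    return starts
```

A non-zero clipped fraction or a long run of quiet windows is a cue to listen to that part of the recording more closely. It does not replace playing the file back.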
The minimum accepted recording length is 30 seconds, but submitting exactly the minimum produces a clone with limited range and naturalness. Think of the minimum as the floor, not the target.
Aim for at least 1–3 minutes of clean audio per variation. If you can submit more across multiple sessions, do. The more data your clone has to learn from, the more naturally it will handle a wide range of text: different sentence lengths, emotional tones, pacing, and phrasing.
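If your recordings are WAV files, you can check their length against these numbers before uploading. The following is a minimal sketch using Python's standard `wave` module. The function names and labels are hypothetical, and `TARGET_SECONDS` simply takes the lower end of the 1–3 minute recommendation; it is not a check that Deliah itself performs.

```python
import wave

MIN_SECONDS = 30     # minimum accepted length stated above
TARGET_SECONDS = 60  # lower end of the recommended 1-3 minutes (assumed cutoff)


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_duration(seconds):
    """Label a duration as 'too_short', 'minimum_only', or 'on_target'."""
    if seconds < MIN_SECONDS:
        return "too_short"
    if seconds < TARGET_SECONDS:
        return "minimum_only"
    return "on_target"
```

For example, `classify_duration(wav_duration_seconds("normal_take1.wav"))` (the file name is a placeholder) tells you whether a take clears only the floor or also reaches the recommended range.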
Recording only the Normal variation produces a clone that can handle everyday conversational messages, but not much else. Including Whisper and Ecstasy recordings gives your clone the emotional range to generate messages that match whatever the moment calls for.
Fans interact with creators across a wide emotional spectrum. A clone that can shift convincingly between warm and casual, intimate and whispering, and passionate and intense is one that keeps fans engaged in a way that a single-register clone simply cannot.
See the audio variations guide for full details on how to record each type.
You can supplement your dedicated recordings with existing content that features your voice:
  • Existing videos — Deliah extracts the audio automatically
  • Existing audio files — submitted directly
  • Voice messages from real fan chats — some of the most useful material because it is completely unscripted
Unscripted audio is particularly valuable. When you are not thinking about how you sound, you sound the most like yourself. Real fan messages capture your natural cadence, your instinctive vocabulary, the small verbal habits that make your voice recognizable.
Apply the same quality standards to supplementary content as to new recordings: your voice only, no background noise, clearly audible throughout.
The most common mistake creators make is trying to sound good instead of sounding like themselves. If you approach the recording as a performance (more careful, more polished, more “on” than usual), the clone learns that version of your voice, not the real one.
Your fans know what you sound like when you are relaxed and genuine. That is the version they want to hear. Speak as if you are sending a voice message to someone you actually like. Let yourself meander a little. Let your energy drop naturally at the end of sentences. Let a small laugh slip in if something amuses you.
Natural imperfections are not flaws in your recording; they are exactly what makes the clone sound human.
If you are finding it hard to sound natural, try not to think of it as a recording session at all. Imagine you are responding to a specific fan you enjoy talking to, and just talk to them. The audio will take care of itself.
Your clone learns from whatever audio you submit. If you record while sick, hoarse, congested, or otherwise not sounding like yourself, the model will incorporate that atypical vocal state, and your clone will sound slightly off.
Wait until your voice is back to normal. If you have a session scheduled and you are not feeling well, reschedule. This applies to unusual vocal states beyond illness as well: extreme fatigue, a sore throat from a long night out, or a day when your voice is just not sitting right.
The few days you save by recording anyway are not worth the permanent effect on your clone’s quality.
If you record your Normal variation in a bedroom with carpets and soft furnishings, and your Whisper variation in a tiled bathroom, the acoustic character of the two recordings will be noticeably different. This inconsistency makes it harder for the model to build a coherent representation of your voice.
Try to record all your variations in the same room, with the same setup, at the same distance from the microphone. Consistency in the acoustic environment means the model receives a unified picture of your voice rather than several conflicting ones.