How to record your voice for the best clone quality

The quality of your Voice Clone depends almost entirely on the quality of the audio you submit. Deliah’s AI can only work with what you give it — clean, natural recordings produce a clone that sounds genuinely like you, while recordings with noise, distortion, or an unnatural delivery produce one that falls flat. This guide walks you through everything you need to record well.

Background noise is the single most common reason a Voice Clone sounds unnatural or robotic. Even low-level noise — a fan running, traffic outside, a TV in another room — is picked up by the microphone and baked into the model. Record in the quietest environment you can find.

What you need

You do not need professional studio equipment. A modern smartphone with a decent built-in microphone is sufficient if your recording environment is clean. That said, if you have access to a dedicated microphone, even an entry-level USB condenser mic, use it: better equipment gives you more headroom. Still, the hardware matters less than the environment and your delivery.

The recording process

Choose a quiet room

Find a space where external sounds cannot reach the microphone. A bedroom with soft furnishings works well: fabric absorbs echo and dampens ambient noise. Avoid kitchens, rooms with hard floors, or any space near a road, air conditioning unit, or appliance. Close windows and doors. Turn off fans, air conditioning, and any device that makes a continuous sound.

Background sources to eliminate before you start:

Music or TV playing anywhere in the space
Air conditioning or heating units
Street noise or open windows
Washing machines, dishwashers, or other appliances running nearby
Other people talking

Set up your device

Place your phone or microphone at a consistent distance from your mouth, roughly 15–30 cm (6–12 inches). Prop it on a stable surface or stand rather than holding it in your hand, because movement and handling noise will be picked up. If you are using a phone, open your preferred voice memo or recording app and record a 5-second test clip. Play it back and listen critically for hum, echo, or background sounds before you begin the full recording.

Record your audio

Aim for at least 30 seconds of clean audio. For the best results, record 1–3 minutes. The more material you submit, the more your clone can learn — more natural pauses, more variation in pace and emphasis, a richer emotional range.

Speak as if you are sending a voice message to a fan you genuinely like — relaxed, warm, and natural. Do not read from a script in a flat, careful voice. Do not try to “perform.” The AI is learning your real voice, not a polished version of it. Natural imperfections — a small laugh, a slight pause, the way your voice softens on certain words — are exactly what makes the clone feel authentic.

You can talk about anything: what you are doing today, something you are excited about, a story from your week. The content is secondary to the delivery. What matters is that your voice sounds like you at your most natural and comfortable.

Listen back before submitting

Before you upload, play the recording back in full. Listen for:

Audible background noise at any point
Muffling or distortion (usually caused by holding the mic too close or moving it)
Long silences where nothing is happening
Other voices or sounds overlapping with yours

If you notice any of these, re-record. A two-minute clean recording is far more valuable than a five-minute noisy one.
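Listening back is the real check, but if your recording is a WAV file, one item on the list above (long silences) can also be flagged automatically. The following is a minimal sketch, not a Deliah tool: the function name `find_long_silences`, the -45 dBFS threshold, and the 2-second minimum are illustrative assumptions, and it only handles 16-bit mono PCM WAV files.

```python
import array
import math
import sys
import wave


def find_long_silences(path, silence_dbfs=-45.0, min_silence_s=2.0, window_s=0.05):
    """Return (start_s, end_s) spans where the level stays below silence_dbfs.

    The threshold and minimum length are assumed values for illustration,
    not limits published by Deliah.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM WAV")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    window = max(1, int(rate * window_s))
    spans = []
    run_start = None  # sample index where the current quiet run began
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
        if level < silence_dbfs:
            if run_start is None:
                run_start = start
        else:
            # A louder window ends the quiet run; keep it only if long enough.
            if run_start is not None and (start - run_start) / rate >= min_silence_s:
                spans.append((run_start / rate, start / rate))
            run_start = None
    # Handle a quiet run that lasts until the end of the file.
    if run_start is not None and (len(samples) - run_start) / rate >= min_silence_s:
        spans.append((run_start / rate, len(samples) / rate))
    return spans
```

Calling `find_long_silences("take1.wav")` (a hypothetical file name) returns a list of time spans in seconds that you can jump to while listening back. An empty list only means no long quiet stretch was found; it says nothing about background noise, distortion, or other voices, which still need your ears.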

Submit your recording

Upload your audio through the Deliah platform. You can submit multiple files. In fact, recording in several shorter sessions and submitting them all is a great approach if you find it hard to speak naturally for a full minute at once.

See the audio variations guide for how to record the three emotional variation types (Normal, Whisper, and Ecstasy) that give your clone its full range.

Minimum and recommended lengths

Total audio length	What it produces
Under 30 seconds	Not accepted — insufficient data for a usable clone
30 seconds – 1 minute	Minimum viable clone; limited range and naturalness
1–3 minutes	Recommended — produces a natural, versatile clone
3+ minutes across multiple variations	Best possible results; maximum authenticity and emotional range
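
If you record in several sessions, it can help to add up the lengths and see which row of the table you land in. The sketch below is one way to do that. The function name is hypothetical, and the handling of the exact boundaries (60 seconds counts as "1–3 minutes", 180 seconds counts as "3+ minutes") is an assumption, since the table does not say which row a boundary value belongs to.

```python
def classify_total_duration(clip_seconds):
    """Map the combined length of all clips to a row of the table above.

    clip_seconds: list of clip lengths in seconds, one entry per file.
    Boundary handling at exactly 60 s and 180 s is an assumption.
    """
    total = sum(clip_seconds)
    if total < 30:
        return "not accepted"
    if total < 60:
        return "minimum viable"
    if total < 180:
        return "recommended"
    # The table ties this tier to recording multiple variations as well.
    return "best results (across multiple variations)"


# Three short sessions of 40, 45, and 50 seconds total 135 seconds.
print(classify_total_duration([40, 45, 50]))  # recommended
```

Only clean audio should count toward the total: a clip you would re-record after listening back does not help you reach the next row.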

What counts as acceptable audio

You do not need to record everything fresh. You can also submit:

Existing audio files where only your voice is present
Existing video files (Deliah extracts the audio)
Voice messages from real fan chats, provided the audio quality is good

The same rules apply to all submitted audio: your voice only, no background noise, clean and clear.
