Training a Voice Model
Training is the process by which Applio learns to replicate a voice from a dataset of audio files. This guide walks you through each step, from preparing your dataset to exporting your final model.
Step 1: Prepare Your Dataset
The first and most important step is to prepare a high-quality audio dataset.
- Duration: Aim for 10-30 minutes of clean audio.
- Format: Your audio files must be in a lossless format, such as .wav or .flac.
- Quality: The audio should be free of background noise, reverb, and other artifacts.
For a detailed guide on creating a high-quality dataset, please see our Dataset Creation Guide.
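The duration and format requirements above can be checked before training. The following is a minimal sketch, not part of Applio: the function names are hypothetical, and it measures only .wav files because Python's standard-library wave module does not read .flac.

```python
import wave
from pathlib import Path

LOSSLESS_EXTENSIONS = {".wav", ".flac"}  # the formats named in this guide


def summarize_dataset(folder):
    """Return (total_wav_seconds, names_of_files_in_other_formats)."""
    total_seconds = 0.0
    rejected = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix not in LOSSLESS_EXTENSIONS:
            rejected.append(path.name)
        elif suffix == ".wav":
            # The wave module reads PCM .wav headers only; .flac files are
            # accepted by the format check but not measured in this sketch.
            with wave.open(str(path), "rb") as audio:
                total_seconds += audio.getnframes() / audio.getframerate()
    return total_seconds, rejected


def duration_in_target_range(total_seconds, low_minutes=10, high_minutes=30):
    """True when the total duration falls inside the 10-30 minute target."""
    return low_minutes * 60 <= total_seconds <= high_minutes * 60
```

Calling summarize_dataset on your dataset folder and passing the first result to duration_in_target_range gives a quick yes/no answer. Noise and reverb still have to be judged by ear.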
Once your dataset is ready, you need to place it in the applio/assets/datasets directory. Create a new folder inside this directory for your model.
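If you prefer to script this step, the copy can be sketched as below. The helper name stage_dataset is hypothetical, and the default path assumes you run the script from the folder that contains your applio directory.

```python
import shutil
from pathlib import Path


def stage_dataset(source_dir, model_name, datasets_root="applio/assets/datasets"):
    """Copy .wav/.flac files from source_dir into datasets_root/model_name."""
    target = Path(datasets_root) / model_name
    target.mkdir(parents=True, exist_ok=True)  # create the model folder if missing
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in {".wav", ".flac"}:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return target, copied
```

Files in other formats are skipped instead of copied, so the returned list shows exactly what ended up in the dataset folder.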
Multi-Speaker Models (Optional)
If you want to train a model with multiple speakers, create a subfolder for each speaker inside your model’s dataset folder. The speaker folders must be named numerically, starting from 0.
- applio/assets/datasets/your-model-name/
  - 0/
    - speaker0-audio1.wav
    - speaker0-audio2.wav
  - 1/
    - speaker1-audio1.wav
    - speaker1-audio2.wav
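The naming rule for speaker folders can be verified with a short script. This is a sketch with a hypothetical function name. It also rejects gaps in the numbering (for example 0 and 2 without 1), which is an assumption beyond what this guide states.

```python
from pathlib import Path


def speaker_folders_are_valid(model_dataset_dir):
    """Check that the subfolders are named exactly 0, 1, 2, ... (assumed: no gaps)."""
    names = {p.name for p in Path(model_dataset_dir).iterdir() if p.is_dir()}
    expected = {str(i) for i in range(len(names))}
    return bool(names) and names == expected
```

A result of False means at least one folder has a non-numeric name, the numbering does not start at 0, or there are no speaker folders at all.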
Step 2: Pre-process the Dataset
Now it’s time to pre-process your dataset.
- In the Train tab of Applio, enter a name for your model.
- Select the correct sample rate for your audio files (32k, 40k, or 48k).
- Click the Pre-process Dataset button.
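If you are unsure which sample rate your files use, you can read it from the file headers. The sketch below handles .wav files only, through the standard-library wave module. The mapping from rates in Hz to the labels 32k, 40k, and 48k is an assumption, and the function names are hypothetical.

```python
import wave
from collections import Counter
from pathlib import Path

# Assumed mapping from header sample rate (Hz) to the labels in the Train tab.
RATE_LABELS = {32000: "32k", 40000: "40k", 48000: "48k"}


def sample_rate_counts(folder):
    """Count how many .wav files use each sample rate (in Hz)."""
    counts = Counter()
    for path in Path(folder).rglob("*.wav"):
        with wave.open(str(path), "rb") as audio:
            counts[audio.getframerate()] += 1
    return counts


def label_for(rate_hz):
    """Return the matching label, or None when no label matches exactly."""
    return RATE_LABELS.get(rate_hz)
```

If the counts show more than one rate, or a rate with no matching label such as 44100 Hz, the dataset likely needs attention before you pick a value in the Train tab.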
Step 3: Extract Features
Next, you need to extract the features from your pre-processed dataset.
- Choose a Pitch Extraction Algorithm: We recommend using RMVPE for the best results.
- Select an Embedder Model: Make sure to choose the correct embedder for your model.
- Click the Extract Features button.
This process will take some time. You can monitor the progress in the command line window.
Step 4: Train the Model and Index
This is the last training step and the most time-consuming one.
- Set the “Save Every Epoch” Value: This determines how often the model is saved. A value between 10 and 50 is recommended.
- Set the “Total Epochs”: This is the total number of times the model will train on the entire dataset. A good starting point is 200-400 epochs, but you should use TensorBoard to monitor your model’s progress and decide when to stop.
- Set the “Batch Size”: This depends on your GPU’s VRAM. For an 8GB GPU, a batch size of 6-8 is a good starting point.
- Click the Train Model button.
- Once the model training is complete, click the Train Index button.
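How “Save Every Epoch” and “Total Epochs” interact can be illustrated with a little arithmetic. This sketch assumes a checkpoint is written whenever the epoch number is a multiple of the save interval. Applio's actual saving behavior may differ, and the function name is hypothetical.

```python
def checkpoint_epochs(total_epochs, save_every):
    """List the epochs at which a checkpoint would be saved (assumed rule)."""
    if total_epochs < 1 or save_every < 1:
        raise ValueError("total_epochs and save_every must be positive")
    # Assumption: one save each time the epoch is a multiple of save_every.
    return list(range(save_every, total_epochs + 1, save_every))


print(checkpoint_epochs(200, 50))      # [50, 100, 150, 200]
print(len(checkpoint_epochs(400, 10))) # 40
```

Under this assumption, a smaller interval gives you more checkpoints to fall back on if TensorBoard shows the model getting worse, at the cost of more disk space.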

Step 5: Export Your Model
Your trained models are saved in the logs folder. You can also export them directly from the Applio interface.
- Go to the Export Model section in the Train tab.
- Click the Refresh button.
- Select the .pth file and the corresponding .index file for your model.
- Click the Export Model button.
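To locate the files on disk rather than through the interface, you can search the logs folder yourself. This sketch makes no assumption about how Applio arranges files inside that folder. It simply searches recursively, and the function name is hypothetical.

```python
from pathlib import Path


def find_model_files(logs_dir):
    """Return (.pth paths, .index paths) found anywhere under logs_dir."""
    root = Path(logs_dir)
    weights = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.pth"))
    indexes = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.index"))
    return weights, indexes
```

Pass the path of your logs folder as the argument. The returned paths are relative to that folder, so the containing subfolder of each file is visible in the result.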
Resume Training (Optional)
If you want to continue training a model you’ve already started, follow these steps to resume from where you left off:
- Select Your Model from the dropdown menu.
- Make sure to select the same original sample rate that you used when you started training (e.g., 32k, 40k, or 48k).
- Scroll down to the Training section.
- Choose the same batch size you used previously.
- Set a new max epoch value that is higher than your current one. For example, if your last completed epoch was 200, setting this to 400 trains the model for 200 more epochs.
- Click the Start Training button to resume training from the latest saved checkpoint.
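The rule for the new max epoch value comes down to a simple check, sketched here with a hypothetical helper name. It models only the arithmetic, not how Applio loads checkpoints.

```python
def remaining_epochs(last_completed_epoch, new_max_epoch):
    """Return how many more epochs a resumed run would train for."""
    if new_max_epoch <= last_completed_epoch:
        raise ValueError("the new max epoch must be higher than the last completed epoch")
    return new_max_epoch - last_completed_epoch


print(remaining_epochs(200, 400))  # 200
```

A value that is not higher than the last completed epoch raises an error here, mirroring the requirement in the steps above.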