Using Pre-trained Models

What are Pre-trained Models?

A pre-trained model is a base model that has already been trained on a large amount of data. When you train your own voice model, you can start from scratch or you can use a pre-trained model as a starting point.

Using a pre-trained model can save you a significant amount of time and effort, and it can often lead to better results, especially if you have a small dataset.

How to Use a Pre-trained Model

To use a pre-trained model in Applio:

In the Train tab, check the Custom Pretrained box.
Upload the pre-trained model’s generator (G) and discriminator (D) .pth files.
Select the uploaded G and D files from the corresponding Pretrained G/D Path dropdown menus.
Proceed with your training as usual.
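
Before uploading, it can help to confirm that both files of the pair are actually on disk. The sketch below is not part of Applio: check_pretrained_pair is a hypothetical helper, and it assumes a pre-trained model consists of one G file and one D file, both with a .pth extension, as in the download lists on this page.

```python
from pathlib import Path


def check_pretrained_pair(g_path: str, d_path: str) -> list[str]:
    """Return a list of problems found with a G/D pair (empty if none).

    Hypothetical helper for illustration; Applio does not ship this function.
    """
    problems = []
    for label, raw in (("G", g_path), ("D", d_path)):
        path = Path(raw)
        if path.suffix.lower() != ".pth":
            # Assumption: pre-trained weights are distributed as .pth files.
            problems.append(f"{label} file should be a .pth file: {path.name}")
        elif not path.is_file():
            problems.append(f"{label} file not found: {path}")
    if Path(g_path) == Path(d_path):
        problems.append("G and D must be two different files")
    return problems
```

An empty list means both paths point to existing .pth files and are distinct. The check says nothing about whether the weights themselves are valid.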

Screenshot: the Custom Pretrained section in Applio's Train tab.

How to Create Your Own Pre-trained Model

You can also create your own pre-trained models to use as a base for future projects.

From Scratch: To create a pre-trained model from scratch, you’ll need a large, diverse dataset (50+ hours of audio is recommended). Train a model on this dataset as you normally would, but without using an existing pre-trained model.
Fine-tuning an Existing Model: You can also fine-tune an existing pre-trained model to create a new one. For example, you could fine-tune a general-purpose English model on a dataset of a specific accent to create a pre-trained model for that accent.

When creating pre-trained models, it’s important to use high-quality, non-copyrighted audio.
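
To see how a dataset compares with the 50+ hour recommendation for training from scratch, you can add up the duration of its audio files. The following is a minimal sketch using only Python's standard library. It assumes the dataset is stored as uncompressed PCM .wav files (the wave module cannot read other formats), and total_hours is an illustrative name, not an Applio function.

```python
import wave
from pathlib import Path


def total_hours(dataset_dir: str) -> float:
    """Sum the duration of every .wav file under dataset_dir, in hours."""
    total_seconds = 0.0
    # rglob also searches subfolders; only files ending in ".wav" are counted.
    for wav_path in Path(dataset_dir).rglob("*.wav"):
        with wave.open(str(wav_path), "rb") as wav_file:
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 3600
```

For example, total_hours("my_dataset") >= 50 would indicate that a folder named my_dataset (a placeholder path) meets the recommended size.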

Community Pre-trained Models

Here are some popular pre-trained models created by the community.

DMR V1

Fine-tuned for e-girl, soft male/female, and deep male/female voices. Works best with clean datasets and the Mangio-Crepe/Crepe pitch extraction algorithm.
Sample Rate: 32k
Download D file | Download G file

KLM 4.1

Trained on Korean, Japanese, and English data. Ideal for creating vocal guides from short, high-quality studio recordings. Sensitive to noise.
Sample Rates: 32k, 48k
Download 32k D file | Download 32k G file
Download 48k D file | Download 48k G file

Nanashi V1.7

Trained on Brazilian music. Works well for Portuguese and other languages. Handles noise well and requires fewer training epochs.
Sample Rate: 32k
Download D file | Download G file

Ov2 Super

Works well for small, clean English datasets. Trained on bright, emotional voices. Requires fewer training epochs.
Sample Rates: 32k, 40k
Download 32k D file | Download 32k G file
Download 40k D file | Download 40k G file

RIN_E3

Trained from scratch on a large English dataset. Best used with high-quality datasets due to its sensitivity to noise.
Sample Rate: 40k
Download D file | Download G file

SingerPreTrain

Fine-tuned for English singers. Suitable for a wide range of vocal types, from bass to soprano.
Sample Rate: 32k
Download D file | Download G file

SnowieV3.1

Trained on Russian and Japanese data. Helps to improve pronunciation in other languages.
Sample Rates: 32k, 40k, 48k
Download 32k D file | Download 32k G file
Download 40k D file | Download 40k G file
Download 48k D file | Download 48k G file

TITAN

A robust, general-purpose model that gives clean results and handles accents and noise well. Requires fewer training epochs.
Sample Rates: 32k, 40k, 48k
Download 32k D file | Download 32k G file
Download 40k D file | Download 40k G file
Download 48k D file | Download 48k G file
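
The sample rates listed above can be collected into a small lookup table, which makes it easy to see which community models are available at a given rate. This is only an illustrative sketch: the dictionary copies the values from this page, models_for_sample_rate is a hypothetical helper, and it assumes you want a pre-trained model whose sample rate matches the one you train at.

```python
# Sample rates (in kHz) as listed for each community model on this page.
COMMUNITY_PRETRAINED = {
    "DMR V1": [32],
    "KLM 4.1": [32, 48],
    "Nanashi V1.7": [32],
    "Ov2 Super": [32, 40],
    "RIN_E3": [40],
    "SingerPreTrain": [32],
    "SnowieV3.1": [32, 40, 48],
    "TITAN": [32, 40, 48],
}


def models_for_sample_rate(rate_khz: int) -> list[str]:
    """Return the names of models that list the given sample rate."""
    return [
        name
        for name, rates in COMMUNITY_PRETRAINED.items()
        if rate_khz in rates
    ]


print(models_for_sample_rate(48))  # ['KLM 4.1', 'SnowieV3.1', 'TITAN']
```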