
Using Pre-trained Models

A pre-trained model is a base model that has already been trained on a large amount of data. When you train your own voice model, you can start from scratch or you can use a pre-trained model as a starting point.

Starting from a pre-trained model can save significant training time and effort, and it often leads to better results, especially when your dataset is small.

[Illustration: using a pre-trained model saves time and effort when training your own model.]

To use a pre-trained model in Applio:

  1. In the Train tab, check the Custom Pretrained box.
  2. Upload the pre-trained model’s two .pth files: the generator (G) file and the discriminator (D) file.
  3. Select those files in the Pretrained G Path and Pretrained D Path dropdown menus.
  4. Proceed with your training as usual.
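
Before uploading, it can help to confirm that you actually have both files of the pair. The following is a minimal sketch, not part of Applio: the function name `find_pretrained_pair` is hypothetical, and it assumes a naming convention in which the generator file name starts with "G" and the discriminator file name starts with "D". Adjust the check if your files are named differently.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_pretrained_pair(folder: str) -> Optional[Tuple[Path, Path]]:
    """Return (generator, discriminator) .pth paths, or None if either is missing.

    Assumption: generator file names start with "G" and discriminator
    file names start with "D" (case-insensitive). This is a hypothetical
    convention used only for this sketch.
    """
    pth_files = sorted(Path(folder).glob("*.pth"))
    generators = [p for p in pth_files if p.name.upper().startswith("G")]
    discriminators = [p for p in pth_files if p.name.upper().startswith("D")]
    if not generators or not discriminators:
        return None
    # If several candidates exist, take the first in alphabetical order.
    return generators[0], discriminators[0]
```

If the function returns None, one of the two files is missing from the folder and should be downloaded before you continue.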

[Screenshot: the 'Custom Pretrained' section in Applio's Train tab.]

You can also create your own pre-trained models to use as a base for future projects.

  • From Scratch: To create a pre-trained model from scratch, you’ll need a large, diverse dataset (50+ hours of audio is recommended). Train a model on this dataset as you normally would, but without using an existing pre-trained model.
  • Fine-tuning an Existing Model: You can also fine-tune an existing pre-trained model to create a new one. For example, you could fine-tune a general-purpose English model on a dataset of a specific accent to create a pre-trained model for that accent.
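
To check a dataset against the 50-hour recommendation for training from scratch, you can total the duration of its audio files. The sketch below is one possible way to do this with Python's standard `wave` module. It assumes the dataset is a folder of uncompressed PCM .wav files (other formats would need a different reader), and the names `total_hours` and `meets_recommendation` are illustrative only.

```python
import wave
from pathlib import Path

RECOMMENDED_HOURS = 50  # figure quoted in the text above


def total_hours(dataset_dir: str) -> float:
    """Sum the duration of every .wav file under dataset_dir, in hours."""
    total_seconds = 0.0
    for wav_path in Path(dataset_dir).rglob("*.wav"):
        with wave.open(str(wav_path), "rb") as wav_file:
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav_file.getnframes() / wav_file.getframerate()
    return total_seconds / 3600.0


def meets_recommendation(dataset_dir: str) -> bool:
    """True if the dataset reaches the recommended amount of audio."""
    return total_hours(dataset_dir) >= RECOMMENDED_HOURS
```

Duration alone says nothing about diversity or audio quality, so treat the result as a rough first check.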

When creating pre-trained models, it’s important to use high-quality, non-copyrighted audio.

Here are some popular pre-trained models created by the community.

DMR V1

Fine-tuned for e-girl, soft male/female, and deep male/female voices. Works best with clean datasets and the Mangio-Crepe or Crepe pitch extraction algorithms.
Sample Rate: 32k
Download D file | Download G file

Nanashi V1.7

Trained on Brazilian music. Works well for Portuguese and other languages. Handles noise well and requires fewer training epochs.
Sample Rate: 32k
Download D file | Download G file

RIN_E3

Trained from scratch on a large English dataset. Best used with high-quality datasets due to its sensitivity to noise.
Sample Rate: 40k
Download D file | Download G file

SingerPreTrain

Fine-tuned for English singers. Suitable for a wide range of vocal types, from bass to soprano.
Sample Rate: 32k
Download D file | Download G file
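
The sample rates listed above can be captured in a small lookup to shortlist candidates for a project. This is a sketch under two assumptions that the text does not state: that "32k" and "40k" mean 32,000 Hz and 40,000 Hz, and that you want a pre-trained model whose sample rate matches the one you plan to train at. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PretrainedModel:
    name: str
    sample_rate_hz: int  # assumes "32k" means 32,000 Hz
    notes: str


# Entries copied from the community list above.
COMMUNITY_MODELS = [
    PretrainedModel("DMR V1", 32000, "e-girl, soft and deep voices; clean datasets"),
    PretrainedModel("Nanashi V1.7", 32000, "Brazilian music; handles noise well"),
    PretrainedModel("RIN_E3", 40000, "large English dataset; sensitive to noise"),
    PretrainedModel("SingerPreTrain", 32000, "English singers; bass to soprano"),
]


def models_for_sample_rate(sample_rate_hz: int) -> List[str]:
    """Return the names of listed models with the given sample rate."""
    return [m.name for m in COMMUNITY_MODELS if m.sample_rate_hz == sample_rate_hz]
```

For example, `models_for_sample_rate(40000)` returns only RIN_E3, the single 40k entry in the list.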