How we cloned the voice of Emanuele Filiberto of Savoy with AI

We have cloned the voice of our CEO Emanuele Filiberto di Savoia: discover the process that made it possible

Artificial intelligence (AI) is reshaping industry after industry, and one of its most fascinating applications is voice synthesis: generating speech that can be almost indistinguishable from a real human voice. But how does the process work, and what does it take to build a synthetic voice that truly captures a specific person? As a case study, let's walk through how we built a synthetic voice for our CEO, Emanuele Filiberto di Savoia. Even if you are new to AI and voice synthesis, this overview covers the main steps and the decisions involved in each one.

Introduction to the Project

When we set out to develop a synthetic voice for Emanuele Filiberto di Savoia, our first task was data collection. We gathered recordings from a variety of sources, including his interviews, public speeches, and other media appearances. A diverse range of recordings was essential to capture the distinctive nuances of his voice.
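One simple way to keep an eye on diversity during collection is to log every recording in a manifest with its source and duration. The sketch below assumes a hypothetical manifest format: the `Recording` fields, the source labels and the durations in the example are illustrative, not our actual data. It totals the minutes of audio per source type and flags any source that makes up more than half of the collection.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Recording:
    # Hypothetical manifest entry: one audio file and where it came from.
    path: str
    source: str        # illustrative labels, e.g. "interview", "speech", "tv"
    duration_s: float  # length of the recording in seconds


def minutes_per_source(recordings: list[Recording]) -> dict[str, float]:
    """Total recorded minutes for each source type."""
    totals: dict[str, float] = defaultdict(float)
    for rec in recordings:
        totals[rec.source] += rec.duration_s / 60.0
    return dict(totals)


def dominant_sources(recordings: list[Recording], max_share: float = 0.5) -> list[str]:
    """Source types that account for more than `max_share` of all audio."""
    totals = minutes_per_source(recordings)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return [src for src, mins in totals.items() if mins / grand_total > max_share]


if __name__ == "__main__":
    # Hypothetical example manifest, not real data.
    manifest = [
        Recording("interview_01.wav", "interview", 1800.0),
        Recording("speech_01.wav", "speech", 600.0),
        Recording("tv_01.wav", "tv", 300.0),
    ]
    print(minutes_per_source(manifest))
    print(dominant_sources(manifest))
```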

Data Pre-processing

With a substantial dataset in hand, we moved on to pre-processing. We segmented the recordings into shorter clips, typically 1 to 20 seconds long, and extracted the key acoustic features from each clip, chiefly spectrograms and mel-frequency cepstral coefficients (MFCCs).
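The following is a minimal NumPy sketch of these two steps, not our production code. All settings are assumptions for illustration: fixed-length cuts of at most 20 seconds (real pipelines usually cut at pauses or sentence boundaries instead), a 1024-sample FFT window, a 256-sample hop, 80 mel bands, an HTK-style mel scale, and an unnormalized DCT-II for the MFCCs. Libraries such as librosa provide tested equivalents (`librosa.feature.melspectrogram`, `librosa.feature.mfcc`).

```python
from __future__ import annotations

import numpy as np


def segment(signal: np.ndarray, sr: int, min_s: float = 1.0, max_s: float = 20.0) -> list[np.ndarray]:
    """Cut a recording into consecutive clips of at most `max_s` seconds and
    drop any leftover piece shorter than `min_s` seconds."""
    max_len, min_len = int(max_s * sr), int(min_s * sr)
    clips = [signal[i:i + max_len] for i in range(0, len(signal), max_len)]
    return [clip for clip in clips if len(clip) >= min_len]


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)  # HTK-style mel scale


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bin_hz = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # centre frequency of each FFT bin
    fb = np.zeros((n_mels, len(bin_hz)))
    for m in range(n_mels):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - left) / (center - left)
        falling = (right - bin_hz) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(clip: np.ndarray, sr: int, n_fft: int = 1024,
                        hop_length: int = 256, n_mels: int = 80) -> np.ndarray:
    """Windowed frames -> power spectrum -> mel filterbank -> natural log.
    Returns an array of shape (n_frames, n_mels)."""
    if len(clip) < n_fft:
        clip = np.pad(clip, (0, n_fft - len(clip)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(clip) - n_fft) // hop_length
    frames = np.stack([clip[i * hop_length:i * hop_length + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)


def mfcc(log_mel: np.ndarray, n_mfcc: int = 13) -> np.ndarray:
    """Unnormalized DCT-II of each log-mel frame, keeping the first `n_mfcc` coefficients."""
    n_mels = log_mel.shape[1]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))  # (n_mfcc, n_mels)
    return log_mel @ dct_basis.T                                # (n_frames, n_mfcc)
```

For example, a 45-second recording at 16 kHz yields two 20-second clips and one 5-second clip, and each one-second clip produces a 59 x 80 log-mel matrix with these settings.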

Model Selection

Choosing the right model was pivotal. We evaluated several options, including Tacotron 2, an acoustic model that converts input text into a mel spectrogram, and WaveNet, a neural vocoder that generates the audio waveform directly from a mel spectrogram; the two are complementary and are often chained, as in the original Tacotron 2 system. After extensive testing, we chose the model that best reproduced the character of Emanuele Filiberto's voice.
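To make the division of labour concrete, here is a sketch of the two-stage interface: an acoustic model (the role Tacotron 2 plays) maps text to a mel spectrogram, and a vocoder (the role WaveNet plays) maps that spectrogram to audio samples. The class names, the placeholder models that simply emit silence, and the settings of 80 mel bands, 5 frames per character and a 256-sample hop are hypothetical; they only show how the shapes line up between the stages.

```python
from __future__ import annotations

from typing import Protocol

Mel = list[list[float]]  # shape (n_frames, n_mels)


class AcousticModel(Protocol):
    """Text -> mel spectrogram (the role Tacotron 2 plays)."""
    def text_to_mel(self, text: str) -> Mel: ...


class Vocoder(Protocol):
    """Mel spectrogram -> waveform samples (the role WaveNet plays)."""
    hop_length: int  # audio samples generated per mel frame

    def mel_to_audio(self, mel: Mel) -> list[float]: ...


class PlaceholderAcousticModel:
    """Hypothetical stand-in: emits `frames_per_char` silent frames per character."""
    def __init__(self, n_mels: int = 80, frames_per_char: int = 5) -> None:
        self.n_mels = n_mels
        self.frames_per_char = frames_per_char

    def text_to_mel(self, text: str) -> Mel:
        n_frames = len(text) * self.frames_per_char
        return [[0.0] * self.n_mels for _ in range(n_frames)]


class PlaceholderVocoder:
    """Hypothetical stand-in: emits `hop_length` silent samples per mel frame."""
    def __init__(self, hop_length: int = 256) -> None:
        self.hop_length = hop_length

    def mel_to_audio(self, mel: Mel) -> list[float]:
        return [0.0] * (len(mel) * self.hop_length)


def synthesize(text: str, acoustic: AcousticModel, vocoder: Vocoder) -> list[float]:
    mel = acoustic.text_to_mel(text)  # stage 1: text -> mel spectrogram
    return vocoder.mel_to_audio(mel)  # stage 2: mel spectrogram -> waveform


if __name__ == "__main__":
    audio = synthesize("Buongiorno", PlaceholderAcousticModel(), PlaceholderVocoder())
    print(len(audio))
```

Keeping the stages behind separate interfaces makes it possible to compare vocoders without touching the acoustic model, provided both sides agree on the mel settings (number of bands, hop length, sample rate).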

Training and Optimization

Training was carried out methodically. We split the data into training, validation, and test sets, and used optimization algorithms such as Adam to adjust the model's weights. Throughout training, we monitored a range of metrics to make sure the model was converging as expected.
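The sketch below shows, in plain Python, the two ingredients named here: a reproducible train/validation/test split and the Adam update rule (Kingma and Ba). The 80/10/10 proportions, the seed, the learning rate and the toy quadratic loss standing in for the real training loss are assumptions for illustration; in practice a framework optimizer such as `torch.optim.Adam` performs the same update over millions of weights.

```python
import math
import random


def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


class Adam:
    """Adam update rule for a flat list of float parameters."""
    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = list(params)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.params)  # running mean of gradients
        self.v = [0.0] * len(self.params)  # running mean of squared gradients
        self.t = 0                         # number of steps taken

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return self.params


if __name__ == "__main__":
    train_set, val_set, test_set = split_dataset(range(100))
    print(len(train_set), len(val_set), len(test_set))

    # Toy loss (w - 3)^2 standing in for the real training loss; gradient is 2(w - 3).
    opt = Adam([0.0], lr=0.1)
    for _ in range(1000):
        opt.step([2 * (opt.params[0] - 3.0)])
    print(round(opt.params[0], 3))
```

If several clips come from the same recording, splitting by recording rather than by clip helps keep near-identical audio out of the test set.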

Evaluation and Feedback

After training, we evaluated the model thoroughly. We used objective metrics such as the mel spectrogram loss, and we also listened to the outputs ourselves and gathered external feedback to judge how closely the synthetic voice matched Emanuele Filiberto's real voice.
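As a sketch of the two kinds of evaluation described above, the code below computes a mean squared error between predicted and reference mel spectrograms and summarizes listener ratings as a mean opinion score (MOS). The MSE form of the loss, the 1 to 5 rating scale and the normal-approximation confidence interval are assumptions, since the exact metrics are not specified here. The MSE only makes sense when the two spectrograms are time-aligned, for example when the decoder is fed the ground-truth frames during validation.

```python
import math
import statistics


def mel_mse(predicted, target):
    """Mean squared error between two time-aligned mel spectrograms,
    each given as a list of frames (lists of mel-band values)."""
    if len(predicted) != len(target):
        raise ValueError("spectrograms must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same number of mel bands")
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("spectrograms are empty")
    return total / count


def mean_opinion_score(ratings):
    """Mean of listener ratings (assumed 1-5 scale, at least two ratings) with an
    approximate 95% confidence interval: mean +/- 1.96 * sample std / sqrt(n)."""
    mean = statistics.mean(ratings)
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    print(mel_mse([[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 3.0]]))
    print(mean_opinion_score([4, 4, 5, 3]))  # hypothetical ratings
```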

Post-processing for Quality

To refine the output, we applied two post-processing steps: a low-pass filter to remove high-frequency artifacts, and equalization to make the voice sound more natural.
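The filter types and settings are not specified above; one lightweight way to implement both steps is with second-order (biquad) filters from Robert Bristow-Johnson's Audio EQ Cookbook. The sketch below builds a low-pass filter and a peaking equalizer band, applies them sample by sample, and reports the gain at a given frequency. The 8 kHz cutoff, the +2 dB boost at 3 kHz and the 22.05 kHz sample rate in the example are hypothetical values, not the settings we used.

```python
import cmath
import math


def normalize(b, a):
    """Divide all coefficients by a0 so the filter runs with a0 == 1."""
    return [x / a[0] for x in b], [x / a[0] for x in a]


def lowpass_coeffs(cutoff_hz, sr, q=1 / math.sqrt(2)):
    """Cookbook low-pass biquad; Q = 1/sqrt(2) gives a Butterworth response."""
    w0 = 2 * math.pi * cutoff_hz / sr
    alpha, cos_w0 = math.sin(w0) / (2 * q), math.cos(w0)
    b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    a = [1 + alpha, -2 * cos_w0, 1 - alpha]
    return normalize(b, a)


def peaking_eq_coeffs(center_hz, gain_db, sr, q=1.0):
    """Cookbook peaking-EQ biquad: boosts (gain_db > 0) or cuts a band around center_hz."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * center_hz / sr
    alpha, cos_w0 = math.sin(w0) / (2 * q), math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    return normalize(b, a)


def apply_biquad(coeffs, samples):
    """Filter a sequence of samples (Direct Form I difference equation)."""
    (b0, b1, b2), (_, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def gain_db(coeffs, freq_hz, sr):
    """Magnitude response |H(e^jw)| in dB at a single frequency."""
    (b0, b1, b2), (a0, a1, a2) = coeffs
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sr)  # z^-1 on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 ** 2) / (a0 + a1 * z1 + a2 * z1 ** 2)
    return 20 * math.log10(abs(h))


if __name__ == "__main__":
    sr = 22050                             # hypothetical sample rate
    lp = lowpass_coeffs(8000, sr)          # hypothetical cutoff
    eq = peaking_eq_coeffs(3000, 2.0, sr)  # hypothetical +2 dB band
    audio = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    processed = apply_biquad(eq, apply_biquad(lp, audio))
    print(round(gain_db(lp, 8000, sr), 2), round(gain_db(eq, 3000, sr), 2))
```

A single biquad low-pass rolls off at 12 dB per octave above the cutoff; cascading several sections gives a steeper slope if the artifacts sit close to the speech band.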

Conclusion

Building a high-quality synthetic voice for Emanuele Filiberto di Savoia was both challenging and instructive, and it called for a combination of technical expertise and creative problem-solving. This project will allow us to use his voice in videos featuring his virtual avatar and in dedicated content inside the game.

Join the project