# Voice Core

Each VIRTUAL Agent is designed to have a distinct voice that matches its personality and role. Training the voice models is therefore a critical step in ensuring that each character's voice is both realistic and consistent with its designed persona.

## There are two modules used in Voice Core

**Speech-to-text module**: The STT module is trained on a wide range of voice data. This training allows the module to accurately transcribe various accents, dialects, and speech patterns, making it reliable across different user scenarios.

**Text-to-speech module**: For the TTS module, we use VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) training. VITS is known for producing high-quality, natural-sounding speech. This matters for our platform because each AI character requires a specific voice that matches its personality and characteristics, and the VITS model allows for this level of customization and quality in voice synthesis.

Before a model is trained, the audio data is preprocessed.

### **Techniques used for data preprocessing**

1. **Format Consistency**: Having all audio files in the same format (WAV) and specifications (22050 Hz, mono) ensures consistency, which is essential for machine learning models to perform optimally. Inconsistent audio formats can lead to variability in the input data, which can confuse the model and degrade performance.
2. **Sampling Rate Normalization (22050 Hz)**: The sampling rate determines how many samples per second are in the audio file. A standard sampling rate like 22050 Hz is often used because it is sufficient to capture the frequency range of human speech while keeping the file size manageable. By the Nyquist theorem, it captures all frequencies up to 11025 Hz, which covers most of the frequency content of human speech (though not the full range of human hearing, which extends to roughly 20 kHz).
3. **Mono Channel**: Converting stereo or multi-channel audio files to mono ensures that the model trains on a single channel, which simplifies the learning process.
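
To make the numbers in this list concrete, here is a minimal sketch using only plain Python. The helper names `nyquist_frequency` and `downmix_to_mono` are hypothetical and not part of Voice Core. The downmix assumes interleaved integer samples and simple channel averaging, which is one common approach but not necessarily the one used in production.

```python
def nyquist_frequency(sample_rate_hz: int) -> float:
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2


def downmix_to_mono(interleaved: list[int], channels: int) -> list[int]:
    """Average each frame of interleaved samples into one mono sample.

    Assumption: samples are interleaved as [L0, R0, L1, R1, ...].
    Floor division is used, so results are rounded down.
    """
    if channels < 1 or len(interleaved) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(interleaved[i:i + channels]) // channels
        for i in range(0, len(interleaved), channels)
    ]


print(nyquist_frequency(22050))                  # 11025.0
print(downmix_to_mono([100, 300, -50, 50], 2))   # [200, 0]
```

In practice an audio library performs the downmix and resampling; the sketch only illustrates what those operations mean.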

<details>

<summary>Sample Code</summary>

```python
import os
from pydub import AudioSegment

upload_dir = 'upload_dir'
output_dir = 'out'

# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)

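# Accepted input extensions, with the leading dot so that a name like
# 'notawav' is not mistaken for a WAV file. Note that pydub relies on
# ffmpeg (or libav) to decode non-WAV formats such as MP3 and OGG.
extensions = ('.wav', '.mp3', '.ogg')

# Process all files in the upload directory
for filename in os.listdir(upload_dir):
    if filename.lower().endswith(extensions):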
        # Construct file paths
        file_path = os.path.join(upload_dir, filename)
        output_path = os.path.join(output_dir, os.path.splitext(filename)[0] + '.wav')

        # Load the audio file
        audio = AudioSegment.from_file(file_path)

        # Convert to WAV, 22050 Hz, mono
        audio = audio.set_frame_rate(22050).set_channels(1)

        # Export the processed audio
        audio.export(output_path, format='wav')

```

</details>
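
After conversion, it can be useful to confirm that each output file really has the target format. The following is a minimal sketch using the standard-library `wave` module. `check_wav_format` and `write_silent_wav` are hypothetical helpers introduced for illustration, not part of Voice Core or pydub, and the silent test files exist only to make the example self-contained.

```python
import os
import tempfile
import wave


def check_wav_format(path: str, expected_rate: int = 22050,
                     expected_channels: int = 1) -> bool:
    """Return True if the WAV file has the expected sample rate and channel count."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == expected_rate
            and wav_file.getnchannels() == expected_channels
        )


def write_silent_wav(path: str, rate: int, channels: int, frames: int = 100) -> None:
    """Write a short 16-bit silent WAV file (used here only as test input)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * frames * channels)


with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.wav")
    bad = os.path.join(tmp_dir, "bad.wav")
    write_silent_wav(good, rate=22050, channels=1)
    write_silent_wav(bad, rate=44100, channels=2)
    print(check_wav_format(good))  # True
    print(check_wav_format(bad))   # False
```

Running a check like this over the output directory catches files that were skipped or exported with unexpected settings before they reach training.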

[<mark style="color:red;">Learn more about contributing to Voice Core.</mark>](/builders-hub/build-with-virtuals/agent-contribution/contribute-to-voice-core.md)

