# Voice Core

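Each VIRTUAL Agent is designed to have a distinctive voice that suits its personality and role. Training the voice model is therefore an important step in making each character's voice both realistic and consistent with its designed persona.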

## Voice Core uses two modules

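**Speech-to-Text Module**: The STT module is trained on a diverse range of speech data. This training lets the module accurately transcribe a variety of accents, dialects, and speech patterns, making it versatile and reliable across different user scenarios.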

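**Text-to-Speech Module**: The TTS module is trained with VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech). VITS is well known for its ability to generate high-quality, natural-sounding speech. Because each AI character needs a specific voice that matches its unique personality and traits, this training is especially important on our platform. The VITS model makes this level of customization and quality in speech synthesis possible.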

The data is preprocessed before the models are trained.

### **Techniques used in data preprocessing**

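1. **Format consistency**: Converting all audio files to the same format (WAV) and specification (22050 Hz, mono) ensures consistency, which is essential for machine learning models to perform well. Inconsistent audio formats introduce variability in the input data, which can confuse the model and degrade its performance.
2. **Sampling rate normalization (22050 Hz)**: The sampling rate determines how many samples per second an audio file contains. A standard rate such as 22050 Hz is commonly used because it captures the frequency range of human speech well while keeping file sizes manageable. By the Nyquist theorem, a 22050 Hz sampling rate can represent frequencies up to 11025 Hz, which covers most of the frequency content of speech (human hearing itself extends to roughly 20 kHz).
3. **Mono channel**: Converting stereo or multichannel audio files to mono lets the model learn from a single channel, which simplifies training.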
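
The arithmetic behind the sampling-rate and mono steps can be sketched in a few lines of plain Python. This is only an illustration: the function names `nyquist_frequency` and `downmix_to_mono` are hypothetical, and averaging the channels is one common way to downmix, assumed here rather than taken from the Voice Core pipeline.

```python
def nyquist_frequency(sample_rate_hz: int) -> float:
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2


def downmix_to_mono(frames: list[tuple[int, ...]]) -> list[int]:
    """Average the channels of each frame into one mono sample (assumed method)."""
    return [sum(frame) // len(frame) for frame in frames]


# 22050 samples per second can represent frequencies up to 11025 Hz
print(nyquist_frequency(22050))  # 11025.0

# Three stereo frames (left, right) collapsed to three mono samples
stereo_frames = [(100, 300), (-200, 200), (50, 51)]
print(downmix_to_mono(stereo_frames))  # [200, 0, 50]
```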

<details>

<summary>Sample code</summary>

```python
import os
from pydub import AudioSegment

upload_dir = 'upload_dir'
output_dir = 'out'

# Make sure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Supported input extensions, with the leading dot so that a name such as
# 'notwav' is not matched (str.endswith accepts a tuple)
extensions = ('.wav', '.mp3', '.ogg')

# Process every supported file in the upload directory
for filename in os.listdir(upload_dir):
    if filename.lower().endswith(extensions):
        # Build the input and output paths
        file_path = os.path.join(upload_dir, filename)
        output_path = os.path.join(output_dir, os.path.splitext(filename)[0] + '.wav')

        # Load the audio file (pydub relies on ffmpeg or libav for non-WAV formats)
        audio = AudioSegment.from_file(file_path)

        # Resample to 22050 Hz and downmix to mono
        audio = audio.set_frame_rate(22050).set_channels(1)

        # Export the processed audio as WAV
        audio.export(output_path, format='wav')
```

</details>
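
After preprocessing, it can be useful to confirm that each output file actually matches the target specification. The following is a minimal sketch using only Python's standard `wave` module. The helper names `check_wav_spec` and `write_silent_wav` are hypothetical and not part of the Voice Core tooling; the silent test files exist only to make the example self-contained.

```python
import os
import tempfile
import wave


def check_wav_spec(path: str, expected_rate: int = 22050, expected_channels: int = 1) -> bool:
    """Return True if the WAV file has the expected sampling rate and channel count."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == expected_rate
            and wav_file.getnchannels() == expected_channels
        )


def write_silent_wav(path: str, rate: int, channels: int, n_frames: int = 100) -> None:
    """Write a short silent 16-bit WAV file, used here only to demonstrate the check."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00" * (2 * channels * n_frames))


with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "good.wav")
    bad_path = os.path.join(tmp_dir, "bad.wav")
    write_silent_wav(good_path, rate=22050, channels=1)
    write_silent_wav(bad_path, rate=44100, channels=2)
    print(check_wav_spec(good_path))  # True
    print(check_wav_spec(bad_path))   # False
```

In practice, `check_wav_spec` would be called on each file in the output directory, and any file that fails the check would be reprocessed or excluded from the training set.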

[<mark style="color:red;">Learn more about how to contribute to Voice Core.</mark>](/virtuals-protocol-whitepaper-ko/builders-hub/virtuals/agent-contribution/contribute-to-voice-core.md)

