> For the complete documentation index, see [llms.txt](https://whitepaper.virtuals.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://whitepaper.virtuals.io/virtuals-bai-pi-shu/gou-jian-zhe-zhong-xin/shi-yong-virtuals-gou-jian/zhi-neng-ti-gong-xian/wei-yu-yin-he-xin-gong-xian.md).

# 为语音核心贡献

## 为 Voice Core 贡献&#x20;

贡献者可以通过两种方式增强 Voice Core：

* 改进 TTS 模型
* 贡献新的语音数据

{% hint style="success" %}
无需对 STT 模型做出贡献，因为它目前使用的是 Azure 服务。准确率已经是最优的，不需要进一步改进。
{% endhint %}

### TTS 模型的改进

与模型相关的所有文件 **必须** 提交。生态系统支持多种服务提供商。以下是它们及其要求的列表：

{% tabs %}
{% tab title="GPT-Sovits" %}

* **sovits.pth**：这是你的主模型文件。请确保按要求命名为 "sovits.pth"。
* **reference1.wav**：.wav 格式的参考音频文件。请确保文件名与 "config.json" 文件中的参考项一致。
* **gpt.ckpt**：该模型的检查点文件。请确认其命名为 "gpt.ckpt"。
* **config.json**：这是你模型的配置文件。必须命名为 "config.json"。

下面是一个完整模型的示例文件夹提交结构。&#x20;

```
AudioModelSubmission/
├── sovits.pth                # 主模型文件
├── reference1.wav            # 参考音频文件（名称按 config.json 所示）     
├── gpt.ckpt                  # 模型检查点文件
└── config.json               # 模型配置文件

```

下面是一个示例 config.json 文件。&#x20;

```json
{
    "refFile": "Olyn.wav",
    "refText": "然而，即使如此，我依然屹立，作为人类精神韧性的见证"
}

```

{% endtab %}

{% tab title="XTTS" %}

* **model.pth**：这是主模型文件。它必须命名为 "model.pth"。
* **audio.wav**：这是 .wav 格式的参考音频文件。请确保其命名为 "audio.wav"。
* **vocab.json**：此 JSON 文件包含 TTS 系统使用的词汇表。它必须命名为 "vocab.json"。
* **config.json**：这是你模型的配置文件。必须命名为 "config.json"。

下面是一个完整模型的示例文件夹提交结构。&#x20;

```
AudioModelProject/
├── model.pth                # 主模型文件
├── audio.wav                # 参考音频文件
├── vocab.json               # TTS 系统的词汇表文件
└── config.json              # 模型配置文件
```

下面是一个示例 config.json 文件。&#x20;

{% code overflow="wrap" lineNumbers="true" %}

````json
```json
{
    "output_path": "output",
    "logger_uri": null,
    "run_name": "run",
    "project_name": null,
    "run_description": "\ud83d\udc38Coqui 训练运行。",
    "print_step": 25,
    "plot_step": 100,
    "model_param_stats": false,
    "wandb_entity": null,
    "dashboard_logger": "tensorboard",
    "save_on_interrupt": true,
    "log_model_step": null,
    "save_step": 10000,
    "save_n_checkpoints": 5,
    "save_checkpoints": true,
    "save_all_best": false,
    "save_best_after": 10000,
    "target_loss": null,
    "print_eval": false,
    "test_delay_epochs": 0,
    "run_eval": true,
    "run_eval_steps": null,
    "distributed_backend": "nccl",
    "distributed_url": "tcp://localhost:54321",
    "mixed_precision": false,
    "precision": "fp16",
    "epochs": 1000,
    "batch_size": 32,
    "eval_batch_size": 16,
    "grad_clip": 0.0,
    "scheduler_after_epoch": true,
    "lr": 0.001,
    "optimizer": "radam",
    "optimizer_params": null,
    "lr_scheduler": null,
    "lr_scheduler_params": {},
    "use_grad_scaler": false,
    "allow_tf32": false,
    "cudnn_enable": true,
    "cudnn_deterministic": false,
    "cudnn_benchmark": false,
    "training_seed": 54321,
    "model": "xtts",
    "num_loader_workers": 0,
    "num_eval_loader_workers": 0,
    "use_noise_augment": false,
    "audio": {
        "sample_rate": 22050,
        "output_sample_rate": 24000
    },
    "use_phonemes": false,
    "phonemizer": null,
    "phoneme_language": null,
    "compute_input_seq_cache": false,
    "text_cleaner": null,
    "enable_eos_bos_chars": false,
    "test_sentences_file": "",
    "phoneme_cache_path": null,
    "characters": null,
    "add_blank": false,
    "batch_group_size": 0,
    "loss_masking": null,
    "min_audio_len": 1,
    "max_audio_len": Infinity,
    "min_text_len": 1,
    "max_text_len": Infinity,
    "compute_f0": false,
    "compute_energy": false,
    "compute_linear_spec": false,
    "precompute_num_workers": 0,
    "start_by_longest": false,
    "shuffle": false,
    "drop_last": false,
    "datasets": [
        {
            "formatter": "",
            "dataset_name": "",
            "path": "",
            "meta_file_train": "",
            "ignored_speakers": null,
            "language": "",
            "phonemizer": "",
            "meta_file_val": "",
            "meta_file_attn_mask": ""
        }
    ],
    "test_sentences": [],
    "eval_split_max_size": null,
    "eval_split_size": 0.01,
    "use_speaker_weighted_sampler": false,
    "speaker_weighted_sampler_alpha": 1.0,
    "use_language_weighted_sampler": false,
    "language_weighted_sampler_alpha": 1.0,
    "use_length_weighted_sampler": false,
    "length_weighted_sampler_alpha": 1.0,
    "model_args": {
        "gpt_batch_size": 1,
        "enable_redaction": false,
        "kv_cache": true,
        "gpt_checkpoint": null,
        "clvp_checkpoint": null,
        "decoder_checkpoint": null,
        "num_chars": 255,
        "tokenizer_file": "",
        "gpt_max_audio_tokens": 605,
        "gpt_max_text_tokens": 402,
        "gpt_max_prompt_tokens": 70,
        "gpt_layers": 30,
        "gpt_n_model_channels": 1024,
        "gpt_n_heads": 16,
        "gpt_number_text_tokens": 6681,
        "gpt_start_text_token": null,
        "gpt_stop_text_token": null,
        "gpt_num_audio_tokens": 1026,
        "gpt_start_audio_token": 1024,
        "gpt_stop_audio_token": 1025,
        "gpt_code_stride_len": 1024,
        "gpt_use_masking_gt_prompt_approach": true,
        "gpt_use_perceiver_resampler": true,
        "input_sample_rate": 22050,
        "output_sample_rate": 24000,
        "output_hop_length": 256,
        "decoder_input_dim": 1024,
        "d_vector_dim": 512,
        "cond_d_vector_in_each_upsampling_layer": true,
        "duration_const": 102400
    },
    "model_dir": null,
    "languages": [
        "en",
        "es",
        "fr",
        "de",
        "it",
        "pt",
        "pl",
        "tr",
        "ru",
        "nl",
        "cs",
        "ar",
        "zh-cn",
        "hu",
        "ko",
        "ja",
        "hi"
    ],
    "temperature": 0.75,
    "length_penalty": 1.0,
    "repetition_penalty": 5.0,
    "top_k": 50,
    "top_p": 0.85,
    "num_gpt_outputs": 1,
    "gpt_cond_len": 30,
    "gpt_cond_chunk_len": 4,
    "max_ref_len": 30,
    "sound_norm_refs": false
}
```
````

{% endcode %}
{% endtab %}
{% endtabs %}

要提交语音模型，请选择 Voice Core

<figure><img src="/files/68bbb7230b3f5c27318f67ebbfdf3cf93c316434" alt=""><figcaption></figcaption></figure>

然后选择“我有一个语音模型”，并按照上述指南上传模型文件。

<figure><img src="/files/fe8208dd0f56ef2bf22f5059be5ea77efce33e0c" alt=""><figcaption></figcaption></figure>

### 新的语音数据贡献

* 提交的语音数据必须是合法获取且拥有分享权利的。
* 获取的语音数据必须来自真实来源。&#x20;
* 语音数据应无背景噪音，音频中只保留要训练的人声。&#x20;
* 语音数据必须为 .wav 格式。&#x20;


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://whitepaper.virtuals.io/virtuals-bai-pi-shu/gou-jian-zhe-zhong-xin/shi-yong-virtuals-gou-jian/zhi-neng-ti-gong-xian/wei-yu-yin-he-xin-gong-xian.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.