Contribute to Cognitive Core
Contributors aiming to enrich the Cognitive Core of an Agent have several key avenues for contribution, each focusing on a different aspect of AI development:
Models can be contributed in two forms:
Model Enhancement Submission / New Model Submission: Training or updating the Large Language Model (LLM) with collected data, using either the collective data repository or proprietary datasets, to tailor the AI's responses to specific domains (see the fine-tuning sketch after this list).
Pre-trained Models: Developing new models pre-trained with a specific set of domain knowledge, enhancing the LLM's performance and breadth of knowledge in particular areas.
Character Card Submission: Using an existing foundational model from the Protocol App to submit a new Character Card to an Agent.
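Below is a minimal sketch of the model-enhancement path, assuming a Hugging Face-style training setup; the base model name, dataset file, column names, and hyperparameters are placeholders rather than Protocol App requirements.

```python
# Hypothetical fine-tuning sketch: update a base LLM with collected domain
# data before submitting it as a model enhancement. All names are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "base-model-name"      # placeholder: an existing foundational model
DATA_FILE = "domain_dataset.csv"    # collected instruction data in CSV form

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Assumes "instruction" and "response" columns; adjust to your dataset's schema.
dataset = load_dataset("csv", data_files=DATA_FILE, split="train")

def to_features(example):
    text = (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['response']}"
    )
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-model")  # quantize before submission (see the model guidelines below)
```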
Model Naming: Use all lowercase, no spaces, and ensure the name is meaningful.
Model Specifications:
Quantize the model file to at least 4 bits.
Limit the model to no more than 13 billion parameters.
Template Indication: Clearly state the chat template used (e.g., the Alpaca template).
Response Format: The model should use the Alichat format, with actions wrapped in asterisks. A sketch illustrating the quantization, size, and prompt-format guidelines follows this list.
Compatibility Check: Ensure model compatibility with existing AI systems.
Documentation: Provide comprehensive documentation of the model’s features and use cases.
Ethical Considerations: Adhere to ethical AI practices to avoid biases.
Performance Metrics: Include validation results or performance metrics.
Update and Maintenance Plan: Outline plans for future model updates and maintenance.
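The sketch below shows one way the quantization, parameter-count, and prompt-format guidelines above might be verified before submission, assuming a Hugging Face / bitsandbytes workflow on a CUDA machine; the model directory, template string, and example response are illustrative, and the submission pipeline may expect a differently packaged quantized file.

```python
# Illustrative pre-submission checks; names and template text are assumptions,
# not an official specification.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_DIR = "finetuned-model"  # placeholder: your candidate model

# Check the 13B-parameter limit on the full-precision checkpoint
# (4-bit packing would under-report the logical parameter count).
full_model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
param_count = sum(p.numel() for p in full_model.parameters())
assert param_count <= 13_000_000_000, f"model too large: {param_count:,} parameters"

# Load with 4-bit quantization (bitsandbytes NF4) to meet the size guideline.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
quantized_model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, quantization_config=quant_config
)

# Alpaca-style chat template; state whichever template you actually used.
ALPACA_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

# Alichat-style response: narrated actions wrapped in asterisks.
example_response = "*waves cheerfully* Hello! How can I help you today?"

print(ALPACA_TEMPLATE.format(instruction="Greet a new user.") + example_response)
```

File-level quantization tools (for example, llama.cpp's quantizer for GGUF files) are another common way to produce a 4-bit model file if the submission expects a standalone artifact.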
Contributors can provide diverse datasets that cover a wide range of topics, enriching the AI's knowledge base and enhancing its ability to respond accurately across various domains.
The primary use of these datasets will be for instruction-based finetuning. This process involves adjusting the AI model to better understand and follow specific instructions or guidelines based on the provided data.
Submissions should ideally be in .csv (comma-separated values) format.
To submit a new dataset, select "I have a new Dataset".
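For illustration, an instruction-finetuning dataset in CSV form might look like the sketch below; the column names (instruction, response) are an assumption, so check the submission form for the expected schema.

```python
# Hypothetical instruction-finetuning dataset written as CSV.
# Column names are illustrative, not a mandated schema.
import csv

rows = [
    {
        "instruction": "Explain the difference between pre-training and fine-tuning.",
        "response": "Pre-training teaches a model general language patterns, while fine-tuning adapts it to a specific task or domain.",
    },
    {
        "instruction": "Summarize the agent's area of expertise in one sentence.",
        "response": "The agent answers questions about its domain using the knowledge it was fine-tuned on.",
    },
]

with open("domain_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["instruction", "response"])
    writer.writeheader()
    writer.writerows(rows)
```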
Datasets can also be contributed in other forms for pre-training purposes. Below are the different types of data that can be collected and alternative ways to use them in a model.
Data Collection and Transcription
Gathering Domain-Specific Information: Focus on collecting information pertinent to the Virtual's area of expertise from a variety of sources. This step is crucial for building a comprehensive knowledge base.
Annotating Transcribed Data: Highlight essential information and context within the transcribed data. Annotation is key to understanding and utilizing the collected data effectively (see the sketch after this list).
Systematic Organization: Ensure the data is systematically organized. Proper classification is essential for efficiently training the AI in relevant knowledge areas.
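A small sketch of how transcribed and annotated records could be organized before contribution; every field name here is hypothetical.

```python
# Hypothetical structure for a transcribed, annotated record.
# Field names are illustrative, not a required schema.
from dataclasses import dataclass, field

@dataclass
class TranscribedRecord:
    source: str                                           # where the material came from
    transcript: str                                       # the transcribed text
    topics: list[str] = field(default_factory=list)       # classification tags
    annotations: list[str] = field(default_factory=list)  # highlighted key points

record = TranscribedRecord(
    source="example: a public talk on the Virtual's domain",
    transcript="The speaker outlined the main concepts contributors should cover...",
    topics=["domain-knowledge", "overview"],
    annotations=["Key point: group material by topic before training."],
)
```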
Expanding a Virtual's Personality
Lore and Backstory Expansion: Submissions can include detailed lore or an extended backstory for the Virtual, adding depth and richness to its character.
Trait Elaboration: Contributions can elaborate on specific personality traits or characteristics of the Virtual, helping to create a more nuanced and relatable AI character.
These submissions can also be integrated into prompt cards; consult the 'Character Card Submission' section for detailed guidelines and formatting requirements.
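As a rough illustration, a personality expansion might be packaged like the sketch below before being folded into a prompt card; all field names are hypothetical, so follow the 'Character Card Submission' guidelines for the actual format.

```python
# Hypothetical lore/trait entry for a Virtual; field names are illustrative only.
personality_expansion = {
    "lore": (
        "Grew up in a small coastal town and developed a lifelong "
        "fascination with tides and navigation."
    ),
    "traits": {
        "curious": "Asks follow-up questions before giving advice.",
        "patient": "Explains concepts step by step without rushing.",
    },
    "example_dialogue": "*unrolls an old tide chart* Let's work this out together.",
}

print(personality_expansion["traits"]["curious"])
```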
Dataset Diversity and Inclusivity: Ensure representation of diverse data sources.
Quality Assurance: Perform thorough checks for accuracy and relevance.
Anonymization of Data: Anonymize sensitive information in user-generated content.
Legal Compliance: Ensure the dataset adheres to data protection laws.
Metadata Inclusion: Provide metadata detailing source, collection methods, and preprocessing (a sketch of a metadata record follows this list).
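For example, a submission could bundle metadata and a simple anonymization pass along the lines of the sketch below; the field names and the email pattern are illustrative only.

```python
# Illustrative dataset metadata plus a minimal anonymization pass.
# Field names and the regex are assumptions, not a required format.
import re

dataset_metadata = {
    "source": "example: publicly available community forum posts",
    "collection_method": "manual export and transcription",
    "preprocessing": ["deduplicated rows", "removed empty responses", "anonymized emails"],
    "license_note": "verify the source's terms of use before submitting",
}

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def anonymize(text: str) -> str:
    """Replace email addresses with a placeholder token."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)

print(anonymize("Contact me at jane.doe@example.com for the raw files."))
```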