Illustration: dApp interaction with Protocol using SDK
When decentralized applications (dApps) interact with the Protocol through our SDK, the following steps are performed.
Convert input to readable text
First, the user's input, which can be either a text message or a voice message, is converted into readable text. Voice messages pass through a Speech-to-Text (STT) module that transcribes the spoken words, so that both input types enter the same interpretation pipeline as plain text.
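The conversion step can be pictured with a minimal sketch. It is not the SDK's actual API: `UserInput`, `to_text`, and `fake_stt` are hypothetical names, and the STT module is stood in for by a placeholder function that is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UserInput:
    """Hypothetical container for one user message."""
    kind: str                      # "text" or "voice"
    text: Optional[str] = None     # set when kind == "text"
    audio: Optional[bytes] = None  # set when kind == "voice"


def to_text(user_input: UserInput, stt: Callable[[bytes], str]) -> str:
    """Return readable text for either input type.

    `stt` is whatever Speech-to-Text function the caller supplies.
    """
    if user_input.kind == "text":
        if user_input.text is None:
            raise ValueError("text input is missing its text")
        return user_input.text.strip()
    if user_input.kind == "voice":
        if user_input.audio is None:
            raise ValueError("voice input is missing its audio")
        return stt(user_input.audio).strip()
    raise ValueError(f"unsupported input kind: {user_input.kind}")


def fake_stt(audio: bytes) -> str:
    """Placeholder STT for illustration: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8")
```

Passing the STT function in as an argument keeps the sketch independent of any particular speech model; a real integration would substitute the module the Protocol actually uses.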
Response Generation
Once the input has been converted, the next step is generating a response. This is where Large Language Models (LLMs) come into play. The LLM receives the input prompt and crafts a response. To improve the context and relevance of the interaction, previous conversations between the character and the individual are retrieved from the vector database. These past interactions are integrated as a memory layer using a threading mechanism, which enriches the LLM's understanding and improves response accuracy. Each new prompt is also stored in the vector database, so the conversation history is continuously updated. The specific LLM used for response generation varies, depending on the champion models selected by the Character Owners.
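The retrieve, generate, and store cycle can be sketched as below. Everything here is an assumption made for illustration: `ToyVectorDB` is an in-memory stand-in for the vector database, the bag-of-words `embed` function replaces a real embedding model, the thread id is assumed to identify one character and individual pair, and `llm` is any callable that maps a prompt string to a reply.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorDB:
    """In-memory stand-in for the vector database, keyed by thread id."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, str, Counter]] = []

    def add(self, thread_id: str, text: str) -> None:
        self._rows.append((thread_id, text, embed(text)))

    def search(self, thread_id: str, query: str, k: int = 2) -> List[str]:
        """Return up to k most similar past messages from the same thread."""
        q = embed(query)
        scored = [(cosine(q, vec), text)
                  for tid, text, vec in self._rows if tid == thread_id]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


def respond(db: ToyVectorDB, llm: Callable[[str], str],
            thread_id: str, prompt: str) -> str:
    """Retrieve memories, call the LLM, then store the new prompt."""
    memories = db.search(thread_id, prompt)
    context = "\n".join(f"- {m}" for m in memories)
    full_prompt = f"Relevant past messages:\n{context}\n\nUser: {prompt}"
    reply = llm(full_prompt)
    db.add(thread_id, prompt)  # keep the conversation history up to date
    return reply
```

Searching before storing means the new prompt never retrieves itself as a memory. Filtering by thread id keeps one individual's history from leaking into another's conversation with the same character.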
Sentiment Analysis
An integral part of the process is Sentiment Analysis, performed by the TextModel. This model analyzes the text to understand and gauge the underlying sentiments, enhancing the response output from the LLM. This additional layer of analysis is crucial as it provides descriptive elements that assist the Voice Models and Animation System in creating outputs that are more accurate and lifelike.
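To make the role of this step concrete, here is a deliberately simple sketch of a sentiment pass that yields a label and score for downstream voice and animation components. It does not represent how the TextModel works internally: the word lists, `SentimentTag`, and `analyze` are hypothetical, and a keyword count stands in for a trained model.

```python
from dataclasses import dataclass

# Tiny illustrative lexicons; a trained model would replace these.
POSITIVE = {"great", "happy", "love", "glad", "wonderful"}
NEGATIVE = {"sad", "sorry", "angry", "hate", "terrible"}


@dataclass
class SentimentTag:
    """Descriptive output handed to the voice and animation stages."""
    label: str    # "positive", "negative", or "neutral"
    score: float  # ranges from -1.0 (negative) to 1.0 (positive)


def analyze(text: str) -> SentimentTag:
    """Score text by counting lexicon hits after stripping punctuation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    score = (pos - neg) / total if total else 0.0
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return SentimentTag(label=label, score=score)
```

The point of the sketch is the shape of the output rather than the scoring method: a compact tag that travels alongside the LLM's text so later stages can adjust tone and expression.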
Output Conversion & Presentation
The final stage of the process is the conversion of the LLM's output into an interactive character response. This involves several steps, starting with the transformation of the text response into speech using a Text-to-Speech (TTS) module. This module is equipped with the character's trained voice, ensuring that the spoken output matches the character's unique vocal characteristics. Concurrently, the corresponding lip-syncing and facial expressions are generated, bringing the character to life. The output includes a synchronized character animation, a voice message, and a corresponding text message, providing a rich and immersive experience. An example of such a response.
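The assembly of the three outputs could be sketched as follows, assuming the speech and animation steps are independent functions that can run side by side. `CharacterResponse`, `build_response`, `fake_tts`, and `fake_animate` are hypothetical names; the two placeholder functions only imitate the shape of what a real TTS module and animation system would return.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class CharacterResponse:
    """The three parts of the output described above."""
    text: str        # text message
    audio: bytes     # voice message
    animation: dict  # lip-sync and facial expression data


def build_response(text: str, sentiment: str,
                   tts: Callable[[str, str], bytes],
                   animate: Callable[[str, str], dict]) -> CharacterResponse:
    """Run speech synthesis and animation concurrently, then bundle them."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        audio_future = pool.submit(tts, text, sentiment)
        animation_future = pool.submit(animate, text, sentiment)
        return CharacterResponse(
            text=text,
            audio=audio_future.result(),
            animation=animation_future.result(),
        )


def fake_tts(text: str, sentiment: str) -> bytes:
    """Placeholder TTS: returns the text as bytes instead of audio."""
    return text.encode("utf-8")


def fake_animate(text: str, sentiment: str) -> dict:
    """Placeholder animation: records the expression and a word count."""
    return {"expression": sentiment, "word_count": len(text.split())}
```

Both steps receive the same text and sentiment label, which is one simple way to keep the voice and the facial expression consistent with each other.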