Phase 2
In the second phase of our project, we will leverage the data collected in Phase 1 to further refine and enhance our Large Language Models (LLMs). This phase centers on user engagement with specific questions previously identified as poorly aligned with user expectations. Users will provide feedback on how various LLMs perform on similar questions, facilitating a comprehensive comparative analysis.
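As a rough illustration of what this comparative feedback might look like, the sketch below models one user judgment as a pairwise comparison between two LLM responses to the same question. The field names and identifiers are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical sketch of one user-feedback record, assuming pairwise comparisons
# between two model responses to the same problematic question.
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    question_id: str       # question flagged as problematic in Phase 1
    model_a: str           # identifier of the first LLM
    model_b: str           # identifier of the second LLM
    response_a: str        # response produced by model_a
    response_b: str        # response produced by model_b
    preferred: str         # "a" or "b", as chosen by the user
    rationale: str = ""    # optional free-text feedback

# Example record (placeholder values).
record = PreferenceRecord(
    question_id="q-0142",
    model_a="model-x",
    model_b="model-y",
    response_a="...",
    response_b="...",
    preferred="a",
)
```

Accumulating records of this shape across many users and models is what makes a side-by-side comparison of LLM performance possible.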
This phase of the project is focused on the application of Proximal Policy Optimization (PPO). As users interact with the LLMs and provide feedback, we will harness this data to fine-tune the models in accordance with user preferences and expectations. This iterative process will enable us to incrementally align each model's performance with user values, thereby enhancing their reliability and relevance in practical applications.
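To make the PPO step concrete, the following is a minimal sketch of the clipped surrogate loss at the heart of PPO-style fine-tuning. The tensor shapes, the clipping epsilon, and the toy values are assumptions for illustration; in practice this objective sits inside a full RLHF training loop rather than standing alone.

```python
# Minimal sketch of the PPO clipped surrogate loss (illustrative, not the
# project's actual training code).
import torch

def ppo_clip_loss(new_logprobs: torch.Tensor,
                  old_logprobs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio between the updated policy and the policy that sampled the data.
    ratio = torch.exp(new_logprobs - old_logprobs)
    # Unclipped and clipped surrogate objectives.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two; the loss is its negation.
    return -torch.min(unclipped, clipped).mean()

# Toy per-token example.
loss = ppo_clip_loss(
    new_logprobs=torch.tensor([-1.0, -0.8, -1.2]),
    old_logprobs=torch.tensor([-1.1, -0.9, -1.0]),
    advantages=torch.tensor([0.5, -0.2, 0.3]),
)
```

The clipping term is what keeps each fine-tuning update close to the previous policy, which is why the alignment with user feedback can proceed incrementally.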
Simultaneously, we will introduce the xHAI staking program at the onset of Phase 2. Our objective for this phase includes onboarding at least 10 paying products, since the learning process benefits from a broader range of LLM models and inference data. This endeavor will not only generate revenue but also provide an opportunity to gather a diverse array of user feedback.
By the conclusion of Phase 2, we aim to have developed preference models that are not only more accurate but also finely attuned to the intricate subtleties of human language and communication. This will establish a meticulous fine-tuning process that is reusable across models. We will also begin our foray into niche use cases for LLMs, targeting responses tailored to specific subjects, and plan to leverage our human network to generate the required datasets.
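For readers unfamiliar with how a preference model is trained, the sketch below shows a standard pairwise (Bradley-Terry style) loss: the model is pushed to score the user-preferred response above the rejected one. The scoring model and the example reward values are assumptions added here for illustration.

```python
# Hedged sketch of a pairwise preference-model loss; a concrete reward model
# producing the scores is assumed and not shown.
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards: torch.Tensor,
                    rejected_rewards: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy batch of two comparisons.
loss = preference_loss(
    chosen_rewards=torch.tensor([1.3, 0.7]),
    rejected_rewards=torch.tensor([0.4, 0.9]),
)
```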