VIRTUAL-specific Validation

Governance at the VIRTUAL level involves validating proposals for changes to the Virtual environment. Validators are responsible for reviewing, discussing, validating, and voting on these proposed changes in the community forum. While all token holders and contributors can participate in discussions, only those with Validator status have the authority to validate proposals or cast votes on them. The voting mechanism operates on the principles of Delegated Proof of Stake (DPoS).
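
As a rough illustration of stake-weighted voting under DPoS (not Virtual Protocol's actual on-chain logic; the names Validator and tally_votes and the simple-majority rule are assumptions), a proposal passes when the stake delegated to approving validators outweighs the stake that voted against:

# Minimal sketch of a stake-weighted DPoS tally. All names and the
# simple-majority rule are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Validator:
    address: str
    delegated_stake: float  # stake delegated to this validator by token holders


def tally_votes(votes, validators):
    """votes: {validator_address: True/False}; validators: {address: Validator}."""
    stake_for = sum(validators[a].delegated_stake for a, approve in votes.items() if approve)
    stake_voted = sum(validators[a].delegated_stake for a in votes)
    # The proposal passes when more than half of the voting stake approves it.
    return stake_voted > 0 and stake_for > stake_voted / 2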

Validation work primarily focuses on assessing new contributions made to Virtuals.

The Validation Process

When a contribution is made, a proposal is generated automatically. A submitted model is hosted on Virtual Protocol's decentralized infrastructure, and validators interact with it to determine whether it merits replacing the existing one.
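
To make the flow concrete, a submitted contribution could be represented by a proposal record along the following lines; the field names are assumptions for illustration, not the protocol's actual schema:

# Hypothetical shape of an auto-generated proposal; field names are assumptions.
from dataclasses import dataclass


@dataclass
class ModelProposal:
    contributor: str          # address of the contributing account
    model_uri: str            # where the submitted model is hosted
    incumbent_model_uri: str  # the model it is proposed to replace
    status: str = "pending"   # becomes "accepted" or "rejected" after voting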

Validating a Model

When validating a model, validators are presented with two models anonymously for comparison. They go through 10 rounds of interaction with the model pair, selecting the better response in each round. After 10 rounds, a vote is submitted with the final outcomes.

Anonymity in model comparison prevents collusion and bias among validators and contributors, ensuring a fair model selection process.
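
The sketch below illustrates one way such a blinded, 10-round session could be structured; ask_model, pick_better, and the random label shuffling are assumptions rather than the protocol's actual interface:

import random

ROUNDS = 10


def run_blinded_session(model_a, model_b, prompts, ask_model, pick_better):
    """Run ROUNDS blinded comparison rounds and return the vote to submit.

    ask_model(model, prompt) -> response and pick_better(resp_1, resp_2) -> 0, 1,
    or None (tie) are assumed callables; real validators interact through the
    protocol's interface rather than through code.
    """
    tally = {"model_a": 0, "model_b": 0, "tie": 0}
    for prompt in prompts[:ROUNDS]:
        # Shuffle presentation order so the validator cannot tell which model is which.
        pair = [("model_a", model_a), ("model_b", model_b)]
        random.shuffle(pair)
        responses = [(label, ask_model(model, prompt)) for label, model in pair]
        choice = pick_better(responses[0][1], responses[1][1])
        tally[responses[choice][0] if choice is not None else "tie"] += 1
    # The tally over the 10 rounds is submitted as the validator's vote.
    return tally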

Virtual Protocol has opted for the Elo Rating System for model comparison.

Refining the Elo Rating System

Building on the foundation laid by pioneers like FastChat, we acknowledge the stability challenges of traditional Elo ratings. We have therefore implemented a refined, bootstrap version of the Elo Rating System, enhancing the stability and reliability of our model validation outcomes (sketched after the standard mechanism below).

The standard Elo rating mechanism works as follows:

# Requires pandas and a `battles` DataFrame with columns model_a, model_b, winner.
from collections import defaultdict

import pandas as pd


def compute_elo(battles, K=4, SCALE=400, BASE=10, INIT_RATING=1000):
    # Every model starts from the same initial rating.
    rating = defaultdict(lambda: INIT_RATING)

    # itertuples() yields (index, model_a, model_b, winner) for each recorded battle.
    for rd, model_a, model_b, winner in battles[['model_a', 'model_b', 'winner']].itertuples():
        ra = rating[model_a]
        rb = rating[model_b]
        # Expected scores given the current rating gap.
        ea = 1 / (1 + BASE ** ((rb - ra) / SCALE))
        eb = 1 / (1 + BASE ** ((ra - rb) / SCALE))
        # Actual score for model_a: 1 for a win, 0 for a loss, 0.5 for a tie.
        if winner == "model_a":
            sa = 1
        elif winner == "model_b":
            sa = 0
        elif winner == "tie" or winner == "tie (bothbad)":
            sa = 0.5
        else:
            raise Exception(f"unexpected vote {winner}")
        # Move each rating by K times the difference between actual and expected score.
        rating[model_a] += K * (sa - ea)
        rating[model_b] += K * (1 - sa - eb)

    return rating

def pretty_print_elo_ratings(ratings):
    # Tabulate the ratings, highest first, rounded to the nearest integer.
    df = pd.DataFrame([
        [n, ratings[n]] for n in ratings.keys()
    ], columns=["Model", "Elo rating"]).sort_values("Elo rating", ascending=False).reset_index(drop=True)
    df["Elo rating"] = (df["Elo rating"] + 0.5).astype(int)
    df.index = df.index + 1
    return df


elo_ratings = compute_elo(battles)
pretty_print_elo_ratings(elo_ratings)
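
The bootstrap refinement mentioned above can be layered on top of compute_elo by resampling the recorded battles many times and aggregating the per-round ratings, which damps the sensitivity of a single Elo pass to battle order and sampling noise. The sketch below shows the general idea; the number of rounds and the median aggregation are illustrative choices, not the protocol's exact parameters:

def compute_bootstrap_elo(battles, num_rounds=1000):
    """Resample battles with replacement and aggregate the per-round Elo ratings."""
    rows = []
    for _ in range(num_rounds):
        # Draw a bootstrap sample of the battle history and rate it from scratch.
        resampled = battles.sample(frac=1.0, replace=True)
        rows.append(compute_elo(resampled))
    df = pd.DataFrame(rows)
    # The median across bootstrap rounds gives a more stable rating per model.
    return df.median().sort_values(ascending=False)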

Model Assessment

Beyond pairwise comparison, our system assigns a definitive score to each model, forming the basis for Elo ratings. This score reflects a model's sophistication and reliability, encouraging our community to focus on impactful developments.

Assessing Dataset Quality

When a model is fine-tuned on a dataset contributed by others, the Elo rating achieved by the model indicates the quality of that dataset. The impact score, derived from the difference in scores between the proposed model and the existing one, determines whether the proposed model is superior after fine-tuning on the dataset. This allows Virtual Protocol to set standards for contributed datasets and reject those with low impact.
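
As an illustration of how such an impact score could be computed from the Elo ratings above, the sketch below compares the fine-tuned model's rating with the incumbent's; the function names and the acceptance threshold are hypothetical:

def dataset_impact_score(elo_ratings, proposed_model, incumbent_model):
    """Hypothetical impact score: the Elo gained by fine-tuning on the contributed dataset."""
    return elo_ratings[proposed_model] - elo_ratings[incumbent_model]


def accept_dataset(elo_ratings, proposed_model, incumbent_model, min_impact=0):
    # min_impact is an illustrative threshold, not a protocol constant.
    return dataset_impact_score(elo_ratings, proposed_model, incumbent_model) > min_impact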
