
On Monday, a Meta executive denied a rumor that the company trained its new AI models to score well on specific benchmarks while concealing the models' weaknesses.
Ahmad Al-Dahle, a vice president at Meta, said in a post on X that Meta did not train Llama 4 Maverick and Llama 4 Scout on "test sets." In AI benchmarking, a test set is a collection of data used to evaluate a model's performance after it has been trained. Training on the test set can misleadingly inflate a model's benchmark scores, making it appear more capable than it actually is.
Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models' benchmark results began circulating on X and Reddit. The rumor appears to stem from a social media post by a user claiming to have resigned from the company in protest over its benchmarking practices.
Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta's decision to use an experimental, unreleased version of Maverick to achieve better scores on the LM Arena benchmark. Researchers on X have observed differences in the behavior of the openly downloadable Maverick compared with the model hosted on LM Arena.
Al-Dahle acknowledged that some users are seeing "mixed quality" from Maverick and Scout across the different cloud providers hosting the models.
"We expect it will take a few days for all the public implementations to get dialed in, since we dropped the models as soon as they were ready," Al-Dahle said. "We'll keep working through our bug fixes and onboarding partners."