Meta’s vanilla Maverick AI model ranks below rivals on a popular chat benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on the crowdsourced benchmark LM Arena. The incident prompted LM Arena’s maintainers to apologize, change their policies, and score the unmodified, vanilla Maverick instead.

It turns out the vanilla model isn’t very competitive.

The unmodified Maverick, “Llama-4-Maverick-17B-128E-Instruct,” ranked below models including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, and Google’s Gemini 1.5 Pro as of Friday. Many of these models are months old.

Why the worse performance? Meta’s experimental Maverick, Llama-4-Maverick-03-26-Experimental, was “optimized for conversationality,” the company explained in a chart published last Saturday. Those optimizations evidently played well on LM Arena, which has human raters compare the outputs of models and choose which they prefer.

As we’ve written before, LM Arena has never been the most reliable measure of an AI model’s performance. Still, tailoring a model to a benchmark, besides being misleading, makes it harder for developers to predict how well the model will perform in other contexts.

In a statement, a Meta spokesperson told TechCrunch that the company experiments with “all types of custom variants.”

“‘Llama-4-Maverick-03-26-Experimental’ is a chat-optimized version that performs well on LM Arena,” the spokesperson said. “We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We’re excited to see what they will build and look forward to their ongoing feedback.”