
OpenAI’s recently released o3 and o4-mini AI models are state-of-the-art in many respects. But the new models still hallucinate, or make things up, and in fact they hallucinate more than several of OpenAI’s older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, affecting even today’s best-performing systems. Historically, each new model has improved slightly on hallucinations, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.
According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models (o1, o1-mini, and o3-mini) as well as OpenAI’s traditional “non-reasoning” models.
Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.
In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations get worse as it scales up its reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they are often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini. O4-mini did even worse on PersonQA, hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 tends to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT” and then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.
“Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated, but not fully erased, by standard post-training pipelines,” said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email to TechCrunch.
Sarah Schwettmann, co-founder of Transluce, added that o3’s hallucination rate may make it less useful than it otherwise would be.
Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in its coding workflows and has found it to be a step above the competition. But Katanforoosh says o3 tends to hallucinate broken website links: the model will supply a link that doesn’t work when clicked.
Hallucinations may help models arrive at interesting ideas and be creative in their “thinking,” but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm likely wouldn’t be pleased with a model that inserts lots of factual errors into client contracts.
One promising approach to boosting model accuracy is giving models web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, one of OpenAI’s accuracy benchmarks. Potentially, search could improve reasoning models’ hallucination rates as well, at least in cases where users are willing to expose their prompts to a third-party search provider.
If scaling up reasoning models indeed continues to worsen hallucinations, it will make the hunt for a solution all the more urgent.
“Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” OpenAI spokesperson Niko Felix told TechCrunch.
Over the last year, the broader AI industry has pivoted to focus on reasoning models after techniques for improving traditional AI models started showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing power and data during training. Yet it seems reasoning may also lead to more hallucinations.









