
ArXiv, a widely used open repository for preprint research, is cracking down further on the careless use of large language models (LLMs) in scientific papers.
Although papers are posted to the site before they have been peer-reviewed, arXiv (pronounced “archive”) has become one of the primary channels through which research circulates in fields such as computer science and mathematics, and the site itself has become a source of data on trends in scientific research.
ArXiv has already taken steps to combat the growing number of low-quality AI-generated papers, such as requiring first-time submitters to be endorsed by established authors. And the organization, which has been hosted by Cornell for more than 20 years, has become an independent nonprofit, allowing it to raise more money to tackle problems like AI slop.
On Thursday, Thomas Dietterich, chair of arXiv’s computer science section, posted: “If a submission contains incontrovertible evidence that the authors did not check the LLM-generated results, this means that nothing in the paper can be trusted.”
Incontrovertible evidence could include things like “hallucinated references” or comments left in the text by the LLM, Dietterich said. If such evidence is found, the paper’s authors face “a one-year ban from arXiv and a requirement that any subsequent arXiv submissions must first be accepted by a reputable peer review venue.”
This isn’t an outright ban on the use of LLMs, but rather an insistence that authors take “full responsibility” for their content “regardless of how the content was created,” as Dietterich put it. That means researchers are still responsible if they copy and paste “inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content” directly from an LLM.
Dietterich told 404 Media that this would be a “one strike” rule, but a moderator would have to flag the issue, and the section chair would have to review the evidence before imposing a penalty. Authors can appeal the decision.
A recent peer-reviewed study found that fabricated citations in biomedical research are on the rise because of LLMs. To be fair, scientists aren’t the only ones who have been caught using AI-generated quotes.