
As LLMs become more powerful, avoiding hallucinations has proven stubbornly difficult. Even the smartest models make errors, and while there are ways to catch these errors, the industry is still figuring out the best method.
Maybe, a company that has raised $9 million in seed funding from Andreessen Horowitz, is trying to build a more rigorous way to catch such errors.
As founder Peter Elias (pictured above) puts it, the company’s goal is to keep hallucinations and simple factual errors from reaching users and to achieve 99.99% accuracy, a level that is common in deterministic systems but much harder to reach in AI. Bringing LLMs to that level of accuracy requires rethinking many of the fundamental assumptions of AI engineering.
Maybe’s first product is a data science tool built to generate fast answers from complex data sets. Each result comes with citations and an audit trail showing how it was derived, a practice that is increasingly common in AI tools.
But keeping errors out of these results required a sophisticated harness that Elias describes as a “data science machine suit.” The LLM’s first-pass answers are checked by a deterministic validator system, which sends back any results that do not match the dataset. Crucially, the LLM is paired with trained verifiers, and the entire system is optimized for fast and accurate answers, the company said.
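The check-and-return loop described above can be sketched in a few lines of Python. This is an illustration only, not Maybe’s implementation: the `Claim` shape, the `validate` and `answer_with_checks` functions, and the `draft` callable standing in for the model are all hypothetical names, and the sketch assumes answers can be reduced to simple aggregates over a column.

```python
import math
from dataclasses import dataclass
from typing import Callable

# Hypothetical shape for one numeric statement in a first-pass answer.
@dataclass(frozen=True)
class Claim:
    column: str
    aggregate: str  # one of the keys in AGGREGATES
    value: float

# Deterministic recomputations the validator is allowed to perform.
AGGREGATES = {
    "sum": sum,
    "mean": lambda xs: sum(xs) / len(xs),
    "min": min,
    "max": max,
    "count": len,
}

def validate(claims: list[Claim], rows: list[dict]) -> list[Claim]:
    """Recompute every claim from the data and return the ones that do not match."""
    rejected = []
    for claim in claims:
        values = [row[claim.column] for row in rows if claim.column in row]
        fn = AGGREGATES.get(claim.aggregate)
        if fn is None or not values:
            # Unknown aggregate or unknown column: cannot be confirmed, so reject.
            rejected.append(claim)
            continue
        expected = fn(values)
        if not math.isclose(expected, claim.value, rel_tol=1e-9, abs_tol=1e-9):
            rejected.append(claim)
    return rejected

def answer_with_checks(
    draft: Callable[[list[Claim]], list[Claim]],
    rows: list[dict],
    max_rounds: int = 3,
) -> list[Claim]:
    """Ask `draft` (a stand-in for the model call) for claims until all of them validate.

    `draft` receives the claims rejected in the previous round, so it can revise them.
    """
    rejected: list[Claim] = []
    for _ in range(max_rounds):
        claims = draft(rejected)
        rejected = validate(claims, rows)
        if not rejected:
            return claims
    raise ValueError("no fully validated answer within the round limit")
```

In this sketch the validator never trusts the model’s numbers: it recomputes each one from the rows and hands back only the mismatches, so nothing unverified is returned to the caller.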
“What we learned from building this is that the better the harness engineering, the weaker the model can be,” says Elias. “If you can refine the context sufficiently, the model doesn’t have to work as hard to do the right thing. Basically, it’s an exercise in reducing ambiguity.”
This allows Maybe’s data science tools to run on much smaller AI models. Elias says the current version runs on a model that is “four classes weaker than the frontier model.” This means it can run on local hardware (i.e. desktop computers instead of data centers), drastically reducing the token costs associated with using AI.
With token costs rising and many customers reevaluating their AI budgets, this is a welcome idea. And Elias’ ambitions don’t end with data science: the same engine can be extended to use cases like accounting or healthcare (“any use case that is sensitive to precision,” as Elias puts it).
“I think it’s really interesting that no large AI lab has even attempted this,” says Elias. “It’s better for them not to, because the more times you have to modify your model, the more money they make.”