
Flash floods are one of the world’s deadliest weather phenomena, claiming more than 5,000 lives every year. They are also among the most difficult to predict. But Google may have tackled this problem in an unexpected way: by reading the news.
Although humans have collected vast amounts of meteorological data, flash floods are too brief and too localized to be measured comprehensively the way temperatures or river flows are monitored over time. That data gap means the deep learning models that are increasingly capable of predicting the weather have been unable to predict flash floods.
To close that gap, Google researchers used Gemini, the company’s large language model, to classify 5 million news articles from around the world, isolate 2.6 million distinct reports of flooding, and transform those reports into a geotagged time series called Groundsource. According to Google Research product manager Gila Loike, this is the first time the company has used language models for this kind of task. The study and dataset were shared publicly Thursday morning.
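To picture the Groundsource idea, here is a minimal sketch of the aggregation step, assuming a hypothetical FloodReport record produced by an upstream LLM classification pass; the field names, grid size, and pooling scheme are illustrative, not Google’s published pipeline or schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical record: one news article already labeled by an LLM as
# reporting (or not reporting) a flash flood, with a coarse geotag.
@dataclass
class FloodReport:
    published: date
    latitude: float
    longitude: float
    is_flood: bool

def to_time_series(reports, cell_deg=0.5):
    """Aggregate labeled reports into counts per (grid cell, day).

    Only an illustration of a 'geotagged time series'; the real
    Groundsource resolution and format are not described in the article.
    """
    series = Counter()
    for r in reports:
        if not r.is_flood:
            continue
        # Snap coordinates to a coarse grid cell so nearby reports pool together.
        cell = (round(r.latitude / cell_deg) * cell_deg,
                round(r.longitude / cell_deg) * cell_deg)
        series[(cell, r.published)] += 1
    return series

if __name__ == "__main__":
    demo = [
        FloodReport(date(2025, 3, 2), -19.84, 34.84, True),
        FloodReport(date(2025, 3, 2), -19.80, 34.90, True),
        FloodReport(date(2025, 3, 3), -19.84, 34.84, False),  # judged not a flood
    ]
    for (cell, day), count in sorted(to_time_series(demo).items()):
        print(cell, day, count)
```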
Using Groundsource as real-world ground truth, the researchers trained a model built on long short-term memory (LSTM) neural networks to ingest global weather forecasts and generate flash flood probabilities for specific regions.
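For a rough sense of that architecture, here is a minimal PyTorch sketch of an LSTM that maps a window of forecast features for one location to a flood probability; the feature count, window length, and layer sizes are assumptions for illustration, not details of Google’s model.

```python
import torch
from torch import nn

class FlashFloodLSTM(nn.Module):
    """Toy LSTM mapping a window of forecast features to a flood probability."""

    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x: (batch, timesteps, n_features), e.g. daily precipitation, soil
        # moisture, and other forecast variables for each location.
        _, (h_n, _) = self.lstm(x)
        # Use the final hidden state to score flood risk for the window.
        return torch.sigmoid(self.head(h_n[-1]))

model = FlashFloodLSTM()
window = torch.randn(4, 7, 8)   # 4 locations, 7 forecast days, 8 features each
probs = model(window)           # shape (4, 1), values in [0, 1]
# Training would fit these probabilities against Groundsource flood labels,
# e.g. with nn.BCELoss().
print(probs.shape)
```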
Google’s flash flood prediction model is now highlighting risks in urban areas in 150 countries on the company’s Flood Hub platform and sharing that data with emergency response agencies around the world. António José Beleza, an emergency response officer at the Southern African Development Community who tested the prediction model with Google, said it helped his organization respond more quickly to flooding.
The model still has limitations. Its resolution is coarse: it identifies hazards only down to areas of roughly 20 square kilometers. And it is not as accurate as the National Weather Service’s flood warning system, in part because Google’s model doesn’t incorporate local radar data that can track precipitation in real time.
But importantly, the project is designed to work in places where local governments cannot afford expensive weather-sensing infrastructure or do not have extensive weather records.
“Because we’re aggregating millions of reports, the Groundsource dataset really helps us rebalance the map,” Juliet Rothenberg, a program manager on the Google Resilience team, told reporters this week. “This allows us to extrapolate to other areas where we don’t have as much information.”
Rothenberg said the team hopes the approach of using an LLM to build quantitative datasets from written, qualitative sources can be applied to other phenomena that are transient but important for forecasting, such as heat waves and mudslides.
Marshall Moutenot, CEO of Upstream Tech, a company that uses similar deep learning models to predict river flow for customers such as hydroelectric companies, said Google’s contribution is part of a growing effort to collect data for deep learning-based weather forecast models. Moutenot co-founded Dynamical.org, a group that curates collections of machine learning-enabled weather data for researchers and startups.
“Data scarcity is one of the most difficult challenges in geophysics,” Moutenot said. “At the same time, there’s so much Earth data out there and not enough when you want to weigh it against the truth. This was a really creative approach to getting that data.”