What a Wildfire Algorithm and a Supply Chain Model Have in Common

The organizations that win at AI are the ones that build data engineering infrastructure that can handle heterogeneous, imperfect, multi-source data — not the ones that wait for perfect data before starting.

Beginning in July 2021, the Dixie Fire burned across five California counties, becoming the largest single fire in California history at the time. During its 104-day progression, the U.S. Forest Service's fire behavior analysts combined data sources with very different cadences and formats: satellite imagery updated every 12 minutes, weather station data updated every hour, lidar terrain models updated annually, historical fire behavior records spanning 40 years in inconsistent formats, and real-time ground observations that incident commanders transmitted by radio.

Integrating these data sources into a coherent fire spread model that supports real-time suppression and evacuation decisions is one of the hardest data engineering problems in applied AI. The data is heterogeneous in format, inconsistent in update frequency, variable in spatial resolution, and incomplete in ways that are not random — the areas with the least data coverage are often the most remote areas where fires are most likely to start. It is the Bronze-Silver-Gold problem in its most extreme, highest-stakes form.

The Wildfire Data Stack

The Bronze layer of wildfire prediction data is a heterogeneous collection of raw sources that would be unrecognizable as a unified dataset to anyone who had not spent years building the pipelines to connect them. Satellite imagery from NOAA's GOES-West and NASA's Aqua satellites arrives as raw radiance measurements in scientific data formats. Weather station data arrives as comma-separated text files in dozens of slightly different schemas from stations operated by different agencies. Terrain models are stored as raster files in geographic coordinate systems that require projection transformation before they can be combined with other data. Historical fire records exist in paper documents that have been partially digitized at varying quality levels.

The Silver layer performs the transformations that make these sources combinable: imagery is converted from radiance to temperature and vegetation index values and resampled to a common spatial grid; weather data is quality-controlled, spatially interpolated to fill gaps between stations, and harmonized across the different schemas; terrain models are used to derive the slope, aspect, and fuel moisture holding capacity values that are meaningful for fire behavior; historical records are digitized, geocoded, and normalized to a common event schema.
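As a rough illustration of the weather harmonization step, the sketch below maps two agency CSV layouts onto one common schema and blanks values that fail a plausibility check. The agency names, column names, and bounds are all hypothetical, chosen only for this example; a real pipeline would have many more schemas and far richer quality control.

```python
import csv
import io

# Hypothetical per-agency mappings: source column -> (common field, converter).
SCHEMAS = {
    "agency_a": {
        "station": ("station_id", str),
        "temp_f": ("temp_c", lambda v: (float(v) - 32.0) * 5.0 / 9.0),
        "wind_mph": ("wind_ms", lambda v: float(v) * 0.44704),
    },
    "agency_b": {
        "id": ("station_id", str),
        "air_temp_c": ("temp_c", float),
        "wind_speed_ms": ("wind_ms", float),
    },
}

# Illustrative plausibility bounds used as a crude quality-control check.
BOUNDS = {"temp_c": (-60.0, 60.0), "wind_ms": (0.0, 100.0)}


def harmonize(raw_csv, agency):
    """Convert one agency's CSV text into rows that share a common schema."""
    mapping = SCHEMAS[agency]
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        out = {"source": agency}
        for src_col, (field, convert) in mapping.items():
            raw = (record.get(src_col) or "").strip()
            try:
                value = convert(raw) if raw else None
            except ValueError:
                value = None  # unparseable reading is treated as missing
            bounds = BOUNDS.get(field)
            if value is not None and bounds is not None:
                if not (bounds[0] <= value <= bounds[1]):
                    value = None  # fails QC: keep the row, blank the value
            out[field] = value
        rows.append(out)
    return rows


if __name__ == "__main__":
    sample_a = "station,temp_f,wind_mph\nA1,212,10\nA2,68,\n"
    sample_b = "id,air_temp_c,wind_speed_ms\nB7,21.5,3.2\n"
    for row in harmonize(sample_a, "agency_a") + harmonize(sample_b, "agency_b"):
        print(row)
```

Keeping a failed reading as an explicit missing value, rather than dropping the row, preserves the information that the station reported at that time, which matters for any later gap-filling step.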

The Gold layer is purpose-built for specific decision support applications: a fire spread probability surface for incident commander briefings, updated every 30 minutes; a structure vulnerability index for evacuation prioritization; a resource allocation optimization model for air tanker and crew deployment. Each Gold dataset is designed around the specific decision it supports and the specific consumer who uses it.

The Imperfect Data Framework

The most important lesson from wildfire AI is that waiting for perfect data is not an option, and it should not be an option in enterprise AI either. The Forest Service does not wait for complete satellite coverage before modeling fire spread. It builds uncertainty quantification into its models, expresses predictions for areas with incomplete data as confidence intervals rather than single hard values, and designs its decision support tools to communicate data quality explicitly to the human decision-makers who use them.

This is the mature posture toward imperfect data: not to treat data gaps as a barrier to AI deployment, but to model the gaps explicitly and build systems that help human decision-makers understand where the AI is confident and where it is not.
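One simple way to model gaps explicitly is to attach an interval to every estimate and let it widen as observations thin out. The sketch below is a minimal illustration using a normal approximation; the `Estimate` type and `estimate_with_interval` function are hypothetical names, and for very small samples a t-distribution or a domain-specific uncertainty model would be more appropriate.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Estimate:
    mean: Optional[float]
    low: Optional[float]
    high: Optional[float]
    n_obs: int
    note: str


def estimate_with_interval(values: Sequence[float], z: float = 1.96) -> Estimate:
    """Return a mean with an approximate interval, or flag insufficient data."""
    n = len(values)
    if n < 2:
        # With zero or one observation there is no spread to estimate,
        # so the gap is reported instead of a falsely precise number.
        mean = float(values[0]) if n == 1 else None
        return Estimate(mean, None, None, n, "insufficient data: no interval")
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(n)
    return Estimate(mean, mean - half_width, mean + half_width, n, "ok")


if __name__ == "__main__":
    dense = [10, 12, 11, 13, 9, 11, 10, 12]  # well-covered area
    sparse = [10, 13]                        # remote area, few readings
    for name, obs in [("dense", dense), ("sparse", sparse), ("empty", [])]:
        print(name, estimate_with_interval(obs))
```

The point of the design is that the consumer always receives the observation count and a note alongside the number, so a sparse cell cannot be mistaken for a well-measured one.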

A Risk-Weighted Approach to Bronze-Layer Data Triage

1. Identify the highest-consequence decisions your AI will support and the data sources those decisions depend on most heavily. These are your highest-priority data quality investments: not the sources that are easiest to improve, but the ones where improvement most reduces decision risk.
2. For each critical data source, assess two dimensions: completeness (what percentage of relevant events are captured?) and accuracy (when events are captured, how reliable are the measurements?). These two dimensions require different interventions.
3. Build explicit data quality metadata into every Bronze table: a completeness score, an accuracy assessment, the date of the last quality audit, and the known systematic gaps. Make this metadata visible to every downstream consumer, including AI models.
4. Design Gold tables and AI model outputs to communicate uncertainty proportional to data quality. A prediction made on complete, high-accuracy data should communicate higher confidence than a prediction made on sparse, uncertain data.
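The steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `BronzeQuality` record, the `decision_consequence` weight, the multiplicative quality score, and the 0.8 and 0.5 thresholds are hypothetical choices, not an established standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class BronzeQuality:
    """Quality metadata attached to one Bronze table (hypothetical layout)."""
    table: str
    completeness: float   # share of relevant events captured, 0..1
    accuracy: float       # reliability of captured measurements, 0..1
    last_audit: date
    known_gaps: List[str] = field(default_factory=list)
    decision_consequence: float = 1.0  # assumed weight: higher = costlier errors

    def quality(self) -> float:
        # Simple combined score; real programs may weight the two differently.
        return self.completeness * self.accuracy


def triage(sources: List[BronzeQuality]) -> List[BronzeQuality]:
    """Rank sources by consequence-weighted quality shortfall, worst first."""
    return sorted(
        sources,
        key=lambda s: s.decision_consequence * (1.0 - s.quality()),
        reverse=True,
    )


def confidence_label(inputs: List[BronzeQuality]) -> str:
    """Label a prediction by the weakest source it depends on."""
    weakest = min(s.quality() for s in inputs)
    if weakest >= 0.8:
        return "high"
    if weakest >= 0.5:
        return "medium"
    return "low"


if __name__ == "__main__":
    inventory = BronzeQuality("inventory", 0.95, 0.90, date(2024, 3, 1),
                              decision_consequence=2.0)
    suppliers = BronzeQuality("supplier_feed", 0.60, 0.80, date(2023, 11, 15),
                              known_gaps=["small suppliers report monthly"],
                              decision_consequence=5.0)
    print([s.table for s in triage([inventory, suppliers])])
    print(confidence_label([inventory, suppliers]))
```

Ranking by consequence times shortfall, rather than by shortfall alone, reflects the first step: a mediocre source behind a high-stakes decision outranks a poor source behind a trivial one.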

Uniform confidence across every output of an AI system, regardless of the quality of the underlying data, is a red flag that uncertainty is being hidden, not resolved.

The Supply Chain Connection

The parallel to supply chain AI is exact. Supply chain models depend on data from suppliers who report at different frequencies in different formats with different levels of accuracy, from logistics providers who have varying levels of EDI integration, from demand signals that are sparse in some categories and dense in others, and from inventory systems that may be accurate at the time of a physical count and progressively less accurate until the next count.
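The inventory case lends itself to a small worked example. The sketch below assumes, purely for illustration, that uncertainty in a system quantity grows linearly with the days since the last physical count; the `inventory_interval` function and the `drift_per_day` rate are hypothetical, and a real rate would need to be estimated from count reconciliation history.

```python
from datetime import date
from typing import Tuple


def inventory_interval(
    system_qty: float,
    last_count: date,
    today: date,
    drift_per_day: float = 0.002,  # assumed 0.2% of quantity per day
) -> Tuple[float, float]:
    """Return a (low, high) band around the system quantity.

    The band is zero-width on the day of a physical count and widens
    linearly until the next count resets it.
    """
    days_since_count = max((today - last_count).days, 0)
    half_width = system_qty * drift_per_day * days_since_count
    return (max(system_qty - half_width, 0.0), system_qty + half_width)


if __name__ == "__main__":
    counted_on = date(2024, 1, 1)
    for check_day in (date(2024, 1, 1), date(2024, 1, 31), date(2024, 3, 31)):
        low, high = inventory_interval(1000, counted_on, check_day)
        print(check_day, round(low, 1), round(high, 1))
```

A replenishment model that consumes the band instead of the single system quantity can then be more cautious about items that have not been counted recently.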

Supply chain organizations that have built effective AI — Amazon, Walmart, Zara — treat their data supply chain with the same engineering rigor that the Forest Service applies to its wildfire data: explicit quality metadata, uncertainty quantification in model outputs, and investment in data quality proportional to the consequence of the decisions those data sources support. The method is identical. The domain is different.

The Universal Lesson

You will never have perfect data. No one does. The question is not whether your data is perfect but whether you have built the data infrastructure to know exactly how imperfect it is, in what ways, and with what consequences for the AI models that consume it.

The Forest Service knows its wildfire data has gaps. It models those gaps. It communicates them. It makes better decisions as a result. Enterprise AI programs that pretend their data is better than it is produce AI systems that fail in production. Enterprise AI programs that honestly characterize their data's limitations produce systems that work, within those limitations, and that earn the organizational trust to be improved over time.

The Dixie Fire was eventually contained. The data infrastructure that supported that effort was not perfect. It was honest about its imperfections, designed to function despite them, and continuously improving. That is the model for enterprise AI that works in the real world: not the demonstration environment, not the controlled pilot, but the messy, heterogeneous, imperfect world that every production AI system eventually has to face.
