A collection of climate, air pollution, disaster, and natural resource data for the US and the world

Climate data

gridMET (historical)

gridMET is a dataset of daily, high-spatial-resolution (~4 km, 1/24th degree) surface meteorological data covering the contiguous US from 1979 to yesterday (updated daily). The dataset blends the high-spatial-resolution PRISM data with the high-temporal-resolution data from the North American Land Data Assimilation System (NLDAS). See the website.

  • Coverage: The Contiguous United States
  • Temporal resolution: Daily
  • Spatial resolution: ~4km * 4km (1/24 degree)
  • Variables: Maximum temperature, minimum temperature, precipitation accumulation, downward surface shortwave radiation, wind velocity, humidity (maximum and minimum relative humidity and specific humidity), and other derived variables.

Related Reference

  • Noah S. Diffenbaugh, Frances V. Davenport and Marshall Burke, 2021. Historical warming has increased US crop insurance losses. Environmental Research Letters. Link.
  • David B. Lobell and Jennifer A. Burney, 2021. Cleaner air has contributed one-fifth of US maize and soybean yield gains since 1999. Environmental Research Letters. Link.

MACA (projected)

Multivariate Adaptive Constructed Analogs (MACA) is a statistical method for downscaling Global Climate Models (GCMs) from their native coarse resolution to a higher spatial resolution that reflects both observed patterns of daily near-surface meteorology and simulated changes in GCM experiments. This method has been shown to be slightly preferable to direct daily interpolated bias correction in regions of complex terrain because it uses a historical library of observations and a multivariate approach.

The dataset covers 20 CMIP5 GCMs that provided daily output for the historical period (1950-2005) and for future projections (2006-2100) under RCP4.5 and RCP8.5. See its website.

  • Coverage: The Contiguous United States
  • Temporal resolution: Daily
  • Spatial resolution: ~4km * 4km (1/24 degree)
  • Variables: Maximum temperature, minimum temperature, maximum and minimum daily relative humidity, average daily specific humidity near surface, average daily precipitation, average daily downward shortwave radiation, average daily wind speed, average daily eastward component of wind, and average daily northward component of wind.

Note that although the dataset is bias-corrected, the correction serves only to downscale the GCM outputs to high-resolution data. In empirical climate change assessments, the projections should not be compared directly with the historical observations used to estimate your model, because that would mix GCM-based data with observed data. Instead, compare the GCM-based projected data with the GCM-based historical data and calculate the changes (i.e., the delta).
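The delta calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the MACA authors' procedure: the function name, the sample values, and the additive/multiplicative switch are all assumptions made for the example. An additive delta is commonly used for temperature and a ratio for precipitation.

```python
from statistics import mean


def delta_change(gcm_hist, gcm_future, additive=True):
    """Change implied by a GCM between its own historical and future runs.

    additive=True  -> future mean minus historical mean (e.g. temperature)
    additive=False -> future mean divided by historical mean (e.g. precipitation)
    """
    hist_mean = mean(gcm_hist)
    future_mean = mean(gcm_future)
    if additive:
        return future_mean - hist_mean
    return future_mean / hist_mean


# Hypothetical values for one grid cell, in degrees Celsius.
gcm_hist_tmax = [29.0, 30.0, 31.0]    # GCM-based historical period
gcm_future_tmax = [31.5, 32.5, 33.5]  # GCM-based projection
observed_tmax = [28.0, 29.5, 30.5]    # observed data used to estimate the model

# The delta comes from GCM-vs-GCM only; it is then applied to observations.
delta = delta_change(gcm_hist_tmax, gcm_future_tmax)
projected_tmax = [t + delta for t in observed_tmax]

print(delta)           # 2.5
print(projected_tmax)  # [30.5, 32.0, 33.0]
```

The point of the design is that observed data never enter the delta itself, so any systematic gap between the GCM and the observations cancels out.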

WorldClim (historical + projected)

WorldClim is a database of high spatial resolution global weather and climate data. You can download gridded weather and climate data for historical (near current) and future conditions.

Historical data

The newest version was released in Jan 2020. Download here.

  • Coverage: Global
  • Temporal resolution: Monthly
  • Spatial resolution: Available at four spatial resolutions, from 30 seconds (~1 km²) to 10 minutes (~340 km²).
  • Variables: Maximum temperature, minimum temperature, total precipitation, solar radiation, wind speed, and water vapor pressure.
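The resolutions above are angular, so the ground size of a cell depends on latitude. The sketch below gives a rough conversion from arc-seconds to kilometers, assuming a spherical Earth with a mean radius of 6371 km; the function name is hypothetical and the results are approximations only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def cell_size_km(arc_seconds, latitude_deg=0.0):
    """Approximate (north-south, east-west) size in km of one grid cell."""
    degrees = arc_seconds / 3600.0
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = degrees * km_per_degree
    # Meridians converge toward the poles, so east-west width shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


resolutions = [("30 seconds", 30), ("2.5 minutes", 150),
               ("5 minutes", 300), ("10 minutes", 600)]
for label, arc_seconds in resolutions:
    ns, ew = cell_size_km(arc_seconds)
    print(f"{label}: {ns:.1f} km x {ew:.1f} km, about {ns * ew:.0f} km^2")
```

At the equator a 10-minute cell comes out at roughly 18.5 km per side, or about 340 km² in area; passing `latitude_deg=40` shows how much narrower the same cell is at mid-latitudes.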

Projected data

The projected data are CMIP6 downscaled future climate projections. The downscaling and calibration (bias correction) was done with WorldClim v2.1 as baseline climate.

The monthly values are averages over 20-year periods (2021-2040, 2041-2060, 2061-2080, 2081-2100). The following spatial resolutions are available (expressed as minutes of a degree of longitude and latitude): 10 minutes, 5 minutes, 2.5 minutes, and 30 seconds (click the links to download).

  • Coverage: Global
  • Temporal resolution: Monthly
  • Spatial resolution: Available at four spatial resolutions, from 30 seconds (~1 km²) to 10 minutes (~340 km²).
  • Variables: Maximum temperature, minimum temperature, and total precipitation
  • Scenarios: SSP126, SSP245, SSP370, and SSP585.
  • GCMs: 23 models.

Related Reference

  • Shuai Chen and Binlei Gong, 2021. Response and adaptation of agriculture to climate change: Evidence from China. Journal of Development Economics. Link.

NEX-GDDP from NASA

NEX-GDDP stands for the NASA Earth Exchange Global Daily Downscaled Projections. The dataset includes downscaled projections for RCP 4.5 and RCP 8.5 from the 21 models for which daily scenarios were produced and distributed under CMIP5. Each of the climate projections includes daily maximum temperature, minimum temperature, and precipitation for the period from 1950 through 2100. The spatial resolution of the dataset is 0.25 degrees (~25 km x 25 km). See the link for more info.

Note that a newer dataset developed for CMIP6 is also available. It covers more climate models (35) and is generated for four SSP scenarios (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). In addition to temperature and precipitation, it includes climate variables such as near-surface relative humidity, specific humidity, radiation, and wind speed. See the link for more info.

  • Coverage: Global
  • Temporal resolution: Daily
  • Spatial resolution: 0.25° * 0.25° (~25km * 25km)
  • Variables: Daily maximum temperature, minimum temperature, and precipitation
  • Scenarios: RCP4.5 and RCP8.5
  • GCMs: 21 models.

Related Reference

  • Garth Heutel, Nolan H. Miller, David Molitor; Adaptation and the Mortality Effects of Temperature across U.S. Climate Regions. The Review of Economics and Statistics 2021; 103 (4): 740–753. link.

ERA5-Land Hourly Data (historical)

ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. The most attractive feature of ERA5-Land is that it provides hourly climate data at a fine spatial resolution. See the link for more details.

  • Coverage: Global

  • Temporal resolution: Hourly

  • Spatial resolution: 0.1° * 0.1° (~9km * 9km)

  • Variables: Wind speed, temperature, evaporation, lake temperature, potential evaporation, runoff, snowfall, snowmelt, soil temperature, total precipitation, etc.

Related Reference

  • Dylan Hogan and Wolfram Schlenker. ERA5-Land and GMFD Uncover The Effect of Daily Temperature Extremes On US Agricultural Yield. Working Paper. link.

Air pollution data

While the focus of this post is climate data and natural disaster data, I will also try to cover some air pollution data. Empirically, we would prefer “real” air pollution data, i.e., readings from monitor stations. The problem is that those data are usually not consistent in spatial and temporal coverage. For instance, monitors tend to be placed in urban areas with high pollution density, whereas monitors in rural areas are quite sparse. Also, due to regular maintenance and unexpected events, monitors may have substantial gaps in their records.

To address the above concerns, practitioners could turn to model-based air pollution data. One option is MERRA-2, which provides a long list of air pollutants at a wide variety of spatial and temporal resolutions.

Besides the model-generated data, I will also cover alternative data sources, especially those featuring machine learning. A recent study integrates multiple machine learning algorithms and predictor variables (e.g., satellite data, meteorological variables, land-use variables, chemical transport model predictions, and reanalysis datasets) to estimate daily PM2.5 at a resolution of 1km * 1km across the contiguous United States.

  • Coverage: Contiguous United States

  • Temporal resolution: Daily

  • Spatial resolution: 1km * 1km

  • Variables: PM2.5

Related Reference

  • Qian Di et al. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environment International 2019; 130: 104909. link.

  • A. Patrick Behrer and David Lobell. Higher levels of no-till agriculture associated with lower PM2.5 in the Corn Belt. Environmental Research Letters 2022; 17: 094012. Link.

Natural disaster data

This collection includes data on, but is not limited to, storms, wildfires, floods, hurricanes, hail, and tornadoes.

EM-DAT: A complete dataset for disasters

This database documents a long list of disasters globally, including natural disasters, technological disasters (e.g., transport accidents and industrial accidents), and complex disasters (e.g., famine).

The database has several levels of data entry. The disaster level records general information about the disaster, including its type (e.g., earthquake, storm, flood) and group (e.g., geophysical, meteorological, hydrological). The country level provides information on the specific location of the disaster and the temporal information (i.e., starting and ending date) as well as the magnitude scale of the disaster. Level three data provide the source of information, human impact, and economic impacts.

Note that the disasters are collected into the database if they meet one of the following criteria:

  • Deaths: 10 or more deaths.

  • Affected: 100 or more people affected, injured, or made homeless.

  • Declaration/international appeal: Declaration by the country of a state of emergency and/or an appeal for international assistance.
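The inclusion rule above is a simple "any of" test. As a minimal sketch, it could be applied to your own event records as follows. The record class and field names are hypothetical and are not EM-DAT column names.

```python
from dataclasses import dataclass


@dataclass
class DisasterRecord:
    # Hypothetical fields for illustration; not the EM-DAT schema.
    deaths: int = 0
    affected: int = 0  # people affected, injured, or made homeless
    emergency_declared: bool = False
    international_appeal: bool = False


def meets_inclusion_criteria(record: DisasterRecord) -> bool:
    """True if the record satisfies at least one of the listed criteria."""
    return (
        record.deaths >= 10
        or record.affected >= 100
        or record.emergency_declared
        or record.international_appeal
    )


events = [
    DisasterRecord(deaths=12),
    DisasterRecord(deaths=3, affected=40),
    DisasterRecord(emergency_declared=True),
]
print([meets_inclusion_criteria(e) for e in events])  # [True, False, True]
```

A practical implication is that small events falling below all thresholds are absent from the database, which matters if your analysis depends on low-intensity disasters.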

The dataset is publicly accessible, but you need to register an account on the website first.

NOAA NCEI Storm Events Database

The Storm Events Database covers 48 event types (mostly storms) from January 1950 to January 2022, as entered by NOAA’s National Weather Service (NWS). More importantly, it also records the damage caused by each event, such as deaths, injuries, and property damage, among other measures. Refer to the Database Details page for more information.

  • Coverage: US
  • Time period: Jan 1950 to Jan 2022 (regularly updated)
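When working with the damage fields, one practical step is converting them to numbers. The sketch below assumes that damage amounts are stored as strings with a magnitude suffix (for example "10.00K" or "1.5M"); that assumption should be checked against the Database Details page and the files you download. The function name is hypothetical.

```python
# Assumed suffix convention: K = thousand, M = million, B = billion.
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(value):
    """Convert a damage string such as '10.00K' to dollars.

    Returns None for missing or empty values so that gaps are not
    silently treated as zero damage.
    """
    if value is None:
        return None
    text = value.strip().upper()
    if not text:
        return None
    suffix = text[-1]
    if suffix in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[suffix]
    return float(text)


for raw in ["10.00K", "1.5M", "250", ""]:
    print(repr(raw), "->", parse_damage(raw))
```

Keeping missing values as `None` rather than zero avoids understating damages when events are aggregated by county or year.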

Related Reference

  • Deryugina, T., & Marx, B. 2021. Is the Supply of Charitable Donations Fixed? Evidence from Deadly Tornadoes. American Economic Review: Insights. Link.

International Best Track Archive for Climate Stewardship (IBTrACS)

IBTrACS is the most complete global collection of tropical cyclones available. It creates a unified, publicly available, best-track dataset that improves inter-agency comparisons. IBTrACS was developed collaboratively with all the World Meteorological Organization (WMO) Regional Specialized Meteorological Centres, as well as other organizations and individuals from around the world, and is maintained by NOAA.

See its page for more information.

There are three key variables in the dataset: maximum sustained wind speed (in knots), minimum central pressure (in mb), and storm center of circulation (in degrees lat/lon). The data are recorded at a 3-hourly temporal resolution and a spatial resolution of 0.1 degree (~10 km). The record goes back as far as the 1920s, but the data are more reliable after the 1950s.

The data are provided in the format of CSV, netCDF, and shapefiles.
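Since wind speeds are reported in knots, a unit conversion is often the first step before combining the tracks with other data. The helper names below are hypothetical; the conversion factor follows from the definition of a knot as one nautical mile (1852 m) per hour.

```python
KNOT_TO_KMH = 1.852                  # 1 nautical mile = 1.852 km
KNOT_TO_MS = 1852.0 / 3600.0         # meters per second per knot


def knots_to_ms(speed_knots):
    """Convert a wind speed from knots to meters per second."""
    return speed_knots * KNOT_TO_MS


def knots_to_kmh(speed_knots):
    """Convert a wind speed from knots to kilometers per hour."""
    return speed_knots * KNOT_TO_KMH


# Hypothetical maximum sustained wind speeds along one track, in knots.
track_wind_knots = [35, 64, 100]
for w in track_wind_knots:
    print(f"{w} kt = {knots_to_ms(w):.1f} m/s = {knots_to_kmh(w):.1f} km/h")
```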

Related Reference

  • Bao, X., Sun, P., Li, J. 2022. The impacts of tropical storms on food prices: Evidence from China. American Journal of Agricultural Economics. Link.

The Tropical Cyclone Extended Best Track Dataset (EBTRK)

While IBTrACS is a global collection of tropical cyclones, EBTRK focuses primarily on the United States. It contains all Atlantic tropical cyclones since 1851, including estimates of the latitude, longitude, 1-minute maximum sustained surface winds, etc., at 6-hour intervals. The dataset is “extended” in the sense that it supplements the original storm parameters with additional ones, including the following:

  • Maximum radial extent of 34, 50 and 64 knot wind speeds in four quadrants for the years 1988 – 2003
  • Radius of maximum wind speed
  • Eye diameter (if available)
  • Pressure and radius of the outer closed isobar

See its page for more information.

The data were designed for easy importation into spreadsheets.

Related Reference

  • Zivin, G.J., Liao, Y., Panassie, Y. 2023. How hurricanes sweep up housing markets: Evidence from Florida. Journal of Environmental Economics and Management. Link.

Fire and Resource Assessment Program (FRAP) in California

This program provides the most complete digital record of fire perimeters in California. For each fire, the data include the year, the cause, and a GIS layer depicting the area burned. See the website for more details. The data are regularly updated. As of May 2022, the newest data are from 2017.

Besides the fire perimeter data, the program also provides projected (2030-2050) fire probability and information on properties at high risk of fire.

Related Reference

  • Yanjun Liao and Carolyn Kousky. 2022. The Fiscal Impacts of Wildfires on California Municipalities. Journal of the Association of Environmental and Resource Economists. Link.

FireCCILT11 Long-term Burned Area Dataset

This dataset provides the long-term beta global burned area (BA) dataset generated by the Fire_cci project. The scope of the project is to provide long-term burned area information for global vegetation and atmospheric modelers. Go to this link to download.

  • Coverage: Global
  • Time period: 1982 to 2018
  • Spatial resolution: 5km * 5km and 25km * 25km

Related Reference

  • Hemant Pullabhotla, Mustafa Zahid, Sam Heft-Neal, Vaibhav Rathi, and Marshall Burke. 2022. Global biomass fires and infant mortality. Submitted to Nature.

Natural resource data

FAO Land Data - Global Agro-Ecological Zones (GAEZ v4)

The Food and Agriculture Organization of the United Nations (FAO) and the International Institute for Applied Systems Analysis (IIASA) have cooperated over several decades to develop and implement the Agro-Ecological Zones (AEZ) modelling framework and databases. The GAEZ v4 spatial data are organized in six themes: (1) Land and Water Resources, (2) Agro-climatic Resources, (3) Agro-climatic Potential Yield, (4) Suitability and Attainable Yield, (5) Actual Yields and Production, and (6) Yield and Production Gaps. A wide range of variables is available in the dataset, including land quality, land cover, soil resources, soil suitability, terrain resources, and water resources. See this link for more details.

  • Coverage: Global
  • Time period: Most recent
  • Spatial resolution: 1km * 1km and 10km * 10km

Related Reference

  • Adamopoulos, T. and Restuccia, D. 2021. Geography and agricultural productivity: Cross-country evidence from micro plot-level data. Review of Economic Studies. Link.
  • Adamopoulos, T., Brandt, L., and Leight, J. 2022. Misallocation, selection and productivity: A quantitative analysis with panel data from China. Econometrica.

Vegetation products

Landsat Normalized Difference Vegetation Index (NDVI)

The Landsat NDVI is one of the most widely used vegetation indices. It is produced from Landsat 4–5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8-9 Operational Land Imager (OLI)/Thermal Infrared Sensor (TIRS) Collection 1 and Collection 2 scenes that have been processed to Landsat Level-2 Surface Reflectance products. NDVI is calculated as (NIR - R) / (NIR + R), where NIR and R are the near-infrared and red reflectance. It is used to quantify vegetation greenness and is useful in understanding vegetation density and assessing changes in plant health. See this link for more details.
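NDVI is the normalized difference between near-infrared (NIR) and red (R) reflectance, (NIR - R) / (NIR + R). The sketch below computes it for a few pixels in plain Python. The reflectance values are made up for illustration, and in practice you would apply the same formula to whole raster bands.

```python
def ndvi(nir, red):
    """NDVI for one pixel from NIR and red surface reflectance.

    Returns None when NIR + red is zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical (NIR, red) reflectance pairs.
pixels = [
    (0.50, 0.10),  # dense green vegetation: high NIR, low red
    (0.30, 0.30),  # bare surface: similar NIR and red
    (0.0, 0.0),    # no signal
]
values = [ndvi(nir, red) for nir, red in pixels]
for (nir, red), v in zip(pixels, values):
    print(f"NIR={nir}, R={red} -> NDVI={v}")
```

Values lie between -1 and 1, with higher values indicating greener, denser vegetation.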

Besides the NDVI product, the USGS also hosts other widely used data sets, including the Enhanced Vegetation Index (EVI), Soil Adjusted Vegetation Index (SAVI), Modified Soil Adjusted Vegetation Index (MSAVI), Normalized Burn Ratio (NBR), and Normalized Difference Snow Index (NDSI). See here for more details.

Normally, we use the Level-3 data (also known as Level-2 Science Products or Analysis Ready Data at USGS), such as Landsat Surface Reflectance, Landsat Surface Temperature, and Landsat Surface Reflectance-Derived Spectral Indices (vegetation, moisture index, etc.).

NOAA Vegetation Products

The Office of Satellite and Product Operations at NOAA provides a list of vegetation products at various spatial (4km, 1km) and temporal (daily, weekly, bi-weekly) resolutions, including Vegetation Index (VI), Green Vegetation Fraction (GVF), Vegetation Health Product (VHP), and VIIRS Surface Reflectance (NDVI). See the link for more details.

Importantly, the Office of Satellite and Product Operations at NOAA also provides other satellite-based data, specifically Atmosphere Products (Aerosol, Precipitation, Clouds, Winds, Volcanic Ash, Ozone, Soundings, etc.), Imagery Products, Land Products (Fire and Smoke, Surface and Hydrology, Snow and Ice, Vegetation), and Ocean Products (Coral Bleaching, Ocean Heat Content, Marine Pollution, Sea Surface Temperature, Ocean Color).