A collection of climate, air pollution, disaster, and nature resources data for the US and the world

Climate data

gridMET (historical)

gridMET is a dataset of daily, high-spatial-resolution (~4-km, 1/24th degree) surface meteorological data covering the contiguous US from 1979 to yesterday (updated daily). The dataset blends PRISM data with the high-temporal-resolution data from the North American Land Data Assimilation System (NLDAS). See the website.

  • Coverage: The Contiguous United States
  • Temporal resolution: Daily
  • Spatial resolution: 4km * 4km (1/24 degree)
  • Variables: Maximum temperature, minimum temperature, precipitation accumulation, downward surface shortwave radiation, wind velocity, humidity (maximum and minimum relative humidity and specific humidity), and other derived variables.

Related reference

  • Noah S. Diffenbaugh, Frances V. Davenport and Marshall Burke, 2021. Historical warming has increased US crop insurance losses. Environmental Research Letters. Link.
  • David B. Lobell and Jennifer A. Burney, 2021. Cleaner air has contributed one-fifth of US maize and soybean yield gains since 1999. Environmental Research Letters. Link.

MACA (projected)

Multivariate Adaptive Constructed Analogs (MACA) is a statistical method for downscaling Global Climate Models (GCMs) from their native coarse resolution to a higher spatial resolution that reflects both observed patterns of daily near-surface meteorology and the changes simulated in GCM experiments. This method has been shown to be slightly preferable to direct daily interpolated bias correction in regions of complex terrain because it uses a historical library of observations and a multivariate approach.

The dataset includes 20 CMIP5 GCMs that provided daily output for the historical period (1950-2005) and for future projections (2006-2100) under RCP4.5 and RCP8.5. See its website.

  • Coverage: The Contiguous United States
  • Temporal resolution: Daily
  • Spatial resolution: 4km * 4km (1/24 degree)
  • Variables: Maximum temperature, minimum temperature, maximum and minimum daily relative humidity, average daily specific humidity near the surface, average daily precipitation, average daily downward shortwave radiation, average daily wind speed, average daily eastward component of wind, and average daily northward component of wind.

Note that although the dataset is bias-corrected, the correction serves only to downscale the GCM outputs to a higher resolution. When you use these data in an empirical climate change assessment, do not compare the projections directly with the historical observations used to estimate your model, because that mixes GCM-based data with observed data. Instead, compare the GCM-based projections with the GCM-based historical data and calculate the change (i.e., the delta).
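
To make the delta idea concrete, here is a minimal sketch for a single location. It assumes an additive delta (the usual choice for temperature; precipitation is often handled with a ratio instead), and the function names and toy numbers are hypothetical, not part of MACA.

```python
from statistics import mean


def climate_delta(gcm_hist, gcm_future):
    """Change simulated by the GCM: future mean minus historical mean."""
    return mean(gcm_future) - mean(gcm_hist)


def delta_adjusted(observed_hist, gcm_hist, gcm_future):
    """Add the GCM-simulated change to the observed historical mean."""
    return mean(observed_hist) + climate_delta(gcm_hist, gcm_future)


# Toy annual mean temperatures (degrees C) for one grid cell.
observed_hist = [20.0, 22.0]   # observations used to estimate the model
gcm_hist = [19.0, 21.0]        # GCM output for the same historical period
gcm_future = [22.0, 24.0]      # GCM output for the projection period

delta = climate_delta(gcm_hist, gcm_future)
projected = delta_adjusted(observed_hist, gcm_hist, gcm_future)
print(f"delta = {delta:.1f}, delta-adjusted projection = {projected:.1f}")
```

The point of the design is that the GCM's own historical run, not the observed record, is the reference for the change, so any remaining GCM bias cancels out of the delta.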

WorldClim (historical + projected)

WorldClim is a database of high spatial resolution global weather and climate data. You can download gridded weather and climate data for historical (near current) and future conditions.

Historical data

The newest version was released in Jan 2020. Download here.

  • Coverage: Global
  • Temporal resolution: Monthly
  • Spatial resolution: Available at four spatial resolutions, from 30 seconds (about 1 km² per cell at the equator) to 10 minutes (about 340 km² per cell).
  • Variables: Maximum temperature, minimum temperature, total precipitation, solar radiation, wind speed, and water vapor pressure.
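
WorldClim resolutions are angular, so the ground size of a cell depends on latitude. The sketch below converts an angular resolution to approximate kilometers; it assumes a spherical Earth with a 6371 km radius, and the function name is my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def cell_size_km(arc_seconds, latitude_deg=0.0):
    """Return the approximate (north-south, east-west) cell size in km."""
    degrees = arc_seconds / 3600.0
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = degrees * km_per_degree
    # Meridians converge toward the poles, so east-west size shrinks with latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


resolutions = [("30 seconds", 30), ("2.5 minutes", 150),
               ("5 minutes", 300), ("10 minutes", 600)]
for label, arc_seconds in resolutions:
    ns, ew = cell_size_km(arc_seconds)
    print(f"{label}: {ns:.2f} km x {ew:.2f} km at the equator")
```

Passing a latitude, e.g. `cell_size_km(600, latitude_deg=40)`, shows how much narrower the same cell is away from the equator.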

Projected data

The projected data are CMIP6 downscaled future climate projections. The downscaling and calibration (bias correction) were done with WorldClim v2.1 as the baseline climate.

The monthly values are averages over 20-year periods (2021-2040, 2041-2060, 2061-2080, 2081-2100). The following spatial resolutions are available (expressed as minutes of a degree of longitude and latitude): 10 minutes, 5 minutes, 2.5 minutes, and 30 seconds (click the links to download).

  • Coverage: Global
  • Temporal resolution: Monthly
  • Spatial resolution: Available at four spatial resolutions, from 30 seconds (about 1 km² per cell at the equator) to 10 minutes (about 340 km² per cell).
  • Variables: Maximum temperature, minimum temperature, and total precipitation
  • Scenarios: SSP126, SSP245, SSP370, and SSP585.
  • GCMs: 23 models.

Related Reference

  • Shuai Chen and Binlei Gong, 2021. Response and adaptation of agriculture to climate change: Evidence from China. Journal of Development Economics. Link.

NEX-GDDP from NASA

NEX-GDDP stands for the NASA Earth Exchange Global Daily Downscaled Projections. The dataset includes downscaled projections for RCP 4.5 and RCP 8.5 from the 21 models and scenarios for which daily scenarios were produced and distributed under CMIP5. Each of the climate projections includes daily maximum temperature, minimum temperature, and precipitation for the periods from 1950 through 2100. The spatial resolution of the dataset is 0.25 degrees (~25 km x 25 km). See the link for more info.

Note that a newer dataset developed for CMIP6 is also available. It covers more climate models (35) and four SSP scenarios (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). In addition to temperature and precipitation, it provides climate variables such as near-surface relative humidity, specific humidity, radiation, and wind speed. See the link for more info.

  • Coverage: Global
  • Temporal resolution: Daily
  • Spatial resolution: 0.25 * 0.25 degrees (~25km * 25km)
  • Variables: Daily maximum temperature, minimum temperature, and precipitation
  • Scenarios: RCP4.5 and RCP8.5
  • GCMs: 21 models.

Related Reference

  • Garth Heutel, Nolan H. Miller, David Molitor; Adaptation and the Mortality Effects of Temperature across U.S. Climate Regions. The Review of Economics and Statistics 2021; 103 (4): 740–753. link.

Air pollution data

While the focus of this post is climate data and natural disaster data, I will also try to cover some air pollution data. Empirically, we would prefer “real” air pollution data, i.e., readings from monitoring stations. The problem is that those data are usually not consistent across space and time. For instance, monitors tend to be placed in urban areas with high pollution levels, whereas monitors in rural areas are pretty limited. Also, because of regular maintenance and unexpected events, monitors can have large gaps in their records.

To address the above concerns, practitioners could turn to model-based air pollution data. One option is MERRA-2, which contains a long list of air pollutants that are available at a wide variety of spatial and temporal resolutions.

Besides the model-generated data, I will also cover alternative data sources, especially those featuring machine learning. A recent study integrates multiple machine learning algorithms and predictor variables (e.g., satellite data, meteorological variables, land-use variables, chemical transport model predictions, and reanalysis datasets) to estimate daily PM2.5 at a resolution of 1km * 1km across the contiguous United States.

  • Coverage: Contiguous United States
  • Temporal resolution: Daily
  • Spatial resolution: 1km * 1km
  • Variables: PM2.5

Related Reference

  • Qian Di et al. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environment International 2019; 130: 104909. link.
  • A Patrick Behrer and David Lobell. Higher levels of no-till agriculture associated with lower PM2.5 in the Corn Belt. Environmental Research Letters 2022; 17: 094012. Link.

Natural disaster data

This collection includes data on, but is not limited to, storms, wildfires, floods, hurricanes, hail, and tornadoes.

EM-DAT: A complete dataset for disasters

This database documents a long list of disasters globally, including natural disasters, technological disasters (e.g., transport accidents and industrial accidents), and complex disasters (e.g., famine).

The database has several levels of data entry. The disaster level records general information about the disaster, including its type (e.g., earthquake, storm, flood) and group (e.g., geophysical, meteorological, hydrological). The country level provides information on the specific location of the disaster, its timing (i.e., start and end dates), and its magnitude. The third level records the source of information, the human impact, and the economic impact.

Note that the disasters are collected into the database if they meet one of the following criteria:

  • Deaths: 10 or more deaths.

  • Affected: 100 or more people affected/injured/homeless.

  • Declaration/international appeal: Declaration by the country of a state of emergency and/or an appeal for international assistance.
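
If you want to apply the same inclusion rule to another event list, the three criteria above reduce to a single "any of" check. This is only a sketch of the rule as described here; the function and argument names are hypothetical and do not come from EM-DAT.

```python
def meets_emdat_criteria(deaths=0, affected=0, emergency_declared=False):
    """Return True if an event satisfies at least one of the three criteria.

    deaths: number of people killed
    affected: number of people affected, injured, or homeless
    emergency_declared: state of emergency declared or international appeal made
    """
    return deaths >= 10 or affected >= 100 or emergency_declared


# Hypothetical events, for illustration only.
print(meets_emdat_criteria(deaths=12))                 # qualifies on deaths
print(meets_emdat_criteria(deaths=2, affected=50))     # meets no criterion
print(meets_emdat_criteria(emergency_declared=True))   # qualifies on declaration
```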

The dataset is publicly accessible, but you need to register an account on the website first.

NOAA NCEI Storm Events Database

The Storm Events Database covers 48 types of disasters (mostly storms) from January 1950 to January 2022, as entered by NOAA’s National Weather Service (NWS). More importantly, it also records the impacts of each event, such as deaths, injuries, and property damage. Refer to the Database Details page for more information.

  • Coverage: US
  • Time period: Jan 1950 to Jan 2022 (regularly updated)
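
One practical wrinkle when working with the damage fields: my understanding is that the bulk files store damages as text with a magnitude suffix (e.g., "10.00K" or "1.5M") rather than as plain numbers. Treat that format as an assumption and check it against the Database Details page. Under that assumption, a minimal parser might look like this (the function name is hypothetical):

```python
# Assumed suffixes: K = thousand, M = million, B = billion (US dollars).
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(value):
    """Convert a damage string such as '10.00K' to dollars; None if unparseable."""
    if value is None:
        return None
    text = str(value).strip().upper()
    if not text:
        return None
    multiplier = 1.0
    if text[-1] in MULTIPLIERS:
        multiplier = MULTIPLIERS[text[-1]]
        text = text[:-1]
    try:
        return float(text) * multiplier
    except ValueError:
        # Leave anything unexpected as missing rather than guessing.
        return None


for raw in ["10.00K", "1.5M", "250", "", "n/a"]:
    print(repr(raw), "->", parse_damage(raw))
```

Returning None for empty or malformed entries keeps missing damages distinct from true zeros such as "0.00K".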

Related Reference

  • Deryugina, T., & Marx, B. 2021. Is the Supply of Charitable Donations Fixed? Evidence from Deadly Tornadoes. American Economic Review: Insights. Link.

International Best Track Archive for Climate Stewardship (IBTrACS)

IBTrACS is the most complete global collection of tropical cyclones available. It creates a unified, publicly available, best-track dataset that improves inter-agency comparisons. IBTrACS was developed collaboratively with all the World Meteorological Organization (WMO) Regional Specialized Meteorological Centres, as well as other organizations and individuals from around the world, and is maintained by NOAA.

See its page for more information.

There are three key variables in the dataset: maximum sustained wind speed (in knots), minimum central pressure (in mb), and the storm's center of circulation (in degrees lat/lon). The data are recorded at a 3-hourly temporal resolution and a spatial resolution of 0.1 degree (~10 km). The records go back to the 1920s, but the data are more reliable after the 1950s.

The data are provided in the format of CSV, netCDF, and shapefiles.
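
Since wind speed is reported in knots, you will often want to convert it before merging with other data. Here is a small sketch that converts units and summarizes one storm's peak intensity. The record layout (`wind_knots`, `pressure_mb`) is hypothetical and is not the actual IBTrACS column naming.

```python
KNOT_IN_MS = 1852.0 / 3600.0  # one nautical mile (1852 m) per hour, in m/s


def knots_to_ms(knots):
    """Convert knots to meters per second."""
    return knots * KNOT_IN_MS


def knots_to_kmh(knots):
    """Convert knots to kilometers per hour."""
    return knots * 1.852


def summarize_track(records):
    """Peak wind (m/s) and lowest central pressure (mb), skipping missing values."""
    winds = [r["wind_knots"] for r in records if r["wind_knots"] is not None]
    pressures = [r["pressure_mb"] for r in records if r["pressure_mb"] is not None]
    return {
        "max_wind_ms": knots_to_ms(max(winds)) if winds else None,
        "min_pressure_mb": min(pressures) if pressures else None,
    }


# Toy 3-hourly records for one storm (made-up numbers).
track = [
    {"wind_knots": 35, "pressure_mb": 1002},
    {"wind_knots": 80, "pressure_mb": 965},
    {"wind_knots": None, "pressure_mb": 980},
]
summary = summarize_track(track)
print(summary)
```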

Related Reference

  • Bao, X., Sun, P., Li, J. 2022. The impacts of tropical storms on food prices: Evidence from China. American Journal of Agricultural Economics. Link.

Fire and Resource Assessment Program (FRAP) in California

This program provides the most complete digital record of fire perimeters in California. For each fire, the data include the year of the fire, its cause, and a GIS layer depicting the area burned. See the website for more details. The data are regularly updated. As of May 2022, the newest data are from 2017.

Besides the fire perimeter data, the program also provides projected (2030-2050) fire probabilities and discloses information on properties at high risk of fire.

Related Reference

  • Yanjun Liao and Carolyn Kousky. 2022. The Fiscal Impacts of Wildfires on California Municipalities. Journal of the Association of Environmental and Resource Economists. Link.

FireCCILT11 Long-term Burned Area Dataset

This is the long-term global burned area (BA) dataset, released as a beta product, generated by the Fire_cci project. The project aims to provide long-term burned area information for global vegetation and atmospheric modellers. Go to this link to download.

  • Coverage: Global
  • Time period: 1982 to 2018
  • Spatial resolution: 5km * 5km and 25km * 25km

Related Reference

  • Hemant Pullabhotla, Mustafa Zahid, Sam Heft-Neal, Vaibhav Rathi, Marshall Burke. 2022. Global biomass fires and infant mortality. Submitted to Nature.

Natural resource data

FAO Land Data - Global Agro-Ecological Zones (GAEZ v4)

The Food and Agriculture Organization of the United Nations (FAO) and the International Institute for Applied Systems Analysis (IIASA) have cooperated over several decades to develop and implement the Agro-Ecological Zones (AEZ) modelling framework and databases. The GAEZ v4 spatial data are organized in six themes: (1) Land and Water Resources, (2) Agro-climatic Resources, (3) Agro-climatic Potential Yield, (4) Suitability and Attainable Yield, (5) Actual Yields and Production, and (6) Yield and Production Gaps. The dataset covers a long list of variables, including land quality, land cover, soil resources, soil suitability, terrain resources, and water resources. See this link for more details.

  • Coverage: Global
  • Time period: Most recent
  • Spatial resolution: 1km * 1km and 10km * 10km

Related Reference

  • Adamopoulos, T. and Restuccia, D. 2021. Geography and agricultural productivity: Cross-country evidence from micro plot-level data. Review of Economic Studies. Link.
  • Adamopoulos, T., Brandt, L., Leight, J., and Restuccia, D. 2022. Misallocation, selection and productivity: A quantitative analysis with panel data from China. Econometrica.