Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on aerial lidar

doi:10.1016/j.rse.2023.113888

Remote Sensing of Environment

Volume 300, 1 January 2024, 113888

https://doi.org/10.1016/j.rse.2023.113888

Under a Creative Commons license

Open access

Highlights

• Very high resolution canopy height maps at jurisdictional scale are released.
• Improved performance from vision transformers based on Self-Supervised Learning (SSL).
• First use of SSL and vision transformers for canopy height estimation.
• Low resolution GEDI and high resolution aerial lidar predictions are combined.
• Model generalizes well to aerial imagery, even though trained with satellite images.

Abstract

Vegetation structure mapping is critical for understanding the global carbon cycle and monitoring nature-based approaches to climate adaptation and mitigation. Repeated measurements of vegetation structure allow for the observation of deforestation or degradation of existing forests, natural forest regeneration, and the implementation of sustainable agricultural practices like agroforestry. Assessments of tree canopy height and crown projected area at a high spatial resolution are also important for monitoring carbon fluxes and assessing tree-based land uses, since forest structures can be highly spatially heterogeneous, especially in agroforestry systems. Very high resolution satellite imagery (less than 1 m Ground Sample Distance) makes it possible to extract information at the tree level while allowing monitoring at a very large scale. This paper presents the first high-resolution canopy height map concurrently produced for multiple sub-national jurisdictions. Specifically, we produce very high resolution canopy height maps for the states of California and São Paulo, a significant improvement over the 10 m resolution of previous Sentinel/GEDI-based worldwide maps of canopy height. The maps are generated by extracting features from a self-supervised model trained on Maxar imagery from 2017 to 2020 and training a dense prediction decoder against aerial lidar maps. We also introduce a post-processing step using a convolutional network trained on GEDI observations. We evaluate the proposed maps with set-aside validation lidar data as well as by comparing with other remotely sensed maps and field-collected data, and find that our model produces an average Mean Absolute Error (MAE) of 2.8 m and Mean Error (ME) of 0.6 m.
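The two accuracy figures reported in the abstract, MAE and ME, can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function name `height_errors`, the sample values, and the sign convention (predicted minus reference, so a positive ME means overestimation) are assumptions made for illustration only.

```python
import math


def height_errors(predicted, reference):
    """Return (MAE, ME) in the same units as the inputs (e.g. meters).

    Assumption: error = predicted - reference, so ME > 0 means the
    predictions overestimate canopy height on average.
    Pixel pairs with a NaN or infinite value are skipped.
    """
    diffs = [
        p - r
        for p, r in zip(predicted, reference)
        if math.isfinite(p) and math.isfinite(r)
    ]
    if not diffs:
        raise ValueError("no valid pixel pairs to compare")
    mae = sum(abs(d) for d in diffs) / len(diffs)  # Mean Absolute Error
    me = sum(diffs) / len(diffs)                   # Mean Error (bias)
    return mae, me


# Hypothetical heights in meters; the last pair is skipped (NaN prediction).
pred = [10.0, 4.0, 22.0, float("nan")]
ref = [12.0, 3.0, 20.0, 5.0]
mae, me = height_errors(pred, ref)
print(f"MAE = {mae:.2f} m, ME = {me:.2f} m")
```

MAE measures the typical size of the per-pixel error regardless of direction, while ME measures systematic bias, which is why the two can differ substantially for the same map.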

Keywords

LIDAR

GEDI

Canopy height

Deep learning

Self-supervised learning

Vision transformers

Data availability

Input imagery is licensed by Maxar and is not publicly available. The derived canopy height maps are shared under a Creative Commons 4.0 license and are available for public download.