Yang, L., et al. (2018). "A new generation of the United States National Land Cover Database: Requirements, research priorities, design, and implementation strategies." ISPRS Journal of Photogrammetry and Remote Sensing 146: 108–123.
These data were collected using funding from the U.S. Government and can be used without additional permissions or fees. If you use these data in a publication, presentation, or other research product, please use the following citation:
USDA Forest Service. 2019. NLCD 2016 Tree Canopy Cover (CONUS). Salt Lake City, UT.
Appropriate use includes regional to national assessments of tree cover, total extent of tree cover, aggregated summaries of tree cover, and construction of cartographic products.
The random forests regression algorithm (R Core Team 2017; Cutler et al. 2007; Breiman 2001) used to create this product reports the mean of squared residuals and the percent of variability explained by the model, both of which are used to assess prediction reliability. Each random forests model consisted of 500 decision trees, which together determined the final response value. The response of each tree depended on a randomly chosen subset of predictor variables, chosen independently (with replacement) for evaluation by that tree. The responses of the trees were averaged to obtain an estimate of the dependent variable. Because the random forests bias correction option was used, estimates could fall below 0 or above 100; such estimates were reset to 0 or 100, respectively. The estimates were then rounded to the nearest integer. The standard error is the square root of the variance of the estimates given by all trees. A summary of the random forests models is available in the supplemental metadata for the “FS-Analytical” version of the TCC products.
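The per-pixel arithmetic described above can be sketched as follows. This is a minimal illustration and not the production code: the function names are hypothetical, the population variance is assumed (the report does not say whether a population or sample variance was used), and the bias correction itself is not reproduced. Only the clamping and rounding applied to its output are shown.

```python
from math import sqrt
from statistics import mean, pvariance


def tree_mean_and_standard_error(tree_predictions):
    """Average the per-tree responses for one pixel and return
    (estimate, standard_error), where the standard error is the square
    root of the variance of the per-tree estimates.
    Assumption: population variance."""
    estimate = mean(tree_predictions)
    standard_error = sqrt(pvariance(tree_predictions))
    return estimate, standard_error


def finalize_estimate(estimate):
    """Reset out-of-range estimates to 0 or 100, then round to an integer.
    Note: Python's round() sends exact .5 ties to the nearest even integer;
    the report does not state a tie rule."""
    clamped = min(100.0, max(0.0, estimate))
    return int(round(clamped))


# Three hypothetical per-tree responses stand in for the 500 trees of a model.
estimate, standard_error = tree_mean_and_standard_error([40.0, 50.0, 60.0])
final_value = finalize_estimate(estimate)
```

Estimates such as 103.7 or -2.4, which can arise only after bias correction, would be reset by finalize_estimate to 100 and 0.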
Breiman, L. 2001. Random forests. Machine Learning 45:5–32.
Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. 2007. Random forests for classification in ecology. Ecology 88(11):2783-2792.
R Core Team. 2017. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL www.R-project.org.
Baig, M.H.A.; Zhang, L.; Shuai, T.; Tong, Q. 2014. Derivation of a tasselled cap transformation based on Landsat 8 at-satellite reflectance. Remote Sensing Letters 5(5):423-431.
Chander, G.; Markham, B.L.; Helder, D.L. 2009. Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sensing of Environment 113(2009): 893-903.
Ruefenacht, B. 2016. Comparison of three Landsat TM compositing methods: a case study using modeled tree canopy cover. Photogrammetric Engineering & Remote Sensing 82(3):199-211.
Zhu, Z.; Woodcock, C.E. 2012. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment. 118(2012): 83-94.
Six major steps were employed to map tree canopy cover: collection of reference data, acquisition and/or creation of predictor layers, calibration of random forests regression models for each mapping area using reference data and predictor layers, application of those models to predict per-pixel tree canopy cover across the entire mapping area, development of thresholds for filtering pixels with high uncertainty, and creation of the CONUS-wide mosaic. A seventh step was applied to build a three-layer integrated data stack, of which this 2016 NLCD TCC dataset is a component. This 2016 NLCD dataset is one of three integrated layers that were designed to fit the criterion of “Time 1 + change = Time 2”, a need of many NLCD users. The methodology is described further below and in Coulston et al. (2012) and Ruefenacht (2016).
Step 1: Reference data for the nominal 2016 TCC products were generated via photographic interpretation of high spatial resolution images acquired by the National Agriculture Imagery Program (NAIP). The initial reference data were collected and supplied by the U.S. Forest Service Forest Inventory and Analysis (FIA) program. Of the initial 63,000 sites, about 2,100 were identified as potentially changed between the nominal years of 2011 and 2016 through analysis of fire and NDVI data. For those sites, remote sensing analysts reviewed the tree canopy cover conditions and reinterpreted them if needed. The spatial distribution of the sample points follows the FIA quasi-systematic grid (Brand et al. 2000).
Step 2: Predictor layers included Landsat 8 OLI composite imagery and spectral derivatives thereof (NDMI, NDVI, and tasseled cap); elevation data and spatial derivatives thereof (slope, aspect, sine of aspect, cosine of aspect); and EWMA (exponentially weighted moving average) data, generated and provided by Oregon State University through an implementation of a harmonic regression-based algorithm (Brooks et al. 2012) built by Virginia Polytechnic Institute and State University. The processes for creating the derived layers are described separately (see related Process Steps).
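Some of the derived predictors named in Step 2 can be illustrated with a short sketch. It uses the standard normalized-difference definitions of NDVI (near-infrared and red) and NDMI (near-infrared and shortwave infrared 1) and the sine and cosine of aspect in degrees. The function names are hypothetical, returning 0.0 for a zero denominator is an assumption, and the tasseled cap and EWMA layers are not reproduced here.

```python
from math import cos, radians, sin


def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b).
    Assumption: 0.0 is returned when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return 0.0
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index from NIR and SWIR1 reflectance."""
    return normalized_difference(nir, swir1)


def aspect_components(aspect_degrees):
    """Return (sine of aspect, cosine of aspect) for an aspect in degrees."""
    angle = radians(aspect_degrees)
    return sin(angle), cos(angle)


# Hypothetical reflectance values for a single vegetated pixel.
pixel_ndvi = ndvi(nir=0.5, red=0.1)
pixel_ndmi = ndmi(nir=0.4, swir1=0.2)
east_facing = aspect_components(90.0)
```

Splitting aspect into sine and cosine removes the discontinuity between 359 and 0 degrees, which is the usual reason for deriving both components.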
Step 3: Modeling was carried out using the random forests regression algorithm (R Core Team 2017; Breiman 2001) with the bias correction option as outlined in the Attribute Accuracy Report above.
Step 4: The models were applied to individual WRS-2 path/rows intersecting each mapping area, producing a two-layer image. The first layer was the random forests regression estimate of tree canopy cover, and the second layer was the standard error, which is the per-pixel square root of the variance of the random forests regression estimates from the individual trees.
Step 5: Threshold values were determined for each of the mapping areas using data from 500 runs of the random forests regression algorithm on bootstrap samples. From these data, t-statistics were calculated. For each mapping area, the t-statistic at the 95th quantile was selected as the threshold value. Threshold values ranged from 0.50 to 2.80. For each pixel, the product of the t-statistic threshold value and the pixel standard error was compared to the pixel percent tree canopy value. If this product was greater than the pixel percent tree canopy, the percent tree canopy value for that pixel was set to zero; otherwise, it was left unchanged. A pixel was also set to zero if it fell within an NLCD 2011 Landcover class of 11 (open water) or 12 (perennial snow/ice) or was considered to be agriculture as defined by the cultivated layer (CL).
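The per-pixel filtering rule in Step 5 can be sketched as below. This is an illustrative reading of the rule, not the production code: the function and parameter names are hypothetical, and the cultivated layer is represented by a simple boolean flag.

```python
# NLCD 2011 Landcover classes that force a pixel to zero:
# 11 (open water) and 12 (perennial snow/ice).
MASKED_LANDCOVER_CLASSES = {11, 12}


def filter_pixel(tcc, standard_error, t_threshold, landcover_class, is_cultivated):
    """Apply the Step 5 rules to one pixel and return the filtered
    percent tree canopy value."""
    # Masked land cover classes and cultivated pixels are set to zero.
    if landcover_class in MASKED_LANDCOVER_CLASSES or is_cultivated:
        return 0
    # High-uncertainty pixels: threshold * standard error exceeds the estimate.
    if t_threshold * standard_error > tcc:
        return 0
    return tcc


# Hypothetical pixels in a mapping area with a threshold of 1.5
# (41 and 82 are example class codes that are not masked).
uncertain_pixel = filter_pixel(10, 8.0, 1.5, 41, False)   # 12.0 > 10
confident_pixel = filter_pixel(30, 8.0, 1.5, 41, False)   # 12.0 <= 30
open_water_pixel = filter_pixel(80, 1.0, 1.5, 11, False)
cultivated_pixel = filter_pixel(25, 1.0, 1.5, 82, True)
```

Because the comparison is strict, a pixel whose estimate exactly equals the product of the threshold and the standard error is left unchanged, matching the wording of the step.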
Step 6: Since models were applied to each mapping area independently, there were multiple estimates for pixels in overlapping areas. For these pixels, the estimate with the lowest standard error was carried into the CONUS-wide mosaic. Due to the use of the bias correction option in the random forests modeling, estimates could be outside the range of 0 to 100. These estimates were reset to either 0 or 100. Estimates were also rounded to the nearest integer.
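The selection rule for overlapping areas in Step 6 can be sketched for a single pixel as follows. The function name and the (estimate, standard error) tuple layout are hypothetical. When standard errors tie, this sketch keeps the first candidate, which is an assumption since the step does not describe tie handling.

```python
def mosaic_pixel(candidates):
    """Choose the estimate with the lowest standard error among the
    candidates from overlapping mapping areas, then reset it to the
    0 to 100 range and round it to an integer.

    candidates: list of (estimate, standard_error) tuples.
    Note: Python's round() sends exact .5 ties to the nearest even integer.
    """
    estimate, _ = min(candidates, key=lambda candidate: candidate[1])
    clamped = min(100.0, max(0.0, estimate))
    return int(round(clamped))


# Hypothetical pixel covered by three overlapping mapping areas.
overlap_value = mosaic_pixel([(55.2, 6.0), (48.7, 3.5), (61.0, 9.1)])
```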
Step 7: The NLCD-TCC production workflow included a step after the production of the “FS-Analytical” and “FS-Cartographic” TCC products in order to integrate three layers (2011 NLCD TCC, 2016 NLCD TCC, and a TCC change layer) into a common data stack. In this three-layer data stack, all pixels with valid TCC values (0 to 100%) in both years (2011 and 2016) meet the criterion of “Time 1 TCC + change = Time 2 TCC”. To satisfy that criterion, pixels were first identified as changed or not based on analysis of disturbance data from the USFS Forest Inventory and Analysis (FIA) program and standard error values in the “FS-Analytical” TCC product. For pixels identified as changed, the pixel value in the 2016 NLCD TCC layer was taken from the 2016 “FS-Cartographic” TCC product. For pixels in which confidence in actual change was low or non-existent, the pixel value in the 2016 NLCD TCC product was set to an average of the TCC values from the 2011 and 2016 “FS-Cartographic” TCC products. Spatial filtering was also applied to clean up noise and speckle. While the integrated data stack was achieved, minor visual artifacts (e.g., very small islands of “No Change” surrounded by pixels identified as change) may still be present within the tree canopy cover products included in the overall 2016 NLCD Product Suite.
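The assignment of 2016 values in Step 7 can be sketched for one pixel as follows. The function names are hypothetical, the changed/not-changed decision is taken as an input rather than derived from disturbance data and standard errors, and spatial filtering is omitted. Defining the change layer as the difference between the two integrated layers is an assumption made here only to illustrate the “Time 1 TCC + change = Time 2 TCC” criterion; the step does not spell out how the change layer was computed.

```python
def integrate_2016_pixel(cartographic_2011, cartographic_2016, changed):
    """Return the 2016 NLCD TCC value for one pixel.

    Changed pixels take the 2016 "FS-Cartographic" value; other pixels take
    the average of the 2011 and 2016 "FS-Cartographic" values.
    Assumption: the average is rounded to an integer with Python's round(),
    which sends exact .5 ties to the nearest even integer.
    """
    if changed:
        return cartographic_2016
    return int(round((cartographic_2011 + cartographic_2016) / 2))


def change_value(tcc_time1, tcc_time2):
    """Assumed definition of the change layer: Time 2 minus Time 1."""
    return tcc_time2 - tcc_time1


# Hypothetical pixel with 30% cover in 2011 and 36% in 2016.
changed_2016 = integrate_2016_pixel(30, 36, changed=True)
unchanged_2016 = integrate_2016_pixel(30, 36, changed=False)
change = change_value(30, changed_2016)
```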
Brand, G.J.; Nelson, M.D.; Wendt, D.G.; Nimerfro, K.K. 2000. The hexagon/panel system for selecting FIA plots under an annual inventory. In: McRoberts, R.E.; Reams, G.A.; Van Deusen, P.C., eds. Proceedings of the First Annual Forest Inventory and Analysis Symposium; Gen. Tech. Rep. NC-213. St. Paul, MN: U.S. Department of Agriculture, Forest Service, North Central Research Station: 8-13.
Breiman, L. 2001. Random forests. Machine Learning 45:5–32.
Brooks, E.B.; Thomas, V.A.; Wynne, R.H.; Coulston, J.W. 2012. Fitting the multitemporal curve: a fourier series approach to the missing data problem in remote sensing analysis. IEEE Transactions on Geoscience and Remote Sensing 50(9):3340-3353.
Coulston, J.W.; Moisen, G.G.; Wilson, B.T.; Finco, M.V.; Cohen, W.B.; Brewer, C.K. 2012. Modeling percent tree canopy cover: a pilot study. Photogrammetric Engineering & Remote Sensing 78(7): 715–727.
Homer, C.; Gallant, A. 2001. Partitioning the conterminous United States into mapping zones for Landsat TM land cover mapping, USGS Draft White Paper. landcover.usgs.gov/pdf/homer.pdf
R Core Team. 2017. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL www.R-project.org.
Ruefenacht, B. 2016. Comparison of three Landsat TM compositing methods: a case study using modeled tree canopy cover. Photogrammetric Engineering & Remote Sensing 82(3):199-211.