1. A permafrost site location inventory was produced by compiling information on permafrost occurrences from field observations collected by the Alberta Geological Survey, and from data collected by the Geological Survey of Canada through the collaborative GSC/AGS Shallow Gas and Diamonds and the Targeted Geoscience Initiative-2 projects. These observations were compiled along with information derived from published reports, environmental impact assessments, journal publications, and university theses.
The data represent point locations where the presence or absence of permafrost had been established using soil probes, augers, hand-dug soil pits, or shallow coring equipment. Other field observations, such as those based on geological sections and borrow pits, were excluded because it is possible that permafrost could have occurred in these locations, but was not observable.
The permafrost site location inventory also includes a small number of locations where permafrost had been mapped as polygon features. In these cases, a single random point location was selected within the polygon boundary.
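The selection of a single random point inside a mapped polygon can be illustrated with rejection sampling. This is a minimal sketch only: the source does not say which tool or method was used, and the function names, the simple ring representation (a list of vertex coordinates without holes), and the retry limit are all assumptions made for illustration.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_point_in_polygon(ring, rng, max_tries=10_000):
    """Draw uniform points in the bounding box until one falls inside the ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    for _ in range(max_tries):
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, ring):
            return x, y
    raise RuntimeError("no interior point found within max_tries")


# Hypothetical L-shaped polygon standing in for a mapped permafrost feature.
ring = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
sample = random_point_in_polygon(ring, random.Random(42))
```

Passing an explicitly seeded `random.Random` instance keeps the selection reproducible, which is useful when the inventory needs to be regenerated.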
2. The permafrost site location inventory was augmented by new mapping of permafrost-related landforms and features using 1 m resolution airborne LiDAR DEM data and high-resolution SPOT 6 satellite imagery and orthoimagery.
To ensure that mapping was systematic and evenly distributed, 4000 sample tiles of 1 km² were randomly generated across the model area. Excluding water features, 3996 sample tiles were used for the new mapping. Each sample tile was manually inspected for the presence or absence of permafrost using the previously described remote sensing datasets. The interpretation used the following criteria: a) peat plateaus and bogs containing palsas, usually showing collapse scars, were assumed to contain permafrost; b) veneer bogs containing collapse scars and drained by runnels were also assumed to contain permafrost in the locations adjacent to the collapse scars; c) all other bogs, with or without internal lawns, and all other wetland and non-wetland land cover types were assumed not to contain permafrost.
Within each sample tile, a stratified sample consisting of a single point per wetland type (permafrost, bog, fen, swamp, non-wetland, excluding open water) was manually collected. These points represent the "Mapped wetland types in sample tiles" features in the "Source" attribute of the tabular dataset. For sample tiles that did not contain permafrost features upon initial examination, a stratified random sample following the above scheme was collected based on a wetland map derived from the Alberta Ground Cover Classification Mosaic (Alberta Agriculture and Forestry). These points represent the "Stratified random points in sample tiles" features in the "Source" attribute of the tabular dataset.
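The one-point-per-wetland-type scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the candidate locations are assumed to be available as `(tile_id, wetland_class, x, y)` tuples, and the function name and class labels are hypothetical.

```python
import random
from collections import defaultdict


def one_point_per_class(candidates, rng):
    """Pick one random location per (tile, wetland class) stratum.

    candidates: iterable of (tile_id, wetland_class, x, y) tuples.
    Returns a dict mapping (tile_id, wetland_class) to the chosen (x, y).
    """
    strata = defaultdict(list)
    for tile_id, wetland_class, x, y in candidates:
        if wetland_class == "open water":  # open water is excluded from sampling
            continue
        strata[(tile_id, wetland_class)].append((x, y))
    # Sorting the strata keeps the draw order, and so the result, reproducible.
    return {key: rng.choice(points) for key, points in sorted(strata.items())}


# Hypothetical candidate locations in two sample tiles.
candidates = [
    (1, "bog", 0, 0), (1, "bog", 1, 1), (1, "fen", 2, 2),
    (1, "open water", 3, 3), (2, "permafrost", 4, 4),
]
picks = one_point_per_class(candidates, random.Random(0))
```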
3. LiDAR DEM data at 15 m resolution were used as the primary source of topographic information. The LiDAR DEM covered 85% of the model area. For the remaining area, topographic information was derived from the Advanced Land Observing Satellite (ALOS) Global Digital Surface Model (AW3D30) at 30 m grid resolution (Japan Aerospace Exploration Agency, 2018). These DEMs were used to generate a suite of geomorphometric measures using automated scripting in the R Statistical Computing Environment and SAGA-GIS. These metrics included Topographic Openness, the SAGA Wetness Index, the Terrain Ruggedness Index, the Vector Roughness Measure, the Multiresolution Index of Valley Bottom Flatness, Terrain Texture, Profile Curvature, and Vertical Distance above Channel Networks.
4. Spectral information was derived from a Landsat 8 Google Earth Engine mosaic (Gorelick et al., 2017) for the model area. The mosaic used the Fmask procedure (Zhu et al., 2015) to produce a cloud-free Landsat best-pixel composite from multiple Landsat 8 scenes dating from 2013 to 2014 and acquired during the summer months (June 1st to September 10th).
Areas of recent forest fires and intense burning provide little spectral information for separating upland from wetland or for distinguishing permafrost features. These areas were identified using the Normalized Burn Ratio Index (NBRI) and were backfilled with earlier Google Earth Engine Landsat scenes by selecting the least-burnt pixels. Information relating to burn intensity was included in the model by using the NBRI from the most recent Landsat 8 2013-2014 composite.
Additional spectral indices, comprising the Normalized Difference Vegetation Index and the Normalized Difference Water Index, were added to the stack of Landsat data. Wetlands containing permafrost also tend to exhibit more spectral variability than non-permafrost wetlands due to thermokarst. Therefore, measures of spatial heterogeneity were included as predictors by using the standard deviation of pixel values within a 5 × 5 circular neighbourhood derived from Landsat bands 4 (red), 5 (near infrared) and 7 (short-wave infrared 2).
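The spectral indices and the neighbourhood standard deviation can be sketched in plain Python. The sketch relies on the standard definitions NDVI = (NIR - red) / (NIR + red) and normalized burn ratio = (NIR - SWIR2) / (NIR + SWIR2). The band pairing used for the Normalized Difference Water Index is not stated in the source, so it is not shown. The circular kernel (cells whose centres lie within 2 cells of the target cell), the edge handling, and all function names are assumptions.

```python
import math


def normalized_difference(a, b):
    """(a - b) / (a + b), or NaN where the denominator is zero."""
    return (a - b) / (a + b) if (a + b) != 0 else math.nan


def circular_offsets(radius):
    """Row/column offsets of cells whose centres lie within `radius` cells."""
    return [
        (dr, dc)
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
        if dr * dr + dc * dc <= radius * radius
    ]


def focal_std(grid, radius=2):
    """Population standard deviation within a circular moving window.

    radius=2 gives a window inscribed in a 5 x 5 block of cells. Cells near
    the grid edge use only the in-bounds part of the window (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    offsets = circular_offsets(radius)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [
                grid[r + dr][c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            mean = sum(vals) / len(vals)
            out[r][c] = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return out


# Hypothetical surface reflectance values for one pixel.
red, nir, swir2 = 0.1, 0.4, 0.2
ndvi = normalized_difference(nir, red)
nbr = normalized_difference(nir, swir2)
```

For full rasters, an array library would normally replace the nested loops; the loops are kept here only to make the window logic explicit.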
5. Additional raster grids describing average climatic conditions from 1961-1991 were taken from downscaled climate data at 1300 m resolution from the ClimateWNA dataset (Hamann et al., 2013). The following bioclimatic variables were used as predictors of permafrost occurrence: beginning of the frost-free period (bFFP), number of degree days below 0 °C (DD0), end of the frost-free period (eFFP), duration of the frost-free period (FFP), mean annual precipitation (MAP), mean annual temperature (MAT), mean summer precipitation (MSP), number of frost-free days (NFFD), precipitation as snow (PAS), and average summer and winter temperatures (Tave_sm, Tave_wt).
6. All raster grids were resampled to a 15 m or 30 m resolution prior to the statistical modelling procedure using bilinear resampling.
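Bilinear resampling estimates each output cell from the four nearest input cell centres, weighted by distance. The sketch below illustrates the idea on a small grid stored as nested lists. It is not the tool used to produce the dataset; the function names, the `scale` parameter, and the clamping of edge cells are assumptions.

```python
def bilinear_sample(grid, row, col):
    """Interpolate a grid (at least 2 x 2) at a fractional cell-centre index."""
    rows, cols = len(grid), len(grid[0])
    # Clamp to the extent spanned by the cell centres.
    row = min(max(row, 0.0), rows - 1.0)
    col = min(max(col, 0.0), cols - 1.0)
    r0 = min(int(row), rows - 2)
    c0 = min(int(col), cols - 2)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c0 + 1] * fc
    bottom = grid[r0 + 1][c0] * (1 - fc) + grid[r0 + 1][c0 + 1] * fc
    return top * (1 - fr) + bottom * fr


def resample(grid, scale):
    """Resample to a grid `scale` times finer (scale=2 for 30 m to 15 m)."""
    out_rows = round(len(grid) * scale)
    out_cols = round(len(grid[0]) * scale)
    return [
        [
            # Map the output cell centre back to source index coordinates.
            bilinear_sample(grid, (i + 0.5) / scale - 0.5, (j + 0.5) / scale - 0.5)
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# Hypothetical 2 x 2 coarse grid resampled to 4 x 4.
fine = resample([[0, 10], [20, 30]], scale=2)
```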
7. Permafrost probability modelling was performed in the Python programming language using the LightGBM gradient boosting algorithm. The permafrost inventory point features were used to inform the model about the geological, topographic, and climatic conditions that are associated with permafrost. The two sources of DEM data were modelled separately because of their different topographic characteristics, and coverage gaps in the LiDAR-derived predictions were then patched with the predictions derived from the ALOS 30 m DEM. This process resulted in a prediction probability raster which shows the probability of membership to the ‘permafrost’ class. The classifier probabilities were calibrated using the isotonic regression method.
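Isotonic calibration fits a non-decreasing step function that maps raw classifier scores to observed class frequencies. The sketch below uses the pool-adjacent-violators algorithm in plain Python to show the idea. It is not the implementation used for this dataset, which the source does not name, and it makes simplifying assumptions: tied scores are not pooled, and scores above the largest training score take the last fitted value. Function names are hypothetical.

```python
import bisect


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of labels (0/1) against scores.

    Returns (uppers, values): values[i] applies to scores up to uppers[i].
    """
    blocks = []  # each block is [sum_of_labels, count, largest_score]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(score, uppers, values):
    """Look up the calibrated probability for a raw score."""
    i = bisect.bisect_left(uppers, score)
    return values[min(i, len(values) - 1)]


# Hypothetical held-out scores and observed permafrost labels.
uppers, values = fit_isotonic(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1]
)
calibrated = calibrate(0.25, uppers, values)
```

In this example the out-of-order labels at scores 0.2 to 0.4 are pooled into a single block with a calibrated probability of one third.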
K-fold spatial cross-validation (k = 10) was used to assess model performance, with discrete spatial subgroups derived by grouping the training data locations based on the nearest sampling tile. This reduces the potential overestimation of model performance caused by spatial autocorrelation in the predictors at the training data locations.
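The grouping step can be sketched as follows: each training point is assigned to its nearest sampling tile, and whole tiles are then assigned to folds so that points from one tile never appear in both the training and test partitions. The tile centres, the Euclidean distance measure, the round-robin fold assignment, and the function names are assumptions made for illustration.

```python
import random


def nearest_tile(point, tile_centres):
    """Index of the tile centre closest to the point (Euclidean distance)."""
    x, y = point
    return min(
        range(len(tile_centres)),
        key=lambda i: (tile_centres[i][0] - x) ** 2 + (tile_centres[i][1] - y) ** 2,
    )


def spatial_folds(points, tile_centres, k, rng):
    """Fold index per point; points sharing a nearest tile share a fold."""
    groups = [nearest_tile(p, tile_centres) for p in points]
    unique = sorted(set(groups))
    rng.shuffle(unique)
    # Deal the shuffled tiles out to the k folds in turn.
    fold_of_group = {g: i % k for i, g in enumerate(unique)}
    return [fold_of_group[g] for g in groups]


# Hypothetical tile centres and training locations (projected coordinates).
tiles = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(1, 1), (0.5, 0.2), (9, 1), (1, 9), (9.5, 9.5), (10, 10.2)]
folds = spatial_folds(points, tiles, k=2, rng=random.Random(1))
```

The resulting group labels could equally be passed to a grouped cross-validation splitter in a machine learning library; the manual assignment here just makes the constraint visible.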
8. To create the permafrost classification raster, the permafrost probability model was thresholded into a binary classification (permafrost present = 1) using a probability threshold of 0.5.
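The gap patching and thresholding can be illustrated on small grids. This sketch assumes that both prediction rasters are already on a common grid, that gaps in the LiDAR-derived predictions are stored as NaN, and that a probability exactly equal to the threshold counts as permafrost; none of these details is stated in the source, and the function name is hypothetical.

```python
import math


def patch_and_threshold(lidar_prob, alos_prob, threshold=0.5):
    """Fill NaN gaps in the LiDAR predictions with ALOS values, then classify.

    Returns (patched probabilities, binary classes with permafrost = 1).
    Cells missing from both inputs stay NaN and are classified as 0.
    """
    patched, classes = [], []
    for lidar_row, alos_row in zip(lidar_prob, alos_prob):
        row = [a if math.isnan(l) else l for l, a in zip(lidar_row, alos_row)]
        patched.append(row)
        classes.append([1 if p >= threshold else 0 for p in row])
    return patched, classes


# Hypothetical 2 x 2 probability grids; the right column lacks LiDAR coverage.
nan = math.nan
patched, classes = patch_and_threshold(
    [[0.9, nan], [0.2, nan]],
    [[0.1, 0.6], [0.8, 0.4]],
)
```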
Model performance measures from spatial cross-validation:
Model | Accuracy | ROC AUC | Precision | Recall | F1 |
LiDAR Model | 96.0% | 97.4% | 82.1% | 70.0% | 75.3% |
ALOS Model | 93.3% | 95.0% | 72.4% | 56.0% | 63.0% |
Accuracy – the overall proportion of locations, both permafrost and non-permafrost, that were correctly classified.
ROC AUC – Area under the Receiver Operating Characteristic Curve: the area under the curve formed by plotting the true positive rate against the false positive rate across all classification probability cutoff thresholds. AUC does not depend on the choice of cutoff value used to assign the predicted probabilities to either the permafrost or the non-permafrost class.
Precision – Positive Predictive Value: the number of true positives divided by the number of predicted positives (true positives + false positives). In other words, of all the locations predicted as containing permafrost, what fraction actually contain permafrost?
Recall – True Positive Rate: the number of true positives divided by the number of actual positives (true positives + false negatives). In other words, of all the locations that contain permafrost, what fraction were correctly classified as permafrost?
F1 – F1 Score: the harmonic mean of precision and recall.
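The definitions above translate directly into code. The sketch below computes the threshold-dependent measures from confusion-matrix counts and computes ROC AUC as the probability that a randomly chosen permafrost location receives a higher score than a randomly chosen non-permafrost location, with ties counted as one half. The counts and scores are hypothetical, the function names are illustrative, and non-zero denominators are assumed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical counts: 7 true positives, 1 false positive,
# 3 false negatives, 89 true negatives.
metrics = classification_metrics(tp=7, fp=1, fn=3, tn=89)
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

The pairwise AUC formulation is quadratic in the number of locations, so it suits small examples; rank-based formulations give the same value more efficiently.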