OpenGeoHub Summer School 2020, Wageningen
Part I: Introduction to Earth observation (EO) data cubes and the gdalcubes
R package
Part II: Examples on using data cubes to combine imagery from different satellite-based EO missions
Part III: Summary, discussion, and practical exercises
All the material of this tutorial is online at GitHub, including R markdown sources, rendered HTML output, and solutions to the practical exercises.
Please notice that the data used in this part of the tutorial is not available online.
At first, we make sure to load the needed packages again.
library(gdalcubes)
library(magrittr)
library(colorspace)
In this example, we are interested in monitoring the vegetation with NDVI as target variable. To get as much information as possible, we will use observations from Sentinel-2, Landsat 8, and MODIS (product MOD09GA). These datasets have very different properties:
Dataset | Spatial Resolution (NDVI bands) | Temporal Resolution | File Format |
---|---|---|---|
Sentinel-2 Level 2A | 10m | 5 days | .jp2 |
Landsat 8 surface reflectance | 30m | 16 days | GeoTIFF |
MODIS MOD09GA | 500m | daily | HDF4 |
Our study area covers a region around Münster, Germany with mostly agricultural and some smaller forest areas.
We start with an overview of available data:
MOD.files = list.files("/media/marius/Samsung_T5/eodata/MOD09GA_MS", pattern = ".hdf",full.names = TRUE, recursive = TRUE)
head(MOD.files, n = 3)
## [1] "/media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.01/MOD09GA.A2019091.h18v03.006.2019093031830.hdf"
## [2] "/media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.02/MOD09GA.A2019092.h18v03.006.2019094031643.hdf"
## [3] "/media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.03/MOD09GA.A2019093.h18v03.006.2019095030640.hdf"
length(MOD.files)
## [1] 214
sum(file.size(MOD.files)) / 1024^3 # GiB
## [1] 25.506
L8.files = list.files("/media/marius/Samsung_T5/eodata/L8_MS2019", pattern = ".tif",full.names = TRUE, recursive = TRUE)
head(L8.files, n = 3)
## [1] "/media/marius/Samsung_T5/eodata/L8_MS2019/LC081960242019041001T1-SC20200716082607/LC08_L1TP_196024_20190410_20190422_01_T1_pixel_qa.tif"
## [2] "/media/marius/Samsung_T5/eodata/L8_MS2019/LC081960242019041001T1-SC20200716082607/LC08_L1TP_196024_20190410_20190422_01_T1_radsat_qa.tif"
## [3] "/media/marius/Samsung_T5/eodata/L8_MS2019/LC081960242019041001T1-SC20200716082607/LC08_L1TP_196024_20190410_20190422_01_T1_sr_aerosol.tif"
length(L8.files)
## [1] 270
sum(file.size(L8.files)) / 1024^3 # GiB
## [1] 41.80039
S2.files = list.files("/media/marius/Samsung_T5/eodata/S2L2A_MS", pattern = ".zip",full.names = TRUE, recursive = TRUE)
head(S2.files, n = 3)
## [1] "/media/marius/Samsung_T5/eodata/S2L2A_MS/S2A_MSIL2A_20190410T103021_N0211_R108_T31UGT_20190410T163924.zip"
## [2] "/media/marius/Samsung_T5/eodata/S2L2A_MS/S2A_MSIL2A_20190410T103021_N0211_R108_T32ULC_20190410T163924.zip"
## [3] "/media/marius/Samsung_T5/eodata/S2L2A_MS/S2A_MSIL2A_20190410T103021_N0211_R108_T32UMC_20190410T163924.zip"
length(S2.files)
## [1] 108
sum(file.size(S2.files)) / 1024^3 # GiB
## [1] 91.42826
Notice that the Sentinel-2 images come from three different tiles covering two UTM zones. As in the first part, we now need to create separate image collections, which requires a bit of research to find the correct collection formats (e.g., using collection_formats()
).
if (!file.exists("MOD09GA.db")) {
MOD.col = create_image_collection(MOD.files, "MxD09GA", "MOD09GA.db")
}
MOD.col = image_collection("MOD09GA.db")
MOD.col
## A GDAL image collection object, referencing 214 images with 9 bands
## Images:
## name
## 1 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.01/MOD09GA.A2019091.h18v03.006.2019093031830
## 2 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.02/MOD09GA.A2019092.h18v03.006.2019094031643
## 3 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.03/MOD09GA.A2019093.h18v03.006.2019095030640
## 4 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.04/MOD09GA.A2019094.h18v03.006.2019096033933
## 5 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.05/MOD09GA.A2019095.h18v03.006.2019097032840
## 6 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.06/MOD09GA.A2019096.h18v03.006.2019098143830
## left top bottom right datetime
## 1 0 60 50 20 2019-04-01T00:00:00
## 2 0 60 50 20 2019-04-02T00:00:00
## 3 0 60 50 20 2019-04-03T00:00:00
## 4 0 60 50 20 2019-04-04T00:00:00
## 5 0 60 50 20 2019-04-05T00:00:00
## 6 0 60 50 20 2019-04-06T00:00:00
## srs
## 1 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 2 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 3 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 4 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 5 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 6 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## [ omitted 208 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 QC_500m 0 1 bit field 214
## 2 state_1km 0 1 bit field 214
## 3 sur_refl_b01 0 10000 reflectance -28672.000000 214
## 4 sur_refl_b02 0 10000 reflectance -28672.000000 214
## 5 sur_refl_b03 0 10000 reflectance -28672.000000 214
## 6 sur_refl_b04 0 10000 reflectance -28672.000000 214
## 7 sur_refl_b05 0 10000 reflectance -28672.000000 214
## 8 sur_refl_b06 0 10000 reflectance -28672.000000 214
## 9 sur_refl_b07 0 10000 reflectance -28672.000000 214
if (!file.exists("L8_MS.db")) {
L8.col = create_image_collection(L8.files, "L8_SR", "L8_MS.db")
}
L8.col = image_collection("L8_MS.db")
L8.col
## A GDAL image collection object, referencing 27 images with 10 bands
## Images:
## name left top bottom right
## 1 LC08_L1TP_196024_20190410_20190422_01_T1 6.369848 52.77484 50.57697 9.913380
## 2 LC08_L1GT_196024_20190426_20190508_01_T2 6.414248 52.77445 50.57787 9.962292
## 3 LC08_L1TP_196024_20190512_20190521_01_T1 6.409648 52.77722 50.57778 9.953458
## 4 LC08_L1TP_196024_20190528_20190605_01_T1 6.400928 52.77456 50.57760 9.948953
## 5 LC08_L1TP_196024_20190613_20190619_01_T1 6.400928 52.77456 50.57760 9.948953
## 6 LC08_L1TP_196024_20190629_20190706_01_T1 6.409808 52.77453 50.57509 9.953399
## datetime srs
## 1 2019-04-10T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 2 2019-04-26T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 3 2019-05-12T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 4 2019-05-28T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 5 2019-06-13T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 6 2019-06-29T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## [ omitted 21 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 AEROSOL 0 1 27
## 2 B01 0 1 -9999.000000 27
## 3 B02 0 1 -9999.000000 27
## 4 B03 0 1 -9999.000000 27
## 5 B04 0 1 -9999.000000 27
## 6 B05 0 1 -9999.000000 27
## 7 B06 0 1 -9999.000000 27
## 8 B07 0 1 -9999.000000 27
## 9 PIXEL_QA 0 1 27
## 10 RADSAT_QA 0 1 27
if (!file.exists("S2_MS.db")) {
S2.col = create_image_collection(S2.files, "Sentinel2_L2A", "S2_MS.db")
}
S2.col = image_collection("S2_MS.db")
S2.col
## A GDAL image collection object, referencing 108 images with 15 bands
## Images:
## name left
## 1 S2A_MSIL2A_20190410T103021_N0211_R108_T31UGT_20190410T163924 5.870216
## 2 S2A_MSIL2A_20190410T103021_N0211_R108_T32ULC_20190410T163924 6.065798
## 3 S2A_MSIL2A_20190410T103021_N0211_R108_T32UMC_20190410T163924 7.531529
## 4 S2A_MSIL2A_20190420T103031_N0211_R108_T31UGT_20190420T111631 5.870216
## 5 S2A_MSIL2A_20190420T103031_N0211_R108_T32ULC_20190420T111631 6.065798
## 6 S2A_MSIL2A_20190420T103031_N0211_R108_T32UMC_20190420T111631 7.531529
## top bottom right datetime
## 1 52.31404 51.27893 7.539980 2019-04-10T10:30:21
## 2 52.34305 51.32805 7.704578 2019-04-10T10:30:21
## 3 52.35039 51.35443 9.143291 2019-04-10T10:30:21
## 4 52.31404 51.27893 7.539980 2019-04-20T10:30:31
## 5 52.34305 51.32805 7.704578 2019-04-20T10:30:31
## 6 52.35039 51.35443 9.143291 2019-04-20T10:30:31
## srs
## 1 +proj=utm +zone=31 +datum=WGS84 +units=m +no_defs
## 2 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 3 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 4 +proj=utm +zone=31 +datum=WGS84 +units=m +no_defs
## 5 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 6 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## [ omitted 102 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 AOT 0 1e+00 108
## 2 B01 0 1e-04 0.000000 108
## 3 B02 0 1e-04 0.000000 108
## 4 B03 0 1e-04 0.000000 108
## 5 B04 0 1e-04 0.000000 108
## 6 B05 0 1e-04 0.000000 108
## 7 B06 0 1e-04 0.000000 108
## 8 B07 0 1e-04 0.000000 108
## 9 B08 0 1e-04 0.000000 108
## 10 B09 0 1e-04 0.000000 108
## 11 B11 0 1e-04 0.000000 108
## 12 B12 0 1e-04 0.000000 108
## 13 B8A 0 1e-04 0.000000 108
## 14 SCL 0 1e+00 108
## 15 WVP 0 1e+00 108
After carefully reading the product information from the data providers, we define separate masks to consider high-quality pixels only.
MOD.mask = image_mask("state_1km", values=0, bits = 0:1, invert = TRUE)
L8.mask = image_mask("PIXEL_QA", values=c(322, 386, 834, 898, 1346, 324, 388, 836, 900, 1348), invert = TRUE)
S2.mask = image_mask("SCL", values=c(1,3,8,9,10))
We first build three data cubes with separate cube views to get an initial overview of the collections.
MOD.v1 = cube_view(extent = MOD.col, dx = 5000, dt = "P1M", resampling = "average", srs = "EPSG:3857", aggregation = "median")
raster_cube(MOD.col, MOD.v1, mask = MOD.mask) %>%
select_bands(c("sur_refl_b01", "sur_refl_b04", "sur_refl_b03")) %>%
plot(rgb = 1:3, zlim=c(0, 2000), ncol = 4)
L8.v1 = cube_view(extent = L8.col, dx = 500, dt = "P1M", resampling = "average", srs = "EPSG:3857", aggregation = "median")
raster_cube(L8.col, L8.v1, mask = L8.mask) %>%
select_bands(c("B02", "B03", "B04")) %>%
plot(rgb = 3:1, zlim=c(0,2000), ncol=4)
S2.v1 = cube_view(extent = S2.col, dx = 500, dt = "P1M", resampling = "average", srs = "EPSG:3857", aggregation = "median")
raster_cube(S2.col, S2.v1, mask = S2.mask) %>%
select_bands(c("B02", "B03", "B04")) %>%
plot(rgb = 3:1, zlim=c(0,2000), ncol=4)
Now, we define our spatiotemporal area of interest, desired resolution, and build separate data cubes with NDVI measurements.
srs = "EPSG:32632"
aoi = list(left = 383695, right = 399749, bottom = 5751683, top = 5760444, t0="2019-04-01", t1 = "2019-10-31")
v.aoi = cube_view(srs = srs, extent = aoi, dx = 200, dt = "P1D", resampling = "average", aggregation = "median")
raster_cube(L8.col, v.aoi, mask = L8.mask) %>%
select_bands(c("B04", "B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", "NDVI") -> L8.cube
raster_cube(S2.col, v.aoi, mask = S2.mask) %>%
select_bands(c("B04", "B08")) %>%
apply_pixel("(B08-B04)/(B08+B04)", "NDVI") -> S2.cube
raster_cube(MOD.col, cube_view(v.aoi, resampling = "bilinear"), mask = MOD.mask) %>%
select_bands(c("sur_refl_b01", "sur_refl_b02")) %>%
apply_pixel("(sur_refl_b02 - sur_refl_b01)/(sur_refl_b02 + sur_refl_b01)", "NDVI") -> MOD.cube
Now, we can combine all data cubes into a single data cube with three variables using the join_bands()
function. The cube_names
argument here is used to add prefixes and ensure unique band names of the output data cube. As a first step, we count the number of observations of pixel time series from the different datasets.
combined.cube = join_bands(list(L8.cube, S2.cube, MOD.cube), cube_names = c("L8", "S2", "MOD"))
combined.cube
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2019-04-01 2019-10-31 214 P1D 1
## y 5751663.5 5760463.5 44 200 256
## x 383622 399822 81 200 256
##
## Bands:
## name offset scale nodata unit
## 1 L8.NDVI 0 1 NaN
## 2 S2.NDVI 0 1 NaN
## 3 MOD.NDVI 0 1 NaN
combined.cube %>%
reduce_time("count(L8.NDVI)", "count(S2.NDVI)", "count(MOD.NDVI)") %>%
plot(key.pos=1, zlim=c(0, 80), col = viridis::viridis, nbreaks=19, ncol = 3)
Having all data available in one data cube for example allows to create a simple best available NDVI image time series.
ndvi.col = function(n) {
rev(sequential_hcl(n, "Green-Yellow"))
}
join_bands(list(L8.cube, S2.cube, MOD.cube), cube_names = c("L8", "S2", "MOD")) %>%
apply_time(names="NDVI_BEST", FUN = function(x) {
out = x["MOD.NDVI",]
L8.idx = which(!is.na(x["L8.NDVI",]))
out[L8.idx] = x["L8.NDVI",L8.idx]
S2.idx = which(!is.na(x["S2.NDVI",]))
out[S2.idx] = x["S2.NDVI",S2.idx]
out
}) %>%
plot(key.pos = 1, t = 150:169, col = ndvi.col)
This is of course a very optimistic approach, assuming that NDVI values of different sensors are comparable. Another (equally optimistic) approach, which completely fills the NDVI pixel time series, might be to use MODIS observations only to identify the temporal trend and then apply the trend to Sentinel-2 observations to fill gaps, as in the following.
join_bands(list(S2.cube, MOD.cube), cube_names = c("S2", "MOD")) %>%
apply_time(names="NDVI", FUN = function(x) {
library(zoo)
mod.ndvi = na.approx(x["MOD.NDVI",], na.rm = FALSE)
mod.ndvi.diff = c(0, diff(mod.ndvi))
out = rep(NA, ncol(x))
S2.idx = which(!is.na(x["S2.NDVI",]))
if (length(S2.idx) > 1) {
for (i in S2.idx[1]:ncol(x)){
if (!is.na(x["S2.NDVI",i])) {
out[i] = x["S2.NDVI",i]
}
else {
out[i] = out[i-1] + mod.ndvi.diff[i]
}
}
}
out
}) %>%
plot(key.pos = 1, t = 150:169, col = ndvi.col)
There are of course a lot of issues here, too. For example, the result contains NDVI values > 1. However, it is easy to think about improvements to this method (e.g., learning from days when observations from all sensors are available).
Instead of combining data from different sensors but the same target variable, we may use data cubes to analyze interactions or correlations between different variables. In the example below, we combine NDVI vegetation index observations from MODIS (MOD13A2), daily precipitation from the Global precipitation measurement mission (GPM_3IMERGDF, Huffman et al. (2019)), and daily soil moisture from ESA CCI ESA CCI (Gruber et al. 2019, @dorigo2017esa, @gruber2017triple).
As in the previous example, we start by creating image collections for all different datasets.
MOD13A2.files = list.files("/media/marius/Samsung_T5/eodata/MOD13A2", recursive = TRUE, full.names = TRUE, pattern=".hdf")
head(MOD13A2.files)
## [1] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v02.006.2018226105916.hdf"
## [2] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v03.006.2018226105932.hdf"
## [3] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v04.006.2018226105905.hdf"
## [4] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v05.006.2018226105912.hdf"
## [5] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v02.006.2018226105911.hdf"
## [6] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v03.006.2018226105918.hdf"
if (!file.exists("MOD13A2.db")) {
MOD13A2.collection = create_image_collection(MOD13A2.files, "MxD13A2", "MOD13A2.db")
}
MOD13A2.collection = image_collection("MOD13A2.db")
MOD13A2.collection
## A GDAL image collection object, referencing 1404 images with 7 bands
## Images:
## name
## 1 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v02.006.2018226105916
## 2 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v03.006.2018226105932
## 3 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v04.006.2018226105905
## 4 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v05.006.2018226105912
## 5 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v02.006.2018226105911
## 6 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v03.006.2018226105918
## left top bottom right datetime
## 1 -29.23804 70 60 0.00000 2013-12-19T00:00:00
## 2 -20.00000 60 50 0.00000 2013-12-19T00:00:00
## 3 -15.55724 50 40 0.00000 2013-12-19T00:00:00
## 4 -13.05407 40 30 0.00000 2013-12-19T00:00:00
## 5 0.00000 70 60 29.23804 2013-12-19T00:00:00
## 6 0.00000 60 50 20.00000 2013-12-19T00:00:00
## srs
## 1 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 2 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 3 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 4 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 5 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 6 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## [ omitted 1398 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 BLUE 0 10000 reflectance -3000.000000 1404
## 2 EVI 0 10000 EVI -3000.000000 1404
## 3 MIR 0 10000 reflectance -3000.000000 1404
## 4 NDVI 0 10000 NDVI -3000.000000 1404
## 5 NIR 0 10000 reflectance -3000.000000 1404
## 6 RED 0 10000 reflectance -3000.000000 1404
## 7 VIQ 0 1 bit field -3000.000000 1404
GPM.files = list.files("/media/marius/Samsung_T5/eodata/GPM", recursive = TRUE, full.names = TRUE, pattern=".tif")
head(GPM.files)
## [1] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.ice.accum.tif"
## [2] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.ice.rate.tif"
## [3] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.liquid.accum.tif"
## [4] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.liquid.rate.tif"
## [5] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.liquidPercent.tif"
## [6] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.total.accum.tif"
if (!file.exists("GPM.db")) {
GPM.collection = create_image_collection(GPM.files, "GPM_IMERG_3B_DAY_GIS_V06A", "GPM.db")
}
GPM.collection = image_collection("GPM.db")
GPM.collection
## A GDAL image collection object, referencing 1462 images with 7 bands
## Images:
## name left top bottom right datetime
## 1 20150101 -180 90 -90 180 2015-01-01T00:00:00
## 2 20150102 -180 90 -90 180 2015-01-02T00:00:00
## 3 20150103 -180 90 -90 180 2015-01-03T00:00:00
## 4 20150104 -180 90 -90 180 2015-01-04T00:00:00
## 5 20150105 -180 90 -90 180 2015-01-05T00:00:00
## 6 20150106 -180 90 -90 180 2015-01-06T00:00:00
## srs
## 1 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 2 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 3 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 4 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 5 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 6 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## [ omitted 1456 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 ice_accum 0 1 29999.000000 1462
## 2 ice_rate 0 1 29999.000000 1462
## 3 liquid_accum 0 1 29999.000000 1462
## 4 liquid_percent 0 1 1462
## 5 liquid_rate 0 1 29999.000000 1462
## 6 total_accum 0 1 29999.000000 1462
## 7 total_rate 0 1 29999.000000 1462
SM.files = list.files("/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE",
recursive = TRUE, full.names = TRUE, pattern=".*nc")
head(SM.files)
## [1] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130101000000-fv04.7.nc"
## [2] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130102000000-fv04.7.nc"
## [3] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130103000000-fv04.7.nc"
## [4] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130104000000-fv04.7.nc"
## [5] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130105000000-fv04.7.nc"
## [6] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130106000000-fv04.7.nc"
if (!file.exists("SM.db")) {
SM.collection = create_image_collection(SM.files, "ESA_CCI_SM_PASSIVE.json", "SM.db")
}
SM.collection = image_collection("SM.db")
SM.collection
## A GDAL image collection object, referencing 2556 images with 8 bands
## Images:
## name
## 1 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130101000000-fv04.7
## 2 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130102000000-fv04.7
## 3 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130103000000-fv04.7
## 4 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130104000000-fv04.7
## 5 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130105000000-fv04.7
## 6 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130106000000-fv04.7
## left top bottom right datetime srs
## 1 -180 90 -90 180 2013-01-01T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 2 -180 90 -90 180 2013-01-02T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 3 -180 90 -90 180 2013-01-03T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 4 -180 90 -90 180 2013-01-04T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 5 -180 90 -90 180 2013-01-05T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 6 -180 90 -90 180 2013-01-06T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## [ omitted 2550 images ]
##
## Bands:
## name offset scale unit nodata
## 1 dnflag 0 1 0.000000
## 2 flag 0 1 127.000000
## 3 freqbandID 0 1 0.000000
## 4 mode 0 1 0.000000
## 5 sensor 0 1 0.000000
## 6 sm 0 1 m3 m-3 -9999.000000
## 7 sm_uncertainty 0 1 m3 m-3 -9999.000000
## 8 t0 0 1 days since 1970-01-01 00:00:00 UTC -9999.000000
## image_count
## 1 2556
## 2 2556
## 3 2556
## 4 2556
## 5 2556
## 6 2556
## 7 2556
## 8 2556
We now define our area of interest, select the variables of interest, and plot separate data cubes (using the same data cube view) for one year to get an initial overview.
view_de = cube_view(srs = "EPSG:3035", extent = list(left=4039313, bottom=2775429,right=4670348, top=3482115, t0 = "2018-01", t1="2018-12"), dx = 5000, dt = "P1M", aggregation = "mean", resampling = "bilinear")
ndvi.col = function(n) {
rev(sequential_hcl(n, "Green-Yellow"))
}
prcp.col = function(n) {
rev(sequential_hcl(n, "Purple-Blue"))
}
raster_cube(MOD13A2.collection, view_de) %>%
select_bands("NDVI") %>%
plot(key.pos = 1, col = ndvi.col)
raster_cube(GPM.collection, view_de) %>%
select_bands("total_accum") %>%
plot(key.pos = 1, col = prcp.col, zlim = c(0, 100), nbreaks = 100)
raster_cube(SM.collection, view_de) %>%
select_bands("sm") %>%
plot(key.pos = 1, zlim = c(0,1), col = prcp.col, nbreaks=100)
Next, we preprocess our data cubes separately as described below:
For the NDVI cube, we decompose the pixel time series by removing seasonal and trend components using the R function stl()
. We refer to the remainder as anomalies. This can be done with the apply_time()
function, which expects a user-defined R function over pixel time series but does no reduction (compared to reduce_time()
. In the provided function, we first convert time series to ts
objects, fill NA values by linear interpolation using zoo::na_approx()
and afterwards use the stl()
function to decompose the time series.
For the soil moisture cube, we remove seasonal and trend components but compute a rolling mean of the last two months before.
We calculate the monthly standard precipitation index from the precipitation cube using the R function spi()
from package precintcon
.
Notice that to keep this example simple and computation times in the order of minutes, we still aggregate our observations monthly and limit our analysis to four years.
view_de = cube_view(view_de, extent = list(t0 = "2015-01", t1 = "2018-12"))
raster_cube(MOD13A2.collection, view_de) %>%
select_bands("NDVI") %>%
apply_time(name = "NDVI_anomaly", FUN = function(x) {
library(zoo)
y = na.approx(x["NDVI",], na.rm = FALSE)
if (any(is.na(y))) return(rep(NA, ncol(x)))
y = ts(y, frequency = 12, start = c(2015, 1))
y_decomposed = stl(y, "periodic")
res = as.vector(y_decomposed$time.series[,"remainder"])
return(res)
}) -> ndvi.anom
raster_cube(GPM.collection, view_de) %>%
select_bands("total_accum") %>%
apply_time(names = "spi", FUN = function(x) {
library(precintcon)
year = as.numeric(substr(colnames(x), 1, 4))
month = as.numeric(substr(colnames(x), 5, 6))
y = data.frame(year = year, months = month, precipitation = x["total_accum",])
class(y) <- c("data.frame", "precintcon.monthly")
if (any(is.na(y$precipitation))) return(rep(1, ncol(x)))
spi_monthly = spi(y, period = 1)$spi
return(spi_monthly)
}) -> prcp.spi
raster_cube(SM.collection, view_de) %>%
select_bands("sm") %>%
apply_time(names = "sm_anomaly", FUN = function(x) {
library(zoo)
y = na.approx(x["sm",], na.rm = FALSE)
if (any(is.na(y))) return(rep(NA, ncol(x)))
y = rollmean(y, 2, fill = NA, align = "right")
y = ts(y[-1], frequency = 12, start = c(2015, 2))
y_decomposed = stl(y, "periodic")
res = c(0, as.vector(y_decomposed$time.series[,"remainder"]))
return(res)
}) -> sm.anom
To demonstrate how we can combine the three data cubes anyway, we apply a simple statistical test for zero correlation (Kendall’s \(\tau\)) on \((NDVI_t, SPI_{t-0..2})\) and \((NDVI_t, SM_{t-0..2})\) for each time series below.
out_bandnames = c("NDVI_SPI_LAG_0_tau", "NDVI_SPI_LAG_0_p",
"NDVI_SPI_LAG_1_tau", "NDVI_SPI_LAG_1_p",
"NDVI_SPI_LAG_2_tau", "NDVI_SPI_LAG_2_p",
"NDVI_SM_LAG_0_tau", "NDVI_SM_LAG_0_p",
"NDVI_SM_LAG_1_tau", "NDVI_SM_LAG_1_p",
"NDVI_SM_LAG_2_tau", "NDVI_SM_LAG_2_p")
join_bands(list(ndvi.anom, prcp.spi, sm.anom), cube_names = c("X1","X2","X3")) %>%
reduce_time(FUN = function(x) {
v = data.frame(ndvi = x["X1.NDVI_anomaly", ], spi = x["X2.spi", ], sm = x["X3.sm_anomaly", ])
# use only April to October
#month = as.numeric(substr(colnames(x), 5,6))
#v = v[which(month >= 4 & month <= 10),]
if (sum(complete.cases(v)) < 6) return(rep(NA, 12))
n = nrow(v)
res = rep(NA, 12)
ct = cor.test(v$ndvi, v$spi, use = "complete.obs", method = "kendall")
res[c(1,2)] = c(ct$estimate, ct$p.value)
ct = cor.test(v$ndvi[2:n], v$spi[1:(n-1)], use = "complete.obs", method = "kendall")
res[c(3,4)] = c(ct$estimate, ct$p.value)
ct = cor.test(v$ndvi[3:n], v$spi[1:(n-2)], use = "complete.obs", method = "kendall")
res[c(5,6)] = c(ct$estimate, ct$p.value)
ct = cor.test(v$ndvi, v$sm, use = "complete.obs", method = "kendall")
res[c(7,8)] = c(ct$estimate, ct$p.value)
ct = cor.test(v$ndvi[2:n], v$sm[1:(n-1)], use = "complete.obs", method = "kendall")
res[c(9,10)] = c(ct$estimate, ct$p.value)
ct = cor.test(v$ndvi[3:n], v$sm[1:(n-2)], use = "complete.obs", method = "kendall", na.action = na.exclude)
res[c(11,12)] = c(ct$estimate, ct$p.value)
return(res)
}, names = out_bandnames) -> result.cube
result.cube
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2015-01 2015-01 1 P48M 1
## y 2773772 3483772 142 5000 256
## x 4037330.5 4672330.5 127 5000 256
##
## Bands:
## name offset scale nodata unit
## 1 NDVI_SPI_LAG_0_tau 0 1 NaN
## 2 NDVI_SPI_LAG_0_p 0 1 NaN
## 3 NDVI_SPI_LAG_1_tau 0 1 NaN
## 4 NDVI_SPI_LAG_1_p 0 1 NaN
## 5 NDVI_SPI_LAG_2_tau 0 1 NaN
## 6 NDVI_SPI_LAG_2_p 0 1 NaN
## 7 NDVI_SM_LAG_0_tau 0 1 NaN
## 8 NDVI_SM_LAG_0_p 0 1 NaN
## 9 NDVI_SM_LAG_1_tau 0 1 NaN
## 10 NDVI_SM_LAG_1_p 0 1 NaN
## 11 NDVI_SM_LAG_2_tau 0 1 NaN
## 12 NDVI_SM_LAG_2_p 0 1 NaN
plot(result.cube, key.pos = 1, zlim=c(-1,1), col = viridis::viridis, ncol = 4, nbreaks=50)
Is this meaningful? Probably not, but it shows how complex calculations can be applied on data cubes in R. In applications, one would of course use much longer time series to derive anomalies and furthermore, the interactions of our variables are much more complex in reality (and may need more information on soil, vegetation type, elevation, and similar). The result contains \(p\) values and \(\tau\) estimates for both pairs of variables at temporal lags 0, 1, and 2.
Dorigo, W., Wagner, W., Albergel, C., Albrecht, F., Balsamo, G., Brocca, L., Chung, D., Ertl, M., Forkel, M., Gruber, A., and others (2017), “ESA cci soil moisture for improved earth system understanding: State-of-the art and future directions,” Remote Sensing of Environment, Elsevier, 203, 185–215.
Gruber, A., Dorigo, W. A., Crow, W., and Wagner, W. (2017), “Triple collocation-based merging of satellite soil moisture retrievals,” IEEE Transactions on Geoscience and Remote Sensing, IEEE, 55, 6780–6792.
Gruber, A., Scanlon, T., Schalie, R. van der, Wagner, W., and Dorigo, W. (2019), “Evolution of the esa cci soil moisture climate data records and their underlying merging methodology,” Earth System Science Data, Copernicus Publications, 1–37.
Huffman, G., Stocker, E., Bolvin, D., Nelkin, E., and Jackson, T. (2019), “GPM imerg final precipitation l3 half hourly 0.1 degree x 0.1 degree v06, greenbelt, md, goddard earth sciences data and information services center (ges disc).”