OpenGeoHub Summer School 2020, Wageningen

Outline

  • Part I: Introduction to Earth observation (EO) data cubes and the gdalcubes R package

  • Part II: Examples on using data cubes to combine imagery from different satellite-based EO missions

  • Part III: Summary, discussion, and practical exercises

All the material of this tutorial is online at GitHub, including R markdown sources, rendered HTML output, and solutions to the practical exercises.

Part II: Mulit-variable data cubes from different EO products

Please notice that the data used in this part of the tutorial is not available online.

First steps

At first, we make sure to load the needed packages again.

library(gdalcubes)
library(magrittr)
library(colorspace)

Example 1: Building a combined NDVI data cube from Sentinel-2, Landsat 8, and MODIS data

In this example, we are interested in monitoring the vegetation with NDVI as target variable. To get as much information as possible, we will use observations from Sentinel-2, Landsat 8, and MODIS (product MOD09GA). These datasets have very different properties:

Dataset Spatial Resolution (NDVI bands) Temporal Resolution File Format
Sentinel-2 Level 2A 10m 5 days .jp2
Landsat 8 surface reflectance 30m 16 days GeoTIFF
MODIS MOD09GA 500m daily HDF4

Our study area covers a region around Münster, Germany with mostly agricultural and some smaller forest areas.

We start with an overview of available data:

MOD.files = list.files("/media/marius/Samsung_T5/eodata/MOD09GA_MS", pattern = ".hdf",full.names = TRUE, recursive = TRUE)
head(MOD.files, n = 3)
## [1] "/media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.01/MOD09GA.A2019091.h18v03.006.2019093031830.hdf"
## [2] "/media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.02/MOD09GA.A2019092.h18v03.006.2019094031643.hdf"
## [3] "/media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.03/MOD09GA.A2019093.h18v03.006.2019095030640.hdf"
length(MOD.files)
## [1] 214
sum(file.size(MOD.files)) / 1024^3 # GiB
## [1] 25.506
L8.files = list.files("/media/marius/Samsung_T5/eodata/L8_MS2019", pattern = ".tif",full.names = TRUE, recursive = TRUE)
head(L8.files, n = 3)
## [1] "/media/marius/Samsung_T5/eodata/L8_MS2019/LC081960242019041001T1-SC20200716082607/LC08_L1TP_196024_20190410_20190422_01_T1_pixel_qa.tif"  
## [2] "/media/marius/Samsung_T5/eodata/L8_MS2019/LC081960242019041001T1-SC20200716082607/LC08_L1TP_196024_20190410_20190422_01_T1_radsat_qa.tif" 
## [3] "/media/marius/Samsung_T5/eodata/L8_MS2019/LC081960242019041001T1-SC20200716082607/LC08_L1TP_196024_20190410_20190422_01_T1_sr_aerosol.tif"
length(L8.files)
## [1] 270
sum(file.size(L8.files)) / 1024^3 # GiB
## [1] 41.80039
S2.files = list.files("/media/marius/Samsung_T5/eodata/S2L2A_MS", pattern = ".zip",full.names = TRUE, recursive = TRUE)
head(S2.files, n = 3)
## [1] "/media/marius/Samsung_T5/eodata/S2L2A_MS/S2A_MSIL2A_20190410T103021_N0211_R108_T31UGT_20190410T163924.zip"
## [2] "/media/marius/Samsung_T5/eodata/S2L2A_MS/S2A_MSIL2A_20190410T103021_N0211_R108_T32ULC_20190410T163924.zip"
## [3] "/media/marius/Samsung_T5/eodata/S2L2A_MS/S2A_MSIL2A_20190410T103021_N0211_R108_T32UMC_20190410T163924.zip"
length(S2.files)
## [1] 108
sum(file.size(S2.files)) / 1024^3 # GiB
## [1] 91.42826

Notice that the Sentinel-2 images come from three different tiles covering two UTM zones. As in the first part, we now need to create separate image collections, which requires a bit of research to find the correct collection formats (e.g., using collection_formats()).

if (!file.exists("MOD09GA.db")) {
  MOD.col = create_image_collection(MOD.files, "MxD09GA", "MOD09GA.db")
}
MOD.col = image_collection("MOD09GA.db")
MOD.col
## A GDAL image collection object, referencing 214 images with 9  bands
## Images:
##                                                                                              name
## 1 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.01/MOD09GA.A2019091.h18v03.006.2019093031830
## 2 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.02/MOD09GA.A2019092.h18v03.006.2019094031643
## 3 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.03/MOD09GA.A2019093.h18v03.006.2019095030640
## 4 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.04/MOD09GA.A2019094.h18v03.006.2019096033933
## 5 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.05/MOD09GA.A2019095.h18v03.006.2019097032840
## 6 /media/marius/Samsung_T5/eodata/MOD09GA_MS/2019.04.06/MOD09GA.A2019096.h18v03.006.2019098143830
##   left top bottom right            datetime
## 1    0  60     50    20 2019-04-01T00:00:00
## 2    0  60     50    20 2019-04-02T00:00:00
## 3    0  60     50    20 2019-04-03T00:00:00
## 4    0  60     50    20 2019-04-04T00:00:00
## 5    0  60     50    20 2019-04-05T00:00:00
## 6    0  60     50    20 2019-04-06T00:00:00
##                                                                  srs
## 1 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 2 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 3 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 4 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 5 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 6 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## [ omitted 208 images ] 
## 
## Bands:
##           name offset scale        unit        nodata image_count
## 1      QC_500m      0     1   bit field                       214
## 2    state_1km      0     1   bit field                       214
## 3 sur_refl_b01      0 10000 reflectance -28672.000000         214
## 4 sur_refl_b02      0 10000 reflectance -28672.000000         214
## 5 sur_refl_b03      0 10000 reflectance -28672.000000         214
## 6 sur_refl_b04      0 10000 reflectance -28672.000000         214
## 7 sur_refl_b05      0 10000 reflectance -28672.000000         214
## 8 sur_refl_b06      0 10000 reflectance -28672.000000         214
## 9 sur_refl_b07      0 10000 reflectance -28672.000000         214
if (!file.exists("L8_MS.db")) {
  L8.col = create_image_collection(L8.files, "L8_SR", "L8_MS.db")
}
L8.col = image_collection("L8_MS.db")
L8.col
## A GDAL image collection object, referencing 27 images with 10  bands
## Images:
##                                       name     left      top   bottom    right
## 1 LC08_L1TP_196024_20190410_20190422_01_T1 6.369848 52.77484 50.57697 9.913380
## 2 LC08_L1GT_196024_20190426_20190508_01_T2 6.414248 52.77445 50.57787 9.962292
## 3 LC08_L1TP_196024_20190512_20190521_01_T1 6.409648 52.77722 50.57778 9.953458
## 4 LC08_L1TP_196024_20190528_20190605_01_T1 6.400928 52.77456 50.57760 9.948953
## 5 LC08_L1TP_196024_20190613_20190619_01_T1 6.400928 52.77456 50.57760 9.948953
## 6 LC08_L1TP_196024_20190629_20190706_01_T1 6.409808 52.77453 50.57509 9.953399
##              datetime                                               srs
## 1 2019-04-10T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 2 2019-04-26T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 3 2019-05-12T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 4 2019-05-28T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 5 2019-06-13T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 6 2019-06-29T00:00:00 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## [ omitted 21 images ] 
## 
## Bands:
##         name offset scale unit       nodata image_count
## 1    AEROSOL      0     1                            27
## 2        B01      0     1      -9999.000000          27
## 3        B02      0     1      -9999.000000          27
## 4        B03      0     1      -9999.000000          27
## 5        B04      0     1      -9999.000000          27
## 6        B05      0     1      -9999.000000          27
## 7        B06      0     1      -9999.000000          27
## 8        B07      0     1      -9999.000000          27
## 9   PIXEL_QA      0     1                            27
## 10 RADSAT_QA      0     1                            27
if (!file.exists("S2_MS.db")) {
  S2.col = create_image_collection(S2.files, "Sentinel2_L2A", "S2_MS.db")
}
S2.col = image_collection("S2_MS.db")
S2.col
## A GDAL image collection object, referencing 108 images with 15  bands
## Images:
##                                                           name     left
## 1 S2A_MSIL2A_20190410T103021_N0211_R108_T31UGT_20190410T163924 5.870216
## 2 S2A_MSIL2A_20190410T103021_N0211_R108_T32ULC_20190410T163924 6.065798
## 3 S2A_MSIL2A_20190410T103021_N0211_R108_T32UMC_20190410T163924 7.531529
## 4 S2A_MSIL2A_20190420T103031_N0211_R108_T31UGT_20190420T111631 5.870216
## 5 S2A_MSIL2A_20190420T103031_N0211_R108_T32ULC_20190420T111631 6.065798
## 6 S2A_MSIL2A_20190420T103031_N0211_R108_T32UMC_20190420T111631 7.531529
##        top   bottom    right            datetime
## 1 52.31404 51.27893 7.539980 2019-04-10T10:30:21
## 2 52.34305 51.32805 7.704578 2019-04-10T10:30:21
## 3 52.35039 51.35443 9.143291 2019-04-10T10:30:21
## 4 52.31404 51.27893 7.539980 2019-04-20T10:30:31
## 5 52.34305 51.32805 7.704578 2019-04-20T10:30:31
## 6 52.35039 51.35443 9.143291 2019-04-20T10:30:31
##                                                 srs
## 1 +proj=utm +zone=31 +datum=WGS84 +units=m +no_defs
## 2 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 3 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 4 +proj=utm +zone=31 +datum=WGS84 +units=m +no_defs
## 5 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## 6 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs
## [ omitted 102 images ] 
## 
## Bands:
##    name offset scale unit   nodata image_count
## 1   AOT      0 1e+00                       108
## 2   B01      0 1e-04      0.000000         108
## 3   B02      0 1e-04      0.000000         108
## 4   B03      0 1e-04      0.000000         108
## 5   B04      0 1e-04      0.000000         108
## 6   B05      0 1e-04      0.000000         108
## 7   B06      0 1e-04      0.000000         108
## 8   B07      0 1e-04      0.000000         108
## 9   B08      0 1e-04      0.000000         108
## 10  B09      0 1e-04      0.000000         108
## 11  B11      0 1e-04      0.000000         108
## 12  B12      0 1e-04      0.000000         108
## 13  B8A      0 1e-04      0.000000         108
## 14  SCL      0 1e+00                       108
## 15  WVP      0 1e+00                       108

After carefully reading the product information from the data providers, we define separate masks to consider high-quality pixels only.

MOD.mask = image_mask("state_1km", values=0, bits = 0:1, invert  = TRUE)
L8.mask =  image_mask("PIXEL_QA", values=c(322, 386, 834, 898, 1346, 324, 388, 836, 900, 1348), invert = TRUE)
S2.mask = image_mask("SCL", values=c(1,3,8,9,10))

We first build three data cubes with separate cube views to get an initial overview of the collections.

MOD.v1 = cube_view(extent = MOD.col, dx = 5000, dt = "P1M", resampling = "average", srs = "EPSG:3857", aggregation = "median")
raster_cube(MOD.col, MOD.v1, mask = MOD.mask) %>%
  select_bands(c("sur_refl_b01", "sur_refl_b04", "sur_refl_b03")) %>%
  plot(rgb = 1:3, zlim=c(0, 2000), ncol = 4)

L8.v1 = cube_view(extent = L8.col, dx = 500, dt = "P1M", resampling = "average", srs = "EPSG:3857", aggregation = "median")
raster_cube(L8.col, L8.v1, mask = L8.mask) %>%
  select_bands(c("B02", "B03", "B04")) %>%
  plot(rgb = 3:1, zlim=c(0,2000), ncol=4)

S2.v1 = cube_view(extent = S2.col, dx = 500, dt = "P1M", resampling = "average", srs = "EPSG:3857", aggregation = "median")
raster_cube(S2.col, S2.v1, mask = S2.mask) %>%
  select_bands(c("B02", "B03", "B04")) %>%
  plot(rgb = 3:1, zlim=c(0,2000), ncol=4)

Now, we define our spatiotemporal area of interest, desired resolution, and build separate data cubes with NDVI measurements.

srs = "EPSG:32632"
aoi = list(left = 383695, right = 399749, bottom = 5751683, top = 5760444, t0="2019-04-01", t1 = "2019-10-31")
v.aoi = cube_view(srs = srs, extent = aoi, dx = 200, dt = "P1D", resampling = "average", aggregation = "median")

raster_cube(L8.col, v.aoi, mask = L8.mask) %>%
  select_bands(c("B04", "B05")) %>%
  apply_pixel("(B05-B04)/(B05+B04)", "NDVI")  -> L8.cube

raster_cube(S2.col, v.aoi, mask = S2.mask) %>%
  select_bands(c("B04", "B08")) %>%
  apply_pixel("(B08-B04)/(B08+B04)", "NDVI") -> S2.cube

raster_cube(MOD.col, cube_view(v.aoi, resampling = "bilinear"), mask = MOD.mask) %>%
  select_bands(c("sur_refl_b01", "sur_refl_b02")) %>%
  apply_pixel("(sur_refl_b02 - sur_refl_b01)/(sur_refl_b02 + sur_refl_b01)", "NDVI") -> MOD.cube

Now, we can combine all data cubes into a single data cube with three variables using the join_bands() function. The cube_names argument here is used to add prefixes and ensure unique band names of the output data cube. As a first step, we count the number of observations of pixel time series from the different datasets.

combined.cube = join_bands(list(L8.cube, S2.cube, MOD.cube), cube_names = c("L8", "S2", "MOD")) 
combined.cube
## A GDAL data cube proxy object
## 
## Dimensions:
##          low       high count pixel_size chunk_size
## t 2019-04-01 2019-10-31   214        P1D          1
## y  5751663.5  5760463.5    44        200        256
## x     383622     399822    81        200        256
## 
## Bands:
##       name offset scale nodata unit
## 1  L8.NDVI      0     1    NaN     
## 2  S2.NDVI      0     1    NaN     
## 3 MOD.NDVI      0     1    NaN
combined.cube %>%
  reduce_time("count(L8.NDVI)", "count(S2.NDVI)", "count(MOD.NDVI)") %>%
  plot(key.pos=1, zlim=c(0, 80), col = viridis::viridis, nbreaks=19, ncol = 3)

Having all data available in one data cube for example allows to create a simple best available NDVI image time series.

ndvi.col = function(n) {
  rev(sequential_hcl(n, "Green-Yellow"))
}
join_bands(list(L8.cube, S2.cube, MOD.cube), cube_names = c("L8", "S2", "MOD")) %>%
  apply_time(names="NDVI_BEST", FUN = function(x) {
    out = x["MOD.NDVI",]
    L8.idx = which(!is.na(x["L8.NDVI",]))
    out[L8.idx] = x["L8.NDVI",L8.idx]
    S2.idx = which(!is.na(x["S2.NDVI",]))
    out[S2.idx] = x["S2.NDVI",S2.idx]
    out
  }) %>%
  plot(key.pos = 1, t = 150:169, col = ndvi.col)

This is of course a very optimistic approach, assuming that NDVI values of different sensors are comparable. Another (equally optimistic) approach, which completely fills the NDVI pixel time series, might be to use MODIS observations only to identify the temporal trend and then apply the trend to Sentinel-2 observations to fill gaps, as in the following.

join_bands(list(S2.cube, MOD.cube), cube_names = c("S2", "MOD")) %>%
  apply_time(names="NDVI", FUN = function(x) {
    library(zoo)
    mod.ndvi = na.approx(x["MOD.NDVI",], na.rm = FALSE)
    mod.ndvi.diff = c(0, diff(mod.ndvi))
    
    out = rep(NA, ncol(x))
    
    S2.idx = which(!is.na(x["S2.NDVI",]))
    if (length(S2.idx) > 1) {
      for (i in S2.idx[1]:ncol(x)){
        if (!is.na(x["S2.NDVI",i])) {
         out[i] = x["S2.NDVI",i]
        }
        else {
          out[i] =  out[i-1] + mod.ndvi.diff[i]
        }
      }
    }
    out
  }) %>%
  plot(key.pos = 1, t = 150:169, col = ndvi.col)

There are of course a lot of issues here, too. For example, the result contains NDVI values > 1. However, it is easy to think about improvements to this method (e.g., learning from days when observations from all sensors are available).

Example 2: Joint analysis of satellite derived vegetation, precipitation, and soil moisture observations

Instead of combining data from different sensors but the same target variable, we may use data cubes to analyze interactions or correlations between different variables. In the example below, we combine NDVI vegetation index observations from MODIS (MOD13A2), daily precipitation from the Global precipitation measurement mission (GPM_3IMERGDF, Huffman et al. (2019)), and daily soil moisture from ESA CCI ESA CCI (Gruber et al. 2019, @dorigo2017esa, @gruber2017triple).

As in the previous example, we start by creating image collections for all different datasets.

MOD13A2.files = list.files("/media/marius/Samsung_T5/eodata/MOD13A2", recursive = TRUE, full.names = TRUE, pattern=".hdf")
head(MOD13A2.files)
## [1] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v02.006.2018226105916.hdf"
## [2] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v03.006.2018226105932.hdf"
## [3] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v04.006.2018226105905.hdf"
## [4] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v05.006.2018226105912.hdf"
## [5] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v02.006.2018226105911.hdf"
## [6] "/media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v03.006.2018226105918.hdf"
if (!file.exists("MOD13A2.db")) {
  MOD13A2.collection = create_image_collection(MOD13A2.files, "MxD13A2", "MOD13A2.db")
}
MOD13A2.collection = image_collection("MOD13A2.db")
MOD13A2.collection
## A GDAL image collection object, referencing 1404 images with 7  bands
## Images:
##                                                                                                       name
## 1 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v02.006.2018226105916
## 2 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v03.006.2018226105932
## 3 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v04.006.2018226105905
## 4 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h17v05.006.2018226105912
## 5 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v02.006.2018226105911
## 6 /media/marius/Samsung_T5/eodata/MOD13A2/MOD13A2.006/2013.12.19/MOD13A2.A2013353.h18v03.006.2018226105918
##        left top bottom    right            datetime
## 1 -29.23804  70     60  0.00000 2013-12-19T00:00:00
## 2 -20.00000  60     50  0.00000 2013-12-19T00:00:00
## 3 -15.55724  50     40  0.00000 2013-12-19T00:00:00
## 4 -13.05407  40     30  0.00000 2013-12-19T00:00:00
## 5   0.00000  70     60 29.23804 2013-12-19T00:00:00
## 6   0.00000  60     50 20.00000 2013-12-19T00:00:00
##                                                                  srs
## 1 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 2 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 3 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 4 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 5 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## 6 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs
## [ omitted 1398 images ] 
## 
## Bands:
##   name offset scale        unit       nodata image_count
## 1 BLUE      0 10000 reflectance -3000.000000        1404
## 2  EVI      0 10000         EVI -3000.000000        1404
## 3  MIR      0 10000 reflectance -3000.000000        1404
## 4 NDVI      0 10000        NDVI -3000.000000        1404
## 5  NIR      0 10000 reflectance -3000.000000        1404
## 6  RED      0 10000 reflectance -3000.000000        1404
## 7  VIQ      0     1   bit field -3000.000000        1404
GPM.files = list.files("/media/marius/Samsung_T5/eodata/GPM", recursive = TRUE, full.names = TRUE, pattern=".tif")
head(GPM.files)
## [1] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.ice.accum.tif"    
## [2] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.ice.rate.tif"     
## [3] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.liquid.accum.tif" 
## [4] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.liquid.rate.tif"  
## [5] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.liquidPercent.tif"
## [6] "/media/marius/Samsung_T5/eodata/GPM/IMERG_3B_DAY_GIS_V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A/3B-DAY-GIS.MS.MRG.3IMERG.20150101-S000000-E235959.0000.V06A.total.accum.tif"
if (!file.exists("GPM.db")) {
  GPM.collection = create_image_collection(GPM.files, "GPM_IMERG_3B_DAY_GIS_V06A", "GPM.db")
}
GPM.collection = image_collection("GPM.db")
GPM.collection
## A GDAL image collection object, referencing 1462 images with 7  bands
## Images:
##       name left top bottom right            datetime
## 1 20150101 -180  90    -90   180 2015-01-01T00:00:00
## 2 20150102 -180  90    -90   180 2015-01-02T00:00:00
## 3 20150103 -180  90    -90   180 2015-01-03T00:00:00
## 4 20150104 -180  90    -90   180 2015-01-04T00:00:00
## 5 20150105 -180  90    -90   180 2015-01-05T00:00:00
## 6 20150106 -180  90    -90   180 2015-01-06T00:00:00
##                                                      srs
## 1 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 2 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 3 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 4 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 5 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## 6 +proj=longlat +a=6378137 +rf=298.257232666016 +no_defs
## [ omitted 1456 images ] 
## 
## Bands:
##             name offset scale unit       nodata image_count
## 1      ice_accum      0     1      29999.000000        1462
## 2       ice_rate      0     1      29999.000000        1462
## 3   liquid_accum      0     1      29999.000000        1462
## 4 liquid_percent      0     1                          1462
## 5    liquid_rate      0     1      29999.000000        1462
## 6    total_accum      0     1      29999.000000        1462
## 7     total_rate      0     1      29999.000000        1462
SM.files = list.files("/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE",
                      recursive = TRUE, full.names = TRUE, pattern=".*nc")
head(SM.files)
## [1] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130101000000-fv04.7.nc"
## [2] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130102000000-fv04.7.nc"
## [3] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130103000000-fv04.7.nc"
## [4] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130104000000-fv04.7.nc"
## [5] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130105000000-fv04.7.nc"
## [6] "/media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130106000000-fv04.7.nc"
if (!file.exists("SM.db")) {
  SM.collection = create_image_collection(SM.files, "ESA_CCI_SM_PASSIVE.json", "SM.db")
}
SM.collection = image_collection("SM.db")
SM.collection
## A GDAL image collection object, referencing 2556 images with 8  bands
## Images:
##                                                                                                                          name
## 1 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130101000000-fv04.7
## 2 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130102000000-fv04.7
## 3 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130103000000-fv04.7
## 4 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130104000000-fv04.7
## 5 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130105000000-fv04.7
## 6 /media/marius/Samsung_T5/eodata/ESACCI-SOILMOISTURE-PASSIVE/2013/ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-20130106000000-fv04.7
##   left top bottom right            datetime                                 srs
## 1 -180  90    -90   180 2013-01-01T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 2 -180  90    -90   180 2013-01-02T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 3 -180  90    -90   180 2013-01-03T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 4 -180  90    -90   180 2013-01-04T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 5 -180  90    -90   180 2013-01-05T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## 6 -180  90    -90   180 2013-01-06T00:00:00 +proj=longlat +datum=WGS84 +no_defs
## [ omitted 2550 images ] 
## 
## Bands:
##             name offset scale                               unit       nodata
## 1         dnflag      0     1                                        0.000000
## 2           flag      0     1                                      127.000000
## 3     freqbandID      0     1                                        0.000000
## 4           mode      0     1                                        0.000000
## 5         sensor      0     1                                        0.000000
## 6             sm      0     1                             m3 m-3 -9999.000000
## 7 sm_uncertainty      0     1                             m3 m-3 -9999.000000
## 8             t0      0     1 days since 1970-01-01 00:00:00 UTC -9999.000000
##   image_count
## 1        2556
## 2        2556
## 3        2556
## 4        2556
## 5        2556
## 6        2556
## 7        2556
## 8        2556

We now define our area of interest, select the variables of interest, and plot separate data cubes (using the same data cube view) for one year to get an initial overview.

view_de = cube_view(srs = "EPSG:3035", extent = list(left=4039313, bottom=2775429,right=4670348, top=3482115, t0 = "2018-01", t1="2018-12"), dx = 5000, dt = "P1M", aggregation = "mean", resampling = "bilinear")
  
ndvi.col = function(n) {
  rev(sequential_hcl(n, "Green-Yellow"))
}

prcp.col = function(n) {
  rev(sequential_hcl(n, "Purple-Blue"))
}

raster_cube(MOD13A2.collection, view_de) %>%
  select_bands("NDVI") %>% 
  plot(key.pos = 1, col = ndvi.col)

raster_cube(GPM.collection, view_de) %>%
  select_bands("total_accum") %>% 
  plot(key.pos = 1, col = prcp.col, zlim = c(0, 100), nbreaks = 100)

raster_cube(SM.collection, view_de) %>%
  select_bands("sm") %>%
  plot(key.pos = 1, zlim = c(0,1), col = prcp.col, nbreaks=100)

Next, we preprocess our data cubes separately as described below:

  • For the NDVI cube, we decompose the pixel time series by removing seasonal and trend components using the R function stl(). We refer to the remainder as anomalies. This can be done with the apply_time() function, which expects a user-defined R function over pixel time series but does no reduction (compared to reduce_time(). In the provided function, we first convert time series to ts objects, fill NA values by linear interpolation using zoo::na_approx() and afterwards use the stl() function to decompose the time series.

  • For the soil moisture cube, we remove seasonal and trend components but compute a rolling mean of the last two months before.

  • We calculate the monthly standard precipitation index from the precipitation cube using the R function spi() from package precintcon.

Notice that to keep this example simple and computation times in the order of minutes, we still aggregate our observations monthly and limit our analysis to four years.

view_de = cube_view(view_de, extent = list(t0 = "2015-01", t1 = "2018-12"))

raster_cube(MOD13A2.collection, view_de) %>%
  select_bands("NDVI") %>%
  apply_time(name = "NDVI_anomaly", FUN = function(x) {
    library(zoo)
    y = na.approx(x["NDVI",], na.rm = FALSE)
    if (any(is.na(y))) return(rep(NA, ncol(x)))
    y = ts(y, frequency = 12,  start = c(2015, 1))
    y_decomposed = stl(y, "periodic")
    res = as.vector(y_decomposed$time.series[,"remainder"])
    return(res)
  }) -> ndvi.anom

raster_cube(GPM.collection, view_de) %>%
  select_bands("total_accum") %>% 
  apply_time(names = "spi", FUN = function(x) {
    library(precintcon)
    year = as.numeric(substr(colnames(x), 1, 4))
    month = as.numeric(substr(colnames(x), 5, 6))
    y = data.frame(year = year, months = month, precipitation = x["total_accum",])
    class(y) <- c("data.frame", "precintcon.monthly")
    if (any(is.na(y$precipitation))) return(rep(1, ncol(x)))
    spi_monthly = spi(y, period = 1)$spi
    return(spi_monthly)
  }) -> prcp.spi

raster_cube(SM.collection, view_de) %>%
  select_bands("sm") %>% 
  apply_time(names = "sm_anomaly", FUN = function(x) {
    library(zoo)
    y = na.approx(x["sm",], na.rm = FALSE)
    if (any(is.na(y))) return(rep(NA, ncol(x)))
    y = rollmean(y, 2, fill = NA, align = "right")
    y = ts(y[-1], frequency = 12,  start = c(2015, 2))
    y_decomposed = stl(y, "periodic")
    res = c(0, as.vector(y_decomposed$time.series[,"remainder"]))
    return(res)
  }) -> sm.anom

To demonstrate how we can combine the three data cubes anyway, we apply a simple statistical test for zero correlation (Kendall’s \(\tau\)) on \((NDVI_t, SPI_{t-0..2})\) and \((NDVI_t, SM_{t-0..2})\) for each time series below.

out_bandnames = c("NDVI_SPI_LAG_0_tau", "NDVI_SPI_LAG_0_p",
                  "NDVI_SPI_LAG_1_tau", "NDVI_SPI_LAG_1_p",
                  "NDVI_SPI_LAG_2_tau", "NDVI_SPI_LAG_2_p",
                  "NDVI_SM_LAG_0_tau", "NDVI_SM_LAG_0_p",
                  "NDVI_SM_LAG_1_tau", "NDVI_SM_LAG_1_p",
                  "NDVI_SM_LAG_2_tau", "NDVI_SM_LAG_2_p")
join_bands(list(ndvi.anom, prcp.spi, sm.anom), cube_names = c("X1","X2","X3")) %>%
  reduce_time(FUN = function(x) {
    v = data.frame(ndvi = x["X1.NDVI_anomaly", ], spi = x["X2.spi", ], sm = x["X3.sm_anomaly", ])
    # use only April to October
    #month = as.numeric(substr(colnames(x), 5,6))
    #v = v[which(month >= 4 & month <= 10),]
    if (sum(complete.cases(v)) < 6) return(rep(NA, 12))
    n = nrow(v)

    res = rep(NA, 12)
    ct = cor.test(v$ndvi, v$spi, use = "complete.obs", method = "kendall")
    res[c(1,2)] = c(ct$estimate, ct$p.value)
    ct = cor.test(v$ndvi[2:n], v$spi[1:(n-1)], use = "complete.obs", method = "kendall")
    res[c(3,4)] = c(ct$estimate, ct$p.value)
    ct = cor.test(v$ndvi[3:n], v$spi[1:(n-2)], use = "complete.obs", method = "kendall")
    res[c(5,6)] = c(ct$estimate, ct$p.value)
    ct = cor.test(v$ndvi, v$sm, use = "complete.obs", method = "kendall")
    res[c(7,8)] = c(ct$estimate, ct$p.value)
    ct = cor.test(v$ndvi[2:n], v$sm[1:(n-1)], use = "complete.obs", method = "kendall")
    res[c(9,10)] = c(ct$estimate, ct$p.value)
    ct = cor.test(v$ndvi[3:n], v$sm[1:(n-2)], use = "complete.obs", method = "kendall", na.action = na.exclude)
    res[c(11,12)] = c(ct$estimate, ct$p.value)
    
    return(res)
    
  }, names = out_bandnames) -> result.cube
  result.cube
## A GDAL data cube proxy object
## 
## Dimensions:
##         low      high count pixel_size chunk_size
## t   2015-01   2015-01     1       P48M          1
## y   2773772   3483772   142       5000        256
## x 4037330.5 4672330.5   127       5000        256
## 
## Bands:
##                  name offset scale nodata unit
## 1  NDVI_SPI_LAG_0_tau      0     1    NaN     
## 2    NDVI_SPI_LAG_0_p      0     1    NaN     
## 3  NDVI_SPI_LAG_1_tau      0     1    NaN     
## 4    NDVI_SPI_LAG_1_p      0     1    NaN     
## 5  NDVI_SPI_LAG_2_tau      0     1    NaN     
## 6    NDVI_SPI_LAG_2_p      0     1    NaN     
## 7   NDVI_SM_LAG_0_tau      0     1    NaN     
## 8     NDVI_SM_LAG_0_p      0     1    NaN     
## 9   NDVI_SM_LAG_1_tau      0     1    NaN     
## 10    NDVI_SM_LAG_1_p      0     1    NaN     
## 11  NDVI_SM_LAG_2_tau      0     1    NaN     
## 12    NDVI_SM_LAG_2_p      0     1    NaN
  plot(result.cube, key.pos = 1, zlim=c(-1,1), col = viridis::viridis, ncol = 4, nbreaks=50)

Is this meaningful? Probably not, but it shows how complex calculations can be applied on data cubes in R. In applications, one would of course use much longer time series to derive anomalies and furthermore, the interactions of our variables are much more complex in reality (and may need more information on soil, vegetation type, elevation, and similar). The result contains \(p\) values and \(\tau\) estimates for both pairs of variables at temporal lags 0, 1, and 2.

References

Dorigo, W., Wagner, W., Albergel, C., Albrecht, F., Balsamo, G., Brocca, L., Chung, D., Ertl, M., Forkel, M., Gruber, A., and others (2017), “ESA cci soil moisture for improved earth system understanding: State-of-the art and future directions,” Remote Sensing of Environment, Elsevier, 203, 185–215.

Gruber, A., Dorigo, W. A., Crow, W., and Wagner, W. (2017), “Triple collocation-based merging of satellite soil moisture retrievals,” IEEE Transactions on Geoscience and Remote Sensing, IEEE, 55, 6780–6792.

Gruber, A., Scanlon, T., Schalie, R. van der, Wagner, W., and Dorigo, W. (2019), “Evolution of the esa cci soil moisture climate data records and their underlying merging methodology,” Earth System Science Data, Copernicus Publications, 1–37.

Huffman, G., Stocker, E., Bolvin, D., Nelkin, E., and Jackson, T. (2019), “GPM imerg final precipitation l3 half hourly 0.1 degree x 0.1 degree v06, greenbelt, md, goddard earth sciences data and information services center (ges disc).”