OpenGeoHub Summer School 2021

Please note that the code chunks in this document are meant to be executed on an Amazon Web Services (AWS) machine in the us-west-2 (Oregon) region.

Introduction

This tutorial demonstrates how we can access and process Sentinel-2 data in the cloud using the R packages rstac (Brazil Data Cube Team 2021) and gdalcubes (Appel and Pebesma 2019).

Two examples on the creation of composite images and more complex time series analysis will introduce important functions of both packages.

Other packages used in this tutorial include stars (Pebesma 2019) and tmap (Tennekes 2018) for creating interactive maps, sf (Pebesma 2018) for processing vector data, and colorspace (Zeileis et al. 2020) for visualizations with accessible colors.

Notice that the examples have been selected to yield computation times acceptable for a live demonstration. Please feel free to apply the examples to larger areas of interest and/or at a higher spatial resolution.

Example 1: Creating composite images

Our study area (the main land area of the Netherlands) is given in a (poorly) digitized GeoPackage file NL.gpkg. We can create a simple interactive map using the sf and tmap packages by running:

library(sf)
nl_shape = read_sf("NL.gpkg")
library(tmap)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(st_geometry(nl_shape)) +  tm_polygons()

Looking at the coordinate reference system of the features, we see:

st_crs(nl_shape)
## Coordinate Reference System:
##   User input: ETRS89-extended / LAEA Europe 
##   wkt:
## PROJCRS["ETRS89-extended / LAEA Europe",
##     BASEGEOGCRS["ETRS89",
##         DATUM["European Terrestrial Reference System 1989",
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4258]],
##     CONVERSION["Europe Equal Area 2001",
##         METHOD["Lambert Azimuthal Equal Area",
##             ID["EPSG",9820]],
##         PARAMETER["Latitude of natural origin",52,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",10,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",4321000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",3210000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["northing (Y)",north,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["easting (X)",east,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Europe - LCC & LAEA"],
##         BBOX[24.6,-35.58,84.17,44.83]],
##     ID["EPSG",3035]]

We aim to generate a cloud-free composite image of our study area for June 2018, using the rstac package to find suitable Sentinel-2 images. However, to use the bbox argument of the corresponding stac_search() function for spatial filtering, we first need to derive the bounding box and transform it to latitude / longitude (WGS84) coordinates, for which we use the st_bbox() and st_transform() functions.

bbox = st_bbox(nl_shape) 
bbox
##    xmin    ymin    xmax    ymax 
## 3858729 3078715 4134099 3395687
st_as_sfc(bbox) |>
  st_transform("EPSG:4326") |>
  st_bbox() -> bbox_wgs84
bbox_wgs84
##      xmin      ymin      xmax      ymax 
##  3.027224 50.634114  7.348901 53.635937

Querying images with rstac

Now, we can specify our STAC-API endpoint, and post a STAC search request using the transformed bounding box, the datetime range, and the collection name “sentinel-s2-l2a-cogs.”

library(rstac)
s = stac("https://earth-search.aws.element84.com/v0")

items = s |>
  stac_search(collections = "sentinel-s2-l2a-cogs",
              bbox = c(bbox_wgs84["xmin"],bbox_wgs84["ymin"],
                       bbox_wgs84["xmax"],bbox_wgs84["ymax"]), 
              datetime = "2018-06-01/2018-06-30",
              limit = 500) |>
  post_request() 
items
## ###STACItemCollection
## - matched feature(s): 260
## - features (260 item(s)):
##   - S2A_31UES_20180630_0_L2A
##   - S2A_31UFU_20180630_0_L2A
##   - S2A_31UFT_20180630_0_L2A
##   - S2A_32ULB_20180630_0_L2A
##   - S2A_32ULC_20180630_0_L2A
##   - S2A_32ULD_20180630_0_L2A
##   - S2A_31UEV_20180630_0_L2A
##   - S2A_31UGT_20180630_0_L2A
##   - S2A_31UGU_20180630_0_L2A
##   - S2A_31UDV_20180630_0_L2A
##   - ... with 250 more feature(s).
## - field(s): 
## type, stac_version, stac_extensions, context, numberMatched, numberReturned, features, links

By default, the result of this STAC API contains only up to 10 items, so we need to increase this value using the limit argument. Here, we get a list of 260 STAC items.
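Alternatively, instead of setting a sufficiently large limit, we can let rstac page through all matched items for us. The following is a minimal sketch (not executed here), assuming the items_fetch() function behaves as in recent rstac versions:

# fetch all result pages of the search instead of relying on a large limit
items_all = s |>
  stac_search(collections = "sentinel-s2-l2a-cogs",
              bbox = c(bbox_wgs84["xmin"], bbox_wgs84["ymin"],
                       bbox_wgs84["xmax"], bbox_wgs84["ymax"]),
              datetime = "2018-06-01/2018-06-30") |>
  post_request() |>
  items_fetch()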

Looking at one of the items, we see:

names(items$features[[10]])
##  [1] "type"            "stac_version"    "stac_extensions" "id"             
##  [5] "bbox"            "geometry"        "properties"      "collection"     
##  [9] "assets"          "links"

The assets element contains direct links to the image files for separate bands and the properties element contains a lot of useful metadata including cloud coverage, datetime, projection, and more:

items$features[[10]]$assets$B05
## $title
## [1] "Band 5"
## 
## $type
## [1] "image/tiff; application=geotiff; profile=cloud-optimized"
## 
## $roles
## [1] "data"
## 
## $gsd
## [1] 20
## 
## $`eo:bands`
## $`eo:bands`[[1]]
## $`eo:bands`[[1]]$name
## [1] "B05"
## 
## $`eo:bands`[[1]]$center_wavelength
## [1] 0.7039
## 
## $`eo:bands`[[1]]$full_width_half_max
## [1] 0.019
## 
## 
## 
## $href
## [1] "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/31/U/DV/2018/6/S2A_31UDV_20180630_0_L2A/B05.tif"
## 
## $`proj:shape`
## [1] 5490 5490
## 
## $`proj:transform`
## [1]      20       0  399960       0     -20 6000000       0       0       1
items$features[[10]]$properties$`eo:cloud_cover`
## [1] 0

Creating an image collection

Next, we load the gdalcubes package and use this list of features from rstac to create a gdalcubes image collection object with the stac_image_collection() function. Compared to using gdalcubes with imagery on local storage, this does not need to open and read metadata from all files because STAC items already contain all relevant metadata including datetime, spatial extent, and how files relate to bands. As a result, creation of an image collection from STAC is quite fast.

library(gdalcubes)
s2_collection = stac_image_collection(items$features)
s2_collection
## Image collection object, referencing 260 images with 18  bands
## Images:
##                       name     left      top   bottom    right
## 1 S2A_31UES_20180630_0_L2A 2.999727 51.45117 50.45353 4.579529
## 2 S2A_31UFU_20180630_0_L2A 4.464995 53.24020 52.22257 6.141754
## 3 S2A_31UFT_20180630_0_L2A 4.436135 52.34135 51.32453 6.077743
## 4 S2A_32ULB_20180630_0_L2A 6.123723 51.42176 50.98568 6.374427
## 5 S2A_32ULC_20180630_0_L2A 6.065813 52.33094 51.32806 6.852503
## 6 S2A_32ULD_20180630_0_L2A 6.004777 53.23819 52.22622 7.349580
##              datetime        srs
## 1 2018-06-30T10:54:40 EPSG:32631
## 2 2018-06-30T10:54:40 EPSG:32631
## 3 2018-06-30T10:54:40 EPSG:32631
## 4 2018-06-30T10:54:40 EPSG:32632
## 5 2018-06-30T10:54:40 EPSG:32632
## 6 2018-06-30T10:54:40 EPSG:32632
## [ omitted 254 images ] 
## 
## Bands:
##            name offset scale unit nodata image_count
## 1           B01      0     1                     260
## 2           B02      0     1                     260
## 3           B03      0     1                     260
## 4           B04      0     1                     260
## 5           B05      0     1                     260
## 6           B06      0     1                     260
## 7           B07      0     1                     260
## 8           B08      0     1                     260
## 9           B09      0     1                     260
## 10          B11      0     1                     260
## 11          B12      0     1                     260
## 12          B8A      0     1                     260
## 13 overview:B02      0     1                     260
## 14 overview:B03      0     1                     260
## 15 overview:B04      0     1                     260
## 16   visual:B02      0     1                     260
## 17   visual:B03      0     1                     260
## 18   visual:B04      0     1                     260

This function takes some further useful arguments. First, we see that our image collection does not contain the SCL band, which holds per-pixel information on clouds and cloud shadows. This band is ignored by default because it is missing the eo:bands properties in the STAC API response. To include it anyway, we can specify asset names manually using the asset_names argument. Second, the result contains all images, even those with almost no clear pixels. To reduce the number of images, we can provide a function as the property_filter argument. This function receives the properties element (a list) of a STAC item as its argument and is expected to produce a single logical value; an image is ignored if the function returns FALSE.

assets = c("B01","B02","B03","B04","B05","B06", "B07","B08","B8A","B09","B11","SCL")
 s2_collection = stac_image_collection(items$features, asset_names = assets, property_filter = function(x) {x[["eo:cloud_cover"]] < 20})
## Warning in stac_image_collection(items$features, asset_names = assets,
## property_filter = function(x) {: STAC asset with name 'SCL' does not include
## eo:bands metadata and will be considered as a single band source
s2_collection
## Image collection object, referencing 92 images with 12  bands
## Images:
##                       name     left      top   bottom    right
## 1 S2A_31UES_20180630_0_L2A 2.999727 51.45117 50.45353 4.579529
## 2 S2A_31UFU_20180630_0_L2A 4.464995 53.24020 52.22257 6.141754
## 3 S2A_31UFT_20180630_0_L2A 4.436135 52.34135 51.32453 6.077743
## 4 S2A_32ULB_20180630_0_L2A 6.123723 51.42176 50.98568 6.374427
## 5 S2A_32ULC_20180630_0_L2A 6.065813 52.33094 51.32806 6.852503
## 6 S2A_32ULD_20180630_0_L2A 6.004777 53.23819 52.22622 7.349580
##              datetime        srs
## 1 2018-06-30T10:54:40 EPSG:32631
## 2 2018-06-30T10:54:40 EPSG:32631
## 3 2018-06-30T10:54:40 EPSG:32631
## 4 2018-06-30T10:54:40 EPSG:32632
## 5 2018-06-30T10:54:40 EPSG:32632
## 6 2018-06-30T10:54:40 EPSG:32632
## [ omitted 86 images ] 
## 
## Bands:
##    name offset scale unit nodata image_count
## 1   B01      0     1                      92
## 2   B02      0     1                      92
## 3   B03      0     1                      92
## 4   B04      0     1                      92
## 5   B05      0     1                      92
## 6   B06      0     1                      92
## 7   B07      0     1                      92
## 8   B08      0     1                      92
## 9   B09      0     1                      92
## 10  B11      0     1                      92
## 11  B8A      0     1                      92
## 12  SCL      0     1                      92

As a result we get an image collection with 92 images and the SCL band.

Defining the data cube geometry

The next step in gdalcubes is to specify the geometry of our target data cube, which is called the data cube view. The data cube view is independent from specific image collections and hence does not contain information on spectral bands. In the following code, we use the cube_view() function to create and specify a coarse resolution data cube with cell size 200m x 200m x 30 days, using the Lambert equal area projection for Europe:

v.NL.overview = cube_view(srs="EPSG:3035",  dx=200, dy=200, dt="P30D", 
                  aggregation="median", resampling = "average",
                  extent=list(t0 = "2018-06-01", t1 = "2018-06-30",
                              left=bbox["xmin"]-1000, right=bbox["xmax"]+1000,
                              top=bbox["ymax"] + 1000, bottom=bbox["ymin"]-1000))
v.NL.overview
## A data cube view object
## 
## Dimensions:
##                low             high count pixel_size
## t       2018-06-01       2018-06-30     1       P30D
## y 3077700.81626337 3396700.81626337  1595        200
## x 3857714.27069528 4135114.27069528  1387        200
## 
## SRS: "EPSG:3035"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"

Any messages printed by cube_view() simply tell us that the extent of the data cube has been slightly enlarged, because there cannot be partial pixels. Notice that the resampling and aggregation methods define how images are resampled in space and how pixel values from different days within the same data cube cell are aggregated when aligning images with the target cube geometry. Our data cube geometry has 1595 x 1387 pixels in space and a single time step.

Creating, processing, and plotting the data cube

Afterwards, we can combine our image collection and cube view to create, process, and plot our actual data cube. To ignore pixels that have been classified as clouds or cloud shadows in individual images, we first need to create a mask object that tells gdalcubes that pixels with the SCL values 3, 8, or 9 (cloud shadows, and clouds with medium and high probability) will not contribute to the data cube values and will be ignored during the temporal aggregation step.

The raster_cube() function then takes the image collection, the data cube view, and the mask, and creates a virtual data cube. Calling this function does not start any computations or data transfers; it simply returns a proxy object that knows what to do. The functions select_bands() and filter_geom(), which subset spectral bands and crop a data cube by a polygon respectively, both take a data cube as input and again produce a proxy object (virtual data cube) as a result. Calling plot() will eventually start all computations and plot the result. Computations are multithreaded (we use up to 16 threads here) and no intermediate results of the operations are written to disk.

S2.mask = image_mask("SCL", values = c(3,8,9))

gdalcubes_options(threads = 16)

system.time(
  raster_cube(s2_collection, v.NL.overview, S2.mask) |>
    select_bands(c("B02", "B03", "B04")) |>
    filter_geom(nl_shape$geom) |>
    plot(rgb = 3:1, zlim=c(0,1500)))

##    user  system elapsed 
## 289.370   8.632  34.916

If we are interested in a smaller area at higher resolution, we can create a data cube with a different data cube view as in the following example.

v.amsterdam = cube_view(view = v.NL.overview, dx=10, dy=10,
                  extent=list(left = 3968584, right = 3979617,
                              top = 3266445, bottom = 3259740))

raster_cube(s2_collection, v.amsterdam, S2.mask) |>
  select_bands(c("B02", "B03", "B04")) |>
  plot(rgb = 3:1, zlim=c(0,1500))

Operations on data cubes

The gdalcubes package comes with some built-in operations to process data cubes. The following operations produce a derived data cube from one or more input data cubes.

apply_pixel: Apply arithmetic expressions on band values per pixel.
fill_time: Fill missing values by simple time series interpolation.
filter_pixel: Filter pixels based on logical expressions.
filter_geom: Filter pixels that do not intersect with a given input geometry.
join_bands: Combine bands of two or more identically shaped input data cubes.
reduce_space: Apply a reducer function over time slices of a data cube.
reduce_time: Apply a reducer function over individual pixel time series.
select_bands: Select a subset of a data cube’s bands.
window_time: Apply a moving window reducer or kernel over individual pixel time series.

There are some more functions for exporting data cubes as netCDF or (cloud-optimized) GeoTIFF files, for reading data cubes from netCDF files, for computing summary statistics over polygons (zonal statistics), for querying data cube values at irregular spatiotemporal points, and for creating animations.
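As a minimal sketch (not executed here), the composite from above could be materialized and written to disk as cloud-optimized GeoTIFF files; the output directory name is arbitrary, and we assume write_tif() with its COG argument:

# write one (cloud-optimized) GeoTIFF file per time slice of the data cube
raster_cube(s2_collection, v.NL.overview, S2.mask) |>
  select_bands(c("B02", "B03", "B04")) |>
  write_tif(dir = "NL_composite", COG = TRUE)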

In the example below, we compute the normalized difference vegetation index (NDVI), drop values with NDVI <= 0, and plot the result. A custom color palette from the colorspace package maps low NDVI values to light yellow and high values to green.

library(colorspace)
ndvi.col = function(n) {
  rev(sequential_hcl(n, "Green-Yellow"))
}

raster_cube(s2_collection, v.NL.overview, S2.mask) |>
  select_bands(c("B04", "B08")) |>
  filter_geom(nl_shape$geom) |>
  apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
  filter_pixel("NDVI > 0") |>
  plot(key.pos = 1, zlim=c(0,1), col = ndvi.col)

We can see that some additional water areas with NDVI < 0 have been set to NA.

Interactive maps

We can convert data cubes to stars objects and use tmap for interactive mapping:

library(stars)
raster_cube(s2_collection, v.NL.overview, S2.mask) |>
  select_bands(c("B04", "B08")) |>
  filter_geom(nl_shape$geom) |>
  apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
  filter_pixel("NDVI > 0") |>
  st_as_stars() |>
  tm_shape() + tm_raster()
## stars object downsampled to 932 by 1072 cells. See tm_shape manual (argument raster.downsample)

Summary

This example has shown how satellite imagery can be accessed and analyzed in the cloud using STAC and gdalcubes. The analysis itself has been rather simple, creating cloud-free composite images for only one month. However, the data originated from 92 Sentinel-2 images, and downloading all of them first would certainly take some time. Because the data are readily available in the cloud, it is straightforward to run the analysis for a different area of interest, which makes checking the transferability of methods somewhat easier. The second example will focus on more complex time series processing for a smaller area.

Example 2: Time series analysis

This example shows how complex time series methods from external R packages can be applied in cloud computing environments using rstac (Brazil Data Cube Team 2021) and gdalcubes (Appel and Pebesma 2019). We will use the bfast R package (Verbesselt et al. 2010), which contains unsupervised change detection methods for identifying structural breakpoints in vegetation index time series. Specifically, we will use its bfastmonitor() function to monitor changes in a time series of Sentinel-2 imagery.

Compared to the first example, our study area is rather small, covering a small forest area southeast of Berlin. The area of interest is again available as a polygon in a GeoPackage file gruenheide_forest.gpkg, which can for example be visualized on a map using the tmap package (Tennekes 2018).

library(sf)
geom = read_sf("gruenheide_forest.gpkg")
library(tmap)
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(st_geometry(geom)) +  tm_polygons()

Querying images with rstac

Using the rstac package, we first request all available images from 2016 to 2020 that intersect with our region of interest. Since the polygon uses WGS84 as its CRS, we do not need to transform the bounding box before using the stac_search() function.

s = stac("https://earth-search.aws.element84.com/v0")

bbox = st_bbox(geom)  

items <- s |>
    stac_search(collections = "sentinel-s2-l2a-cogs",
                bbox = c(bbox["xmin"],bbox["ymin"],bbox["xmax"],bbox["ymax"]), 
                datetime = "2016-01-01/2020-12-31",
                limit = 500) |>
    post_request() 
items
## ###STACItemCollection
## - matched feature(s): 457
## - features (457 item(s)):
##   - S2B_33UVU_20201229_0_L2A
##   - S2A_33UVU_20201227_0_L2A
##   - S2A_33UVU_20201224_0_L2A
##   - S2B_33UVU_20201222_0_L2A
##   - S2B_33UVU_20201219_0_L2A
##   - S2A_33UVU_20201217_0_L2A
##   - S2B_33UVU_20201212_0_L2A
##   - S2B_33UVU_20201209_1_L2A
##   - S2A_33UVU_20201207_0_L2A
##   - S2A_33UVU_20201204_0_L2A
##   - ... with 447 more feature(s).
## - field(s): 
## type, stac_version, stac_extensions, context, numberMatched, numberReturned, features, links
# Date and time of first and last images
range(sapply(items$features, function(x) {x$properties$datetime}))
## [1] "2016-11-05T10:12:57Z" "2020-12-29T10:16:06Z"

This gives us 457 images recorded between Nov. 2016 and Dec. 2020.

Creating a monthly Sentinel-2 data cube

To build a regular monthly data cube, we again need to create a gdalcubes image collection from the STAC query result. Notice that to include the SCL band, which contains per-pixel quality flags (classification of clouds, cloud shadows, and other classes), we need to explicitly list the names of the assets. We furthermore ignore images with 50% or more cloud coverage.

library(gdalcubes)
assets = c("B01","B02","B03","B04","B05","B06", "B07","B08","B8A","B09","B11","SCL")
s2_collection = stac_image_collection(items$features, asset_names = assets, property_filter = function(x) {x[["eo:cloud_cover"]] < 50}) 
## Warning in stac_image_collection(items$features, asset_names = assets,
## property_filter = function(x) {: STAC asset with name 'SCL' does not include
## eo:bands metadata and will be considered as a single band source
s2_collection
## Image collection object, referencing 200 images with 12  bands
## Images:
##                       name     left      top   bottom    right
## 1 S2A_33UVU_20201107_0_L2A 13.50096 53.24957 52.25346 14.89124
## 2 S2A_33UVU_20201104_0_L2A 13.50096 53.24953 52.25346 15.14626
## 3 S2B_33UVU_20201023_0_L2A 13.50096 53.24958 52.25346 14.90383
## 4 S2B_33UVU_20201003_0_L2A 13.50096 53.24958 52.25346 14.90847
## 5 S2A_33UVU_20200928_0_L2A 13.50096 53.24958 52.25346 14.90143
## 6 S2B_33UVU_20200923_0_L2A 13.50096 53.24958 52.25346 14.91057
##              datetime        srs
## 1 2020-11-07T10:26:09 EPSG:32633
## 2 2020-11-04T10:16:13 EPSG:32633
## 3 2020-10-23T10:26:08 EPSG:32633
## 4 2020-10-03T10:26:08 EPSG:32633
## 5 2020-09-28T10:26:10 EPSG:32633
## 6 2020-09-23T10:26:07 EPSG:32633
## [ omitted 194 images ] 
## 
## Bands:
##    name offset scale unit nodata image_count
## 1   B01      0     1                     200
## 2   B02      0     1                     200
## 3   B03      0     1                     200
## 4   B04      0     1                     200
## 5   B05      0     1                     200
## 6   B06      0     1                     200
## 7   B07      0     1                     200
## 8   B08      0     1                     200
## 9   B09      0     1                     200
## 10  B11      0     1                     200
## 11  B8A      0     1                     200
## 12  SCL      0     1                     200

The result still contains 200 images, from which we can now create a data cube. We use the transformed (UTM) bounding box of our polygon as the spatial extent, a spatial resolution of 10 meters, bilinear spatial resampling, and monthly median aggregation of pixel values from multiple images within the same month. Notice that we add 100 m on each side of the cube.

st_as_sfc(bbox) |>
  st_transform("EPSG:32633") |>
  st_bbox() -> bbox_utm
  
v = cube_view(srs = "EPSG:32633", extent = list(t0 = "2016-01",t1 = "2020-12", 
                                                left = bbox_utm["xmin"]-100, 
                                                right = bbox_utm["xmax"]+100,
                                                bottom = bbox_utm["ymin"]-100, 
                                                top = bbox_utm["ymax"]+100), 
              dx = 10, dy = 10, dt = "P1M",  aggregation = "median", 
              resampling = "bilinear")
v
## A data cube view object
## 
## Dimensions:
##                low             high count pixel_size
## t          2016-01          2020-12    60        P1M
## y 5801913.49436843 5807493.49436843   558         10
## x 415432.739260076 424752.739260076   932         10
## 
## SRS: "EPSG:32633"
## Temporal aggregation method: "median"
## Spatial resampling method: "bilinear"

Next, we create the data cube, subset the red and near-infrared bands, and crop by our polygon, which simply sets pixel values outside of the polygon to NA. Afterwards, we save the data cube as a single netCDF file. Notice that this is not necessary, but storing intermediate results sometimes makes debugging easier, especially if the methods applied afterwards are computationally intensive.

s2.mask = image_mask("SCL", values = c(3,8,9))
gdalcubes_options(threads = 16, ncdf_compression_level = 5)
raster_cube(s2_collection, v, mask = s2.mask) |>
  select_bands(c("B04","B08")) |>
  filter_geom(geom$geometry) |>
  write_ncdf("gruenheide_cube_monthly.nc")

Reduction over space and time

To get an overview of the data, we can compute simple summary statistics (applying reducer functions) over dimensions. Below, we derive minimum, maximum, and mean monthly NDVI values over all pixel time series.

ncdf_cube("gruenheide_cube_monthly.nc") |>
  apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
  reduce_time("min(NDVI)", "max(NDVI)", "mean(NDVI)") |>
  plot(key.pos = 1, zlim=c(-0.2,1), col = ndvi.col, nbreaks = 19)

Possible reducers include "min", "mean", "median", "max", "count" (count non-missing values), "sum", "var" (variance), and "sd" (standard deviation). Reducer expressions are always given as a string starting with the reducer name followed by the band name in parentheses. Notice that it is possible to mix reducers and bands.
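For example, the following combines different reducers and bands in a single call:

# mix reducers and bands in one reduce_time() call
ncdf_cube("gruenheide_cube_monthly.nc") |>
  reduce_time("median(B04)", "mean(B08)", "count(B04)") |>
  plot(key.pos = 1)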

The "count" reducer is often very useful to get an initial understanding of an image collection:

ncdf_cube("gruenheide_cube_monthly.nc") |>
  reduce_time("count(B04)") |>
  plot(key.pos = 1, zlim=c(0,60), col = viridis::viridis, nbreaks = 7)

We can see that most time series contain valid observations for 40-50 months, which should be sufficient for our example. Similarly, it is also possible to reduce over space, leading to summary time series.

ncdf_cube("gruenheide_cube_monthly.nc") |>
  apply_pixel("(B08-B04)/(B08+B04)", "NDVI") |>
  reduce_space("min(NDVI)", "max(NDVI)", "mean(NDVI)") |>
  plot(join.timeseries = TRUE)

Applying bfastmonitor as a user-defined reducer function

To apply a more complex time series method such as bfastmonitor(), the data cube operations below allow us to provide custom user-defined R functions instead of string expressions that translate to built-in reducers. These functions receive arrays as input and must return arrays as output. Depending on the operation, the dimensionality of the arrays differs:

apply_pixel: input is a vector of band values for one pixel; output is a vector of band values for one pixel.
reduce_time: input is a multi-band time series as a matrix; output is a vector of band values.
reduce_space: input is a three-dimensional array with dimensions bands, x, and y; output is a vector of band values.
apply_time: input is a multi-band time series as a matrix; output is a multi-band time series as a matrix.

There is no limit to what we can do in the provided R function, but we must take care of a few things:

  1. The reducer function is executed in a new R process without access to the current workspace. It is not possible to access variables defined outside of the function, and packages must be loaded within the function.

  2. The reducer function must always return a vector of the same length (for all time series).

  3. It is a good idea to think about NA values, i.e., to check whether the complete time series is NA and to make sure that missing values do not produce errors.

Another possibility to apply R functions to data cubes is of course to convert data cubes to stars objects and use the stars package.
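As a brief sketch of that route (assuming the monthly cube written above), we can convert the cube to a stars object and apply a function over each pixel time series with st_apply():

library(stars)
# compute the per-pixel maximum over time for each band attribute,
# returning NA for time series without any valid observations
ncdf_cube("gruenheide_cube_monthly.nc") |>
  st_as_stars() |>
  st_apply(c("x", "y"), function(ts) {
    if (all(is.na(ts))) return(NA)
    max(ts, na.rm = TRUE)
  })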

In our example, bfastmonitor() returns change date and change magnitude values per time series, so we can use reduce_time(). The script below calculates the kNDVI, applies bfastmonitor(), and handles errors (e.g., due to missing data) with tryCatch(). Finally, the resulting change dates and magnitudes for all pixel time series are written to disk as a netCDF file.

system.time(
ncdf_cube("gruenheide_cube_monthly.nc") |>
  reduce_time(names = c("change_date", "change_magnitude"), FUN = function(x) {
    # compute the kernel NDVI (kNDVI) from the red and near-infrared bands
    knr <- exp(-((x["B08",]/10000)-(x["B04",]/10000))^2/(2))
    kndvi <- (1-knr) / (1+knr)
    if (all(is.na(kndvi))) {
      return(c(NA,NA))
    }
    # build a regular monthly time series and monitor changes from January 2020 onwards
    kndvi_ts = ts(kndvi, start = c(2016, 1), frequency = 12)
    library(bfast)
    tryCatch({
        result = bfastmonitor(kndvi_ts, start = c(2020,1),
                              history = "all", level = 0.01)
        return(c(result$breakpoint, result$magnitude))
      }, error = function(x) {
        # return NA for time series that produce errors (e.g., too few observations)
        return(c(NA,NA))
      })
  }) |>
  write_ncdf("result.nc"))
##    user  system elapsed 
## 739.493  14.214 164.441

Running bfastmonitor() is computationally expensive. However, since the data is located in the cloud anyway, an obvious next step would be to launch one of the more powerful machine instance types with many processors. Parallelization within one instance can be controlled entirely by gdalcubes using gdalcubes_options().

Results

To visualize the change detection results, we load the resulting netCDF file, convert it to a stars object, and finally use the tmap package to create an interactive map of the change date.

library(stars)
ncdf_cube("result.nc") |>
  st_as_stars() -> x
tm_shape(x["change_date"]) + tm_raster()

The result certainly needs some postprocessing to understand the types of changes and to identify false positives. However, the larger region in the west of the study area clearly shows some deforestation due to the construction of Tesla’s Gigafactory Berlin-Brandenburg.

Summary

This example has shown how more complex time series methods from external R packages can be applied to data cubes in cloud computing environments. For computationally intensive methods, it is in many cases useful to store intermediate results by combining write_ncdf() and ncdf_cube(). A more powerful instance type would be very useful to scale the presented analysis to larger areas and to further reduce computation times.

References

Appel, M., and Pebesma, E. (2019), “On-demand processing of data cubes from satellite image collections with the gdalcubes library,” Data, 4.
Brazil Data Cube Team (2021), rstac: Client library for SpatioTemporal Asset Catalog.
Pebesma, E. (2018), “Simple Features for R: Standardized Support for Spatial Vector Data,” The R Journal, 10, 439–446. https://doi.org/10.32614/RJ-2018-009.
Pebesma, E. (2019), stars: Spatiotemporal arrays, raster and vector data cubes.
Tennekes, M. (2018), “tmap: Thematic maps in R,” Journal of Statistical Software, 84, 1–39. https://doi.org/10.18637/jss.v084.i06.
Verbesselt, J., Hyndman, R., Newnham, G., and Culvenor, D. (2010), “Detecting trend and seasonal changes in satellite image time series,” Remote Sensing of Environment, 114, 106–115.
Zeileis, A., Fisher, J. C., Hornik, K., Ihaka, R., McWhite, C. D., Murrell, P., Stauffer, R., and Wilke, C. O. (2020), “colorspace: A toolbox for manipulating and assessing colors and palettes,” Journal of Statistical Software, 96, 1–49. https://doi.org/10.18637/jss.v096.i01.