Example 1: Cloud-free composite images

Please notice that code chunks in this document are meant to be executed on an Amazon Web Services (AWS) machine in region us-west-2 (Oregon). Examples have been selected to yield computation times acceptable for a live demonstration. Please feel free to apply on larger areas and/or using a higher spatial resolution.

Introduction

In this example, we will create a cloud-free RGB composite image for the Córdoba Province, Argentina from a collection of Seninel-2 images. We will read imagery from the open Sentinel-2 COG catalog on AWS and use the available Earth Search STAC-API service to search for images intersecting with our region of interest.

1. Define area of interest

Our area of interest is provided as a GeoPackage file (data/cordoba_region.gpkg), containing a single polygon. We can use the sf package (Pebesma 2018) to read the file afterwards use the tmap package (Tennekes 2018) to create a simple map.

library(sf)
cordoba_shape = read_sf("data/cordoba_region.gpkg")

library(tmap)
tm_shape(st_geometry(cordoba_shape)) +  
  tm_polygons(alpha = 0.2, col = "red")

2. Query available images from STAC

To find images that intersect with our region and time of interest, we can extract the bounding box of our polygons with st_bbox(). The STAC request, however, expects WGS84 coordinates and we therefore transform (st_transform) the polygon and derive the bounding box again.

bbox = st_bbox(cordoba_shape) 
cordoba_shape |>
  st_transform("EPSG:4326") |>
  st_bbox() -> bbox_wgs84
bbox_wgs84
##      xmin      ymin      xmax      ymax 
## -65.77198 -35.00013 -61.77089 -29.50042

Next, we can load the rstac package (Brazil Data Cube Team 2021), connect to the Earth Search service and request images that intersect with the derived bounding box and the provided time range (March 2022).

library(rstac)
s = stac("https://earth-search.aws.element84.com/v0")
items = s |>
  stac_search(collections = "sentinel-s2-l2a-cogs",
              bbox = c(bbox_wgs84["xmin"],bbox_wgs84["ymin"],
                       bbox_wgs84["xmax"],bbox_wgs84["ymax"]), 
              datetime = "2022-03-01/2022-03-31") |>
  post_request() |> items_fetch(progress = FALSE)
items
## ###STACItemCollection
## - matched feature(s): 385
## - features (385 item(s) / 0 not fetched):
##   - S2B_19HGB_20220330_0_L2A
##   - S2B_20HKG_20220330_0_L2A
##   - S2B_20HLG_20220330_0_L2A
##   - S2B_20HMG_20220330_0_L2A
##   - S2B_20HNG_20220330_0_L2A
##   - S2B_20HKH_20220330_0_L2A
##   - S2B_20HLH_20220330_0_L2A
##   - S2B_20HMH_20220330_0_L2A
##   - S2B_20HNH_20220330_0_L2A
##   - S2B_20HKJ_20220330_0_L2A
##   - ... with 375 more feature(s).
## - assets: 
## thumbnail, overview, info, metadata, visual, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12, AOT, WVP, SCL
## - other field(s): 
## type, stac_version, stac_extensions, context, numberMatched, numberReturned, features, links

The result contains 385 images as a list items$features. Each item (image) is in turn a list with properties, assets (links to S3 objects / files), and other metadata. Below, we show some example fields of a single item.

items$features[[20]]$properties
items$features[[20]]$assets$B08
## $datetime
## [1] "2022-03-30T14:21:21Z"
## 
## $platform
## [1] "sentinel-2b"
## 
## $constellation
## [1] "sentinel-2"
## 
## $instruments
## [1] "msi"
## 
## $gsd
## [1] 10
## 
## $`view:off_nadir`
## [1] 0
## 
## $`proj:epsg`
## [1] 32720
## 
## $`sentinel:utm_zone`
## [1] 20
## 
## $`sentinel:latitude_band`
## [1] "J"
## 
## $`sentinel:grid_square`
## [1] "ML"
## 
## $`sentinel:sequence`
## [1] "0"
## 
## $`sentinel:product_id`
## [1] "S2B_MSIL2A_20220330T141039_N0400_R110_T20JML_20220330T192736"
## 
## $`sentinel:data_coverage`
## [1] 100
## 
## $`eo:cloud_cover`
## [1] 73.88
## 
## $`sentinel:valid_cloud_cover`
## [1] TRUE
## 
## $`sentinel:processing_baseline`
## [1] "04.00"
## 
## $`sentinel:boa_offset_applied`
## [1] TRUE
## 
## $created
## [1] "2022-03-30T23:42:13.704Z"
## 
## $updated
## [1] "2022-03-30T23:42:13.704Z"
## 
## $title
## [1] "Band 8 (nir)"
## 
## $type
## [1] "image/tiff; application=geotiff; profile=cloud-optimized"
## 
## $roles
## [1] "data"
## 
## $gsd
## [1] 10
## 
## $`eo:bands`
## $`eo:bands`[[1]]
## $`eo:bands`[[1]]$name
## [1] "B08"
## 
## $`eo:bands`[[1]]$common_name
## [1] "nir"
## 
## $`eo:bands`[[1]]$center_wavelength
## [1] 0.8351
## 
## $`eo:bands`[[1]]$full_width_half_max
## [1] 0.145
## 
## 
## 
## $href
## [1] "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/20/J/ML/2022/3/S2B_20JML_20220330_0_L2A/B08.tif"
## 
## $`proj:shape`
## [1] 10980 10980
## 
## $`proj:transform`
## [1]      10       0  399960       0     -10 6600040       0       0       1

3. Create a gdalcubes image collection from STAC result

To create a data cube from the images contained in the STAC response, we can now use the gdalcubes package (Appel and Pebesma 2019). First, we need to convert the STAC item list to a gdalcubes image collection object, which indexes available images (in a single-file database). Image collection objects do not contain any pixel data and hence are very small. Below, we use the stac_image_collection() function that receives a list of STAC items as input and returns a gdalcubes image collection as output. We explicitly provide names of assets to make sure that the “SCL” band is included and at the same time apply a filter function on images to ignore cloudy images.

library(gdalcubes)
assets = c("B01","B02","B03","B04","B05","B06", "B07","B08","B8A","B09","B11","SCL")
s2_collection = stac_image_collection(items$features, asset_names = assets, 
                                      property_filter = function(x) {x[["eo:cloud_cover"]] < 20})
s2_collection
## Image collection object, referencing 227 images with 12 bands
## Images:
##                       name      left       top    bottom     right
## 1 S2B_19HGB_20220330_0_L2A -65.87358 -34.51337 -35.29082 -65.59543
## 2 S2B_20HKG_20220330_0_L2A -65.87558 -34.31426 -35.31315 -65.06754
## 3 S2B_20HLG_20220330_0_L2A -65.19991 -34.32183 -35.32718 -63.98063
## 4 S2B_20HMG_20220330_0_L2A -64.10066 -34.33630 -35.33121 -62.89262
## 5 S2B_20HNG_20220330_0_L2A -63.00021 -34.33886 -35.33035 -62.25445
## 6 S2B_20HKH_20220330_0_L2A -65.59248 -33.41797 -34.41180 -65.04591
##              datetime        srs
## 1 2022-03-30T14:22:30 EPSG:32719
## 2 2022-03-30T14:22:25 EPSG:32720
## 3 2022-03-30T14:22:22 EPSG:32720
## 4 2022-03-30T14:22:19 EPSG:32720
## 5 2022-03-30T14:22:15 EPSG:32720
## 6 2022-03-30T14:22:11 EPSG:32720
## [ omitted 221 images ] 
## 
## Bands:
##    name offset scale unit nodata image_count
## 1   B01      0     1                     227
## 2   B02      0     1                     227
## 3   B03      0     1                     227
## 4   B04      0     1                     227
## 5   B05      0     1                     227
## 6   B06      0     1                     227
## 7   B07      0     1                     227
## 8   B08      0     1                     227
## 9   B09      0     1                     227
## 10  B11      0     1                     227
## 11  B8A      0     1                     227
## 12  SCL      0     1                     227

4. Create a (virtual) data cube

We can now define our target data cube, including the coordinate reference system, the pixel sizes, the spatiotemporal extent as well as methods used for spatial resampling and temporal aggregation.

v= cube_view(srs="EPSG:3857",  dx=500, dy=500, dt="P1D", 
             aggregation="median", resampling = "average",
             extent=list(t0 = "2022-03-01", t1 = "2022-03-31",
                         left=bbox["xmin"], right=bbox["xmax"],
                         top=bbox["ymax"], bottom=bbox["ymin"]))
v
## A data cube view object
## 
## Dimensions:
##              low           high count pixel_size
## t     2022-03-01     2022-03-31    31        P1D
## y -4163946.78375 -3439446.78375  1449        500
## x -7321754.07345 -6876254.07345   891        500
## 
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"

To ignore cloud and cloud shadow pixels during construction of the data cube, we define a mask using the SCL band and afterwards combine the collection, the data cube view, and the mask to build a data cube.

S2.mask = image_mask("SCL", values = c(3,8,9))
S2_cube = raster_cube(s2_collection, v, S2.mask)
S2_cube
## A data cube proxy object
## 
## Dimensions:
##              low           high count pixel_size chunk_size
## t     2022-03-01     2022-03-31    31        P1D          1
## y -4163946.78375 -3439446.78375  1449        500        128
## x -7321754.07345 -6876254.07345   891        500        128
## 
## Bands:
##    name offset scale nodata unit
## 1   B01      0     1    NaN     
## 2   B02      0     1    NaN     
## 3   B03      0     1    NaN     
## 4   B04      0     1    NaN     
## 5   B05      0     1    NaN     
## 6   B06      0     1    NaN     
## 7   B07      0     1    NaN     
## 8   B08      0     1    NaN     
## 9   B09      0     1    NaN     
## 10  B11      0     1    NaN     
## 11  B8A      0     1    NaN     
## 12  SCL      0     1    NaN

Notice that the result is a virtual data cube (or a proxy object) that still does not contain any pixel data and consumes any processing / memory resources. Instead, the object simply knows how to create the cube when needed (e.g. when you call plot(), write_tif(), write_ncdf(), or similar). This concept is sometimes referred to as lazy evaluation.

5. Process data cube and plot result

Given the data cube, we can apply built-in data cube operations. Notice that calling these operations still will not start any expensive computations / data reading and the returned object is still a virtual data cube. To derive a simple cloud-free mosaic image of our study region, we use the select_bands() function to filter by spectral bands, the reduce_time() function to calculate median values for all pixel time series and all three bands, and the filter_geom() function to cut our polygon from the result. Calling plot() will eventually start needed computations and finally plot a composite image.

S2_cube |>
  select_bands(c("B02", "B03", "B04")) |>
  reduce_time("median(B02)","median(B03)","median(B04)") |>
  filter_geom(cordoba_shape$geom) |>
  plot(rgb = 3:1, zlim = c(0, 1800))

Notice that computations are executed in parallel. Calling gdalcubes_options(parallel = 8) can be used to e.g. use up to 8 parallel worker processes.

In addition to the operations used above, the gdalcubes package provides the following operations on data cubes.

Table 1: Built-in data cube operations
Operation Description
aggregate_space Reduce spatial resolution of a cube by applying a spatial aggregation function.
aggregate_time Aggregate and/or regularize time series.
apply_pixel Apply an arithmetic expression to all data cube pixels.
crop Extract a rectangular spatial / temporal / spatiotemporal window.
fill_time Fill missing values of a data cube by simple time series interpolation.
filter_geom Filter pixels by a a spatial polygon.
filter_pixel Filter pixels by a logical expressions on band values.
join_bands Combine bands of two identically shaped data cubes.
reduce_space Apply a reducer function to all spatial slices of a data cube.
reduce_time Apply a reducer function to all pixel time series.
select_bands Select specific bands of a data cube.
select_time Select irregular time slices of a data cube.
slice_space Select a single time series of a data cube.
slice_time Select a single time slice of a data cube.
window_time Apply a moving window aggregate or convolution kernel to all pixel time series.

6. Greenest pixel composite with a user-defined function

Some operations accept user-defined R functions as arguments. For example, we can write our own reducer function that computes per pixel NDVI values, and returns RGB values at the day with maximum NDVI (greenest pixel) as in the following code example.

raster_cube(s2_collection, v, S2.mask) |>
  select_bands(c("B02", "B03", "B04", "B08")) |>
  reduce_time(names=c("blue", "green", "red"), FUN = function(x) {
    ndvi = (x["B08", ] - x["B04", ]) / (x["B08", ] + x["B04", ])
    if (all(is.na(ndvi))) {
      return(c(NA,NA,NA))
    }
    i = which.max(ndvi)
    return(x[c("B02", "B03", "B04"), i])
  }) |>
  filter_geom(cordoba_shape$geom) |>
  plot(rgb = 3:1, zlim = c(0, 1800))

References

Appel, M., and Pebesma, E. (2019), On-demand processing of data cubes from satellite image collections with the gdalcubes library,” Data, 4.
Brazil Data Cube Team (2021), Rstac: Client library for SpatioTemporal asset catalog.
Pebesma, E. (2018), Simple Features for R: Standardized Support for Spatial Vector Data,” The R Journal, 10, 439–446. https://doi.org/10.32614/RJ-2018-009.
Tennekes, M. (2018), tmap: Thematic maps in R,” Journal of Statistical Software, 84, 1–39. https://doi.org/10.18637/jss.v084.i06.