Example 1: Cloud-free composite images
Notice that the code chunks in this document are meant to be executed on an Amazon Web Services (AWS) machine in region us-west-2 (Oregon). The examples have been selected to yield computation times acceptable for a live demonstration. Please feel free to apply them to larger areas and/or at a higher spatial resolution.
Introduction
In this example, we will create a cloud-free RGB composite image for the Córdoba Province, Argentina, from a collection of Sentinel-2 images. We will read imagery from the open Sentinel-2 COG catalog on AWS and use the Earth Search STAC API service to search for images intersecting with our region of interest.
1. Define area of interest
Our area of interest is provided as a GeoPackage file (data/cordoba_region.gpkg) containing a single polygon. We can use the sf package (Pebesma 2018) to read the file and afterwards use the tmap package (Tennekes 2018) to create a simple map.
library(sf)
cordoba_shape = read_sf("data/cordoba_region.gpkg")
library(tmap)
tm_shape(st_geometry(cordoba_shape)) +
  tm_polygons(alpha = 0.2, col = "red")
2. Query available images from STAC
To find images that intersect with our region and time of interest, we can extract the bounding box of our polygon with st_bbox(). The STAC request, however, expects WGS84 coordinates, so we transform the polygon (st_transform()) and derive the bounding box again.
bbox = st_bbox(cordoba_shape)
cordoba_shape |>
  st_transform("EPSG:4326") |>
  st_bbox() -> bbox_wgs84
bbox_wgs84
## xmin ymin xmax ymax
## -65.77198 -35.00013 -61.77089 -29.50042
Next, we can load the rstac package (Brazil Data Cube Team 2021), connect to the Earth Search service, and request images that intersect with the derived bounding box and the provided time range (March 2022).
library(rstac)
s = stac("https://earth-search.aws.element84.com/v0")
items = s |>
  stac_search(collections = "sentinel-s2-l2a-cogs",
              bbox = c(bbox_wgs84["xmin"], bbox_wgs84["ymin"],
                       bbox_wgs84["xmax"], bbox_wgs84["ymax"]),
              datetime = "2022-03-01/2022-03-31") |>
  post_request() |> items_fetch(progress = FALSE)
items
## ###STACItemCollection
## - matched feature(s): 385
## - features (385 item(s) / 0 not fetched):
## - S2B_19HGB_20220330_0_L2A
## - S2B_20HKG_20220330_0_L2A
## - S2B_20HLG_20220330_0_L2A
## - S2B_20HMG_20220330_0_L2A
## - S2B_20HNG_20220330_0_L2A
## - S2B_20HKH_20220330_0_L2A
## - S2B_20HLH_20220330_0_L2A
## - S2B_20HMH_20220330_0_L2A
## - S2B_20HNH_20220330_0_L2A
## - S2B_20HKJ_20220330_0_L2A
## - ... with 375 more feature(s).
## - assets:
## thumbnail, overview, info, metadata, visual, B01, B02, B03, B04, B05, B06, B07, B08, B8A, B09, B11, B12, AOT, WVP, SCL
## - other field(s):
## type, stac_version, stac_extensions, context, numberMatched, numberReturned, features, links
The result contains 385 images as a list in items$features. Each item (image) is in turn a list with properties, assets (links to S3 objects / files), and other metadata. Below, we show some example fields of a single item.
items$features[[20]]$properties
items$features[[20]]$assets$B08
## $datetime
## [1] "2022-03-30T14:21:21Z"
##
## $platform
## [1] "sentinel-2b"
##
## $constellation
## [1] "sentinel-2"
##
## $instruments
## [1] "msi"
##
## $gsd
## [1] 10
##
## $`view:off_nadir`
## [1] 0
##
## $`proj:epsg`
## [1] 32720
##
## $`sentinel:utm_zone`
## [1] 20
##
## $`sentinel:latitude_band`
## [1] "J"
##
## $`sentinel:grid_square`
## [1] "ML"
##
## $`sentinel:sequence`
## [1] "0"
##
## $`sentinel:product_id`
## [1] "S2B_MSIL2A_20220330T141039_N0400_R110_T20JML_20220330T192736"
##
## $`sentinel:data_coverage`
## [1] 100
##
## $`eo:cloud_cover`
## [1] 73.88
##
## $`sentinel:valid_cloud_cover`
## [1] TRUE
##
## $`sentinel:processing_baseline`
## [1] "04.00"
##
## $`sentinel:boa_offset_applied`
## [1] TRUE
##
## $created
## [1] "2022-03-30T23:42:13.704Z"
##
## $updated
## [1] "2022-03-30T23:42:13.704Z"
##
## $title
## [1] "Band 8 (nir)"
##
## $type
## [1] "image/tiff; application=geotiff; profile=cloud-optimized"
##
## $roles
## [1] "data"
##
## $gsd
## [1] 10
##
## $`eo:bands`
## $`eo:bands`[[1]]
## $`eo:bands`[[1]]$name
## [1] "B08"
##
## $`eo:bands`[[1]]$common_name
## [1] "nir"
##
## $`eo:bands`[[1]]$center_wavelength
## [1] 0.8351
##
## $`eo:bands`[[1]]$full_width_half_max
## [1] 0.145
##
##
##
## $href
## [1] "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/20/J/ML/2022/3/S2B_20JML_20220330_0_L2A/B08.tif"
##
## $`proj:shape`
## [1] 10980 10980
##
## $`proj:transform`
## [1] 10 0 399960 0 -10 6600040 0 0 1
3. Create a gdalcubes image collection from STAC result
To create a data cube from the images contained in the STAC response, we can now use the gdalcubes package (Appel and Pebesma 2019). First, we need to convert the STAC item list to a gdalcubes image collection object, which indexes available images (in a single-file database). Image collection objects do not contain any pixel data and hence are very small. Below, we use the stac_image_collection() function, which receives a list of STAC items as input and returns a gdalcubes image collection as output. We explicitly provide the names of assets to make sure that the “SCL” band is included and at the same time apply a filter function on images to ignore cloudy images.
library(gdalcubes)
assets = c("B01","B02","B03","B04","B05","B06","B07","B08","B8A","B09","B11","SCL")
s2_collection = stac_image_collection(items$features, asset_names = assets,
                                      property_filter = function(x) {x[["eo:cloud_cover"]] < 20})
s2_collection
## Image collection object, referencing 227 images with 12 bands
## Images:
## name left top bottom right
## 1 S2B_19HGB_20220330_0_L2A -65.87358 -34.51337 -35.29082 -65.59543
## 2 S2B_20HKG_20220330_0_L2A -65.87558 -34.31426 -35.31315 -65.06754
## 3 S2B_20HLG_20220330_0_L2A -65.19991 -34.32183 -35.32718 -63.98063
## 4 S2B_20HMG_20220330_0_L2A -64.10066 -34.33630 -35.33121 -62.89262
## 5 S2B_20HNG_20220330_0_L2A -63.00021 -34.33886 -35.33035 -62.25445
## 6 S2B_20HKH_20220330_0_L2A -65.59248 -33.41797 -34.41180 -65.04591
## datetime srs
## 1 2022-03-30T14:22:30 EPSG:32719
## 2 2022-03-30T14:22:25 EPSG:32720
## 3 2022-03-30T14:22:22 EPSG:32720
## 4 2022-03-30T14:22:19 EPSG:32720
## 5 2022-03-30T14:22:15 EPSG:32720
## 6 2022-03-30T14:22:11 EPSG:32720
## [ omitted 221 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 B01 0 1 227
## 2 B02 0 1 227
## 3 B03 0 1 227
## 4 B04 0 1 227
## 5 B05 0 1 227
## 6 B06 0 1 227
## 7 B07 0 1 227
## 8 B08 0 1 227
## 9 B09 0 1 227
## 10 B11 0 1 227
## 11 B8A 0 1 227
## 12 SCL 0 1 227
4. Create a (virtual) data cube
We can now define our target data cube, including the coordinate reference system, the pixel sizes, the spatiotemporal extent, and the methods used for spatial resampling and temporal aggregation.
v = cube_view(srs="EPSG:3857", dx=500, dy=500, dt="P1D",
              aggregation="median", resampling = "average",
              extent=list(t0 = "2022-03-01", t1 = "2022-03-31",
                          left=bbox["xmin"], right=bbox["xmax"],
                          top=bbox["ymax"], bottom=bbox["ymin"]))
v
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2022-03-01 2022-03-31 31 P1D
## y -4163946.78375 -3439446.78375 1449 500
## x -7321754.07345 -6876254.07345 891 500
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"
To ignore cloud and cloud shadow pixels during construction of the data cube, we define a mask using the SCL band and afterwards combine the collection, the data cube view, and the mask to build a data cube.
S2.mask = image_mask("SCL", values = c(3,8,9))
S2_cube = raster_cube(s2_collection, v, S2.mask)
S2_cube
## A data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2022-03-01 2022-03-31 31 P1D 1
## y -4163946.78375 -3439446.78375 1449 500 128
## x -7321754.07345 -6876254.07345 891 500 128
##
## Bands:
## name offset scale nodata unit
## 1 B01 0 1 NaN
## 2 B02 0 1 NaN
## 3 B03 0 1 NaN
## 4 B04 0 1 NaN
## 5 B05 0 1 NaN
## 6 B06 0 1 NaN
## 7 B07 0 1 NaN
## 8 B08 0 1 NaN
## 9 B09 0 1 NaN
## 10 B11 0 1 NaN
## 11 B8A 0 1 NaN
## 12 SCL 0 1 NaN
Notice that the result is a virtual data cube (or a proxy object) that still does not contain any pixel data and hence does not consume any substantial processing or memory resources. Instead, the object simply knows how to create the cube when needed (e.g., when you call plot(), write_tif(), write_ncdf(), or similar). This concept is sometimes referred to as lazy evaluation.
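As a minimal sketch of this behaviour (the temporary output file below is only illustrative), nothing is read or processed until the cube is plotted or exported, for example to a netCDF file:
# Selecting a band is still lazy; the export finally triggers
# reading, warping, and aggregating the imagery
S2_cube |>
  select_bands("B04") |>
  write_ncdf(tempfile(fileext = ".nc"))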
5. Process data cube and plot result
Given the data cube, we can apply built-in data cube operations. Notice that calling these operations still will not start any expensive computations or data reading; the returned object is again a virtual data cube. To derive a simple cloud-free mosaic image of our study region, we use the select_bands() function to filter by spectral bands, the reduce_time() function to calculate median values of all pixel time series for all three bands, and the filter_geom() function to cut out our polygon from the result. Calling plot() will eventually start the needed computations and finally plot the composite image.
S2_cube |>
  select_bands(c("B02", "B03", "B04")) |>
  reduce_time("median(B02)", "median(B03)", "median(B04)") |>
  filter_geom(cordoba_shape$geom) |>
  plot(rgb = 3:1, zlim = c(0, 1800))
Notice that computations are executed in parallel. Calling gdalcubes_options(parallel = 8), for example, allows up to 8 parallel worker processes to be used.
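This is set once per session, before running the pipeline above (8 is an arbitrary choice here; pick a value suited to the machine's available cores):
# Allow up to 8 parallel worker processes for subsequent data cube operations
gdalcubes_options(parallel = 8)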
In addition to the operations used above, the gdalcubes package provides the following operations on data cubes; a small combined example follows the table.
Operation | Description
---|---
aggregate_space | Reduce spatial resolution of a cube by applying a spatial aggregation function.
aggregate_time | Aggregate and/or regularize time series.
apply_pixel | Apply an arithmetic expression to all data cube pixels.
crop | Extract a rectangular spatial / temporal / spatiotemporal window.
fill_time | Fill missing values of a data cube by simple time series interpolation.
filter_geom | Filter pixels by a spatial polygon.
filter_pixel | Filter pixels by a logical expression on band values.
join_bands | Combine bands of two identically shaped data cubes.
reduce_space | Apply a reducer function to all spatial slices of a data cube.
reduce_time | Apply a reducer function to all pixel time series.
select_bands | Select specific bands of a data cube.
select_time | Select irregular time slices of a data cube.
slice_space | Select a single time series of a data cube.
slice_time | Select a single time slice of a data cube.
window_time | Apply a moving window aggregate or convolution kernel to all pixel time series.
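As a small, illustrative sketch (not part of the original workflow), several of these operations can be chained; for example, apply_pixel() and reduce_time() can be combined to derive a maximum-NDVI map for the same cube and region (the NDVI band math and the zlim range below are assumptions):
# Per-pixel NDVI from red (B04) and near-infrared (B08) reflectances,
# reduced to the maximum value observed per pixel in March 2022
S2_cube |>
  select_bands(c("B04", "B08")) |>
  apply_pixel("(B08-B04)/(B08+B04)", names = "NDVI") |>
  reduce_time("max(NDVI)") |>
  filter_geom(cordoba_shape$geom) |>
  plot(zlim = c(0, 1), key.pos = 1)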
6. Greenest pixel composite with a user-defined function
Some operations accept user-defined R functions as arguments. For example, we can write our own reducer function that computes per-pixel NDVI values and returns the RGB values at the day with maximum NDVI (greenest pixel), as in the following code example.
raster_cube(s2_collection, v, S2.mask) |>
  select_bands(c("B02", "B03", "B04", "B08")) |>
  reduce_time(names=c("blue", "green", "red"), FUN = function(x) {
    ndvi = (x["B08", ] - x["B04", ]) / (x["B08", ] + x["B04", ])
    if (all(is.na(ndvi))) {
      return(c(NA, NA, NA))
    }
    i = which.max(ndvi)
    return(x[c("B02", "B03", "B04"), i])
  }) |>
  filter_geom(cordoba_shape$geom) |>
  plot(rgb = 3:1, zlim = c(0, 1800))