OpenGeoHub Summer School 2019, Münster
Part I: Creating, Visualizing, and Exporting Data Cubes from Satellite Image Collections (45 min demonstration, 15 min practical exercises)
Part II: On-the-fly Processing of Data Cubes (45 min demonstration, 15 min practical exercises)
All the material of this tutorial is online at GitHub, including R markdown sources, rendered HTML output, and solutions to the practical exercises. However, the repository does not contain the data used in the tutorial due to its size.
Most parts of the tutorial will use a collection of 180 Landsat 8 surface reflectance images, covering a small part of the Brazilian Amazon forest. The live demonstration will use the full resolution dataset (62 gigabytes compressed; > 200 gigabytes unzipped; available here), whereas we will use a downsampled version of the same dataset with coarse spatial resolution (300 meter pixel size; 740 megabytes compressed; 2 gigabytes unzipped) in the practical part (available here).
After downloading whichever version of the dataset, make sure to unzip it. The following R code will download the low resolution version to your current working directory and unzip it.
download.file("https://uni-muenster.sciebo.de/s/e5yUZmYGX0bo4u9/download",
destfile = "L8_Amazon.zip", mode="wb")
unzip("L8_Amazon.zip", exdir = "L8_Amazon")
Raw satellite imagery is mostly distributed as collection of files, whether on download portals of space agencies, or in cloud computing environments (Amazon Web Services, Google Cloud, …). If we want to analyze more than a single image, or even images from multiple satellites, we quickly run into the following challenges:
To do time series analysis, process larger areas, and / or combine datasets from different sensors / satellites, we first must restructure our data, e.g. as a data cube with a single spatial reference system, regular pixel sizes, both in time and in space.
Notice that what we call cube is actually not really a cube. It has (up to) four dimensions, and the lengths of the dimensions may be different. Therefore, four dimensional regular raster data cubes also cover simple time series, multiband time series, grayscale images, multispectral images, and time-series of images.
GDAL, the Geospatial Data Abstraction Library is a software library reading and writing all relevant raster (and vector) data formats, and providing functions to warp (reproject, rescale, resample, and crop) multiband raster images. It has a three dimensional (space, bands) raster data model and solves some of the problems (data formats, image warping). However, it does not know about the organization of data products, and time. GDAL is written in C / C++ but the rgdal
package (Bivand, Keitt, and Rowlingson 2019) provides an easy to use interface in R.
There are further R packages to process satellite imagery:
raster
(Hijmans 2019)
stars
(Pebesma 2019) (see parallel session)
gdalcubes is a quite new C++ library and R package that mostly wraps functions of the C++ library. It uses GDAL to read, write, and warp images, but understands date/time and how complex satellite image data products are organized. We will focus on the R package here. To get started, please install the gdalcubes package from CRAN with:
install.packages("gdalcubes")
We can load the package and make sure that all computations later in this tutorial use up to 8 threads with:
library(gdalcubes)
## Using gdalcubes library version 0.2.1
gdalcubes_options(threads=8)
Please notice that this tutorial needs package version >= 0.2.0, which you can check by running:
packageVersion("gdalcubes")
## [1] '0.2.1'
To analyze our sample dataset, we must first tell gdalcubes, which files belong to the image collection, and where to find them.
At first, we simply list (recursively) all GeoTIFF files in the directory with the Landsat 8 images:
L8.files = list.files("/media/marius/Samsung_T5/eodata/L8_Amazon_full", pattern = ".tif", recursive = TRUE, full.names = TRUE)
head(L8.files, 15)
## [1] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_pixel_qa.tif"
## [2] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_radsat_qa.tif"
## [3] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_aerosol.tif"
## [4] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band1.tif"
## [5] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band2.tif"
## [6] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band3.tif"
## [7] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band4.tif"
## [8] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band5.tif"
## [9] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band6.tif"
## [10] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band7.tif"
## [11] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014082001T1-SC20190715051515/LC08_L1TP_226063_20140820_20170420_01_T1_pixel_qa.tif"
## [12] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014082001T1-SC20190715051515/LC08_L1TP_226063_20140820_20170420_01_T1_radsat_qa.tif"
## [13] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014082001T1-SC20190715051515/LC08_L1TP_226063_20140820_20170420_01_T1_sr_aerosol.tif"
## [14] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014082001T1-SC20190715051515/LC08_L1TP_226063_20140820_20170420_01_T1_sr_band1.tif"
## [15] "/media/marius/Samsung_T5/eodata/L8_Amazon_full/LC082260632014082001T1-SC20190715051515/LC08_L1TP_226063_20140820_20170420_01_T1_sr_band2.tif"
sum(file.size(L8.files)) / 1000^3 # gigabytes
## [1] 271.3001
We see that every image is represented by a directory, with individual files for spectral bands. We can then add all images to an image collection with:
L8.col = create_image_collection(L8.files, format = "L8_SR", out_file = "L8.db")
# L8.col = image_collection("L8.db")
L8.col
## A GDAL image collection object, referencing 180 images with 10 bands
## Images:
## name left top bottom
## 1 LC08_L1TP_226063_20140719_20170421_01_T1 -54.15776 -3.289862 -5.392073
## 2 LC08_L1TP_226063_20140820_20170420_01_T1 -54.16858 -3.289828 -5.392054
## 3 LC08_L1GT_226063_20160114_20170405_01_T2 -54.16317 -3.289845 -5.392064
## 4 LC08_L1TP_226063_20160724_20170322_01_T1 -54.16317 -3.289845 -5.392064
## 5 LC08_L1TP_226063_20170609_20170616_01_T1 -54.17399 -3.289810 -5.392044
## 6 LC08_L1TP_226063_20170711_20170726_01_T1 -54.15506 -3.289870 -5.392083
## right datetime
## 1 -52.10338 2014-07-19T00:00:00
## 2 -52.11418 2014-08-20T00:00:00
## 3 -52.10878 2016-01-14T00:00:00
## 4 -52.10878 2016-07-24T00:00:00
## 5 -52.11958 2017-06-09T00:00:00
## 6 -52.09798 2017-07-11T00:00:00
## srs
## 1 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs
## 2 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs
## 3 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs
## 4 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs
## 5 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs
## 6 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs
## [ omitted 174 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 AEROSOL 0 1 180
## 2 B01 0 1 -9999.000000 180
## 3 B02 0 1 -9999.000000 180
## 4 B03 0 1 -9999.000000 180
## 5 B04 0 1 -9999.000000 180
## 6 B05 0 1 -9999.000000 180
## 7 B06 0 1 -9999.000000 180
## 8 B07 0 1 -9999.000000 180
## 9 PIXEL_QA 0 1 180
## 10 RADSAT_QA 0 1 180
This opens all provided files once, reads some relevant metadata (spatial extent, reference system, recording date/time, and how the file relates to the spectral bands of the data product). The format
argument describes, how this information can be extracted. The gdalcubes package comes with a set of predefined image collection formats for particular data products. We can list available formats with:
collection_formats()
## CHIRPS_v2_0_daily_p05_tif | Image collection format for CHIRPS v 2.0
## | daily global precipitation dataset (0.05
## | degrees resolution) from GeoTIFFs, expects
## | list of .tif or .tif.gz files as input.
## | [TAGS: CHIRPS, precipitation]
## CHIRPS_v2_0_monthly_p05_tif | Image collection format for CHIRPS v 2.0
## | monthly global precipitation dataset (0.05
## | degrees resolution) from GeoTIFFs, expects
## | list of .tif or .tif.gz files as input.
## | [TAGS: CHIRPS, precipitation]
## L8_L1TP | Collection format for Landsat 8 Level 1 TP
## | product [TAGS: Landsat, USGS, Level 1,
## | NASA]
## L8_SR | Collection format for Landsat 8 surface
## | reflectance product [TAGS: Landsat, USGS,
## | Level 2, NASA, surface reflectance]
## MxD10A2 | Collection format for selected bands from
## | the MODIS MxD10A2 (Aqua and Terra) v006
## | Snow Cover product [TAGS: MODIS, Snow
## | Cover]
## MxD11A1 | Collection format for selected bands from
## | the MODIS MxD11A2 (Aqua and Terra) v006
## | Land Surface Temperature product [TAGS:
## | MODIS, LST]
## MxD11A2 | Collection format for selected bands from
## | the MODIS MxD11A2 (Aqua and Terra) v006
## | Land Surface Temperature product [TAGS:
## | MODIS, LST]
## MxD13A3 | Collection format for selected bands from
## | the MODIS MxD13A3 (Aqua and Terra) product
## | [TAGS: MODIS, VI, NDVI, EVI]
## MxD13Q1 | Collection format for selected bands from
## | the MODIS MxD13Q1 (Aqua and Terra) product
## | [TAGS: MODIS, VI, NDVI, EVI]
## PlanetScope_3B_AnalyticMS_SR | Image collection format for PlanetScope
## | 4-band scenes [TAGS: PlanetScope, BOA,
## | Surface Reflectance]
## Sentinel2_L1C | Image collection format for Sentinel 2
## | Level 1C data as downloaded from the
## | Copernicus Open Access Hub, expects a list
## | of file paths as input. The format works on
## | original ZIP compressed as well as
## | uncompressed imagery. [TAGS: Sentinel,
## | Copernicus, ESA, TOA]
## Sentinel2_L1C_AWS | Image collection format for Sentinel 2
## | Level 1C data in AWS [TAGS: Sentinel,
## | Copernicus, ESA, TOA]
## Sentinel2_L2A | Image collection format for Sentinel 2
## | Level 2A data as downloaded from the
## | Copernicus Open Access Hub, expects a list
## | of file paths as input. The format should
## | work on original ZIP compressed as well as
## | uncompressed imagery. [TAGS: Sentinel,
## | Copernicus, ESA, BOA, Surface Reflectance]
## Sentinel2_L2A_THEIA | Image collection format for Sentinel 2
## | Level 2A data as downloaded from Theia.
## | [TAGS: Sentinel, ESA, Flat Reflectance,
## | Theia]
The number of available formats is still rather limited, but continues to grow and is extensible (using add_collection_format()
). In fact, a collection format is a single JSON (JavaScript Object Notation) file, describing some rules how to extract e.g. date/time, and bands from filenames (examples at https://github.com/appelmar/gdalcubes_formats). Writing collection formats for your own non-standard datasets is not too difficult and documented here.
In our example, we used the predefined format "L8_SR"
for Landsat 8 surface reflectance data as downloaded from the USGS portal.
The creation of image collections is typically done only once. We can add images to an existing collection with add_images()
.
We can extract the spatiotemporal extent of the collection with:
extent(L8.col, srs="EPSG:4326")
## $left
## [1] -59.12746
##
## $right
## [1] -52.09798
##
## $top
## [1] -1.844241
##
## $bottom
## [1] -6.84404
##
## $t0
## [1] "2013-06-12T00:00:00"
##
## $t1
## [1] "2019-07-06T00:00:00"
We can define a target data cube by its geometry, i.e., the spatiotemporal extent, the spatial reference system, the spatial size, and the temporal duration of cells. We call this a data cube view, i.e. the geometry of a cube without connecting it to any data. To create a data cube view, we can use the cube_view()
function:
# Coarse resolution overview
v.overview.500m = cube_view(srs="EPSG:3857", extent=L8.col, dx=500, dy=500, dt = "P1Y", resampling="average", aggregation="median")
v.overview.500m
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2013 2019 7 P1Y
## y -763764.387686915 -205264.387686915 1117 500
## x -6582280.06164712 -5799280.06164712 1566 500
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"
v.subarea.60m = cube_view(extent=list(left=-6180000, right=-6080000, bottom=-550000, top=-450000,
t0="2014-01-01", t1="2018-12-31"), dt="P1Y", dx=60, dy=60, srs="EPSG:3857",
aggregation = "median", resampling = "average")
v.subarea.60m
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2014 2018 5 P1Y
## y -550010 -449990 1667 60
## x -6180010 -6079990 1667 60
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"
v.subarea.60m.daily = cube_view(view = v.subarea.60m, dt="P1D")
v.subarea.60m.daily
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2014-01-01 2018-01-01 1462 P1D
## y -550010 -449990 1667 60
## x -6180010 -6079990 1667 60
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "average"
Notice that the data cube view does not contain any information on bands, because it is independent from particular data products.
Having defined an image collection, and a data cube view, a data cube is simply the combination of the two. We can create a data cube with the raster_cube()
function:
L8.cube.overview = raster_cube(L8.col, v.overview.500m)
L8.cube.overview
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2013 2019 7 P1Y 16
## y -763764.387686915 -205264.387686915 1117 500 256
## x -6582280.06164712 -5799280.06164712 1566 500 256
##
## Bands:
## name offset scale nodata unit
## 1 AEROSOL 0 1 NaN
## 2 B01 0 1 NaN
## 3 B02 0 1 NaN
## 4 B03 0 1 NaN
## 5 B04 0 1 NaN
## 6 B05 0 1 NaN
## 7 B06 0 1 NaN
## 8 B07 0 1 NaN
## 9 PIXEL_QA 0 1 NaN
## 10 RADSAT_QA 0 1 NaN
L8.cube.subarea = raster_cube(L8.col, v.subarea.60m)
L8.cube.subarea
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2014 2018 5 P1Y 16
## y -550010 -449990 1667 60 256
## x -6180010 -6079990 1667 60 256
##
## Bands:
## name offset scale nodata unit
## 1 AEROSOL 0 1 NaN
## 2 B01 0 1 NaN
## 3 B02 0 1 NaN
## 4 B03 0 1 NaN
## 5 B04 0 1 NaN
## 6 B05 0 1 NaN
## 7 B06 0 1 NaN
## 8 B07 0 1 NaN
## 9 PIXEL_QA 0 1 NaN
## 10 RADSAT_QA 0 1 NaN
L8.cube.subarea.daily = raster_cube(L8.col, v.subarea.60m.daily)
L8.cube.subarea.daily
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2014-01-01 2018-01-01 1462 P1D 16
## y -550010 -449990 1667 60 256
## x -6180010 -6079990 1667 60 256
##
## Bands:
## name offset scale nodata unit
## 1 AEROSOL 0 1 NaN
## 2 B01 0 1 NaN
## 3 B02 0 1 NaN
## 4 B03 0 1 NaN
## 5 B04 0 1 NaN
## 6 B05 0 1 NaN
## 7 B06 0 1 NaN
## 8 B07 0 1 NaN
## 9 PIXEL_QA 0 1 NaN
## 10 RADSAT_QA 0 1 NaN
This is very cheap, simply returning proxy objects, but not reading any image data. The package delays the computational intensive parts as much as possible (e.g., until users call plot()
). The returned object knows about the bands of the data product. We can use select_bands()
to get only the bands we are interested in:
L8.cube.overview.rgb = select_bands(L8.cube.overview, c("B02", "B03", "B04"))
L8.cube.overview.rgb
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2013 2019 7 P1Y 16
## y -763764.387686915 -205264.387686915 1117 500 256
## x -6582280.06164712 -5799280.06164712 1566 500 256
##
## Bands:
## name offset scale nodata unit
## 1 B02 0 1 NaN
## 2 B03 0 1 NaN
## 3 B04 0 1 NaN
There are some utility functions on data cubes, including:
names(L8.cube.overview.rgb)
## [1] "B02" "B03" "B04"
srs(L8.cube.overview.rgb)
## [1] "EPSG:3857"
bands(L8.cube.overview.rgb)
## name offset scale nodata unit
## 1 B02 0 1 NaN
## 2 B03 0 1 NaN
## 3 B04 0 1 NaN
The plot function can be used to visualize data cubes. Calling plot()
will start reading and processing the data:
For a simple RGB plot, we use the rgb
argument to specify which bands correspond to the red, green, and blue channels, and specify the black and white points of the channels (to control contrast and brightness) in zlim
.
plot(L8.cube.overview.rgb, rgb=3:1, zlim=c(0,1500))
plot(select_bands(L8.cube.subarea, c("B02", "B03", "B04")), rgb=3:1, zlim=c(0,1500))
Notice that we can also plot bands individually, creating a two-dimensional plot layout of bands and time. Using key.pos = 1
, and col= viridis::viridis
, we plot a legend at the bottom of the plot, and use the viridis color scales (this requires the viridis package).
plot(L8.cube.overview.rgb, zlim=c(0,1500), key.pos=1, col=viridis::viridis, t=2:5)
plot(select_bands(raster_cube(L8.col,view = v.subarea.60m), c("B05")),col=viridis::viridis, zlim=c(0,6000), key.pos=1)
Plotting an identical data cube twice, with different isualization arguments zlim
, col
, and others will not need to reprocess the data cube again. plot()
internally writes netCDF files to a temporary directory and remembers that a specific cube is already available.
The plot()
function also considers different types of data cubes. For example, if the number of cells in x and y direction equals one, we get a simple time series plot, as we will see later in this tutorial.
The data cube representation makes it straightforward to create animations, by plotting time slices of the cube individually, and use these plots as animation frames:
animate(select_bands(raster_cube(L8.col, v.subarea.60m), c("B02","B03","B04")), rgb=3:1, zlim=c(0,1500))
## # A tibble: 5 x 7
## format width height colorspace matte filesize density
## <chr> <int> <int> <chr> <lgl> <int> <chr>
## 1 gif 1344 960 sRGB FALSE 0 72x72
## 2 gif 1344 960 sRGB TRUE 0 72x72
## 3 gif 1344 960 sRGB TRUE 0 72x72
## 4 gif 1344 960 sRGB TRUE 0 72x72
## 5 gif 1344 960 sRGB TRUE 0 72x72
Sometimes we want to process data cubes further, e.g. with external software. We can export data cubes either as single netCDF files, or as a collection of GeoTIFF files, where each time-slice of a cube will be stored as one (multiband) file.
Both, netCDF and GeoTIFF export support compression, and packing (converting double precision numeric values to smaller integer types by applying an offset and scale) to reduce the file size if needed (see documentation at ?write_ncdf
, and ?write_tif
).
gdalcubes_options(ncdf_compression_level = 1)
write_ncdf(L8.cube.overview.rgb, file.path("~/Desktop", basename(tempfile(fileext = ".nc"))))
gdalcubes_options(ncdf_compression_level = 0)
write_tif()
and write_ncdf()
both return the path(s) to created file(s) as a character vector.
The package comes with a function as_stars()
to convert data cubes to stars
objects (Pebesma 2019), data cubes supporting any number of dimensions, and even vector data cubes.
x = as_stars(
select_bands(
raster_cube(L8.col, v.subarea.60m), "B05"))
x
## stars object with 3 dimensions and 1 attribute
## attribute(s), summary of first 1e+05 cells:
## B05
## Min. : 470.9
## 1st Qu.:3005.2
## Median :3202.7
## Mean :3227.0
## 3rd Qu.:3429.8
## Max. :5351.6
## dimension(s):
## from to offset delta refsys point
## x 1 1667 -6180010 60 +proj=merc +a=6378137 +b=... FALSE
## y 1 1667 -449990 -60 +proj=merc +a=6378137 +b=... FALSE
## time 1 5 NA NA POSIXct FALSE
## values
## x NULL [x]
## y NULL [y]
## time 2014-01-01,...,2018-01-01
plot(x)
## Warning in classInt::classIntervals(na.omit(values), min(nbreaks - 1,
## n.unq), : N is large, and some styles will run very slowly; sampling
## imposed
The resulting object considers bands as array attributes that can be converted to a new dimension e.g. with stars::st_redimension()
.
If the raster cube has only a single band, or a single time slice, it is also possible to convert it to a raster (stack), by using write_tif()
:
x = raster::stack(
write_tif(
select_bands(
raster_cube(L8.col, v.subarea.60m), "B05")))
x
## class : RasterStack
## dimensions : 1667, 1667, 2778889, 5 (nrow, ncol, ncell, nlayers)
## resolution : 60, 60 (x, y)
## extent : -6180010, -6079990, -550010, -449990 (xmin, xmax, ymin, ymax)
## crs : +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs
## names : cube_286d3fcd5a912014, cube_286d3fcd5a912015, cube_286d3fcd5a912016, cube_286d3fcd5a912017, cube_286d3fcd5a912018
The raster_cube()
function receives two further optional arguments.
The mask
argument can be used to apply image masks during construction of the data cube if the data products includes a mask band (e.g. for clouds, cloud shadows, or general quality flags). To define a mask, we typically call the image_mask()
function. This function expects the name of the mask band as its first band
argument. Additionally, we can either pass a vector of values that are masked (all bands set to NA if the specified band
has one of the provided values) as the values
argument, or give a range of mask values by passing minimum and maximum values as min
and max
arguments. Masks can be inverted by setting invert = TRUE
. For bit field masks, it is possible to extract specific bits (applying a logical AND) of the band values, before comparing them to the values or range of the mask.
The example below will mask all pixels with a "PIXEL_QA"
value different from the provided values (taken from the Landsat 8 handbook).
L8.clear_mask = image_mask("PIXEL_QA", values=c(322, 386, 834, 898, 1346, 324, 388, 836, 900, 1348), invert = TRUE)
x = raster_cube(L8.col, v.subarea.60m, mask=L8.clear_mask)
x = select_bands(x, c("B02","B03","B04"))
animate(x, rgb=3:1, zlim=c(0,1500))
## # A tibble: 5 x 7
## format width height colorspace matte filesize density
## <chr> <int> <int> <chr> <lgl> <int> <chr>
## 1 gif 1344 960 sRGB FALSE 0 72x72
## 2 gif 1344 960 sRGB TRUE 0 72x72
## 3 gif 1344 960 sRGB TRUE 0 72x72
## 4 gif 1344 960 sRGB TRUE 0 72x72
## 5 gif 1344 960 sRGB TRUE 0 72x72
The chunking
argument defines the size of data cube chunks as a vector with three integer values for the number of pixels in time, y, and x directions respectively. Chunks are read completely into main memory, i.e., smaller chunks will generally reduce the main memory consumption. The size of chunks also has an effect on parallelization. Internally, chunks of the target data cube are read and processed independently, potentially by multiple threads. However, the effect of the chunk size on the performance is much more complex and depends on how we process the data (e.g., time series vs. time slices oriented), and how the data is stored. Some data formats e.g. do not allow efficient range selection reads whereas others do.
Start R. If not yet done, install the gdalcubes
package from CRAN, and load it.
If not yet done, download the sample dataset from https://uni-muenster.sciebo.de/s/e5yUZmYGX0bo4u9/download and unzip.
Create an image collection from all GeoTIFF files in the unzipped directory.
Create a yearly data cube from the image collection, covering the full spatiotemporal extent at 1 km resolution, using a Brazil Mercator projection (EPSG:5641).
Select the near infrared band ("B05"
) and plot the cube.
Create a false-color image for the year 2017, using the red ("B04"
), swir2 ("B07"
), and blue ("B02"
) bands as red, green, and blue channels. You can select the year 2017 by creating a new data cube view (derived from the previous view, and setting both t0 = "2017"
, and t1 = "2017"
).
The gdalcubes package comes with some built-in operations on data cubes. The following operations produce a derived data cube from one or more input data cubes.
Operator | Description |
---|---|
apply_pixel |
Apply arithmetic expressions on band values per pixel. |
fill_time |
Fill missing values by simple time-series interpolation. |
filter_pixel |
Filter pixels based on logical expressions. |
join_bands |
Combine bands of two identically shaped input data cubes. |
reduce_space |
Apply a reducer function over time slices of a data cube. |
reduce_time |
Apply a reducer function over individual pixel time series. |
select_bands |
Select a subset of a data cube’s bands. |
window_time |
Apply a moving window reducer of kernel over individual pixel time series. |
chunk_apply |
Apply a user-defined R function over chunks of a data cube. |
These operations can be chained (e.g., using the pipe operator %>%
from the magrittr
package, which passes a left-hand-side R expression as the first argument to the function on the right-hand-side (e.g. rnorm(100) %>% mean
).
The implementation of these operations in gdalcubes works chunk-wise, i.e. reads only the chunk of the input data cube that is currently needed. This makes sure that only small parts are needed in main memory.
The apply_pixel()
function can be used to apply per-pixel arithmetic expressions on band values of a data cube. Examples include the calculation of vegetation indexes. The function takes a data cube, a string vector of arithmetic expressions, and a vector of result band names as arguments. Below, we derive the normalized difference vegetation index (NDVI) from the red and near infrared (NIR) channel. We can apply multiple expressions at the same time by providing a vector of expressions (and names).
library(magrittr)
L8.ndvi = raster_cube(L8.col, v.subarea.60m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)" , names = "NDVI", keep_bands=FALSE)
L8.ndvi
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2014 2018 5 P1Y 16
## y -550010 -449990 1667 60 256
## x -6180010 -6079990 1667 60 256
##
## Bands:
## name offset scale nodata unit
## 1 NDVI 0 1 NaN
plot(L8.ndvi, col=viridis::viridis, zlim=c(-0.3,1), key.pos = 1)
Creating a chain of data cube operations still returns proxy objects, knowing the size and shape of the output data cube, before calling plot will start computations. In the example, we do not need the original bands after computing the NDVI and set keep_bands = FALSE
(this is the default).
Similar to apply_pixel()
we can filter pixels by arithmetic expressions with filter_pixel()
. Values of all bands for pixels not fulfilling a logical expression will be set to NA.
raster_cube(L8.col, v.subarea.60m, mask=L8.clear_mask) %>%
select_bands(c("B05","B07")) %>%
apply_pixel("(B05-B07)/(B05+B07)" , names = "NBR") %>%
filter_pixel("NBR < 0.5") %>%
plot(col=viridis::viridis, zlim=c(-1,0.5), key.pos = 1)
Data cubes can be reduced over the space and time dimensions. The reduce_time()
function applies one or more reducer functions over pixel time series, producing a single (multiband) result image, whereas reduce_space()
reduces time slices in the cube to single values (per band), resulting in a single (multiband) time series.
The example below derives median NDVI values over all pixel time series.
raster_cube(L8.col, v.subarea.60m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI", keep_bands=FALSE) %>%
reduce_time("median(NDVI)") %>%
plot(col=viridis::viridis, nbreaks=100, zlim=c(-0.3,1), key.pos = 1)
Possible reducers include "min"
, "mean"
, "median"
, "max"
, "count"
(count non-missing values), "sum"
, "var"
(variance), and "sd"
(standard deviation). Reducer expressions are always given as a string starting with the reducer name followed by the band name in parentheses. Notice that it is not possible to apply more complex arithmetic expressions here. It is however possible to mix reducers and bands:
raster_cube(L8.col, v.subarea.60m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI", keep_bands=TRUE) %>%
reduce_time("median(NDVI)", "mean(NDVI)","max(B05)")
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2014 2014 1 P5Y 1
## y -550010 -449990 1667 60 256
## x -6180010 -6079990 1667 60 256
##
## Bands:
## name offset scale nodata unit
## 1 NDVI_median 0 1 NaN
## 2 NDVI_mean 0 1 NaN
## 3 B05_max 0 1 NaN
Results of reduce_space()
are plotted as simple time series.
raster_cube(L8.col, v.subarea.60m, mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
reduce_space("median(NDVI)", "sd(NDVI)") %>%
plot()
The "count"
reducer is often very useful to get an initial understanding of an image collection.
raster_cube(L8.col, cube_view(view=v.overview.500m, dt="P1D"), mask=L8.clear_mask) %>%
select_bands(c("B01")) %>%
reduce_time("count(B01)") %>%
plot(key.pos=1)
raster_cube(L8.col, cube_view(view=v.overview.500m, dt="P1M"), mask=L8.clear_mask) %>%
select_bands("B01") %>%
reduce_space("count(B01)") %>%
plot()
We can see that there are almost no observations during the months from October to May, because the download was limited to images with low cloud percentages.
There are two more built-in functions that operate on individual pixel time series.
The fill_time()
function interpolates missing values by preceding or succeeding values (using simple linear or nearest neighbor interpolation, or carrying observations forwards or backwards), The window_time()
function can either apply a moving window kernel, or apply a reducer function over moving windows.
In the example below, we sum NDVI changes between subsequent time slices in the data cube, and visualize the result using a diverging color scale from the RColorBrewer
package.
raster_cube(L8.col, cube_view(view = v.subarea.60m, extent=list(t0="2014-01",t1="2018-12")), mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
fill_time(method = "locf") %>%
window_time(kernel = c(-1,1), window=c(1,0)) %>%
reduce_time("sum(NDVI)") %>%
plot(zlim=c(-0.4,0.4),nbreaks = 12, col=RColorBrewer::brewer.pal(11, "RdYlBu"), key.pos=1)
So far, we have provided expressions and reducers as characters / strings. The reasons was that these methods automatically translate to C++, i.e. are evaluated in the C++ code. In the current version, reduce_time()
, and apply_pixel()
may also receive R functions as argument. This opens up quite a bunch of things we can do, e.g. using functions from our favorite R packages to process pixel time series. In the example below, we simply fit a line to individual NDVI pixel time series and return its slope (trend).
raster_cube(L8.col, cube_view(view = v.subarea.60m, dx=200), mask = L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
reduce_time(names=c("ndvi_trend"), FUN=function(x) {
z = data.frame(t=1:ncol(x), ndvi=x["NDVI",])
result = NA
if (sum(!is.na(z$ndvi)) > 3) {
result = coef(lm(ndvi ~ t, z, na.action = na.exclude))[2]
}
return(result)
}) %>%
plot(key.pos=1, col=viridis::viridis)
There is no limit in what we can do in the provided R function, but we must take care of a few things:
The reducer function is executed in a new R process without access to the current workspace. It is not possible to access variables defined outside of the function and packages must be loaded within the function.
The reducer function must always return a vector with the same length (for all time series).
It is a good idea to think about NA
values, i.e. you should check whether the complete time series is NA
, and that missing values do not produce errors.
Chaining processes works lazliy; internally gdalcubes creates a process graph of operations that can be serialized as JSON:
raster_cube(L8.col, cube_view(view = v.subarea.60m, extent=list(t0="2014-01",t1="2018-12")), mask=L8.clear_mask) %>%
select_bands(c("B04","B05")) %>%
apply_pixel("(B05-B04)/(B05+B04)", names = "NDVI") %>%
fill_time(method = "locf") %>%
window_time(kernel = c(-1,1), window=c(1,0)) %>%
reduce_time("sum(NDVI)") %>%
as_json %>%
cat
## {
## "cube_type": "reduce_time",
## "in_cube": {
## "cube_type": "window_time",
## "in_cube": {
## "cube_type": "fill_time",
## "in_cube": {
## "band_names": [
## "NDVI"
## ],
## "cube_type": "apply_pixel",
## "expr": [
## "(b05-b04)/(b05+b04)"
## ],
## "in_cube": {
## "bands": [
## "B04",
## "B05"
## ],
## "cube_type": "select_bands",
## "in_cube": {
## "chunk_size": [
## 16,
## 256,
## 256
## ],
## "cube_type": "image_collection",
## "file": "L8.db",
## "mask": {
## "bits": [],
## "invert": true,
## "mask_type": "value_mask",
## "values": [
## 900.0,
## 324.0,
## 836.0,
## 1346.0,
## 1348.0,
## 898.0,
## 388.0,
## 834.0,
## 386.0,
## 322.0
## ]
## },
## "mask_band": "PIXEL_QA",
## "view": {
## "aggregation": "median",
## "resampling": "average",
## "space": {
## "bottom": -550010.0,
## "left": -6180010.0,
## "nx": 1667,
## "ny": 1667,
## "right": -6079990.0,
## "srs": "EPSG:3857",
## "top": -449990.0
## },
## "time": {
## "dt": "P1Y",
## "t0": "2014",
## "t1": "2018"
## }
## },
## "warp_args": []
## }
## },
## "keep_bands": false
## },
## "method": "locf"
## },
## "kernel": [
## -1.0,
## 1.0
## ],
## "win_size_l": 1,
## "win_size_r": 0
## },
## "reducer_bands": [
## [
## "sum",
## "NDVI"
## ]
## ]
## }
This is allows to easily recreate chains of operations, and helps e.g. to cache results.
For other datasets, only the creation of the image collection must be adapted (and names of bands). Depending on the particular dataset, this sometimes requires a bit more than just listing different files.
We can use gdalcubes to process Sentinel-2 (Level 2A) data as they can be downloaded from the Copernicus Open Access Hub.
files = list.files("~/eodata/Sentinel2_L2A/BocaDoAcre",pattern = ".zip", full.names = TRUE)
head(files)
## [1] "/home/marius/eodata/Sentinel2_L2A/BocaDoAcre/S2A_MSIL2A_20190502T144741_N0211_R139_T19LGL_20190502T190458.zip"
## [2] "/home/marius/eodata/Sentinel2_L2A/BocaDoAcre/S2A_MSIL2A_20190509T143751_N0212_R096_T19LGK_20190509T185055.zip"
## [3] "/home/marius/eodata/Sentinel2_L2A/BocaDoAcre/S2A_MSIL2A_20190509T143751_N0212_R096_T19LGL_20190509T185055.zip"
## [4] "/home/marius/eodata/Sentinel2_L2A/BocaDoAcre/S2A_MSIL2A_20190509T143751_N0212_R096_T19LHK_20190509T185055.zip"
## [5] "/home/marius/eodata/Sentinel2_L2A/BocaDoAcre/S2A_MSIL2A_20190509T143751_N0212_R096_T19LHL_20190509T185055.zip"
## [6] "/home/marius/eodata/Sentinel2_L2A/BocaDoAcre/S2A_MSIL2A_20190509T143751_N0212_R096_T20LKQ_20190509T185055.zip"
sum(file.size(files)) / 1000^3 # gigabytes
## [1] 110.0526
These files sum to approx 110 gigabytes and cover a small part of the Brazilian Amazon forest in 2019. Similarly to what we did with the Landsat 8 dataset, we first create an image collection, a data cube view and finally define a chain of operations. The exported GeoTIFF file here is converted to a raster layer and directly visualized in an interactive map using the mapview
package (Appelhans et al. 2019).
create_image_collection(files, "Sentinel2_L2A", "S2.db")
## A GDAL image collection object, referencing 120 images with 15 bands
## Images:
## name left
## 1 S2A_MSIL2A_20190502T144741_N0211_R139_T19LGL_20190502T190458 -67.18515
## 2 S2A_MSIL2A_20190509T143751_N0212_R096_T19LGK_20190509T185055 -67.18084
## 3 S2A_MSIL2A_20190509T143751_N0212_R096_T19LGL_20190509T185055 -67.18515
## 4 S2A_MSIL2A_20190509T143751_N0212_R096_T19LHK_20190509T185055 -66.27149
## 5 S2A_MSIL2A_20190509T143751_N0212_R096_T19LHL_20190509T185055 -66.27794
## 6 S2A_MSIL2A_20190509T143751_N0212_R096_T20LKQ_20190509T185055 -65.73677
## top bottom right datetime
## 1 -8.132306 -9.130645 -66.18192 2019-05-02T14:47:41
## 2 -9.035376 -10.034303 -66.17445 2019-05-09T14:37:51
## 3 -8.132306 -9.130645 -66.18192 2019-05-09T14:37:51
## 4 -9.027271 -10.028033 -65.26328 2019-05-09T14:37:51
## 5 -8.125023 -9.124950 -65.27313 2019-05-09T14:37:51
## 6 -9.036045 -10.034780 -64.73058 2019-05-09T14:37:51
## srs
## 1 +proj=utm +zone=19 +south +datum=WGS84 +units=m +no_defs
## 2 +proj=utm +zone=19 +south +datum=WGS84 +units=m +no_defs
## 3 +proj=utm +zone=19 +south +datum=WGS84 +units=m +no_defs
## 4 +proj=utm +zone=19 +south +datum=WGS84 +units=m +no_defs
## 5 +proj=utm +zone=19 +south +datum=WGS84 +units=m +no_defs
## 6 +proj=utm +zone=20 +south +datum=WGS84 +units=m +no_defs
## [ omitted 114 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 AOT 0 1e+00 120
## 2 B01 0 1e-04 0.000000 120
## 3 B02 0 1e-04 0.000000 120
## 4 B03 0 1e-04 0.000000 120
## 5 B04 0 1e-04 0.000000 120
## 6 B05 0 1e-04 0.000000 120
## 7 B06 0 1e-04 0.000000 120
## 8 B07 0 1e-04 0.000000 120
## 9 B08 0 1e-04 0.000000 120
## 10 B09 0 1e-04 0.000000 120
## 11 B11 0 1e-04 0.000000 120
## 12 B12 0 1e-04 0.000000 120
## 13 B8A 0 1e-04 0.000000 120
## 14 SCL 0 1e+00 120
## 15 WVP 0 1e+00 120
S2.col = image_collection("S2.db")
extent(S2.col)
## $left
## [1] -67.18515
##
## $right
## [1] -64.72648
##
## $top
## [1] -8.125023
##
## $bottom
## [1] -10.03478
##
## $t0
## [1] "2019-05-02T14:47:41"
##
## $t1
## [1] "2019-08-22T14:37:59"
S2.mask = image_mask("SCL", values=c(1,2,3,8,9,10))
v.overview.250m = cube_view(srs="EPSG:3857", extent=S2.col, dx=250, dy=250, dt = "P1M", resampling="med", aggregation="median")
v.overview.250m
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2019-05 2019-08 4 P1M
## y -1122920.8970063 -907420.897006302 862 250
## x -7479042.57367612 -7205292.57367612 1095 250
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "med"
raster_cube(S2.col, cube_view(view = v.overview.250m, extent=list(t0="2019-06-01")), mask = S2.mask) %>%
select_bands(c("B08", "B12")) %>%
apply_pixel("(B08-B12)/(B08+B12)", "NBR") %>%
fill_time() %>%
reduce_time(names=c("dNBR intensity"), FUN = function(x) {
n = ncol(x)
dnbr = x["NBR", 1:(n-1)] - x["NBR", 2:n]
if (all(is.na(dnbr))) {
return(NA)
}
m = max(dnbr, na.rm = TRUE)
# http://www.un-spider.org/node/10959#targetText=To%20benefit%20from%20the%20magnitude,to%20values%20close%20to%20zero.
if (m > 0.9) {
return(5)
}
else if (m > 0.66) {
return(4)
}
else if (m > 0.44) {
return(3)
}
else if (m > 0.27) {
return(2)
}
else if (m > 0.1) {
return(1)
}
return(NA)
}) %>%
write_tif() %>%
raster::raster() -> x
library(mapview)
mapView(x, maxpixels = 4e+06, layer.name="dNBR intensity",
map.types=c("Esri.WorldImagery","OpenTopoMap"))
v.subarea.B = cube_view(srs="EPSG:3857", extent = list(left=-7340000, right=-7330000, bottom=-1005000, top=-995999, t0="2019-06-01", t1="2019-08-20"), dx = 30, dy = 30, dt = "P5D", resampling = "bilinear", aggregation="median")
v.subarea.B
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2019-06-01 2019-08-24 17 P5D
## y -1005014.5 -995984.5 301 30
## x -7340010 -7329990 334 30
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "median"
## Spatial resampling method: "bilinear"
raster_cube(S2.col, v.subarea.B, mask = S2.mask) %>%
select_bands(c("B02","B03","B04")) %>%
animate(rgb=3:1, zlim=c(0,1200), fps=4)
## # A tibble: 17 x 7
## format width height colorspace matte filesize density
## <chr> <int> <int> <chr> <lgl> <int> <chr>
## 1 gif 1344 960 sRGB FALSE 0 72x72
## 2 gif 1344 960 sRGB TRUE 0 72x72
## 3 gif 1344 960 sRGB TRUE 0 72x72
## 4 gif 1344 960 sRGB TRUE 0 72x72
## 5 gif 1344 960 sRGB TRUE 0 72x72
## 6 gif 1344 960 sRGB TRUE 0 72x72
## 7 gif 1344 960 sRGB TRUE 0 72x72
## 8 gif 1344 960 sRGB TRUE 0 72x72
## 9 gif 1344 960 sRGB TRUE 0 72x72
## 10 gif 1344 960 sRGB TRUE 0 72x72
## 11 gif 1344 960 sRGB TRUE 0 72x72
## 12 gif 1344 960 sRGB TRUE 0 72x72
## 13 gif 1344 960 sRGB TRUE 0 72x72
## 14 gif 1344 960 sRGB TRUE 0 72x72
## 15 gif 1344 960 sRGB TRUE 0 72x72
## 16 gif 1344 960 sRGB TRUE 0 72x72
## 17 gif 1344 960 sRGB TRUE 0 72x72
Data from the Moderate Resolution Imaging Spectroradiometer (MODIS) is typically distributed as HDF4 files, where one file contains all bands / variables. Since GDAL reads the bands as subdatasets, we must know the exact identifiers for our bands, and list all of them. In our example with MOD11A1 data, we must add HDF4_EOS:EOS_GRID:"
before, and "MODIS_Grid_8Day_Fire:FireMask
or "MODIS_Grid_8Day_Fire:QA
respectively after the path to the file.
files = list.files("~/eodata/MOD11A1.006", pattern = ".hdf", recursive = TRUE, full.names = TRUE)
length(files)
## [1] 672
file_subdatasets = expand.grid(
file=files,
subdataset=c("LST_Day_1km", "QC_Day", "LST_Night_1km", "QC_Night"))
gdal_datasets = paste("HDF4_EOS:EOS_GRID:\"", file_subdatasets$file, "\":MODIS_Grid_Daily_1km_LST:",file_subdatasets$subdataset, sep="")
M.col = create_image_collection(gdal_datasets, "MxD11A1", "MOD11A1.db")
M.col
## A GDAL image collection object, referencing 672 images with 4 bands
## Images:
## name
## 1 /home/marius/eodata/MOD11A1.006/2019.05.01/MOD11A1.A2019121.h11v09.006.2019122161235
## 2 /home/marius/eodata/MOD11A1.006/2019.05.01/MOD11A1.A2019121.h11v10.006.2019122161240
## 3 /home/marius/eodata/MOD11A1.006/2019.05.01/MOD11A1.A2019121.h12v09.006.2019122161250
## 4 /home/marius/eodata/MOD11A1.006/2019.05.01/MOD11A1.A2019121.h12v10.006.2019122161236
## 5 /home/marius/eodata/MOD11A1.006/2019.05.01/MOD11A1.A2019121.h13v09.006.2019122161234
## 6 /home/marius/eodata/MOD11A1.006/2019.05.01/MOD11A1.A2019121.h13v10.006.2019122161242
## left top bottom right datetime
## 1 -71.07986 0 -10 -60.00000 2019-05-01T00:00:00
## 2 -74.49244 -10 -20 -60.92560 2019-05-01T00:00:00
## 3 -60.92560 0 -10 -50.00000 2019-05-01T00:00:00
## 4 -63.85067 -10 -20 -50.77133 2019-05-01T00:00:00
## 5 -50.77133 0 -10 -40.00000 2019-05-01T00:00:00
## 6 -53.20889 -10 -20 -40.61706 2019-05-01T00:00:00
## srs
## 1 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs
## 2 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs
## 3 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs
## 4 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs
## 5 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs
## 6 +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +a=6371007.181 +b=6371007.181 +units=m +no_defs
## [ omitted 666 images ]
##
## Bands:
## name offset scale unit nodata image_count
## 1 LST_DAY 0 0.02 K 0.000000 672
## 2 LST_NIGHT 0 0.02 K 0.000000 672
## 3 QC_DAY 0 1.00 672
## 4 QC_NIGHT 0 1.00 672
v = cube_view(srs="EPSG:3857", extent=M.col, dx=5000, dt="P7D", aggregation = "mean", resampling = "average")
v
## A data cube view object
##
## Dimensions:
## low high count pixel_size
## t 2019-05-01 2019-08-20 16 P7D
## y -2274015.46349384 984.53650615504 455 5000
## x -8292620.2828787 -4452620.2828787 768 5000
##
## SRS: "EPSG:3857"
## Temporal aggregation method: "mean"
## Spatial resampling method: "average"
raster_cube(M.col, v, mask=image_mask(band = "QC_DAY", bits = 0:1, values=c(0,1), invert = TRUE)) %>%
select_bands("LST_DAY") %>%
apply_pixel("LST_DAY * 0.02", names = "LST_DAY") %>%
plot(key.pos=1, zlim=c(270, 340), col=function(n) rev(heat.colors(n)))
v = cube_view(srs="EPSG:3857", extent=M.col, dx=5000, dt="P5D", aggregation = "mean", resampling = "average")
raster_cube(M.col, v, mask=image_mask(band = "QC_DAY", bits = 0:1, values=c(0,1), invert = TRUE)) %>%
select_bands("LST_DAY") %>%
apply_pixel("LST_DAY * 0.02 - 273.15", names = "LST_DAY") %>%
animate(key.pos=1, zlim=c(0, 40), col=viridis::viridis, fps = 5,save_as = "LST.gif")
## # A tibble: 23 x 7
## format width height colorspace matte filesize density
## <chr> <int> <int> <chr> <lgl> <int> <chr>
## 1 gif 1344 960 sRGB FALSE 0 72x72
## 2 gif 1344 960 sRGB TRUE 0 72x72
## 3 gif 1344 960 sRGB TRUE 0 72x72
## 4 gif 1344 960 sRGB TRUE 0 72x72
## 5 gif 1344 960 sRGB TRUE 0 72x72
## 6 gif 1344 960 sRGB TRUE 0 72x72
## 7 gif 1344 960 sRGB TRUE 0 72x72
## 8 gif 1344 960 sRGB TRUE 0 72x72
## 9 gif 1344 960 sRGB TRUE 0 72x72
## 10 gif 1344 960 sRGB TRUE 0 72x72
## # … with 13 more rows
Two data cubes with the same shape (the same cube_view, except aggregation and resampling) can be combined with the join_bands()
function. The result data cube contains all bands from both input cubes. In the example, we combine MODIS land surface temperature data with the Sentinel-2 dataset.
v.S2 = cube_view(srs="EPSG:3857", extent=S2.col, dx=1000, dt="P1M", aggregation="median", resampling="average")
v.LST = cube_view(view = v.S2, aggregation = "mean")
S2.cube = raster_cube(S2.col, v.S2, mask = S2.mask)
LST.cube = raster_cube(M.col, v.LST, mask = image_mask(band = "QC_DAY", bits = 0:1, values=c(0,1), invert = TRUE))
join_bands(S2.cube, LST.cube)
## A GDAL data cube proxy object
##
## Dimensions:
## low high count pixel_size chunk_size
## t 2019-05 2019-08 4 P1M 16
## y -1123170.8970063 -907170.897006302 216 1000 256
## x -7479167.57367612 -7205167.57367612 274 1000 256
##
## Bands:
## name offset scale nodata unit
## 1 AOT 0 1e+00 NaN
## 2 B01 0 1e-04 NaN
## 3 B02 0 1e-04 NaN
## 4 B03 0 1e-04 NaN
## 5 B04 0 1e-04 NaN
## 6 B05 0 1e-04 NaN
## 7 B06 0 1e-04 NaN
## 8 B07 0 1e-04 NaN
## 9 B08 0 1e-04 NaN
## 10 B09 0 1e-04 NaN
## 11 B11 0 1e-04 NaN
## 12 B12 0 1e-04 NaN
## 13 B8A 0 1e-04 NaN
## 14 SCL 0 1e+00 NaN
## 15 WVP 0 1e+00 NaN
## 16 LST_DAY 0 2e-02 NaN K
## 17 LST_NIGHT 0 2e-02 NaN K
## 18 QC_DAY 0 1e+00 NaN
## 19 QC_NIGHT 0 1e+00 NaN
Although this is rather a toy example, it opens up a lot of possibilities to study dependencies and interactions of different variables. For example, the robustness of forest ecosystems against heat or dry periods could be analyzed with such multi-sensor data cubes.
On-demand raster data cubes as implemented in gdalcubes make it easier to
Though gdalcubes works with quite a few datasets directly, others require additional preprocessing. These include radar datasets such as Sentinel-1, or datasets on curvilinear grids, such as Sentinel-5P.
gdalcubes can work directly in cloud computing environments. It uses GDAL to read images and hence can use GDAL’s virtual file systems to access data on object storage (e.g. AWS S3 buckets). Processing cubes in distributed (cloud) computing environments is a bit more difficult and is current work in progress.
gdalcubes is a pretty young tool, there are many ideas still to be implemented (e.g. Python interface, user-defined function support for further operations, interfacing image processing libraries such as Orfeo Toolbox, using gdalcubes as a fully open source OpenEO backend, …).
If you have further ideas, questions, or would like to contribute in any other way, please just ask me, or create issues at GitHub.
v.subarea = cube_view(extent=list(left=-6320000, right=-6220000, bottom=-600000, top=-500000,
t0="2014-01-01", t1="2018-12-31"), dt="P1M", dx=100, dy=100
srs="EPSG:3857", aggregation = "median", resampling = "bilinear")
L8.clear_mask = image_mask("PIXEL_QA", values=
c(322, 386, 834, 898, 1346, 324, 388, 836, 900, 1348),
invert = TRUE)
Calculate the normalized difference moisture index (NDMI) using the formula “(B05-B06)/(B05+B06)”. This index is used to assess vegetation water content.
Compute minimum, maximum, median, and mean NDMI values over time and plot the result.
Calculate the NDVI as in the tutorial, and apply a user defined reducer function to create a “greenest pixel” composit image, by finding the date/time of the maximum NDVI, and returning the corresponding RGB values.
Appelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer. 2019. Mapview: Interactive Viewing of Spatial Data in R. https://CRAN.R-project.org/package=mapview.
Bivand, Roger, Tim Keitt, and Barry Rowlingson. 2019. Rgdal: Bindings for the ’Geospatial’ Data Abstraction Library. https://CRAN.R-project.org/package=rgdal.
Hijmans, Robert J. 2019. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.
Pebesma, Edzer. 2019. Stars: Spatiotemporal Arrays, Raster and Vector Data Cubes. https://CRAN.R-project.org/package=stars.