CONAE spring school 2022
Marius Appel
Sept. 27, 2022
Data availability (e.g. Sentinel-2) in the cloud
Method availability (e.g. in R, > 18k CRAN packages)
Who wants to download > 100 GB from data portals?
Objective: Show how you can analyze satellite image collections in the cloud with R
All materials are available on GitHub: https://github.com/appelmar/CONAE_2022.
Services:
Infrastructure providers:
Somewhere in between: Microsoft Planetary Computer
In this tutorial, we will use a custom machine on AWS to analyze satellite image collections in the cloud.
Select a region and machine instance type, based on costs, hardware, and OS
Create a key pair for accessing the machine over SSH
Click “Launch instance” and follow instructions
Connect via SSH and install software (PROJ, GDAL, R, RStudio Server, R packages, …)
Note that security considerations (e.g. using IAM roles or multi-factor authentication) are NOT part of this tutorial.
Provider | Data |
---|---|
Amazon Web Services (AWS) | Sentinel, Landsat, ERA5, OSM, CMIP6, and more |
Google Cloud Platform | Landsat, Sentinel, access to GEE data |
Microsoft Planetary Computer | Sentinel, Landsat, MODIS, and more |
EC2 machines have their own disk storage (EBS), but big data archives use highly scalable object storage (S3).
S3 elements:
Pricing (storage, transfer, requests):
Buckets:
Object:
How to find images by location, time, and other criteria?
How to efficiently read image data from S3 without copying images to our machine storage first?
Standardized JSON-based language for describing catalogs of spatiotemporal data (imagery, point clouds, SAR)
Extensible (available extensions include EO, Data Cubes, Point Clouds, and more)
1.0.0 release available since May 2021
Growing ecosystem
Static STAC catalogs
catalog.json
STAC API
STAC Index
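To make the STAC API idea more concrete, here is a minimal sketch of what an item search looks like at the JSON level. It is written in plain Python only to show the request shape; in the tutorial itself, this step is handled from R by rstac. The collection id, bounding box, and the two example items are made-up illustrations, not real catalog entries. The field names (`collections`, `bbox`, `datetime`, `limit`, and the `eo:cloud_cover` property from the EO extension) follow the STAC specifications.

```python
import json


def build_stac_search(collections, bbox, start, end, limit=100):
    """Return a JSON body for a STAC API item search (POST <api-root>/search)."""
    if len(bbox) != 4:
        raise ValueError("bbox must be (west, south, east, north)")
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        # Closed time interval in RFC 3339 notation: "<start>/<end>"
        "datetime": f"{start}T00:00:00Z/{end}T23:59:59Z",
        "limit": limit,
    }


def filter_by_cloud_cover(features, max_cover):
    """Keep items whose eo:cloud_cover property is at most max_cover (percent)."""
    kept = []
    for feature in features:
        cover = feature["properties"].get("eo:cloud_cover")
        if cover is not None and cover <= max_cover:
            kept.append(feature)
    return kept


body = build_stac_search(
    collections=["sentinel-2-l2a"],      # assumed collection id, for illustration
    bbox=(-64.5, -31.6, -64.0, -31.2),   # illustrative area of interest (lon/lat)
    start="2022-01-01",
    end="2022-03-31",
)
print(json.dumps(body, indent=2))

# A search response is a GeoJSON FeatureCollection; these two items are invented.
features = [
    {"id": "item-a", "properties": {"eo:cloud_cover": 5.0}},
    {"id": "item-b", "properties": {"eo:cloud_cover": 80.0}},
]
clear = filter_by_cloud_cover(features, max_cover=20)
print([f["id"] for f in clear])
```

Sending `body` to a real STAC API endpoint would return matching items, each of which lists its assets (for example one COG per band) with links to the actual files.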
Image file formats must be cloud-friendly to reduce transfer times and costs associated with transfer and requests
COG (Cloud Optimized GeoTIFF) = ordinary tiled GeoTIFF files whose content follows a specific order of data and metadata
support compression
support efficient HTTP range requests, i.e. partial reading of images (blocks and overviews) from cloud storage
may contain overview images (image pyramids)
GDAL can efficiently read and write COGs, and access object storage in the cloud with virtual file systems
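The two mechanisms named above, HTTP range requests and GDAL virtual file systems, can be sketched as follows. This is an illustration in plain Python, not part of the tutorial's R code: the byte offset, tile size, bucket name, and URLs are hypothetical. The `/vsicurl/`, `/vsis3/`, and `/vsigs/` prefixes are GDAL's documented virtual file system prefixes.

```python
def range_header(offset, length):
    """HTTP header asking for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive on both ends, hence the "- 1".
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def to_gdal_vsi_path(url):
    """Map a remote file location to a path GDAL can open directly."""
    if url.startswith("s3://"):
        return "/vsis3/" + url[len("s3://"):]
    if url.startswith("gs://"):
        return "/vsigs/" + url[len("gs://"):]
    if url.startswith(("http://", "https://")):
        return "/vsicurl/" + url
    return url  # local paths are passed through unchanged


# Reading one 64 KiB tile that starts at byte 16384 (numbers are made up):
print(range_header(16384, 65536))

# Hypothetical file locations, for illustration only:
print(to_gdal_vsi_path("s3://example-bucket/tiles/B04.tif"))
print(to_gdal_vsi_path("https://example.com/data/B04.tif"))
```

Because a COG stores its tile index near the start of the file, a reader can first fetch the header, look up where the needed tiles or overviews are, and then request only those byte ranges.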
Images overlap spatially, have different coordinate reference systems, have different pixel sizes depending on spectral bands, and yield irregular time series for larger areas
Here: A four-dimensional (space, time, variable / band) regular raster data cube
Important: There is no single correct data cube!
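One way to see why there is no single correct data cube: the same area and time range lead to very different cubes depending on the chosen pixel size and temporal resolution. The sketch below computes the shape of a regular cube from its extent and cell sizes. The function name, extent, and resolutions are assumptions chosen for illustration (a 100 km x 100 km area in a projected CRS with metre units, one year of data, four bands).

```python
import math
from datetime import date


def cube_shape(left, right, bottom, top, t0, t1, dx, dy, dt_days, n_bands):
    """Return (bands, time steps, rows, columns) of a regular raster data cube."""
    nx = math.ceil((right - left) / dx)
    ny = math.ceil((top - bottom) / dy)
    # Both t0 and t1 are included, hence the "+ 1".
    nt = math.ceil(((t1 - t0).days + 1) / dt_days)
    return (n_bands, nt, ny, nx)


extent = dict(
    left=500000, right=600000, bottom=6500000, top=6600000,
    t0=date(2022, 1, 1), t1=date(2022, 12, 31),
)

# Same data, two different cubes: fine (10 m, 5 days) vs. coarse (100 m, 30 days)
fine = cube_shape(**extent, dx=10, dy=10, dt_days=5, n_bands=4)
coarse = cube_shape(**extent, dx=100, dy=100, dt_days=30, n_bands=4)

for name, shape in [("fine", fine), ("coarse", coarse)]:
    print(name, shape, "cells:", math.prod(shape))
```

The coarse cube has far fewer cells, so it is much cheaper to build and process, while the fine cube keeps more detail. Which one is appropriate depends on the analysis.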
Imagery from https://r-spatial.github.io/stars
Creation and processing of four-dimensional (space, time, variable) data cubes from irregular image collections (Appel and Pebesma 2019)
Parallel chunk-wise processing
Documentation available at https://gdalcubes.github.io/
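The idea behind parallel chunk-wise processing can be sketched in a few lines. This is a conceptual illustration in plain Python and not how gdalcubes is implemented: the cube is a small nested list indexed as `cube[t][y][x]`, chunks are cut only in space so that each chunk holds complete time series, and threads are used purely for simplicity. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def chunk_ranges(n, size):
    """Split range(n) into consecutive (start, stop) pieces of at most `size`."""
    return [(start, min(start + size, n)) for start in range(0, n, size)]


def reduce_time_max(cube, y_rng, x_rng):
    """Per-pixel maximum over time for one spatial chunk."""
    (y0, y1), (x0, x1) = y_rng, x_rng
    return [
        [max(cube[t][y][x] for t in range(len(cube))) for x in range(x0, x1)]
        for y in range(y0, y1)
    ]


def process_chunked(cube, chunk_size, workers=4):
    """Apply reduce_time_max chunk by chunk in parallel and merge the results."""
    ny, nx = len(cube[0]), len(cube[0][0])
    result = [[None] * nx for _ in range(ny)]
    chunks = list(product(chunk_ranges(ny, chunk_size), chunk_ranges(nx, chunk_size)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda c: reduce_time_max(cube, *c), chunks)
        # pool.map preserves input order, so chunks and parts line up.
        for ((y0, _), (x0, x1)), part in zip(chunks, parts):
            for i, row in enumerate(part):
                result[y0 + i][x0:x1] = row
    return result


# Tiny synthetic cube: 3 time steps, 5 rows, 7 columns.
nt, ny, nx = 3, 5, 7
cube = [[[t * 100 + y * 10 + x for x in range(nx)] for y in range(ny)] for t in range(nt)]

result = process_chunked(cube, chunk_size=4)
print(result[0])
```

Each chunk is small enough to fit in memory and independent of the others, which is what makes it possible to process cubes that are much larger than the machine's RAM.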
Imagery from https://github.com/e-sensing/sits
This tutorial focuses on the packages rstac and gdalcubes.
Access to huge data archives
Flexibility: You can do whatever you can do on your local machine
Powerful machines available
Open source software only
Not free
GEE and others can be easier to use (some are free)
Your institution’s computing center might have more computing resources (for free)
Setup and familiarization needed
Depends on the existence of STAC-API services and imagery as COGs!
→ Which tools / platforms / environments are most efficient depends strongly on factors like data volume, computational effort, data & method availability, the effort needed for familiarization and reimplementation, and others.
Cloud-computing platforms contain lots of satellite data
Cloud storage differs from local storage
Technology and tools:
STAC (and STAC API!) for efficient and standardized search of spatiotemporal EO data
COGs allow efficient reading of parts of images, potentially at lower resolution
GDAL has everything for efficient data access on cloud storage
gdalcubes makes the creation and processing of data cubes from satellite image collections in R easier
Slides and notebooks:
https://github.com/appelmar/CONAE_2022
Contact: