Create a new collection step by step

The following tutorial guides you through creating and updating a set of configurations (and the database) for an imaginary new dataset (layer). Wherever possible, it links to other parts of the documentation for further reference.

During the tutorial, elements of the EOxServer Data Model are used and should be understood, although defining the values usually does not require any direct interaction with the EOxServer database models.

Examples of public configurations

The following public View Server configuration values examples can be used as a further reference:

Analyze the data

First, analyze the earth observation data that you will provide through View Server. The following points describe the types of information needed to create the configurations:

Data format

For ideal viewing performance, images should be formatted as Cloud Optimized GeoTIFF (COG) or should at least have internal overviews and internal tiling. If the data fulfill either of these conditions, proceed to the next point.

If internal overviews are not present, rendering even a 1x1 pixel image of a large EO data file causes the whole file to be read, which negatively impacts rendering performance.
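To illustrate why overviews matter, the sketch below counts the 2x downsampled overview levels a raster needs before its larger side fits into a single tile. The 512-pixel tile size and the halve-until-it-fits rule are illustrative assumptions, similar in spirit to how GDAL generates overviews automatically, not a description of View Server internals.

```python
from typing import List, Tuple


def overview_levels(width: int, height: int, tile_size: int = 512) -> List[Tuple[int, int]]:
    """Return (width, height) of each 2x overview level until the raster fits in one tile."""
    levels = []
    w, h = width, height
    while max(w, h) > tile_size:
        # Each overview level halves the resolution, rounding up.
        w, h = (w + 1) // 2, (h + 1) // 2
        levels.append((w, h))
    return levels


if __name__ == "__main__":
    # A hypothetical 40000 x 30000 pixel scene
    for level, size in enumerate(overview_levels(40000, 30000), start=1):
        print(level, size)
```

For the hypothetical 40000 x 30000 pixel scene this yields seven levels, the smallest being 313 x 235 pixels, so a small preview can be rendered from that level instead of from all 1.2 billion pixels of the full-resolution image.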

An important attribute of the raster data is its data type (UInt16, Int32, and others). Although View Server can generally read any data type that GDAL can read, this information is needed in further steps.

To convert the data to COG, either:

use GDAL tools manually before ingesting the data into View Server (in this case, the preprocessor component is not necessary), or
configure and use the preprocessor (see Preprocessor configuration) and reference the preprocessed data instead.
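For the manual route, a minimal sketch that wraps GDAL's gdal_translate with the COG output driver (available since GDAL 3.1). The file names and the chosen creation options are illustrative assumptions; the GDAL COG driver documentation lists all available options.

```python
import shlex
import subprocess
from typing import List


def cog_command(src: str, dst: str, compress: str = "DEFLATE", blocksize: int = 512) -> List[str]:
    """Build a gdal_translate call that writes a Cloud Optimized GeoTIFF."""
    return [
        "gdal_translate",
        "-of", "COG",  # COG driver writes internal tiles and, by default, internal overviews
        "-co", f"COMPRESS={compress}",
        "-co", f"BLOCKSIZE={blocksize}",
        src,
        dst,
    ]


def convert_to_cog(src: str, dst: str) -> None:
    """Run the conversion; requires the GDAL command line tools on PATH."""
    subprocess.run(cog_command(src, dst), check=True)


if __name__ == "__main__":
    # Print the command for hypothetical input/output files
    print(shlex.join(cog_command("scene.tif", "scene_cog.tif")))
```

Building the argument list separately from running it makes it easy to review or log the exact command before converting a large batch of files.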

Metadata format

It is also important to check the format of metadata files (sidecar files) next to the raster data. View Server uses SpatioTemporal Asset Catalog (STAC) items internally as a metadata format, both for storage and for messaging between components.

Ideally, STAC items describing the data and metadata have already been generated and can be used directly.

Pre-generated STAC items are not a prerequisite for all data: View Server also understands some other metadata formats during ingestion, as described later in this tutorial.
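If STAC items need to be created, a minimal sketch of a STAC Item with a footprint and a single data asset follows. The item id, the S3 location and the asset key data are hypothetical; real items usually carry many more properties and assets.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def make_stac_item(item_id: str, bbox: List[float], acquired: datetime, data_href: str) -> Dict:
    """Build a minimal STAC Item; `acquired` is expected to be timezone-aware."""
    west, south, east, north = bbox
    footprint = {
        "type": "Polygon",
        # Closed ring in lon/lat order (first and last position are identical)
        "coordinates": [[[west, south], [east, south], [east, north], [west, north], [west, south]]],
    }
    timestamp = acquired.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": footprint,
        "bbox": bbox,
        "properties": {"datetime": timestamp},
        "links": [],
        "assets": {
            "data": {
                "href": data_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


if __name__ == "__main__":
    item = make_stac_item(
        "scene_001",
        [16.0, 48.0, 16.5, 48.3],
        datetime(2018, 6, 1, 10, 30, tzinfo=timezone.utc),
        "s3://example-bucket/scene_001.tif",
    )
    print(json.dumps(item, indent=2))
```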

Data storage

The next step is to clarify where the raster data and metadata are stored.

The data do not have to be stored on the same infrastructure where View Server is going to be deployed, but keeping them close to the deployment (for example, at the same cloud provider) should significantly speed up data access.

View Server supports the following access and storage protocols (the fsspec library should enable further extensions):

s3
OpenStack Swift
local path
http
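When collecting the data locations of a new collection, it can help to check that each of them uses one of the protocols listed above. The sketch below classifies locations by URL scheme; the swift:// notation is a hypothetical placeholder (Swift containers are often addressed differently), and the function does not mirror how View Server itself parses locations.

```python
from urllib.parse import urlparse


def storage_protocol(location: str) -> str:
    """Classify a data location into one of the protocol families listed above."""
    scheme = urlparse(location).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme == "swift":  # hypothetical URL notation for OpenStack Swift
        return "swift"
    if scheme in ("http", "https"):
        return "http"
    if scheme in ("", "file"):  # POSIX paths have no scheme
        return "local"
    raise ValueError(f"unsupported storage location: {location}")


if __name__ == "__main__":
    for loc in ["s3://bucket/scene.tif", "/data/scene.tif", "https://example.com/scene.tif"]:
        print(loc, "->", storage_protocol(loc))
```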

Bands and rendering

Depending on the type of data (optical, radar, other) and the number of bands in the raster data, different types of rendering can be configured. In this step, clarify how many bands the raster data contain and which wavelength (or, more generally, which type of information) each band holds. This band structure determines which renderings can be defined later on.

Groups of products

Products can, and often should, be grouped or separated by a shared metadata property.

Typical examples of a useful separation are:

processing levels: Level 1 products should not be visualized in one layer together with Level 3 products
SAR product types: Single Look Complex (SLC) and Ground Range Detected (GRD) products should be separated
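To find a suitable grouping property in existing metadata, the STAC items can be grouped by a candidate property and the resulting groups inspected. A minimal sketch; the property names processing:level and sar:product_type come from the STAC processing and SAR extensions and are assumptions about what your metadata contains.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_property(items: Iterable[dict], key: str) -> Dict[str, List[str]]:
    """Group STAC Item ids by the value of one entry in their "properties" object."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        # Items without the property end up in an "UNKNOWN" group
        value = item.get("properties", {}).get(key, "UNKNOWN")
        groups[str(value)].append(item["id"])
    return dict(groups)


if __name__ == "__main__":
    sample = [  # hypothetical Sentinel-1-like items
        {"id": "s1_a", "properties": {"sar:product_type": "GRD"}},
        {"id": "s1_b", "properties": {"sar:product_type": "SLC"}},
        {"id": "s1_c", "properties": {"sar:product_type": "GRD"}},
    ]
    print(group_by_property(sample, "sar:product_type"))
```

A non-empty UNKNOWN group is a hint that the chosen property is not reliable enough to separate the products.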

Generate configurations

Let’s continue by creating View Server configuration values based on the knowledge gathered about the products.

For all possible settings of each values configuration key, refer to the Helm configuration reference.

As a foundation for a new set of values, the default vs-deployment config values can be used as an empty template to be filled.

Warning

The database structure and models are created in the first step of a View Server deployment and are not updated afterwards when the values change (no database migrations are performed between deploys).

Therefore, arriving at a consistent working configuration can be an iterative process: if a change involves the database models, the persistent database storage has to be deleted before each redeploy of the updated values. Refer to Purge database for instructions.

Each of the following steps notes whether its configuration affects the database structure.

Coverage Types

Changes involve database structure: YES

The first concept to focus on during values creation is coverage_types. The objective of this step is to:

either map the raster data type and band order to an existing coverage_type definition
or, alternatively, define a new coverage_type

The possible values and meaning of the coverage_type are described in Global.coverageTypes.

If an existing coverage_type has the same bands, just in a different order (for example, the near-infrared band is the first band of the data instead of the last band, as in the RGBNir coverage_type), it is clearer to create a new coverage_type. This is not strictly necessary, because the band order (which band corresponds to which RGB color) can also be changed in the rendering step.

Note

Pay attention to the following keys when defining a new coverage_type:

coverageTypes[i].name - needed for collections definition
coverageTypes[i].bands[i].identifier - needed for browses definition
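The matching described above can be sketched as follows. Only the name and bands[].identifier keys from the note are modelled; a real coverage type definition contains further keys, and the band identifiers used here are hypothetical.

```python
from typing import Dict, List, Optional


def band_identifiers(coverage_type: Dict) -> List[str]:
    """Band identifiers of a coverage type, in their defined order."""
    return [band["identifier"] for band in coverage_type["bands"]]


def find_coverage_type(band_order: List[str], coverage_types: List[Dict]) -> Optional[str]:
    """Name of a coverage type whose bands match band_order exactly, else None."""
    for coverage_type in coverage_types:
        if band_identifiers(coverage_type) == band_order:
            return coverage_type["name"]
    return None


def reordered_matches(band_order: List[str], coverage_types: List[Dict]) -> List[str]:
    """Names of coverage types with the same bands as band_order, but in another order."""
    return [
        ct["name"]
        for ct in coverage_types
        if sorted(band_identifiers(ct)) == sorted(band_order)
        and band_identifiers(ct) != band_order
    ]
```

If find_coverage_type returns None while reordered_matches is non-empty, the data are in the situation described above, where defining a new coverage_type is the clearer choice.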

Product Types

Changes involve database structure:

YES for the global.productTypes key and all its values, except for filter and coverages.

The second, even more important step is to create the productType definitions. Each productType represents an EOxServer Product Type model and some of its links to other models:

browses key: BrowseType EOxServer models specifying the renderings (one to many); refer to Browse Types for guidance on how to fill this key
coverages key: which data assets map to which EOxServer Coverage Type model; a STAC Item can contain multiple named data assets for multiple coverages
collections key: the collections to which products of this product_type are added; one product can be added to multiple collections if its product_type is allowed in those collections (refer to Collections)

The possible values and meaning of the product_type are described in Global.productTypes.

Browse Types

Changes involve database structure: YES

The third important step is to define the browses (rendering definitions) for each productType. Each browses entry represents an EOxServer Browse Type model and therefore adds an available WMS layer to the renderer service.

Simple band expressions and pre-made functions can be combined in the band.expression value; refer to the full list of usable functions.

The band names inside an expression (for example red, pan, gray) must match the band identifiers defined in the selected coverage_type and correspond to the meaning of the raster data itself. The output channel keys of a browse type (red, green, blue, grey) must be used as-is; they define the stretching into the RGB (or grayscale) channels of the WMS output image.
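Because a mismatch between expression band names and coverage type band identifiers is easy to introduce, the names can be checked before deploying. A minimal sketch, assuming the expressions use Python-compatible syntax (as the examples in this section do): every bare name that is not called as a function is treated as a band name.

```python
import ast
from typing import Iterable, Set


def expression_band_names(expression: str) -> Set[str]:
    """Collect bare names used as values (not as called functions) in an expression."""
    tree = ast.parse(expression, mode="eval")
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called


def unknown_bands(expression: str, band_identifiers: Iterable[str]) -> Set[str]:
    """Band names referenced by the expression but missing from the coverage type."""
    return expression_band_names(expression) - set(band_identifiers)


if __name__ == "__main__":
    expr = "pansharpen(pan, red, green, blue, nir)[0]"
    # With a hypothetical coverage type lacking a pan band, "pan" is reported
    print(unknown_bands(expr, ["red", "green", "blue", "nir"]))
```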

If the browse.asset key contains the name of a STAC asset, this asset is used as a Browse model. This makes it possible to register an asset without a ‘data’ role, and it is preferred when a viewing-ready Browse has already been pregenerated, rather than trying to fit it into a Coverage model. A Browse behaves slightly differently from a Coverage: for example, it cannot be used with WCS, but it also does not require exact georeferencing of the image, as long as the footprint is correctly extracted from the original STAC item.

Some examples of configured expressions are:

percentile rendering: the 2-98% percentile range of the precomputed histogram is stretched to 1-256, with configured defaults used if an individual STAC Item does not contain precomputed statistics in its metadata. Additionally, pixels in the range 1-10 are masked out as extra no data.

TRUE_COLOR:
  red:
    expression: "interpolate(red, percentile(red, var('percmin', 2), 1), percentile(red, var('percmax', 98), 10), 1, 256, var('clip', True),[var('nodata_start',1),var('nodata_end',10)])"
    range:
      - 1
      - 256
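To make the effect of this expression easier to reason about, here is a pure-Python sketch of a percentile-based linear stretch on a list of pixel values. It assumes that interpolate maps the [low, high] input range linearly to the output range, clips when requested and masks the given no-data range. It computes percentiles from raw values rather than from precomputed histogram statistics, so it approximates the behaviour and is not the View Server implementation.

```python
from typing import List, Optional, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile (q in 0..100) with linear interpolation between closest ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def stretch(values: Sequence[float], low: float, high: float,
            out_min: float = 1.0, out_max: float = 256.0, clip: bool = True,
            nodata_range: Tuple[float, float] = (1, 10)) -> List[Optional[float]]:
    """Linearly map [low, high] to [out_min, out_max]; values inside nodata_range become None."""
    if high == low:
        raise ValueError("low and high must differ")
    result: List[Optional[float]] = []
    for v in values:
        if nodata_range[0] <= v <= nodata_range[1]:
            result.append(None)  # masked out as extra no data
            continue
        scaled = out_min + (v - low) * (out_max - out_min) / (high - low)
        if clip:
            scaled = max(out_min, min(out_max, scaled))
        result.append(scaled)
    return result


if __name__ == "__main__":
    pixels = [5, 20, 35, 60, 80, 100, 400]  # hypothetical red band values
    low, high = percentile(pixels, 2), percentile(pixels, 98)
    print(stretch(pixels, low, high))
```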

pansharpening operation on the source RGBNir Pan coverages

TRUE_COLOR_PANSHARPEN:
  red:
    expression: pansharpen(pan, red, green, blue, nir)[0]
    range: [0, 1000]
    nodata: 0
  green:
    expression: pansharpen(pan, red, green, blue, nir)[1]
    range: [0, 1000]
    nodata: 0
  blue:
    expression: pansharpen(pan, red, green, blue, nir)[2]
    range: [0, 1000]
    nodata: 0
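A common way to pansharpen is the weighted Brovey transform (also used by GDAL's pansharpening): each multispectral band is scaled by the ratio of the panchromatic value to a weighted sum of the bands. Whether View Server's pansharpen function uses exactly this method is an assumption; the per-pixel sketch below, with hypothetical equal weights, only shows why the output indices [0], [1] and [2] correspond to red, green and blue.

```python
from typing import List, Sequence


def brovey_pansharpen(pan: float, bands: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted Brovey pansharpening of one pixel: scale each band by pan / pseudo-pan."""
    if len(bands) != len(weights):
        raise ValueError("need one weight per band")
    pseudo_pan = sum(w * b for w, b in zip(weights, bands))
    if pseudo_pan == 0:
        return [0.0 for _ in bands]  # avoid division by zero; treated as no data
    ratio = pan / pseudo_pan
    return [b * ratio for b in bands]


if __name__ == "__main__":
    # red, green, blue, nir values of one hypothetical pixel, equal weights
    sharpened = brovey_pansharpen(200.0, [120.0, 90.0, 60.0, 130.0], [0.25] * 4)
    print(sharpened[:3])  # red, green, blue outputs
```

The near-infrared band contributes to the pseudo-panchromatic value but its sharpened output is not used in the TRUE_COLOR_PANSHARPEN example.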

hillshade rendering of DEM height data in EPSG:4326, with some parameters of the formula specified as “rendering variables”, which allows the WMS client to specify their values

hillshade:
  grey:
    expression: hillshade(gray, var('zfactor', 5), 111120, var('azimuth', 315), var('altitude', 45), var('alg', 'Horn'))
    range: [0, 255]
    nodata: 0
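The hillshade parameters above (z factor, scale, azimuth, altitude, Horn algorithm) correspond to the classic Horn formula. The sketch below computes the shade of the centre cell of a 3x3 elevation window following the widely documented ESRI/gdaldem formulation; View Server's internal implementation may differ in details. For data in degrees with heights in meters, gdaldem documents a scale of 111120, which matches the value in the configuration above.

```python
import math
from typing import Sequence


def hillshade_horn(window: Sequence[Sequence[float]], cellsize: float, zfactor: float = 1.0,
                   scale: float = 1.0, azimuth: float = 315.0, altitude: float = 45.0) -> float:
    """Hillshade value (0-255) of the centre cell of a 3x3 elevation window (Horn's method).

    Rows run north to south and columns west to east; cellsize is multiplied
    by scale, e.g. 111120 to convert degrees to meters.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    size = cellsize * scale
    dzdx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * size)
    dzdy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * size)

    zenith = math.radians(90.0 - altitude)
    azimuth_math = math.radians((360.0 - azimuth + 90.0) % 360.0)
    slope = math.atan(zfactor * math.hypot(dzdx, dzdy))

    if dzdx != 0:
        aspect = math.atan2(dzdy, -dzdx)
        if aspect < 0:
            aspect += 2 * math.pi
    elif dzdy > 0:
        aspect = math.pi / 2
    elif dzdy < 0:
        aspect = 3 * math.pi / 2
    else:
        aspect = 0.0  # flat cell: aspect is irrelevant because slope is 0

    shade = 255.0 * (math.cos(zenith) * math.cos(slope)
                     + math.sin(zenith) * math.sin(slope) * math.cos(azimuth_math - aspect))
    return max(0.0, shade)  # cells facing away from the light become black
```

A flat window yields 255 * sin(altitude), about 180 for the default altitude of 45 degrees; slopes facing the light (north-west for azimuth 315) are brighter, slopes facing away are darker.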

default unnamed browse type with a 0-255 color range on 4 bands, mapped to the STAC Item asset named browse

"":
  asset: browse

Collections

Changes involve database structure: YES

The fourth step is to define all collections grouping the Products. For each collection, its allowed product_types and coverage_types need to be added.

Example configuration for creating three collections: Level_1, Level_3 and a shared one:

collections:
  VHR_IMAGE_2018:
    product_types:
      - DOV_MS_L1A
      - DOV_MS_L3A
    coverage_types:
      - RGBNir
  VHR_IMAGE_2018_Level_1:
    product_types:
      - DOV_MS_L1A
    coverage_types:
      - RGBNir
  VHR_IMAGE_2018_Level_3:
    product_types:
      - DOV_MS_L3A
    coverage_types:
      - RGBNir

The part of the productTypes values corresponding to the collections defined above could look, for example, like this:

productTypes:
  - name: DOV_MS_L1A
    collections:
      - VHR_IMAGE_2018
      - VHR_IMAGE_2018_Level_1
  - name: DOV_MS_L3A
    collections:
      - VHR_IMAGE_2018
      - VHR_IMAGE_2018_Level_3
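Since the links between productTypes and collections are defined in two places, they can drift apart while the values are edited iteratively. A minimal consistency check, assuming both excerpts above have been loaded into Python dictionaries and lists (for example with a YAML parser):

```python
from typing import Dict, List


def check_collection_links(collections: Dict[str, Dict], product_types: List[Dict]) -> List[str]:
    """Report product types that reference unknown collections or collections that do not allow them."""
    problems = []
    for product_type in product_types:
        name = product_type["name"]
        for collection in product_type.get("collections", []):
            if collection not in collections:
                problems.append(f"{name}: unknown collection {collection}")
            elif name not in collections[collection].get("product_types", []):
                problems.append(f"{name}: not allowed in {collection}")
    return problems
```

An empty result for the example values above means that every collection referenced by a product type also lists that product type among its allowed product_types.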

The possible values and meaning of the collections are described in Global.collections.

Displaying data

Changes involve database structure: NO

The fifth step influences how the layers can be displayed via the client service and which tilesets will be exposed by the cache service.

The possible values and meaning of the layers and overlayLayers are described in Global.layers and Global.overlayLayers.

External access

Changes involve database structure: NO

The sixth step is to define external access to View Server components. If the values are going to be deployed on Kubernetes, it is possible to use View Server’s ingress configuration - refer to ingress.

If there is already an external setup configured in the system (external ingress, traefik, etc.), the View Server ingress configurations should be completely disabled by:

ingress:
  tls: false

How to get the data in

Changes involve database structure: NO

Storage

The seventh step in the workflow is to define where the data are located, so that View Server can reference them correctly and ingest the information about Product data and metadata into the database.

Possible values and meaning of the storage are described in storage.

For successful ingestion, at least the data key (location of the data) needs to be filled in according to the protocol used to access the data on the storage.

There are currently three ways to ingest data into View Server, and they might require further configuration.

Optionally, the preprocessor can be used to convert the data format beforehand. Refer to Preprocessor configuration schema.

Local storage

Warning

If the files to be ingested are on local storage, the storage folder(s) need to be mounted into the containers of the services that need access to them. For direct registration without preprocessing, these services are the registrar and the renderer.

The mounting needs to be configured at the level of the Helm release or docker-compose templates. Each node (master or worker) that may host such a service needs access to the data folder as well.

The following example docker-compose configuration mounts the folder /data/test1 into the renderer container at path /data:

renderer:
  volumes:
    - type: bind
      source: /data/test1
      target: /data

Global data storage configuration in values.yaml for using this folder would look like:

global:
  storage:
    data:
      directory-data:
        type: "directory"
        root_directory: "/data/"
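With bind mounts, a file is visible inside the container under a different path than on the host, which is a frequent source of ingestion errors. A small sketch that translates a host path into the container path for a given set of mounts (source to target), assuming POSIX paths; the file names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def container_path(host_path: str, mounts: Dict[str, str]) -> Optional[str]:
    """Translate a host path into the path seen inside a container, or None if not mounted."""
    host = PurePosixPath(host_path)
    # Prefer the most specific (deepest) mount source that contains the path
    for source in sorted(mounts, key=lambda s: len(PurePosixPath(s).parts), reverse=True):
        try:
            relative = host.relative_to(source)
        except ValueError:
            continue  # path is not below this mount source
        return str(PurePosixPath(mounts[source]) / relative)
    return None


if __name__ == "__main__":
    mounts = {"/data/test1": "/data"}  # the renderer volume from the example above
    print(container_path("/data/test1/scene.tif", mounts))
```

The resulting container paths should lie below the configured root_directory, otherwise the renderer will not find the files.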

Ingestion

Direct ingestion of STAC Item JSON strings to redis register_queue.

This process is suitable if the STAC items of Products already exist and for one-off ingestion campaigns - collections that do not require any regular updates or additions.

No special configuration apart from the storage.data key is necessary.
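A minimal sketch of this direct ingestion: STAC Items are serialized to JSON strings and pushed onto the register_queue redis list. The client only needs an lpush(name, *values) method, which redis-py's Redis class provides; whether the registrar expects items to be pushed to the left or the right end of the list is an assumption here and only affects processing order.

```python
import json
from typing import Iterable, Protocol


class ListPusher(Protocol):
    """Anything with a redis-py compatible lpush method."""

    def lpush(self, name: str, *values: str) -> int: ...


def enqueue_stac_items(client: ListPusher, items: Iterable[dict], queue: str = "register_queue") -> int:
    """Serialize STAC Items to JSON strings and push them onto a redis list."""
    payloads = [json.dumps(item) for item in items]
    if not payloads:
        return 0  # lpush with no values would be rejected by redis
    return client.lpush(queue, *payloads)
```

With redis-py, usage could look like `enqueue_stac_items(redis.Redis(host="redis", port=6379), items)`, where the host name is hypothetical and depends on the deployment.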

Using the harvester service for a pulling approach

If you configured the harvester, it will harvest new or updated data from various endpoints and protocols and convert the metadata and data to STAC internally and then push it to other components (preprocessor, registrar).

Harvester-specific configuration is required. Refer to Harvester configuration schema.

Ingestor for legacy Browse Reports - pushing approach

Ingestor-specific configuration is required. Refer to Ingestor configuration schema.

Optionally refer to Data Ingestion chapter for more information.

Global env

The last important step is to modify the global.env key, which lists all environment variables (and their values) that all services have access to.

It also specifies the database and Django passwords, which should be changed from their default values.
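Replacement passwords can be generated with Python's secrets module. The variable names below are hypothetical; use the names actually listed in your global.env.

```python
import secrets
from typing import Dict, Iterable


def generate_secrets(names: Iterable[str], nbytes: int = 32) -> Dict[str, str]:
    """Generate a random URL-safe secret for each environment variable name."""
    return {name: secrets.token_urlsafe(nbytes) for name in names}


if __name__ == "__main__":
    # Hypothetical variable names
    for name, value in generate_secrets(["DB_PASSWORD", "DJANGO_PASSWORD"]).items():
        print(f"{name}: {value}")
```

URL-safe tokens avoid characters that may need quoting in YAML values or connection strings.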

Refer to Global configurations for more information.

Individual service configurations

Additionally, most View Server services are configurable using their keys in the values. Refer to Individual service configurations for more information.