Operating on Kubernetes

View Server (VS) software is shipped as a set of Helm charts, where each component has a meaningful default set of configuration values. A set of values overriding these defaults needs to be created for each deployment of VS.

The most important part of the initialization is the configuration. The values.yaml file is written in YAML and structured as detailed below. It can contain a section for each component, as well as a global section whose values are accessible by all individual components.

A full deployment configuration schema, enabling strict validation of a configuration against it, will be released soon.

The following section contains just an extract of the available keys and example values for the vs-deployment chart. For all possible configuration options, refer to the Helm configuration reference.

To go straight to creating your first earth observation data collection in VS, follow the section "Create a new collection step by step".

Global configurations

Values under the global key mainly contain parameters that more than one component needs in order to set up its behavior. Examples are the database configuration, collections, product types and layers.

`database and django`

global:
  env:
    DJANGO_MAIL: office@eox.at
    DJANGO_PASSWORD: 7xtMd62&bY#I
    DJANGO_USER: vs_admin
    DB_NAME: vs_db
    DB_PORT: "5432"
    DB_PW:  Go-J_eOUvj2k
    DB_USER: vs_user

`collections`

In the collections section, the collections are set up and it is defined which products, based on their product_type, will be inserted into them. The product_types list may only contain types defined in the productTypes section, and the coverage_types allowed for a product_type must be a subset of those configured for the whole collection. For more information, see the main EOxServer models.

global:
  collections:
    COLLECTION:
      product_types:
        - PL00
      coverage_types:
        - int16_grayscale
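As a hedged sketch of how the two sections reference each other, the fragment below pairs the collection from the example above with a matching productTypes entry. The asset name data is a hypothetical placeholder, not taken from a real deployment. The point is only that PL00 appears in both places and that the coverage type used by the product type (int16_grayscale) is one of those allowed for the collection.

```yaml
global:
  collections:
    COLLECTION:
      product_types:
        - PL00                # must match a name under productTypes
      coverage_types:
        - int16_grayscale     # coverage types allowed in this collection
  productTypes:
    - name: PL00
      filter:
        product_type: PL00
      collections:
        - COLLECTION          # collection the product is ingested into
      coverages:
        int16_grayscale:      # must be one of the collection's coverage_types
          assets:
            - data            # hypothetical STAC asset name
```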

`productTypes`

This section defines product_type related information. It is a list of possible product types, where each entry defines the following: filter is matched against STAC Item properties (metadata) to register new products into the correct product_type; browses defines which browse renderings will be generated; coverages configures the mapping of the differently named STAC assets to coverage_types; defaultBrowse selects one of the existing browses and sets it as the default rendering; collections specifies the names of the collections into which a new product will be ingested.

Registration of masks is not yet fully implemented in View Server 2.

For more information, see the main EOxServer models.

global:
  productTypes:
    - name: HRA_MS4_1C
      defaultBrowse: TRUE_COLOR
      filter:
        product_type: HRA_MS4_1C
      collections:
        - Deimos-HRA_MS4_1C
      coverages:
        RGBNir:
          assets:
            - ms
        Pan:
          assets:
            - pan
      masks:
        - name: validity
          validity: true
      browses:
        TRUE_COLOR:
          asset: browse # optional name of asset to register as Browse
          red:
            expression: red
            range: [0, 1000]
            nodata: 0
          green:
            expression: green
            range: [0, 1000]
            nodata: 0
          blue:
            expression: blue
            range: [0, 1000]
            nodata: 0
        FALSE_COLOR:
          red:
            expression: nir
            range: [0, 1800]
            nodata: 0
          green:
            expression: red
            range: [0, 1000]
            nodata: 0
          blue:
            expression: green
            range: [0, 1000]
            nodata: 0
        PAN:
          grey:
            expression: pan
            range: [0, 1600]
            nodata: 0
        NDVI:
          grey:
            expression: (nir-red)/(nir+red)
            range: [-1, 1]

`coverageTypes`

This section allows defining a new coverage_type when it is not contained in the list of predefined ones. By default, View Server contains the following coverage_types:

Sentinel 2 data - coverage_type named S2_RGBNir
Other common coverage types

For more information, see the main EOxServer models.

global:
  coverageTypes:
    - data_type: "Uint16"
      name: "BGR"
      bands:
        - definition: "http://www.opengis.net/def/property/OGC/0/Radiance"
          description: "Blue Channel"
          gdal_interpretation: "BlueBand"
          identifier: "blue"
          name: "blue"
          nil_values:
            - reason: "http://www.opengis.net/def/nil/OGC/0/unknown"
              value: 0
          uom: "W.m-2.Sr-1"
          significant_figures: 5
          allowed_value_ranges:
            -
              - 0
              - 65535
        - definition: "http://www.opengis.net/def/property/OGC/0/Radiance"
          description: "Red Channel"
          gdal_interpretation: "RedBand"
          identifier: "red"
          name: "red"
          nil_values:
            - reason: "http://www.opengis.net/def/nil/OGC/0/unknown"
              value: 0
          uom: "W.m-2.Sr-1"
          significant_figures: 5
          allowed_value_ranges:
            -
              - 0
              - 65535
        - definition: "http://www.opengis.net/def/property/OGC/0/Radiance"
          description: "Green Channel"
          gdal_interpretation: "GreenBand"
          identifier: "green"
          name: "green"
          nil_values:
            - reason: "http://www.opengis.net/def/nil/OGC/0/unknown"
              value: 0
          uom: "W.m-2.Sr-1"
          significant_figures: 5
          allowed_value_ranges:
            -
              - 0
              - 65535

`storage`

Here, the three relevant storages can be configured: the source, data and cache storages.

The source storage defines the locations from which the original files will be downloaded to be preprocessed. Preprocessed images and metadata will then be uploaded to the data storage, which is also used by the registrar during registration. The cache service will cache images on the cache storage.

Each storage definition uses the same structure and can target various types of storage, such as OpenStack Swift, S3 or local.

These storage definitions will be used in the appropriate sections.

global:
  storage:
    source:
      type: swift
      username:
      password:
      project_name:
      project_id:
      region_name:
      auth_url:
      user_domain_name:
      user_domain_id:
      project_domain_name:
      project_domain_id:
    data:
      public:
        type: swift
        # ...
    cache:
      type: swift
      # ...

`layers`

This section defines how the layers shall be cached and how they are configured in the client.

There is a difference between the concepts of parentLayers and subLayers.

If the layer.parentLayer value is equal to layer.id, all of the layer's properties and values are considered a full layer for the client configuration.

If layer.parentLayer and layer.id are not equal, a new cache tileset is created with the given id and grids definitions. In the client, such a subLayer is represented only as a display option of its parentLayer.

The subLayer definitions correspond to the browses defined in the product_type values. Each WMTS subLayer tileset created in the cache references the WMS layer of a collection in the renderer with the same name. The subLayer.id should therefore be composed in the following manner: collection.name__browse.name. The double underscore is the default separator and is configurable.

See the full configuration schema of the client (search for layers).

global:
  layers:
    - id: VHR_IMAGE_2018_Level_1
      title: VHR IMAGE 2018 Level 1
      displayColor: "#eb3700"
      parentLayer: VHR_IMAGE_2018_Level_1
      maxZoom: 18
      visible: false
      grids: &defaultGridOptions
        - name: WGS84
          zoom: 16
      search: &defaultSearch
        parameters:
          - type: "eo:cloudCover"
            title: "Cloud Coverage in percent"
            name: "Cloud Coverage"
            max: 100
            min: 0
            range: true
          - type: "geo:uid"
            title: "Product ID"
            privileged: true
    - id: VHR_IMAGE_2018_Level_1__TRUE_COLOR
      title: VHR Image 2018 Level 1 True color
      parentLayer: VHR_IMAGE_2018_Level_1
      grids: *defaultGridOptions
    - id: VHR_IMAGE_2018_Level_1__NDVI
      title: VHR Image 2018 Level 1 NDVI
      parentLayer: VHR_IMAGE_2018_Level_1
      style: earth
      grids: *defaultGridOptions

`overlayLayers`

This section defines the overlay layers in the client and the cache. The following example configures a pre-seeded full coverage mosaic layer with a limited European extent, served as an overlay. See the full configuration schema of the client (search for overlayLayers).

global:
  overlayLayers:
    - id: VHR_IMAGE_2018_Level_3__outlines
      title: VHR Image 2018 Level_3 outlines
      description: "WMS rendering of Level 3 product footprints for current time range."
    - id: VHR_IMAGE_2018_Level_3__masked_validity__Full
      title: VHR Image 2018 Level 3 True Color with masked validity Full Coverage
      protocol: WMTS
      urls: baseUrlsWMTS
      synchronizeTime: false
      source: "VHR_IMAGE_2018_Level_3__masked_validity"
      description: "<p>Pre-seeded Full coverage mosaic layer of VHR_IMAGE_2018 Level 3 products with their validity masks applied to mask out the final True Color rendering. Products composing the rendered tiles were sorted by time, placing newest products on top.</p><p>This mosaic does not have any search or time dimension functionality enabled.</p>"
      grids: &defaultFullGridOptions
        - name: WGS84
          zoom: 16
          restricted_extent: "-24.7 27.5 45 71.3"

`ingress`

Optional global definition of the Kubernetes ingress, from which many services can take their URL access patterns.

global:
  ingress:
    enabled: true
    hosts:
      - host: collection.remoteurl.com
    tls:
      - hosts:
          - collection.remoteurl.com
        secretName: secret

Component specific

`preprocessor-v2`

Here, the preprocessing can be configured in detail. Example of a preprocessing configuration with defaults and a special configuration for a single product_type:

preprocessor-v2:
  config:
    type_extractor:
      xpath:
        - /gsc:report/gsc:opt_metadata/gml:metaDataProperty/gsc:EarthObservationMetaData/eop:productType/text()
        - /gsc:report/gsc:sar_metadata/gml:metaDataProperty/gsc:EarthObservationMetaData/eop:productType/text()
    level_extractor:
      xpath: ''
    metadata_glob: "*GSC*.xml"
    stac_output: true
    preprocessing:
      defaults:
        stac_item_structure:
          statistics:
            compute_statistics: true
            stats_approx: 2
          assets:
            pan: &cog_stac_asset
              description: 'Product image converted into a COG'
              title: 'Preprocessed image'
              media_type: 'image/tiff; application=geotiff; profile=cloud-optimized'
              roles:
                - data
              globs:
                - '*.tif'
            ms: *cog_stac_asset
            gsc_metadata: &gsc_metadata_stac_asset
              globs:
                - '*.xml'
              description: 'GSC metadata file from source archive'
              title: 'GSC Metadata file'
              media_type: 'application/xml'
              roles:
                - metadata
        move_files: true
        nested: true
        output:
          options: &default_output_options
            format: COG
            dstSRS: 'EPSG:4326'
            dstNodata: 0
            multithread: True
            warpMemoryLimit: 3000
            creationOptions:
              - BLOCKSIZE=512
              - COMPRESS=DEFLATE
              - NUM_THREADS=8
              - BIGTIFF=YES
              - OVERVIEWS=AUTO
              - PREDICTOR=YES
      types:
        SKY_CBU_3A:
          data_file_globs:
            - "*analytic_clip.tif"
            - "*analytic.tif"
            - "*panchromatic_clip.tif"
            - "*panchromatic.tif"
          output:
            group_by: "(.*)"
            options: *default_output_options
          stac_item_structure:
            statistics:
              compute_statistics: true
              stats_approx: 2
              force_histogram_min_value: 2
            assets:
              pan:
                <<: *cog_stac_asset
                globs:
                  - '*_panchromatic*'
              ms:
                <<: *cog_stac_asset
                globs:
                  - '*_analytic*'
              gsc_metadata: *gsc_metadata_stac_asset

`client`

This section contains client configuration other than the layer definitions, which are configured under the layers key. See the full configuration schema of the client.

client:
  config:
    eoxserverDownloadEnabled: true
    leftPanelTabIndex: 0
    timeDomain:
      - "2010-01-01T00:00:00Z"
      - today
    displayTimeDomain:
      - "2017-01-01T00:00:00Z"
      - "2019-12-31T23:59:59Z"
    selectedTimeDomain:
      - "2018-08-01T00:00:00Z"
      - "2018-08-31T23:59:59Z"
    maxZoom: 17
    displayInterval: P1096D

`registrar`

This section defines registrar-specific configurations, for setting up specific registration routes:

registrar:
  config:
    routes:
      collections:
        path: registrar.route.stac.CollectionRoute
        queue: register-collections
        backends:
          - path: registrar.backend.eoxserver.CollectionBackend
          - path: registrar_pycsw.backend.CollectionBackend

`harvester`

This section configures the harvester service, including its filtering capabilities and the queue to which it should push the harvested results.

harvester:
  config:
    redis:
      host: redis # docker swarm only, otherwise do not override default
    harvesters:
      Deimos-HRA_MS4_1C:
        filter:
          eq:
            - property: "oads:product_type"
            - HRA_MS4_1C
        resource:
          type: OADS
          oads:
            url: https://tpm-ds.eo.esa.int/oads/meta/Kompsat2/index/
            use_oads_ext: true
        output: queue
        queue: register_queue
        postprocessors:
          - type: builtin
            process: static
            kwargs:
              values:
                properties:
                  collection: Deimos-HRA_MS4_1C

`preprocessor`

This section configures the Mapchete-enabled preprocessor. Each configuration targets a specific set of products based on the value of their collection metadata.

preprocessor:
  replicaCount: 1
  limits:
    cpu: 2
    memory: 6Gi
  requests:
    cpu: 0.1
    memory: 1Gi
  config:
    filesystems:
      s3:
        type: s3
        s3:
          access_key_id: access
          secret_access_key: key
          region: eu

    processors:
      p1:
        type: local
        local:
          process: preprocessor.processes.local.browse_to_geotiff
    paths:
      output_path: s3://
    collections:
      SPOT6-7:
        filesystems:
          target: s3
        data:
          - input:
              type: http
              http:
                asset_map:
                  - key: band_1
                    band: browse
            output:
              path: output_path
              asset: data
              processors:
                - p1

Deploying using Helm

It is generally expected that a user will deploy the VS Helm chart into the Kubernetes cluster. This can be done via the Flux Helm Operator, which takes care of installing Helm charts as well as automatically applying subsequent changes to the configuration.

However, it is also possible to deploy manually using the helm command line tool directly:

helm install -f values.yaml vs chart-location

This will install VS with the configuration specified in values.yaml. To apply changes to values.yaml, the following command can be used:

helm upgrade -f values.yaml vs chart-location

Finally, VS can be uninstalled using the following command:

helm uninstall vs

Helm configuration reference

This section outlines the variables for a Helm deployment, starting with the main values file:

global:
  env:
  storage:
    data: {}
    source: {}
    cache:
      type: local
  collections: {}
  productTypes: []
  defaultLayer:
  layers: []
  overlayLayers: []
  coverageTypes: []
  metadata: {}
database: {}
redis: {}
client: {}
cache: {}
renderer: {}
registrar: {}
harvester: {}
scheduler: {}
seeder: {}
preprocessor: {}

Global Configuration

Environment variables - `env`

Environment variables noted in other sections are added to this object as key:value pairs, e.g.

global:
  env:
    GDAL_PAM_ENABLED: "NO"

Any environment variable added in global.env will get passed to each service of VS.

Note

The following global.env variables must be set this way in values.yaml for Docker Swarm deployments. Their default values are set for Kubernetes deployments only.

Any values.yaml for Docker Swarm deployments should include the following values:

global:
  ingress:
    tls: false
  env:
    DB_HOST: "database"
    RENDERER_HOST: "database"

Storage configuration - `storage`

The storage section handles all configuration related to data storage.

data: key, value mapping in the form name:{config} for the registrar and preprocessor services. Multiple name:{config} mappings may exist.

for swift storage:

type: swift

username - service username

password - service password

project_name - name of project

project_id - id of project

region_name - name of region

auth_url - authentication url

auth_url_short - short version of auth_url

auth_version - authentication version, defaults to 3

user_domain_name - user domain name

streaming - if streaming version of /vsi file accessor is used

for s3 storage:

type: s3

bucket - name of S3 bucket

endpoint_url - url endpoint

access_key_id - access key identifier

secret_access_key - secret access key

public - default “false”

region_name - aws s3 region

validate_bucket_name - if bucket name should be validated, defaults to true

streaming - if streaming version of /vsi file accessor is used

for local storage:

type: local

root_directory - directory with data (must be accessible inside containers of services that access it)

for http storage:

type: http

endpoint_url - url endpoint

streaming - if streaming version of /vsi file accessor is used

source: optional data source for the preprocessor. Configuration parameters same as data
cache: configuration for the data source of the cache. Configuration parameters same as data. Can be type:local. In this case a local sqlite3 database is created.
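
A minimal sketch of how the parameters listed above might be combined, assuming an S3 data storage and a local cache. The storage name my-data, the bucket name, the endpoint URL, the region and the credentials are all hypothetical placeholders. Only the key names come from the lists above.

```yaml
global:
  storage:
    data:
      my-data:                                # hypothetical storage name
        type: s3
        bucket: my-bucket                     # hypothetical bucket name
        endpoint_url: https://s3.example.com  # hypothetical endpoint
        access_key_id: "<access-key-id>"      # placeholder credential
        secret_access_key: "<secret-access-key>"
        region_name: eu-central-1             # hypothetical region
        validate_bucket_name: true            # documented default
    cache:
      type: local                             # a local sqlite3 database is created
```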

Data Collections - `collections`

name:{config} pairs where the name of the collection is mapped to the product and coverage types

product_types - list of product types for the collection

coverage_types - list of coverage types for the collection

Product types - `productTypes`

List of product type objects with the following configs:

name - product type name

defaultBrowse - name of the default browse type

coverages - mapping of coverage names to assets

assets - list of assets

browses - mapping of browse types to definitions

collections - collections to which the product type belongs

masks - masks defined for the product type

Layers - `layers`

Full configuration schema of client - search for layers.

Overlay layers - `overlayLayers`

Full configuration schema of client - search for overlayLayers.

Coverage Types - `coverageTypes`

List of coverage types to add to the backend.

bands - list of band definitions

definition - ogc link to band definition

description - description of band

identifier - identifier of band

name - name of band

nil_values - list of nil (no-data) values

reason - ogc reason

value - value that is considered nil

uom - unit of measure

wavelength - wavelength

data_type - type of data

name - name of the coverage type

Service Metadata - `metadata`

Metadata values used by services.

title - title of the service

header - client header

abstract - abstract of the service

url - override service url - if not set, then announced links in Capabilities documents will depend on the used hostname of the request

keywords - list of keywords

accessConstraints - access constraints

fees - fees

contactName - name of contact person

contactPhone - phone of contact person

contactFacsimile - facsimile of contact person

contactOrganization - contact person organization

contactCity - city of contact person

contactStateOrProvince - state or province of contact person

contactPostcode - postcode of contact

contactCountry - country of contact

contactElectronicMailAddress - contact email

contactPosition - contact position

providerName - name of provider

providerUrl - url of provider

inspireProfile - inspire profile

inspireMetadataUrl - inspire metadata url

defaultLanguage - default language of service

language - language of service
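
As an illustration only, a small subset of these keys could be set as follows. All values are hypothetical placeholders, and keywords is written as a YAML list because it is described above as a list of keywords.

```yaml
global:
  metadata:
    title: Example View Server instance            # hypothetical values throughout
    abstract: View Server instance serving example collections
    keywords:
      - earth observation
      - view server
    accessConstraints: none
    fees: none
    contactName: Jane Doe
    contactOrganization: Example Organization
    contactElectronicMailAddress: jane.doe@example.com
    providerName: Example Organization
    providerUrl: https://example.com
```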

Database configuration - `database`

Database configuration. See https://artifacthub.io/packages/helm/bitnami/postgresql for a comprehensive guide.

Redis configuration - `redis`

Redis configuration. See https://artifacthub.io/packages/helm/bitnami/redis for comprehensive configuration.

Common service configuration

Here is a list of common configurations across services.

replicaCount - number of pods to spawn

nameOverride - override the short name

fullNameOverride - override the full name

image - image mapping

repository - repository of image

pullPolicy - pull policy

tag - tag. If unset will default to latest

service - service mapping. Available only for forwarded services

type - type of network service

port - port to forward

resources - resource mapping

limits - resource limits

cpu

memory

requests - request resources

cpu

memory

affinity - affinity configuration

livenessProbe - liveness tests

ingress - ingress trigger

global - global settings
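
A hedged sketch of how some of these common keys nest, using the renderer service as an example. The image repository and tag are hypothetical placeholders, and the resource figures simply repeat the renderer defaults quoted in the Scaling section.

```yaml
renderer:
  replicaCount: 2
  image:
    repository: registry.example.com/vs/core   # hypothetical image repository
    pullPolicy: IfNotPresent
    tag: "1.0.0"                               # hypothetical tag; defaults to latest if unset
  resources:
    limits:
      cpu: 1500m
      memory: 6Gi
    requests:
      cpu: 500m
      memory: 512Mi
```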

All non-global configuration relevant to the services is located in the config section of each service, e.g.

cache:
  config:
    # cache configuration values go here

client:
  config:
    # client configuration values go here

Client configuration - `client`

Full configuration schema of client.

Cache configuration - `cache`

wmsEnabled - wms enable switch

wmtsEnabled - wmts enable switch

connectionTimeout - timeout in seconds for connection

timeout - timeout for upstream connection

expires - tile expiry in number of seconds

key - cache path scheme with keys
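
The following fragment is a sketch only. It assumes that these keys are placed in the cache config section, as stated for service configuration in general, and all numeric values are hypothetical. The key parameter is left out because its format is not described here.

```yaml
cache:
  config:
    wmsEnabled: true
    wmtsEnabled: true
    connectionTimeout: 10   # seconds; hypothetical value
    timeout: 120            # upstream timeout; hypothetical value
    expires: 3600           # tile expiry in seconds; hypothetical value
```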

Renderer configuration - `renderer`

Currently accepts no additional custom configuration.

Registrar configuration - `registrar`

disableDefaultRoute - disables default route for eoxserver if true

eoxserverInstanceBasePath - the default backend instance path

eoxserverInstanceName - the default backend instance name

defaultQueue - the name of the queue that the registrar listens on - default “register”

defaultSuccessQueue - queue that the registrar sends successfully registered items to

defaultErrorQueue - queue that the registrar sends failed items to

defaultReplace - if set to true, replaces existing items during registration, default “true”

defaultBackends - list of backend definitions

defaultHandlers - list of handler definitions

routes - mapping of custom routes.

routes:
  <route-name>:
    path: <import-path>
    queue: <queue>
    backends:
      - path: <backend-import-path>
        kwargs: <backend-keyword-arguments>

Example configuration: https://gitlab.eox.at/vs/core/-/blob/main/registrar/config-sample.yaml
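
As a rough illustration of how the options above might sit together, assuming they are placed in the registrar config section. The success and error queue names are hypothetical, and the route is reused from the registrar example earlier in this chapter.

```yaml
registrar:
  config:
    disableDefaultRoute: false
    defaultQueue: register                 # documented default
    defaultSuccessQueue: register-success  # hypothetical queue name
    defaultErrorQueue: register-error      # hypothetical queue name
    defaultReplace: true                   # documented default
    routes:
      collections:
        path: registrar.route.stac.CollectionRoute
        queue: register-collections
        backends:
          - path: registrar.backend.eoxserver.CollectionBackend
```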

Seeder configuration - `seeder`

minzoom - minimum zoom from which to seed layers

maxzoom - maximum zoom to which to seed layers

collection_grids - dictionary of collection:grids mappings, used if only selected grids should be seeded for a certain collection
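
A minimal sketch of a seeder configuration, assuming the keys are placed in the seeder config section and that the grids of a collection are given as a list of grid names. The zoom levels are hypothetical values. The collection and grid names are reused from the layers example.

```yaml
seeder:
  config:
    minzoom: 0                  # hypothetical value
    maxzoom: 6                  # hypothetical value
    collection_grids:
      VHR_IMAGE_2018_Level_1:   # collection name reused from the layers example
        - WGS84                 # assumed to be a list of grid names
```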

Ingestor configuration - `ingestor`

Currently accepts no additional custom configuration.

Scaling

For Kubernetes deployments, advanced scaling configurations are available.

Renderer

The renderer uses a fixed number of 8 workers for each replica. By default, the replicas have the following resource settings:

Limits:
  cpu:     1500m
  memory:  6Gi
Requests:
  cpu:      500m
  memory:   512Mi

This can be customized by setting the following helm values:

renderer:
  resources:
    requests:
      cpu: 1
    [...]

Scaling

The default replica count is set to 1 which can be customized by this helm value:

renderer:
  replicaCount: 2

Alternatively, horizontal autoscaling is supported based on the CPU metric. If enabled, the default minimum and maximum value for the replicas are 1 and 3 respectively. It can be further customized using the following helm values:

renderer:
  hpa:
    enabled: true
    minReplicas: 1
    maxReplicas: 3

Note that the horizontal autoscaler uses a target CPU utilization of 100%, which refers to 100% of the requested CPU resources.
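
To make the 100% target more concrete, the sketch below combines the default CPU request with the autoscaling values and walks through the standard Kubernetes replica calculation in comments. The load figures in the comments are a made-up scenario, not measurements.

```yaml
renderer:
  resources:
    requests:
      cpu: 500m        # utilization is measured relative to this request
  hpa:
    enabled: true
    minReplicas: 1
    maxReplicas: 3
# Kubernetes computes:
#   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# Hypothetical scenario: 2 replicas, each averaging 750m CPU
#   -> 750m / 500m = 150% utilization
#   -> ceil(2 * 150 / 100) = 3 replicas (never more than maxReplicas)
```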

Service management

This subchapter documents Kubernetes-specific management steps.

Running commands in VS services

For administration, it can be necessary to run commands directly in one of the services that make up VS.

Most VS services correspond to a deployment in Kubernetes, so they can be accessed like this:

kubectl exec -it deployment/vs-preprocessor -- bash

However, stateful components such as Redis or PostgreSQL map to StatefulSets in Kubernetes and can be accessed using the following command:

kubectl exec -n demo-eocat-multiple -it statefulset/vs-redis-master -- bash

Note that the command given above launches a shell inside the container. If only one command needs to be run inside the services, the command can be directly given instead of bash.

Purge database

The database structure and models are created as a first step of the deployment of View Server and are not updated afterwards if the used values change.

Warning

The following step deletes all added contents of the database. ALL products then have to be re-registered!

To purge the database so that it can be recreated from scratch after the values have changed, run:

kubectl exec -it deployment/vs-registrar -- bash -c 'python3 $INSTANCE_DIR/manage.py flushdb'
helm uninstall name
helm install name --values ... # triggers database structure recreate

For platform-agnostic management and operations steps, visit chapter Operations and management.

Or continue to the section Data Ingestion to see how data can be ingested to a VS.
