.. _creating_new_collections:

Create a new collection step by step
====================================

This tutorial guides you through creating and updating a set of
configurations (and the database) for an imaginary new dataset (layer).
Wherever possible, it links to other parts of the documentation for further reference.

The tutorial uses elements of the
`EOxServer Data Model <https://docs.eoxserver.org/en/latest/users/coverages.html#data-model>`_,
which should be understood, although defining the configuration values
usually makes direct interaction with the EOxServer database models unnecessary:

- `Collection <https://docs.eoxserver.org/en/latest/users/coverages.html#collection>`_ + `CollectionType <https://docs.eoxserver.org/en/latest/users/coverages.html#collection-type>`_
- `Product <https://docs.eoxserver.org/en/latest/users/coverages.html#product>`_ + `ProductType <https://docs.eoxserver.org/en/latest/users/coverages.html#product-type>`_
- `Coverage <https://docs.eoxserver.org/en/latest/users/coverages.html#coverage>`_ + `CoverageType <https://docs.eoxserver.org/en/latest/users/coverages.html#coverage-type>`_
- `Browse <https://docs.eoxserver.org/en/latest/users/coverages.html#browse>`_ + `BrowseType <https://docs.eoxserver.org/en/latest/users/coverages.html#browse-type>`_ 
- `Storage <https://docs.eoxserver.org/en/latest/users/backends.html#storage>`_ + `StorageAuth <https://docs.eoxserver.org/en/latest/users/backends.html#storage-auth>`_

Examples of public configurations
---------------------------------

The following public examples of View Server configuration values can serve as further reference:

- `EOEPCA Demo Helm Chart <https://github.com/EOEPCA/eoepca/blob/demo/system/clusters/creodias/resource-management/hr-data-access.yaml#L25>`_
- `VS testing values <https://gitlab.eox.at/vs/vs-deployment/-/blob/main/testing/values-testing.yaml>`_

Analyze the data 
------------------

First, analyze the Earth observation data that you will serve through View Server. The following sections list the kinds of information needed to create the configurations:

Data format
~~~~~~~~~~~

For ideal viewing performance, images should be formatted as
`Cloud Optimized GeoTIFF (COG) <https://www.cogeo.org/>`_ or should at least have **internal overviews** and **internal tiling**.
If the data meet either of these two conditions, proceed to the next section.

If internal overviews are missing, rendering even a 1x1 pixel image of a large
EO data file causes the whole file to be read, which
degrades rendering performance.

An important attribute of the raster data is their **data type** (UInt16, Int32, and others).
Although View Server will generally be able to read any data type that GDAL can read, having this information is necessary for further steps.
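
The properties discussed above (internal tiling, internal overviews, data type) can be read from the JSON report printed by ``gdalinfo -json``. The following is a minimal sketch and not part of View Server: the helper name ``summarize_raster`` and the sample report are hypothetical, and it assumes that a block narrower than the image indicates a tiled file. Note that GDAL may also list external overviews (``.ovr`` files) in the same report.

.. code-block:: python

  import json


  def summarize_raster(report):
      """Summarize data types, tiling and overviews of a parsed gdalinfo report."""
      width = report["size"][0]
      bands = report["bands"]
      return {
          "data_types": sorted({band["type"] for band in bands}),
          # A block as wide as the image usually means a striped (untiled) file.
          "tiled": all(band["block"][0] < width for band in bands),
          "has_overviews": all(band.get("overviews") for band in bands),
      }


  # Hypothetical excerpt of a "gdalinfo -json" report
  sample_report = json.loads("""
  {
    "size": [10980, 10980],
    "bands": [
      {"band": 1, "type": "UInt16", "block": [512, 512],
       "overviews": [{"size": [5490, 5490]}, {"size": [2745, 2745]}]}
    ]
  }
  """)
  print(summarize_raster(sample_report))
  # {'data_types': ['UInt16'], 'tiled': True, 'has_overviews': True}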

To convert the data to COG, either:

- manually `use GDAL tools <https://www.cogeo.org/developers-guide.html>`_ before ingesting the data into View Server (in this case, the preprocessor component is not necessary), or
- configure and use the :ref:`preprocessor_configuration` and reference the preprocessed data instead.
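
For the manual route, the conversion can be scripted. The sketch below only builds a ``gdal_translate`` command line. It assumes a GDAL version that ships the ``COG`` output driver (3.1 or later); the helper name ``cog_command``, the file names and the choice of ``DEFLATE`` compression are illustrative.

.. code-block:: python

  import shlex


  def cog_command(source, target):
      """Build a gdal_translate call that writes a Cloud Optimized GeoTIFF."""
      return [
          "gdal_translate",
          "-of", "COG",               # COG output driver, GDAL >= 3.1
          "-co", "COMPRESS=DEFLATE",  # illustrative compression choice
          source,
          target,
      ]


  command = cog_command("input.tif", "output_cog.tif")
  print(shlex.join(command))
  # gdal_translate -of COG -co COMPRESS=DEFLATE input.tif output_cog.tif

The returned list can be passed to ``subprocess.run(command, check=True)`` on a machine where GDAL is installed.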

Metadata format
~~~~~~~~~~~~~~~

It is also important to check the format of metadata files (sidecar files) next to the raster data.
View Server uses `SpatioTemporal Asset Catalog <https://stacspec.org>`_ (STAC) items internally as a metadata format, both for storage and for messaging between components.

In an ideal case, the STAC items describing the data and metadata are already generated and should be used.

Having STAC items generated is not a prerequisite for all data. View Server will understand some other metadata formats during ingestion. More on that later.
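
For orientation, a STAC item is a GeoJSON feature with a few additional required fields. The sketch below shows a minimal item as a Python dictionary. The identifier, footprint, timestamp and asset location are made up for illustration, and which optional fields View Server additionally relies on is not covered here.

.. code-block:: python

  import json

  stac_item = {
      "type": "Feature",
      "stac_version": "1.0.0",
      "id": "example-product-001",  # hypothetical identifier
      "geometry": {
          "type": "Polygon",
          "coordinates": [[
              [16.0, 48.0], [16.5, 48.0], [16.5, 48.5], [16.0, 48.5], [16.0, 48.0],
          ]],
      },
      "bbox": [16.0, 48.0, 16.5, 48.5],
      "properties": {"datetime": "2018-06-01T10:00:00Z"},
      "links": [],
      "assets": {
          "data": {
              # hypothetical location of the raster file
              "href": "s3://example-bucket/example-product-001.tif",
              "type": "image/tiff; application=geotiff; profile=cloud-optimized",
              "roles": ["data"],
          },
      },
  }

  print(json.dumps(stac_item, indent=2))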

Data storage
~~~~~~~~~~~~

The next step is to clarify where the raster data and metadata are stored.

This does not have to be on the same infrastructure where View Server is going to be deployed. Having the data closer to the deployment (at the same cloud provider for example) should significantly speed up data access.

View Server supports the following access and storage protocols (the ``fsspec`` library should enable further extensions):

- s3
- OpenStack Swift
- local path
- http

Bands and rendering
~~~~~~~~~~~~~~~~~~~

Depending on the type of data (optical, radar, other) and the number of bands in the raster data, different types of rendering can be configured. In this step, clarify how many bands the raster data contain and which wavelength (or general type of information) each band represents. The band structure determines which renderings can be defined later on.

Groups of products
~~~~~~~~~~~~~~~~~~

Products can, and often should, be grouped or separated by a shared metadata property.

A typical property to separate on is the processing level:

- Level 1 should not be visualized together in one layer with Level 3 products
- SAR products: Single Look Complex (SLC) and Ground Range Detected (GRD) should be separated
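
A quick way to see which groups exist is to bucket the available STAC items by the property in question. This is a minimal sketch that assumes the level is stored in a property such as ``processing:level`` (from the STAC processing extension); your metadata may use a different key, and the item identifiers are hypothetical.

.. code-block:: python

  from collections import defaultdict


  def group_by_property(items, key):
      """Group STAC item ids by the value of one metadata property."""
      groups = defaultdict(list)
      for item in items:
          groups[item["properties"].get(key, "unknown")].append(item["id"])
      return dict(groups)


  items = [
      {"id": "scene-a", "properties": {"processing:level": "L1"}},
      {"id": "scene-b", "properties": {"processing:level": "L3"}},
      {"id": "scene-c", "properties": {"processing:level": "L1"}},
  ]
  print(group_by_property(items, "processing:level"))
  # {'L1': ['scene-a', 'scene-c'], 'L3': ['scene-b']}

Each resulting group is a candidate for its own product type and collection.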

Generate configurations
-----------------------

Let's continue to create View Server configuration values based on the 
knowledge about the products that have been gathered.

For all possible settings of any configuration key, refer to the :ref:`helmvalues`.

As a foundation for a new set of values, the `default vs-deployment config values <https://gitlab.eox.at/vs/vs-deployment/-/blob/main/values.yaml>`_ can be used as an empty template to fill in.

.. warning::

  The database structure and models are created as the **first step** of a View Server deployment and
  are **not** updated afterward if the values change (no database migrations are performed between deploys).

  Therefore, you have to create a consistent working configuration. This may be an iterative process that involves deleting
  the persistent database storage before each redeploy of updated values
  if the changes affect the database models. Refer to :ref:`Purge database <purge-db-k8s>` for the how-to.

  Each of the following steps in this cookbook notes whether its configuration affects the database structure.

Coverage Types
~~~~~~~~~~~~~~~~~~

Changes involve database structure: YES

The first concept to focus on during values creation is ``coverage_types``. The objective of this step is to:

- either map the raster data type and band order to the existing ``coverage_type`` definition
- or alternatively, define a new ``coverage_type`` 

The possible values and meaning of the ``coverage_type`` are described in :ref:`Global.coverageTypes <global_coverage_types>`.

If a ``coverage_type`` with the same bands already exists but in a different order (for example, the near-infrared band is the first band of the data rather than the last, as in the ``RGBNir`` ``coverage_type``), it is clearer to create a new ``coverage_type``. This is not strictly necessary, because the band order (which band corresponds to which RGB color) can be changed in the rendering step.

.. note::

  Pay attention to the following keys when defining a new ``coverage_type``:

  - ``coverageTypes[i].name`` - needed for the collections definition
  - ``coverageTypes[i].bands[j].identifier`` - needed for the browses definition

Product Types
~~~~~~~~~~~~~~~~~~

Changes involve database structure:

- YES for the ``global.productTypes`` key and all its values except for the following: ``filter``, ``coverages``.

The second, and even more important, step is to create ``productType`` definitions. Each ``productType`` represents an EOxServer ``Product Type`` model and some of its links to other models:

- ``BrowseType`` EOxServer model: specifies renderings (one to many) via the ``browses`` key. Refer to :ref:`browse-types-cookbook` for guidance on how to fill this key.
- EOxServer ``Coverage Type`` model: the ``coverages`` key defines which ``data assets`` map to which coverage type. Multiple ``data assets`` (named asset entries of the STAC Item) can be mapped to multiple ``coverages``.
- ``collections``: the collections to which products of this ``product_type`` are added. One product can be added to multiple collections if the ``product_type`` is allowed for those collections. Refer to :ref:`collections-cookbook`.

The possible values and meaning of the ``product_type`` are described in :ref:`Global.productTypes <global_product_types>`.

.. _browse-types-cookbook:

Browse Types
~~~~~~~~~~~~

Changes involve database structure: YES

The third important step is to define the ``browses`` (rendering) definitions for each ``productType``. Each ``browses`` entry represents an EOxServer ``Browse Type`` model and thereby adds an available WMS layer to the renderer service.

Multiple simple band expressions and pre-made functions can be used in the ``band.expression`` value. `Full list of usable functions <https://github.com/EOxServer/eoxserver/blob/564408f27bf855d6c4cd214e82373eb465591ca2/eoxserver/render/browse/functions.py>`_.

The band names used inside the ``expression`` (``red``, ``pan``, ``gray``) need to match the band identifiers defined in the selected ``coverage_type`` and correspond to the meaning of the raster data itself.
The color keys under the browse type name (``red``, ``green``, ``blue``, ``grey``) are to be used as-is; they refer to the RGB (or grayscale) channels of the WMS output image into which the values are stretched.
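
A mismatch between the names in an expression and the band identifiers of the coverage type is easy to overlook. The following is a rough, regex-based sanity check, not EOxServer's expression parser: it treats every bare name that is not a function call, a quoted string or a boolean literal as a band reference. The helper names are hypothetical.

.. code-block:: python

  import re

  LITERALS = {"True", "False", "None"}


  def referenced_bands(expression):
      """Return bare names in the expression that are not calls or literals."""
      # Drop quoted strings such as 'percmin' so they are not mistaken for bands.
      without_strings = re.sub(r"'[^']*'|\"[^\"]*\"", "", expression)
      # A name directly followed by "(" is a function call, not a band.
      names = re.findall(r"\b[A-Za-z_]\w*\b(?!\s*\()", without_strings)
      return set(names) - LITERALS


  def unknown_bands(expression, band_identifiers):
      """Return names used in the expression that the coverage type lacks."""
      return referenced_bands(expression) - set(band_identifiers)


  expression = "pansharpen(pan, red, green, blue, nir)[0]"
  print(sorted(unknown_bands(expression, ["red", "green", "blue", "nir"])))
  # ['pan']

Here the check reports ``pan`` because the example band list contains no panchromatic band.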

If the ``browse.asset`` key names a STAC asset, that asset is used as a ``Browse`` model.
This makes it possible to register an asset that does not have the ``data`` role. It is the preferred approach when a viewing-ready browse image has already been generated, rather than trying to fit it to a ``Coverage`` model. A Browse behaves slightly differently from a Coverage: for example, it cannot be used with WCS, but it also does not need exact georeferencing of the image, only a correctly extracted footprint in the original STAC item.

Some examples of configured expressions are:

1) Percentile rendering: the 2-98% range of the precomputed histogram is stretched to 1-256, with configured defaults used if an individual STAC Item does not contain computed statistics in its metadata. It additionally masks out pixels in the range 1-10 as extra no data.

.. code-block:: yaml

  TRUE_COLOR:
    red:
      expression: "interpolate(red, percentile(red, var('percmin', 2), 1), percentile(red, var('percmax', 98), 10), 1, 256, var('clip', True),[var('nodata_start',1),var('nodata_end',10)])"
      range:
        - 1
        - 256
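
To make the arithmetic of such a stretch concrete, the sketch below maps a pixel value linearly from the percentile range to the output range. It assumes that ``interpolate`` performs a plain linear mapping with optional clipping and ignores the no-data masking; the function name ``stretch`` and the percentile values are hypothetical.

.. code-block:: python

  def stretch(value, in_min, in_max, out_min=1, out_max=256, clip=True):
      """Map value linearly from [in_min, in_max] to [out_min, out_max]."""
      scaled = out_min + (value - in_min) * (out_max - out_min) / (in_max - in_min)
      if clip:
          # Values outside the input range are pinned to the output limits.
          scaled = max(out_min, min(out_max, scaled))
      return scaled


  # Hypothetical 2nd and 98th percentiles of the red band
  p2, p98 = 200, 1800
  print(stretch(1000, p2, p98))  # 128.5
  print(stretch(100, p2, p98))   # 1 (clipped)
  print(stretch(5000, p2, p98))  # 256 (clipped)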

2) pansharpening operation on the source RGBNir Pan coverages

.. code-block:: yaml
  
  TRUE_COLOR_PANSHARPEN:
    red:
      expression: pansharpen(pan, red, green, blue, nir)[0]
      range: [0, 1000]
      nodata: 0
    green:
      expression: pansharpen(pan, red, green, blue, nir)[1]
      range: [0, 1000]
      nodata: 0
    blue:
      expression: pansharpen(pan, red, green, blue, nir)[2]
      range: [0, 1000]
      nodata: 0

3) hillshade rendering of DEM height data in EPSG:4326 with some parameters of the formula specified as "rendering variables" - allowing the WMS client to specify values

.. code-block:: yaml
  
  hillshade:
    grey:
      expression: hillshade(gray, var('zfactor', 5), 111120, var('azimuth', 315), var('altitude', 45), var('alg', 'Horn'))
      range: [0, 255]
      nodata: 0
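
The constant ``111120`` in the expression above is presumably the scale between the horizontal units (degrees in EPSG:4326) and the vertical units (metres), the same value commonly passed to ``gdaldem hillshade -s``. A short arithmetic sketch of where it comes from, assuming one nautical mile per arc minute:

.. code-block:: python

  METRES_PER_NAUTICAL_MILE = 1852
  ARC_MINUTES_PER_DEGREE = 60

  metres_per_degree = METRES_PER_NAUTICAL_MILE * ARC_MINUTES_PER_DEGREE
  print(metres_per_degree)  # 111120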

4) Default unnamed browse type with 0-255 color range on 4 bands mapped to the STAC Item asset named ``browse``.

.. code-block:: yaml
  
  "":
    asset: browse

.. _collections-cookbook:

Collections
~~~~~~~~~~~

Changes involve database structure: YES

The fourth step is to define all ``collections`` that group the Products. For each collection, it is necessary to list its allowed ``product_types`` and ``coverage_types``.

Example configuration for creating three ``collections``: Level_1, Level_3 and a shared one:

.. code-block:: yaml

  collections:
    VHR_IMAGE_2018:
      product_types:
        - DOV_MS_L1A
        - DOV_MS_L3A
      coverage_types:
        - RGBNir
    VHR_IMAGE_2018_Level_1:
      product_types:
        - DOV_MS_L1A
      coverage_types:
        - RGBNir
    VHR_IMAGE_2018_Level_3:
      product_types:
        - DOV_MS_L3A
      coverage_types:
        - RGBNir

The part of the ``productTypes`` values that corresponds to the ``collections`` key above could look like this:

.. code-block:: yaml
  
  productTypes:
    - name: DOV_MS_L1A
      collections:
        - VHR_IMAGE_2018
        - VHR_IMAGE_2018_Level_1
    - name: DOV_MS_L3A
      collections:
        - VHR_IMAGE_2018
        - VHR_IMAGE_2018_Level_3
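
Because the same relationship is written down twice (once per collection, once per product type), the two places can drift apart. The following is a minimal consistency check over the two structures shown above, expressed as Python data; the function name ``find_mismatches`` is hypothetical and the check is not part of View Server.

.. code-block:: python

  def find_mismatches(collections, product_types):
      """Return (product type, collection) pairs the collection does not allow."""
      mismatches = []
      for product_type in product_types:
          for collection in product_type.get("collections", []):
              allowed = collections.get(collection, {}).get("product_types", [])
              if product_type["name"] not in allowed:
                  mismatches.append((product_type["name"], collection))
      return mismatches


  collections = {
      "VHR_IMAGE_2018": {"product_types": ["DOV_MS_L1A", "DOV_MS_L3A"]},
      "VHR_IMAGE_2018_Level_1": {"product_types": ["DOV_MS_L1A"]},
      "VHR_IMAGE_2018_Level_3": {"product_types": ["DOV_MS_L3A"]},
  }
  product_types = [
      {"name": "DOV_MS_L1A", "collections": ["VHR_IMAGE_2018", "VHR_IMAGE_2018_Level_1"]},
      {"name": "DOV_MS_L3A", "collections": ["VHR_IMAGE_2018", "VHR_IMAGE_2018_Level_3"]},
  ]
  print(find_mismatches(collections, product_types))  # []

An empty list means every product type is allowed in each collection it names.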

The possible values and meaning of the ``collections`` are described in :ref:`Global.collections <global_collections>`.

Displaying data
~~~~~~~~~~~~~~~~~~

Changes involve database structure: NO

The fifth step influences how the layers can be displayed via the ``client`` service and which tilesets will be exposed by the ``cache`` service.

The possible values and meaning of the ``layers`` and ``overlayLayers`` are described in :ref:`Global.layers <global_layers>` and :ref:`Global.overlayLayers <global_overlayLayers>`.

External access
~~~~~~~~~~~~~~~~

Changes involve database structure: NO

The sixth step is to define external access to View Server components. If the values are going to be deployed on Kubernetes, it is possible to use View Server's ingress configuration - refer to :ref:`global_ingress`.

If there is already an external setup configured in the system (external ingress, traefik, etc.), the View Server ingress configurations should be completely disabled by:

.. code-block:: yaml

  ingress:
    tls: false

How to get the data in
~~~~~~~~~~~~~~~~~~~~~~

Changes involve database structure: NO

Storage
^^^^^^^^^^

The seventh step in the workflow is to establish where the data are located, so that View Server can reference them correctly and ingest them, which brings the Product data and metadata into the database.

Possible values and meaning of the ``storage`` are described in :ref:`global_storage`.

For successful ingestion, at least the ``data`` key (location of data) needs to be filled according to the used protocol to access the data on the storage.

There are currently three ways to ingest data into View Server, and each might require further configuration.

Optionally, the ``preprocessor`` can be used to convert the data format beforehand. Refer to :ref:`Preprocessor configuration schema <preprocessor_configuration>`.

Local storage
^^^^^^^^^^^^^^^

.. warning::
  If the files to be ingested are on local storage, the storage folder(s) need to be mounted into the containers of the services that need access to them. For direct registration without preprocessing, these services are the registrar and the renderer.

  The mounting needs to be configured at the level of the ``helm release`` or the ``docker-compose templates``. Each node (master or worker) that may host such a service needs access to the data folder as well.

An example docker compose configuration that mounts the folder ``/data/test1`` to the path ``/data`` in the renderer container is as follows:

.. code-block:: yaml

  renderer:
    volumes:
      - type: bind
        source: /data/test1
        target: /data

Global data storage configuration in ``values.yaml`` for using this folder would look like:

.. code-block:: yaml

  global:
    storage:
      data:
        directory-data:
          type: "directory"
          root_directory: "/data/"


Ingestion
^^^^^^^^^^^^^^

1) Direct ingestion of STAC Item JSON strings to ``redis register_queue``.

This process is suitable if the STAC items of Products already exist and for one-off ingestion campaigns - collections that do not require any regular updates or additions.

No special configuration except for ``storage.data`` key is necessary.

2) Using the harvester service for a *pulling* approach

Once configured, the harvester collects new or updated data from various endpoints and protocols, converts the data and metadata to STAC internally, and pushes the result to other components (preprocessor, registrar).

Harvester-specific configuration is required. Refer to :ref:`Harvester configuration schema <harvester_configuration>`.

3) Ingestor for legacy Browse Reports - *pushing* approach

Ingestor-specific configuration is required. Refer to :ref:`Ingestor configuration schema <ingestor_configuration>`.

Optionally refer to :ref:`ingestion` chapter for more information.
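
For the first option above, direct ingestion, the push can be scripted. This is a minimal sketch under stated assumptions: the queue is taken to be a Redis list named ``register_queue`` that accepts items via ``LPUSH``, and ``client`` is any object with an ``lpush(name, *values)`` method, such as a ``redis.Redis`` instance from the ``redis`` Python package. The function name ``enqueue_items`` is hypothetical.

.. code-block:: python

  import json


  def enqueue_items(client, items, queue="register_queue"):
      """Push each STAC Item as a JSON string to the given Redis list."""
      for item in items:
          client.lpush(queue, json.dumps(item))
      return len(items)

With the ``redis`` package installed, a client could be created as ``redis.Redis(host="redis", port=6379)``, where the host name and port are placeholders for your deployment.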

Global env
~~~~~~~~~~~

The last important step is to modify the ``global.env`` key, which lists all environment variables and their values that all services have access to.

It also contains the database and Django passwords, which should be changed.

Refer to :ref:`global_env` for more information.
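
Replacement passwords are best generated rather than invented. A minimal sketch using Python's standard ``secrets`` module follows; the function name ``generate_password`` and the chosen length are illustrative, and the names of the environment variables to fill are listed in the reference above.

.. code-block:: python

  import secrets


  def generate_password(num_bytes=32):
      """Return a random URL-safe string built from num_bytes of randomness."""
      return secrets.token_urlsafe(num_bytes)


  print(generate_password())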

Individual service configurations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additionally, most View Server services are configurable using their keys in the values. Refer to :ref:`Individual service configurations <global_component_specific>` for more information.