Operator Setup

To work with the operator and manage your application stacks, you must install the operator into your cluster, grant it sufficient rights, and connect it to the peripheral systems it should use. The operator itself is a Spring Boot based application provided as an OCI image. A Helm chart for the initial installation of the operator into your cluster is provided and can be used to perform the basic setup. The required steps are described in the subsequent sections.

AWS-specific setup instructions

Note that the current version of the operator supports only AWS as an IaaS provider; further providers haven't been implemented yet. For this reason the following setup of the operator is tightly bound to AWS as well. Although it may run in different environments, this hasn't been elaborated and tested yet. Further instructions will follow soon!

1 Provisioning Of Operator AWS Resources & Database

To deploy the operator to your cluster, you'll have to provision a few resources, namely a database and some AWS resources.

1.1 The Operator Database

The operator needs a database for its internal management of the models and further configurations. Here are the requirements to be met:

  • Vendor: PostgreSQL
  • Versions: No special requirement (tested with versions 13+)
  • User & Schema: A normal schema with regular user rights is totally sufficient for the operator to run (see sample setup script below)
Operator Sample Database Setup
CREATE ROLE operator WITH
    LOGIN
    PASSWORD 'DB_PASSWORD' --(1)!
    NOSUPERUSER
    INHERIT
    NOCREATEDB
    NOCREATEROLE
    NOREPLICATION;

CREATE DATABASE pxf_operator WITH
    OWNER = operator
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    CONNECTION LIMIT = -1;
  1. Replace this placeholder with a custom password for your database user. This password will later be put into a secret for the operator itself.
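
To verify the setup you can check that the new role is able to log in and reach the database. A minimal sketch using psql, with YOUR_DB_HOST as a placeholder for your database host:

    # Log in as the operator role and print the connection details:
    PGPASSWORD='DB_PASSWORD' psql -h YOUR_DB_HOST -U operator -d pxf_operator -c '\conninfo'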

1.2 The AWS Resources

In AWS we must configure an IAM role that enables the operator to manage all resource types that are supported by the models. Since this depends on the concrete cluster setup as well as personal preference, the following section describes just one possible setup, which has proved stable for quite a while now.

Furthermore, we require some secrets in the secret store that is linked into the cluster. Again, this is a matter of configuration and personal preference, so we just give an example setup.

1.2.1 The IAM Role

The requirements for the IAM role are listed below. Some custom policies as well as the trust relationship come with sample JSON data.

Required standard policies:

  • AmazonS3FullAccess
  • AmazonSNSFullAccess
  • AmazonSQSFullAccess
  • AWSCloudFormationFullAccess
  • AWSKeyManagementServicePowerUser
  • AWSLambda_FullAccess
Custom policies
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "iam:ListRoleTags",
                    "iam:UntagRole",
                    "iam:TagRole",
                    "iam:DeletePolicy",
                    "iam:CreateRole",
                    "iam:AttachRolePolicy",
                    "iam:PutRolePolicy",
                    "iam:PassRole",
                    "iam:DetachRolePolicy",
                    "iam:DeleteRolePolicy",
                    "iam:ListPolicyTags",
                    "iam:ListRolePolicies",
                    "iam:CreatePolicyVersion",
                    "iam:GetRole",
                    "iam:GetPolicy",
                    "iam:ListEntitiesForPolicy",
                    "iam:DeleteRole",
                    "iam:UpdateRoleDescription",
                    "iam:TagPolicy",
                    "iam:CreatePolicy",
                    "iam:ListPolicyVersions",
                    "iam:UntagPolicy",
                    "iam:UpdateRole",
                    "iam:GetRolePolicy",
                    "iam:DeletePolicyVersion"
                ],
                "Resource": "arn:aws:iam::ACCOUNT_ID:*", //(1)!
                "Effect": "Allow"
            }
        ]
    }
  1. Replace ACCOUNT_ID with the ID of your AWS account!
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "ssm:PutParameter",
                    "ssm:DeleteParameter",
                    "ssm:RemoveTagsFromResource",
                    "ssm:AddTagsToResource",
                    "ssm:ListTagsForResource",
                    "ssm:GetParametersByPath",
                    "ssm:GetParameters",
                    "ssm:GetParameter",
                    "ssm:DeleteParameters"
                ],
                "Resource": "arn:aws:ssm:REGION:ACCOUNT_ID:parameter/*", //(1)!
                "Effect": "Allow"
            },
            {
                "Action": [
                    "ssm:DescribeParameters"
                ],
                "Resource": "*",
                "Effect": "Allow"
            }
        ]
    }
  1. Replace ACCOUNT_ID with the ID of your AWS account and REGION with the target AWS region (e.g. eu-central-1)!
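
How you create the role is up to you (console, CLI or infrastructure as code). The following AWS CLI sketch creates the role with an EKS IRSA trust relationship and attaches one of the standard policies; ROLE_NAME, ACCOUNT_ID, REGION and OIDC_ID are placeholders, and the trust condition assumes the pxf-operator service account in the pxf-operator namespace as set up in section 2.2 below:

    # Trust relationship for EKS IRSA (assumed service account: pxf-operator/pxf-operator)
    cat > trust-policy.json <<'EOF'
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/OIDC_ID"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "oidc.eks.REGION.amazonaws.com/id/OIDC_ID:sub": "system:serviceaccount:pxf-operator:pxf-operator"
                    }
                }
            }
        ]
    }
    EOF
    aws iam create-role --role-name ROLE_NAME --assume-role-policy-document file://trust-policy.json
    # Attach the standard and custom policies, e.g.:
    aws iam attach-role-policy --role-name ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess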

2 Installation Of The Operator Helm Chart

The operator can be installed into the cluster using a Helm chart which is available in this GitHub repository.

Our demo setup uses a Flux CD Helm Release to parameterize and install the chart. Using Flux also makes it very easy to keep the operator application up to date: with the GitOps approach, an update only requires some config changes in the respective Git repository. Our proposed Flux-based demo setup requires a few resources to be created, as described below.

2.1 Kubernetes Cluster Prerequisites

To install the operator Helm chart properly, as in our demo setup, the cluster should be set up to fulfill some simple prerequisites, which are listed below.

2.1.1 External Secrets

To manage secrets properly and securely, Kubernetes offers the concept of Secrets. Although these can be used to store secrets inside the cluster, this is often unwieldy since it potentially exposes secrets to cluster administrators or DevOps engineers. We therefore prefer to integrate a more common secrets management system, which may also be more familiar to your company; this way, secrets management can be separated from actual DevOps tasks. This is where the External Secrets Operator (ESO) comes into play: it integrates external secret management systems into the cluster.

Our demo installation contains the ESO, which is connected to the AWS Secrets Manager as well as the AWS Systems Manager Parameter Store. The operator itself also requires an external secrets manager in order to retrieve and manage secrets for the respective deployments. Currently, we only use the AWS Parameter Store implementation.
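
If ESO is not installed yet, it can be added with its official Helm chart; a minimal sketch using the repository published by the External Secrets project:

    helm repo add external-secrets https://charts.external-secrets.io
    helm install external-secrets external-secrets/external-secrets \
      -n external-secrets --create-namespace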

2.1.2 ExternalDNS

Although not strictly required, ExternalDNS is very handy when you want to fully automate your operator-based deployments without manual DNS host name assignment. Kubernetes ExternalDNS synchronizes exposed Kubernetes Services and Ingresses with DNS providers, which means it can create, modify and remove entries at your DNS provider based on the ingress configurations of your deployments.

Our demo installations use ExternalDNS with slightly different configurations. While the development cluster is configured to fully manage DNS entries, ExternalDNS on the production cluster is set up to only upsert DNS entries; deletion of DNS entries is forbidden there so that production setups cannot be destroyed by accident.
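
In the standard ExternalDNS deployment this behaviour is controlled by its policy flag; the relevant container arguments look like this:

    # Development cluster: full management including deletions
    --policy=sync
    # Production cluster: create and update records only, never delete
    --policy=upsert-only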

2.2 Required Kubernetes Cluster Resources

Kubernetes Cluster Role, Role Binding & Operator Service Account

RBAC Cluster roles contain rules that represent a set of permissions for specific resources. A cluster role itself is a non-namespaced resource.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: pxf-operator-cr #(1)!
    rules: #(2)!
      - apiGroups:
          - helm.toolkit.fluxcd.io
          - kustomize.toolkit.fluxcd.io
          - source.toolkit.fluxcd.io
          - notification.toolkit.fluxcd.io
        resources:
          - '*'
        verbs:
          - create
          - update
          - patch
          - delete
          - get
          - list
          - watch
      - apiGroups:
          - apps
          - batch
        resources:
          - deployments
          - statefulsets
          - daemonsets
          - jobs
        verbs:
          - get
          - list
          - watch
          - create
          - update
          - patch
          - delete
      - apiGroups:
          - external-secrets.io
        resources:
          - '*'
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - namespaces
          - secrets
          - configmaps
        verbs:
          - create
          - delete
          - update
          - patch
          - get
          - list
          - watch
      - apiGroups:
          - apiextensions.k8s.io
        resources:
          - customresourcedefinitions
        verbs:
          - get
          - list
          - watch
  1. Replace the cluster role name by any name that matches your naming scheme.
  2. Further API groups and verbs might become necessary as the operator's capabilities evolve!

A service account is a non-human account that provides a distinct identity in a Kubernetes cluster.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/eksctl-... #(1)!
      name: pxf-operator #(2)!
      namespace: pxf-operator #(3)!
  1. Replace the value with the ARN of the IAM role created for the operator (see section 1.2.1)!
  2. Replace the service account name by any name that matches your naming scheme.
  3. Replace the namespace by the namespace into which you want to deploy the operator.

A cluster role binding grants the cluster-wide resource permissions defined in a cluster role.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: pxf-operator-pxf-operator-pxf-operator-cr #(1)!
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: pxf-operator-cr #(2)!
    subjects:
      - kind: ServiceAccount #(3)!
        name: pxf-operator
        namespace: pxf-operator
  1. Replace the name of the cluster role binding by any name that matches your naming scheme.
  2. Refer to the name of the cluster role here!
  3. Specify your created operator service account as a subject in order to connect the cluster role to this service account. This will assign the required cluster-wide permissions to the operator application.
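
Once the cluster role, service account and binding are applied, the wiring can be verified with kubectl's impersonation check, for example:

    # Should print "yes" if the cluster role binding took effect:
    kubectl auth can-i create helmreleases.helm.toolkit.fluxcd.io \
      --as=system:serviceaccount:pxf-operator:pxf-operator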
Flux CD Kubernetes Resources

Declare a Kubernetes secret which is used to grant access to our Flux state Git repository in which our Operator Helm Release is stored.

    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: pxf-flux-state-deploy-token #(1)!
      namespace: flux-system #(2)!
    data: #(3)!
      identity: LS0...
      identity.pub: ZWN...
      known_hosts: Z2l...
  1. Replace the secret name by any name that matches your naming scheme.
  2. This is a Flux CD specific resource so we place it in the flux-system namespace.
  3. This is Base64-encoded data containing the SSH setup (deploy key, public key and known_hosts).
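
Instead of Base64-encoding the key material by hand, the secret can also be generated with the Flux CLI; a sketch assuming an existing SSH deploy key pair:

    flux create secret git pxf-flux-state-deploy-token \
      --url=ssh://git@github.com/vnrag/pxf-flux-state \
      --private-key-file=./identity \
      --namespace=flux-system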

Our Flux CD Git Repository contains the Helm Release definition for the operator deployment.

    apiVersion: source.toolkit.fluxcd.io/v1
    kind: GitRepository
    metadata:
      name: pxf-flux-state #(1)!
      namespace: flux-system #(2)!
    spec:
      interval: 1m0s
      timeout: 60s
      url: ssh://git@github.com/vnrag/pxf-flux-state #(3)!
      ref:
        branch: cluster-preprod #(4)!
      secretRef:
        name: pxf-flux-state-deploy-token #(5)!
  1. Replace the repository name by any name that matches your naming scheme.
  2. This is a Flux CD specific resource so we place it in the flux-system namespace.
  3. Please replace this URL with your specific git repository in which you are managing your deployment configuration.
  4. Defines the branch to be checked out (might also refer to tags, ...)
  5. Please refer to the secret containing the git access details.
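
After applying the resource you can check that Flux is able to fetch the repository:

    # Shows readiness and the last fetched revision of the Git source:
    flux get sources git pxf-flux-state -n flux-system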

A Flux CD Kustomization is a custom resource and the counterpart of Kustomize's kustomization.yaml config file in the repository we refer to.

    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: pxf-flux-state-read #(1)!
      namespace: flux-system #(2)!
    spec:
      force: false
      interval: 1m
      timeout: 3m0s
      wait: true
      prune: true
      suspend: false
      sourceRef: #(3)!
        kind: GitRepository
        name: pxf-flux-state
  1. Replace the name of the kustomization by any name that matches your naming scheme.
  2. This is a Flux CD specific resource so we place it in the flux-system namespace.
  3. Refer to the Flux state Git repository that manages your operator Helm release.
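
A manual reconciliation is useful to test the setup without waiting for the configured interval:

    # Fetch the Git source and apply the kustomization once:
    flux reconcile kustomization pxf-flux-state-read -n flux-system --with-source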

2.3 The Helm Release Definition - A Git Repository

To deploy the operator Helm chart into our cluster, Flux CD needs a Helm Release together with the operator Helm repository. In our sample setup these are managed in a Git repository, which we call the Flux state repository; it is pulled automatically and regularly by Flux to keep the deployed resources in sync with our intended deployment specification. The relevant part of our repository is the folder pxf-operator and consists of the following:

Flux State Repository Contents

Declare a Kubernetes secret which is used to access the OCI registry that provides the operator Helm charts.

    apiVersion: v1
    kind: Secret
    type: kubernetes.io/dockerconfigjson
    metadata:
      name: operator-github-registry #(1)!
      namespace: flux-system #(2)!
    data:
      .dockerconfigjson: ey... #(3)!
  1. Replace the secret name by any name that matches your naming scheme.
  2. This is a Flux CD specific resource so we place it in the flux-system namespace.
  3. This is a Base64-encoded string that contains the Docker configuration which authorizes access to our registry. Decoded, it looks like this: {"auths":{"ghcr.io":{"username":"vnr-private","password":"ghp_...","auth":"dm5y...hVag=="}}}. It can be created using a command like: kubectl create secret docker-registry my-oci-secret --docker-server=REGISTRY_URL --docker-username=USERNAME --docker-password=PASSWORD --docker-email=EMAIL -n my-namespace

This Helm repository definition points to the OCI registry that provides our operator Helm charts.

    apiVersion: source.toolkit.fluxcd.io/v1
    kind: HelmRepository
    metadata:
      name: pxf-operator #(1)!
      namespace: flux-system #(2)!
    spec:
      interval: 30s
      secretRef:
        name: operator-github-registry #(3)!
      type: oci
      url: oci://ghcr.io/vnrag/pxf-operator-helm #(4)!
  1. Replace the repository name by any name that matches your naming scheme.
  2. This is a Flux CD specific resource so we place it in the flux-system namespace.
  3. Please refer to the secret we created prior!
  4. This is the URL of the operator Helm repository and shouldn't be changed.
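
Access to the registry can also be verified independently of Flux; a sketch using the Helm CLI, assuming the chart is resolvable under the repository URL plus the chart name:

    # Log in to the registry and inspect the chart metadata (replace the credentials):
    helm registry login ghcr.io -u USERNAME -p PASSWORD
    helm show chart oci://ghcr.io/vnrag/pxf-operator-helm/pxf-operator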

The actual Helm Release definition which parameterizes the operator Helm chart for deployment.

    apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    metadata:
      name: pxf-operator #(1)!
      namespace: flux-system #(2)!
    spec: #(3)!
      interval: 30s
      targetNamespace: pxf-operator
      chart:
        spec:
          chart: pxf-operator
          version: '>=0.3.0 <1.0.0'
          interval: 30s
          reconcileStrategy: ChartVersion
          sourceRef:
            kind: HelmRepository
            name: pxf-operator
            namespace: flux-system
      valuesFrom: #(4)!
        - kind: ConfigMap
          name: pxf-operator-values
  1. Replace the release name by any name that matches your naming scheme.
  2. This is a Flux CD specific resource so we place it in the flux-system namespace.
  3. Please refer to the Helm Release docs for further information!
  4. Defines that the actual configuration values for the operator deployment are taken from a ConfigMap.
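
Once everything is committed to the Flux state repository, the release status can be followed with the Flux CLI:

    # Shows the release, its chart version and readiness:
    flux get helmreleases -n flux-system
    # Triggers an immediate reconciliation including the chart source:
    flux reconcile helmrelease pxf-operator -n flux-system --with-source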

This kustomization.yaml in the repository is the counterpart of the Flux CD Kustomization that we already deployed into the cluster, which defines the repository scan. This kustomization is deployment specific and instructs Kustomize which descriptors belong to the application and how they should be processed.

    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    namespace: flux-system #(1)!
    resources: #(2)!
      - HelmRepository.yaml
      - HelmRelease.yaml
      - HelmRepositorySecret.yaml
    configMapGenerator: #(3)!
      - name: pxf-operator-values
        files:
          - values.yaml=values.yaml
    configurations: #(4)!
      - kustomizeconfig.yaml
  1. This is a Flux CD specific resource so we place it in the flux-system namespace.
  2. Defines the resources that belong to the deployment.
  3. Here we instruct Kustomize to turn the application config values into config maps.
  4. Loads a special configuration for the processing of config values.
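
The folder can be rendered locally to sanity-check the generated ConfigMap and the patched valuesFrom reference before committing:

    # Renders all resources including the generated ConfigMap (run inside the pxf-operator folder):
    kubectl kustomize .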

This config (the kustomizeconfig.yaml referenced above) tells Kustomize that the name field under spec/valuesFrom inside the HelmRelease should be treated as a reference to our ConfigMap, so the name suffix generated by the configMapGenerator is propagated correctly.

    nameReference:
      - kind: ConfigMap
        version: v1
        fieldSpecs:
          - path: spec/valuesFrom/name
            kind: HelmRelease

This is the main deployment configuration file, which carries the actual environment-specific configuration values for the operator deployment. It is separated into several parts, which are described briefly in the respective annotations.

    image:
      tag: 1.5.15 #(1)!

    securityContext:
      runAsUser: 10000 #(2)!

    envs: #(3)!
      - name: SPRING_PROFILES_ACTIVE
        value: "api,dev"
      - name: SPRING_SECURITY_OAUTH2_RESOURCESERVER_JWT_PUBLICKEYLOCATION
        value: "classpath:pxuser/preprod.pub"
      - name: DB_URL
        value: "jdbc:postgresql://fulfillmentv2.rds.cloud.internal:5432/pxf_operator"
      - name: DB_NAME
        value: "pxf_operator"
      - name: DB_SCHEMA_NAME
        value: "pxf_operator"
      - name: DB_USERNAME
        value: "default_operator"
      - name: PLUGINS_MULTITENANCY_TENANTS_0
        value: "PLX"
      - name: PLUGINS_PXUSER_JWT_TOKEN_URL
        value: "https://user.api.preprod.pl-x.cloud"
      - name: PLUGINS_PXUSER_REST_API_AUTH_CLIENTID
        value: "preprod-user-pxf"
      - name: PLUGINS_PXUSER_REST_CLIENTS_PERMISSIONS_URL
        value: "https://roles-permissions.api.preprod.pl-x.cloud"
      - name: OPERATOR_ADAPTERS_IAAS_OPERATORSECRETS_PROVIDER
        value: "AWS"
      - name: OPERATOR_ADAPTERS_IAAS_OPERATORSECRETS_STORE
        value: "SSM"
      - name: OPERATOR_ADAPTERS_IAAS_AWS_ENABLED
        value: "true"
      - name: OPERATOR_ADAPTERS_IAAS_AWS_OIDC_PROVIDER_URL
        value: "oidc.eks.eu-central-1.amazonaws.com/id/84CE1395A6BC1D213C51DC77F6FCA225"
      - name: OPERATOR_ADAPTERS_IAAS_AWS_CLOUDFORMATION_WATCHTIMEOUTMINUTES
        value: "15"
      - name: OPERATOR_SILO_UPDATE_WAITSECONDSBEFORERELOAD
        value: "15"
      - name: OPERATOR_ADAPTERS_K8S_FLUX_DEPLOYMENTTIMEOUTMINUTES
        value: "15"

    secrets: #(4)!
      - name: DB_PASSWORD
        key: /preprod/pxf-operator/DB_PASSWORD
      - name: PLUGINS_PXUSER_REST_API_AUTH_CLIENTSECRET
        key: /preprod/pxf-operator/PLUGINS_PXUSER_REST_API_AUTH_CLIENTSECRET

    serviceAccount: #(5)!
      name: pxf-operator

    ingress: #(6)!
      enabled: true
      className: traefik
      annotations:
        kubernetes.io/tls-acme: 'true'
        traefik.ingress.kubernetes.io/router.tls: 'true'
        traefik.ingress.kubernetes.io/router.entrypoints: websecure
      hosts:
        - host: operator.preprod.pxf.aws-vnr.de
          paths:
            - path: /
              pathType: Prefix
      tls:
        - secretName: operator-preprod-pxf-aws-vnr-de
          hosts:
            - operator.preprod.pxf.aws-vnr.de
  1. The image tag version of the operator OCI image.
  2. Keep this value as is! This is the ID of the user running the application inside the container and is required to configure the security context properly.
  3. Under envs you can specify environment variables that shall be passed to the operator via its Kubernetes ConfigMap. Note that you can pass any variable, but only some are known to and processed by the operator application. Please refer to the operator Helm chart repository for further information.
  4. Here you define mappings from environment variables to keys in your external secrets implementation. Since the Helm chart creates an External Secrets configuration, your cluster will also require such a connector (see section 2.1.1).
  5. Here you can connect the operator deployment with the service account we deployed to the cluster up-front. This provides the deployment with some crucial permissions for managing cluster resources as well as external resources in your managed cloud provider accounts.
  6. Use this section to configure the ingress implementation you want to use to make your installation accessible from outside the cluster.
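
After the Helm release has been reconciled, a quick smoke test is to check the operator pod and its startup log; this sketch assumes the deployment is named pxf-operator like the chart:

    kubectl get pods -n pxf-operator
    # Follow the Spring Boot startup log of the operator:
    kubectl logs -n pxf-operator deployment/pxf-operator -f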