How-to

Ansible environment

There are two different environments used to run Ansible:

  • AWX from which all Ansible playbooks are run

  • molecule to run tests locally or in gitlab-ci

The Ansible reference version is the one that ships with AWX; for example, AWX 14.0.0 comes with Ansible 2.9.11.

Important

Both environments should stay synchronised. When updating AWX, the molecule environment shall be updated as well, and vice versa. When adding an external dependency to the molecule environment, it shall also be added to AWX if not already present.

AWX environment

The Ansible environment used by AWX is a Python virtual environment that is part of the awx Docker image. It’s installed under /var/lib/awx/venv/ansible. To add extra requirements to this environment, update the awx Dockerfile. Tag the repository to release a new image. See AWX deployment.

Molecule environment

The environment used for molecule tests is a conda environment defined in the conda-environments repository. To update this environment, update the molecule_env.yml file. Tagging this repository will automatically update all gitlab-runners. Developers have to update their local environment manually using that file.
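As a sketch, molecule_env.yml is a standard conda environment definition; the package names and versions below are purely illustrative, not the actual file contents:

```yaml
# molecule_env.yml -- illustrative only, not the real file contents
name: molecule
channels:
  - conda-forge
dependencies:
  - python=3.8
  - ansible=2.9.11   # keep in sync with the Ansible version shipped with AWX
  - molecule
  - docker-py
```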

EPICS Archiver Appliance

Updating the Archiver Appliance

Updating tomcat

Tomcat is installed manually because installing it via yum would also pull in Java, and we want to manage Java via a separate Ansible role.

  • Download the new version of tomcat from https://tomcat.apache.org: apache-tomcat-<version>.tar.gz

  • Upload it to Artifactory under https://artifactory.esss.lu.se/artifactory/swi-pkg/apache/tomcat

  • Update the epicsarchiverap_tomcat_version variable under vars/main.yml in the ics-ans-role-epicsarchiverap role

  • Sync the following tomcat configuration templates in the role with the files from the archive (under the conf directory):

    • context.xml.j2

    • server.xml.j2

    • tomcat-users.xml.j2

    • web.xml.j2

  • Check the list of variables in the bin/catalina.sh script from the archive. Update the variables defined in the setenv.sh.j2 template from the role if required.
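The version bump itself is a one-line change in the role (the version number below is illustrative):

```yaml
# ics-ans-role-epicsarchiverap/vars/main.yml
epicsarchiverap_tomcat_version: 9.0.41  # illustrative value
```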

EPICS Controls VLAN

When a new Controls VLAN is created (to host IOCs), there are several things to update:

  • Archiver Appliance: the VLAN broadcast address shall be added to one archiver (epicsarchiverap_epics_ca_addr_list variable)

  • Alarm Server: the VLAN broadcast address shall be added to the alarm server (epics_ca_addr_list variable)

  • Channel Finder: a new RecCeiver shall be deployed on the VLAN to send data to the Channel Finder. See the recceiver group.

  • EPICS Gateways: configure existing gateways to access this network. Create a new one dedicated to this network if required.

  • LCR workstations (TN only): add the VLAN broadcast address to the epics_ca_addr_list variable in the local_control_room group or one of its subgroups
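As an illustration (the addresses below are made up, and the exact format of each variable is defined by the corresponding role), adding a new VLAN typically means extending these list variables:

```yaml
# group_vars sketch -- addresses are illustrative
epicsarchiverap_epics_ca_addr_list: "172.30.1.255 172.30.2.255 172.30.3.255"
epics_ca_addr_list: "172.30.1.255 172.30.2.255 172.30.3.255"
```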

EPICS Gateways

Channel Access Gateway

To deploy a new EPICS CA Gateway, the following variables are required:

  • epics_cagateway_client_broadcast

  • epics_cagateway_write: set to false for read-only GW, all for read-write

See ics-ans-role-epics-cagateway for more information.

For ro-epics-gw-tn and rw-epics-gw-tn, epics_cagateway_client_broadcast is set to epics_addr_list_broadcast_tn which is defined in the all group.
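A minimal host_vars sketch for a new read-only CA gateway (addresses are illustrative; see the role documentation for the exact semantics):

```yaml
# host_vars for a hypothetical CA gateway
epics_cagateway_client_broadcast: 172.30.1.255
epics_cagateway_write: false   # false = read-only, all = read-write
```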

PV Access Gateway

To deploy a new EPICS PVA Gateway, the following variables are required:

  • pva_gateway_clients

  • pva_gateway_readonly: set to true or false

  • pva_gateway_servers

  • pva_gateway_static_routes: list of static routes to access the clients (if required)

See ics-ans-role-pva-gateway for more information.

For ro-epics-gw-tn and rw-epics-gw-tn, the variables epics_pva_gateway_clients_tn, epics_pva_gateway_server_clients_tn and epics_pva_gateway_static_routes_ieg are defined in the all group.
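A minimal host_vars sketch for a new PVA gateway (all values are illustrative; the exact format of each variable is defined by ics-ans-role-pva-gateway):

```yaml
# host_vars for a hypothetical PVA gateway
pva_gateway_clients: 172.30.1.255
pva_gateway_servers: 172.30.0.10
pva_gateway_readonly: true
pva_gateway_static_routes:
  - 172.30.1.0/24 via 172.30.0.1
```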

ESS Notify

ESS Notify is an application to send notifications to mobile phones. It includes a server that acts as a proxy and a mobile client.

ESS Notify client

The iOS client source can be found here: https://gitlab.esss.lu.se/ics-software/ess-notify

ESS Notify server

ess-notify-server is a Python web application built with FastAPI. The API is exposed via Swagger UI: https://notify.esss.lu.se/docs

The application doesn’t store any passwords. Users are authenticated against LDAP. On successful login, a JWT is created and must be sent in the Authorization header with any further request. The token expiry is set to 30 days by default; the value can be changed using the ACCESS_TOKEN_EXPIRE_MINUTES variable defined in the ess_notify_servers group.
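In practice this means subsequent calls carry the token (the endpoint placeholder below is illustrative; the real paths are listed in the Swagger UI):

```shell
# <token> is the JWT returned at login; <endpoint> is a placeholder
curl -H "Authorization: Bearer <token>" "https://notify.esss.lu.se/api/v1/<endpoint>"
```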

Administrators define a list of services using the API (POST /api/v1/services/). Each service includes:

  • id: UUID (Universally unique identifier)

  • category: name of the service

  • color: color of the service used to display notifications in the client

  • owner: responsible for the service

Users subscribe to the desired service from the client application.

A new notification can be sent using the POST /api/v1/services/{service_id}/notifications endpoint. It can be done by any application (logbook, alarm server). This doesn’t require any authentication (only the service id is needed), but a check is performed on the IP address. The request must come from an IP address within the networks listed in the ALLOWED_NETWORKS variable defined in the ess_notify_servers group.

A notification includes:

  • title: title of the notification

  • subtitle: extra text

  • url: optional url to redirect to in the client

When a notification is received by the server, a push is sent to all users who subscribed to the service.

Note that rate limiting is enabled to prevent a flood of notifications from being sent. This is achieved using traefik. Default values are configured in the role. See https://doc.traefik.io/traefik/v1.6/configuration/commons/#rate-limiting and https://doc.traefik.io/traefik/v1.6/configuration/backends/docker/#on-containers for more information.

ess-pynotify

ess-pynotify is a Python library to send notifications via the ESS Notify Server. It can be used both as a library and as a CLI tool. Refer to the README.md.

Sending a notification only requires a POST to the /api/v1/services/{service_id}/notifications endpoint. It can easily be done using curl. This library is only provided as a convenience and for easier integration.
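For example, a notification could be sent with a plain curl call like the following (the service id is a placeholder and the JSON fields follow the list above; check the Swagger UI for the exact request schema):

```shell
# <service_id> is a placeholder UUID for an existing service
curl -X POST "https://notify.esss.lu.se/api/v1/services/<service_id>/notifications" \
  -H "Content-Type: application/json" \
  -d '{"title": "Disk almost full", "subtitle": "host01 /var at 95%", "url": "https://example.esss.lu.se"}'
```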

Hashicorp vault [WIP]

Attention

Still a work in progress: the procedures and standards have not been fully decided yet.

Hashicorp vault secures, stores, and tightly controls access to tokens, passwords, certificates, and API keys.

You can find our instance at hashicorp-vault.tn.esss.lu.se:8200

How to move CSEntry variables to Gitlab

  1. Move variables to GitLab

  2. Delete/clean variables from CSEntry

  3. Test the migration

Move variables to GitLab

Each Ansible group usually has a corresponding playbook; that playbook is where we migrate the variables to.

The corresponding variables found in the CSEntry ansible group and host variables can be copied directly to group_vars/group_name.yml and the respective host_vars/host_name.yml.

If these files don’t exist, create them. See the folder structure below for an example:

├── group_vars
│   └── group_name.yml
├── host_vars
│   ├── host_name1.yml
│   └── host_name2.yml
.
.
.
.
└── molecule
    └── default
        └── molecule.yml

The variable precedence from highest to lowest in our workflow:

  • extra_vars in awx

  • host_vars in playbook

  • csentry host_vars

  • group_vars in playbook

  • csentry group_vars

  • vars in defaults/main.yml

Caution

Please check all hosts in CSEntry, and also the playbook itself, for legacy variables that need to be moved.

Don’t forget the technicalnetwork, labnetworks and all groups, which may also contain related variables.

There might be several other parent/child groups related to the ansible group whose variables we need to migrate.

Delete CSEntry variables

After all variables have been migrated, please delete the variable entries in CSEntry.

Test the migration

The next step is to test the migrated variables: first with Molecule, then against the real hosts. Preferably run it in AWX in check mode, which doesn’t apply any changes to the hosts and only shows whether anything would change. If there are no changes, it can then be run in run mode.
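Outside AWX, the same dry run can be done from the command line with Ansible’s built-in flags:

```shell
# --check reports what would change without applying anything;
# --diff shows the actual file/template differences
ansible-playbook -i inventory playbook.yml --check --diff
```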

If you are using the hashi_vault lookup for vaulted secret variables, you will need to configure the token for the job template. See this section.

Using hashicorp vaulted secrets

Hint

For variables that are vaulted and need to be moved to Hashicorp vault, please see this section for how to decrypt them.

Here is an example of what a Hashicorp vaulted variable lookup looks like:

test_postgres_user: "{{ lookup('hashi_vault', 'secret=secret/data/tn/test-app/postgres:data')['user'] }}"
test_postgres_password: "{{ lookup('hashi_vault', 'secret=secret/data/tn/test-app/postgres:data')['password'] }}"

These can be added to group_vars and host_vars, but they might cause an issue when testing with molecule. We therefore need to override the vaulted variables in molecule.yml like this:

# molecule/default/molecule.yml
provisioner:
  name: ansible
  inventory:
    group_vars:
      group_name:
        test_postgres_password: ""

Molecule will then use these values instead.

Here is a more extensive example of how the variables may look:

provisioner:
  name: ansible
  inventory:
    group_vars:
      group_name:
        test_app:
          test_edition: sigma
          test_version: 1.2.4
    host_vars:
      host_name1:
        test_variablelist:
          - list1
          - list2
        test_variable:
          test: "hey"
      host_name2:
        test_edition: alpha

The variable precedence from highest to lowest with molecule:

  • host_vars in molecule.yml

  • host_vars in playbook

  • group_vars in molecule.yml

  • group_vars in playbook

  • vars in role defaults/main.yml

Creating and using Hashicorp vault secrets

We can manually add secrets in Hashicorp vault at hashicorp-vault.tn.esss.lu.se:8200

From the Secret tab we can access our “secret” secret engine and create a new secret with the “Create secret” button. The secret is then added at the specified path with its key:value pairs.

For example, if we look at the ics-ans-olog-es playbook, one of the variables shows how the lookup for a secret is written:

ldap.manager.password: "{{ lookup('hashi_vault', 'secret=secret/data/tn/olog-es/ldap:data')['password'] }}"

Let’s look further into what’s written: 'secret=secret/data/tn/olog-es/ldap:data')['password']

The first part of the path, secret, is our secret engine, and data refers to the content inside it. tn is the network domain. olog-es is our application/group name. ldap is the category we created to store these values (it could be database, token or anything else). The :data suffix then selects the data inside ldap.

Lastly, the key of the key:value pair goes in []. In this case password; we could also store user or any other pair here.

For example

'secret=secret/data/tn/olog-es/postgres:data')['user'] and 'secret=secret/data/tn/olog-es/postgres:data')['password'].
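The same secret can also be read with the vault CLI. Note that, assuming the secret engine is KV version 2, the data segment appears only in the lookup path, not on the command line:

```shell
# Read a single field from the olog-es ldap secret
vault kv get -field=password secret/tn/olog-es/ldap
```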

The path can be written however one wants, but for consistency we decided on this convention.

At the moment our secrets are organised by network zone (for example TN, GPN, CSlab) and then by application/group name. This might change in the future.

Note: as a best practice, we should keep production and staging secrets separate, and create new ones if they are not.

For more examples of how everything is set up, please see these roles and their corresponding vaulted secret variables.

Certificate role [WIP]

If our playbook uses the ics-ans-role-certificate role, we need to include the following in our group_vars to be able to make a certificate signing request.

Add these to group_vars/group_name.yml:

certificate_adcs_username: "{{ lookup('hashi_vault', 'secret=secret/data/gpn/adcs/cert-request-user:data')['user'] }}"
certificate_adcs_password: "{{ lookup('hashi_vault', 'secret=secret/data/gpn/adcs/cert-request-user:data')['password'] }}"

If the host is not on “tn.esss.lu.se” or “cslab.esss.lu.se”, we need to include the following in the host_vars to fetch the correct certificate.

Add these to host_vars/host_name.yml:

certificate_custom_key: "{{ lookup('hashi_vault', 'secret=secret/data/gpn/certificate/esss.lu.se:data')['key'] }}"
certificate_custom_certificate: "{{ lookup('hashi_vault', 'secret=secret/data/gpn/certificate/esss.lu.se:data')['certificate'] }}"
certificate_custom_certificate_chain: "{{ lookup('hashi_vault', 'secret=secret/data/gpn/certificate/esss.lu.se:data')['certificate-chain'] }}"

Using Hashicorp vault token in AWX

We need the correct access rights to look up our secret variables in Hashicorp vault. This can be done with a small modification to the job template in AWX; otherwise you might get errors like the one shown in the FAQ during a run.

Decrypting secret variables

Hint

The ansible vaulted variable can usually be found in bitwarden.

To decrypt an ansible vaulted variable we would need a vault password which can be found in bitwarden under Ansible vault password.

If you haven’t set ANSIBLE_VAULT_PASSWORD_FILE, you will need to enter the vault password manually when the decrypt process prompts you.
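For example, using a password file so ansible-vault stops prompting (the file path is just a convention):

```shell
# Store the vault password once and point ansible-vault at it
echo 'the-vault-password' > ~/.ansible_vault_pass
chmod 600 ~/.ansible_vault_pass
export ANSIBLE_VAULT_PASSWORD_FILE=~/.ansible_vault_pass
ansible-vault decrypt secrets.yml
```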

We will also need to change the format of the text to something like the examples below.

$ ansible-vault decrypt
- sample output -
Vault password: 
Reading ciphertext input from stdin
$ANSIBLE_VAULT;1.1;AES256
39656637333832303638363264393033653433346634356438316636643964666332373630356564
3461316164666139626630343930376233363832653064310a303631306235333365666463393834
38643330306534333065323033643838303338386664353637653131346530623836393366346430
3565663666663539370a623665353663383563643961633761303932616564313662663066623831
3064 #Press Enter once then Ctrl + D
Decryption successful
test

This will also work, without newlines between the hashed numbers.

$ ansible-vault decrypt
- sample output -
Vault password: 
Reading ciphertext input from stdin
$ANSIBLE_VAULT;1.1;AES256
396566373338323036383632643930336534333466343564383166366439646663323736303565643461316164666139626630343930376233363832653064310a303631306235333365666463393834386433303065343330653230336438383033383866643536376531313465306238363933663464303565663666663539370a6236653536633835636439616337613039326165643136626630666238313064 #Press Enter once then Ctrl + D
Decryption successful
test

Important

There needs to be a newline between $ANSIBLE_VAULT;1.1;AES256 and the ciphertext numbers. Also note that there are no single quotation marks around the numbers.

FAQ

I get this error during a run:

{
  "msg": "An unhandled exception occurred while templating '{{ lookup('hashi_vault', 'secret=secret/data/gpn/adcs/cert-request-user:data')['user'] }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while running the lookup plugin 'hashi_vault'. Error was a <class 'ansible.errors.AnsibleError'>, original message: No Vault Token specified",
  "_ansible_no_log": false
}

See this section.


Monitoring using CLI tools

alerta is the unified command-line tool, terminal GUI and Python SDK for the alerta monitoring system.

It can be used to send, query, tag, change the status of, and delete alerts, as well as dump historical data or view raw alert data. It can also be used to send heartbeats to the alerta server, and to generate alerts based on missing or slow heartbeats.

The client tool can be installed through the alerta.yml located in the conda-environments repository.

Set up the configuration file ~/.alerta.conf:

[DEFAULT]
timezone = Europe/Stockholm
output = json
profile = production

[profile production]
endpoint = https://<ALERTA URL>/api
key = <API-KEY>
sslverify = no
timeout = 10.0
debug = no

Set the following environment variables:

export ALERTA_CONF_FILE=~/.alerta.conf
export ALERTA_DEFAULT_PROFILE=production

echo "export ALERTA_CONF_FILE=~/.alerta.conf" >> ~/.bashrc
echo "export ALERTA_DEFAULT_PROFILE=production" >> ~/.bashrc
source ~/.bashrc

To display alerts in a “top”-like UNIX output format, run alerta top.
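Other common invocations, with flags as documented in the alerta CLI help (the resource and event names are made up):

```shell
# Raise an alert
alerta send -r web01 -e NodeDown -E Production -S Website -s major -t "Web server is down"
# Send a heartbeat with a 120 second timeout
alerta heartbeat --timeout 120
```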

See the alerta CLI how-to guide.

GitLab Runners

The GitLab runners provided by CSI are shared runners. Tags are used to allocate a job to a runner. Anyone can use a runner by specifying the proper tag. Note that in some cases, runners can be restricted to a specific project or group; this would make sense for molecule runners, for example.

To add a new GitLab runner, create a new VM and add it to the gitlab_runner group. By default, two executors are deployed as defined by the gitlab_runner_to_register variable in that group.

  • a docker executor with the docker tag

  • a shell executor with the molecule and packer tags

If you want to deploy something else, you can overwrite this variable at the host level. See gitlab-runner01 for example that was only defined with the docker executor and the xilinx tag.
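A host-level override might look like the sketch below; the exact structure of gitlab_runner_to_register is defined by the role, so treat the keys here as hypothetical:

```yaml
# host_vars sketch -- key names are hypothetical
gitlab_runner_to_register:
  - executor: docker
    tags:
      - docker
      - xilinx
```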

Warning

The current ics-ans-role-gitlab-runner Ansible role only registers a runner if it is not already registered. Modifying the runner configuration won’t trigger any update: the runner has to be unregistered, or the configuration modified manually.

The gitlab-runner configuration is stored under /etc/gitlab-runner/config.toml. See GitLab Runner commands and GitLab Runner deployment for more information.

Local Control Room (LCR)

The main Ansible group for the LCR workstations is the local_control_room group, where global settings are defined. Each LCR workstation should be part of that group or a subgroup. The workstations are divided into several subgroups:

Those subgroups were created to allow different settings on the workstations, especially the epics_ca_addr_list variable to give access to different networks. They have also been used to deploy different versions of CS-Studio (upgrade some subgroups first).

Note that global settings for OpenXAL or Phoebus shall be defined in the application group: openxal or phoebus. The local_control_room group is a child of those groups.

The LCR workstations are deployed using several playbooks:

Each template can be run on a specific workstation, the full local_control_room group or a subgroup.

Moving a service from GPN to the DMZ

Moving a service requires creating a new VM and deploying it from scratch using Ansible. If the service uses a database or local data, a backup has to be restored.

Note

DNS on the esss.lu.se domain is managed by IT; there is no link with CSEntry. Updating hosts in CSEntry on the InitialOP-DMZ or pr-srv-esss-lu-se networks has no impact on DNS.

The following describes the steps performed to move the csentry-test server as an example:

  • Register the new host in CSEntry: https://csentry.esss.lu.se/network/hosts/view/csentry-test Note that the old host could be moved to the new network, but a random MAC would have to be generated. It’s easier to delete the old interface and create a new one, or to create a new host (if the same hostname is reused, the previous entry has to be deleted first). In this case csentry-test was a cname that was deleted before registering the new host.

  • Trigger the new VM creation from CSEntry

  • Deploy the new VM. If a static inventory is used, update it and specify the new IP using the ansible_host variable. This is done automatically when using the CSEntry inventory (the DNS doesn’t need to be up-to-date).

  • Stop the old service. For CSEntry, stopping traefik is enough: sudo docker stop traefik_proxy

  • Run a backup to get the latest state. For CSEntry: sudo /usr/local/sbin/dump-db

  • Copy the result /dumps/csentry_db-20201126-1426.sql.gz to the new machine

  • Restore the backup on the new machine. For CSEntry: sudo /usr/local/sbin/restore-db /tmp/csentry_db-20201126-1426.sql.gz

  • After restoring the backup, some extra actions might have to be performed. For CSEntry, the elasticsearch database needs to be synchronized with the data restored in postgres. Run: sudo docker exec csentry_web flask reindex. Or stop the csentry_web container and re-launch the playbook for the handler to be triggered.

  • Ask IT to update the DNS

  • The new service is up and running!

  • Create a SNOW ticket to delete the old GPN VM

Python deployment

There are many ways to deploy Python applications/tools.

OS package

If the tool is available as a RPM, this is the easiest way to deploy it. Use yum in that case. This solution also works to deploy a simple Python script with dependencies available via the OS. An example is the slackalarm Python script. See the playbook.

Pex / Shiv

For command line tools that are only available from PyPI or require recent dependencies, pex and shiv are good options. They both build fully self-contained Python zipapps with all their dependencies included.

shiv requires Python >= 3.6. This is the recommended utility for Python 3 compatible tools. If Python 2 is required, use pex.
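As a sketch, building a zipapp with shiv is a single command (mytool is a placeholder package name providing a console script of the same name):

```shell
# Bundle mytool and its PyPI dependencies into one self-contained executable
shiv -c mytool -o mytool.pyz mytool
./mytool.pyz --help
```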

ansible-galaxy is packaged with pex for inclusion in the Development Machine (an RPM wasn’t available at the time). See https://gitlab.esss.lu.se/ics-infrastructure/ansible-pex.

Docker images are available to easily create a GitLab CI pipeline based on these utilities. See awx-shiv as an example.

Docker

Docker is a good alternative to run larger applications. This is the solution used for CSEntry or galaxy-bot.

This is also a solution used for tools required by GitLab CI. The awxkit image is used by many pipelines to trigger AWX jobs.

Conda environment

To install multiple libraries or tools that require non-Python dependencies (like epics-base), an alternative is to use a conda environment. Conda lets us choose the Python version and install extra requirements.

There are two Ansible modules to manage conda environments:

This solution is used for example to deploy:

  • epics-base in the LCR

  • EPICS CA and PVA Gateways