cljdoc Operations

Audience

You are on the cljdoc ops team or want to learn about cljdoc infrastructure.

This config is specific to cljdoc production. If you want to bring up your own cljdoc-like service, you can use our config as a basis, but you’ll need to make changes.

Prerequisites

Secrets

Exoscale Creds

Terraform access to Exoscale is configured in ops/exoscale/infrastructure/secrets.tfvars:

exoscale_api_key = <your key here>
exoscale_api_secret = <your value here>
Important
Protect this file; it contains secrets. It’s a good idea to chmod 600 secrets.tfvars.

Packer can read this file format but is picky about the filename, so we’ll create a soft link to satisfy it:

cd ops/exoscale/image
ln -s ../infrastructure/secrets.tfvars ./secrets.pkrvars.hcl
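To sanity-check the layout, here’s a sandboxed dry run of the chmod + symlink steps above, using a temp directory and a dummy key value (the dummy content is illustrative only):

```shell
# Sandboxed dry run of the secrets layout (dummy key; illustrative only)
tmp=$(mktemp -d)
mkdir -p "$tmp/infrastructure" "$tmp/image"
printf 'exoscale_api_key = "dummy"\n' > "$tmp/infrastructure/secrets.tfvars"
chmod 600 "$tmp/infrastructure/secrets.tfvars"              # protect the secrets file
cd "$tmp/image"
ln -s ../infrastructure/secrets.tfvars secrets.pkrvars.hcl  # filename Packer expects
cat secrets.pkrvars.hcl                                     # readable through the link
```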

Authorized SSH Keys

Authorized keys are stored in a non-version controlled tfvars file.

The structure is:

base_authorized_key = "<base pub key here>"

additional_authorized_keys = {
  descriptive-key1-here = "<additional pub key 1 here>",
  descriptive-key2-here = "<additional pub key 2 here>"
}
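For reference, hypothetical Terraform variable declarations matching this tfvars structure might look like the following (the names mirror the file above; the real declarations live in the terraform config and may differ):

```hcl
variable "base_authorized_key" {
  description = "Primary SSH public key authorized on the cljdoc host"
  type        = string
}

variable "additional_authorized_keys" {
  description = "Map of descriptive name -> additional SSH public key"
  type        = map(string)
  default     = {}
}
```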

Terraform might suggest that changes can be applied in place, but you’ll need to taint the compute instance and then apply for the changes to take effect. Yes, this means we can’t currently change authorized SSH keys without recreating the cljdoc compute instance.

This file will be shared securely, as needed, with cljdoc ops team members.

Software

You’ll need the following software installed:

  • Packer to create our server host image

  • Terraform to create and manage our infrastructure

Optionally install:

Overview

Here’s what cljdoc looks like from an ops perspective.

(Diagram: ops overview, source file ops overview.drawio)

Philosophy for Changes

To stay sane, we want to avoid making any infrastructure changes directly to prod. All changes should be carried out with Packer, Terraform, and automatic deployments carried out by our CircleCI job.

Cljdoc Host Image (Packer)

The cljdoc host image is our single node image that will host all of our compute infrastructure.

We use Packer to create machine images for the cljdoc host compute image. Sources are under ./exoscale/image/. The images are based on an Exoscale Debian template. The following software is installed:

  • Docker - hosts our traefik load balancer and cljdoc server

  • Nomad - orchestrates deployment of cljdoc and traefik

  • Consul - used by traefik and nomad for config and discovery

Of interest:

Creating a Cljdoc Host Image (Packer)

Tip
Packer refers to secrets you set up in Secrets.

Change to the appropriate dir:

cd ops/exoscale/image

Optionally validate:

packer validate -var-file=secrets.pkrvars.hcl debian-cljdoc.pkr.hcl

And finally, build on Exoscale:

packer build -var-file=secrets.pkrvars.hcl debian-cljdoc.pkr.hcl

This will create a new image template named debian-cljdoc-YYYYMMDD-HHmm on Exoscale. You will need to explicitly reference this image by this name from terraform’s main.tf.
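The timestamped name can be reproduced with date formatting like this (a sketch; assumes the build uses the local build-time timestamp):

```shell
# Reproduce the debian-cljdoc-YYYYMMDD-HHmm naming scheme
# (assumption: the template name uses the local timestamp at build time)
template_name="debian-cljdoc-$(date +%Y%m%d-%H%M)"
echo "$template_name"
```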

Tip
It’s a good idea to occasionally log into the Exoscale Portal and, under Compute→Templates, delete old unused debian-cljdoc templates.

Infrastructure (Terraform)

We use Terraform to create resources on Exoscale, including:

  • Simple Object Store bucket for cljdoc backups

  • Compute instance for the cljdoc server host

  • DNS config

Note
The cljdoc.org domain should be configured to point to Exoscale.

Common Commands

Tip
These commands require secrets to be configured as described in Secrets.

First, change to the appropriate dir:

cd ops/exoscale/infrastructure

Terraform might prompt you to run init; if you change or add modules, you may need to run it without being asked:

terraform init

To validate config:

terraform validate

To view the plan terraform would carry out:

terraform plan -var-file=secrets.tfvars

To carry out the plan:

terraform apply -var-file=secrets.tfvars

To sync the server state back to terraform:

terraform refresh -var-file=secrets.tfvars

Retrieving outputs:

terraform output
terraform output -json
terraform output cljdoc_static_ip
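When scripting against the JSON form, you can extract a single value like this. The JSON below is an illustrative sample of terraform’s output shape, not real output; the output name cljdoc_static_ip comes from this doc, and the IP is a placeholder:

```shell
# Hedged sketch: parse one value out of `terraform output -json`-shaped JSON
json='{"cljdoc_static_ip":{"sensitive":false,"type":"string","value":"203.0.113.10"}}'
ip=$(printf '%s' "$json" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["cljdoc_static_ip"]["value"])')
echo "$ip"
```

With real output, pipe `terraform output -json` into the same filter (or use jq, if installed).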

To taint the compute instance for recreation on next apply:

terraform taint module.main_server.exoscale_compute_instance.cljdoc_01

After updating plugins or plugin versions (currently in provider.tf), run:

terraform init -upgrade

This upgrades the plugins and locks the changes into .terraform.lock.hcl.

Creating a Cljdoc Docker Image

The cljdoc docker image runs on the cljdoc host.

bb docker-image

This will package the cljdoc application in a Docker container. A tag is determined from the number of commits, the branch, and the commit SHA. Docker images are published to Docker Hub during CI. See .circleci/config.yml.
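A tag of that general shape can be derived with plain git commands; this is a sketch of the idea, not the exact format the bb docker-image task produces (it builds a throwaway repo so it runs anywhere):

```shell
# Hedged sketch: a commit-count/branch/SHA tag, illustrated in a throwaway repo
repo=$(mktemp -d); cd "$repo"
git init -q -b main
git -c user.email=ops@example.com -c user.name=ops commit -q --allow-empty -m init
tag="$(git rev-list --count HEAD)-$(git rev-parse --abbrev-ref HEAD)-$(git rev-parse --short HEAD)"
echo "$tag"
```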

Tip

Run bb clean first when testing your image locally. This will ensure you are not working with stale inputs.

Orchestration (Nomad)

To deploy the cljdoc service to the provisioned infrastructure we use Nomad. While Nomad provides a convenient CLI interface, it has proven easier to generate Nomad job specs using Clojure and submit them to the Nomad server via the Nomad REST API.

The relevant code is under /ops/exoscale/deploy/.

Deployment is carried out by CircleCI, see deploy-to-nomad job in /.circleci/config.yml

This will fail unless Docker Hub has a cljdoc image with the provided tag. Tag names are determined from the Git commit count, branch, and HEAD, and images are pushed to Docker Hub as part of CI.

Accessing Nomad

./ops/nomad.clj username@ip

Where username is your SSH login and ip is Nomad’s IP address. You can optionally specify an identity file:

./ops/nomad.clj -i ~/ssh/my-keyfile username@ip

The script launches an SSH process forwarding ports 4646 (nomad), 8500 (consul), 8080 (traefik), and 9010 (for access via jconsole or visualvm).
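Those tunnels are equivalent to a plain ssh invocation along these lines (the exact flags the script uses are an assumption, and the host is illustrative):

```shell
# Hedged sketch: the port forwards nomad.clj establishes, as a plain ssh command
host="debian@203.0.113.10"   # illustrative login and IP
fwd="-L 4646:localhost:4646 -L 8500:localhost:8500 -L 8080:localhost:8080 -L 9010:localhost:9010"
echo "ssh $fwd $host"        # printed rather than run, since it needs a live host
```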

If you have Nomad installed locally, you can now run nomad commands like the following:

nomad status cljdoc
nomad alloc logs -f 683ade58
nomad deployment list

Hit ^D to close the session and forwarded ports.

Backing Up Data

The SQLite database is automatically backed up daily by cljdoc to the Exoscale cljdoc-backups bucket.

Our current backup retention strategy is:

  • 7 daily

  • 4 weekly

  • 12 monthly

  • 2 yearly
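As a quick arithmetic check, at steady state this policy retains:

```shell
# Steady-state backup count under the retention policy above
daily=7; weekly=4; monthly=12; yearly=2
total=$((daily + weekly + monthly + yearly))
echo "$total"   # 25 backups retained
```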

If cljdoc does not find a database on startup, it will automatically restore the most recent one from the cljdoc-backups bucket.

Bound Host

By default the cljdoc web server binds to localhost. This is a safe default for development work.

In production, we run the cljdoc web server from a docker container. The production docker container launches the cljdoc web server with the cljdoc.host JVM system property to override the localhost default to 0.0.0.0.
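As an illustration, overriding the bind host with that JVM system property looks like this (the jar name is an assumption; the production container’s actual entrypoint may differ):

```shell
java -Dcljdoc.host=0.0.0.0 -jar cljdoc.jar
```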

SSL Certificates

Traefik generates SSL certificates automatically through Let’s Encrypt.

Checking for Vulnerabilities

Experts will uncover vulnerabilities in some of the technologies we use. It is inevitable.

We use clj-watson to scan cljdoc’s dependencies for known security issues. You must specify an NVD API token; get yours here: https://nvd.nist.gov/developers/request-an-api-key

Example usage from cljdoc root:

CLJ_WATSON_NVD_API_KEY=your-token-here bb nvd-scan

Replace your-token-here with your actual token.

You can also optionally specify OSSIndex credentials.

Vulnerabilities and suggested fixes are written to the terminal. Be aware that the scan sometimes reports false positives. After careful verification, you can silence false positives via suppressions.xml.

Other tools such as trivy can identify security holes. Trivy seems to be good at finding issues in docker images and configuration.

Upgrading the Exoscale Compute Instance

Cljdoc has a great zero-downtime story triggered by commits to master. This all happens within the single Exoscale compute instance.

Sometimes the compute instance will need to be updated. This is not currently a zero-downtime operation.

Typically this involves:

  • Update packer config and deploy new template to Exoscale via the packer tool

  • Reference the new template from the terraform config for a new compute instance (decoupled from the Exoscale Elastic IP), deployed via the terraform tool

  • Test the compute instance, ssh in, deploy from command line, etc.

  • Once satisfied, make the new instance live by attaching it to Exoscale’s Elastic IP, and decouple the old instance from the Elastic IP (all via terraform)

  • Trigger a deploy to the new instance via a code commit to github master (can be a normal PR merge)

  • Test that cljdoc.org comes up and is working as expected.

    • It should grab the latest db backup from the Exoscale object store, which will be at most a day stale; any missing builds should be automatically scheduled.

    • SSL certs should be automatically regenerated from letsencrypt.

  • Delete the old compute instance by removing it from the terraform config and applying the change via the terraform tool

  • Delete the old template via the Exoscale web portal

Example deploy command (run from the ops/exoscale/deploy dir). Triple-check that the IP address matches your test compute instance. Grab a valid docker-tag from Docker Hub.

clojure -M -m cljdoc.deploy deploy \
        --nomad-ip <staging-instance-ip-here> \
        --docker-tag <docker-tag-here> \
        -k ~/.ssh/id_ed25519_exo \
        -u debian \
        -s secrets.edn \
        --omit-tls-domains true \
        --lets-encrypt-env staging \
        --cljdoc-config-override-map '{:cljdoc/server {:enable-db-backup? false}}'

The --omit-tls-domains, --lets-encrypt-env, and --cljdoc-config-override-map options are wholly to support testing. They are not used for prod deploys that happen automatically from CircleCI.

Here are a couple of example upgrades; the commits should tell the story:

  • 2026-01-27

  • 2024-11-28

Exoscale is generous with their hosting, so please be sure to delete any unused resources. Always do so through config, never through the Exoscale web portal (except for templates).