You are on the cljdoc ops team or want to learn about cljdoc infrastructure.
This config is specific to cljdoc production. If you want to bring up your own cljdoc-like service, you can use our config as a basis, but you’ll need to make changes.
Terraform access to Exoscale is configured in ops/exoscale/infrastructure/secrets.tfvars:
exoscale_api_key = <your key here>
exoscale_api_secret = <your value here>
Important: Protect this file, it contains secrets. It's a good idea to chmod 600 secrets.tfvars.
Packer can read this file format but is picky about the filename, so we’ll create a soft link to satisfy it:
cd ops/exoscale/image
ln -s ../infrastructure/secrets.tfvars ./secrets.pkrvars.hcl
Authorized keys are stored in a non-version-controlled tfvars file.
The structure is:
base_authorized_key = "<base pub key here>"
additional_authorized_keys = {
  descriptive-key1-here = "<additional pub key 1 here>",
  descriptive-key2-here = "<additional pub key 2 here>"
}
Terraform might suggest that changes can be applied in place, but you'll need to taint the compute instance and then apply for the changes to take effect.
Yup, this means we can’t currently change authorized ssh keys without recreating the cljdoc compute instance.
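Rotating keys therefore amounts to a taint-then-apply cycle; as a sketch (run from ops/exoscale/infrastructure, using the resource address shown elsewhere in this doc):

```
terraform taint module.main_server.exoscale_compute_instance.cljdoc_01
terraform apply -var-file=secrets.tfvars
```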
This file will be shared securely, as needed, with cljdoc ops team members.
You’ll need the following software installed:
Optionally install:
- Nomad if you want to run it locally against production, see Accessing Nomad.
To stay sane, we want to avoid making any infrastructure changes directly to prod. All changes should be carried out with Packer, Terraform, and automatic deployments carried out by our CircleCI job.
The cljdoc host image is our single node image that will host all of our compute infrastructure.
We use Packer to create machine images for the cljdoc host compute image. Sources are under ./exoscale/image/. The images are based on an Exoscale Debian template. The following software is installed:
Of interest:
- ./exoscale/image/debian-cljdoc.pkr.hcl - tells Packer how to build our image
- ./exoscale/image/conf - some service config files installed by our Packer config
Tip: Packer refers to secrets you set up in Secrets.
Change to the appropriate dir:
cd ops/exoscale/image
Optionally validate:
packer validate -var-file=secrets.pkrvars.hcl debian-cljdoc.pkr.hcl
And finally, build on Exoscale:
packer build -var-file=secrets.pkrvars.hcl debian-cljdoc.pkr.hcl
This will create a new image template named debian-cljdoc-YYYYMMDD-HHmm on Exoscale.
You will need to explicitly reference this image by name from Terraform's main.tf.
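For illustration, referencing the new template from Terraform could look something like this hedged sketch (the data source label, zone, and date stamp are assumptions, not the actual main.tf):

```hcl
# Hypothetical sketch only; not the actual cljdoc main.tf
data "exoscale_template" "cljdoc" {
  zone = "ch-gva-2"                     # assumed Exoscale zone
  name = "debian-cljdoc-20240101-1200"  # name produced by the packer build
}
```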
Tip: It's a good idea to occasionally log into the Exoscale Portal and, under Compute→Templates, delete old unused debian-cljdoc templates.
We use Terraform to create resources on Exoscale, including:
- Simple Object Store bucket for cljdoc backups
- Compute instance for the cljdoc server host
- DNS config
Note: The cljdoc.org domain should be configured to point to Exoscale.
Tip: These commands require secrets to be configured as described in Secrets.
First, change to the appropriate dir:
cd ops/exoscale/infrastructure
Terraform might request you run init, or if you change/add modules you might need to do it without being asked:
terraform init
To validate config:
terraform validate
To view the plan terraform would carry out:
terraform plan -var-file=secrets.tfvars
To carry out the plan:
terraform apply -var-file=secrets.tfvars
To sync the server state back to terraform:
terraform refresh -var-file=secrets.tfvars
Retrieving outputs:
terraform output
terraform output -json
terraform output cljdoc_static_ip
To taint the compute instance for recreation on next apply:
terraform taint module.main_server.exoscale_compute_instance.cljdoc_01
After updating plugins or plugin versions (currently in provider.tf), run:
terraform init -upgrade
This upgrades and locks those changes to .terraform.lock.hcl.
The cljdoc docker image runs on the cljdoc host.
bb docker-image
This will package the cljdoc application in a Docker container.
A tag will be determined based on the number of commits, branch, and commit SHA.
Docker images are published to Docker Hub during CI.
See .circleci/config.yml.
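As a rough, hypothetical illustration of such a tag scheme (the exact format is an assumption; check the bb task for the real logic):

```shell
# Hypothetical tag construction from commit count, branch, and short SHA.
# The real bb docker-image task may format these differently.
count=1234        # e.g. git rev-list --count HEAD
branch=master     # e.g. git rev-parse --abbrev-ref HEAD
sha=abc1234       # e.g. git rev-parse --short HEAD
tag="${count}-${branch}-${sha}"
echo "$tag"
```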
To deploy the cljdoc service to the provisioned infrastructure we use Nomad. While Nomad provides a convenient CLI interface, it has proven easier to generate Nomad job specs using Clojure and submit them to the Nomad server via the Nomad REST API.
The relevant code is under /ops/exoscale/deploy/.
Deployment is carried out by CircleCI, see deploy-to-nomad job in /.circleci/config.yml
This will fail unless Docker Hub has a cljdoc image with the provided tag. The tag names are determined from the Git commit count, branch, and HEAD SHA, and images are pushed to Docker Hub as part of CI.
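As a minimal sketch of the REST submission itself (assuming a port-forwarded Nomad server on localhost:4646 and a hypothetical JSON job spec in job.json; the real deploy code builds and submits the spec from Clojure):

```
# POST a JSON job spec to the Nomad HTTP API
curl -X POST -d @job.json http://localhost:4646/v1/jobs
```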
./ops/nomad.clj username@ip
Where username is your ssh login and ip is the Nomad server's IP address.
You can optionally specify an identity file:
./ops/nomad.clj -i ~/ssh/my-keyfile username@ip
The script launches an SSH process forwarding ports 4646 (nomad), 8500 (consul), 8080 (traefik), and 9010 (for access via jconsole or visualvm).
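Under the hood that is roughly equivalent to an ssh invocation like the following sketch (the script may pass additional options):

```
ssh -i ~/ssh/my-keyfile \
    -L 4646:localhost:4646 -L 8500:localhost:8500 \
    -L 8080:localhost:8080 -L 9010:localhost:9010 \
    username@ip
```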
If you have Nomad installed locally, you can now run nomad commands like the following:
nomad status cljdoc
nomad alloc logs -f 683ade58
nomad deployment list
Hit ^D to close the session and forwarded ports.
The SQLite database is automatically backed up daily by cljdoc to the Exoscale cljdoc-backups bucket.
Our current backup retention strategy is:
- 7 daily
- 4 weekly
- 12 monthly
- 2 yearly
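Assuming the tiers are kept independently, that retention scheme holds on to about this many backup files in total:

```shell
# 7 daily + 4 weekly + 12 monthly + 2 yearly backups retained
echo $((7 + 4 + 12 + 2))   # 25
```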
If cljdoc does not find a database on startup, it will automatically restore the most recent one from the cljdoc-backups bucket.
By default the cljdoc web server binds to localhost.
This is a safe default for development work.
In production, we run the cljdoc web server from a docker container.
The production docker container launches the cljdoc web server with the cljdoc.host JVM system property set to 0.0.0.0 to override the localhost default.
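For illustration, the override amounts to something like the following at JVM launch (a sketch; the actual container entrypoint and jar name are assumptions):

```
# Hypothetical launch command; the real entrypoint lives in the Docker image
java -Dcljdoc.host=0.0.0.0 -jar cljdoc.jar
```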
Traefik generates SSL certificates automatically through Let’s Encrypt.
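For reference, a Traefik (v2-style) file-config certificate resolver looks roughly like this hedged sketch (resolver name, email, and challenge type are assumptions, not the actual cljdoc config):

```toml
# Hypothetical resolver config; not the actual cljdoc Traefik config
[certificatesResolvers.letsencrypt.acme]
  email = "ops@example.org"   # assumed contact address
  storage = "acme.json"
  [certificatesResolvers.letsencrypt.acme.tlsChallenge]
```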
Experts will uncover vulnerabilities in some of the technologies we use. It is inevitable.
We use clj-watson to scan cljdoc dependencies for known security issues. You must specify an NVD API token; get yours here: https://nvd.nist.gov/developers/request-an-api-key
Example usage from cljdoc root:
CLJ_WATSON_NVD_API_KEY=your-token-here bb nvd-scan
Replace your-token-here with your actual token.
You can also optionally specify OSSIndex credentials.
Vulnerabilities and suggested fixes are written to the terminal.
Be aware that the scan sometimes reports false positives.
After some careful verification, you can quiet false positives via suppressions.xml.
Other tools such as trivy can identify security holes. Trivy seems to be good at finding issues in docker images and configuration.
Cljdoc has a great zero-downtime deployment story triggered by commits to master. This all happens within the single Exoscale compute instance.
Sometimes the compute instance will need to be updated. This is not currently a zero-downtime operation.
Typically this involves:
- Update the Packer config and deploy a new template to Exoscale via the packer tool
- Reference the new template from the Terraform config for a new compute instance (decoupled from the Exoscale Elastic IP) deployed by the terraform tool
- Test the compute instance: ssh in, deploy from the command line, etc.
- Once satisfied, make the new instance live by attaching it to Exoscale's Elastic IP and decoupling the old instance from the Elastic IP (all via terraform)
- Trigger a deploy to the new instance via a code commit to GitHub master (can be a normal PR merge)
- Test that cljdoc.org comes up and is working as expected.
  - It should grab the latest db backup from the Exoscale object store, which will be at most a day stale; any missing builds should be automatically scheduled.
  - SSL certs should be automatically regenerated from Let's Encrypt.
- Delete the old compute instance by removing it from the Terraform config and applying the change via the terraform tool
- Delete the old template via the Exoscale web portal
Example deploy cmd (run from ops/exoscale/deploy dir).
Triple check the IP address matches your test compute instance.
Grab a valid docker tag from Docker Hub.
clojure -M -m cljdoc.deploy deploy \
--nomad-ip <staging-instance-ip-here> \
--docker-tag <docker-tag-here> \
-k ~/.ssh/id_ed25519_exo \
-u debian \
-s secrets.edn \
--omit-tls-domains true \
--lets-encrypt-env staging \
--cljdoc-config-override-map '{:cljdoc/server {:enable-db-backup? false}}'
The --omit-tls-domains, --lets-encrypt-env, and --cljdoc-config-override-map options are wholly to support testing. They are not used for prod deploys that happen automatically from CircleCI.
Here are a couple of examples of upgrades; the commits should tell the story:
Exoscale is generous with their hosting, so please be sure to delete any unused resources. Always do so through config, never through the Exoscale web portal (except for templates).