Automation & Infrastructure as Code

Source: Marc Mercer (SRE Lead) — sre-iac repository, Rev 1.0, 2026-02-24

The infrastructure follows a hybrid automation approach: Terraform handles resource provisioning, Ansible handles configuration management, and Helm handles Kubernetes application deployment, with ArgoCD planned for GitOps.

Tooling Overview

| Layer | Tool | Purpose |
| --- | --- | --- |
| Resource provisioning | Terraform + Atmos | LXC containers, VMs, networks, volumes |
| Configuration management | Ansible | Host bootstrap, domain enrollment, certificates, reverse proxy |
| OpenStack deployment | Kolla-Ansible | Containerized OpenStack control plane lifecycle |
| Kubernetes applications | Helm | Application deployment and lifecycle |
| GitOps | ArgoCD (planned) | Kubernetes application lifecycle in OpenStack environment |
| Secrets management | Infisical | Centralized secrets with Kubernetes operator integration |
| Certificate issuance | acme.sh + ZeroSSL | Wildcard ECDSA certificates via DNS-01 challenge |

Terraform: Infrastructure Provisioning

Provider (current, for Proxmox): telmate/proxmox

Repository structure:

terraform/
  containers.tf        # All LXC container definitions
  variables.tf         # Resource allocation variables
  terraform.tfvars     # Environment-specific values
  modules/
    lxc_container/     # Reusable container module

Container Module Pattern

module "gitlab-01" {
  source         = "./modules/lxc_container"
  hostname       = "gitlab-01"
  vmid           = 701
  node           = "pmx-02"
  ostemplate     = "hpserver-storage:vztmpl/almalinux-9-default_20240911_amd64.tar.xz"
  rootfs_storage = "local-lvm"
  rootfs_size    = "24G"
  memory         = 16384
  cores          = 2
  ip             = "10.10.96.41/20"
  gateway        = "10.10.96.1"
  tags           = "almalinux;gitlab"
  mountpoints = [
    { mountpoint = "/data", storage = "local-lvm", size = "2T" }
  ]
}
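The ip and gateway values in a module block have to agree: the gateway must sit inside the subnet implied by the CIDR address. A minimal sketch of that sanity check, using only the Python standard library, is shown here. The function name gateway_in_subnet is hypothetical and is not part of the sre-iac repository.

```python
import ipaddress


def gateway_in_subnet(ip_cidr: str, gateway: str) -> bool:
    """Return True if the gateway lies inside the network of ip_cidr."""
    # ip_interface keeps the host bits, .network masks them off.
    network = ipaddress.ip_interface(ip_cidr).network
    return ipaddress.ip_address(gateway) in network


if __name__ == "__main__":
    # Values taken from the gitlab-01 module example.
    print(gateway_in_subnet("10.10.96.41/20", "10.10.96.1"))
```

A check like this could run before terraform plan to catch a mistyped gateway early, though whether the repository does so is not stated in the source.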

VMID Naming Convention

| Range | Assigned To |
| --- | --- |
| 100s | Core infrastructure (domain controllers, proxies, VPN) |
| 200s | Kubernetes nodes (control plane and workers) |
| 700s | Application services (GitLab, MongoDB, etc.) |
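The convention above can be expressed as a small lookup, which is handy for linting new container definitions. The sketch below assumes that "100s" means VMIDs 100 to 199, and likewise for the other ranges. The helper name vmid_category is hypothetical.

```python
# Range labels come from the VMID table; the mapping itself is illustrative.
VMID_RANGES = {
    1: "Core infrastructure",
    2: "Kubernetes nodes",
    7: "Application services",
}


def vmid_category(vmid: int) -> str:
    """Return the documented category for a VMID, keyed on its hundreds block."""
    block = vmid // 100
    if block not in VMID_RANGES:
        raise ValueError(f"VMID {vmid} is outside the documented ranges")
    return VMID_RANGES[block]


if __name__ == "__main__":
    # gitlab-01 uses vmid 701 in the module example.
    print(vmid_category(701))
```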

Workflow

terraform plan # Review planned changes
terraform apply # Apply infrastructure changes
terraform destroy -target=module.container-name # Destroy specific resource (use with caution)

Ansible: Configuration Management

Inventory: Dynamic discovery via community.proxmox.proxmox plugin — automatically groups hosts by Proxmox tags (e.g., groupdc, groupk8s_worker, groupproxy).

Vault management: All sensitive credentials are stored in group_vars/all/vault.yml, encrypted with Ansible Vault (AES-256). The vault password file .vault_pass is excluded from git.

| Credential | Purpose |
| --- | --- |
| acme_email | ZeroSSL account identifier |
| aws_access_key_id / aws_secret_access_key | Route53 DNS management for cert validation |
| kerberos_principal / kerberos_password | FreeIPA enrollment automation |
| proxmox_api_user / proxmox_api_token | Proxmox API authentication |
| vaulted_dockerhub_username / vaulted_dockerhub_token | Docker Hub registry authentication |

Key Ansible Roles

roles/common/ — Base Configuration

Applied to all containers. Covers domain enrollment, SSH hardening, and baseline configuration.

| Task File | Purpose |
| --- | --- |
| ipa_enroll.yml | Automated FreeIPA domain enrollment |
| firewalld.yml | Firewall rule management |
| chrony.yml | NTP time synchronization |

Enrollment pattern:

- name: Enroll host with FreeIPA
  command: >
    ipa-client-install --unattended
    --principal {{ kerberos_principal }}
    --password {{ kerberos_password }}
    --domain anshinhealth.net
    --realm ANSHINHEALTH.NET
    --server dc-01.anshinhealth.net
    --server dc-02.anshinhealth.net
  no_log: true  # keep the Kerberos password out of task output

roles/certificates/ — Certificate Lifecycle

Full SSL/TLS certificate lifecycle via acme.sh and ZeroSSL:

  1. Clone/update acme.sh repository with DNS plugins
  2. Decrypt existing certificate state from Ansible Vault
  3. Issue or renew wildcard certificates via ZeroSSL DNS-01 challenge
  4. Re-encrypt private keys and renewal config with Ansible Vault
  5. Clean up temporary files while preserving renewal state

Managed certificate domains:

  • anshinhealth.net + *.anshinhealth.net
  • apps.anshinhealth.net + *.apps.anshinhealth.net
  • svcs.anshinhealth.net + *.svcs.anshinhealth.net
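Each managed entry pairs an apex domain with its wildcard, so one acme.sh invocation per entry covers both names. The sketch below only builds the argument list and does not run anything. The options --issue, --server zerossl, --dns, --keylength and -d are standard acme.sh options, but the specific choices of dns_aws (the Route53 plugin) and ec-256 are assumptions: the source states only that keys are ECDSA and that Route53 is used for validation. The function name acme_issue_args is hypothetical.

```python
MANAGED_DOMAINS = [
    "anshinhealth.net",
    "apps.anshinhealth.net",
    "svcs.anshinhealth.net",
]


def acme_issue_args(domain: str,
                    dns_plugin: str = "dns_aws",   # assumption: Route53 plugin
                    keylength: str = "ec-256"):    # assumption: curve not stated
    """Build an acme.sh argument list for the apex domain plus its wildcard."""
    return [
        "acme.sh", "--issue",
        "--server", "zerossl",
        "--dns", dns_plugin,
        "--keylength", keylength,
        "-d", domain,
        "-d", f"*.{domain}",
    ]


if __name__ == "__main__":
    for name in MANAGED_DOMAINS:
        print(" ".join(acme_issue_args(name)))
```

Returning a list rather than a shell string keeps the wildcard from being expanded by a shell if the list is later passed to subprocess.run.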

roles/reverse-proxy/ — Caddy Configuration

  • Installs Caddy from EPEL repository
  • Decrypts and deploys vault-encrypted certificates
  • Generates modular configuration files per upstream service
  • Validates configuration before applying (atomic updates)
  • Certificate permissions: 600 for keys, 644 for certs (owned by caddy service user)
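The permission rule in the last bullet is easy to verify after a deployment. The following sketch checks only the mode bits (600 for keys, 644 for certificates) and assumes a POSIX filesystem. Mapping the modes to the .key and .cer extensions is an assumption based on the acme.sh file names used in this repository, and ownership by the caddy user is not checked here. The helper name has_expected_mode is hypothetical.

```python
import os
import stat
import tempfile

# Modes from the role description; the extension mapping is an assumption.
EXPECTED_MODES = {".key": 0o600, ".cer": 0o644}


def has_expected_mode(path: str) -> bool:
    """Return True if the file's permission bits match the expected mode."""
    suffix = os.path.splitext(path)[1]
    if suffix not in EXPECTED_MODES:
        raise ValueError(f"no expected mode defined for {path!r}")
    actual = stat.S_IMODE(os.stat(path).st_mode)
    return actual == EXPECTED_MODES[suffix]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        key_path = os.path.join(tmp, "example.key")
        with open(key_path, "w") as handle:
            handle.write("placeholder")
        os.chmod(key_path, 0o600)
        print(has_expected_mode(key_path))
```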

roles/k8s/ — Kubernetes Node Configuration

  • Container runtime configuration
  • Registry mirror configuration (registries.yaml for Docker Hub pull-through cache)
  • Storage provisioner setup
  • Node labels and taints

Playbook Execution

# Full site configuration
ansible-playbook playbooks/site.yml

# Targeted by host group
ansible-playbook playbooks/site.yml -l groupdc # Domain controllers
ansible-playbook playbooks/site.yml -l groupk8s_worker # K8s workers
ansible-playbook playbooks/site.yml -l groupproxy # Reverse proxy

# Targeted by individual host
ansible-playbook playbooks/site.yml -l gitlab-01

# Certificate management
ansible-playbook playbooks/certificates.yml

# Reverse proxy deployment
ansible-playbook playbooks/reverse-proxy.yml

acme.sh: Certificate Issuance

| Attribute | Value |
| --- | --- |
| CA | ZeroSSL |
| Challenge Type | DNS-01 |
| DNS Providers | Route53 (primary), Cloudflare (secondary) |
| Key Type | ECDSA (Elliptic Curve) |
| Validity | 90 days |
| Renewal Threshold | 30 days before expiration |

Directory layout:

acme.sh/
  account.conf
  anshinhealth.net_ecc/
    anshinhealth.net.key    # Private key (Vault-encrypted in git)
    anshinhealth.net.cer    # Certificate
    fullchain.cer           # Full certificate chain
    anshinhealth.net.conf   # Renewal configuration
  apps.anshinhealth.net_ecc/
  svcs.anshinhealth.net_ecc/
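The Validity and Renewal Threshold attributes above imply that a certificate becomes due for renewal 60 days after issuance. acme.sh tracks this itself through the renewal configuration file, so the sketch below is only the date arithmetic, useful for monitoring or audit scripts. The function name renewal_due is hypothetical.

```python
from datetime import date, timedelta

VALIDITY = timedelta(days=90)            # from the attribute table
RENEWAL_THRESHOLD = timedelta(days=30)   # renew this long before expiry


def renewal_due(issued_on: date, today: date) -> bool:
    """Return True once today falls inside the renewal window."""
    expires_on = issued_on + VALIDITY
    return today >= expires_on - RENEWAL_THRESHOLD


if __name__ == "__main__":
    issued = date(2026, 1, 1)
    print(renewal_due(issued, date(2026, 3, 1)))
    print(renewal_due(issued, date(2026, 3, 2)))
```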

Helm: Kubernetes Application Deployment

| Chart | Namespace | Values File | Purpose |
| --- | --- | --- | --- |
| kube-prometheus-stack | monitoring | kubernetes/monitoring/helm-values/kube-prometheus-stack-values.yaml | Prometheus, Grafana, AlertManager, node-exporter, kube-state-metrics |
| Blackbox Exporter | monitoring | kubernetes/monitoring/helm-values/ | Endpoint monitoring |
| Karma | monitoring | kubernetes/monitoring/helm-values/ | AlertManager dashboard |
| external-dns | kube-system | Custom values | RFC2136 DNS updates to FreeIPA |
| MetalLB | metallb-system | Custom values | Bare-metal LoadBalancer |
| Infisical | infisical | kubernetes/infisical/helm-values/infisical-standalone-values.yaml | Secrets management |

# Monitoring stack
cd kubernetes/monitoring && ./deploy.sh all

# Infisical (database, server, operator)
cd kubernetes/infisical && ./deploy.sh

Utility Scripts

| Script | Purpose | Usage |
| --- | --- | --- |
| scripts/ipa-dns-grant.py | Manages FreeIPA DNS zone permissions for service accounts; grants external-dns TSIG key access to specific DNS zones | Run after adding new DNS zones |
| scripts/r53-to-bind9.py | Exports Route53 DNS records to BIND9 format | Used for DNS migration and auditing |
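To illustrate the kind of conversion a Route53-to-BIND9 export involves, here is a minimal sketch. It is not the contents of scripts/r53-to-bind9.py. It assumes input shaped like the record sets returned by the Route53 API (Name, Type, TTL and ResourceRecords keys) and skips alias records, which carry no TTL or ResourceRecords. The function name record_set_to_bind is hypothetical.

```python
def record_set_to_bind(record_set: dict) -> list:
    """Render one Route53-style record set as BIND9 zone file lines."""
    # Alias records have an AliasTarget instead of ResourceRecords; skip them.
    if "ResourceRecords" not in record_set:
        return []
    name = record_set["Name"]
    ttl = record_set["TTL"]
    rtype = record_set["Type"]
    return [
        f"{name}\t{ttl}\tIN\t{rtype}\t{rr['Value']}"
        for rr in record_set["ResourceRecords"]
    ]


if __name__ == "__main__":
    # Illustrative record, not taken from the real zone.
    sample = {
        "Name": "gitlab.example.net.",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [{"Value": "10.10.96.41"}],
    }
    for line in record_set_to_bind(sample):
        print(line)
```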

Kolla-Ansible: OpenStack Deployment (Planned)

Kolla-Ansible will manage the deployment and lifecycle of the entire OpenStack platform. Every OpenStack service (Nova, Neutron, Cinder, Glance, Keystone, Heat, Octavia, Designate, Trove, Manila) will run as a Docker container.

Key configuration files:

  • globals.yml: Service selection, networking backends, storage backends, TLS config
  • inventory/: Host inventory defining control, compute, network, and storage roles

Needs Input: Specific globals.yml parameters, Kolla-Ansible version, and target OpenStack release are pending.

Terraform: OpenStack Provisioning (Planned)

The same Terraform patterns used for Proxmox will be adapted for the OpenStack provider, using the same AWS-parity subnet layout (10.100.0.0/16 with /20 blocks per type). Atmos will be used for multi-environment configuration management.
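For a sense of scale, the 10.100.0.0/16 range divides into sixteen /20 blocks of 4096 addresses each. The sketch below enumerates them with the Python standard library. Which block is assigned to which type is not stated in the source, so no assignment is shown.

```python
import ipaddress

# Address range from the planned layout.
vpc = ipaddress.ip_network("10.100.0.0/16")

# Split the /16 into consecutive /20 blocks.
blocks = list(vpc.subnets(new_prefix=20))

if __name__ == "__main__":
    for block in blocks:
        print(block, block.num_addresses)
```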


Document Control

| Rev | Date | Author | Description |
| --- | --- | --- | --- |
| 1.0 | 2026-02-24 | Marc Mercer | Initial release |