Architecture Overview

Source: Marc Mercer (SRE Lead) — sre-iac repository + Bryan Lee (Engineering Director), Rev 2.0, 2026-03-22

Current vs Future State

This document covers three distinct states of the Anshin Health infrastructure. Sections are clearly labeled [CURRENT STATE], [FUTURE STATE], or [PLANNED] so it is always clear what is running today versus what is being built toward.

Vision

Anshin Health operates a private cloud infrastructure built on OpenStack with AWS-parity service abstractions. The platform provides the same compute, storage, networking, and container orchestration primitives found in AWS — same Terraform patterns, same EKS distribution, same subnet architecture — so that workloads developed here could move to AWS if the business requires it. OpenStack is the production target platform. AWS parity is a design goal that preserves future optionality, not a migration plan.

This approach delivers full infrastructure ownership, zero licensing costs, and the confidence that architectural decisions made today will not become technical debt if cloud requirements change.

Compliance Posture

Anshin Health operates in the healthcare domain. While the staging environment does not process real Protected Health Information (PHI) — all patient data is synthetic, generated via Synthea — the infrastructure is designed to align with SOC 2 Type II and HIPAA Security Rule controls. This ensures that architectural patterns, access controls, and operational procedures established in staging can be carried forward to production environments that do handle PHI.

Compute Platform — Three Tiers

The Anshin compute platform consists of three distinct tiers, each with a different operational role and deployment timeline:

Tier	Hardware	Status	Hypervisor	Purpose
Proxmox Dev Platform	HP #1 (ML350 Gen9)	✅ CURRENT STATE	Proxmox VE	All active K8s, services, development
OpenStack Cluster	HP #2, #3 (+ HP #1 eventually, + HP #5)	🔵 FUTURE STATE	Kolla-Ansible (Docker containers)	Production private cloud
Open Air GPU Farm	HP #4 (open-air ML350 configuration)	🟡 COMING SOON	Proxmox VE	AI inference, LLM serving, fine-tuning

[CURRENT STATE] Proxmox Development Platform

HP #1 (pmx-01, 10.10.96.5) is the sole active compute node. All Anshin services, Kubernetes workloads, and development infrastructure run here today.

What Is Running on Proxmox Now

pmx-01 (HP #1 — HP ProLiant ML350 Gen9)
  │
  ├── K8s cluster (K3s v1.32.3 — 6-node on AlmaLinux 9.5 VMs)
  │     ├── control-01 (10.10.97.1)
  │     ├── control-02 (10.10.98.1)
  │     ├── worker-01  (10.10.97.11)
  │     ├── worker-02  (10.10.97.12)
  │     ├── worker-03  (10.10.98.11)
  │     └── worker-04  (10.10.98.12)
  │
  ├── Application VMs
  │     ├── rp-01      (10.10.96.22) — Caddy reverse proxy (14 service upstreams)
  │     ├── gitlab-01  (10.10.96.41) — Self-hosted GitLab
  │     ├── nb-01      (10.10.96.21) — NetBird VPN gateway
  │     ├── db-01      (10.10.98.51) — Standalone PostgreSQL
  │     ├── dc-01      (10.10.96.11) — FreeIPA primary domain controller
  │     ├── dc-02      (10.10.96.12) — FreeIPA replica domain controller
  │     └── deepseek-01 (10.10.96.80) — DeepSeek AI inference (SSH only)
  │
  └── Storage: QNAP NAS (qnap-01, 10.10.96.31) — NFS for all K8s PVs

Current network: Flat /20 (10.10.96.0/20), Cisco router as gateway, 3× Juniper EX2200 operating as unmanaged L2 switches. No VLANs yet. SRX320 pending deployment.

MetalLB VIP (ingress-nginx): 10.10.98.40 — all Kubernetes service ingress routes here.
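As a quick sanity check on the inventory above, the following sketch confirms that every listed address sits inside the flat 10.10.96.0/20 network and that no address is assigned twice. The host names and addresses are copied from this section; the helper names (`outside_network`, `duplicate_addresses`) and the `metallb-vip` label are illustrative and do not come from the sre-iac repository.

```python
import ipaddress

# Flat network described in this section.
FLAT_NET = ipaddress.ip_network("10.10.96.0/20")

# Addresses copied from the inventory above; "metallb-vip" is an illustrative label.
HOSTS = {
    "pmx-01": "10.10.96.5",
    "control-01": "10.10.97.1",
    "control-02": "10.10.98.1",
    "worker-01": "10.10.97.11",
    "worker-02": "10.10.97.12",
    "worker-03": "10.10.98.11",
    "worker-04": "10.10.98.12",
    "rp-01": "10.10.96.22",
    "gitlab-01": "10.10.96.41",
    "nb-01": "10.10.96.21",
    "db-01": "10.10.98.51",
    "dc-01": "10.10.96.11",
    "dc-02": "10.10.96.12",
    "deepseek-01": "10.10.96.80",
    "qnap-01": "10.10.96.31",
    "metallb-vip": "10.10.98.40",
}


def outside_network(hosts, network):
    """Return the names whose address is not inside `network`."""
    return sorted(
        name for name, ip in hosts.items()
        if ipaddress.ip_address(ip) not in network
    )


def duplicate_addresses(hosts):
    """Return {address: [names]} for any address assigned more than once."""
    seen = {}
    for name, ip in hosts.items():
        seen.setdefault(ip, []).append(name)
    return {ip: names for ip, names in seen.items() if len(names) > 1}


stray = outside_network(HOSTS, FLAT_NET)
clashes = duplicate_addresses(HOSTS)
```

Both results are empty for the addresses listed here. The same check could be rerun per VLAN once the SRX320 is deployed and the flat network is segmented.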

[FUTURE STATE] OpenStack Private Cloud Cluster

HP #2 and HP #3 (currently powered off, iLO accessible) will form the OpenStack cluster. HP #1 will eventually migrate off Proxmox into OpenStack as well, once HP #2 and #3 are stable. A fifth ML350 Gen9 (not yet purchased) will be added as an OpenStack backup/spare node.

OpenStack Design Principles

Principle	Implementation
Infrastructure as Code everywhere	Terraform for provisioning, Ansible for configuration, Kolla-Ansible for OpenStack
Clean network segmentation	Six functional VLANs across three physical switch planes
Compliance-aware by default	SOC 2 / HIPAA access controls and audit logging from day one
AWS-parity abstractions	OpenStack services map 1:1 to AWS equivalents
Converged infrastructure	Compute and Ceph storage co-located on the same nodes

OpenStack Services (Target)

OpenStack Service	AWS Equivalent	Purpose
Nova	EC2	Virtual machine provisioning
Neutron	VPC / Subnets	Software-defined networking
Cinder + Ceph RBD	EBS	Block storage (SSD and HDD tiers)
Glance	AMI	VM image management
Keystone	IAM	Identity and access control
Heat	CloudFormation	Infrastructure orchestration
Trove	RDS	Database as a service
Octavia	ALB / NLB	Load balancing
Designate	Route 53	DNS management
Ceph RadosGW	S3	Object storage (S3 API)
Manila + CephFS	EFS	Shared filesystems
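The mapping above can be treated as a simple lookup table when translating AWS-oriented Terraform modules or runbooks. The sketch below is illustrative only: the dictionary mirrors the table, and the function name `openstack_equivalent` is hypothetical. Note that ALB and NLB both resolve to Octavia, so the mapping is many-to-one in that row.

```python
# AWS service name -> OpenStack target, mirroring the table above.
AWS_TO_OPENSTACK = {
    "EC2": "Nova",
    "VPC": "Neutron",
    "EBS": "Cinder + Ceph RBD",
    "AMI": "Glance",
    "IAM": "Keystone",
    "CloudFormation": "Heat",
    "RDS": "Trove",
    "ALB": "Octavia",
    "NLB": "Octavia",
    "Route 53": "Designate",
    "S3": "Ceph RadosGW",
    "EFS": "Manila + CephFS",
}


def openstack_equivalent(aws_service):
    """Case-insensitive lookup of the OpenStack target for an AWS service."""
    wanted = aws_service.strip().lower()
    for name, target in AWS_TO_OPENSTACK.items():
        if name.lower() == wanted:
            return target
    raise KeyError(f"no OpenStack mapping recorded for {aws_service!r}")
```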

Ceph Storage (Target)

A converged Ceph cluster will run on HP #2 and HP #3 (and HP #1 when migrated), co-located with OpenStack compute. Two storage tiers:

SSD pool: 8× 1TB enterprise SSDs per node. Serves Cinder block volumes for databases, boot disks, high-IOPS workloads.
HDD pool: 16× 1TB enterprise HDDs per node. Serves RadosGW object storage, Manila shared filesystems, and bulk capacity.
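The per-node drive counts above translate into raw capacity as sketched below. The replica count is an assumption: this document does not state a replication policy, so 3 is used only because it is Ceph's default pool size. With a host-level failure domain, three replicas would generally need three hosts, which is worth keeping in mind while only HP #2 and HP #3 are in the cluster.

```python
SSD_DRIVES_PER_NODE = 8    # from the SSD pool description above
HDD_DRIVES_PER_NODE = 16   # from the HDD pool description above
TB_PER_DRIVE = 1


def raw_tb(drives_per_node, nodes, tb_per_drive=TB_PER_DRIVE):
    """Raw pool capacity in TB before replication."""
    return drives_per_node * nodes * tb_per_drive


def usable_tb(raw, replicas):
    """Approximate usable capacity for a replicated pool.

    Ignores Ceph's own overhead and near-full safety margins.
    """
    return raw / replicas


ASSUMED_REPLICAS = 3  # assumption: not specified in this document

for nodes in (2, 3):
    ssd = raw_tb(SSD_DRIVES_PER_NODE, nodes)
    hdd = raw_tb(HDD_DRIVES_PER_NODE, nodes)
    print(
        f"{nodes} nodes: SSD raw {ssd} TB, HDD raw {hdd} TB, "
        f"usable at {ASSUMED_REPLICAS}x about "
        f"{usable_tb(ssd, ASSUMED_REPLICAS):.1f} / {usable_tb(hdd, ASSUMED_REPLICAS):.1f} TB"
    )
```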

Kubernetes on OpenStack (Target)

EKS Anywhere on Nova VMs — multi-cluster topology:

Cluster	Purpose
dev	Application development and CI
prod	Production-grade application hosting
infra	Platform services (Infisical, GitLab, monitoring)

HP #5 — OpenStack Backup Node [PLANNED]

A fifth HP ProLiant ML350 Gen9 with identical specifications to HP #1–#3 (2× E5-2697 v4, 512GB RAM, full storage complement) will be purchased once the OpenStack cluster is fully operational. This node provides spare capacity, resiliency, and a maintenance window target so rolling updates can be performed without reducing cluster capacity.

[FUTURE STATE] Open Air GPU Farm

HP #4 is an HP ProLiant ML350 Gen9 that has been stripped of its outer chassis and mounted in an open-air rack frame with 12 active cooling fans. The ML350 motherboard, CPUs, RAM, and PSUs are all in use — only the enclosure has been removed. This configuration allows mounting NVIDIA consumer/prosumer GPUs externally via PCIe risers, which would not fit inside the standard ML350 tower form factor.

This node will run Proxmox VE independently and will not be part of the OpenStack cluster. No GPUs will reside in OpenStack nodes.

GPU Configuration

Slot	GPU	VRAM	Architecture	Initial Role
1	NVIDIA RTX 8000	48GB GDDR6	Turing (TU102)	Large model inference (70B+ params)
2	NVIDIA RTX 3090	24GB GDDR6X	Ampere (GA102)	Mid-size models, embeddings, ASR/TTS
3	NVIDIA RTX 3090	24GB GDDR6X	Ampere (GA102)	Fine-tuning, batch inference, RAG
4	RESERVED	TBD	TBD	To be determined post go-live

Total VRAM (initial): 96GB (48 + 24 + 24). The 4th GPU will be selected based on which models and workloads are in highest demand once the farm is operational.
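A rough way to reason about which models fit on which card is to estimate the weight footprint from parameter count and quantization width. The sketch below is a back-of-the-envelope estimate, not a sizing tool: the 20% headroom for KV cache and runtime overhead is an assumed figure, and the quantization levels are illustrative rather than taken from this document.

```python
# VRAM per card in GB, from the GPU table above.
GPU_VRAM_GB = {"RTX 8000": 48, "RTX 3090 #1": 24, "RTX 3090 #2": 24}


def weights_gb(params_billion, bits_per_param):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion, bits_per_param, vram_gb, headroom_fraction=0.2):
    """True if the weights fit while leaving `headroom_fraction` of VRAM free.

    The headroom (KV cache, activations, runtime) is an assumed figure.
    """
    return weights_gb(params_billion, bits_per_param) <= vram_gb * (1 - headroom_fraction)


total_vram = sum(GPU_VRAM_GB.values())
```

Under these assumptions, a 70B model at 4-bit quantization needs about 35GB for weights and fits on the 48GB RTX 8000, while the same model at 16-bit (about 140GB) does not fit on any single card. An 8B model at 16-bit (about 16GB) fits on a 24GB RTX 3090.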

OS storage: 1TB USB SSD (Proxmox VE OS and VM disks) + 64GB USB thumbdrive (recovery ISO / backup).

Proxmox VM Architecture — Recommended Configuration

Each GPU gets its own dedicated Proxmox VM with PCIe passthrough (VT-d), providing near-native GPU performance and complete workload isolation.

HP #4 — Open Air GPU Farm (Proxmox VE)
  │
  ├── gpu-vm-01 (RTX 8000 — 48GB)
  │     ├── vCPUs: 16   RAM: 48GB
  │     ├── Services: Ollama (large models), vLLM
  │     ├── Models: Llama 3 70B, Qwen 72B, Mistral Large, DeepSeek 67B
  │     └── Role: Primary LLM inference — largest context windows
  │
  ├── gpu-vm-02 (RTX 3090 #1 — 24GB)
  │     ├── vCPUs: 8    RAM: 40GB
  │     ├── Services: Ollama (mid-size models), Faster-Whisper (ASR), Coqui/Kokoro TTS
  │     ├── Models: Llama 3 8B, Phi-3, Mistral 7B, embedding models
  │     └── Role: Real-time speech services + mid-size inference
  │
  ├── gpu-vm-03 (RTX 3090 #2 — 24GB)
  │     ├── vCPUs: 8    RAM: 40GB
  │     ├── Services: Axolotl / Unsloth (fine-tuning), batch inference
  │     ├── Models: Fine-tuning base: Llama 3 8B, Phi-3
  │     └── Role: Fine-tuning pipelines, RAG, batch workloads
  │
  └── gpu-vm-04 (RESERVED — 4th GPU slot)
        └── Configuration TBD after go-live based on observed demand

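RAM allocation rationale: 128GB total − OS overhead (~4GB) = 124GB usable. Planned VM allocation: 48 + 40 + 40 = 128GB, which exceeds usable RAM by about 4GB. The plan relies on the Proxmox balloon driver reclaiming memory that the VMs are not actively using. Because VMs with PCIe passthrough generally have their memory pinned, ballooning may reclaim less than expected; adjust allocations based on observation after go-live.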
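The arithmetic in the rationale above can be checked with a few lines. This is a minimal sketch using the figures from this section; the function name `ram_budget` is hypothetical.

```python
HOST_RAM_GB = 128      # total RAM on HP #4, per the rationale above
OS_OVERHEAD_GB = 4     # approximate Proxmox host overhead, per the rationale above

# Planned per-VM RAM from the VM layout above.
VM_RAM_GB = {"gpu-vm-01": 48, "gpu-vm-02": 40, "gpu-vm-03": 40}


def ram_budget(host_gb, overhead_gb, vm_ram_gb):
    """Summarize usable RAM, planned allocation, and any overcommit in GB."""
    usable = host_gb - overhead_gb
    planned = sum(vm_ram_gb.values())
    return {
        "usable_gb": usable,
        "planned_gb": planned,
        "overcommit_gb": max(0, planned - usable),
    }


budget = ram_budget(HOST_RAM_GB, OS_OVERHEAD_GB, VM_RAM_GB)
```

With the planned values, the budget shows 124GB usable against 128GB allocated, an overcommit of 4GB before the reserved fourth VM is counted.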

Network: 2× 10G SFP+ (10Gtek Intel 82599ES NIC):

Port 1 → SX3008-01 (network plane VLAN, inter-service communication)
Port 2 → SX3008-02 (storage plane VLAN, NFS access to QNAP / future Ceph)

AI Services Stack (per VM):

Service	VM	Purpose
Ollama	gpu-vm-01, gpu-vm-02	LLM serving with REST API
vLLM	gpu-vm-01 (optional)	High-throughput inference with batching
Faster-Whisper	gpu-vm-02	ASR (speech-to-text)
Coqui TTS / Kokoro	gpu-vm-02	Text-to-speech synthesis
Axolotl / Unsloth	gpu-vm-03	Fine-tuning framework
OpenWebUI	K8s (anshin-dev-svc)	Web UI for all Ollama endpoints
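For illustration, a client could call one of the Ollama endpoints listed above as sketched below. The sketch uses Ollama's `/api/generate` REST endpoint on its default port 11434 with streaming disabled. The host name and model tag in the usage note are hypothetical placeholders, since this document does not assign addresses to the GPU VMs or fix model tags.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default listen port


def build_generate_request(host, model, prompt, port=OLLAMA_PORT):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host, model, prompt, timeout=120):
    """Send the request and return the generated text.

    Requires network access to a running Ollama instance.
    """
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

A call such as `generate("gpu-vm-02.example.internal", "llama3:8b", "Summarize this note.")` would return the completion text, assuming that host resolves and the model has been pulled. Both values are placeholders.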

No GPU in OpenStack

The three HP ML350 servers that will form the OpenStack cluster (HP #2, HP #3, and eventually HP #1) contain no GPUs. All GPU workloads run exclusively on HP #4 (Proxmox). This is deliberate: Proxmox provides superior PCIe passthrough and GPU isolation compared to OpenStack Nova with libvirt passthrough.

Identity Infrastructure — FreeIPA [CURRENT STATE]

FreeIPA provides centralized identity across all environments:

Domain: anshinhealth.net
Realm: ANSHINHEALTH.NET
DCs: dc-01 (10.10.96.11, primary), dc-02 (10.10.96.12, replica)
Services: DNS (internal split-brain), LDAP, Kerberos, Certificate Authority, NTP
Integration: Automated host enrollment, external-dns RFC2136 updates, LDAP auth for Grafana and other services

Network — Summary

Phase	Status	Edge	Network Plane	Storage Plane	OOB Plane
Current Reality	✅ NOW	Cisco router (flat /20, no VLANs)	3× EX2200 (unmanaged)	Same flat network	Same flat network
Interim	🔵 PENDING	Juniper SRX320	Juniper EX2200-24T (VLANs configured)	Juniper EX2200-24T (isolated)	Juniper EX2200-24P
Target	🔵 FUTURE	TP-Link ER7412-M2	TL-SX3008F (10G) + TL-SG3428 (1G)	TL-SX3008F (10G, isolated) + TL-SG3428 (1G)	Juniper EX2200-24P (repurposed)

See Network Architecture and VLAN & IP Allocation for full detail.

Phase-Based Implementation

Phase 1: Foundation [IN PROGRESS]

✅ Proxmox on HP #1 — all current services running
✅ FreeIPA domain controllers on dc-01/dc-02
✅ K3s cluster (6 nodes) with MetalLB, cert-manager, ingress-nginx
✅ Caddy reverse proxy, GitLab, NetBird VPN
🔵 SRX320 deployment: replace Cisco router, enable VLAN segmentation
🔵 EX2200 VLAN configuration (configs ready in sre-iac repo)

Phase 2: GPU Farm [COMING SOON]

🔵 Power on HP #4 open-air GPU farm
🔵 Install Proxmox VE (1TB USB SSD)
🔵 Configure PCIe passthrough (VT-d) for 3 GPUs
🔵 Deploy gpu-vm-01, gpu-vm-02, gpu-vm-03
🔵 Connect 10G NICs to TP-Link switch fabric
🔵 Deploy Ollama, Faster-Whisper, TTS services

Phase 3: TP-Link Omada Network [PLANNED]

🔵 Deploy TP-Link ER7412-M2 as edge router
🔵 Deploy 2× SX3008F (network + storage 10G planes)
🔵 Deploy 2× SG3428 (1G access switches)
🔵 Install 10Gtek dual SFP+ NICs in all ML350 servers
🔵 Connect 1.5m DAC cables (8 total: 4 servers × 2 planes)
🔵 Configure Omada SDN — VLAN enforcement, policies

Phase 4: OpenStack Core [FUTURE]

🔵 Install Ubuntu 24.04 LTS on HP #2 and HP #3
🔵 LACP bonding and VLAN config on new switch fabric
🔵 Kolla-Ansible OpenStack deployment
🔵 Ceph cluster initialization (SSD + HDD pools)
🔵 Neutron networking with VLAN provider networks
🔵 Migrate HP #1 from Proxmox to OpenStack
🔵 Purchase and integrate HP #5 (backup/spare node)

Phase 5: Kubernetes on OpenStack [FUTURE]

🔵 EKS Anywhere cluster deployment on Nova VMs
🔵 Multi-cluster topology (dev, prod, infra)
🔵 GitOps via ArgoCD
🔵 Advanced OpenStack services (Trove, Octavia, Designate, Manila, RadosGW)

Operational Model

Layer	Tooling	Purpose
Infrastructure provisioning	Terraform + Atmos	VPC, networks, instances, volumes, object storage
Configuration management	Ansible	Host bootstrap, certificate deployment, domain enrollment
OpenStack deployment	Kolla-Ansible	Containerized OpenStack control plane lifecycle
Cluster management	EKS Anywhere + ArgoCD	Kubernetes cluster lifecycle and GitOps
Secrets management	Infisical	Centralized secrets with Kubernetes operator integration
Observability	kube-prometheus-stack	Prometheus, Grafana, Alertmanager, Karma, Blackbox Exporter
Certificate management	acme.sh + ZeroSSL	Wildcard ECDSA certificates via DNS challenge

Document Control

Classification	Internal — Infrastructure Documentation
Compliance Scope	SOC 2 Type II, HIPAA Security Rule (design target)
Data Classification	No PHI — Synthetic data only (Synthea) in staging environment
Review Cycle	Quarterly or upon significant infrastructure change

Rev	Date	Author	Description
1.0	2026-02-24	Marc Mercer	Initial release
2.0	2026-03-22	Anshin Engineering	Restructured into Current/Future-OpenStack/Future-GPU sections; added HP#4 open-air GPU farm details, HP#5 planned node, Proxmox VM architecture recommendation, phase implementation status
