StackAI On-Prem CLI - Engineering Standards

📋 Purpose

This repository delivers StackAI On-Prem, a production-grade, on-prem-first AI platform for enterprise customers requiring strict data sovereignty. The stackai CLI orchestrates a 50+ service Docker Compose deployment across customer Ubuntu infrastructure.

Business stakes: $10M+ ARR Fortune 500 customers. Single CLI failure risks enterprise churn.

🎯 Command Contract (Immutable)

Enforce exactly 5 top-level commands. No exceptions.

stackai init           # Complete bootstrap (system → production)
stackai deploy {start,stop,status,logs,restart}  # Service lifecycle
stackai config {domains,tls,secrets,registry,saml,sso,env} # Configuration
stackai system {backup,restore,update,migrate,pull,build,cleanup,prune,templates,releases,import} # Operations
stackai diagnose {doctor,support,info}        # Observability

Never reference deprecated legacy top-level commands: stackai install, stackai configure, stackai start, stackai stop. Use the 5-command hierarchy above (e.g. stackai deploy start, stackai config env).

💼 Customer Context

Target: Banks, healthcare, government, manufacturing requiring:

  • 99.9% uptime SLA

  • Zero data exfiltration

  • 1-hour deployment (stackai init)

  • Self-service support (diagnose support)

Deployment profile: 64GB RAM, 16 cores, 1TB SSD Ubuntu 24.04 VMs.

Architecture: 4 layers (50+ services)

Persistence: ~/.config/stackai/data/ (gitignored, 100GB+)

🏗️ Repository Contract

Entry points:

  • install.sh → installs the stackai binary and installs bundled config files

  • stackai → 5 commands only

  • Docs: docs/COMMANDS.md

Version tracking:

  • .release file: Current deployed platform release (used to resolve service image versions)

  • versions.json: Platform release → subservice version mappings

  • CLI self-update: Atomic binary replacement with config updates

🎨 Operator Experience (Non-Negotiable)

⚖️ Contribution Principles

Core Mandates

  1. 5 commands only – No deviations, no legacy aliases

  2. On-prem first – Supports disconnected/runtime deployments; some commands perform outbound calls (e.g. stackai init public IP detection via ifconfig.me, stackai system update downloads from install.stack.ai)

  3. Security default – License-key encrypted vault for secrets, Docker credential helper for registry

  4. Backward compatible – Supports existing ~/.config/stackai/ deployments

  5. Support-first – Every failure path produces diagnose support bundle

Golden Rules

🔧 Code Standards (2025 Rust CLI)

Project Structure

Error Handling

Crate Contract

Purpose
Mandate
Prohibited

CLI

clap 4.5 derive

structopt

Async Runtime

tokio 1.0 + futures-util 0.3

std::thread blocking

Docker

bollard 0.19 + compose_spec 0.3 + indexmap 2.0

shelling out to docker for core orchestration (currently used for registry/build/diagnostics)

Prompts

inquire 0.7

dialoguer

Progress

indicatif 0.17

manual spinners

Config

figment 0.10 + serde_yaml 0.9

manual config parsing

Template

minijinja 2.0

handlebars

Colors

colored 2.0

ansi_term

Time/Date

chrono 0.4

manual time handling

Validation

regex 1.10

manual parsing

Encryption

aes-gcm 0.10 + argon2 0.5 + jsonwebtoken 10.4 + base64 0.22

keyring (headless incompatible)

Crypto Utils

rand 0.8 + uuid 1.6 + blake3 1.5

manual crypto

Compression

flate2 1.0 + tar 0.4 + zstd 0.13

manual archiving

File Ops

tempfile 3.10 + walkdir 2

unbounded blocking I/O on async hot paths (std::fs is used today; keep it off critical paths)

System Info

sysinfo 0.30

manual system calls

Error Handling

anyhow 1.0 + thiserror 1.0

panic!/unwrap

Logging

tracing 0.1 + tracing-subscriber 0.3

println!/eprintln for structured logs (UI output via print_* helpers is OK)

Reliability

backoff 0.4 + humantime 2.1

manual retries

Path Mgmt

dirs 5.0

hardcoded paths

Quality Gates

📊 Scenarios (Agent Playbook)

Scenario
Customer Goal
Exact Commands

First install

Production in 1hr

curl -sSf https://install.stack.ai | bash then stackai init

Health check

Verify uptime

stackai deploy status

Logs

Debug service

stackai deploy logs SERVICE -f

Backup

Disaster recovery

stackai system backup

List backups

Find backup to restore

stackai system backup --list

Restore

Rollback to previous state

stackai system restore backup.tar.gz

Support

Escalate

stackai diagnose support

Update

Zero-downtime update

stackai system update (automatic backup)

Update version

Update to specific version

stackai system update --version 1.0.9

Release management

List/set releases

stackai system releases list / stackai system releases set 1.0.9

Volume import

Migrate from old deployment

cd /old/path && stackai system import then stackai init

🎯 Success Criteria

🔄 Version Management & Updates

Version-Aware Updates

Implementation: Self-updatable CLI with version-aware platform updates.

Key Features:

  • CLI self-update capability (atomic binary replacement)

  • Version-aware updates with pinned subservice versions

  • Automatic backup before updates

  • Caching mechanism for downloaded releases

  • Checksum verification for security

  • Platform release tracking via .release file

  • Version mappings in versions.json

Update Flow:

  1. stackai system update creates automatic backup

  2. Resolves target version (flag → .release file → latest)

  3. Downloads release tarball (with cache support)

  4. Verifies checksum (mandatory unless --skip-checksum is used)

  5. Updates configuration files and CLI binary

  6. Updates .release file

  7. Pulls new Docker images

  8. Restarts services

Security: Checksum verification is mandatory by default to prevent supply chain attacks. Updates fail if checksums cannot be verified. Use --skip-checksum flag only in air-gapped environments.

Release Commands:

  • stackai system releases: Show current installed release and subservice versions

  • stackai system releases list: List all available releases with subservice mappings

  • stackai system releases set <VERSION>: Set target release (creates backup automatically)

Testing: Comprehensive update flow with version resolution, caching, checksum verification, and rollback capability.

💾 Backup & Restore

Comprehensive Backup System

Implementation: Complete state backup with checksum verification and version tracking.

Key Features:

  • Creates compressed .tar.gz archive with SHA256 checksum

  • Backs up PostgreSQL and MongoDB databases

  • Includes all configuration files (.env, docker-compose.yaml, config/ directory)

  • Includes version information (.release, versions.json)

  • Includes encrypted secrets vault (secrets.vault)

  • Generates manifest with metadata and component status

  • Automatic backup before updates and version changes

Backup Commands:

  • stackai system backup: Create backup (automatic before updates)

  • stackai system backup --name <NAME>: Create named backup

  • stackai system backup --list: List all available backups

Restore Features:

  • Checksum verification before restore

  • Safety backup of current state created automatically

  • Version compatibility checking

  • Complete state restoration (config, databases, versions)

Testing: Comprehensive backup/restore flow with checksum verification, version compatibility, and safety backups.

🔍 Diagnostics & Support

Comprehensive Support Bundle

Implementation: Complete diagnostic bundle for asynchronous troubleshooting.

Key Features:

  • Collects diagnostic data from all 50+ services (not just 7)

  • Automatic sanitization of sensitive information

  • Version information included

  • Error analysis and pattern detection

  • Resource usage per container

  • Network diagnostics

  • Complete configuration files (sanitized)

Support Bundle Includes:

  • System information (OS, Docker, resources)

  • Version information (CLI, StackAI, Docker images)

  • Configuration (sanitized compose and config files)

  • Container status (health, resources, restart counts)

  • Logs (last 1000 lines from all services, sanitized)

  • Error analysis (patterns, counts, summary)

  • Resource usage (CPU, memory, disk, network I/O)

  • Network diagnostics (interfaces, Docker networks, DB tests)

Security & Privacy:

  • All passwords, secrets, keys, tokens automatically redacted

  • Email addresses anonymized

  • No database contents or user data

  • Safe to send to support team

Testing: Comprehensive bundle generation with sanitization, error analysis, and resource collection.

📈 Strategic Alignment

CLI = Customer Retention Engine

Single CLI failure = $1M+ churn risk.


🔐 Security Architecture

Encrypted Secrets Vault

Implementation: License-key-based encryption using AES-256-GCM with Argon2 key derivation.

Key Features:

  • Each customer's secrets encrypted with their unique StackAI license key

  • Vault file: ~/.config/stackai/secrets.vault (600 permissions)

  • Only CLI with matching license can decrypt that customer's vault

  • HIPAA/ISO compliant for healthcare and enterprise deployments

  • Headless server compatible (no user session required)

Registry Credentials:

  • Encrypted in vault with license key (AES-256-GCM)

  • Automatically retrieved for system pull operations

  • Docker credential helper used for runtime authentication

Security Guarantees:

  • ✅ Encryption at rest (AES-256-GCM)

  • ✅ Customer isolation (license-key based)

  • ✅ No plain-text secrets in config files

  • ✅ Audit trail (license hash validation)

  • ✅ Air-gapped compatible (no external dependencies)

Testing: 13 comprehensive unit tests covering encryption, decryption, license validation, batch operations, and edge cases.

Customer Configuration Vault

Implementation: One-time customer configuration collection with license-key encrypted storage.

Key Features:

  • Customer provides all configuration once during stackai init

  • Configuration encrypted in vault using customer's license key

  • Automatic retrieval for all subsequent operations

  • No repeated credential prompts or manual configuration

Stored Configuration:

  • StackAI License Key (used for vault encryption/decryption)

  • Azure Private Registry Username & Password

  • Web URL (e.g., web.customer.com)

  • API URL (e.g., api.customer.com)

  • Supabase URL (e.g., db.customer.com)

  • S3 URL (e.g., s3.customer.com)

  • Temporal URL (e.g., temporal.customer.com)

Registry Authentication Flow:

  1. Installer (curl -sSf https://install.stack.ai | bash) collects license key and saves it into private file at ~/.config/stackai/license

  2. stackai init collects license (from file, env var or prompt), then collects registry credentials interactively

  3. Credentials stored encrypted in vault (license-key protected)

  4. stackai system pull automatically retrieves and uses credentials

  5. stackai config registry commands for manual authentication management

Init Command Behavior:

  • Always checks system prerequisites (Docker, Docker Compose)

  • Always interactive (except license can come from STACKAI_LICENSE_KEY env var)

  • Collects all configuration once, stores encrypted forever

  • All other commands require successful init completion

Security Guarantees:

  • ✅ Ask once, store forever - no repeated credential entry

  • ✅ License-key based encryption (same key as secrets vault)

  • ✅ Automatic authentication for all Docker operations

  • ✅ Private registry access without exposed plaintext credentials

Testing: 3 comprehensive integration tests covering authentication flow, vault security, and end-to-end init→pull pipeline.


Last updated

Was this helpful?