# StackAI On-Prem CLI - Engineering Standards

## 📋 Purpose

This repository delivers **StackAI On-Prem**, a production-grade, on-prem-first AI platform for enterprise customers requiring strict data sovereignty. The `stackai` CLI orchestrates a 50+ service Docker Compose deployment across customer Ubuntu infrastructure.

**Business stakes**: $10M+ ARR Fortune 500 customers. Single CLI failure risks enterprise churn.

## 🎯 Command Contract (Immutable)

**Enforce exactly 5 top-level commands. No exceptions.**

```
stackai init           # Complete bootstrap (system → production)
stackai deploy {start,stop,status,logs,restart}  # Service lifecycle
stackai config {domains,tls,secrets,registry,saml,sso,env} # Configuration
stackai system {backup,restore,update,migrate,pull,build,cleanup,prune,templates,releases,import} # Operations
stackai diagnose {doctor,support,info}        # Observability
```

**Never reference deprecated legacy top-level commands**: `stackai install`, `stackai configure`, `stackai start`, `stackai stop`. Use the 5-command hierarchy above (e.g. `stackai deploy start`, `stackai config env`).

## 💼 Customer Context

**Target**: Banks, healthcare, government, manufacturing requiring:

* 99.9% uptime SLA
* Zero data exfiltration
* 1-hour deployment (`stackai init`)
* Self-service support (`diagnose support`)

**Deployment profile**: 64GB RAM, 16 cores, 1TB SSD Ubuntu 24.04 VMs.

**Architecture**: 4 layers (50+ services)

```
L1 Databases: postgres, mongodb, weaviate, redis, minio
L2 Supabase:  kong, auth, realtime, storage, studio, pooler
L3 Backend:   stackend(LLM), stackrepl, unstructured
L4 Frontend:  nginx(TLS), stackweb(Next.js)
```

**Persistence**: `~/.config/stackai/data/` (gitignored, 100GB+)

## 🏗️ Repository Contract

```
~/.config/stackai/ (enforced layout)
├── docker-compose.yaml     # Generated/updated by the CLI (init/update/releases); do not edit manually
├── .env                    # Generated by init/config (decrypted for docker-compose)
├── secrets.vault           # Encrypted secrets (license-key protected)
├── config/                 # Templates + versions.json
└── data/                   # Persistence + bundles
```

**Entry points**:

* `install.sh` → installs the `stackai` binary and installs bundled config files
* `stackai` → 5 commands only
* Docs: `docs/COMMANDS.md`

**Version tracking**:

* `.release` file: Current deployed platform release (used to resolve service image versions)
* `versions.json`: Platform release → subservice version mappings
* CLI self-update: Atomic binary replacement with config updates

## 🎨 Operator Experience (Non-Negotiable)

```
✅ Progress bars: indicatif MultiProgress (>2s operations)
✅ Colors: 🟢🟡🔴 + bold (colored crate)
✅ Idempotent: safe to rerun
✅ Explicit: never silent failures
✅ Actionable: every error → next command

Example:
[🟢] Layer 1: postgres healthy ✓ (45s)
Next: stackai deploy status
Dashboard: https://customer.internal:3000
```

## ⚖️ Contribution Principles

### Core Mandates

1. **5 commands only** – No deviations, no legacy aliases
2. **On-prem first** – Supports disconnected/runtime deployments; some commands perform outbound calls (e.g. `stackai init` public IP detection via `ifconfig.me`, `stackai system update` downloads from `install.stack.ai`)
3. **Security default** – License-key encrypted vault for secrets, Docker credential helper for registry
4. **Backward compatible** – Supports existing `~/.config/stackai/` deployments
5. **Support-first** – Every failure path produces `diagnose support` bundle

### Golden Rules

```
✅ Recommend: stackai init/deploy/config/system/diagnose
❌ Never:     legacy top-level commands like install/configure (removed - use new structure)
✅ Always:    `~/.config/stackai/` paths via directories-next crate
🟡 Prefer:    bollard API client + compose_spec for orchestration/parsing
🟡 Current:   some operations shell out to `docker` (registry auth/validation, builds, diagnostics, prereq checks)
✅ Always:    Only include code that's actually used
```

## 🔧 Code Standards (2025 Rust CLI)

### Project Structure

```
src/
├── main.rs              # Thin CLI parser → dispatch
├── cli.rs               # Command structure (5 top-level commands)
├── commands/            # Command implementations organized by CLI hierarchy
│   ├── init.rs          # 🏠 Complete bootstrap (top-level)
│   ├── deploy/          # 🚀 Service management
│   │   ├── mod.rs
│   │   ├── start.rs     # deploy start
│   │   ├── stop.rs      # deploy stop
│   │   ├── restart.rs   # deploy restart
│   │   ├── status.rs    # deploy status
│   │   └── logs.rs      # deploy logs
│   ├── config/          # ⚙️ Configuration
│   │   ├── mod.rs
│   │   ├── domains.rs   # config domains
│   │   ├── tls.rs       # config tls
│   │   ├── secrets.rs   # config secrets
│   │   ├── registry.rs  # config registry
│   │   ├── saml.rs      # config saml
│   │   ├── sso.rs       # config sso
│   │   └── configure.rs # config env
│   ├── system/          # 🛠️ System management
│   │   ├── mod.rs
│   │   ├── backup.rs    # system backup
│   │   ├── restore.rs   # system restore
│   │   ├── update.rs    # system update
│   │   ├── migrate.rs   # system migrate
│   │   ├── pull.rs      # system pull
│   │   ├── build.rs     # system build
│   │   ├── cleanup.rs   # system cleanup
│   │   ├── prune.rs     # system prune
│   │   ├── templates.rs # system templates
│   │   └── releases.rs  # system releases
│   ├── diagnose/        # 🔍 Diagnostics
│   │   ├── mod.rs
│   │   ├── doctor.rs    # diagnose doctor
│   │   ├── support.rs   # diagnose support
│   │   └── info.rs      # diagnose info
│   └── mod.rs           # Common utilities (print_*, prompt_*)
├── docker.rs            # bollard Docker API abstraction
├── compose.rs           # compose_spec wrapper for docker-compose.yaml parsing
├── config.rs            # YAML config management (load_yaml/save)
├── vault.rs             # Encrypted secrets vault (license-key based)
└── validation.rs       # Input validation (service names, domains, IPs, versions)
```

### Error Handling

```
pub async fn run_init(args: &InitArgs) -> anyhow::Result<()> {
    Context::new(work_dir)?;
    bootstrap::system(&ctx, args).await?;
    config::generate(&ctx, args).await?;
    deploy::layers(&ctx).await?;
    Ok(())
}
```

### Crate Contract

| Purpose        | Mandate                                                     | Prohibited                                                                                      |
| -------------- | ----------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| CLI            | clap 4.5 derive                                             | structopt                                                                                       |
| Async Runtime  | tokio 1.0 + futures-util 0.3                                | std::thread blocking                                                                            |
| Docker         | bollard 0.19 + compose\_spec 0.3 + indexmap 2.0             | shelling out to `docker` for core orchestration (currently used for registry/build/diagnostics) |
| Prompts        | inquire 0.7                                                 | dialoguer                                                                                       |
| Progress       | indicatif 0.17                                              | manual spinners                                                                                 |
| Config         | figment 0.10 + serde\_yaml 0.9                              | manual config parsing                                                                           |
| Template       | minijinja 2.0                                               | handlebars                                                                                      |
| Colors         | colored 2.0                                                 | ansi\_term                                                                                      |
| Time/Date      | chrono 0.4                                                  | manual time handling                                                                            |
| Validation     | regex 1.10                                                  | manual parsing                                                                                  |
| Encryption     | aes-gcm 0.10 + argon2 0.5 + jsonwebtoken 10.4 + base64 0.22 | keyring (headless incompatible)                                                                 |
| Crypto Utils   | rand 0.8 + uuid 1.6 + blake3 1.5                            | manual crypto                                                                                   |
| Compression    | flate2 1.0 + tar 0.4 + zstd 0.13                            | manual archiving                                                                                |
| File Ops       | tempfile 3.10 + walkdir 2                                   | unbounded blocking I/O on async hot paths (std::fs is used today; keep it off critical paths)   |
| System Info    | sysinfo 0.30                                                | manual system calls                                                                             |
| Error Handling | anyhow 1.0 + thiserror 1.0                                  | panic!/unwrap                                                                                   |
| Logging        | tracing 0.1 + tracing-subscriber 0.3                        | println!/eprintln for structured logs (UI output via `print_*` helpers is OK)                   |
| Reliability    | backoff 0.4 + humantime 2.1                                 | manual retries                                                                                  |
| Path Mgmt      | dirs 5.0                                                    | hardcoded paths                                                                                 |

### Quality Gates

```
cargo fmt --check
cargo clippy --all-features -- -D warnings
cargo test --all-features --doc
cargo audit
cargo udeps
```

## 📊 Scenarios (Agent Playbook)

| Scenario           | Customer Goal               | Exact Commands                                                       |
| ------------------ | --------------------------- | -------------------------------------------------------------------- |
| First install      | Production in 1hr           | `curl -sSf https://install.stack.ai \| bash` then `stackai init`     |
| Health check       | Verify uptime               | `stackai deploy status`                                              |
| Logs               | Debug service               | `stackai deploy logs SERVICE -f`                                     |
| Backup             | Disaster recovery           | `stackai system backup`                                              |
| List backups       | Find backup to restore      | `stackai system backup --list`                                       |
| Restore            | Rollback to previous state  | `stackai system restore backup.tar.gz`                               |
| Support            | Escalate                    | `stackai diagnose support`                                           |
| Update             | Zero-downtime update        | `stackai system update` (automatic backup)                           |
| Update version     | Update to specific version  | `stackai system update --version 1.0.9`                              |
| Release management | List/set releases           | `stackai system releases list` / `stackai system releases set 1.0.9` |
| Volume import      | Migrate from old deployment | `cd /old/path && stackai system import` then `stackai init`          |

## 🎯 Success Criteria

```
Deployment:     <1hr (init → production)
Backup/Restore: <30min roundtrip (with checksum verification)
Support MTTR:   <4hr (comprehensive bundle → async fix)
Upgrade:        Zero-downtime rolling (automatic backup)
Version Mgmt:   Version-aware updates with pinned subservices
Learning curve: 2min (5 commands)
Security:       License-key encrypted vault (HIPAA/ISO compliant)
```

## 🔄 Version Management & Updates

### Version-Aware Updates

**Implementation**: Self-updatable CLI with version-aware platform updates.

**Key Features**:

* CLI self-update capability (atomic binary replacement)
* Version-aware updates with pinned subservice versions
* Automatic backup before updates
* Caching mechanism for downloaded releases
* Checksum verification for security
* Platform release tracking via `.release` file
* Version mappings in `versions.json`

**Update Flow**:

1. `stackai system update` creates automatic backup
2. Resolves target version (flag → `.release` file → latest)
3. Downloads release tarball (with cache support)
4. **Verifies checksum** (mandatory unless `--skip-checksum` is used)
5. Updates configuration files and CLI binary
6. Updates `.release` file
7. Pulls new Docker images
8. Restarts services

**Security**: Checksum verification is mandatory by default to prevent supply chain attacks. Updates fail if checksums cannot be verified. Use `--skip-checksum` flag only in air-gapped environments.

**Release Commands**:

* `stackai system releases`: Show current installed release and subservice versions
* `stackai system releases list`: List all available releases with subservice mappings
* `stackai system releases set <VERSION>`: Set target release (creates backup automatically)

**Testing**: Comprehensive update flow with version resolution, caching, checksum verification, and rollback capability.

## 💾 Backup & Restore

### Comprehensive Backup System

**Implementation**: Complete state backup with checksum verification and version tracking.

**Key Features**:

* Creates compressed `.tar.gz` archive with SHA256 checksum
* Backs up PostgreSQL and MongoDB databases
* Includes all configuration files (`.env`, `docker-compose.yaml`, `config/` directory)
* Includes version information (`.release`, `versions.json`)
* Includes encrypted secrets vault (`secrets.vault`)
* Generates manifest with metadata and component status
* Automatic backup before updates and version changes

**Backup Commands**:

* `stackai system backup`: Create backup (automatic before updates)
* `stackai system backup --name <NAME>`: Create named backup
* `stackai system backup --list`: List all available backups

**Restore Features**:

* Checksum verification before restore
* Safety backup of current state created automatically
* Version compatibility checking
* Complete state restoration (config, databases, versions)

**Testing**: Comprehensive backup/restore flow with checksum verification, version compatibility, and safety backups.

## 🔍 Diagnostics & Support

### Comprehensive Support Bundle

**Implementation**: Complete diagnostic bundle for asynchronous troubleshooting.

**Key Features**:

* Collects diagnostic data from all 50+ services (not just 7)
* Automatic sanitization of sensitive information
* Version information included
* Error analysis and pattern detection
* Resource usage per container
* Network diagnostics
* Complete configuration files (sanitized)

**Support Bundle Includes**:

* System information (OS, Docker, resources)
* Version information (CLI, StackAI, Docker images)
* Configuration (sanitized compose and config files)
* Container status (health, resources, restart counts)
* Logs (last 1000 lines from all services, sanitized)
* Error analysis (patterns, counts, summary)
* Resource usage (CPU, memory, disk, network I/O)
* Network diagnostics (interfaces, Docker networks, DB tests)

**Security & Privacy**:

* All passwords, secrets, keys, tokens automatically redacted
* Email addresses anonymized
* No database contents or user data
* Safe to send to support team

**Testing**: Comprehensive bundle generation with sanitization, error analysis, and resource collection.

## 📈 Strategic Alignment

**CLI = Customer Retention Engine**

```
Enterprise ($250k+/yr): 100% uptime mandatory
Premium ($100k/yr):     Self-service support
Trial (Free):           Frictionless onboarding
```

**Single CLI failure = $1M+ churn risk.**

***

## 🔐 Security Architecture

### Encrypted Secrets Vault

**Implementation**: License-key-based encryption using AES-256-GCM with Argon2 key derivation.

**Key Features**:

* Each customer's secrets encrypted with their unique StackAI license key
* Vault file: `~/.config/stackai/secrets.vault` (600 permissions)
* Only CLI with matching license can decrypt that customer's vault
* HIPAA/ISO compliant for healthcare and enterprise deployments
* Headless server compatible (no user session required)

**Registry Credentials**:

* Encrypted in vault with license key (AES-256-GCM)
* Automatically retrieved for `system pull` operations
* Docker credential helper used for runtime authentication

**Security Guarantees**:

* ✅ Encryption at rest (AES-256-GCM)
* ✅ Customer isolation (license-key based)
* ✅ No plain-text secrets in config files
* ✅ Audit trail (license hash validation)
* ✅ Air-gapped compatible (no external dependencies)

**Testing**: 13 comprehensive unit tests covering encryption, decryption, license validation, batch operations, and edge cases.

### Customer Configuration Vault

**Implementation**: One-time customer configuration collection with license-key encrypted storage.

**Key Features**:

* Customer provides all configuration once during `stackai init`
* Configuration encrypted in vault using customer's license key
* Automatic retrieval for all subsequent operations
* No repeated credential prompts or manual configuration

**Stored Configuration**:

* StackAI License Key (used for vault encryption/decryption)
* Azure Private Registry Username & Password
* Web URL (e.g., web.customer.com)
* API URL (e.g., api.customer.com)
* Supabase URL (e.g., db.customer.com)
* S3 URL (e.g., s3.customer.com)
* Temporal URL (e.g., temporal.customer.com)

**Registry Authentication Flow**:

1. Installer (`curl -sSf https://install.stack.ai | bash`) collects license key and saves it into private file at `~/.config/stackai/license`
2. `stackai init` collects license (from file, env var or prompt), then collects registry credentials interactively
3. Credentials stored encrypted in vault (license-key protected)
4. `stackai system pull` automatically retrieves and uses credentials
5. `stackai config registry` commands for manual authentication management

**Init Command Behavior**:

* Always checks system prerequisites (Docker, Docker Compose)
* Always interactive (except license can come from `STACKAI_LICENSE_KEY` env var)
* Collects all configuration once, stores encrypted forever
* All other commands require successful `init` completion

**Security Guarantees**:

* ✅ Ask once, store forever - no repeated credential entry
* ✅ License-key based encryption (same key as secrets vault)
* ✅ Automatic authentication for all Docker operations
* ✅ Private registry access without exposed plaintext credentials

**Testing**: 3 comprehensive integration tests covering authentication flow, vault security, and end-to-end init→pull pipeline.

***


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.stackai.com/stackai-auto-cli/agents.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
