
how it all works

under the hood

for the people who want to know what's actually happening when they hit "Launch." the tools, the tradeoffs, the reasoning.

the stack

Velveteen is a single Next.js application with a separate worker process. Both talk to the same PostgreSQL database. There's no microservice architecture, no Kubernetes, no message broker. The entire thing runs on one Hetzner VPS. I chose boring infrastructure on purpose — reliability matters more than elegance at this stage, and every added service is another thing that can fail at 2 AM.

Next.js 16 · app & api

App Router, TypeScript, Tailwind. Handles the frontend, all API routes, and authentication. No separate backend service. Server components by default, client components only where interactivity demands it.

PostgreSQL + Drizzle ORM · database

Drizzle sits close to the SQL without the weight of Prisma. Type-safe queries, straightforward migrations, no runtime overhead. Postgres handles everything: users, projects, deploys, findings, and the job queue.

graphile-worker · job queue

Background jobs backed by Postgres — no Redis dependency, no separate queue infrastructure. The worker runs as its own Node process, picks up jobs from the database, and processes the deploy pipeline.
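A minimal sketch of what this looks like in practice, assuming one worker task drives the whole pipeline. The task name, payload shape, and helper interface below are illustrative, not Velveteen's actual code — graphile-worker tasks are plain async functions keyed by name, which is the point.

```typescript
// Sketch of a graphile-worker-style task. Names and shapes are assumptions.
interface DeployPayload {
  deployId: number;
}

interface TaskHelpers {
  logger: { info: (msg: string) => void };
}

// The worker process claims a job row from Postgres, then calls the task
// function registered under that job's name.
async function runDeploy(payload: DeployPayload, helpers: TaskHelpers): Promise<string> {
  helpers.logger.info(`deploy ${payload.deployId}: starting pipeline`);
  // ...clone, detect framework, scaffold, scan, explain, gate, build...
  return `deploy ${payload.deployId} ran through the pipeline`;
}

// On the Next.js side, enqueueing is roughly addJob("run_deploy", { deployId }),
// which inserts a row into the queue table in the same Postgres database.
```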

Caddy · reverse proxy

Handles wildcard TLS certificates and routes subdomains to user containers automatically. Caddy's automatic HTTPS matters here because every new subdomain needs a valid cert, and managing that manually would be a nightmare.

Docker · container runtime

Each user app runs in its own container with hard resource limits: 256MB memory, half a CPU, no privileged access. Containers are isolated on a shared Docker network. Caddy routes traffic to them by subdomain.
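If the worker shells out to the Docker CLI (an assumption — it could equally use the Docker API), the limits above translate into flags roughly like this. The network name, container naming scheme, and the `no-new-privileges` interpretation of "no privileged access" are illustrative:

```typescript
// Builds `docker run` arguments enforcing the caps described above:
// 256MB memory, half a CPU, no privilege escalation, shared app network.
// "velveteen-apps" and the app-<subdomain> naming are made up for illustration.
function dockerRunArgs(subdomain: string, image: string): string[] {
  return [
    "run", "-d",
    "--name", `app-${subdomain}`,
    "--memory", "256m",            // hard memory cap
    "--cpus", "0.5",               // half a CPU core
    "--security-opt", "no-new-privileges",
    "--network", "velveteen-apps", // shared network, isolated from the host
    image,
  ];
}
```

Caddy can then reach the container by name on that shared network when routing the matching subdomain.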

why not kubernetes?

The target capacity right now is single-digit concurrent apps on a single machine. Kubernetes solves orchestration problems I don't have yet, and Docker Compose does the job fine. I'll revisit when user count actually pushes past what one box can handle.

the ai layer

Velveteen uses two Anthropic models for different jobs. The split is intentional. Scaffolding generation requires the model to reason about an entire codebase, understand framework conventions, and produce production-quality configuration files. That's a job for a larger model. Explaining scan findings is a structured transformation: take JSON input, produce plain English output. A smaller, faster model handles that perfectly well at a fraction of the cost.

Claude Sonnet 4.6 · scaffolding generation

Generates Dockerfiles, coding standards documents, .dockerignore files, and environment variable documentation. These tasks require understanding the full codebase: what framework is being used, how the project is structured, what patterns already exist. Sonnet reads the codebase summary and produces files that actually fit the project — not generic templates.

Claude Haiku 4.5 · finding explanations

Takes each security finding and produces three things: a plain-English explanation of what was found, a one-sentence explanation of why it matters, and a copy-paste prompt the user can hand to their coding AI to fix it. Haiku runs once per finding, in parallel. Fast and cheap enough to scale without worrying about cost per deploy.
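The per-finding fan-out is a natural fit for Promise.all. A sketch with the model call stubbed out — the Explained shape and function names are assumptions; the real implementation would send the finding JSON to the Anthropic API and parse the response:

```typescript
interface Finding { id: string; ruleId: string; severity: string }

interface Explained extends Finding {
  explanation: string;  // plain-English "what was found"
  whyItMatters: string; // one-sentence impact
  fixPrompt: string;    // copy-paste prompt for a coding AI
}

// Stub standing in for the real Haiku call.
async function explainWithHaiku(f: Finding): Promise<Explained> {
  return {
    ...f,
    explanation: `Found ${f.ruleId}`,
    whyItMatters: "Illustrative placeholder.",
    fixPrompt: `Fix ${f.ruleId} in this repo.`,
  };
}

// One call per finding, all in flight at once.
async function explainAll(findings: Finding[]): Promise<Explained[]> {
  return Promise.all(findings.map(explainWithHaiku));
}
```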

why two models instead of one?

Cost and speed. A Haiku call to explain a single finding takes a fraction of a second and costs almost nothing. Running Sonnet for every finding explanation would be wildly wasteful given how structured the input and output are. Sonnet earns its cost on the scaffolding work where it actually needs to reason about the whole codebase.

the deploy pipeline

When you hit "Launch," a single background job kicks off and runs through seven stages. Each stage updates the deploy status in the database, and the frontend receives updates over a server-sent events stream whose endpoint polls the database every two seconds.

1. clone

Shallow clone of your repo's main branch into a temporary directory. Just the latest commit, no full history.
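The clone step maps to a single git invocation. A sketch of the argument list — the hardcoded "main" branch follows the text above; the rest is standard git:

```typescript
// Shallow clone: latest commit only, one branch, into a throwaway directory.
function shallowCloneArgs(repoUrl: string, dir: string): string[] {
  return ["clone", "--depth", "1", "--single-branch", "--branch", "main", repoUrl, dir];
}
```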

2. framework detection

Reads your project files to figure out what you built. Checks package.json for Next.js, React, Express. Checks requirements.txt and pyproject.toml for Flask, FastAPI, Django. If the heuristics don't match, Sonnet takes a look.
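A sketch of the heuristic pass. The framework labels and the exact checks are assumptions; only the detection logic mirrors the text. Note Next.js must be checked before React, since every Next.js project also depends on react:

```typescript
type Framework = "nextjs" | "react" | "express" | "flask" | "fastapi" | "django" | "unknown";

interface ProjectFiles {
  packageJson?: { dependencies?: Record<string, string> };
  requirementsTxt?: string; // raw contents of requirements.txt
}

function detectFramework(files: ProjectFiles): Framework {
  const deps = files.packageJson?.dependencies ?? {};
  if ("next" in deps) return "nextjs";     // before react: Next.js depends on it
  if ("express" in deps) return "express";
  if ("react" in deps) return "react";
  const reqs = (files.requirementsTxt ?? "").toLowerCase();
  for (const fw of ["django", "fastapi", "flask"] as const) {
    if (reqs.includes(fw)) return fw;
  }
  return "unknown"; // per the text, Sonnet takes a look when heuristics miss
}
```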

3. scaffold

Sonnet analyzes your codebase and generates the infrastructure files you're missing. A Dockerfile tailored to your framework with multi-stage builds, health checks, and a non-root user. A .dockerignore, a STANDARDS.md based on your existing code patterns, and a .env.example documenting every environment variable your code references.

4. scan

Four scanners run against your code and the built container image. Each scanner's output gets parsed into a normalized finding format with a severity level and blocking determination.

5. explain

Every finding gets sent to Haiku in parallel. Each comes back with a plain-English explanation and a fix prompt referencing your actual files and code.

6. gate

If any finding is marked as blocking, the deploy stops here. The status page shows exactly what needs fixing and how. No blocking findings? Through to build.
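The gate itself is a single pass over the normalized findings. A sketch, with the finding shape reduced to the one field that matters here:

```typescript
interface GateFinding { id: string; blocking: boolean }

// Any blocking finding halts the deploy before the build stage.
function gate(findings: GateFinding[]): { proceed: boolean; blockers: GateFinding[] } {
  const blockers = findings.filter((f) => f.blocking);
  return { proceed: blockers.length === 0, blockers };
}
```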

7. build & deploy

The Docker image gets tagged, any existing container gets stopped, and a new one spins up with resource limits. Caddy routes your subdomain to it. Your app has a URL.

the security scanners

Four tools, each looking for a different category of problem. The blocking logic is opinionated: leaked secrets always stop a deploy. Informational findings like the SBOM never block. The stuff in between follows a severity threshold.

Gitleaks

Scans your code and git history for secrets: API keys, passwords, tokens — anything that shouldn't be in a repo.

always blocking

Semgrep

Static analysis that catches insecure code patterns. SQL injection risks, unsafe deserialization, XSS vectors.

blocks on error severity

Trivy

Scans the built container image for known vulnerabilities in your dependencies and base image.

blocks on critical/high

Syft

Generates a full software bill of materials. Every dependency, every version. Useful for auditing.

never blocking

why these four specifically?

They cover the four main attack surfaces: secrets in code (Gitleaks), insecure code patterns (Semgrep), vulnerable dependencies (Trivy), and full dependency visibility (Syft). All four are open source, output structured JSON, and don't require paid tiers.
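The blocking rules listed above collapse into one small policy function. The severity strings here are assumptions about the normalized finding format, not the scanners' raw output:

```typescript
type Scanner = "gitleaks" | "semgrep" | "trivy" | "syft";

// Per-scanner blocking policy, matching the rules above.
function isBlocking(scanner: Scanner, severity: string): boolean {
  switch (scanner) {
    case "gitleaks": return true;                  // leaked secrets always block
    case "semgrep":  return severity.toLowerCase() === "error";
    case "trivy":    return ["critical", "high"].includes(severity.toLowerCase());
    case "syft":     return false;                 // SBOM is informational only
  }
}
```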

real-time status

The deploy status page uses server-sent events (SSE) from a Next.js API route. The worker updates deploy status and findings in Postgres as each stage completes. The SSE endpoint polls the database every two seconds and pushes changes to the browser.

It's not the most elegant approach. Postgres LISTEN/NOTIFY would give true push updates with no polling overhead. But SSE with short polling is easy to implement and easy to reason about when something breaks — and a two-second delay is imperceptible when you're watching a deploy that takes a few minutes anyway.
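The SSE wire format itself is tiny: one "data:" line per event, terminated by a blank line. A sketch of the event encoding, with the Next.js stream wiring shown as a comment — the status shape and names are illustrative; the two-second interval comes from the text:

```typescript
interface DeployStatus { stage: string; findings: number }

// Serialize one status row as a server-sent event.
function sseEvent(status: DeployStatus): string {
  return `data: ${JSON.stringify(status)}\n\n`;
}

// In a Next.js route handler this would feed a ReadableStream, roughly:
// const stream = new ReadableStream({
//   start(controller) {
//     setInterval(async () => {
//       controller.enqueue(encoder.encode(sseEvent(await loadStatus(deployId))));
//     }, 2000);
//   },
// });
// return new Response(stream, { headers: { "Content-Type": "text/event-stream" } });
```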

hosting & isolation

Everything runs on a Hetzner CCX13: 2 vCPUs, 8GB of RAM, 80GB disk. The Velveteen app, the worker process, Postgres, Caddy, and user containers all share the same box. That's a tight fit — but I'd rather run lean and upgrade when there's a real reason to than throw money at infrastructure for a scale I haven't hit yet.

Every user container is isolated: capped at 256MB of memory and half a CPU core, no privileged access, running on a Docker network that's separate from the host. If a container crashes or misbehaves, it can't take the rest of the system down with it.

Realistic capacity is around three to five concurrent user apps alongside the platform itself — plenty to prove the concept and get real users through the pipeline. Scaling from here means a bigger box first, and eventually splitting the worker onto its own machine.