# Lab detail — Authority you reclaim is authority you run

Raw substrate engineering behind the lab. The ruling and the numbers live on the page; this is
the record a practitioner needs to reproduce the shape of the work.

**What ships here:** the box, the layer-by-layer authority ledger, every wall hit and the fix, the
measured numbers, and the three reusable methods. Returns, not algorithms.

**What stays proprietary:** the corpus contents, the retrieval tuning (chunking, pool sizes, and the
recency logic), the owner-authored assessment thresholds, and the Fourth Cloud taxonomy. None of
that is required to reproduce the engineering; all of it is the moat.

**Secret handling:** no cloud credentials, API keys, or private keys appear in this file or in the
lab notes it derives from. The one owner-provided key used during the run was delivered over stdin
to a `chmod 600` env file on the box, never echoed, never logged, never committed.

---

## The box and the workload

**The box** is retained compute kept below the platform's abstraction: an NVIDIA DGX Spark, GB10
superchip, aarch64, DGX OS (Ubuntu 24.04), 119 GB unified memory, Docker unprivileged, NVIDIA
Container Toolkit. Reached over SSH through the vendor's sync tunnel.

**The workload** is a production advisory application (Flask, retrieval-augmented chat). It ran
all-in on one public cloud: Firestore for documents, vectors, threads, and idempotency; a hosted
embedding API; Vertex for generation; Firebase for identity; Firebase Hosting at the edge. It was
migrated as a clone. Production was never touched.

**The platform** assembled on the box: k3s (v1.36.x), CloudNativePG (CNPG) with bundled pgvector,
MinIO, Keycloak, vLLM, a local embedding model, Traefik. Every piece self-hosted.

---

## The authority ledger

Each managed dependency was replaced by a self-hosted equivalent and proven by a probe. Under the
Decision Authority Placement Model (DAPM), each move is authority shifting from Ceded to Retained,
and each carries an operational bill.

| Layer | Was (Ceded) | Now (Retained) | The bill you inherit |
|---|---|---|---|
| L0 Substrate | NVIDIA (metal) | NVIDIA (metal) | The one cession. Silicon and driver stay the vendor's. |
| L1A Store | Firestore | CloudNativePG + pgvector | Run the operator; drill the restore. |
| L1B Embeddings | hosted API | local model, CPU | Own the model and the re-index. |
| L2A Control + identity | managed platform, Firebase | k3s + operators, Keycloak | Day-2 is yours; the operator carries much of it. |
| L2B Execution | Vertex | vLLM | The memory budget and the uptime. |
| L2C Reasoning | (already yours) | owner + model | Never moved. Zero delegated by design. |

Five capabilities moved Ceded to Retained: store, embeddings, generation, session state, identity.
One stayed Ceded: the metal.

---

## The walls, and the fixes

The valuable half of any migration. Symptom, root cause, fix.

**1. Firestore emulator needs Java.** firebase-tools would not start the Firestore emulator
without Java 21. Fix: installed a Temurin JRE into the user's home directory, no root.

**2. LAN batch writes broke the pipe.** Importing the corpus over the LAN failed with a broken
pipe at batches of around 400 documents. Fix: exported to JSONL and imported box-side in
25-document batches. Slower, deterministic, survivable.
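
A minimal sketch of the box-side import loop, assuming one JSON document per line. Only the batch
size of 25 comes from the run; `write_batch` is a hypothetical hook standing in for whatever client
commits a batch, and the function names are illustrative.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed document per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(docs: Iterable[dict], size: int) -> Iterator[list]:
    """Group documents into lists of at most `size` items."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_jsonl(lines: Iterable[str],
                 write_batch: Callable[[list], None],
                 batch_size: int = 25) -> int:
    """Commit documents batch by batch and return the number written.

    `write_batch` (hypothetical) should commit one batch and raise on
    failure, so a failed batch stops the run at a known offset.
    """
    written = 0
    for batch in batched(read_jsonl(lines), batch_size):
        write_batch(batch)
        written += len(batch)
    return written
```

Small batches trade throughput for a bounded blast radius: a failure costs at most one batch, and
the returned count says where to resume.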

**3. Emulator client resolved IPv6, server bound IPv4.** The gRPC client dialed `::1` while the
emulator listened on `127.0.0.1`, so every call was refused. Pinning the host to IPv4 surfaced the
deeper truth: the emulator's data was in memory only, and a later restart wiped it. Ephemeral
scaffolding is not a data plane. That wipe became a deliberate test later.

**4. NVML cannot see unified memory.** The standard NVIDIA device plugin crash-loops on the GB10
with `error getting device memory: Not Supported`, so there is no schedulable `nvidia.com/gpu`
capacity, no quota, no scheduling. Fix: RuntimeClass injection (`NVIDIA_VISIBLE_DEVICES=all`) grants
GPU access without accounting. The accounting gap is closeable in the open-source device plugin,
left unbuilt on purpose.

**5. Split image stores.** A 20 GB image cached in Docker's store is invisible to k3s containerd,
so the pod fails with `ErrImageNeverPull`. Fix: `docker save <image> | k3s ctr images import -`.
The one step that needs root on the box.

**6. A Service named `vllm` crashed vLLM.** Naming the Kubernetes Service `vllm` made k8s inject a
legacy Docker-link environment variable, `VLLM_PORT=tcp://<clusterIP>:8000`, which vLLM read as its
own port config and rejected as a malformed URI. vLLM's own docs name this footgun. Fix:
`enableServiceLinks: false` on the pod, and never name a Service to collide with a tenant's env
prefix.
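
The collision can be checked before a Service is named. The sketch below reproduces the
Docker-link style variable names Kubernetes documents for a Service port; the helper names and the
placeholder IP are illustrative, not part of any library.

```python
def service_link_env(service: str, cluster_ip: str, port: int,
                     proto: str = "tcp") -> dict:
    """Variables Kubernetes injects for one Service port when service links are on."""
    prefix = service.upper().replace("-", "_")
    url = f"{proto}://{cluster_ip}:{port}"
    per_port = f"{prefix}_PORT_{port}_{proto.upper()}"
    return {
        f"{prefix}_SERVICE_HOST": cluster_ip,
        f"{prefix}_SERVICE_PORT": str(port),
        f"{prefix}_PORT": url,
        per_port: url,
        f"{per_port}_PROTO": proto,
        f"{per_port}_PORT": str(port),
        f"{per_port}_ADDR": cluster_ip,
    }


def collisions(service: str, app_env_prefix: str) -> list:
    """Injected variable names that land inside the application's own env prefix."""
    injected = service_link_env(service, "10.0.0.1", 8000)  # placeholder IP and port
    return sorted(name for name in injected if name.startswith(app_env_prefix))
```

For example, `collisions("vllm", "VLLM_")` includes `VLLM_PORT`, while a Service named
`llm-server` produces no names under that prefix.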

**7. Unified memory has no arbitration.** The vLLM pod started up and found only 5.56 of 119.7 GiB
free, because a host-side process still held its 80% reservation. The platform has no visibility
into host-side consumers of the shared pool. Fix: stop the host tenant, which is to say, complete
the migration. Measured budget with model, k3s, and CNPG resident: 107 of 119 GB used.

**8. Keycloak OOM on startup, not steady state.** Quarkus's augmentation step was OOM-killed at a
1500Mi limit; the pod was stable at 3Gi. Lesson: on shared memory, startup spikes are budgeted, not
discovered.
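
One way to budget a spike is to give every tenant a declared startup peak alongside its steady
footprint. This is a sketch of that arithmetic only, assuming tenants start one at a time; the
`Tenant` type and its fields are hypothetical, and real figures have to come from measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    name: str
    steady_gib: float        # resident footprint once running
    startup_peak_gib: float  # highest footprint seen during startup


def worst_case_gib(tenants: list) -> float:
    """Peak demand when one tenant is starting and the rest are at steady state."""
    steady_total = sum(t.steady_gib for t in tenants)
    largest_spike = max((t.startup_peak_gib - t.steady_gib for t in tenants),
                        default=0.0)
    return steady_total + max(largest_spike, 0.0)


def fits(tenants: list, pool_gib: float) -> bool:
    """True when the worst-case demand stays inside the shared pool."""
    return worst_case_gib(tenants) <= pool_gib
```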

**9. The boot blocker you cannot see in the import graph.** The chat package constructed a Firestore
client at module import, as a side effect of a module-scope singleton. The construction is not an
`import` statement, so import-graph analysis misses it, and it blocked a credential-free boot. Fix: a store seam
with a lazy client. Found by the cold-import probe below, which then surfaced the next two boot
requirements one at a time: a required cookie-signing secret, and a second eager cloud client in a
dead async feature (made non-fatal).
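
The shape of the seam, as a minimal sketch. `StoreClient` is a stand-in for any client whose
constructor demands credentials, and `get_client` and `load_thread` are hypothetical names; the
application's actual seam is not shown in this document.

```python
from functools import lru_cache


class StoreClient:
    """Stand-in for a cloud client whose constructor needs credentials."""
    constructed = 0

    def __init__(self) -> None:
        StoreClient.constructed += 1


# Before: a module-scope singleton. Importing the module runs the constructor,
# so the process cannot boot without whatever the constructor demands.
# client = StoreClient()


@lru_cache(maxsize=1)
def get_client() -> StoreClient:
    """After: construct on first use, then reuse the same instance."""
    return StoreClient()


def load_thread(thread_id: str) -> dict:
    """Hypothetical seam function: the only code that touches the client."""
    client = get_client()
    return {"thread_id": thread_id, "client": type(client).__name__}
```

Importing this module constructs nothing; the first call to `load_thread` pays the construction
cost once.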

**10. `kubectl exec -i` steals your script.** Inside a `bash -s` stream, `kubectl exec -i` reads the
same stdin the shell is reading the script from, and silently consumes the rest of it. Fix: drop
`-i` on any exec that passes its command as arguments.
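
The effect reproduces without a cluster. In this sketch `cat` stands in for `kubectl exec -i`: any
child that reads stdin will do. It assumes `bash` is on the path; the script strings and
`run_streamed` are illustrative.

```python
import subprocess

# The child inherits the shell's stdin and drains the rest of the script.
STEALING = "echo before\ncat >/dev/null\necho after\n"
# Same script, with the child's stdin pointed away from the script stream.
FIXED = "echo before\ncat </dev/null >/dev/null\necho after\n"


def run_streamed(script: str) -> str:
    """Feed `script` to `bash -s` over a pipe and return what it printed."""
    result = subprocess.run(["bash", "-s"], input=script,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Comparing `run_streamed(STEALING)` with `run_streamed(FIXED)` shows the failure mode: the first
exits cleanly without ever running its last line.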

**11. Readiness probed a 401.** The pod served for minutes while Kubernetes called it not-ready,
because the readiness path returned 401 by design (it requires a guest identity). Fix: point the
probe at a health endpoint that returns 200.
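
A minimal sketch of the split, written as a bare WSGI callable so it runs without the application's
framework. The `/healthz` path and the helper names are assumptions; the one external fact used is
that a Kubernetes HTTP probe counts status codes from 200 up to 399 as success.

```python
from typing import Callable, Iterable


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Tiny WSGI app: an unauthenticated health path beside a guarded path."""
    if environ.get("PATH_INFO") == "/healthz":  # hypothetical health path
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    # Everything else requires an identity, so an anonymous probe gets 401.
    start_response("401 Unauthorized", [("Content-Type", "text/plain")])
    return [b"identity required"]


def probe_status(path: str) -> int:
    """Call the app as an anonymous GET would and return the status code."""
    seen = []
    app({"PATH_INFO": path, "REQUEST_METHOD": "GET"},
        lambda status, headers: seen.append(status))
    return int(seen[0].split()[0])


def probe_succeeds(status: int) -> bool:
    """HTTP probe rule: 200 <= status < 400 is success, anything else is failure."""
    return 200 <= status < 400
```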

**12. PID 1 was the shell, not the server.** The container ran `sh -c "gunicorn ..."`, so PID 1 was
`sh`. `kill -HUP 1` went to the shell, which ignored it, and the server never reloaded. Fix: find
the gunicorn master (its parent is PID 1) and signal that.
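
A sketch of that lookup on Linux, reading `/proc/<pid>/status` directly. It assumes the master's
process name starts with `gunicorn`, which may not hold if the process title has been rewritten;
the function names are illustrative.

```python
import os
import signal
from pathlib import Path


def parse_status(text: str) -> dict:
    """Parse the `Key:<tab>value` lines of a /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def children_of_init(name: str, proc_root: Path = Path("/proc")) -> list:
    """PIDs whose parent is PID 1 and whose process name starts with `name`."""
    pids = []
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            fields = parse_status((entry / "status").read_text())
        except OSError:
            continue  # no readable status file, e.g. the process already exited
        if fields.get("PPid") == "1" and fields.get("Name", "").startswith(name):
            pids.append(int(entry.name))
    return sorted(pids)


def reload_master(name: str = "gunicorn") -> int:
    """Send SIGHUP to the single matching child of PID 1 and return its PID."""
    (pid,) = children_of_init(name)  # raises unless there is exactly one match
    os.kill(pid, signal.SIGHUP)
    return pid
```

Workers are excluded by construction: their parent is the master, not PID 1.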

**13. Gate on Ready, not phase.** An early probe exec'd against a pod by name before it was
exec-ready; the cluster reports a healthy phase before the pod can accept an exec. A false failure.
Lesson: gate on pod Ready, not cluster phase.
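
A sketch of the gate, assuming the pod object arrives as the JSON that `kubectl get pod -o json`
prints. `fetch_pod` is a hypothetical callable that returns that object as a dict; the polling
defaults are arbitrary.

```python
import time
from typing import Callable


def pod_is_ready(pod: dict) -> bool:
    """True only when the pod's `Ready` condition has status `True`."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions)


def wait_until_ready(fetch_pod: Callable[[], dict], timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Poll until the pod is Ready; return False if the deadline passes first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if pod_is_ready(fetch_pod()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The predicate deliberately ignores `status.phase`: a pod can report `Running` while its `Ready`
condition is still `False`.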

---

## The numbers

| Measure | Value |
|---|---|
| Capabilities moved Ceded to Retained | 5 |
| Capabilities still Ceded | 1 (the metal) |
| Corpus re-embedded locally | 3,489 chunks, cloud-free ingest, model cache offline |
| Restore drill | 3,489 / 3,489 vectors intact, recovery ~2s |
| End-to-end answer, cloud-free | grounded and cited, evidence 0.916, over HTTPS ingress |
| Measured memory budget | 107 / 119 GB (model + k3s + CNPG) |
| Warm local embed latency | ~50 ms |
| Canon functions scored from the migration | 26 |

---

## The methods (reusable)

**Kill-the-cloud proof.** Run the serve path with every cloud credential unset and the model cache
forced offline. If it still retrieves, embeds, generates, and answers, the path is local, proven by
the absence of a fallback rather than by inspection. A credential that is present cannot be trusted
to be unused; a credential that is absent leaves no doubt.
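
A sketch of the environment half of that setup. The variable names are assumptions: the
credential list is an example for a Google Cloud workload, and the offline flags assume a Hugging
Face model cache, which this document does not confirm. Extend both to whatever the workload
actually reads.

```python
import os
from typing import Mapping, Optional

# Assumed credential variables for the ceded cloud; not an exhaustive list.
CLOUD_CREDENTIAL_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_API_KEY",
)
# Assumed offline switches for a Hugging Face model cache.
OFFLINE_FLAGS = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}


def cloudless_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with cloud credentials removed and the cache forced offline."""
    env = dict(os.environ if base is None else base)
    for name in CLOUD_CREDENTIAL_VARS:
        env.pop(name, None)
    env.update(OFFLINE_FLAGS)
    return env
```

This covers environment variables only. Credentials discovered from files on disk or from a
metadata server need their own denial, for example by running where neither exists.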

**Cold-import probe.** To find import-time couplings, import the application entrypoint in a
credential-denied sandbox. Whatever it demands to load, it just declared as a boot requirement.
Deterministic, one line, and it walks the requirements one at a time. It does not attempt static
detection of construction-at-import, which is most of the way to writing an interpreter. Execution
is the cheaper oracle.
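
A sketch of the probe as a function, assuming a Python entrypoint that can be imported by module
name. `cold_import` and its parameters are illustrative; the caller supplies the stripped
environment.

```python
import subprocess
import sys
from typing import Optional, Tuple


def cold_import(module: str, env: dict, cwd: Optional[str] = None) -> Tuple[bool, str]:
    """Import `module` in a fresh interpreter that sees exactly `env`.

    Returns (ok, reason). On failure, `reason` is the last line of the
    traceback, which names the requirement the import just demanded.
    """
    result = subprocess.run([sys.executable, "-c", f"import {module}"],
                            env=env, cwd=cwd, capture_output=True, text=True)
    if result.returncode == 0:
        return True, ""
    lines = result.stderr.strip().splitlines()
    return False, lines[-1] if lines else f"exit code {result.returncode}"
```

Each run reports only the first failure. Satisfy or neutralize it, run again, and the next
requirement surfaces.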

**Assessment by deployment.** Kubernetes cannot be scored as a product because it is an assembly.
Bind it to a single production workload and the workload becomes the product boundary: everything
inside it is testable, everything outside it is out of scope. Every capability score is then earned
by a validator that executes during a real deployment, not inferred from a datasheet. The
quantitative output is the Fourth Cloud row `bounded-kubernetes-fourthcloud`, live in the canon.

---

## Boundary, restated

Reproduce the engineering from this document. The corpus, the retrieval tuning, the assessment
thresholds, and the taxonomy are not here, and they are not needed to repeat the shape of the work.
The lab is the story of how a workload became the boundary that made the assessment possible. The
authority is the thing that moved.
