Editorial lab

Authority you reclaim is authority you run

The pitch every cloud-exit deck makes: leave the managed platform and take control back. This lab tests it on one production application. The Virtual CTO Advisor, all-in on a single cloud, migrates to the box (the DGX Spark, retained compute kept below the platform’s abstraction) until the serve path runs with no cloud credentials in the environment. The question is not whether it can run local. It is how much decision authority actually comes home, and what it costs to hold. The one-line loss: every layer you move from Ceded to Retained is a decision you now own and a system you now operate.

By Keith Townsend · July 4, 2026

The call

The verdict

One production workload, one DGX Spark, one owner. A cloud-free serve path proven end to end by executing probes. This is an authority measurement, not a cost claim and not an on-prem-at-scale claim, scoped to the system that actually ran.

Do: Read a cloud exit as a Decision Authority Placement Model (DAPM) decision, layer by layer, not an all-or-nothing platform swap. Each layer moved from Ceded to Retained is authority you gain and an operational bill you accept. The migration priced that bill per layer so the trade is visible before it is signed.
Do: Bound the platform question with a real workload before you score it. Kubernetes is not a product, it is an assembly, and it cannot be graded in the abstract. One production application draws the boundary that makes every function testable by execution instead of by datasheet. That single move is what put a Kubernetes row in the canon.
Don't: Mistake a bounded retention win for on-prem at scale. This ran one workload on one box. Cooling, power, facilities, multi-tenancy, and estate breadth never re-entered, and they change the trade.
Don't: Reclaim a layer you are not ready to operate. Retained authority is Retained responsibility. The pager comes with it, and the layers you leave Ceded may be the ones worth paying someone else to hold.

Self-funded. No vendor paid for this answer, and the model that drafted the page is one of the dependencies the migration reclaimed; the findings were owner-validated against the recorded artifacts.

The bench

How I know

The application moved to retained compute one layer at a time, and each move was recorded as a shift in decision authority. The document and vector store left Firestore for CloudNativePG and pgvector. Embeddings left a hosted API for a local model. Generation left Vertex for vLLM. Session state left Firestore for Postgres. Identity left Firebase for Keycloak. At the end the serve path ran end to end with no cloud credentials present.

Every step was validated by a probe that executed, not by a claim. The vector store survived a force-kill wipe-and-restore drill with all 3,489 chunks intact and a two-second recovery. The reasoning loop returned a grounded, cited answer over the ingress. Identity issued a real signed token and the application accepted it, refusing the request without one. The migration is the evidence, and that evidence is what a bounded Fourth Cloud assessment of Kubernetes was built from: the canon row bounded-kubernetes-fourthcloud, now live, scored on twenty-six functions earned by execution.
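The restore drill above reports "3,489 / 3,489 intact", and intact has to mean more than a matching row count: a restore that brings back stale text or embeddings would pass a count check and still corrupt retrieval. The probe below is an added example, not the lab's drill script. It fingerprints every chunk (id, document, text, embedding) before the force-kill, diffs the fingerprints after the restore, and times the restore. The chunk rows, the in-memory store standing in for the pgvector table, and the 1e-6 embedding tolerance are constructed assumptions.

```python
"""Restore-drill probe: prove the vector store came back intact, not merely populated.

Added example; not the lab's drill script. The store and chunk rows are constructed
in memory. In the lab the rows would be read from the pgvector table before the
force-kill and again after the operator's restore.
"""
from __future__ import annotations

import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    text: str
    embedding: tuple[float, ...]


def fingerprint(chunk: Chunk) -> str:
    """Content hash of one chunk. Embeddings are rounded to 1e-6 (assumed tolerance)
    so a float32 round trip through the store does not register as corruption."""
    payload = json.dumps(
        {"id": chunk.chunk_id, "doc": chunk.doc_id, "text": chunk.text,
         "emb": [round(x, 6) for x in chunk.embedding]},
        sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def snapshot(chunks: Iterable[Chunk]) -> dict[str, str]:
    """chunk_id -> fingerprint for the whole store."""
    return {c.chunk_id: fingerprint(c) for c in chunks}


@dataclass
class DrillReport:
    expected: int
    intact: int
    missing: list[str] = field(default_factory=list)   # ids that did not come back
    changed: list[str] = field(default_factory=list)   # ids back with different content
    extra: list[str] = field(default_factory=list)     # ids that were never there
    seconds: float = 0.0

    @property
    def passed(self) -> bool:
        return self.intact == self.expected and not (self.missing or self.changed or self.extra)

    def summary(self) -> str:
        return f"{self.intact} / {self.expected} intact, ~{self.seconds:.0f}s"


def compare(before: dict[str, str], after: dict[str, str], seconds: float = 0.0) -> DrillReport:
    """Diff two snapshots: a chunk is intact only if its id is back with the same hash."""
    missing = sorted(k for k in before if k not in after)
    changed = sorted(k for k in before if k in after and after[k] != before[k])
    extra = sorted(k for k in after if k not in before)
    intact = len(before) - len(missing) - len(changed)
    return DrillReport(len(before), intact, missing, changed, extra, seconds)


class MemoryVectorStore:
    """Stand-in for the pgvector table: holds rows, can be wiped, can be reloaded."""

    def __init__(self, rows: Iterable[Chunk]) -> None:
        self.rows: dict[str, Chunk] = {c.chunk_id: c for c in rows}

    def all_chunks(self) -> list[Chunk]:
        return list(self.rows.values())

    def wipe(self) -> None:
        self.rows.clear()

    def load(self, rows: Iterable[Chunk]) -> None:
        self.rows = {c.chunk_id: c for c in rows}


def run_drill(store: MemoryVectorStore, wipe: Callable[[], None], restore: Callable[[], None]) -> DrillReport:
    """Snapshot, destroy, restore, snapshot again, report. The wipe is checked so
    a drill that never actually lost the data cannot report a pass."""
    before = snapshot(store.all_chunks())
    wipe()
    if store.all_chunks():
        raise RuntimeError("wipe did not take; the drill proves nothing")
    started = time.monotonic()
    restore()
    seconds = time.monotonic() - started
    return compare(before, snapshot(store.all_chunks()), seconds)


def make_corpus(n: int) -> list[Chunk]:
    """Constructed sample chunks, deterministic; not the lab's corpus."""
    return [Chunk(f"c{i:05d}", f"doc{i // 10}", f"chunk {i} text", (i / n, 1 - i / n, 0.5))
            for i in range(n)]


if __name__ == "__main__":
    corpus = make_corpus(3489)
    backup = list(corpus)  # what the operator's backup would hold
    store = MemoryVectorStore(corpus)
    report = run_drill(store, store.wipe, lambda: store.load(backup))
    print(report.summary(), "PASS" if report.passed else "FAIL")
```

The report separates missing, changed and extra ids so a failed drill says which rows to go and look at, and the wipe is verified before the restore is timed, otherwise a drill that silently failed to destroy the data would score a perfect restore.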

What the bench measured

  • Capabilities moved Ceded → Retained: 5 (store, embeddings, generation, session state, identity)
  • Capabilities still Ceded: 1 (the metal: GB10 silicon and its driver blob)
  • Corpus re-embedded locally: 3,489 chunks (cloud-free ingest, no credentials, Hugging Face offline)
  • Restore drill: 3,489 / 3,489 intact, ~2s (the force-kill wipe replayed on purpose)
  • End-to-end answer, cloud-free: evidence 0.916 (grounded and cited, over the HTTPS ingress, no cloud creds)
  • Canon functions scored from the migration: 26 (the bounded-kubernetes-fourthcloud row)

The detail

The migration was executed as authority accounting, not as a lift-and-shift. Each managed dependency was replaced by a self-hosted equivalent and then proven by a probe. The store swap went behind the application’s existing retrieval provider seam, so the ranking logic validated on the cloud is the same code path on pgvector. The thread and session store had no such seam and had to be given one, which surfaced the real cost of retention: an eager cloud client in the import path blocked a cloud-free boot until the coupling was made lazy behind a store interface.
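The eager-client defect in the paragraph above is worth seeing in code, because it is the shape most cloud-native applications have by default. The block below is an added example, not the Virtual CTO Advisor's code. It puts the pre-fix boot (a cloud client constructed before the backend choice is read) beside the post-fix boot (a store interface, a factory that picks the backend from config, and a cloud backend that only touches its SDK when chosen), and adds the kill-the-cloud check: a boot that fails if any cloud credential is present. The backends are in-memory stand-ins, the cloud client constructor is a stand-in rather than a real SDK call, and the credential variable names are assumptions (GOOGLE_APPLICATION_CREDENTIALS is the GCP SDK's real variable; the other two are hypothetical).

```python
"""Session-store seam: lazy cloud binding and a cloud-free boot check.

Added example; not the Virtual CTO Advisor's code. Backends are in-memory
stand-ins and connect_cloud_client is a stand-in for a cloud SDK constructor.
"""
from __future__ import annotations

from typing import Callable, Mapping, Protocol

# Assumed list. GOOGLE_APPLICATION_CREDENTIALS is the GCP SDK's real variable;
# the other two are hypothetical names for hosted-API keys.
CLOUD_CREDENTIAL_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "VERTEX_API_KEY", "EMBEDDING_API_KEY")


class CloudCredentialError(RuntimeError):
    """Cloud code was reached, or cloud credentials were found, on a cloud-free boot."""


def leaked_cloud_credentials(env: Mapping[str, str]) -> list[str]:
    """Credential variables present in env; an empty list means the environment is cloud-free."""
    return [name for name in CLOUD_CREDENTIAL_VARS if env.get(name)]


def connect_cloud_client(env: Mapping[str, str]) -> object:
    """Stand-in for a cloud SDK client constructor (not a real SDK call): like the
    real thing, it cannot come up without credentials in the environment."""
    if not leaked_cloud_credentials(env):
        raise CloudCredentialError("cloud client requested but no credentials in env")
    return object()


class SessionStore(Protocol):
    def put(self, session_id: str, state: dict) -> None: ...
    def get(self, session_id: str) -> dict | None: ...


class PostgresSessionStore:
    """Retained backend. A dict stands in for the Postgres table."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn
        self._rows: dict[str, dict] = {}

    def put(self, session_id: str, state: dict) -> None:
        self._rows[session_id] = dict(state)

    def get(self, session_id: str) -> dict | None:
        return self._rows.get(session_id)


class FirestoreSessionStore:
    """Ceded backend. The cloud client is built inside the constructor and nowhere
    earlier, so selecting Postgres never executes cloud code."""

    def __init__(self, env: Mapping[str, str]) -> None:
        self._client = connect_cloud_client(env)
        self._rows: dict[str, dict] = {}

    def put(self, session_id: str, state: dict) -> None:
        self._rows[session_id] = dict(state)

    def get(self, session_id: str) -> dict | None:
        return self._rows.get(session_id)


def boot_eager(config: Mapping[str, str], env: Mapping[str, str]) -> SessionStore:
    """Pre-fix shape: the cloud client is constructed unconditionally before the
    backend choice is consulted, so with no credentials this dies even when config
    says Postgres. (In the original it was a module-level client on the import path.)"""
    _cloud = connect_cloud_client(env)
    if config.get("STORE_BACKEND") == "postgres":
        return PostgresSessionStore(config["POSTGRES_DSN"])
    return FirestoreSessionStore(env)


def make_session_store(config: Mapping[str, str], env: Mapping[str, str]) -> SessionStore:
    """The seam: config decides the backend, and only the chosen backend is constructed."""
    backend = config.get("STORE_BACKEND", "postgres")
    if backend == "postgres":
        return PostgresSessionStore(config["POSTGRES_DSN"])
    if backend == "firestore":
        return FirestoreSessionStore(env)
    raise ValueError(f"unknown STORE_BACKEND {backend!r}")


def boot_lazy(config: Mapping[str, str], env: Mapping[str, str],
              require_cloud_free: bool = True) -> SessionStore:
    """Post-fix boot. With require_cloud_free set this is the kill-the-cloud probe:
    a credential in the environment fails the boot instead of being silently used."""
    if require_cloud_free:
        leaked = leaked_cloud_credentials(env)
        if leaked:
            raise CloudCredentialError(f"cloud-free boot requested but credentials present: {leaked}")
    return make_session_store(config, env)


def probe(boot: Callable[[Mapping[str, str], Mapping[str, str]], SessionStore],
          config: Mapping[str, str], env: Mapping[str, str]) -> str:
    """Run one boot and return the probe verdict: 'booted' or 'refused'."""
    try:
        boot(config, env)
        return "booted"
    except CloudCredentialError:
        return "refused"


if __name__ == "__main__":
    cfg = {"STORE_BACKEND": "postgres", "POSTGRES_DSN": "postgresql://app@advisor-rw:5432/app"}  # assumed DSN
    print("eager, no creds:     ", probe(boot_eager, cfg, {}))
    print("lazy,  no creds:     ", probe(boot_lazy, cfg, {}))
    print("lazy,  creds leaked: ", probe(boot_lazy, cfg, {"VERTEX_API_KEY": "k"}))
```

Two things the probe enforces that a plain "does it start" check does not: the eager boot is refused with an empty environment even though config names Postgres, which is the coupling the migration had to cut, and the lazy boot is refused when a credential is present, so a leftover key cannot quietly keep a Ceded path alive after the layer was supposed to come home.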

The kill-the-cloud test is the load-bearing proof. With no cloud credentials in the environment and the model cache offline, the application retrieved from pgvector, embedded locally, generated on vLLM, and answered in the owner’s voice. Identity was the last dependency to come home: the verify path swapped from Firebase to OpenID Connect against a Keycloak realm, and the application accepted a signed token and refused a request without one.
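The identity probe in the paragraph above has two halves, accept a signed token and refuse a request without one, and the second half is where verify paths usually go wrong. The block below is an added example, not the application's verify code. Keycloak signs access tokens with RS256 and a production verifier checks them against the realm's published JWKS; so that this block runs offline it signs with HS256 over a constructed secret instead, and only that signature primitive differs. Pinning the algorithm, checking iss, aud, exp and nbf, and failing closed on a missing, malformed or alg=none token are the same in both. The realm URL, client id, secret and leeway are assumptions.

```python
"""Ingress verify path: accept a signed token from the realm, refuse everything else.

Added example; not the application's verify code. HS256 over a constructed
secret stands in for Keycloak's RS256 and the realm JWKS so the block runs
offline; every claim check is the same as in the RS256 case.
"""
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import logging
import time

ISSUER = "https://sso.lab.internal/realms/advisor"    # assumed realm; Keycloak issuers have this shape
AUDIENCE = "advisor-app"                               # assumed client id
SECRET = b"constructed-shared-secret-not-a-real-key"   # stand-in for the realm's signing key
LEEWAY_SECONDS = 30                                    # assumed clock-skew allowance

log = logging.getLogger(__name__)


class AuthError(Exception):
    """Any reason the request must be refused; the reason is logged, never echoed."""


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64url(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, key: bytes = SECRET) -> str:
    """Stand-in issuer. In the lab Keycloak does this; it exists here only so the
    verifier can be exercised offline."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{_b64url(json.dumps(header).encode())}.{_b64url(json.dumps(claims).encode())}"
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, now: float | None = None) -> dict:
    """Return the claims if and only if every check passes."""
    if not token:
        raise AuthError("missing token")
    parts = token.split(".")
    if len(parts) != 3:
        raise AuthError("malformed token")
    header_b64, claims_b64, sig_b64 = parts
    try:
        header = json.loads(_unb64url(header_b64))
        claims = json.loads(_unb64url(claims_b64))
        signature = _unb64url(sig_b64)
    except ValueError as exc:  # binascii.Error and JSONDecodeError both subclass ValueError
        raise AuthError("undecodable token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise AuthError("malformed token")
    # Pin the algorithm. The header is attacker-controlled, so "none" or a
    # downgrade is rejected before the signature is even looked at.
    if header.get("alg") != "HS256":
        raise AuthError("algorithm not accepted")
    expected = hmac.new(SECRET, f"{header_b64}.{claims_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise AuthError("bad signature")
    if claims.get("iss") != ISSUER:
        raise AuthError("wrong issuer")
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if AUDIENCE not in audiences:
        raise AuthError("wrong audience")
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + LEEWAY_SECONDS:
        raise AuthError("expired or no exp")
    if now < claims.get("nbf", 0) - LEEWAY_SECONDS:
        raise AuthError("not yet valid")
    if not claims.get("sub"):
        raise AuthError("no subject")
    return claims


def handle_request(headers: dict[str, str], now: float | None = None) -> tuple[int, dict]:
    """The ingress decision: 200 with the verified subject, or 401 with no detail."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return 401, {"error": "bearer token required"}
    try:
        claims = verify_token(token.strip(), now)
    except AuthError as exc:
        log.info("token refused: %s", exc)   # reason stays server-side
        return 401, {"error": "invalid token"}
    return 200, {"sub": claims["sub"]}


if __name__ == "__main__":
    now = time.time()
    good = issue_token({"iss": ISSUER, "aud": AUDIENCE, "sub": "owner", "iat": now, "exp": now + 300})
    print(handle_request({"Authorization": f"Bearer {good}"}, now))   # (200, {'sub': 'owner'})
    print(handle_request({}, now))                                     # (401, ...)
```

Swapping the HMAC stand-in for RS256 changes two lines: the pinned algorithm string and the signature check, which becomes an RSA-PKCS1v15-SHA256 verification against the key selected by the header's kid from the realm's JWKS. Everything after the signature check stays as written, and the audience check is the one most often skipped: a token minted by the same realm for a different client is still a validly signed token, and without the aud check the ingress would accept it.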

The single cession is the substrate. GPU access works through a runtime-class injection; GPU accounting does not, because the driver cannot report the GB10's unified memory to the device plugin. That gap is closeable in the open-source device plugin, so it is Retained authority left unbuilt by choice, not a vendor lock. Every other layer is owned outright, and every owned layer added an operational bill that the assessment records as a priced gap rather than a silent assumption.

The obvious objection

This is a homelab. It does not scale, and the public cloud is cheaper anyway.

The lab does not claim scale or cost. It measures a different axis: authority placement. Whether the cloud is cheaper at a given utilization is a real question, and it is not this one. The finding holds independent of scale. Every capability that moved from Ceded to Retained is now a decision the owner makes and a system the owner runs.

Scale does not soften the authority result, it sharpens it. At estate scale the operational bill of each Retained layer grows, which is exactly why the placement decision has to be made deliberately rather than by default. The lab priced that bill per layer so the responsibility is legible before the migration, not discovered after it.

And the workload under test is the real production application, not a toy. It returned a grounded, cited answer with an evidence score of 0.916 over an HTTPS ingress, with no cloud credentials in the environment. The retention is measured, not asserted.

Where it belongs

Where each layer belongs

Layer · Was → Now

Layer 0 · Compute (Compute & Network Fabric) · Ceded → Ceded
The metal. The GB10 and its driver stay NVIDIA's, the one cession, and even the GPU-accounting gap above it is open source and closeable if it ever earns the effort.

Layer 1A · Storage (Data Storage & Governance) · Ceded → Retained
Firestore to CloudNativePG and pgvector, behind the app's own retrieval seam. The bill: run the operator, drill the restore.

Layer 1B · Retrieval (Context Management & Retrieval) · Ceded → Retained
A hosted embedding API to a local model on the box. The bill: own the model and the re-index, cloud-free at ingest.

Layer 2A · Orchestration (Infrastructure Orchestration) · Ceded → Retained
A managed control plane to k3s and the operators you run, with identity reclaimed from Firebase to Keycloak on the platform's own Postgres. The bill: day-2 is yours, though the operator carries much of it.

Layer 2B · Runtime (Application Runtime & Execution) · Ceded → Retained
Vertex to vLLM, serving the application over the ingress. The bill: the memory budget and the uptime.

Layer 2C · Reasoning (Agentic Infrastructure, the Reasoning Plane) · Retained → Retained
The reasoning authority never left. It was always the owner and the model. Zero delegated to any platform, by design, the same zero every vendor carries.

What it opened

The two questions this lab now knows to ask

Where is the line between Retained and Delegated when you run someone else’s open-source operator?

CloudNativePG carried real day-2 for the data tenant: provisioning, failover, backup, replica creation, all drilled. The lab counts that as Retained because the code is yours to fork, but the operational reality is closer to a delegation you can revoke. A later lab can measure how far operator-encoded day-2 goes before the authority has effectively moved.

How much of the operational bill can be re-delegated without ceding the decision?

Every Retained layer added a pager. The open question is how much of that load can shift to managed open-source tooling (a backup service, an observability stack, a policy engine) while the decision stays with the owner. The assessment's priced gaps are the map of where that trade is available.

The bound

What it did not prove

  • Not a claim about on-prem at scale. One workload, one box, one owner. Cooling, power, facilities, multi-tenancy, and estate breadth did not re-enter, and each changes the operational bill.
  • Not a cost claim. Total cost of ownership against the public cloud was not measured; the axis here is authority, not price.
  • Identity federation was proven on one tenant, not as an estate-wide identity plane.
  • The GPU-accounting gap was shown to be closeable in open source, not closed. The lab left it unbuilt on purpose.

In the author's words

Notes from the author, Keith Townsend

I set out to migrate a workload. I thought the deliverable was a runbook, proof that a model could transcribe a migration once a human had run it. That happened, and it turned out to be the least interesting thing in the room.

The interesting part was what I could not do at first: score Kubernetes. Kubernetes is not a product, it is an assembly, and I had left it out of the canon because grading it honestly was impossible. The workload solved that. One production application became the product boundary. Inside it, every function was fair game and testable by execution. Outside it, nothing was. That one decision is what put a Kubernetes row in the instrument, as a peer to the bundled platforms, on the same functions.

Then the reframe I did not see coming. I was not measuring Kubernetes. I was measuring DAPM. Firestore to CloudNativePG, Vertex to vLLM, Firebase to Keycloak, none of those were technology swaps. Each was decision authority moving from Ceded to Retained, and each one bought me authority at the price of a system to keep alive. I thought I was building a Kubernetes platform. Looking back, I was measuring how much authority I could take back from the public cloud, and what it cost to hold it. Kubernetes was the mechanism. Authority was the thing that moved.

How it was built

Method and disclosure

Self-funded, no sponsor. The application was migrated as a clone and production was never touched. The serve path was proven with no cloud credentials in the environment.

What ships: the substrate, the authority movements, the operational bill per layer, and the raw lab detail. What stays proprietary: the corpus contents, the retrieval tuning, and the owner-authored assessment thresholds.

The quantitative record is the Fourth Cloud assessment bounded-kubernetes-fourthcloud, live in the canon at cloud.layer2c.com. This lab is the story of how a workload became the boundary that made that assessment possible.

Download the raw lab detail (Markdown)