# Lab detail — Migration Control Plane (why LLM-authored control planes drift)

Raw lab detail for labs.layer2c.com/labs/migration-control-plane. A construction experiment: a
frontier foundation model built the control plane from the published operating model
("From Migration Factory to Migration Control Plane," thectoadvisor.com, 2026-06-04), with the
author's production estate as the workload and the author as the only validator of the build.

**What ships:** the findings, the construction census, the taxonomy of inventory truth, the
failure specimens, the standing gates, the worker-versus-author law. **What stays proprietary:**
the scheduler ruleset, the inspector implementations, the landing-zone profiles, and the
estate-specific evidence. Returns, not algorithms.

## The question

Original: can a real migration run under the control-plane operating model, metered step by
step? Refined by the run: who may author the patterns and validators of a deterministic process
that uses LLMs as workers — and what happens when the authoring is delegated to the strongest
available model in a domain without documented patterns?

## The setup

- **Operating model under test:** the published migration control plane — playbook-driven,
  deterministic first, LLM-as-needed, validators determine done, humans hold exceptions.
- **Builder/subject:** a frontier LLM operating autonomously against the whitepaper.
- **Workload:** five real applications (two static SPAs, an RSS ingestion app, an SPA with a
  coupled backend, and a production AI advisory application), classified against two
  destination landing zones (a hyperscaler and an owned single-node system).
- **Validator of the construction:** the human owner. Nothing else existed yet — that absence
  became the experiment.

## What the deterministic half did (it worked)

- **Classification:** 5 apps × 2 destinations. Known-pattern through prohibited, with authority
  granted per the taxonomy (constrained automation, test scope, human review before promotion).
- **The refusal:** the scheduler classified the owned box as prohibited for the production app
  on absent critical capability classes — deterministically reproducing, at zero cost, the
  discovery a failed manual migration had made expensively. The refusal discriminates by
  workload: the same box classified viable-with-exceptions for a static site.
- **Inventory truth, three layers:** the published architecture read disagreed with the repo in
  3 places (runtime, retrieval tier, a missing service); the repo disagreed with reality in 2
  more (a wired-but-deprecated message bus; a load-bearing warehouse with zero code artifacts).
  Corrections flow one way: docs yield to code, code yields to the organization — and the
  organizational layer enters only through accountable human exception records (both
  directions: exclude-dead, add-invisible).
- **Runtime probes:** resource-level presence as the signal (enabled-API lists over-signal).
  The probes independently rediscovered the artifact-less warehouse and prompted for owner
  confirmation — the human-memory dependency converted into a deterministic question.

## The failure specimens (the lab's actual product)

0. **The ignored spec (the more damning class).** These failures were not undocumented-domain
   improvisation — the requirements sat in the published spec and were not followed. The paper
   enumerates ~15 scheduler input classes; operating autonomously, the model ran one (repo
   artifacts), and both misses the owner caught (a runtime-configured warehouse with no code
   artifact; the existing test/validator surface) map to classes explicitly on the documented
   list. The paper states a playbook begins as a human-governed migration pattern, never as an
   AI-generated artifact; the model authored one anyway (specimen 1). The paper maps
   known-pattern-with-exceptions to medium confidence; the model coded high. Fix, in every
   case, was the same: encode the requirement as deterministic structure that refuses to
   proceed — one inspector per input class with explicit not_inspected declarations, a
   provenance gate that refuses to load, a paper-faithful confidence map in the versioned
   ruleset. Documentation did not constrain the worker; encoding did.

1. **The tabletop playbook.** Asked to build the playbook component, the model produced a
   governed artifact — detection, transforms, validators, lifecycle — registered it as a draft,
   executed it, and passed it 6/6. No migration of the class had ever been observed; the
   validators were inventions; the system graded its own homework. Structural fix: provenance
   is mandatory (a playbook is a recipe from an observed successful migration plus the smoke
   tests actually run on both sides); a hypothesis is a `candidate` that may detect and never
   validate; done is never self-conferred.
2. **The missing baseline.** The model's old-world validators proved the site builds — nothing
   proved the live site worked before migration. Fix: a baseline phase captures the serving
   site as evidence (status, bytes, content hash for the home page, every route in the site's
   own sitemap, and discovery files), with a mechanical gate: no passing baseline, no
   transform; a failing baseline escalates (an unverifiable site cannot be migrated, only
   ruled on).
3. **The deceptive pass (namespace).** With domains not transferring, byte-identical comparison
   across origins passes precisely because both sides embed the old namespace — blessing a copy
   that self-identifies as a domain it doesn't serve. Proven live: two origins serving the same
   artifact compared 12/12 byte-identical while the new origin served 18 old-host
   self-references (canonical/og, sitemap locs, discovery links). Fix: namespace is a
   first-class baseline layer (DNS binding, redirects, embedded self-references); comparison
   joins on path; content verdict and namespace verdict are separate, always; migration scope
   must declare domain_transfer explicitly, converting the orphaned identity into a scoped,
   owner-acknowledged exception.

## The construction census (the number that matters)

| Measure | Value |
|---|---|
| Owner interventions counted as material corrections | Nine (they caught every material defect; corrections and standing gates do not map one-to-one) |
| Material defects the model self-caught by reflection | 0 |
| Model self-corrections from reality contact (live runs failing) | several — all code-level |
| Corrections encoded as standing gates | 5 (provenance, baseline, namespace audit, consistency check, confidence cap) |
| Model bugs later caught by its own encoded gates | 2, on the gate's first run |
| Total spend | about the price of a coffee; the expensive part never ran — no transform executed, because the control plane refused before spend |

Sorting the correction log by who caught what draws the line: every code-level error (an import
parser, a hostname regex, a duplicate derivation) was caught by deterministic checks or failing
runs. Every domain-level error (provenance, baseline, namespace) was caught by the owner, and
only the owner.

## The law

- **Documentation is necessary, not sufficient.** The model ignored spec requirements it had in
  hand until they were encoded as deterministic gates. A documented process governs an LLM
  worker only once it is code.
- With a **mature documented pattern**, delegating the build to a model is a decision:
  transcription, checkable by a non-expert against the document. Authority retained (DAPM:
  Delegated).
- **Without the pattern**, the model becomes the definer of correct in a domain it is not
  expert in, checkable only against reality by an expert — and nobody decided that. Authority
  ceded by drift.
- The ratchet is the honest mechanism: each human correction, encoded as deterministic
  structure, permanently retires its error class and begins catching the model's subsequent
  errors. The control plane is accumulated human judgment made deterministic.
- Corollary to the readiness model, now measured: what you cannot yet describe, you cannot yet
  automate — and you cannot delegate the describing of it either.

## What did NOT get settled (honesty)

- The control-plane model itself, under mature human-authored playbooks, was not tested — its
  authority warnings are what the lab kept confirming.
- No migration executed; no transform-time findings.
- One model, one run, autonomous. The run shows model strength did not solve the authority
  problem here; it does not establish that stronger models always drift.
- One expert owner, colocated with the evidence. Correction latency at organizational distance
  was not measured.
- Economics were not the subject.
