From a Single Terraform Root to Modules + Live: Phased Migration, Import, and State Surgery
This post records one thing I keep coming back to: if a Terraform setup starts with only one root, and you later want to add dev / prod environments, a shared layer, and CI/CD boundaries, how do you migrate without affecting the existing environment?
My final conclusion is simple:
- do not rewrite everything in one shot
- do not split CI/CD first
- do not delete old roots before new roots converge
The method that actually works is phased migration.
What the pre-migration problem looks like
A very common early Terraform layout is a single root, or a small handful of tightly coupled roots, that holds everything.
This layout works well in the early stage because it is fast.
But once requirements grow, multiple problems appear at the same time:
- shared resources and env resources live in the same root
- service root depends on platform root assumptions
- secrets are not environment-safe
- production domain logic leaks into dev
- CI/CD is coupled to legacy roots
At this point, the most dangerous thought is: delete the old roots directly and switch to a new structure.
That usually does not succeed.
My target structure
The goal is not just renaming folders. The goal is changing the ownership model.
The key points of this target structure are:
- modules/* defines reusable infrastructure contracts
- live/* owns real state and environment realization
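As a rough sketch of that split, the skeleton could be created like this. The live/shared, live/dev/platform, and live/dev/service paths come from this post; the prod roots and the module names (platform, service) are assumptions for illustration only.

```shell
# Sketch: create an empty modules + live skeleton under a given directory.
# The prod roots and the module names are hypothetical.
scaffold_layout() {
  root="$1"
  for dir in \
    modules/platform \
    modules/service \
    live/shared \
    live/dev/platform \
    live/dev/service \
    live/prod/platform \
    live/prod/service
  do
    # Each live/* directory becomes its own root with its own state.
    mkdir -p "$root/$dir" || return 1
  done
}

# Usage: scaffold_layout ./infra && find ./infra -type d | sort
```

Each directory under live/ is meant to be initialized and planned on its own, which is what makes ownership explicit.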
Migration order matters more than final structure
I think the most important thing is not what the end state looks like, but migration order.
I write the order this way:
1. create new modules and live roots
2. keep legacy roots during migration
3. move or import resources into new live roots
4. let new roots converge to a clean state
5. then switch CI/CD to the new paths
6. delete legacy roots only at the end
If you move step 5 and step 6 earlier, it is easy to break things.
Resource Adoption Method Selection
This is the most practical migration question.
I split it this way:
Cases for Adopting Existing Remote Resources
This bucket maps to terraform import: resources that already exist in the cloud provider but are not yet owned by the new root:
- Cloud Run service
- Cloud SQL instance
- service account
- IAM member
- Artifact Registry repository
Using import is the natural choice for these resources.
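A minimal sketch of such an import, wrapped in a small helper. The root paths, resource addresses, project, region, and resource names in the usage lines are all hypothetical, and the import ID format differs per resource type, so check each resource's page in the provider documentation before running it.

```shell
# Sketch: adopt one existing cloud resource into a new live root.
#   $1 = live root directory, $2 = resource address in that root,
#   $3 = provider-specific import ID
import_into() {
  root="$1"
  address="$2"
  id="$3"
  terraform -chdir="$root" import "$address" "$id"
}

# Hypothetical usage (addresses, project, region and names are made up):
# import_into live/dev/service \
#   'module.service.google_cloud_run_v2_service.app' \
#   'projects/my-project/locations/us-central1/services/my-app'
# import_into live/dev/platform \
#   'module.platform.google_sql_database_instance.main' \
#   'projects/my-project/instances/my-db'
```

Running terraform plan in the same root right after each import shows whether the new configuration actually matches the adopted resource.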
Cases for Internal Ownership Moves
This bucket maps to terraform state mv, and the most typical one is random_password.
For this type of resource, the remote system does not directly keep the exact value represented in Terraform state. If you re-import or recreate it, you may not be “adopting an existing value”; you may be “generating a new value.”
That is very dangerous for things like DB passwords.
In this case, what you must protect is not the resource name, but the value itself.
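One way to move such a resource between two separate roots, sketched under assumptions: pull both states to local files, run terraform state mv between the files, then push both back. The root paths and addresses in the usage line are hypothetical, and the sketch assumes both roots are already initialized against a backend that supports state pull and state push. Try it on copies first, because the pulled files contain the password in plain text.

```shell
# Sketch: move one resource from a legacy root's state to a new root's state
# without regenerating its value.
#   $1 = source root, $2 = destination root,
#   $3 = address in source, $4 = address in destination
move_between_roots() {
  src_root="$1"
  dst_root="$2"
  src_addr="$3"
  dst_addr="$4"
  work="$(mktemp -d)" || return 1
  echo "state copies (contain secrets) are in: $work" >&2

  # Pull both states, keep backups, move the resource between the local
  # files, then push the destination first so the resource is never
  # missing from both remote states.
  terraform -chdir="$src_root" state pull > "$work/src.tfstate" &&
  terraform -chdir="$dst_root" state pull > "$work/dst.tfstate" &&
  cp "$work/src.tfstate" "$work/src.backup.tfstate" &&
  cp "$work/dst.tfstate" "$work/dst.backup.tfstate" &&
  ( cd "$work" &&
    terraform state mv -state=src.tfstate -state-out=dst.tfstate \
      "$src_addr" "$dst_addr" ) &&
  terraform -chdir="$dst_root" state push "$work/dst.tfstate" &&
  terraform -chdir="$src_root" state push "$work/src.tfstate"
}

# Hypothetical usage:
# move_between_roots legacy/platform live/dev/platform \
#   'random_password.db' 'module.platform.random_password.db'
```

After the move, a plan in the new root should show no replacement of the password, and a plan in the legacy root should no longer mention it. Delete the work directory once both are confirmed.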
Initial Cutover Edge Cases
In theory, we all want the plan to report No changes as soon as the new root is in place.
In practice, the first migration usually has exceptions, especially these cases:
- existing custom domain mapping
- manually populated secrets
- runtime values that were never written in tfvars
- old resources with names that no longer match the new contract
These items usually do not naturally fall into an idealized automation flow.
So now I split migration into two step types:
| Type | Description |
|---|---|
| structural migration | modules, roots, state ownership |
| first deploy exception handling | import-only resources, secret copy, runtime value capture |
If you force these two concerns into one straight line, the runbook becomes overly optimistic.
Deployment Pipeline Cutover Timing
I am now very sure about one thing:
Switch CI/CD at the end.
The reason is simple. As long as the new live roots have not passed these checks:
- live/shared plan is clean
- live/dev/platform plan is clean
- live/dev/service plan is clean
CI/CD should not be changed to fully depend on new roots yet.
Otherwise, once auto deploy starts running, you hit all of these at once:
- code path changed
- workflow changed
- state ownership changed
- runtime contract changed
At that point, debugging becomes very hard.
A safer way is:
- make the new roots valid
- import or move state
- reconcile to zero drift
- only then switch workflows over
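The "zero drift" gate can be sketched with terraform plan -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when changes are pending. The helper name is an assumption; the root paths in the usage line are the ones named in this post.

```shell
# Sketch: report whether every given live root plans clean.
# Returns 0 only if all roots have no pending changes.
roots_are_clean() {
  status=0
  for root in "$@"; do
    # Exit code 0 = no changes; 1 = error; 2 = changes present.
    if terraform -chdir="$root" plan -detailed-exitcode -input=false >/dev/null; then
      echo "clean: $root"
    else
      echo "NOT clean (drift or error): $root"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# roots_are_clean live/shared live/dev/platform live/dev/service \
#   && echo "safe to switch workflows"
```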
Exit Criteria for the Refactor
I now use a strict standard for myself.
It is not done just because terraform apply succeeds.
I check all of these:
- new roots can plan with no changes
- old roots no longer own migrated resources
- current workflows point only to new roots
- runtime behavior is still correct
- production-only resources still stay in production paths only
If one is missing, migration is not truly complete.
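The second item, old roots no longer owning migrated resources, can be checked with terraform state list. This is a minimal sketch; the helper name and the addresses in the usage line are hypothetical.

```shell
# Sketch: verify that a legacy root's state no longer contains any of the
# given resource addresses.
#   $1 = legacy root directory, remaining args = migrated addresses
# Returns 0 if none remain, 1 if any remain, 2 if the state cannot be listed.
legacy_released() {
  legacy_root="$1"
  shift
  owned="$(terraform -chdir="$legacy_root" state list)" || return 2
  leftover=0
  for addr in "$@"; do
    # -F: fixed string, -x: whole-line match, so prefixes do not match.
    if printf '%s\n' "$owned" | grep -Fxq -- "$addr"; then
      echo "still in legacy state: $addr"
      leftover=1
    fi
  done
  return "$leftover"
}

# Hypothetical usage:
# legacy_released legacy/platform 'random_password.db' \
#   'google_sql_database_instance.main'
```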
Phase Checklist I Reuse
If I do a similar migration again, I will directly reuse this checklist:
|
|
The benefit is clarity: you always know which phase is blocked, instead of treating migration as one vague large task.
Runtime Value Retrieval Commands
In this kind of migration note, what is easiest to forget is often not Terraform, but where to fetch current runtime values like BASE_URL. These are the commands I check most often.
Inspect Deployed Container Reference
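A minimal sketch, assuming a Cloud Run service whose name and region are supplied by the caller (both placeholders) and a gcloud configuration that already points at the right project:

```shell
# Sketch: print the container image currently deployed to a Cloud Run service.
#   $1 = service name, $2 = region
deployed_image() {
  gcloud run services describe "$1" --region "$2" \
    --format='value(spec.template.spec.containers[0].image)'
}

# Usage: deployed_image my-service us-central1
```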
Inspect Active Endpoint Value
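A minimal sketch that reads the URL Cloud Run reports for the service. The service name and region are placeholders, and a custom domain mapping, if any, is not reflected in this value.

```shell
# Sketch: print the URL that Cloud Run reports for a service.
#   $1 = service name, $2 = region
service_url() {
  gcloud run services describe "$1" --region "$2" \
    --format='value(status.url)'
}

# Usage: service_url my-service us-central1
```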
Inspect BASE_URL Runtime Value
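One way to read a single variable, sketched with jq as an assumed extra dependency. The service name and region are placeholders. A variable that is wired to a secret reference has no plain value, so this prints null for it.

```shell
# Sketch: print the plain value of one environment variable on the first
# container of a Cloud Run service.
#   $1 = service name, $2 = region, $3 = variable name (for example BASE_URL)
runtime_env_value() {
  gcloud run services describe "$1" --region "$2" --format=json |
    jq -r --arg name "$3" \
      '.spec.template.spec.containers[0].env[]? | select(.name == $name) | .value'
}

# Usage: runtime_env_value my-service us-central1 BASE_URL
```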
Inspect Full Runtime Variable Array
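A minimal sketch that dumps every environment entry of the first container as JSON, which is handy for diffing against tfvars. The service name and region are placeholders.

```shell
# Sketch: print the full env array of the first container as JSON.
#   $1 = service name, $2 = region
runtime_env_all() {
  gcloud run services describe "$1" --region "$2" \
    --format='json(spec.template.spec.containers[0].env)'
}

# Usage: runtime_env_all my-service us-central1
```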
Populate New Per-Env Credential Entry
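For the secret copy case, a sketch that copies the latest version of an existing Secret Manager secret into a new per-environment secret. Both secret names in the usage line are hypothetical, and the sketch assumes the target secret already exists (for example, created by the new live root) and only needs a version.

```shell
# Sketch: copy the latest version of one secret into another secret.
#   $1 = source secret name, $2 = target secret name (must already exist)
copy_latest_secret() {
  # mktemp creates the file readable only by the current user.
  tmp="$(mktemp)" || return 1
  if gcloud secrets versions access latest --secret="$1" > "$tmp"; then
    gcloud secrets versions add "$2" --data-file="$tmp"
    rc=$?
  else
    rc=1
  fi
  # Always remove the local copy of the secret value.
  rm -f "$tmp"
  return "$rc"
}

# Hypothetical usage: copy_latest_secret db-password dev-db-password
```

Going through a file instead of a shell variable keeps the bytes intact, including any trailing newline in the stored value.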
Retrieve Latest Credential Content
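A minimal sketch, assuming the secret lives in Secret Manager and the name is supplied by the caller. It prints the secret to standard output, so avoid running it where output is logged.

```shell
# Sketch: print the content of the latest version of a secret.
#   $1 = secret name
latest_secret_value() {
  gcloud secrets versions access latest --secret="$1"
}

# Hypothetical usage: latest_secret_value dev-db-password
```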
Conclusion
The biggest risk in Terraform migration is not writing extra HCL files. The risk is binding ownership change, workflow switching, and runtime change together at the same time.
The truly stable way is:
- modules first
- live roots next
- state migration after that
- CI/CD switch later
- legacy deletion last
As long as order is right, a modules + live refactor can be very stable and does not need a big bang rewrite.