First Production Custom Domain Cutover on Cloud Run: Domain Mapping, Search Console, and Certificate Wait-Window Pitfalls

This post records several real pitfalls I hit when moving a Cloud Run custom domain from an existing environment to the first formal prod service.

On the surface, the task is just pointing api.example.com at the new Cloud Run service. In practice, what blocks you is rarely Terraform syntax. The real blockers are:

  • domain ownership
  • IAM identity
  • certificate provisioning
  • first deploy sequencing

If these are not thought through before execution, first-time prod enablement can easily stall at the last step.

Typical Situation Pattern

A common scenario looks like this:

  • custom domain already exists
  • old mapping points to a non-prod or legacy service
  • new prod service is ready
  • Terraform now manages the prod service and wants to own the domain mapping too

The most intuitive idea at this point is:

  • import the existing mapping
  • let Terraform replace it during the first prod deploy

This direction is not wrong, but it pulls in several prerequisites outside the provider.

First Failure Class

The first error I hit was not DNS and not the Cloud Run service itself. It was a message like:

Caller is not authorized to administer the domain api.example.com.

The key point of this error is not that Cloud Run is broken. It means:

  • the current deploy identity can talk to GCP
  • but it is not recognized as a domain owner for that domain

For Cloud Run custom domains, permission to create or recreate a domain mapping depends on verified ownership of the domain in Search Console.

If a new deploy-prod service account is taking over the first formal prod cutover, that account must also be recognized as an owner of the domain.

Second Failure Class

The difference between Search Console permission levels is easy to miss.

In Search Console:

  • Full user is not the same as Owner
  • Full user cannot satisfy domain ownership requirements for this flow

If the deploy service account is only added as a Full user, Terraform can still fail during domain mapping creation.

The more stable approach is:

  1. add the deploy-prod service account as an owner
  2. add it on the parent domain if possible
  3. wait a few minutes for the permission change to propagate
  4. rerun the failed deploy
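
Before rerunning, it can help to confirm that the deploy account is actually recognized as a verified owner. The following is a minimal sketch, not a definitive check: domain_is_verified is a hypothetical helper that reads a list of verified domains on stdin and also accepts a parent domain, matching step 2 above. The commented usage assumes that gcloud domains list-user-verified exposes the domain in an id field and that you are allowed to impersonate the deploy account; DEPLOY_SA is a placeholder.

```shell
# domain_is_verified DOMAIN
# Reads verified domain names (one per line) on stdin and succeeds when DOMAIN
# or one of its parent domains appears in that list.
domain_is_verified() (
  candidate="$1"
  verified="$(cat)"
  while [ -n "$candidate" ]; do
    if printf '%s\n' "$verified" | grep -qxF -- "$candidate"; then
      return 0
    fi
    case "$candidate" in
      *.*) candidate="${candidate#*.}" ;;  # strip the leftmost label
      *)   candidate="" ;;                 # no labels left to strip
    esac
  done
  return 1
)

# Usage sketch (assumption: DEPLOY_SA holds the deploy-prod service account email):
#   gcloud domains list-user-verified \
#     --impersonate-service-account="$DEPLOY_SA" \
#     --format='value(id)' | domain_is_verified api.example.com
```

If the check fails right after the owner was added, waiting a few minutes and trying again is consistent with step 3.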

Third Failure Class

During first prod cutover, a common pattern looks like this:

import {
  to = module.service.google_cloud_run_domain_mapping.api[0]
  id = "locations/us-east1/namespaces/my-project/domainmappings/api.example.com"
}

The benefit is that Terraform adopts the existing mapping first, detects the route mismatch during plan, and then replaces the mapping.

But note:

  • import success only means state ownership is established
  • it does not mean the new domain target is ready yet

Whether cutover is truly complete still depends on mapping status and certificate status.

Fourth Failure Class

This is the easiest place to misdiagnose on first production enablement.

You may see:

  • DNS already points to ghs.googlehosted.com
  • DomainRoutable = True
  • but curl https://api.example.com still fails

At this point, DNS may be correct and the Terraform apply may be complete.

A very common reason is that the managed certificate is still provisioning.

I check domain mapping status first:

gcloud beta run domain-mappings describe \
  --domain api.example.com \
  --region us-east1 \
  --format="yaml(status)"

If you see states like:

Ready: True
CertificateProvisioned: True
DomainRoutable: True

Then the GCP control plane is basically done. If HTTPS is still temporarily unreachable from your machine, the cause is often edge propagation that has not fully converged yet.
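
To avoid eyeballing the YAML during the wait window, the three conditions can be checked mechanically. This is a sketch under assumptions: mapping_is_ready is a hypothetical helper that expects one tab-separated "type, status" pair per line, and the commented gcloud invocation assumes that --flatten plus a value() projection produces exactly that shape for status.conditions.

```shell
# mapping_is_ready
# Reads "TYPE<TAB>STATUS" condition lines on stdin and succeeds only when
# Ready, CertificateProvisioned, and DomainRoutable are all reported as True.
mapping_is_ready() {
  awk -F '\t' '
    $2 == "True" { ok[$1] = 1 }
    END {
      if (ok["Ready"] && ok["CertificateProvisioned"] && ok["DomainRoutable"]) exit 0
      exit 1
    }
  '
}

# Polling sketch (assumption: the output shape described above; no timeout here):
#   until gcloud beta run domain-mappings describe \
#           --domain api.example.com --region us-east1 \
#           --flatten='status.conditions[]' \
#           --format='value(status.conditions.type,status.conditions.status)' \
#         | mapping_is_ready; do
#     sleep 30
#   done
```

The loop in the comment never gives up on its own, so in a pipeline you would probably want to bound it with an attempt counter.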

Fifth Failure Class

If the domain mapping is switched together with the first prod deploy, the sequence must be strict.

I recommend this sequence now:

1. prepare prod platform resources
2. populate prod secrets
3. remove legacy domain mapping from the old Terraform state
4. import the mapping into the new prod service root
5. run the first prod deploy
6. verify mapping status and HTTPS readiness
7. remove the one-time import block

The easiest steps to get wrong are 3 and 4.

If the old root still owns the domain mapping while the new root imports the same mapping, two states claim the same resource and state ownership becomes inconsistent.
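
One way to guard against this is to check the old root's state before importing anywhere else. The helper below is a hypothetical sketch: state_owns_domain_mapping only scans the output of terraform state list for a google_cloud_run_domain_mapping address. The resource address in the commented usage is made up for illustration; use whatever address your old root actually reports.

```shell
# state_owns_domain_mapping
# Reads `terraform state list` output on stdin and succeeds when any address
# refers to a google_cloud_run_domain_mapping resource.
state_owns_domain_mapping() {
  grep -q 'google_cloud_run_domain_mapping\.'
}

# Usage sketch, run inside the OLD root (the address below is hypothetical):
#   if terraform state list | state_owns_domain_mapping; then
#     terraform state rm 'google_cloud_run_domain_mapping.api'
#   fi
# Only after this passes cleanly would the new prod root run its import.
```

Note that terraform state rm only forgets the resource; it does not delete the live mapping, which is what makes the later import in the new root possible.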

Sixth Failure Class

This is not a Cloud Run issue directly, but it is strongly related to first prod enablement.

If deploy-prod.yml already triggers on push to main, a single merge can itself become the first prod deploy.

This means you cannot review that PR with a normal feature-merge mindset. You must treat it as a production rollout event.

The safer approach is usually one of two:

  • keep the first prod deploy manual
  • or complete every prerequisite before the merge that enables auto prod deploy

As long as prerequisites are not finished, main should not auto-trigger first prod cutover.

Verification Model I Use

I verify completion in three layers:

Terraform Layer

  • prod service root plans cleanly
  • the import block is no longer needed
  • the old root no longer owns the mapping

Control-Plane Layer

  • mappedRouteName points to the prod service
  • Ready = True
  • CertificateProvisioned = True
  • DomainRoutable = True

User-Visible Layer

  • curl -I https://api.example.com/health succeeds
  • opening the domain in a browser works
  • service behavior matches prod, not legacy/dev

If only the first two layers pass and the third still fails, I usually do not call cutover complete.
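
The user-visible layer can be scripted as well. This is a minimal sketch, assuming curl is available: https_ready is a hypothetical helper that treats any 2xx response as success. Unlike the curl -I check above, it sends a GET, in case the health endpoint does not answer HEAD requests.

```shell
# https_ready URL
# Succeeds when URL answers with a 2xx status. TLS or connection failures make
# curl report 000, which is treated as not ready.
https_ready() {
  case "$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 "$1" 2>/dev/null)" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}

# Usage sketch:
#   https_ready https://api.example.com/health && echo "user-visible layer OK"
```

A passing result here only shows that something answers on the domain; confirming that it is prod and not legacy/dev still needs a response check specific to your service.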

Practical Mental Model

I now split this into two different questions:

  1. who owns the domain mapping state
  2. who is authorized to administer the domain

Terraform import handles only the first question.

Search Console ownership and Cloud Run domain authorization handle the second question.

If these are mixed together, it is easy to think import success means everything is ready.

Runtime Verification Commands

During first custom-domain cutover to prod, the most frequent lookups are mapping status and service URL. I keep these commands directly in notes.

Check Full Status Payload

gcloud beta run domain-mappings describe \
  --domain api.example.com \
  --region us-east1 \
  --format="yaml(status)"

Check Current Target Name

gcloud beta run domain-mappings describe \
  --domain api.example.com \
  --region us-east1 \
  --format='value(status.mappedRouteName)'

List Region-Level Entries

gcloud beta run domain-mappings list \
  --region us-east1

Check Endpoint Value

gcloud run services describe app-prod \
  --region us-east1 \
  --format='value(status.url)'

Check Current Container Artifact

gcloud run services describe app-prod \
  --region us-east1 \
  --format='value(spec.template.spec.containers[0].image)'

Check Runtime Traffic Detail

gcloud run services describe app-prod \
  --region us-east1 \
  --format='yaml(status.url,status.traffic)'

Conclusion

When cutting a Cloud Run custom domain over to a formal prod service for the first time, the hard part is usually not HCL. The hard part is the boundary conditions outside the cloud control plane.

The most important points are:

  • make sure the deploy identity is a real domain owner
  • treat the first prod deploy as a rollout event
  • verify mapping status and certificate status separately
  • remove one-time import scaffolding after cutover

If these are kept clearly separate, first prod enablement becomes much more predictable.
