Migrating From AWS EC2 to GCP Cloud Run: Architecture Decisions and Pitfalls
A complete record of moving my side project’s backend from AWS EC2 + Docker Compose to GCP Cloud Run, including concept mapping between the two providers, Terraform layering design, federated identity trust boundaries, and every issue caught during code review
Mapping Concepts Between the Two Major Cloud Providers
Both are cloud providers, but their naming and layering differ. Getting the correspondences straight before the move prevents confusion later when configuring resources.
Account and Organization Hierarchy
| Concept | AWS | GCP |
|---|---|---|
| Top-level governance unit | Organization | Organization |
| Billing and resource isolation | Account | Project |
| Region | Region (us-east-1) | Region (us-east1) |
AWS uses Accounts to isolate environments (dev account / prod account); GCP uses Projects. A GCP Project is lighter than an AWS Account: creating a new Project is like opening a fresh isolation space without needing to spin up a separate account.
Compute Offerings
| Capability | AWS | GCP |
|---|---|---|
| VM | EC2 | Compute Engine |
| Serverless container | App Runner / ECS Fargate | Cloud Run |
| Container orchestration | EKS | GKE |
Cloud Run is the centerpiece of this migration. Its pitch: hand it a container image and it runs it, with no VM management and no scaling headaches. The closest AWS equivalent is App Runner, but Cloud Run is more mature.
Identity and Authorization
This is where the two diverge the most:
| Concept | AWS | GCP |
|---|---|---|
| Identity for services | IAM Role + Instance Profile | Service Account |
| Temporary identity for CI/CD | OIDC Provider + AssumeRoleWithWebIdentity | Workload Identity Federation (WIF) |
| Permission assignment | IAM Policy attached to Role | IAM Binding on resource or project |
| Permission inheritance direction | Policy → Role → Entity | Role → Member (on resource / project) |
AWS workflow: create an IAM Role, attach a Policy (permissions) to it, then assign the Role to an EC2 instance (via Instance Profile) or let GitHub Actions assume it.
GCP workflow: create a Service Account, then add IAM Bindings on individual resources, binding roles to that SA. Permissions are distributed across resources rather than centralized in a single Policy document.
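A minimal Terraform sketch of the GCP workflow described above. The account ID, bucket name, and role are illustrative placeholders, not values from the real project; the point is only that the binding sits on the resource rather than in a policy document attached to the identity.

```hcl
# Hypothetical runtime identity; account_id is a placeholder.
resource "google_service_account" "runtime" {
  account_id   = "api-runtime"
  display_name = "Cloud Run runtime identity"
}

# The permission is attached to the bucket, not to the Service Account.
# "example-uploads-bucket" is an assumed, pre-existing bucket name.
resource "google_storage_bucket_iam_member" "runtime_reads_uploads" {
  bucket = "example-uploads-bucket"
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.runtime.email}"
}
```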
Data and Storage
| Capability | AWS | GCP |
|---|---|---|
| Object storage | S3 | Cloud Storage (GCS) |
| Managed relational DB | RDS | Cloud SQL |
| Secret management | Secrets Manager | Secret Manager |
| Container Registry | ECR / GHCR | Artifact Registry |
| Terraform state | S3 + DynamoDB (lock) | GCS (lock built-in) |
GCP’s Terraform state backend is simpler: S3 requires a separate DynamoDB table for state locking, while GCS has it built in.
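For illustration, a GCS backend block might look like the sketch below. The bucket name and prefix are assumed placeholders; note that no lock table is configured anywhere.

```hcl
terraform {
  backend "gcs" {
    bucket = "example-tf-state" # placeholder bucket name
    prefix = "platform"         # one prefix per root stack (assumed layout)
  }
}
```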
Pipeline Authentication
Both providers let GitHub Actions exchange an OIDC token for temporary cloud credentials, so no static credentials are needed.
GCP adds an extra step, “impersonate Service Account”, because every GCP API operation requires a SA identity. With AWS, OIDC directly yields Role credentials you can use right away.
Previous Layout Versus the Redesigned One
The Original Setup on Virtual Machines
The Redesigned Setup on Managed Containers
Gone: nginx, certbot, SSH, SCP, .env. The deploy workflow dropped from ~150 lines to ~60 lines.
Splitting Infrastructure-as-Code Across Three Tiers
I split Terraform into three independent root stacks: bootstrap, platform, and service.
Why Split at All
| Stack | Change frequency | Management | Contents |
|---|---|---|---|
| bootstrap | Rarely changes | Local manual | state bucket, WIF, CI/CD SA |
| platform | Occasionally | infra.yml workflow | DB, registry, secrets, runtime SA |
| service | Every deploy | deploy.yml workflow | Cloud Run service, IAM, domain mapping |
Benefit of the split: deploy only plans/applies the Cloud Run service and never touches the DB or registry. Faster runs, smaller blast radius.
Passing Data Across Tiers
The service stack needs the DB connection name and runtime SA email, both of which live in the platform stack. My first version passed them manually via CLI -var flags. Code review flagged this: manually supplying values on every run is error-prone.
I switched to terraform_remote_state:
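A sketch of how the service stack could read those values, assuming the platform stack stores its state in GCS and declares outputs named db_connection_name and runtime_sa_email. The bucket, prefix, and output names are hypothetical.

```hcl
# Read the platform stack's outputs from its state file.
data "terraform_remote_state" "platform" {
  backend = "gcs"

  config = {
    bucket = "example-tf-state" # placeholder; must match the platform backend
    prefix = "platform"
  }
}

locals {
  # These only resolve if the platform stack declares matching outputs.
  db_connection_name = data.terraform_remote_state.platform.outputs.db_connection_name
  runtime_sa_email   = data.terraform_remote_state.platform.outputs.runtime_sa_email
}
```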
The deploy workflow now passes only a single variable, api_image. Everything else comes from remote state.
Federated Credential Trust Boundaries
Base Configuration
GCP’s WIF maps GitHub OIDC token claims to GCP attributes via attribute mapping:
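A hedged sketch of such a provider follows. The pool and provider IDs and the numeric repository and owner IDs are placeholders, and the exact set of mapped attributes is an assumption rather than the project's real configuration.

```hcl
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-pool" # placeholder ID
}

resource "google_iam_workload_identity_pool_provider" "github" {
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-oidc" # placeholder ID

  # Map GitHub OIDC claims (assertion.*) to GCP attributes.
  attribute_mapping = {
    "google.subject"                = "assertion.sub"
    "attribute.repository_id"       = "assertion.repository_id"
    "attribute.repository_owner_id" = "assertion.repository_owner_id"
    "attribute.ref"                 = "assertion.ref"
  }

  # CEL expression evaluated against every incoming token.
  # The numeric IDs below are made up for illustration.
  attribute_condition = "assertion.repository_id == \"123456789\" && assertion.repository_owner_id == \"987654\" && assertion.ref == \"refs/heads/main\""

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}
```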
attribute_condition is a provider-level gate: tokens that do not satisfy the condition are rejected outright and cannot even be exchanged.
Pitfall: Overly Permissive Delegation Conditions
My first version only locked down repository_id and repository_owner_id โ no branch restriction. Review caught this: any workflow on any branch in the repo with id-token: write could exchange for GCP credentials. Adding assertion.ref == "refs/heads/main" scoped it to the main branch only.
Pitfall: Changeable Names Versus Stable Numeric Identifiers
GitHub OIDC tokens contain both repository (name, mutable) and repository_id (numeric, immutable). Renaming a repo breaks any condition based on the name. Using the numeric ID is safer:
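To make the contrast concrete, here are the two styles of condition side by side as Terraform locals. The repository name and numeric ID are invented for illustration; only the claim names come from the GitHub OIDC token.

```hcl
locals {
  # Fragile: stops matching as soon as the repository or its owner is renamed.
  condition_by_name = "assertion.repository == \"example-owner/example-repo\""

  # Stable: the numeric repository ID does not change on rename.
  condition_by_id = "assertion.repository_id == \"123456789\""
}
```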
Managed Database Configuration
Pitfall: Default Edition Versus Budget Tier
For PostgreSQL 17 and above, Cloud SQL defaults to the Enterprise Plus edition. But the cheapest tier, db-f1-micro, is only available under the Enterprise edition. Failing to specify the edition explicitly either lands you on a more expensive instance than expected or causes the apply to fail outright:
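A sketch of an instance definition that pins the edition explicitly. The instance name and region are placeholders, and the rest of the settings are an assumption about a minimal dev setup, not the project's actual configuration.

```hcl
resource "google_sql_database_instance" "main" {
  name             = "example-db" # placeholder
  region           = "us-east1"   # placeholder
  database_version = "POSTGRES_17"

  settings {
    # Set the edition explicitly so the budget tier is accepted.
    edition = "ENTERPRISE"
    tier    = "db-f1-micro"

    ip_configuration {
      ipv4_enabled = true
      ssl_mode     = "ENCRYPTED_ONLY"
      # Deliberately no authorized_networks blocks.
    }
  }
}
```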
ssl_mode = "ENCRYPTED_ONLY" rejects all unencrypted connections. Combined with no authorized_networks, even a direct TCP connection over public IP won’t work; only the Cloud SQL Auth Proxy can get through.
Connecting Managed Containers to the Database
Cloud Run v2 has a built-in Cloud SQL Auth Proxy. It mounts a Unix socket into the container via a volume:
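Roughly, the relevant part of the service definition could look like the sketch below. The service name, region, and the three input variables are hypothetical stand-ins; in the real stack the last two would come from remote state rather than variables.

```hcl
variable "api_image" { type = string }          # image reference passed by deploy
variable "runtime_sa_email" { type = string }   # hypothetical input
variable "db_connection_name" { type = string } # "PROJECT:REGION:INSTANCE"

resource "google_cloud_run_v2_service" "api" {
  name     = "api"      # placeholder
  location = "us-east1" # placeholder

  template {
    service_account = var.runtime_sa_email

    containers {
      image = var.api_image

      # The socket appears under /cloudsql/<connection name> in the container.
      volume_mounts {
        name       = "cloudsql"
        mount_path = "/cloudsql"
      }
    }

    # Declares the Cloud SQL connection that backs the mounted volume.
    volumes {
      name = "cloudsql"

      cloud_sql_instance {
        instances = [var.db_connection_name]
      }
    }
  }
}
```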
The database URL uses a Unix socket path, assembled by Terraform and stored in Secret Manager:
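One way this assembly might be written is sketched below. The variable names and secret ID are hypothetical, and the URL shape (socket directory passed as a host query parameter) assumes a libpq-style driver; other drivers may expect a different format.

```hcl
variable "db_user" { type = string }
variable "db_name" { type = string }
variable "db_connection_name" { type = string } # "PROJECT:REGION:INSTANCE"

variable "db_password" {
  type      = string
  sensitive = true
}

resource "google_secret_manager_secret" "database_url" {
  secret_id = "database-url" # placeholder

  replication {
    auto {}
  }
}

resource "google_secret_manager_secret_version" "database_url" {
  secret = google_secret_manager_secret.database_url.id

  # urlencode guards against special characters in the password.
  secret_data = "postgresql://${var.db_user}:${urlencode(var.db_password)}@/${var.db_name}?host=/cloudsql/${var.db_connection_name}"
}
```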
No app code changes needed: it already reads the DATABASE_URL environment variable.
Pitfall: Internal Routing Flag Only Works on Non-Public Addresses
I initially added enable_private_path_for_google_cloud_services = true, hoping Cloud Run would route traffic over Google’s internal network. But Google’s documentation explicitly states this setting only takes effect under Private IP mode โ it is meaningless for a Public IP instance. I removed it; no point keeping a no-op setting.
Narrowing Credential Access
Pitfall: Granting Credential Reader Role Too Broadly
My first version gave the runtime SA roles/secretmanager.secretAccessor at the project level, meaning it could read every secret in the project. I changed it to per-secret IAM:
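A per-secret binding could be sketched like this; both variables are hypothetical inputs standing in for values the real stacks already have.

```hcl
variable "database_url_secret_id" { type = string } # e.g. the secret's resource ID
variable "runtime_sa_email" { type = string }

# Grants access to this one secret only, not to every secret in the project.
resource "google_secret_manager_secret_iam_member" "runtime_reads_database_url" {
  secret_id = var.database_url_secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.runtime_sa_email}"
}
```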
Pitfall: Impersonation Role Bound Too Broadly
The CI/CD Terraform SA needs roles/iam.serviceAccountUser so Cloud Run can “act as” the runtime SA. My first version placed this at the project level, meaning it could impersonate any SA in the project. I narrowed it to bind only on the specific runtime SA:
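A sketch of the narrowed binding, with both variables as hypothetical inputs. The binding is placed on the runtime SA itself, so it says nothing about any other SA in the project.

```hcl
# Fully qualified name: "projects/PROJECT_ID/serviceAccounts/EMAIL"
variable "runtime_sa_name" { type = string }
variable "terraform_sa_email" { type = string }

resource "google_service_account_iam_member" "terraform_acts_as_runtime" {
  service_account_id = var.runtime_sa_name
  role               = "roles/iam.serviceAccountUser"
  member             = "serviceAccount:${var.terraform_sa_email}"
}
```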
This is the GCP equivalent of AWS iam:PassRole. AWS uses a Condition to restrict which service a role can be passed to; GCP uses a resource-level IAM binding to restrict which SA can be impersonated.
Vanity Hostnames: Native Routing Versus Global Load Balancer
Cloud Run supports two ways to bind a custom domain:
| Option | Cost | Status |
|---|---|---|
| Cloud Run domain mapping | Free | Preview / Limited Availability |
| Global External ALB + serverless NEG | ~$18/month | GA (General Availability) |
For a dev environment, domain mapping is sufficient, but keep in mind:
- It is not GA, so it is not recommended for production
- The Terraform SA must be a verified owner of the domain (explained below)
- DNS uses a CNAME record, not an A record
- Upgrading to an ALB later requires no app code or domain changes; only the Terraform resource changes
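For reference, a domain mapping in Terraform might be sketched as follows. The domain, region, project ID, and service name are all placeholders.

```hcl
resource "google_cloud_run_domain_mapping" "api" {
  name     = "api.example.com" # the custom domain (placeholder)
  location = "us-east1"        # placeholder; must match the service's region

  metadata {
    namespace = "example-project-id" # the GCP project ID (placeholder)
  }

  spec {
    route_name = "api" # name of the Cloud Run service to route to
  }
}
```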
Pitfall: Ownership Proof Tied to a Webmaster Tool
Cloud Run domain mapping requires domain ownership to be verified through Google Search Console. This is unique to GCP; neither AWS nor Azure has this requirement:
| Cloud Provider | Custom Domain Verification |
|---|---|
| GCP Cloud Run | Google Search Console domain verification |
| AWS App Runner / ALB | ACM (Certificate Manager) DNS validation |
| Azure Container Apps | Set directly on the resource, add a DNS record to verify |
With AWS and Azure, you configure a custom domain on the cloud service, it gives you a DNS record (CNAME or TXT), you add it at your DNS provider, and verification completes. The entire flow stays within the cloud platform.
GCP is different: domain ownership is tied to Google Search Console (originally a tool for SEO / webmasters), and it is bound to a Google account. This means:
- Your personal Google account verifying a domain does not mean the Terraform SA also has permission. A SA is a separate identity, so it must be added as a verified owner in Search Console.
- This is a manual step: it cannot be done via Terraform or any API. You have to go into the Search Console UI and manually add the SA’s email.
- If you forget this step, terraform apply will fail at google_cloud_run_domain_mapping with a domain verification failure error, which is not immediately obvious as a Search Console issue.
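To fix it, add the Terraform SA’s email as a verified owner of the domain in the Search Console UI, then re-run terraform apply.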
The reason for this design likely traces back to Google’s product history: Search Console was already where Google managed domain ownership, and Cloud Run merely adopted that verification system instead of building its own from scratch the way AWS and Azure did.
Pipeline Comparison
Previous Pipeline (~150 Lines)
Redesigned Pipeline (~60 Lines)
Key differences:
- No SSH keys, .env files, certbot, or nginx to manage
- Image digest is immutable (sha256), not a mutable tag
- Cloud Run configuration is managed by Terraform, so there is no drift
- Secrets live in Secret Manager, not passed in at deploy time
Remote Backend Hardening
Terraform state contains sensitive data like database passwords. The state bucket needs hardening:
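A minimal sketch of a hardened state bucket; the bucket name and location are placeholders.

```hcl
resource "google_storage_bucket" "tf_state" {
  name     = "example-tf-state" # placeholder; bucket names are globally unique
  location = "US-EAST1"         # placeholder

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  versioning {
    enabled = true
  }
}
```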
- uniform_bucket_level_access: enforce IAM-only access, disable ACLs
- public_access_prevention = "enforced": even if IAM is misconfigured, the bucket stays private
- versioning: recover state if it gets overwritten or corrupted
One advantage of using GCS as a Terraform backend: state locking is built in, so there is no need to create a separate DynamoDB table like on AWS.
Cost
| Item | AWS EC2 | GCP Cloud Run |
|---|---|---|
| Compute | ~$8-15/month (t3.micro 24/7) | Nearly free (low traffic stays in free tier) |
| Database | Free (container) | ~$7-10/month (Cloud SQL db-f1-micro) |
| SSL | Free (certbot) | Free (auto-managed) |
| Registry | Free (GHCR) | Low (Artifact Registry) |
| Total | ~$8-15/month | ~$8-12/month |
Costs are similar, but operations complexity dropped sharply: no more VM patching, SSH key rotation, certbot renewal, or nginx config.
Lessons Learned
- Map concepts first: GCP and AWS have very different IAM models, and Service Account versus IAM Role requires a mindset shift. Sort out the correspondences before migrating.
- Split your Terraform stacks by change frequency: deploy should be fast, so don’t plan the entire world on every push.
- Lock down federated identity trust boundaries from day one: scope to repo + branch, use immutable IDs instead of names.
- Follow least privilege at every layer: project-level IAM is convenient but will bite you eventually. Per-resource bindings are safer.
- Specify the managed database edition explicitly: PG 17+ defaults to Enterprise Plus; for the lowest cost tier, set ENTERPRISE.
- Don’t assume features exist: every “this feature should be there” assumption needs to be verified against the documentation.
References
- Cloud Run Overview
- Cloud Run Domain Mapping
- Cloud SQL Connect from Cloud Run
- Cloud SQL Editions
- Cloud SQL Configure SSL
- Workload Identity Federation with Deployment Pipelines
- Secret Manager Access Control
- Artifact Registry Overview
- Terraform GCS Backend
- Terraform Google Provider
- Cloud Run Domain Mapping Troubleshooting
- Verify Site Ownership (Search Console)
- GitHub Actions OIDC Token Claims