Migrating From AWS EC2 to GCP Cloud Run: Architecture Decisions and Pitfalls

A complete record of moving my side project’s backend from AWS EC2 + Docker Compose to GCP Cloud Run, including concept mapping between the two providers, Terraform layering design, federated identity trust boundaries, and every issue caught during code review.

Mapping Concepts Between the Two Major Cloud Providers

Both are cloud providers, but their naming and layering differ. Getting the correspondences straight before the move prevents confusion later when configuring resources.

Account and Organization Hierarchy

Concept | AWS | GCP
Top-level governance unit | Organization | Organization
Billing and resource isolation | Account | Project
Region | Region (us-east-1) | Region (us-east1)

AWS uses Accounts to isolate environments (dev account / prod account); GCP uses Projects. A GCP Project is lighter than an AWS Account: creating a new Project is like opening a fresh isolation space without needing to spin up a separate account.

Compute Offerings

Capability | AWS | GCP
VM | EC2 | Compute Engine
Serverless container | App Runner / ECS Fargate | Cloud Run
Container orchestration | EKS | GKE

Cloud Run is the centerpiece of this migration. Its pitch: hand it a container image and it runs it, with no VM management and no scaling headaches. The closest AWS equivalent is App Runner, but Cloud Run is more mature.

Identity and Authorization

This is where the two diverge the most:

Concept | AWS | GCP
Identity for services | IAM Role + Instance Profile | Service Account
Temporary identity for CI/CD | OIDC Provider + AssumeRoleWithWebIdentity | Workload Identity Federation (WIF)
Permission assignment | IAM Policy attached to Role | IAM Binding on resource or project
Permission inheritance direction | Policy → Role → Entity | Role → Member (on resource / project)

AWS workflow: create an IAM Role, attach a Policy (permissions) to it, then assign the Role to an EC2 instance (via Instance Profile) or let GitHub Actions assume it.

GCP workflow: create a Service Account, then add IAM Bindings on individual resources, binding roles to that SA. Permissions are distributed across resources rather than centralized in a single Policy document.

# AWS: policy is attached to the role
IAM Role
├── Trust Policy: who can assume this role
└── Permission Policy: what this role can do

# GCP: bindings are on each resource
Service Account (identity only, no embedded permissions)
├── Project IAM Binding: roles/cloudsql.client → SA
├── Secret IAM Binding: roles/secretmanager.secretAccessor → SA
└── Bucket IAM Binding: roles/storage.objectViewer → SA

Data and Storage

Capability | AWS | GCP
Object storage | S3 | Cloud Storage (GCS)
Managed relational DB | RDS | Cloud SQL
Secret management | Secrets Manager | Secret Manager
Container Registry | ECR / GHCR | Artifact Registry
Terraform state | S3 + DynamoDB (lock) | GCS (lock built-in)

GCP’s Terraform state backend is simpler: S3 requires a separate DynamoDB table for state locking, while GCS has it built in.
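As a rough illustration, a GCS backend block can be as small as the sketch below. The bucket name matches the one used later in this post; the prefix is an assumption and would differ per stack. No lock table is declared anywhere.

```hcl
# Minimal GCS backend sketch. State locking needs no extra resources.
# Note: backend blocks cannot reference variables, so values are literal.
terraform {
  backend "gcs" {
    bucket = "my-project-tfstate" # state bucket created by the bootstrap stack
    prefix = "platform"           # assumed per-stack prefix inside the bucket
  }
}
```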

Pipeline Authentication

Both providers let GitHub Actions exchange an OIDC token for temporary cloud credentials, so no static credentials are needed:

# AWS OIDC flow
GitHub Actions
  → request OIDC token from GitHub
  → send to AWS STS (AssumeRoleWithWebIdentity)
  → get temporary AWS credentials
  → use credentials to call AWS APIs

# GCP WIF flow
GitHub Actions
  → request OIDC token from GitHub
  → send to GCP STS (token exchange)
  → get federated token
  → impersonate Service Account
  → use SA credentials to call GCP APIs

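GCP adds an extra step, “impersonate Service Account”, because in this setup permissions are granted to a Service Account rather than to the federated principal itself, so the federated token has to be traded for SA credentials before calling GCP APIs. With AWS, the OIDC exchange directly yields Role credentials you can use right away.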
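The impersonation step is itself governed by an IAM binding on the Service Account. The sketch below shows one way to express it; the resource names and pool ID are hypothetical, the account ID is taken from the example email later in this post, and `<REPO_ID>` is a placeholder for the numeric repository ID.

```hcl
# Hypothetical names throughout; assumed to live in the bootstrap stack.
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github-actions" # assumed pool ID
}

resource "google_service_account" "terraform" {
  account_id = "github-actions-terraform"
}

# Allows federated identities from the pool whose repository_id attribute
# matches to impersonate the Terraform SA. The attribute must be present
# in the provider's attribute_mapping for this principalSet to match.
resource "google_service_account_iam_member" "github_impersonates_terraform" {
  service_account_id = google_service_account.terraform.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository_id/<REPO_ID>"
}
```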

Previous Layout Versus the Redesigned One

The Original Setup on Virtual Machines

GitHub Actions
  ├── build → push to GHCR
  └── deploy → SSH into EC2
                 ├── pull image
                 ├── write .env
                 ├── docker compose up
                 └── certbot + nginx

Internet → EC2 → nginx → API container → Postgres container

The Redesigned Setup on Managed Containers

GitHub Actions
  ├── build → push to Artifact Registry → resolve digest
  └── deploy → terraform apply (service stack)

Internet → Cloud Run (domain mapping) → API → Cloud SQL (Auth Proxy)

Gone: nginx, certbot, SSH, SCP, .env. The deploy workflow dropped from ~150 lines to ~60 lines.

Splitting Infrastructure-as-Code Across Three Tiers

I split Terraform into three independent root stacks:

infra/terraform/
  bootstrap/   # one-time, local apply
  platform/    # infra.yml workflow
  service/     # deploy.yml workflow

Why Split at All

Stack | Change frequency | Management | Contents
bootstrap | Rarely changes | Local, manual | state bucket, WIF, CI/CD SA
platform | Occasionally | infra.yml workflow | DB, registry, secrets, runtime SA
service | Every deploy | deploy.yml workflow | Cloud Run service, IAM, domain mapping

Benefit of the split: deploy only plans/applies the Cloud Run service and never touches the DB or registry. Faster runs, smaller blast radius.

Passing Data Across Tiers

The service stack needs the DB connection name and runtime SA email, both of which live in the platform stack. My first version passed them manually via CLI -var flags. Code review flagged this: manually supplying values on every run is error-prone.

I switched to terraform_remote_state:

data "terraform_remote_state" "platform" {
  backend = "gcs"
  config = {
    bucket = "my-project-tfstate"
    prefix = "platform"
  }
}

locals {
  db_connection_name = data.terraform_remote_state.platform.outputs.db_connection_name
  cloud_run_sa       = data.terraform_remote_state.platform.outputs.cloud_run_service_account_email
}

The deploy workflow now only passes a single variable: api_image. Everything else comes from remote state.
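For terraform_remote_state to see those values, the platform stack has to export them as root-level outputs. A sketch of what that could look like, assuming the platform stack names its resources google_sql_database_instance.main and google_service_account.cloud_run as in the snippets elsewhere in this post:

```hcl
# platform stack: only root-module outputs are visible via remote state.
output "db_connection_name" {
  description = "Cloud SQL connection name in project:region:instance form"
  value       = google_sql_database_instance.main.connection_name
}

output "cloud_run_service_account_email" {
  description = "Email of the runtime SA the Cloud Run service runs as"
  value       = google_service_account.cloud_run.email
}
```

One trade-off worth noting: anything that can read the platform state can read all of it, not just these two outputs, so access to the state bucket still needs to be scoped carefully.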

Federated Credential Trust Boundaries

Base Configuration

GCP’s WIF maps GitHub OIDC token claims to GCP attributes via attribute mapping:

resource "google_iam_workload_identity_pool_provider" "github" {
  attribute_mapping = {
    "google.subject"                = "assertion.sub"
    "attribute.repository_id"       = "assertion.repository_id"
    "attribute.repository_owner_id" = "assertion.repository_owner_id"
    "attribute.repository"          = "assertion.repository"
    "attribute.ref"                 = "assertion.ref"
  }

  attribute_condition = join(" && ", [
    "assertion.repository_id == \"<REPO_ID>\"",
    "assertion.repository_owner_id == \"<OWNER_ID>\"",
    "assertion.ref == \"refs/heads/main\"",
  ])

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

attribute_condition is a provider-level gate: tokens that do not satisfy the condition are rejected outright and cannot even be exchanged.

Pitfall: Overly Permissive Delegation Conditions

My first version only locked down repository_id and repository_owner_id โ€” no branch restriction. Review caught this: any workflow on any branch in the repo with id-token: write could exchange for GCP credentials. Adding assertion.ref == "refs/heads/main" scoped it to the main branch only.

Pitfall: Changeable Names Versus Stable Numeric Identifiers

GitHub OIDC tokens contain both repository (name, mutable) and repository_id (numeric, immutable). Renaming a repo breaks any condition based on the name. Using the numeric ID is safer:

# get immutable IDs
gh api repos/<owner>/<repo> --jq '.id'
gh api users/<owner> --jq '.id'

Managed Database Configuration

Pitfall: Default Edition Versus Budget Tier

For PostgreSQL 17 and above, Cloud SQL defaults to the Enterprise Plus edition. But the cheapest tier, db-f1-micro, is only available under the Enterprise edition. Failing to specify the edition explicitly either lands you on a more expensive instance than expected or causes the apply to fail outright:

resource "google_sql_database_instance" "main" {
  database_version = "POSTGRES_17"

  settings {
    edition = "ENTERPRISE"
    tier    = "db-f1-micro"

    ip_configuration {
      ipv4_enabled = true
      ssl_mode     = "ENCRYPTED_ONLY"
    }
  }
}

ssl_mode = "ENCRYPTED_ONLY" rejects all unencrypted connections. Combined with no authorized_networks, even a direct TCP connection over public IP won’t work; only the Cloud SQL Auth Proxy can get through.

Connecting Managed Containers to the Database

Cloud Run v2 has a built-in Cloud SQL Auth Proxy. It mounts a Unix socket into the container via a volume:

template {
  volumes {
    name = "cloudsql"
    cloud_sql_instance {
      instances = ["project:region:instance-name"]
    }
  }

  containers {
    volume_mounts {
      name       = "cloudsql"
      mount_path = "/cloudsql"
    }
  }
}
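To show where that excerpt sits, here is a sketch of a surrounding service resource wired to the values from remote state. The service name and region are assumptions; local.cloud_run_sa, local.db_connection_name, and var.api_image are expected to be declared elsewhere in the service stack.

```hcl
resource "google_cloud_run_v2_service" "api" {
  name     = "api"      # hypothetical service name
  location = "us-east1" # assumed region

  template {
    # Runtime identity, read from the platform stack's remote state.
    service_account = local.cloud_run_sa

    volumes {
      name = "cloudsql"
      cloud_sql_instance {
        # Connection name comes from remote state instead of a hard-coded string.
        instances = [local.db_connection_name]
      }
    }

    containers {
      # The only value supplied by the deploy workflow.
      image = var.api_image

      volume_mounts {
        name       = "cloudsql"
        mount_path = "/cloudsql"
      }
    }
  }
}
```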

The database URL uses a Unix socket path, assembled by Terraform and stored in Secret Manager:

postgresql+psycopg://user:password@/dbname?host=/cloudsql/project:region:instance

No app code changes needed, since the app already reads the DATABASE_URL environment variable.
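A sketch of how the URL might be assembled and stored, under several assumptions: the three input variables and the secret ID are hypothetical, the instance is named google_sql_database_instance.main, and urlencode() is added here as a precaution for credentials containing characters such as @ or /.

```hcl
# Hypothetical inputs; in practice these would likely come from the
# resources that create the database user and database.
variable "db_user" {
  type = string
}

variable "db_password" {
  type      = string
  sensitive = true
}

variable "db_name" {
  type = string
}

locals {
  # Unix-socket style URL: empty host part, socket directory in ?host=.
  database_url = format(
    "postgresql+psycopg://%s:%s@/%s?host=/cloudsql/%s",
    urlencode(var.db_user),
    urlencode(var.db_password),
    var.db_name,
    google_sql_database_instance.main.connection_name,
  )
}

resource "google_secret_manager_secret" "database_url" {
  secret_id = "database-url" # hypothetical secret ID

  replication {
    auto {} # google provider 5.x syntax; older versions used `automatic = true`
  }
}

resource "google_secret_manager_secret_version" "database_url" {
  secret      = google_secret_manager_secret.database_url.id
  secret_data = local.database_url
}
```

The Cloud Run service can then expose the secret as DATABASE_URL through an env block whose value_source.secret_key_ref points at it. The assembled URL also ends up in Terraform state, which is one reason the state bucket needs hardening.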

Pitfall: Internal Routing Flag Only Works on Non-Public Addresses

I initially added enable_private_path_for_google_cloud_services = true, hoping Cloud Run would route traffic over Google’s internal network. But Google’s documentation explicitly states this setting only takes effect under Private IP mode, so it is meaningless for a Public IP instance. I removed it; no point keeping a no-op setting.

Narrowing Credential Access

Pitfall: Granting Credential Reader Role Too Broadly

My first version gave the runtime SA roles/secretmanager.secretAccessor at the project level, meaning it could read every secret in the project. I changed it to per-secret IAM:

resource "google_secret_manager_secret_iam_member" "runtime_access" {
  for_each  = toset(local.secrets)
  secret_id = google_secret_manager_secret.secrets[each.value].secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${var.runtime_sa_email}" # assumed to be an input variable; a bare name is not a valid reference
}

Pitfall: Impersonation Role Bound Too Broadly

The CI/CD Terraform SA needs roles/iam.serviceAccountUser so Cloud Run can “act as” the runtime SA. My first version placed this at the project level, meaning it could impersonate any SA in the project. I narrowed it to bind only on the specific runtime SA:

resource "google_service_account_iam_member" "terraform_acts_as_runtime" {
  service_account_id = google_service_account.cloud_run.name
  role               = "roles/iam.serviceAccountUser"
  member             = "serviceAccount:${var.terraform_sa_email}" # assumed to be an input variable; a bare name is not a valid reference
}

This is the GCP equivalent of AWS iam:PassRole. AWS uses a Condition to restrict which service a role can be passed to; GCP uses a resource-level IAM binding to restrict which SA can be impersonated.

Vanity Hostnames: Native Routing Versus Global Load Balancer

Cloud Run supports two ways to bind a custom domain:

Option | Cost | Status
Cloud Run domain mapping | Free | Preview / Limited Availability
Global External ALB + serverless NEG | ~$18/month | GA (General Availability)

For a dev environment, domain mapping is sufficient, but keep in mind:

  1. It is not GA, so it is not recommended for production
  2. The Terraform SA must be a verified owner of the domain (explained below)
  3. DNS uses a CNAME record, not an A record
  4. Upgrading to an ALB later requires no app code or domain changes; only the Terraform resource changes
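For reference, a domain mapping resource might look roughly like the sketch below. The domain, project ID, and resource names are hypothetical, and google_cloud_run_v2_service.api is assumed to be defined elsewhere in the service stack.

```hcl
resource "google_cloud_run_domain_mapping" "api" {
  name     = "api.example.com" # hypothetical custom domain
  location = google_cloud_run_v2_service.api.location

  metadata {
    namespace = "my-project" # hypothetical GCP project ID
  }

  spec {
    # Name of the Cloud Run service that receives traffic for the domain.
    route_name = google_cloud_run_v2_service.api.name
  }
}

# DNS records to create at the DNS provider once the mapping exists.
output "domain_mapping_dns_records" {
  value = google_cloud_run_domain_mapping.api.status[0].resource_records
}
```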

Pitfall: Ownership Proof Tied to a Webmaster Tool

Cloud Run domain mapping requires domain ownership to be verified through Google Search Console. This is unique to GCP: neither AWS nor Azure has this requirement.

Cloud Provider | Custom Domain Verification
GCP Cloud Run | Google Search Console domain verification
AWS App Runner / ALB | ACM (Certificate Manager) DNS validation
Azure Container Apps | Set directly on the resource, add a DNS record to verify

With AWS and Azure, you configure a custom domain on the cloud service, it gives you a DNS record (CNAME or TXT), you add it at your DNS provider, and verification completes. The entire flow stays within the cloud platform.

GCP is different: domain ownership is tied to Google Search Console (originally a tool for SEO / webmasters), and it is bound to a Google account. This means:

  1. Your personal Google account verifying a domain does not mean the Terraform SA also has permission. A SA is a separate identity โ€” it must be added as a verified owner in Search Console.
  2. This is a manual step โ€” it cannot be done via Terraform or any API. You have to go into the Search Console UI and manually add the SA’s email.
  3. If you forget this step, terraform apply will fail at google_cloud_run_domain_mapping with a domain verification failure error, which is not immediately obvious as a Search Console issue.

Steps to fix:

1. Go to Google Search Console
2. Select the verified property for your domain
3. Settings → Users and permissions → Add user
4. Enter the Terraform SA email (e.g. github-actions-terraform@project.iam.gserviceaccount.com)
5. Set permission to Owner
6. Save

The reason for this design likely traces back to Google’s product history: Search Console was already where Google managed domain ownership, and Cloud Run merely adopted that verification system instead of building its own from scratch the way AWS and Azure did.

Pipeline Comparison

Previous Pipeline (~150 Lines)

deploy:
  steps:
    - Configure AWS credentials (OIDC)
    - Get EC2 public IP
    - Validate 12+ secrets and variables
    - SCP compose files to EC2
    - SSH into EC2 and run deploy script:
        - write .env
        - install certbot + cronie
        - docker login to GHCR
        - docker compose pull / down / up
        - certbot + cron
    - Health check (HTTPS with HTTP fallback)

Redesigned Pipeline (~60 Lines)

build:
  steps:
    - Authenticate to GCP (WIF)
    - Configure Docker for Artifact Registry
    - docker build + push
    - Resolve immutable digest (sha256)

deploy:
  steps:
    - Authenticate to GCP (WIF)
    - terraform init / plan / apply (service stack)
    - Health check (Cloud Run URL)
    - Health check (custom domain)

Key differences:

  • No SSH keys, .env files, certbot, or nginx to manage
  • Image digest is immutable (sha256), not a mutable tag
  • Cloud Run configuration is managed by Terraform, so there is no drift
  • Secrets live in Secret Manager, not passed in at deploy time
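One way to enforce the digest rule on the Terraform side is a validation on the api_image variable, sketched below. The regular expression is an assumption about the reference format (anything ending in @sha256: plus 64 hex characters) and was not part of the original setup.

```hcl
variable "api_image" {
  type        = string
  description = "Container image reference, pinned by digest"

  validation {
    # Rejects mutable references such as ...:latest at plan time.
    condition     = can(regex("@sha256:[0-9a-f]{64}$", var.api_image))
    error_message = "The api_image value must be pinned by digest (name@sha256:<64 hex characters>)."
  }
}
```

With this in place, a workflow that accidentally passes a tag instead of the resolved digest fails during plan rather than deploying an image that could change underneath it.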

Remote Backend Hardening

Terraform state contains sensitive data like database passwords. The state bucket needs hardening:

resource "google_storage_bucket" "terraform_state" {
  versioning {
    enabled = true
  }

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
}

  • uniform_bucket_level_access: enforce IAM-only access, disable ACLs
  • public_access_prevention = "enforced": even if IAM is misconfigured, the bucket stays private
  • versioning: recover state if it gets overwritten or corrupted

One advantage of using GCS as a Terraform backend: state locking is built in, so there is no need to create a separate DynamoDB table like on AWS.

Cost

Item | AWS EC2 | GCP Cloud Run
Compute | ~$8-15/month (t3.micro 24/7) | Nearly free (low traffic stays in free tier)
Database | Free (container) | ~$7-10/month (Cloud SQL db-f1-micro)
SSL | Free (certbot) | Free (auto-managed)
Registry | Free (GHCR) | Low (Artifact Registry)
Total | ~$8-15/month | ~$8-12/month

Costs are similar, but operations complexity dropped sharply: no more VM patching, SSH key rotation, certbot renewal, or nginx config.

Lessons Learned

  1. Map concepts first: GCP and AWS have very different IAM models, and Service Account versus IAM Role requires a mindset shift. Sort out the correspondences before migrating.
  2. Split your Terraform stacks by change frequency: deploy should be fast, so don’t plan the entire world on every push.
  3. Lock down federated identity trust boundaries from day one: scope to repo + branch, use immutable IDs instead of names.
  4. Follow least privilege at every layer: project-level IAM is convenient but will bite you eventually. Per-resource bindings are safer.
  5. Specify the managed database edition explicitly: PG 17+ defaults to Enterprise Plus; for the lowest cost tier, set ENTERPRISE.
  6. Don’t assume features exist: every “this feature should be there” assumption needs to be verified against the documentation.

References