Migrating From AWS EC2 to GCP Cloud Run: Architecture Decisions and Pitfalls

Lou Chang included in DevOps

2026-03-11 About 2600 words 12 minutes


A complete record of moving my side project’s backend from AWS EC2 + Docker Compose to GCP Cloud Run, including concept mapping between the two providers, Terraform layering design, federated identity trust boundaries, and every issue caught during code review

Mapping Concepts Between the Two Major Cloud Providers

Both are cloud providers, but their naming and layering differ. Getting the correspondences straight before the move prevents confusion later when configuring resources.

Account and Organization Hierarchy

Concept	AWS	GCP
Top-level governance unit	Organization	Organization
Billing and resource isolation	Account	Project
Region	Region (us-east-1)	Region (us-east1)

AWS uses Accounts to isolate environments (dev account / prod account); GCP uses Projects. A GCP Project is lighter than an AWS Account — creating a new Project is like opening a fresh isolation space without needing to spin up a separate account.

Compute Offerings

Capability	AWS	GCP
VM	EC2	Compute Engine
Serverless container	App Runner / ECS Fargate	Cloud Run
Container orchestration	EKS	GKE

Cloud Run is the centerpiece of this migration. Its pitch: hand it a container image and it runs it — no VM management, no scaling headaches. The closest AWS equivalent is App Runner, but Cloud Run is more mature.

Identity and Authorization

This is where the two diverge the most:

Concept	AWS	GCP
Identity for services	IAM Role + Instance Profile	Service Account
Temporary identity for CI/CD	OIDC Provider + AssumeRoleWithWebIdentity	Workload Identity Federation (WIF)
Permission assignment	IAM Policy attached to Role	IAM Binding on resource or project
Permission inheritance direction	Policy → Role → Entity	Role → Member (on resource / project)

AWS workflow: create an IAM Role, attach a Policy (permissions) to it, then assign the Role to an EC2 instance (via Instance Profile) or let GitHub Actions assume it.

GCP workflow: create a Service Account, then add IAM Bindings on individual resources, binding roles to that SA. Permissions are distributed across resources rather than centralized in a single Policy document.



# AWS: policy is attached to the role
IAM Role
├── Trust Policy: who can assume this role
└── Permission Policy: what this role can do

# GCP: bindings are on each resource
Service Account (identity only, no embedded permissions)
├── Project IAM Binding: roles/cloudsql.client → SA
├── Secret IAM Binding: roles/secretmanager.secretAccessor → SA
└── Bucket IAM Binding: roles/storage.objectViewer → SA
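In Terraform, the GCP side of that tree turns into one identity resource plus a separate binding resource per grant. A minimal sketch; the account ID, the `var.project_id` variable, and the bucket name are placeholders I've assumed, not values from this project:

```hcl
# Identity only: the SA itself carries no permissions.
resource "google_service_account" "cloud_run" {
  account_id   = "cloud-run-runtime"
  display_name = "Cloud Run runtime identity"
}

# Project-level grant: lets the SA open Cloud SQL connections.
resource "google_project_iam_member" "runtime_cloudsql_client" {
  project = var.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${google_service_account.cloud_run.email}"
}

# Resource-level grant: read-only access to one (hypothetical) bucket.
resource "google_storage_bucket_iam_member" "runtime_bucket_reader" {
  bucket = "my-assets-bucket"
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:${google_service_account.cloud_run.email}"
}
```

The `*_iam_member` resources are additive: each one adds a single member to a single role without touching other members, unlike the authoritative `*_iam_binding` and `*_iam_policy` variants.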

Data and Storage

Capability	AWS	GCP
Object storage	S3	Cloud Storage (GCS)
Managed relational DB	RDS	Cloud SQL
Secret management	Secrets Manager	Secret Manager
Container Registry	ECR / GHCR	Artifact Registry
Terraform state	S3 + DynamoDB (lock)	GCS (lock built-in)

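GCP’s Terraform state backend is simpler: GCS supports state locking natively, while the S3 backend has traditionally required a separate DynamoDB table for locking (Terraform 1.10 and later can instead lock with a lockfile in S3 via use_lockfile).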

Pipeline Authentication

Both providers let GitHub Actions exchange an OIDC token for temporary cloud credentials — no static credentials needed:



# AWS OIDC flow
GitHub Actions
  → request OIDC token from GitHub
  → send to AWS STS (AssumeRoleWithWebIdentity)
  → get temporary AWS credentials
  → use credentials to call AWS APIs

# GCP WIF flow
GitHub Actions
  → request OIDC token from GitHub
  → send to GCP STS (token exchange)
  → get federated token
  → impersonate Service Account
  → use SA credentials to call GCP APIs

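GCP adds an extra step here, impersonating a Service Account, because in this setup the IAM roles are granted to the SA rather than to the federated identity itself. With AWS, the OIDC exchange directly yields Role credentials you can use right away. (GCP can also grant roles directly to a federated principal, but not every GCP service supports that, so SA impersonation remains the common pattern.)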

Previous Layout Versus the Redesigned One

The Original Setup on Virtual Machines



GitHub Actions
  ├── build → push to GHCR
  └── deploy → SSH into EC2
                 ├── pull image
                 ├── write .env
                 ├── docker compose up
                 └── certbot + nginx

Internet → EC2 → nginx → API container → Postgres container

The Redesigned Setup on Managed Containers



GitHub Actions
  ├── build → push to Artifact Registry → resolve digest
  └── deploy → terraform apply (service stack)

Internet → Cloud Run (domain mapping) → API → Cloud SQL (Auth Proxy)

Gone: nginx, certbot, SSH, SCP, .env. The deploy workflow dropped from ~150 lines to ~60 lines.

Splitting Infrastructure-as-Code Across Three Tiers

I split Terraform into three independent root stacks:



infra/terraform/
  bootstrap/   # one-time, local apply
  platform/    # infra.yml workflow
  service/     # deploy.yml workflow

Why Split at All

Stack	Change frequency	Management	Contents
bootstrap	Rarely changes	Local manual	state bucket, WIF, CI/CD SA
platform	Occasionally	infra.yml workflow	DB, registry, secrets, runtime SA
service	Every deploy	deploy.yml workflow	Cloud Run service, IAM, domain mapping

Benefit of the split: deploy only plans/applies the Cloud Run service — it never touches the DB or registry. Faster runs, smaller blast radius.

Passing Data Across Tiers

The service stack needs the DB connection name and runtime SA email — both live in the platform stack. My first version passed them manually via CLI -var flags. Code review flagged this: manually supplying values on every run is error-prone.

I switched to terraform_remote_state:



data "terraform_remote_state" "platform" {
  backend = "gcs"
  config = {
    bucket = "my-project-tfstate"
    prefix = "platform"
  }
}

locals {
  db_connection_name = data.terraform_remote_state.platform.outputs.db_connection_name
  cloud_run_sa       = data.terraform_remote_state.platform.outputs.cloud_run_service_account_email
}

The deploy workflow now only passes a single variable — api_image. Everything else comes from remote state.
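The remote state lookup only works if the platform stack exports those values as outputs. A minimal sketch of the platform stack's outputs, assuming its resources are named `google_sql_database_instance.main` and `google_service_account.cloud_run` as in the other snippets in this post:

```hcl
# platform stack: values the service stack reads via terraform_remote_state
output "db_connection_name" {
  description = "Cloud SQL connection name in project:region:instance form"
  value       = google_sql_database_instance.main.connection_name
}

output "cloud_run_service_account_email" {
  description = "Email of the runtime SA that Cloud Run runs as"
  value       = google_service_account.cloud_run.email
}
```

The output names must match what the service stack reads through `data.terraform_remote_state.platform.outputs` exactly. Renaming an output in the platform stack breaks the service stack's next plan, which is the trade-off for not passing values by hand.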

Federated Credential Trust Boundaries

Base Configuration

GCP’s WIF maps GitHub OIDC token claims to GCP attributes via attribute mapping:



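resource "google_iam_workload_identity_pool_provider" "github" {
  # required: the pool this provider belongs to, and the provider's own ID
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github"
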
  attribute_mapping = {
    "google.subject"                = "assertion.sub"
    "attribute.repository_id"       = "assertion.repository_id"
    "attribute.repository_owner_id" = "assertion.repository_owner_id"
    "attribute.repository"          = "assertion.repository"
    "attribute.ref"                 = "assertion.ref"
  }

  attribute_condition = join(" && ", [
    "assertion.repository_id == \"<REPO_ID>\"",
    "assertion.repository_owner_id == \"<OWNER_ID>\"",
    "assertion.ref == \"refs/heads/main\"",
  ])

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

attribute_condition is a provider-level gate: tokens that do not satisfy the condition are rejected outright — they cannot even be exchanged.
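The provider condition is one gate; the second is which SA a federated identity may impersonate. A sketch of the pool resource plus that SA-level binding, assuming the CI/CD SA is declared as `google_service_account.terraform` (a hypothetical name) and reusing the `<REPO_ID>` placeholder from above:

```hcl
# The pool that the provider above belongs to.
resource "google_iam_workload_identity_pool" "github" {
  workload_identity_pool_id = "github"
  display_name              = "GitHub Actions"
}

# Only federated identities whose mapped attribute.repository_id matches
# may impersonate the CI/CD SA. The pool's `name` attribute has the form
# projects/<number>/locations/global/workloadIdentityPools/github.
resource "google_service_account_iam_member" "github_can_impersonate" {
  service_account_id = google_service_account.terraform.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository_id/<REPO_ID>"
}
```

Because the binding sits on a single SA, a token that passes the provider condition still cannot impersonate any other SA in the project.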

Pitfall: Overly Permissive Delegation Conditions

My first version only locked down repository_id and repository_owner_id — no branch restriction. Review caught this: any workflow on any branch in the repo with id-token: write could exchange for GCP credentials. Adding assertion.ref == "refs/heads/main" scoped it to the main branch only.

Pitfall: Changeable Names Versus Stable Numeric Identifiers

GitHub OIDC tokens contain both repository (name, mutable) and repository_id (numeric, immutable). Renaming a repo breaks any condition based on the name. Using the numeric ID is safer:



# get immutable IDs
gh api repos/<owner>/<repo> --jq '.id'
gh api users/<owner> --jq '.id'
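To keep those numeric IDs out of hand-edited strings, they can be passed in as variables with a validation that rejects anything that is not a number. A sketch with hypothetical variable names, building the same condition string as the provider above:

```hcl
# Hypothetical variables holding the IDs returned by the gh commands above.
variable "github_repository_id" {
  type        = string
  description = "Numeric GitHub repository ID (immutable), not owner/repo"

  validation {
    condition     = can(regex("^[0-9]+$", var.github_repository_id))
    error_message = "github_repository_id must be the numeric repository ID."
  }
}

variable "github_owner_id" {
  type        = string
  description = "Numeric GitHub owner ID (immutable)"

  validation {
    condition     = can(regex("^[0-9]+$", var.github_owner_id))
    error_message = "github_owner_id must be the numeric owner ID."
  }
}

# Same three clauses as the provider's attribute_condition.
locals {
  wif_attribute_condition = join(" && ", [
    "assertion.repository_id == \"${var.github_repository_id}\"",
    "assertion.repository_owner_id == \"${var.github_owner_id}\"",
    "assertion.ref == \"refs/heads/main\"",
  ])
}
```

The provider would then set `attribute_condition = local.wif_attribute_condition`. Passing `owner/repo` by mistake fails at plan time instead of producing a condition that never matches.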

Managed Database Configuration

Pitfall: Default Edition Versus Budget Tier

For PostgreSQL 17 and above, Cloud SQL defaults to the Enterprise Plus edition. But the cheapest tier, db-f1-micro, is only available under the Enterprise edition. Failing to specify the edition explicitly either lands you on a more expensive instance than expected or causes the apply to fail outright:



resource "google_sql_database_instance" "main" {
  database_version = "POSTGRES_17"

  settings {
    edition = "ENTERPRISE"
    tier    = "db-f1-micro"

    ip_configuration {
      ipv4_enabled = true
      ssl_mode     = "ENCRYPTED_ONLY"
    }
  }
}

ssl_mode = "ENCRYPTED_ONLY" rejects all unencrypted connections. Combined with no authorized_networks, even a direct TCP connection over the public IP won’t work; only connections made through the Cloud SQL Auth Proxy (or a Cloud SQL language connector) get through.

Connecting Managed Containers to the Database

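Cloud Run has a built-in Cloud SQL connection that runs the Cloud SQL Auth Proxy on Cloud Run’s side. In the google_cloud_run_v2_service resource it is declared as a volume, which mounts a Unix socket into the container: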



template {
  volumes {
    name = "cloudsql"
    cloud_sql_instance {
      instances = ["project:region:instance-name"]
    }
  }

  containers {
    volume_mounts {
      name       = "cloudsql"
      mount_path = "/cloudsql"
    }
  }
}

The database URL uses a Unix socket path, assembled by Terraform and stored in Secret Manager:


postgresql+psycopg://user:password@/dbname?host=/cloudsql/project:region:instance
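One way to have Terraform assemble that URL and store it, sketched under assumptions: the database and user are both named `app`, the password comes from the hashicorp/random provider, and the secret ID `database-url` is hypothetical (the author's stack manages its secrets through a for_each map instead):

```hcl
# Generated DB password; special = false keeps @, / and : out of it.
resource "random_password" "db" {
  length  = 32
  special = false
}

resource "google_sql_database" "app" {
  name     = "app"
  instance = google_sql_database_instance.main.name
}

resource "google_sql_user" "app" {
  name     = "app"
  instance = google_sql_database_instance.main.name
  password = random_password.db.result
}

resource "google_secret_manager_secret" "database_url" {
  secret_id = "database-url"

  replication {
    auto {}
  }
}

# Unix-socket URL in the same shape as above.
resource "google_secret_manager_secret_version" "database_url" {
  secret      = google_secret_manager_secret.database_url.id
  secret_data = "postgresql+psycopg://${google_sql_user.app.name}:${random_password.db.result}@/${google_sql_database.app.name}?host=/cloudsql/${google_sql_database_instance.main.connection_name}"
}
```

Restricting the password to letters and digits means it can be embedded in the URL without percent-encoding. The generated password also ends up in Terraform state, which is one reason the state bucket needs hardening.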

No app code changes needed — it already reads the DATABASE_URL environment variable.
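The secret reaches the app as an ordinary environment variable through a secret reference on the Cloud Run service. A sketch of that part of the service stack, assuming a `var.region` variable and the `database-url` secret ID (both hypothetical); `local.cloud_run_sa` and `var.api_image` are the values from the remote-state section, and the Cloud SQL volume is omitted for brevity:

```hcl
resource "google_cloud_run_v2_service" "api" {
  name     = "api"
  location = var.region

  template {
    # Runtime SA read from the platform stack's remote state.
    service_account = local.cloud_run_sa

    containers {
      # Digest-pinned image passed in by the deploy workflow.
      image = var.api_image

      # Exposes the secret's value as DATABASE_URL inside the container.
      env {
        name = "DATABASE_URL"
        value_source {
          secret_key_ref {
            secret  = "database-url"
            version = "latest"
          }
        }
      }
    }
  }
}
```

For this to work, the runtime SA needs roles/secretmanager.secretAccessor on that particular secret.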

Pitfall: Internal Routing Flag Only Works on Non-Public Addresses

I initially added enable_private_path_for_google_cloud_services = true, hoping Cloud Run would route traffic over Google’s internal network. But that setting controls whether Google Cloud services such as BigQuery may reach the instance over a private IP connection, and Google’s documentation states it only takes effect in Private IP mode; on a Public IP instance it does nothing. I removed it rather than keep a no-op setting.

Narrowing Credential Access

Pitfall: Granting Credential Reader Role Too Broadly

My first version gave the runtime SA roles/secretmanager.secretAccessor at the project level, meaning it could read every secret in the project. I changed it to per-secret IAM:



resource "google_secret_manager_secret_iam_member" "runtime_access" {
  for_each  = toset(local.secrets)
  secret_id = google_secret_manager_secret.secrets[each.value].secret_id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.cloud_run.email}"
}

Pitfall: Impersonation Role Bound Too Broadly

The CI/CD Terraform SA needs roles/iam.serviceAccountUser on the runtime SA: deploying a Cloud Run service that runs as the runtime SA requires the deployer to hold the iam.serviceAccounts.actAs permission on it. My first version granted this role at the project level, which let the Terraform SA act as any SA in the project. I narrowed it to a binding on the specific runtime SA:



resource "google_service_account_iam_member" "terraform_acts_as_runtime" {
  service_account_id = google_service_account.cloud_run.name
  role               = "roles/iam.serviceAccountUser"
  # email of the CI/CD SA created in the bootstrap stack, passed in as a variable
  member             = "serviceAccount:${var.terraform_sa_email}"
}

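This is the GCP equivalent of AWS iam:PassRole. In AWS you scope PassRole with the policy’s Resource element (which roles may be passed) and optionally the iam:PassedToService condition key (which service they may be passed to); in GCP you scope it with a resource-level IAM binding on the specific SA that may be acted as.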

Vanity Hostnames: Native Routing Versus Global Load Balancer

Cloud Run supports two ways to bind a custom domain:

Option	Cost	Status
Cloud Run domain mapping	Free	Preview / Limited Availability
Global External ALB + serverless NEG	~$18/month	GA (General Availability)

For a dev environment, domain mapping is sufficient, but keep in mind:

It is not GA, so it is not recommended for production
The Terraform SA must be a verified owner of the domain (explained below)
For a subdomain, DNS uses a CNAME record (pointing at ghs.googlehosted.com), not an A record; an apex domain needs A/AAAA records instead
Upgrading to an ALB later requires no app code changes and keeps the same domain name; only the Terraform resources and the DNS record change
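In Terraform, the mapping itself is small. A sketch assuming hypothetical `var.region` and `var.project_id` variables, a placeholder hostname `api.example.com`, and a Cloud Run service resource named `google_cloud_run_v2_service.api`; the output surfaces the DNS records Cloud Run asks you to create:

```hcl
resource "google_cloud_run_domain_mapping" "api" {
  location = var.region
  name     = "api.example.com" # placeholder hostname

  metadata {
    # For Cloud Run domain mappings the namespace is the project ID.
    namespace = var.project_id
  }

  spec {
    # Routes the hostname to the Cloud Run service of this name.
    route_name = google_cloud_run_v2_service.api.name
  }
}

# Records (type, name, rrdata) to add at the DNS provider.
output "domain_dns_records" {
  value = google_cloud_run_domain_mapping.api.status[0].resource_records
}
```

The apply only succeeds once the identity running Terraform is a verified owner of the domain.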

Pitfall: Ownership Proof Tied to a Webmaster Tool

Cloud Run domain mapping requires domain ownership to be verified through Google Search Console. This is unique to GCP — neither AWS nor Azure has this requirement:

Cloud Provider	Custom Domain Verification
GCP Cloud Run	Google Search Console domain verification
AWS App Runner / ALB	ACM (Certificate Manager) DNS validation
Azure Container Apps	Set directly on the resource, add a DNS record to verify

With AWS and Azure, you configure a custom domain on the cloud service, it gives you a DNS record (CNAME or TXT), you add it at your DNS provider, and verification completes. The entire flow stays within the cloud platform.

GCP is different: domain ownership is tied to Google Search Console (originally a tool for SEO / webmasters), and it is bound to a Google account. This means:

Your personal Google account verifying a domain does not mean the Terraform SA also has permission. A SA is a separate identity — it must be added as a verified owner in Search Console.
This is a manual step in my setup: you go into the Search Console UI and add the SA’s email by hand; the Terraform stacks here do not manage it.
If you forget this step, terraform apply will fail at google_cloud_run_domain_mapping with a domain verification failure error, which is not immediately obvious as a Search Console issue.

Steps to fix:



1. Go to Google Search Console
2. Select the verified property for your domain
3. Settings → Users and permissions → Add user
4. Enter the Terraform SA email (e.g. github-actions-terraform@project.iam.gserviceaccount.com)
5. Set permission to Owner
6. Save

The reason for this design likely traces back to Google’s product history — Search Console was already where Google managed domain ownership, and Cloud Run merely adopted that verification system instead of building its own from scratch the way AWS and Azure did.

Pipeline Comparison

Previous Pipeline (~150 Lines)



deploy:
  steps:
    - Configure AWS credentials (OIDC)
    - Get EC2 public IP
    - Validate 12+ secrets and variables
    - SCP compose files to EC2
    - SSH into EC2 and run deploy script:
        - write .env
        - install certbot + cronie
        - docker login to GHCR
        - docker compose pull / down / up
        - certbot + cron
    - Health check (HTTPS with HTTP fallback)

Redesigned Pipeline (~60 Lines)



build:
  steps:
    - Authenticate to GCP (WIF)
    - Configure Docker for Artifact Registry
    - docker build + push
    - Resolve immutable digest (sha256)

deploy:
  steps:
    - Authenticate to GCP (WIF)
    - terraform init / plan / apply (service stack)
    - Health check (Cloud Run URL)
    - Health check (custom domain)

Key differences:

No SSH keys, .env files, certbot, or nginx to manage
Deploys reference an immutable image digest (sha256) instead of a mutable tag
Cloud Run configuration is managed by Terraform, so manual changes show up as drift on the next plan instead of going unnoticed
Secrets live in Secret Manager, not passed in at deploy time
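Since api_image is the only input the deploy workflow passes, the service stack can enforce digest pinning itself. A sketch of a variable validation; the regex and message are my own assumptions about how strict to be:

```hcl
variable "api_image" {
  type        = string
  description = "Artifact Registry image reference pinned by digest"

  validation {
    # Require a trailing @sha256:<64 lowercase hex chars>.
    condition     = can(regex("@sha256:[0-9a-f]{64}$", var.api_image))
    error_message = "api_image must be pinned by digest (image@sha256:<digest>), not by tag."
  }
}
```

A tag-based reference such as `api:latest` then fails at plan time, so a deploy cannot silently pick up whatever the tag currently points at.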

Remote Backend Hardening

Terraform state contains sensitive data like database passwords. The state bucket needs hardening:



resource "google_storage_bucket" "terraform_state" {
  versioning {
    enabled = true
  }

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
}

uniform_bucket_level_access: enforce IAM-only access, disable ACLs
public_access_prevention = "enforced": even if IAM is misconfigured, the bucket stays private
versioning: recover state if it gets overwritten or corrupted

One advantage of using GCS as a Terraform backend: state locking is built in — no need to create a separate DynamoDB table like on AWS.
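Each stack points its backend at the same hardened bucket with its own prefix. A sketch for the service stack, reusing the bucket name and prefix convention from the remote-state example (backend blocks cannot reference variables, so the values are literals):

```hcl
# service stack backend: same bucket as platform, separate prefix
terraform {
  backend "gcs" {
    bucket = "my-project-tfstate"
    prefix = "service"
  }
}
```

With the default workspace, state should land at service/default.tfstate, with the lock held as a .tflock object beside it in the same bucket.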

Cost

Item	AWS EC2	GCP Cloud Run
Compute	~$8-15/month (t3.micro 24/7)	Nearly free (low traffic stays in free tier)
Database	Free (container)	~$7-10/month (Cloud SQL db-f1-micro)
SSL	Free (certbot)	Free (auto-managed)
Registry	Free (GHCR)	Low (Artifact Registry)
Total	~$8-15/month	~$8-12/month

Costs are similar, but operations complexity dropped sharply: no more VM patching, SSH key rotation, certbot renewal, or nginx config.

Lessons Learned

Map concepts first: GCP and AWS have very different IAM models — Service Account versus IAM Role requires a mindset shift. Sort out the correspondences before migrating.
Split your Terraform stacks by change frequency: deploy should be fast — don’t plan the entire world on every push.
Lock down federated identity trust boundaries from day one: scope to repo + branch, use immutable IDs instead of names.
Follow least privilege at every layer: project-level IAM is convenient but will bite you eventually. Per-resource bindings are safer.
Specify the managed database edition explicitly: PG 17+ defaults to Enterprise Plus; for the lowest cost tier, set ENTERPRISE.
Don’t assume features exist: every “this feature should be there” assumption needs to be verified against the documentation.