Following the previous post (Cluster, Pod, Deployment), this one covers stable network access, zero-downtime updates, restart-loop behavior, automatic replica scaling, and the troubleshooting workflow you reach for when things go wrong.
Prerequisite: a kind cluster already running a go-api Deployment (3 replicas).
Stable Network Entry Points
Why You Need a Stable Endpoint
Pod IPs are ephemeral; every recreated Pod gets a new one:
kubectl get pod go-api-xxx -o wide
# IP: 10.244.0.5

kubectl delete pod go-api-xxx
# the ReplicaSet creates a replacement Pod with a new name
kubectl get pod go-api-yyy -o wide
# IP: 10.244.0.8  (changed)
Right now we have 3 Pods, each with a different IP, and any of them could change at any moment. How do other workloads connect?
without Service:
  Pod 1: 10.244.0.5 ─┐
  Pod 2: 10.244.0.6 ─┼─ which IP? what if it changes?
  Pod 3: 10.244.0.7 ─┘

with Service:
  Service: go-api (stable DNS name + stable IP)
    ├── routes to Pod 1
    ├── routes to Pod 2
    └── routes to Pod 3
How a Service Finds Its Backends
Through a label selector: the Service matches Pods by their labels:
# service.yaml
spec:
  selector:
    app: go-api        # find all Pods with this label

# deployment.yaml (Pod template)
template:
  metadata:
    labels:
      app: go-api      # matches the selector above
Labels are key-value tags attached to Pods. The Service uses them to decide “which Pods belong to me.”
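To make the matching rule concrete, here is a minimal Go sketch of equality-based selector matching, assuming plain key/value selectors like the one above. The `pod` type, the function names, and the Pod names are hypothetical; this models the rule, it is not client-go's implementation.

```go
package main

import "fmt"

// pod is a hypothetical, minimal stand-in for a Pod: just a name and labels.
type pod struct {
	name   string
	labels map[string]string
}

// selectorMatches reports whether a Pod's labels satisfy an equality-based
// selector such as the Service's `selector: {app: go-api}`. Every selector
// key must be present on the Pod with the same value; extra Pod labels are
// ignored. An empty selector matches nothing here, because a Service without
// a selector gets no automatically managed backends.
func selectorMatches(selector, podLabels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for key, want := range selector {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// backends returns the names of the Pods a Service with this selector would
// select. (The real Service additionally skips Pods that are not Ready.)
func backends(selector map[string]string, pods []pod) []string {
	var names []string
	for _, p := range pods {
		if selectorMatches(selector, p.labels) {
			names = append(names, p.name)
		}
	}
	return names
}

func main() {
	selector := map[string]string{"app": "go-api"}
	pods := []pod{
		{name: "go-api-aaaaa", labels: map[string]string{"app": "go-api", "pod-template-hash": "65577fc4f9"}},
		{name: "go-api-bbbbb", labels: map[string]string{"app": "go-api", "pod-template-hash": "65577fc4f9"}},
		{name: "redis-ccccc", labels: map[string]string{"app": "redis"}},
	}
	fmt.Println(backends(selector, pods)) // [go-api-aaaaa go-api-bbbbb]
}
```

Because matching is purely label-based, any Pod that carries `app: go-api` joins the Service, and a Pod whose label is changed drops out of it.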
Zero-Downtime Updates
When the Deployment's image changes (here from go-api:0.0.2 to go-api:0.0.3), the Pods are replaced one at a time:
old-pod-1  Running                                 ← 3 old Pods running
new-pod-1  Pending → ContainerCreating → Running   ← new Pod 1 ready
old-pod-1  Terminating                             ← THEN old Pod 1 killed
new-pod-2  Pending → Running                       ← new Pod 2 ready
old-pod-2  Terminating                             ← old Pod 2 killed
new-pod-3  Pending → Running                       ← new Pod 3 ready
old-pod-3  Terminating                             ← old Pod 3 killed
Key point: a new Pod must be Running and Ready before an old Pod is terminated, so the number of available Pods never drops below 3. That is what zero downtime means here.
This behavior is controlled by the Deployment's strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1          # at most 1 extra Pod during the update
    maxUnavailable: 0    # no Pod may be unavailable: zero downtime
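A toy Go simulation shows how `maxSurge: 1` and `maxUnavailable: 0` force the order seen above. It assumes every new Pod becomes Ready the instant it is created and uses hypothetical Pod names; the real Deployment controller is event-driven and much more involved.

```go
package main

import "fmt"

// simulateRollingUpdate is a toy model of a RollingUpdate. It starts with
// `replicas` old Pods and repeatedly makes one legal move:
//   - create a new Pod if the total stays <= replicas+maxSurge
//   - otherwise terminate an old Pod if the remaining Pods stay >= replicas-maxUnavailable
// Assumption: every new Pod is Ready the moment it is created.
func simulateRollingUpdate(replicas, maxSurge, maxUnavailable int) []string {
	oldPods, newPods := replicas, 0
	var events []string
	for oldPods > 0 || newPods < replicas {
		if newPods < replicas && oldPods+newPods < replicas+maxSurge {
			newPods++
			events = append(events, fmt.Sprintf("create new-pod-%d", newPods))
			continue
		}
		if oldPods > 0 && oldPods+newPods-1 >= replicas-maxUnavailable {
			events = append(events, fmt.Sprintf("terminate old-pod-%d", replicas-oldPods+1))
			oldPods--
			continue
		}
		break // no legal move: the rollout cannot make progress
	}
	return events
}

func main() {
	for _, e := range simulateRollingUpdate(3, 1, 0) {
		fmt.Println(e)
	}
}
```

Running it prints the same create/terminate alternation as the listing above. With `maxSurge: 0` and `maxUnavailable: 0` the function finds no legal move at all, which is why Kubernetes rejects that combination when it validates a Deployment.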
Two ReplicaSets Coexisting
kubectl get rs
NAME                DESIRED   CURRENT   READY   AGE
go-api-65577fc4f9   3         3         3       2m    ← new ReplicaSet (0.0.3), 3 Pods
go-api-668dcc5dd    0         0         0       4d    ← old ReplicaSet (0.0.2), scaled to 0
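The old ReplicaSet is not deleted; its replica count is scaled down to 0. K8s keeps it on purpose so you can roll back. How many old ReplicaSets are kept is set by the Deployment's revisionHistoryLimit (10 by default).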
Revision Trail and Reverting to an Earlier Version
# see revision history
kubectl rollout history deployment/go-api
# REVISION 1 → go-api:0.0.2
# REVISION 2 → go-api:0.0.3

# roll back to the previous revision
kubectl rollout undo deployment/go-api

# check: the old ReplicaSet scales back up
kubectl get rs
rollout undo scales the old ReplicaSet from 0 back up to 3 and the new one from 3 down to 0. It goes through the same rolling-update process, so there is still zero downtime.
Restart Loops: When a Workload Keeps Dying
What a Restart Loop Looks Like
When a container keeps crashing, kubelet does not keep restarting it immediately. It uses exponential backoff:
crash #1 → restart immediately
crash #2 → wait ~10s, then restart
crash #3 → wait ~20s, then restart
crash #4 → wait ~40s, then restart
...keeps doubling, up to 5 minutes max
kubelet is not giving up; it waits longer each time. If the program itself has a bug, restarting right away would just crash right away again and waste resources. Once a container has run for 10 minutes without crashing, the backoff timer resets.
Observing the Restart Loop
kubectl get pods --watch
my-pod   Running                ← container starts
my-pod   Error                  ← container crashed (exit code != 0)
my-pod   Running 1              ← kubelet restarted it
my-pod   Error                  ← crashed again
my-pod   CrashLoopBackOff       ← kubelet: "crashing too often, waiting..."
my-pod   Running 2 (14s ago)    ← restarted after a 14s delay
my-pod   Error                  ← crashed again
my-pod   CrashLoopBackOff       ← waiting even longer (27s)
What the Three States Mean
STATUS | Meaning
Error | Container just died
CrashLoopBackOff | Died too many times; kubelet is waiting and has not restarted it yet
Running | Restart succeeded
What to Do When Your Workload Is Stuck in a Loop
# see the logs from the PREVIOUS (crashed) container
kubectl logs <pod-name> --previous
--previous is the key: it shows the logs of the container instance that crashed, not of the freshly restarted one, so you can pinpoint which part of the code failed.
Peeking Under the Hood
Control-Plane Components
kubectl get pods -n kube-system
etcd-control-plane                      ← database, stores all cluster state
kube-apiserver-control-plane            ← front door, all requests go through here
kube-scheduler-control-plane            ← decides which Node runs each Pod
kube-controller-manager-control-plane   ← runs reconciliation loops
kube-proxy-xxxxx                        ← network rules (iptables/ipvs)
coredns-xxxxx (x2)                      ← DNS for service discovery
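-n is short for --namespace. K8s' own system components run in the kube-system namespace. kubelet is missing from this list because it runs as a process on each Node, not as a Pod.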
Event Records: Watching Components Coordinate
kubectl describe pod <name>
The Events section records the full lifecycle of a Pod, including which component handled each step:
Reason      From                Message
Scheduled   default-scheduler   assigned to devops-lab-control-plane
Pulled      kubelet             image already present
Created     kubelet             container created
Started     kubelet             container started
Note: by default, Events are kept for only 1 hour (the API server's --event-ttl setting). For a Pod created longer ago, the Events section will be empty; only recent activity shows up.
Cluster-Wide Activity Log
kubectl get events --sort-by='.lastTimestamp'
Without a Pod name, this lists events for every resource in the current namespace (add -A to include all namespaces). Among them you can see the ReplicaSet's Pod-creation records:
Normal SuccessfulCreate replicaset/go-api-668dcc5dd Created pod: go-api-668dcc5dd-h5ttf
This shows that it is the ReplicaSet that creates Pods, not the Deployment directly. The Deployment's own events (ScalingReplicaSet) only scale ReplicaSets up or down.
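The ReplicaSet's job is a reconciliation loop: compare the desired count with the Pods that exist and close the gap. A toy Go version might look like this. The type, its fields, and the numbered Pod names are hypothetical; the real controller also watches for changes, ranks Pods before deleting them, and handles failures.

```go
package main

import "fmt"

// replicaSet is a hypothetical, stripped-down model of a ReplicaSet:
// a desired count plus the names of the Pods it currently owns.
type replicaSet struct {
	name    string
	desired int
	pods    []string
	created int // counter for generating example Pod names
}

// reconcileOnce runs one pass of the loop: compare desired vs actual and
// create or delete Pods until they match. It returns the actions taken.
// Real Pods get a random 5-character suffix and the real controller ranks
// Pods before deleting; this sketch numbers Pods and removes the newest.
func (rs *replicaSet) reconcileOnce() []string {
	var actions []string
	for len(rs.pods) < rs.desired {
		rs.created++
		name := fmt.Sprintf("%s-%d", rs.name, rs.created)
		rs.pods = append(rs.pods, name)
		actions = append(actions, "create "+name)
	}
	for len(rs.pods) > rs.desired {
		last := rs.pods[len(rs.pods)-1]
		rs.pods = rs.pods[:len(rs.pods)-1]
		actions = append(actions, "delete "+last)
	}
	return actions
}

func main() {
	rs := &replicaSet{name: "go-api-668dcc5dd", desired: 3}
	fmt.Println(rs.reconcileOnce()) // creates 3 Pods

	rs.pods = rs.pods[1:]           // simulate `kubectl delete pod` on the first Pod
	fmt.Println(rs.reconcileOnce()) // creates 1 replacement with a new name

	rs.desired = 0                  // e.g. a rollout scales this ReplicaSet down
	fmt.Println(rs.reconcileOnce()) // deletes the remaining 3 Pods
}
```

The same loop explains both earlier observations: a deleted Pod comes back under a new name, and an old ReplicaSet scaled to 0 simply owns no Pods.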
Automatic Replica Scaling
Why You Need Autoscaling
A Deployment's replica count is fixed: set it to 3 and it stays at 3 until someone changes it. Real traffic fluctuates, and a HorizontalPodAutoscaler (HPA) can adjust the count for you:
without HPA:
  replicas: 3  → always 3 Pods, even at 3am with zero traffic

with HPA:
  traffic high → scale up to 7 Pods
  traffic low  → scale down to 2 Pods
Prerequisite: Resource Usage Collector
Autoscaling needs to know each Pod’s CPU usage. The metrics-server collects that data:
HPA: "what is the CPU usage?"
  ↓
metrics-server: "let me ask the kubelet on each Node"
  ↓
kubelet: "Pod A uses 30m CPU, Pod B uses 45m CPU"
  ↓
HPA: "over the threshold, scale up"
Without metrics-server, the HPA has no usage data and cannot scale; kubectl get hpa shows its target as <unknown>.
Installing the Usage Collector (Local Cluster)
# install
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# kind uses self-signed certs, so skip kubelet TLS verification
kubectl -n kube-system patch deployment metrics-server \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# wait until it is ready
kubectl -n kube-system rollout status deployment/metrics-server
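--kubelet-insecure-tls is only needed for kind, whose kubelets use self-signed certificates. Production clusters (GKE/EKS) have properly signed certificates and should not use it, because it turns off verification of the kubelet's certificate.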
What In-Place Patching Does
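kubectl patch modifies a K8s resource in place, without rewriting the entire YAML. With --type='json', the patch is a list of JSON Patch (RFC 6902) operations; here it is a single "add" whose path ends in /-, which appends the value to the end of an array.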
The path maps to the YAML structure:
spec:                          # /spec
  template:                    # /spec/template
    spec:                      # /spec/template/spec
      containers:              # /spec/template/spec/containers
      - name: metrics-server   # /spec/template/spec/containers/0
        args:                  # /spec/template/spec/containers/0/args
        - --cert-dir=/tmp
        - --kubelet-insecure-tls   # ← "/args/-" means append here
How do you figure out the path? Inspect the structure first with -o yaml:
kubectl -n kube-system get deployment metrics-server -o yaml
Verifying the Usage Collector
kubectl top pods
If you see CPU and memory numbers, metrics-server is working. Right after installation it can take a minute before the first numbers appear.
Autoscaler Definition File
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: go-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: go-api             # which Deployment to scale
  minReplicas: 2             # minimum Pods
  maxReplicas: 10            # maximum Pods
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # scale up when avg CPU > 50%
The HPA does not manage Pods directly: it changes the Deployment's replica count, and the Deployment adjusts the number of Pods.
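averageUtilization: 50 is a percentage of the CPU request (resources.requests.cpu) set in the Deployment's Pod template, here 50m, so 50% = 25m. When the average CPU across the Pods exceeds 25m, the HPA scales up. If no CPU request is set, the HPA cannot compute utilization at all.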
Troubleshooting Workflow
kubectl get pods → what's the STATUS?
  ↓
the status tells you the next step:
  ImagePullBackOff          → image name is wrong, or you forgot kind load
  Pending                   → kubectl describe pod <name> → Events: usually a resource or scheduling issue
  CrashLoopBackOff          → kubectl logs <name> --previous → application error in the logs
  Running but not working   → kubectl logs <name> → check the application logic
  don't know where to start → kubectl get events --sort-by='.lastTimestamp' → see everything that happened recently
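The same flow can be encoded as a lookup from the STATUS column to the next step. This trivial Go sketch only makes the decision table explicit; `nextStep` is a hypothetical helper, not part of kubectl, and it assumes "Running" means "running but not working".

```go
package main

import "fmt"

// nextStep maps the STATUS column of `kubectl get pods` to the next thing
// to try, following the troubleshooting flow above. Anything unrecognized
// falls back to looking at recent events.
func nextStep(status string) string {
	switch status {
	case "ImagePullBackOff":
		return "check the image name, or whether you forgot kind load"
	case "Pending":
		return "kubectl describe pod <name>"
	case "CrashLoopBackOff":
		return "kubectl logs <name> --previous"
	case "Running":
		return "kubectl logs <name>"
	default:
		return "kubectl get events --sort-by='.lastTimestamp'"
	}
}

func main() {
	for _, s := range []string{"Pending", "CrashLoopBackOff", "SomethingElse"} {
		fmt.Printf("%-16s -> %s\n", s, nextStep(s))
	}
}
```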
Frequently Used Diagnostic Commands
Situation | Command
Pod status looks wrong | kubectl get pods
Why won't it start? | kubectl describe pod <name>
Application error | kubectl logs <name>
Logs from before a crash | kubectl logs <name> --previous
Live observation | kubectl get pods --watch
Recent activity (add -A for all namespaces) | kubectl get events --sort-by='.lastTimestamp'
Shell into a container | kubectl exec -it <name> -- sh
See which Pods a Service routes to | kubectl describe svc <name>
View full YAML | kubectl get <resource> <name> -o yaml
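kubectl exec lets you drop inside a container to look around. But if the image is built from scratch and has no shell, you cannot get in this way, which is also one reason scratch images are more secure: an attacker who gets that far has no shell either. For such images, kubectl debug can attach a temporary debugging container instead.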
Caveats for Reading Container Output
# wrong: "go-api" is the Deployment name, not a Pod name
kubectl logs go-api

# correct: use the actual Pod name
kubectl logs go-api-65577fc4f9-k9f9p

# shortcut: kubectl picks one Pod from the Deployment automatically
kubectl logs deployment/go-api

# follow mode (like tail -f)
kubectl logs deployment/go-api -f
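Why is go-api not a Pod name? Deployment-managed Pods are named <ReplicaSet name>-<random suffix>, and the ReplicaSet name is <Deployment name>-<pod-template-hash> (compare go-api-65577fc4f9 in the kubectl get rs output). Here is a Go sketch that splits a name along that pattern, assuming the Pod really comes from a Deployment; `splitPodName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPodName splits a Deployment-managed Pod name of the form
//   <deployment>-<pod-template-hash>-<random suffix>
// e.g. go-api-65577fc4f9-k9f9p. Deployment names may contain dashes
// themselves, so it splits from the right. Bare Pods and Pods owned by
// other controllers (e.g. StatefulSets) are named differently.
func splitPodName(pod string) (deployment, hash, suffix string, ok bool) {
	i := strings.LastIndex(pod, "-")
	if i < 0 {
		return "", "", "", false
	}
	rest := pod[:i]
	suffix = pod[i+1:]
	j := strings.LastIndex(rest, "-")
	if j < 0 {
		return "", "", "", false
	}
	deployment, hash = rest[:j], rest[j+1:]
	if deployment == "" || hash == "" || suffix == "" {
		return "", "", "", false
	}
	return deployment, hash, suffix, true
}

func main() {
	d, h, s, ok := splitPodName("go-api-65577fc4f9-k9f9p")
	fmt.Println(d, h, s, ok) // go-api 65577fc4f9 k9f9p true
}
```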
Common Questions
How Do Local Multi-Container Runs Relate to Container Orchestration?
Completely different tools, solving problems at different stages:
Terraform runs first: creates the EKS cluster + RDS + VPC
  ↓
K8s YAML runs next: deploys your app inside the cluster

Docker Compose: unrelated to both, only used on your laptop
Can Local Cluster Manifests Move Straight to Managed Clusters?
The YAML format carries over unchanged; the cluster itself does not. K8s is standardized: no matter what runs underneath (kind, GKE, EKS), the YAML you kubectl apply has the same format.
Things that need adjustment:
Item | kind | GKE/EKS (production)
Image source | kind load from the local machine | Container Registry (GCR/ECR)
Service type | ClusterIP + port-forward | LoadBalancer
Resource requests | Set casually | Tuned to actual load
Ingress | Not needed | Domain, HTTPS
Secrets | Hard-coded or unused | Secret Manager (least privilege)
Core Deployment, ReplicaSet, and autoscaling logic does not change.
Does the Container Runtime Use the Desktop Engine Internally?
K8s talks to the container runtime through the CRI (Container Runtime Interface). In kind, as in GKE and EKS, that runtime is containerd, not Docker.
your laptop
└── Docker daemon
    └── devops-lab-control-plane     ← this is a Docker container (kind)
        └── K8s cluster
            └── containerd           ← K8s uses this, not Docker
                ├── Pod 1
                ├── Pod 2
                └── Pod 3
docker stop devops-lab-control-plane stops the Docker container that is the kind node, which takes down the entire cluster, not an individual Pod. Everything inside the cluster is managed with kubectl.
In production (GKE/EKS) you will not touch the docker command at all.
Teardown Instructions
Removing Your Application (Keep the Environment Running)
Commands Worth Committing to Memory
kubectl get pods
kubectl get pods -o wide
kubectl describe pod <name>
kubectl logs <name>
kubectl logs <name> --previous
kubectl apply -f <file>
kubectl delete -f <file>
kubectl get events --sort-by='.lastTimestamp'
Concepts Worth Committing to Memory
Deployment → ReplicaSet → Pod → Container
desired state vs actual state → reconciliation loop
Pod is ephemeral → IP changes → need Service
Service finds Pods by label selector
Rolling update: new Pod ready → then kill old Pod
CrashLoopBackOff: exponential backoff (10s → 20s → 40s → ... → 5min max)
Don’t Memorize: Look It Up
kubectl explain deployment.spec        # YAML format reference
kubectl get deploy <name> -o yaml      # see the full YAML of any resource
kubectl api-resources                  # all resource types and their abbreviations