kubectl get pod go-api-xxx -o wide
# IP: 10.244.0.5

kubectl delete pod go-api-xxx
kubectl get pod go-api-yyy -o wide
# IP: 10.244.0.8 — changed
There are now 3 Pods, each with a different IP, and any of them can change at any time. How are other services supposed to connect?
without Service:
Pod 1: 10.244.0.5 ┐
Pod 2: 10.244.0.6 ├── which IP? what if it changes?
Pod 3: 10.244.0.7 ┘
with Service:
Service: go-api (stable DNS + stable IP)
├── routes to Pod 1
├── routes to Pod 2
└── routes to Pod 3
How a Service finds Pods
Via label selectors: the Service matches Pods by label:
# service.yaml
spec:
  selector:
    app: go-api        # find all Pods with this label

# deployment.yaml (Pod template)
template:
  metadata:
    labels:
      app: go-api      # matches the selector above
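A complete Service manifest consistent with those labels might look like the sketch below; the port numbers are assumptions for illustration, not values from these notes:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: go-api
spec:
  selector:
    app: go-api        # must match the Pod template labels above
  ports:
  - port: 80           # port other services call (assumed)
    targetPort: 8080   # port the container listens on (assumed)
```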
old-pod-1 Running ← 3 old Pods running
new-pod-1 Pending → ContainerCreating → Running ← new Pod 1 ready
old-pod-1 Terminating ← THEN old Pod 1 killed
new-pod-2 Pending → Running ← new Pod 2 ready
old-pod-2 Terminating ← old Pod 2 killed
new-pod-3 Pending → Running ← new Pod 3 ready
old-pod-3 Terminating ← old Pod 3 killed
Key point: confirm the new Pod is Running first, then kill the old one. That is zero downtime.
This is controlled by the Deployment's strategy:
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1        # at most 1 extra Pod during update
    maxUnavailable: 0  # no downtime allowed
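The bounds these two settings imply can be checked with simple arithmetic (replicas=3 assumed, matching the 3-Pod example in these notes):

```shell
# Pod-count bounds during a rolling update:
#   max pods  = replicas + maxSurge
#   min ready = replicas - maxUnavailable
replicas=3
max_surge=1
max_unavailable=0
max_pods=$(( replicas + max_surge ))
min_ready=$(( replicas - max_unavailable ))
echo "at most ${max_pods} Pods, at least ${min_ready} ready"
```

With maxUnavailable: 0, capacity never drops below the desired replica count, which is exactly the zero-downtime guarantee described above.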
Two ReplicaSets exist at the same time
kubectl get rs
go-api-65577fc4f9 3 3 3 2m ← new ReplicaSet (0.0.3), 3 Pods
go-api-668dcc5dd 0 0 0 4d ← old ReplicaSet (0.0.2), scaled to 0
The old ReplicaSet is not deleted; it is scaled down to 0 Pods. K8s keeps it on purpose so that you can roll back.
Rollout History and Rollback
# see revision history
kubectl rollout history deployment/go-api
# REVISION 1 ← go-api:0.0.2
# REVISION 2 ← go-api:0.0.3

# rollback to previous version
kubectl rollout undo deployment/go-api

# check — old ReplicaSet scales back up
kubectl get rs
my-pod Running ← container starts
my-pod Error ← container crashed (exit code != 0)
my-pod Running 1 ← kubelet restarted it
my-pod Error ← crashed again
my-pod CrashLoopBackOff ← kubelet: "crashing too often, waiting..."
my-pod Running 2 (14s ago) ← restarted after 14s delay
my-pod Error ← crashed again
my-pod CrashLoopBackOff ← waiting even longer (27s)
What the three statuses mean

STATUS             meaning
Error              the container just died
CrashLoopBackOff   crashed too many times; kubelet is waiting, has not restarted it yet
Running            restart succeeded
What to do when you hit CrashLoopBackOff
# see the logs from the PREVIOUS crashed container
kubectl logs <pod-name> --previous
--previous is the key: it shows the logs from before the last crash, so you can find where the code died.
Observing the internals
Components in kube-system
kubectl get pods -n kube-system
etcd-control-plane ← database, stores all cluster state
kube-apiserver-control-plane ← front door, all requests go through here
kube-scheduler-control-plane ← decides which Node runs each Pod
kube-controller-manager-control-plane ← runs reconciliation loops
kube-proxy-xxxxx ← network rules (iptables/ipvs)
coredns-xxxxx (x2) ← DNS for service discovery
without HPA:
replicas: 3 → always 3 Pods, even at 3am with zero traffic
with HPA:
traffic high → scale up to 7 Pods
traffic low → scale down to 2 Pods
Prerequisite: metrics-server
The HPA needs to know each Pod's CPU usage; metrics-server is responsible for collecting that data:
HPA: "CPU usage is how much?"
↓
metrics-server: "let me ask kubelet on each Node"
↓
kubelet: "Pod A uses 30m CPU, Pod B uses 45m CPU"
↓
HPA: "over threshold, scale up"
Without metrics-server, the HPA is blind.
Installing metrics-server (kind environment)
# install
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# kind uses self-signed certs, need to skip TLS verification
kubectl -n kube-system patch deployment metrics-server \
  --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

# wait for ready
kubectl -n kube-system rollout status deployment/metrics-server
--kubelet-insecure-tls is only needed for kind. Production clusters (GKE/EKS) have real certificates and do not need it.
spec:                          # /spec
  template:                    # /spec/template
    spec:                      # /spec/template/spec
      containers:              # /spec/template/spec/containers
      - name: metrics-server   # /spec/template/spec/containers/0
        args:                  # /spec/template/spec/containers/0/args
        - --cert-dir=/tmp
        - --kubelet-insecure-tls  # ← /args/- means append here
How do you know the path? Inspect the structure first with -o yaml:
kubectl -n kube-system get deployment metrics-server -o yaml
Verifying metrics-server
kubectl top pods
If you see numbers, metrics-server is working.
HPA YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: go-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: go-api           # which Deployment to scale
  minReplicas: 2           # minimum Pods
  maxReplicas: 10          # maximum Pods
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # scale up when avg CPU > 50%
The HPA does not manage Pods directly: it modifies the Deployment's replicas count, and the Deployment then adjusts the Pods.
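The replicas number the HPA writes comes from the scaling formula in the Kubernetes HPA documentation: desired = ceil(currentReplicas × currentUtilization / targetUtilization). A quick sketch (the 80% sample utilization is an assumed value):

```shell
# HPA scaling formula: desired = ceil(current * currentUtil / targetUtil)
current_replicas=3
current_util=80   # assumed measured average CPU utilization (%)
target_util=50    # averageUtilization from the HPA spec
# integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "desired replicas: $desired"
```

ceil(3 × 80 / 50) = ceil(4.8) = 5, so the HPA would set replicas to 5, clamped to the minReplicas/maxReplicas range.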
The baseline for averageUtilization: 50 is the Deployment's resources.requests.cpu (50m), so 50% = 25m. If average Pod CPU exceeds 25m, scale-up begins.
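That threshold arithmetic can be checked directly, using the numbers from these notes (requests.cpu = 50m, target = 50%):

```shell
# utilization is measured against the Pod's CPU request, in millicores
requests_mcpu=50  # resources.requests.cpu: 50m
target_pct=50     # averageUtilization: 50
threshold=$(( requests_mcpu * target_pct / 100 ))
echo "scale-up threshold: ${threshold}m"
```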
kubectl get pods ← what's the STATUS?
        ↓
status tells you the next step:

ImagePullBackOff          → image name wrong or forgot kind load
Pending                   → kubectl describe pod <name> → Events: usually resource or scheduling issue
CrashLoopBackOff          → kubectl logs <name> --previous → application error in the logs
Running but not working   → kubectl logs <name> → check application logic
don't know where to start → kubectl get events --sort-by='.lastTimestamp' → see everything that happened recently
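The decision table above can be sketched as a tiny shell helper; this is a hypothetical convenience function for your own toolbox, not a kubectl feature:

```shell
# map a Pod STATUS to the next debugging step (hypothetical helper)
next_step() {
  case "$1" in
    ImagePullBackOff) echo "check image name / run kind load" ;;
    Pending)          echo "kubectl describe pod <name>" ;;
    CrashLoopBackOff) echo "kubectl logs <name> --previous" ;;
    Running)          echo "kubectl logs <name>" ;;
    *)                echo "kubectl get events --sort-by='.lastTimestamp'" ;;
  esac
}

next_step CrashLoopBackOff
```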
# wrong — "go-api" is Deployment name, not Pod name
kubectl logs go-api

# correct — use actual Pod name
kubectl logs go-api-65577fc4f9-k9f9p

# shortcut — pick a Pod from Deployment automatically
kubectl logs deployment/go-api

# follow mode (like tail -f)
kubectl logs deployment/go-api -f
Terraform runs first: creates EKS cluster + RDS + VPC
↓
K8s YAML runs next: deploys your app inside the cluster
↓
Docker Compose: unrelated, only used on your laptop
your laptop
├── Docker daemon
│ └── devops-lab-control-plane ← this is a Docker container (kind)
│ └── K8s cluster
│ └── containerd ← K8s uses this, not Docker
│ ├── Pod 1
│ ├── Pod 2
│ └── Pod 3
kubectl get pods
kubectl get pods -o wide
kubectl describe pod <name>
kubectl logs <name>
kubectl logs <name> --previous
kubectl apply -f <file>
kubectl delete -f <file>
kubectl get events --sort-by='.lastTimestamp'
Concepts you must remember
Deployment → ReplicaSet → Pod → Container
desired state vs actual state → reconciliation loop
Pod is ephemeral → IP changes → need Service
Service finds Pods by label selector
Rolling update: new Pod ready → then kill old Pod
CrashLoopBackOff: exponential backoff (10s → 20s → 40s → ... → 5min max)
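The backoff schedule in the last point (doubling from 10s, capped at 5 minutes) can be generated to check the sequence:

```shell
# CrashLoopBackOff restart delays: double each time, cap at 300s (5 min)
delay=10
schedule="10s"
for _ in 2 3 4 5 6; do
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
  schedule="$schedule ${delay}s"
done
echo "$schedule"
```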
No need to memorize these; just look them up:
kubectl explain deployment.spec    # YAML format reference
kubectl get deploy <name> -o yaml  # see full YAML of any resource
kubectl api-resources              # all resource types and abbreviations