
Achieving Zero-Downtime Pod Updates

Source: Learnk8s, "Graceful shutdown and zero downtime deployments in Kubernetes"

If you run a long-lived service, such as one that holds long-lived TCP connections or an application that takes a long time to shut down, the Pod needs to be shut down gracefully.

Without changing the application code, there are two workarounds. Both treat the symptom rather than the cause:

  1. Delay the SIGTERM signal: configure a preStop hook that waits for a while before it returns
preStop_sample
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 10; done"] # (1)
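  (1) Ask nginx to quit gracefully, then check every 10 seconds until no nginx process is left. The kubelet sends SIGTERM to the main process only after the hook returns.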

  2. Delay the SIGKILL signal: set terminationGracePeriodSeconds (default 30s) to a longer value, as in the following example

terminationGracePeriodSeconds_sample
---
apiVersion: v1
kind: Pod
metadata:
  name: pods-termination-grace-period-seconds
spec:
  containers:
    - image: nginx
      name: pods-termination-grace-period-seconds
  terminationGracePeriodSeconds: 3600  # (1)
  (1) Maximum time, in seconds, that the kubelet waits after Pod termination begins (preStop hook included) before it sends KILL to any process that is still running.

How to choose?

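If the application does not need to handle state or sync data on shutdown, configuring preStop alone is enough. If the application is known to take a long time (>30s) to shut down, lengthen both the preStop wait and terminationGracePeriodSeconds, so that nothing strange happens while the Pod shuts down.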

The real fix: when the application receives SIGTERM, it should know how to shut itself down, instead of leaving Kubernetes to kill it forcibly.

Case Study: Nginx Ingress Controller

A three-pronged approach: preStop hook + SIGTERM + terminationGracePeriodSeconds

For the preStop handling, see this excerpt from ingress-nginx/deploy/static/provider/cloud/deploy.yaml#L456-L460. The hook calls the /wait-shutdown command, which takes care of the SIGTERM signal internally.

ingress-nginx/deploy/static/provider/cloud/deploy.yaml
        lifecycle:
          preStop:
            exec:
              command:
              - /wait-shutdown

For terminationGracePeriodSeconds, see this excerpt from ingress-nginx/deploy/static/provider/cloud/deploy.yaml#L512

ingress-nginx/deploy/static/provider/cloud/deploy.yaml
        terminationGracePeriodSeconds: 3600

For the SIGTERM handling, see this excerpt from ingress-nginx/cmd/waitshutdown/main.go#L28-L42

ingress-nginx/cmd/waitshutdown/main.go
func main() {
    err := exec.Command("bash", "-c", "pkill -SIGTERM -f nginx-ingress-controller").Run()
    if err != nil {
        klog.ErrorS(err, "terminating ingress controller")
        os.Exit(1)
    }

    // wait for the NGINX process to terminate
    timer := time.NewTicker(time.Second * 1)
    for range timer.C {
        if !nginx.IsRunning() {
            timer.Stop()
            break
        }
    }
}

References