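Starting with Red Hat OpenShift 4, the vast majority of services on Red Hat CoreOS (RHCOS below) run as containers, but 18 services still run under systemd. The listing below shows them, including the ones that matter most for day-to-day operations.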
$ oc debug node/compute-0
Starting pod/compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.97.4
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl list-units --type=service --state=running
UNIT LOAD ACTIVE SUB DESCRIPTION
auditd.service loaded active running Security Auditing Service
chronyd.service loaded active running NTP client/server
crio.service loaded active running Open Container Initiative Daemon
dbus.service loaded active running D-Bus System Message Bus
getty@tty1.service loaded active running Getty on tty1
irqbalance.service loaded active running irqbalance daemon
kubelet.service loaded active running Kubernetes Kubelet
NetworkManager.service loaded active running Network Manager
polkit.service loaded active running Authorization Manager
rpc-statd.service loaded active running NFS status monitor for NFSv2/3 locking.
rpcbind.service loaded active running RPC Bind
sshd.service loaded active running OpenSSH server daemon
sssd.service loaded active running System Security Services Daemon
systemd-journald.service loaded active running Journal Service
systemd-logind.service loaded active running Login Service
systemd-udevd.service loaded active running udev Kernel Device Manager
vgauthd.service loaded active running VGAuth Service for open-vm-tools
vmtoolsd.service loaded active running Service for virtual machines hosted on VMware
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
18 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
# Example:
# oc adm node-logs -u <service> <node-ip>
# oc adm node-logs <node-ip> <log-name> --path=kube-apiserver/<log-name>

# Show sshd logs from specific node
$ oc adm node-logs -u sshd compute-0
# Show crio logs from all master nodes
$ oc adm node-logs -u crio --role master
# Show ALL logs from specific node
$ oc adm node-logs compute-0
Troubleshooting Application Pods
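Sometimes an application fails on startup and its container is terminated. Depending on the Pod's restart policy, OpenShift keeps trying to restart it. If this keeps happening, the deployment is considered failed, and OpenShift marks the Pod's status as CrashLoopBackOff.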
$ oc get pods
NAME               READY   STATUS      RESTARTS   AGE
welcome-1-deploy   0/1     Completed   0          1m
welcome-1-hh7h8    1/1     Running     0          0s
$ oc get dc
NAME      REVISION   DESIRED   CURRENT   TRIGGERED BY
welcome   1          1         1         config,image(welcome:latest)

# Debug a pod
$ oc debug pod/welcome-1-hh7h8
# See the pod that would be created to debug
$ oc debug pod/welcome-1-hh7h8 -o yaml
# Debug a currently running deployment by creating a new pod
$ oc debug dc/welcome
$ oc debug dc/welcome --as-root
$ oc debug dc/welcome --as-user=10000
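To make the CrashLoopBackOff status more concrete, here is a minimal sketch of the restart back-off schedule. It assumes the upstream Kubernetes kubelet defaults (a 10-second initial delay that doubles after each restart and is capped at 5 minutes); the function name `backoff_delays` is hypothetical and only used for illustration.

```shell
# Sketch only: prints the wait (in seconds) before each of the first N restarts,
# assuming the upstream kubelet defaults of 10s initial delay, doubling, 300s cap.
backoff_delays() {
  restarts=$1
  delay=10   # assumed initial back-off in seconds
  cap=300    # assumed maximum back-off in seconds (5 minutes)
  out=""
  i=0
  while [ "$i" -lt "$restarts" ]; do
    out="${out:+$out }$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
  echo "$out"
}

backoff_delays 7   # prints: 10 20 40 80 160 300 300
```

Under these assumptions, a Pod that keeps crashing is retried less and less often, which is why a Pod in CrashLoopBackOff can appear idle for several minutes between attempts.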
# Every new OpenShift Container Platform installation has a "dns.operator" named "default".
$ oc describe dns.operator/default
Name:         default
...
API Version:  operator.openshift.io/v1
Kind:         DNS
...
Spec:
Status:
  Cluster Domain:  cluster.local
  Cluster IP:      172.30.0.10
...
CoreDNS mainly handles name resolution for services inside OpenShift, covering the following four kinds of names:
cluster.local
<project>.cluster.local
<service>.<project>.svc.cluster.local
<pod>.<project>.pod.cluster.local
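As a small illustration of how these names are composed, the sketch below builds the fully qualified name of a Service. It assumes the standard Kubernetes DNS schema, in which Service records carry an `svc` label between the project and the cluster domain; the function name `svc_fqdn` and the project name `demo` are hypothetical.

```shell
# Sketch only: compose a Service FQDN as <service>.<project>.svc.<cluster-domain>.
# The cluster domain defaults to cluster.local, matching the ClusterDomain reported
# by the DNS operator.
svc_fqdn() {
  service=$1
  project=$2
  domain=${3:-cluster.local}
  echo "$service.$project.svc.$domain"
}

svc_fqdn welcome demo   # prints: welcome.demo.svc.cluster.local
```

From inside a Pod, a name built this way could then be checked with whatever resolver tool the image provides, for example `getent hosts`.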
Understanding the Current Network Configuration - Cluster Network Operator
In general, building an OpenShift cluster requires three IP ranges, and none of them may overlap. They are:
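Node Subnet: IPs for the physical hosts
Service Subnet: IPs used by Kubernetes Services; the corresponding name in the configuration is serviceNetwork
Pod Subnet: IPs used by Pods; the corresponding name in the configuration is clusterNetwork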
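From the cluster network configuration (the clusterNetwork, networkType, and serviceNetwork fields), we can learn the following:
1. clusterNetwork is the range of IPs that Pods can actually use. Here it is 10.128.0.0/14, which gives 262142 Pod IPs.
2. networkType is the name of the CNI plug-in. Here it is the native OpenShift SDN.
3. serviceNetwork is the range of IPs that Kubernetes Services can use. Here it is 172.30.0.0/16, which allows up to 65534 Service IPs.
4. hostPrefix is harder to grasp. It determines how many Pod IPs each node receives. Here it is /23, so each node gets 510 Pod IPs; with /22, each node would get 1022. Some people ask whether that is enough. According to the OpenShift 4.3 "Recommended host practices" documentation, the per-node limit is kubeletConfig.maxPods: 250, with 250 as the maximum, so a node never hits the ceiling.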
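The numbers in this list can be reproduced with a little shell arithmetic. This is a minimal sketch that follows the article's convention of subtracting the network and broadcast addresses from each range; the function names `usable_ips` and `node_subnets` are hypothetical.

```shell
# Sketch only: IPv4 prefix arithmetic for the ranges discussed above.

# Usable addresses in a /prefix range: 2^(32 - prefix) minus network and broadcast.
usable_ips() {
  echo $(( (1 << (32 - $1)) - 2 ))
}

# Number of per-node subnets of size /host_prefix that fit in a /cluster_prefix range.
node_subnets() {
  cluster_prefix=$1
  host_prefix=$2
  echo $(( 1 << (host_prefix - cluster_prefix) ))
}

echo "clusterNetwork /14 usable IPs:      $(usable_ips 14)"
echo "serviceNetwork /16 usable IPs:      $(usable_ips 16)"
echo "hostPrefix /23 Pod IPs per node:    $(usable_ips 23)"
echo "hostPrefix /22 Pod IPs per node:    $(usable_ips 22)"
echo "node subnets (/14 split into /23):  $(node_subnets 14 23)"
```

The last line shows the trade-off behind hostPrefix: a /14 range split into /23 blocks yields 512 node subnets, while /22 blocks would give each node more Pod IPs but leave room for only 256 nodes.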