Isolating Log Components on Kubernetes Infra Nodes
This guide explains how to isolate logging-related infrastructure components on dedicated Kubernetes infra nodes using labels, taints, and node selectors.
Objectives
- Isolate resources: Prevent contention with business workloads.
- Enforce stability: Reduce evictions and scheduling conflicts.
- Simplify management: Centralize infra components with consistent scheduling rules.
Prerequisites
- kubectl is configured against the target cluster.
- Infra components are not bound to nodes via local-PV nodeAffinity, or you have accounted for those nodes (see below).
- Infra nodes have been planned in advance according to your infra node planning guide (labeled and tainted); a minimal sketch follows this list.
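If the infra nodes are not yet labeled and tainted, the following is a minimal sketch. The label key node-role.kubernetes.io/infra with an empty value matches the nodeSelector used throughout this guide, and the taint value true matches the scheduling error shown in Troubleshooting, but confirm both against your planning document:
# Label the node so nodeSelector-based pinning works (replace <node-name>)
kubectl label node <node-name> node-role.kubernetes.io/infra=""
# Taint the node so non-infra workloads are kept off it
kubectl taint node <node-name> node-role.kubernetes.io/infra=true:NoSchedule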
Check local PVs for nodeAffinity
If your components use local storage (for example TopoLVM or local PVs), check whether the PVs have spec.nodeAffinity. If they do, either:
- Add all nodes referenced by pv.spec.nodeAffinity to the infra node group, or
- Redeploy components using a storage class without node affinity (for example Ceph/RBD).
Example (Elasticsearch):
# 1) Get ES PVCs
kubectl get pvc -n cpaas-system | grep elastic
# 2) Inspect one PV
kubectl get pv elasticsearch-log-node-pv-192.168.135.243 -o yaml
If the PV shows:
spec:
  local:
    path: /cpaas/data/elasticsearch/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.135.243
Then Elasticsearch data is pinned to node 192.168.135.243. Ensure that node is part of the infra node group, or migrate storage.
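To survey all PVs at once instead of inspecting them one by one, a jsonpath query such as the following (a sketch; adjust to your PV naming) prints each PV name together with any hostnames it is pinned to:
# List every PV and the hostname values from its nodeAffinity (empty means no pinning)
kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeAffinity.required.nodeSelectorTerms[*].matchExpressions[*].values[*]}{"\n"}{end}'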
Add Kafka/ZooKeeper nodes into infra nodes
For historical reasons, ensure the Kafka and ZooKeeper nodes are also labeled and tainted as infra:
kubectl get nodes -l kafka=true
kubectl get nodes -l zk=true
# Add the listed nodes into infra nodes as above
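If the output lists nodes that are not yet in the infra node group, a hedged one-pass loop like the following labels and taints them all (same assumed label and taint keys as in Prerequisites):
# Label and taint every kafka/zk node as infra in one pass
for n in $(kubectl get nodes -l kafka=true -o jsonpath='{.items[*].metadata.name}') \
         $(kubectl get nodes -l zk=true -o jsonpath='{.items[*].metadata.name}'); do
  kubectl label node "$n" node-role.kubernetes.io/infra="" --overwrite
  kubectl taint node "$n" node-role.kubernetes.io/infra=true:NoSchedule --overwrite
done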
Move Logging Components to Infra Nodes
ACP logging components tolerate infra taints by default. Use nodeSelector to pin workloads onto infra nodes.
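To confirm that a component does tolerate the infra taint before pinning it (the lanaya Deployment is used here only because it appears later in this guide), print its tolerations:
# Print the tolerations on one logging workload; expect an entry matching the infra taint
kubectl get deployment lanaya -n cpaas-system -o jsonpath='{.spec.template.spec.tolerations}'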
Elasticsearch
# Data nodes
kubectl patch statefulset cpaas-elasticsearch -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
# Master nodes (if present)
kubectl patch statefulset cpaas-elasticsearch-master -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
# Verify
kubectl get pods -n cpaas-system -o wide | grep cpaas-elasticsearch
Kafka
kubectl patch statefulset cpaas-kafka -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
kubectl get pods -n cpaas-system -o wide | grep cpaas-kafka
ZooKeeper
kubectl patch statefulset cpaas-zookeeper -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
kubectl get pods -n cpaas-system -o wide | grep cpaas-zookeeper
ClickHouse
kubectl patch chi cpaas-clickhouse -n cpaas-system --type='json' -p='[
{"op":"add","path":"/spec/templates/podTemplates/0/spec/nodeSelector/node-role.kubernetes.io~1infra","value":""},
{"op":"add","path":"/spec/templates/podTemplates/1/spec/nodeSelector/node-role.kubernetes.io~1infra","value":""}
]'
kubectl get pods -n cpaas-system -o wide | grep clickhouse
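If the JSON patch above fails because the pod templates have no nodeSelector yet (an RFC 6902 add into a missing object is rejected), add the whole map instead; the template indexes 0 and 1 are an assumption, so check your CHI spec first:
kubectl patch chi cpaas-clickhouse -n cpaas-system --type='json' -p='[
{"op":"add","path":"/spec/templates/podTemplates/0/spec/nodeSelector","value":{"node-role.kubernetes.io/infra":""}},
{"op":"add","path":"/spec/templates/podTemplates/1/spec/nodeSelector","value":{"node-role.kubernetes.io/infra":""}}
]'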
lanaya
kubectl patch deployment lanaya -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
kubectl get pods -n cpaas-system -o wide | grep lanaya
razor
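razor may run as either a Deployment or a StatefulSet depending on the log storage backend; a quick way to check which kind exists in your cluster:
kubectl get deployment,statefulset -n cpaas-system | grep razor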
# If deployed as Deployment (Elasticsearch backend)
kubectl patch deployment razor -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
# If deployed as StatefulSet (ClickHouse backend)
kubectl patch statefulset razor -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
kubectl get pods -n cpaas-system -o wide | grep razor
Any other logging component
# Deployment
kubectl patch deployment <deployment-name> -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
# StatefulSet
kubectl patch statefulset <statefulset-name> -n cpaas-system \
--type='merge' \
-p='{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
kubectl get pods -n cpaas-system -o wide | grep <deployment-name>
kubectl get pods -n cpaas-system -o wide | grep <statefulset-name>
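If you are unsure which workloads still lack the infra nodeSelector, a sketch using jq (assuming jq is installed; the label key matches the one used above) lists them:
# List cpaas-system Deployments/StatefulSets whose pod template has no infra nodeSelector yet
kubectl get deployment,statefulset -n cpaas-system -o json \
  | jq -r '.items[] | select(.spec.template.spec.nodeSelector["node-role.kubernetes.io/infra"] == null) | "\(.kind)/\(.metadata.name)"'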
Evict non-infra workloads already on infra nodes
If some non-infra Pods are still running on infra nodes (they were scheduled there before the taint was applied), trigger a reschedule by updating those workloads (for example, by changing a Pod template annotation), or add node selectors/affinity rules that keep them off infra nodes.
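A minimal way to trigger that reschedule, assuming the infra taint is already in place so the re-created Pods cannot land back on infra nodes:
# Re-create the Pods of a non-infra workload; the scheduler then respects the infra taint
kubectl rollout restart deployment <non-infra-deployment> -n <namespace>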
Troubleshooting
Common issues and fixes:
| Issue | Diagnosis | Solution |
|---|---|---|
| Pods stuck in Pending | kubectl describe pod <pod> \| grep Events | Add tolerations or adjust selectors |
| Taint/toleration mismatch | kubectl describe node <node> \| grep Taints | Add matching tolerations to the workloads |
| Resource starvation | kubectl top nodes -l node-role.kubernetes.io/infra | Scale infra nodes or tune resource requests |
Example error:
Events:
Warning FailedScheduling 2m default-scheduler 0/3 nodes are available:
3 node(s) had untolerated taint {node-role.kubernetes.io/infra: true}
Fix: add matching tolerations to the workload.
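A sketch of such a toleration added via a merge patch; note that a JSON merge patch replaces the whole tolerations list, so include any tolerations the workload already has:
# Tolerate the infra taint regardless of its value or effect
kubectl patch deployment <name> -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/infra","operator":"Exists"}]}}}}'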