Kubernetes from Beginner to Production: Building a Cluster from Scratch (2026 Edition)
Kubernetes (K8s) has become the de facto standard for container orchestration. But there is a wide gap between learning K8s and actually running it in production. This article walks through building a production-grade K8s cluster from scratch, covering installation, configuration, application deployment, monitoring, and troubleshooting.
1. Kubernetes Core Concepts at a Glance
| Concept | Description | Analogy |
|---|---|---|
| Pod | Smallest deployable unit; holds one or more containers | A process on a VM |
| Deployment | Declarative updates and replication for Pods | Auto-scaling group |
| Service | Stable network endpoint with load balancing | Reverse proxy |
| Ingress | Layer-7 HTTP routing; domain-based entry point | Nginx config |
| ConfigMap/Secret | Configuration and sensitive data | Config file / password vault |
| PV/PVC | Persistent storage | Disk / mount request |
| Namespace | Resource isolation | Project partition |
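To make the table concrete, here is what the smallest unit looks like as a manifest (a minimal, illustrative Pod; the name and image are arbitrary):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello          # illustrative name
  labels:
    app: hello         # labels are how Deployments and Services select Pods
spec:
  containers:
  - name: web
    image: nginx:alpine
    ports:
    - containerPort: 80
```

In practice you rarely create bare Pods: a Deployment owns a ReplicaSet, which owns Pods, and a Service finds them via their labels.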
2. Environment Prep: Planning a 3-Node Cluster
Note: with a single control-plane node this cluster is not truly highly available; a production HA setup runs three control-plane nodes behind a load balancer. The layout below keeps the walkthrough simple.

```bash
# Node plan
# +----------------+------------------+--------+
# | Role           | IP               | Specs  |
# +----------------+------------------+--------+
# | Control Plane  | 192.168.1.10     | 4C 8G  |
# | Worker-1       | 192.168.1.11     | 8C 16G |
# | Worker-2       | 192.168.1.12     | 8C 16G |
# +----------------+------------------+--------+

# Base configuration on all nodes
cat >> /etc/hosts << EOF
192.168.1.10 k8s-master
192.168.1.11 k8s-node1
192.168.1.12 k8s-node2
EOF

# Disable the firewall and swap
systemctl stop firewalld && systemctl disable firewalld
swapoff -a && sed -i '/swap/s/^/#/' /etc/fstab

# Load the kernel modules required by the bridge sysctls below
cat > /etc/modules-load.d/k8s.conf << EOF
overlay
br_netfilter
EOF
modprobe overlay && modprobe br_netfilter

# Kernel parameters
cat >> /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
EOF
sysctl --system
```
3. Container Runtime: containerd
Kubernetes 1.24 removed the dockershim, so Docker Engine is no longer supported as a runtime out of the box; containerd is the recommended replacement.
```bash
# Install containerd (all nodes)
cat > /etc/yum.repos.d/docker.repo << EOF
[docker-ce-stable]
name=Docker CE Stable
baseurl=https://download.docker.com/linux/centos/9/x86_64/stable
enabled=1
gpgcheck=1
gpgkey=https://download.docker.com/linux/centos/gpg
EOF
yum install containerd.io -y

# Generate the default config
containerd config default > /etc/containerd/config.toml

# Key setting: use the systemd cgroup driver (must match the kubelet)
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

# Configure registry mirrors. Caution: the default config may already contain
# empty registry tables, and duplicate TOML tables fail to parse -- check before
# appending. Newer containerd releases prefer per-registry hosts.toml files via
# the "config_path" setting; consult your version's docs. (k8s.gcr.io is
# retired; registry.k8s.io is the current registry.)
cat >> /etc/containerd/config.toml << 'EOF'
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    endpoint = ["https://docker.m.daocloud.io", "https://registry-1.docker.io"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
    endpoint = ["https://k8s.m.daocloud.io"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
    endpoint = ["https://k8s.m.daocloud.io", "https://registry.k8s.io"]
EOF
systemctl enable containerd && systemctl restart containerd
```
4. Installing Kubernetes (kubeadm)
4.1 Install kubeadm/kubelet/kubectl (all nodes)
```bash
# The legacy packages.cloud.google.com repos are frozen and never carried 1.30
# packages; use the community-owned pkgs.k8s.io repos (one repo per minor version)
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/repodata/repomd.xml.key
EOF

# Install a pinned version (pick a current stable release)
yum install -y kubelet-1.30.0 kubeadm-1.30.0 kubectl-1.30.0 --disableexcludes=kubernetes

# Lock the versions to prevent accidental upgrades
yum install -y yum-plugin-versionlock
yum versionlock kubelet kubeadm kubectl

systemctl enable kubelet
```
4.2 Initialize the cluster (control-plane node)
```bash
# Config file: kubeadm-config.yaml
cat > kubeadm-config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.1.10
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.30.0
controlPlaneEndpoint: 192.168.1.10:6443
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
  dnsDomain: cluster.local
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF

# Initialize
kubeadm init --config=kubeadm-config.yaml --upload-certs

# On success the join command is printed -- save it. It looks like:
# kubeadm join 192.168.1.10:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx

# Configure kubectl
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Verify
kubectl get nodes
kubectl get pods -n kube-system
```
4.3 Join the worker nodes
```bash
# On each worker, run the join command printed by kubeadm init
kubeadm join 192.168.1.10:6443 --token xxxxx --discovery-token-ca-cert-hash sha256:xxxxx

# If the token has expired, generate a fresh join command on the control plane
kubeadm token create --print-join-command

# Verify from the control plane
kubectl get nodes
# All nodes will appear, but they stay NotReady until the CNI plugin
# is installed in the next section
```
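The `sha256:` value in the join command is simply a hash over the DER encoding of the cluster CA's public key, so it can be recomputed at any time from `/etc/kubernetes/pki/ca.crt`. A sketch of the mechanics, using a throwaway self-signed certificate in place of the real CA file:

```shell
# Generate a throwaway certificate standing in for /etc/kubernetes/pki/ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/ca.key -out /tmp/ca.crt \
  -days 1 -subj "/CN=kubernetes" 2>/dev/null

# Recompute the value kubeadm prints as --discovery-token-ca-cert-hash:
# SHA-256 over the DER encoding of the certificate's public key
hash=$(openssl x509 -pubkey -noout -in /tmp/ca.crt \
  | openssl pkey -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | awk '{print $NF}')
echo "sha256:${hash}"
```

Against a real cluster, point `-in` at `/etc/kubernetes/pki/ca.crt` and the output matches the hash in the printed join command.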
5. CNI Network Plugin: Calico
```bash
# Install the Calico operator
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml

cat > custom-resources.yaml << EOF
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  registry: quay.io/
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16   # must match podSubnet from kubeadm-config.yaml
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
EOF
kubectl create -f custom-resources.yaml

# Verify
kubectl get pods -n calico-system -w
kubectl get nodes   # all nodes should now turn Ready
```
6. Deploying a Sample Application
6.1 The classic Nginx + MySQL stack
```bash
# Create the namespace
kubectl create namespace production

# MySQL deployment
cat > mysql-deploy.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.4
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2"
      volumes:
      - name: mysql-storage
        persistentVolumeClaim:
          claimName: mysql-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: production
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
EOF

# Create the Secret first
kubectl create secret generic mysql-secret \
  --from-literal=root-password='K8s@2026!' \
  -n production

# Deploy
kubectl apply -f mysql-deploy.yaml
```

Note: the PVC only binds if the cluster has a StorageClass with a dynamic provisioner. The steps so far do not install one, so set up e.g. a local-path or NFS provisioner first; otherwise the MySQL Pod stays Pending.
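The MySQL Deployment above has no health probes, so a wedged mysqld would keep receiving traffic. A hedged sketch of probes that could be added under the `mysql` container spec (the thresholds are assumptions, not tuned values):

```yaml
# Illustrative additions to the mysql container in mysql-deploy.yaml
livenessProbe:
  tcpSocket:
    port: 3306           # coarse check: is mysqld accepting connections?
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["sh", "-c", "mysqladmin ping -uroot -p\"$MYSQL_ROOT_PASSWORD\""]
  initialDelaySeconds: 10
  periodSeconds: 5
```

The readiness probe runs through `sh -c` so the `MYSQL_ROOT_PASSWORD` environment variable is expanded inside the container; a bare exec array would pass the literal string.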
6.2 Nginx web application
```bash
cat > nginx-app.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF
kubectl apply -f nginx-app.yaml

# Watch the Pods come up
kubectl get pods -n production -w
```
6.3 Ingress (domain-based access)
```bash
# Install the NGINX Ingress Controller
# Note: the "cloud" manifest creates a LoadBalancer Service; on bare metal use
# the baremetal manifest or MetalLB, or the Service stays Pending
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.10.0/deploy/static/provider/cloud/deploy.yaml

# Create the Ingress rule
cat > app-ingress.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.yourdomain.com
    secretName: app-tls
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80
EOF
kubectl apply -f app-ingress.yaml
```
7. Cluster Monitoring: Prometheus + Grafana
```bash
# Use kube-prometheus-stack (recommended)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the monitoring namespace
kubectl create namespace monitoring

# Install (assumes a StorageClass named "standard" exists in the cluster)
cat > prometheus-values.yaml << EOF
grafana:
  adminPassword: Grafana@2026!
  ingress:
    enabled: true
    hosts:
    - grafana.yourdomain.com
  persistence:
    enabled: true
    size: 10Gi
    storageClassName: standard
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
EOF
helm install prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml

# Verify
kubectl get pods -n monitoring
kubectl get svc -n monitoring

# Access Grafana
# Open http://grafana.yourdomain.com in a browser
# User: admin / Grafana@2026!
# The bundled Kubernetes cluster dashboards are ready to use
```
8. Day-to-Day Operations Commands
```bash
# Cluster status
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods --all-namespaces

# Resource usage (requires metrics-server; install it if kubectl top errors out)
kubectl top nodes
kubectl top pods --all-namespaces

# Debugging a Pod
kubectl logs -f pod-name -n namespace
kubectl exec -it pod-name -n namespace -- /bin/bash
kubectl describe pod pod-name -n namespace

# Port forwarding (for debugging)
kubectl port-forward svc/nginx -n production 8080:80

# Events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'

# Node maintenance
kubectl cordon node-name                      # mark unschedulable
kubectl drain node-name --ignore-daemonsets   # evict Pods
kubectl uncordon node-name                    # resume scheduling

# Certificate management (cert-manager)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.0/cert-manager.yaml

# Issue Let's Encrypt certificates automatically
cat > cluster-issuer.yaml << EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx
EOF
kubectl apply -f cluster-issuer.yaml
```
9. Troubleshooting Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| Node NotReady | CNI plugin not deployed | Check that Calico/Flannel is running |
| Pod Pending | Insufficient resources or unbound PVC | Check node resources and StorageClass config |
| Pod CrashLoopBackOff | Application fails on startup | Inspect logs with kubectl logs |
| ImagePullBackOff | Image pull failure | Check image name, registry auth, network |
| DNS resolution failures | CoreDNS unhealthy | kubectl rollout restart -n kube-system deploy/coredns |
| Expired certificates | kubeadm certs are valid for 1 year | kubeadm certs renew all, then restart the control-plane static Pods (apiserver, controller-manager, scheduler, etcd) |
10. Production Security Checklist
- RBAC: never use the default ServiceAccount; create a dedicated SA per application
- NetworkPolicy: deny ingress traffic by default, allow only what is needed
- Pod security: forbid privileged containers and root users (enforce via Pod Security Standards; PodSecurityPolicy was removed in 1.25)
- Resource limits: set ResourceQuota and LimitRange on every Namespace
- Image scanning: integrate Trivy to scan images for vulnerabilities automatically
- Audit logging: enable the Kubernetes audit log to record all API operations
- Secrets management: use External Secrets Operator backed by a cloud KMS
- Regular upgrades: stay within the support window (each minor version is supported for roughly 14 months)
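The NetworkPolicy and quota items above can be sketched in YAML (the namespace and the quota numbers are illustrative, not recommendations):

```yaml
# Default-deny ingress for the production namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}        # empty selector matches every Pod in the namespace
  policyTypes:
  - Ingress              # no ingress rules listed => all inbound traffic denied
---
# Cap aggregate resource consumption per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
```

With the deny-all policy in place, each allowed flow (e.g. nginx to mysql on 3306) then needs its own explicit NetworkPolicy.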
Kubernetes has a steep learning curve, but mastering it pays off enormously in operational efficiency. Start with a kubeadm-built cluster to learn how each component works, then move on to automated production deployments (with tools such as Rancher or Kubespray).
🔗 Related articles: Docker Container Security Guide | MySQL Replication and High Availability | Zabbix Monitoring Deployment