# JPD Cluster K3s Automated Deployment Guide

This guide walks you through deploying K3s on the new JPD cluster and setting up GitOps automation.

## Cluster Information

### Node Configuration

| Role | Hostname | Public IP | Private IP | Domain |
|------|----------|-----------|------------|--------|
| Master | k3s-master-01 | 149.13.91.216 | 10.198.0.112 | `*.jpd1.net3w.com` |
| Worker1 | k3s-worker-01 | 149.13.91.64 | 10.198.0.175 | `*.jpd2.net3w.com` |
| Worker2 | k3s-worker-02 | 149.13.91.59 | 10.198.0.111 | `*.jpd3.net3w.com` |

### Service Domains

- **Primary domain**: `*.jpd.net3w.com`
- **Gitea**: git.jpd.net3w.com
- **ArgoCD**: argocd.jpd.net3w.com
- **Test applications**: ng.jpd.net3w.com, test.jpd.net3w.com, demo.jpd.net3w.com

---

## Pre-deployment Preparation

### 1. Configure DNS Records

Add the following DNS records in your domain provider's console:

```
# Wildcard records (recommended)
*.jpd.net3w.com       A    149.13.91.216
*.jpd1.net3w.com      A    149.13.91.216
*.jpd2.net3w.com      A    149.13.91.64
*.jpd3.net3w.com      A    149.13.91.59

# Or configure each service domain individually
git.jpd.net3w.com     A    149.13.91.216
argocd.jpd.net3w.com  A    149.13.91.216
ng.jpd.net3w.com      A    149.13.91.216
test.jpd.net3w.com    A    149.13.91.216
demo.jpd.net3w.com    A    149.13.91.216
```
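Once the records are saved, it can help to confirm that each name resolves to the expected node before deploying. The following is a minimal sketch, assuming `dig` is installed and that each name has a plain A record; `resolve_a` and `check_dns` are illustrative helper names, not part of the project scripts.

```bash
# Print the first A record returned for a hostname (assumes `dig` is available).
resolve_a() {
    dig +short A "$1" | head -n 1
}

# check_dns HOST EXPECTED_IP
# Prints one status line and returns non-zero when the record does not match.
check_dns() {
    actual=$(resolve_a "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 -> $actual"
    else
        echo "FAIL $1 -> ${actual:-no answer} (expected $2)"
        return 1
    fi
}

# Example (run manually once the records are in place):
#   check_dns git.jpd.net3w.com    149.13.91.216
#   check_dns argocd.jpd.net3w.com 149.13.91.216
#   check_dns ng.jpd.net3w.com     149.13.91.216
```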
### 2. Verify Server Connectivity

```bash
# Test the SSH connection to each node
ssh fei@149.13.91.216  # Master
ssh fei@149.13.91.64   # Worker1
ssh fei@149.13.91.59   # Worker2

# After each successful login, exit the session
exit
```

### 3. Check Server Configuration

```bash
# Check the kernel, memory, and disk space on every node
ssh fei@149.13.91.216 "uname -a && free -h && df -h"
ssh fei@149.13.91.64 "uname -a && free -h && df -h"
ssh fei@149.13.91.59 "uname -a && free -h && df -h"
```
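The per-node commands above can also be run in a loop that fails fast instead of prompting for a password. This is a sketch only: `NODES`, `check_node`, and `check_all_nodes` are illustrative names, and it assumes key-based SSH access for the `fei` user is already set up.

```bash
# Node IPs taken from the cluster table in this guide.
NODES="149.13.91.216 149.13.91.64 149.13.91.59"

# check_node IP: run the checks over SSH without any interactive prompt.
check_node() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "fei@$1" \
        "uname -a && free -h && df -h"
}

# check_all_nodes: print each node's output; return non-zero if any node failed.
check_all_nodes() {
    failed=0
    for ip in $NODES; do
        echo "== $ip"
        check_node "$ip" || { echo "FAIL $ip"; failed=1; }
    done
    return "$failed"
}

# Usage: check_all_nodes
```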
---

## Quick Deployment (Recommended)

### Option 1: One-click Deployment Script

```bash
# Enter the project directory
cd /home/fei/opk3s/k3s自动化部署

# Use the JPD cluster configuration
cp config/jpd-cluster-vars.yml config/cluster-vars.yml

# Run the one-click deployment script
./scripts/deploy-all.sh

# The script automatically:
# 1. Generates the Ansible inventory
# 2. Deploys the K3s cluster
# 3. Configures kubectl
# 4. Deploys Gitea
# 5. Deploys ArgoCD
# 6. Configures HTTPS
# 7. Deploys the test applications
```

### Option 2: Step-by-step Deployment

For finer-grained control, run the steps individually:

```bash
cd /home/fei/opk3s/k3s自动化部署

# 1. Use the JPD cluster configuration
cp config/jpd-cluster-vars.yml config/cluster-vars.yml

# 2. Generate the Ansible inventory
python3 scripts/generate-inventory.py

# 3. Deploy the K3s cluster
cd k3s-ansible
ansible-playbook playbooks/site.yml -i inventory.yml

# 4. Configure kubectl (on the local machine)
cd ..
mkdir -p ~/.kube
scp fei@149.13.91.216:/etc/rancher/k3s/k3s.yaml ~/.kube/config-jpd
sed -i 's/127\.0\.0\.1/149.13.91.216/g' ~/.kube/config-jpd
export KUBECONFIG=~/.kube/config-jpd

# 5. Verify the cluster
kubectl get nodes -o wide

# 6. Deploy Gitea
./scripts/deploy-gitea.sh

# 7. Deploy ArgoCD
./scripts/deploy-argocd.sh

# 8. Configure HTTPS
./scripts/deploy-https.sh

# 9. Deploy the test applications
./scripts/deploy-test-app.sh
./scripts/deploy-nginx-app.sh
```

---

## Deployment Steps in Detail

### Step 1: Prepare the Configuration File

```bash
cd /home/fei/opk3s/k3s自动化部署

# Back up the existing configuration (if needed)
cp config/cluster-vars.yml config/cluster-vars.yml.jpc.bak

# Use the JPD cluster configuration
cp config/jpd-cluster-vars.yml config/cluster-vars.yml

# Review the configuration
cat config/cluster-vars.yml
```

### Step 2: Generate the Ansible Inventory

```bash
# Generate the inventory file
python3 scripts/generate-inventory.py

# Inspect the generated inventory
cat k3s-ansible/inventory.yml
```

### Step 3: Deploy the K3s Cluster

```bash
cd k3s-ansible

# Deploy the cluster
ansible-playbook playbooks/site.yml -i inventory.yml

# Deployment takes roughly 5-10 minutes.
# On completion you should see output similar to:
# PLAY RECAP *********************************************************************
# k3s-master-01 : ok=XX changed=XX unreachable=0 failed=0
# k3s-worker-01 : ok=XX changed=XX unreachable=0 failed=0
# k3s-worker-02 : ok=XX changed=XX unreachable=0 failed=0
```
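Rather than reading the recap by eye, the same check can be scripted. The sketch below assumes the playbook output was saved to a file (`deploy.log` is a hypothetical name); `recap_ok` is an illustrative helper that succeeds only when every host line in the `PLAY RECAP` reports `unreachable=0` and `failed=0`.

```bash
# recap_ok: read ansible-playbook output on stdin and check the PLAY RECAP.
# Returns 0 only if at least one host is listed and none is unreachable or failed.
recap_ok() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && /unreachable=/ {
            hosts++
            if ($0 !~ /unreachable=0( |$)/ || $0 !~ /failed=0( |$)/) bad++
        }
        END {
            if (hosts > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Usage (from the k3s-ansible directory):
#   ansible-playbook playbooks/site.yml -i inventory.yml | tee deploy.log
#   recap_ok < deploy.log && echo "all hosts succeeded"
```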
### Step 4: Configure kubectl

```bash
cd /home/fei/opk3s/k3s自动化部署

# Create the kubeconfig directory
mkdir -p ~/.kube

# Copy the kubeconfig from the master node
scp fei@149.13.91.216:/etc/rancher/k3s/k3s.yaml ~/.kube/config-jpd

# Point the server address at the master's public IP
sed -i 's/127\.0\.0\.1/149.13.91.216/g' ~/.kube/config-jpd

# Set the KUBECONFIG environment variable for this shell
export KUBECONFIG=~/.kube/config-jpd

# Or set it permanently
echo "export KUBECONFIG=~/.kube/config-jpd" >> ~/.bashrc
source ~/.bashrc

# Verify the connection
kubectl get nodes -o wide
```

**Expected output** (abridged; `-o wide` prints additional columns):
```
NAME            STATUS   ROLES                  AGE   VERSION
k3s-master-01   Ready    control-plane,master   5m    v1.28.5+k3s1
k3s-worker-01   Ready    <none>                 4m    v1.28.5+k3s1
k3s-worker-02   Ready    <none>                 4m    v1.28.5+k3s1
```
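If later steps are scripted, it may be useful to wait until every node reports `Ready` before continuing. This is a rough sketch: `not_ready_nodes` and `wait_for_nodes` are illustrative names, and the 10-second polling interval is an arbitrary choice.

```bash
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# wait_for_nodes [TRIES]: poll every 10 seconds until all nodes are Ready.
# Returns 1 if the nodes are still not Ready after TRIES attempts (default 30).
wait_for_nodes() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        if nodes=$(kubectl get nodes --no-headers 2>/dev/null) && [ -n "$nodes" ]; then
            pending=$(printf '%s\n' "$nodes" | not_ready_nodes)
            [ -z "$pending" ] && return 0
            echo "waiting for:" $pending
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}

# Usage: wait_for_nodes && echo "cluster is ready"
```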
### Step 5: Deploy Gitea

```bash
# Run the Gitea deployment script
./scripts/deploy-gitea.sh

# Wait for the Gitea pods to become ready (about 3-5 minutes)
watch kubectl get pods -n gitea

# Once every pod is Running, press Ctrl+C to exit

# Get the Gitea access address
GITEA_PORT=$(kubectl get svc gitea-http -n gitea -o jsonpath='{.spec.ports[0].nodePort}')
echo "Gitea URL (NodePort): http://149.13.91.216:$GITEA_PORT"
echo "Gitea URL (domain): http://git.jpd.net3w.com"
```
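As a non-interactive alternative to `watch`, `kubectl wait` can block until the pods are ready. The wrapper below is a sketch with an illustrative name (`wait_pods_ready`); note that `--all` also selects pods created by completed Jobs, which never become Ready, so a label selector may be needed if the namespace contains any.

```bash
# wait_pods_ready NAMESPACE [TIMEOUT]
# Block until every pod in NAMESPACE has condition Ready (default timeout 300s).
wait_pods_ready() {
    kubectl wait --for=condition=Ready pods --all -n "$1" --timeout="${2:-300s}"
}

# Usage:
#   wait_pods_ready gitea
#   wait_pods_ready argocd 180s
```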
### Step 6: Deploy ArgoCD

```bash
# Run the ArgoCD deployment script
./scripts/deploy-argocd.sh

# Wait for the ArgoCD pods to become ready (about 2-3 minutes)
watch kubectl get pods -n argocd

# Get the ArgoCD admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

# Access ArgoCD
echo "ArgoCD URL: https://argocd.jpd.net3w.com"
echo "Username: admin"
echo "Password: (the password printed above)"
```

### Step 7: Configure HTTPS

```bash
# Deploy cert-manager and configure HTTPS
./scripts/deploy-https.sh

# Wait for the certificates to be issued (about 1-2 minutes)
watch kubectl get certificate --all-namespaces

# Once every certificate shows READY=True, press Ctrl+C to exit
```
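To check certificate status from a script instead of watching the table, the READY column can be filtered. This sketch assumes the default cert-manager column layout with `--all-namespaces` (NAMESPACE, NAME, READY, SECRET, AGE); `pending_certs` is an illustrative helper name.

```bash
# pending_certs: read `kubectl get certificate --all-namespaces --no-headers`
# output on stdin and print namespace/name for each certificate not yet ready.
pending_certs() {
    awk '$3 != "True" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get certificate --all-namespaces --no-headers | pending_certs
# An empty result means every listed certificate reports READY=True.
```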
### Step 8: Deploy the Test Application

```bash
# Deploy the nginx test application
./scripts/deploy-nginx-app.sh

# Check that the application pods are ready
kubectl get pods -l app=nginx-test -n default

# Test access
curl http://ng.jpd.net3w.com
curl https://ng.jpd.net3w.com
```
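For a quick pass/fail result instead of the full response body, only the HTTP status code needs to be compared. The helpers below are a sketch with illustrative names (`http_status`, `check_url`); the expected code defaults to 200, but an Ingress that redirects HTTP to HTTPS may return a 3xx code instead, so it can be overridden.

```bash
# http_status URL: print only the HTTP status code (curl prints 000 on failure).
http_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# check_url URL [EXPECTED_CODE]: compare the status code with EXPECTED_CODE (default 200).
check_url() {
    url=$1
    want=${2:-200}
    got=$(http_status "$url") || got=${got:-000}
    if [ "$got" = "$want" ]; then
        echo "OK   $url ($got)"
    else
        echo "FAIL $url (got $got, want $want)"
        return 1
    fi
}

# Usage:
#   check_url https://ng.jpd.net3w.com
#   check_url http://ng.jpd.net3w.com 200
```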
---

## Verify the Deployment

### 1. Verify Cluster Status

```bash
# Check node status
kubectl get nodes -o wide

# List all pods
kubectl get pods --all-namespaces

# List system components
kubectl get pods -n kube-system

# Check resource usage
kubectl top nodes
kubectl top pods --all-namespaces
```

### 2. Verify Gitea

```bash
# Get the Gitea NodePort
GITEA_PORT=$(kubectl get svc gitea-http -n gitea -o jsonpath='{.spec.ports[0].nodePort}')
echo "Gitea NodePort: $GITEA_PORT"

# Test access
curl -I http://149.13.91.216:$GITEA_PORT
curl -I http://git.jpd.net3w.com

# Browser access
echo "Open in a browser: http://git.jpd.net3w.com"
echo "Username: gitea_admin"
echo "Password: GitAdmin@2026"
```

### 3. Verify ArgoCD

```bash
# Get the ArgoCD password
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d)

echo "ArgoCD URL: https://argocd.jpd.net3w.com"
echo "Username: admin"
echo "Password: $ARGOCD_PASSWORD"

# Test access
curl -k -I https://argocd.jpd.net3w.com
```

### 4. Verify Applications

```bash
# List all Ingress resources
kubectl get ingress --all-namespaces

# Test application access
curl http://ng.jpd.net3w.com
curl https://ng.jpd.net3w.com

# Check certificate status
kubectl get certificate --all-namespaces
```

---

## Access Summary

### Service URLs

| Service | URL | Username | Password |
|---------|-----|----------|----------|
| Gitea | http://git.jpd.net3w.com | gitea_admin | GitAdmin@2026 |
| ArgoCD | https://argocd.jpd.net3w.com | admin | (see the command below) |
| Nginx test application | http://ng.jpd.net3w.com | - | - |

### Get the ArgoCD Password

```bash
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo
```

### SSH Access

```bash
# Master node
ssh fei@149.13.91.216

# Worker nodes
ssh fei@149.13.91.64
ssh fei@149.13.91.59
```

---

## Deploying New Applications

### Using the Manual Deployment Guide

See [MANUAL-DEPLOYMENT-GUIDE.md](./MANUAL-DEPLOYMENT-GUIDE.md) to create a new application.

### Quick Example

```bash
# 1. Create the project directory
mkdir -p ~/my-app/manifests

# 2. Create the Kubernetes manifests
# Use the templates in MANUAL-DEPLOYMENT-GUIDE.md

# 3. Create a repository in Gitea
# Visit http://git.jpd.net3w.com

# 4. Push the code
cd ~/my-app
git init -b main
git add .
git commit -m "Initial commit"
git remote add origin http://gitea_admin:GitAdmin%402026@149.13.91.216:<GITEA_PORT>/k3s-apps/my-app.git
git push -u origin main

# 5. Create an Application in ArgoCD
# Visit https://argocd.jpd.net3w.com
```
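Step 5 can also be done declaratively instead of through the web UI. The sketch below writes an ArgoCD `Application` manifest for the repository pushed in step 4. The `repoURL`, the `manifests` path, the `default` target namespace, and the automated sync policy are assumptions for illustration, and `write_app_manifest` is a hypothetical helper name; adjust them to match your repository and MANUAL-DEPLOYMENT-GUIDE.md.

```bash
# write_app_manifest APP_NAME OUTPUT_FILE
# Write an ArgoCD Application manifest that tracks the main branch of the
# (assumed) Gitea repository k3s-apps/APP_NAME.
write_app_manifest() {
    cat > "$2" <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: $1
  namespace: argocd
spec:
  project: default
  source:
    repoURL: http://git.jpd.net3w.com/k3s-apps/$1.git
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF
}

# Usage:
#   write_app_manifest my-app my-app-argocd.yaml
#   kubectl apply -f my-app-argocd.yaml
```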
---

## FAQ

### Q1: What if DNS resolution is not working yet?

**A**: DNS propagation takes time (5-30 minutes). In the meantime, access the services through their NodePort:

```bash
# Look up the NodePort of each service
kubectl get svc -n gitea
kubectl get svc -n argocd

# Access the service via IP:Port
curl -I http://149.13.91.216:<NodePort>
```

### Q2: Why is a Pod stuck in Pending?

**A**: Check node resources and the Pod's events:

```bash
# Check node resources
kubectl top nodes

# Inspect the Pod
kubectl describe pod <pod-name> -n <namespace>

# List recent events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```

### Q3: Why has a certificate not been issued?

**A**: Check cert-manager and the DNS configuration:

```bash
# View the cert-manager logs
kubectl logs -n cert-manager -l app=cert-manager --tail=50

# List certificate requests
kubectl get certificaterequest --all-namespaces

# Inspect a certificate
kubectl describe certificate <cert-name> -n <namespace>
```

### Q4: How do I switch back to the JPC cluster?

**A**: Switch the kubeconfig:

```bash
# Switch to the JPC cluster
export KUBECONFIG=~/.kube/config

# Or switch to the JPD cluster
export KUBECONFIG=~/.kube/config-jpd

# Confirm which cluster is active
kubectl cluster-info
kubectl get nodes
```

### Q5: How do I manage multiple clusters at the same time?

**A**: Use kubectl contexts:

```bash
# Merge the kubeconfig files
KUBECONFIG=~/.kube/config:~/.kube/config-jpd kubectl config view --flatten > ~/.kube/config-merged
cp ~/.kube/config-merged ~/.kube/config

# List all contexts
kubectl config get-contexts

# Switch context
kubectl config use-context <context-name>

# Show the current context
kubectl config current-context
```
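One caveat with the merge: kubeconfig files generated by K3s typically name their cluster, user, and context `default`, so two such files collide and only one set of entries survives. A possible workaround, sketched below, is to rename the entries in a copy of the JPD file before merging. `rename_k3s_context` is an illustrative helper, and the `sed` pattern assumes the file follows the stock K3s layout, where `default` appears only as an entry name at the end of a line.

```bash
# rename_k3s_context SRC DST NAME
# Copy SRC to DST, replacing the entry name "default" (cluster, user, context,
# and current-context) with NAME.
rename_k3s_context() {
    sed "s/: default\$/: $3/" "$1" > "$2"
}

# Usage:
#   rename_k3s_context ~/.kube/config-jpd ~/.kube/config-jpd-named jpd
#   KUBECONFIG=~/.kube/config:~/.kube/config-jpd-named \
#       kubectl config view --flatten > ~/.kube/config-merged
#   kubectl --kubeconfig ~/.kube/config-merged config get-contexts
```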
---

## Troubleshooting

### 1. Cluster Deployment Fails

```bash
# View the Ansible log
cat k3s-ansible/ansible.log

# Check connectivity to the nodes
ansible all -i k3s-ansible/inventory.yml -m ping

# Redeploy
cd k3s-ansible
ansible-playbook playbooks/reset.yml -i inventory.yml  # Clean up
ansible-playbook playbooks/site.yml -i inventory.yml   # Redeploy
```

### 2. kubectl Cannot Connect

```bash
# Check the kubeconfig
cat ~/.kube/config-jpd

# Check the K3s service on the master node
ssh fei@149.13.91.216 "sudo systemctl status k3s"

# Check the firewall
ssh fei@149.13.91.216 "sudo ufw status"

# Test the API connection (any HTTP response, even 401, shows the port is reachable)
curl -k https://149.13.91.216:6443
```

### 3. Pods Fail to Start

```bash
# Check Pod status
kubectl get pods --all-namespaces -o wide

# View Pod logs
kubectl logs <pod-name> -n <namespace>

# View Pod events
kubectl describe pod <pod-name> -n <namespace>

# View events across all namespaces
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
```

### 4. Services Are Unreachable

```bash
# List Services
kubectl get svc --all-namespaces

# List Ingress resources
kubectl get ingress --all-namespaces

# Inspect an Ingress
kubectl describe ingress <ingress-name> -n <namespace>

# Test the Service from inside the cluster
kubectl run test-pod --rm -it --image=curlimages/curl -- \
  curl http://<service-name>.<namespace>.svc.cluster.local
```

---

## Backup and Restore

### Back Up the Cluster

```bash
# Take an etcd snapshot (applies only when K3s runs with embedded etcd)
ssh fei@149.13.91.216 "sudo k3s etcd-snapshot save --name jpd-backup"

# Download the snapshot
mkdir -p backups
scp fei@149.13.91.216:/var/lib/rancher/k3s/server/db/snapshots/jpd-backup* ./backups/

# Export Kubernetes resources (note: `get all` covers only a subset of resource types)
kubectl get all --all-namespaces -o yaml > backups/jpd-all-resources.yaml
```
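If snapshots are downloaded regularly, the local `backups/` directory will grow without bound. The following is a sketch of a simple retention helper. It assumes snapshot file names start with `jpd-backup-` and sort chronologically by name (K3s appends a timestamp to the snapshot name; verify the exact format on your version); `prune_backups` is an illustrative name.

```bash
# prune_backups DIR KEEP
# Delete all but the KEEP newest files matching DIR/jpd-backup-*.
# "Newest" is decided by name order, so names must sort chronologically.
prune_backups() {
    ls -1 "$1"/jpd-backup-* 2>/dev/null | sort -r | tail -n +"$(($2 + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
        echo "removed $old"
    done
}

# Usage: keep the five most recent snapshots
#   prune_backups ./backups 5
```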
### Restore the Cluster

Follow the restore steps in [CLUSTER-MIGRATION-GUIDE.md](./CLUSTER-MIGRATION-GUIDE.md).

---

## Performance Tuning

### 1. Adjust Resource Limits

```bash
# Edit the Deployment
kubectl edit deployment <deployment-name> -n <namespace>

# Then adjust the container's resources section, for example:
#   resources:
#     requests:
#       memory: "128Mi"
#       cpu: "100m"
#     limits:
#       memory: "256Mi"
#       cpu: "200m"
```
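The same values can be applied without opening an editor by using `kubectl set resources`. The wrapper below is a sketch (`set_limits` is an illustrative name) that applies the example values from this section to every container in the Deployment. For applications managed by ArgoCD, changing the manifest in Git is usually preferable, since manual changes may be reverted when self-heal is enabled.

```bash
# set_limits DEPLOYMENT NAMESPACE
# Apply the example requests and limits from this section to a Deployment.
set_limits() {
    kubectl set resources deployment "$1" -n "$2" \
        --requests=cpu=100m,memory=128Mi \
        --limits=cpu=200m,memory=256Mi
}

# Usage: set_limits <deployment-name> <namespace>
```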
### 2. Configure HPA (Horizontal Pod Autoscaling)

```bash
# Create the HPA
kubectl autoscale deployment <deployment-name> \
  --cpu-percent=80 \
  --min=2 \
  --max=10 \
  -n <namespace>

# Check HPA status
kubectl get hpa -n <namespace>
```

### 3. Configure Node Affinity

```yaml
# Add to the Deployment
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - k3s-worker-01
```

---

## Monitoring and Logs

### 1. View Resource Usage

```bash
# Node resources
kubectl top nodes

# Pod resources
kubectl top pods --all-namespaces

# Continuous monitoring
watch kubectl top pods --all-namespaces
```

### 2. View Logs

```bash
# View Pod logs
kubectl logs <pod-name> -n <namespace>

# Follow logs in real time
kubectl logs -f <pod-name> -n <namespace>

# View logs from multiple Pods
kubectl logs -l app=<app-name> -n <namespace> --tail=50
```

### 3. Deploy a Monitoring Stack (Optional)

```bash
# Deploy Prometheus and Grafana
# See the official documentation or install with Helm
```

---

## Next Steps

1. ✅ The cluster is deployed
2. ✅ Gitea and ArgoCD are configured
3. ✅ HTTPS is enabled
4. ✅ The test application is deployed

**You can now**:
- 📝 Deploy new applications by following [MANUAL-DEPLOYMENT-GUIDE.md](./MANUAL-DEPLOYMENT-GUIDE.md)
- 🔄 Manage applications with the GitOps workflow
- 📊 Configure monitoring and alerting
- 🔐 Configure a backup strategy

---

## Related Documents

- [K3s Deployment Guide](./DEPLOYMENT-GUIDE.md)
- [Manual Deployment Guide](./MANUAL-DEPLOYMENT-GUIDE.md)
- [Cluster Migration Guide](./CLUSTER-MIGRATION-GUIDE.md)
- [Troubleshooting Guide](./TROUBLESHOOTING-ACCESS.md)

---

**Estimated deployment time**: 30-60 minutes
**Document version**: 1.0
**Last updated**: 2026-02-04