First commit: initialize the project

This commit is contained in:
fei
2026-02-05 00:11:05 +08:00
commit 26eaf8110b
171 changed files with 17105 additions and 0 deletions

.claude/.claude.md Normal file

@@ -0,0 +1,30 @@
Talk to me in Chinese.
When sudo is needed, run commands as `echo "1" | sudo -S <command>`.
Install a single-node k3s cluster on this host.
I have already pointed *.u6.net3w.com at this host; when you deploy a new project, add the subdomain mapping yourself.
Everything is served over HTTPS by default.
When you create documentation, automatically prefix the filename with 001, 002, and so on, numbered by the count of md files already in the same folder.
YAML files go into a new directory under the matching category to hold the config files; new project folders inside it are also named with a 001/002-style prefix.
One PostgreSQL instance holding 300 databases; the usernames and database names are pg001 through pg300 (see the sketch after this list).
Caddy only does SSL; Traefik does the routing.
Caddy receives HTTPS, then forwards to Traefik over HTTP (80).
Private git:
git remote add origin https://git.u6.net3w.com/fei/k3s-configs.git
git push -u origin main
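For the 300-database requirement above, a minimal bash sketch (assumptions: a PostgreSQL pod labelled `app=postgres` in a `databases` namespace, a `postgres` superuser, and each user's password equal to its name; all of these names are hypothetical and should be adjusted to the actual deployment):

```bash
# Create pg001 .. pg300, each owned by a same-named user (sketch only)
POD=$(kubectl get pod -n databases -l app=postgres -o jsonpath='{.items[0].metadata.name}')
for i in $(seq -w 1 300); do
  NAME="pg${i}"
  kubectl exec -n databases "$POD" -- psql -U postgres -c "CREATE USER ${NAME} WITH PASSWORD '${NAME}';"
  kubectl exec -n databases "$POD" -- psql -U postgres -c "CREATE DATABASE ${NAME} OWNER ${NAME};"
done
```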

.claude/settings.json Normal file

@@ -0,0 +1,12 @@
{
"alwaysThinkingEnabled": true,
"env": {
"ANTHROPIC_AUTH_TOKEN": "sk-5WAPtYaCjxXgoJiOz9kVR7Wg0MUTpDNY2MDASCNaNYdtdDxC",
"ANTHROPIC_BASE_URL": "https://new-api.yuyugod.top",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-5-20251101",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-5-20250929",
"ANTHROPIC_MODEL": "claude-sonnet-4-5-20250929"
},
"model": "claude-sonnet-4-5-20250929"
}


@@ -0,0 +1,389 @@
---
name: caddy-ssl-termination
description: Architecture configuration for SSL termination with Caddy in front of Traefik, intended for K3s environments.
---
# Caddy SSL Termination Skill
## Architecture Overview
**Setup**: Caddy (HTTPS/SSL termination) → Traefik (routing) → HTTP backend services
- **Caddy**: Handles HTTPS (443) with automatic SSL certificates, forwards to Traefik on HTTP (80)
- **Traefik**: Routes HTTP traffic to appropriate backend services
- **Flow**: Internet → Caddy:443 (HTTPS) → Traefik:80 (HTTP) → Backend Pods
## Quick Configuration Template
### 1. Basic Caddyfile Structure
```caddy
# /etc/caddy/Caddyfile
# Domain configuration
example.com {
reverse_proxy traefik-service:80
}
# Multiple domains
app1.example.com {
reverse_proxy traefik-service:80
}
app2.example.com {
reverse_proxy traefik-service:80
}
# Wildcard subdomain (requires DNS wildcard)
*.example.com {
reverse_proxy traefik-service:80
}
```
### 2. ConfigMap for Caddyfile
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: caddy-config
namespace: default
data:
Caddyfile: |
# Global options
{
email your-email@example.com
# Use Let's Encrypt staging for testing
# acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}
# Your domains
example.com {
reverse_proxy traefik-service:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
### 3. Caddy Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: caddy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: caddy
  template:
    metadata:
labels:
app: caddy
spec:
containers:
- name: caddy
image: caddy:latest
ports:
- containerPort: 80
- containerPort: 443
- containerPort: 2019 # Admin API
volumeMounts:
- name: config
mountPath: /etc/caddy
- name: data
mountPath: /data
- name: config-cache
mountPath: /config
volumes:
- name: config
configMap:
name: caddy-config
- name: data
persistentVolumeClaim:
claimName: caddy-data
- name: config-cache
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: caddy
namespace: default
spec:
type: LoadBalancer # or NodePort
ports:
- name: http
port: 80
targetPort: 80
- name: https
port: 443
targetPort: 443
selector:
app: caddy
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: caddy-data
namespace: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
```
## Common Operations
### Reload Configuration
After updating the ConfigMap:
```bash
# Method 1: Reload via exec (preserves connections)
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
# Method 2: Restart pod (brief downtime)
kubectl rollout restart deployment/caddy -n default
# Method 3: Delete pod (auto-recreates)
kubectl delete pod -n default -l app=caddy
```
### Update Caddyfile
```bash
# Edit ConfigMap
kubectl edit configmap caddy-config -n default
# Or apply updated file
kubectl apply -f caddy-configmap.yaml
# Then reload
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```
### View Logs
```bash
# Follow logs
kubectl logs -n default -f deployment/caddy
# Check SSL certificate issues
kubectl logs -n default deployment/caddy | grep -i "certificate\|acme\|tls"
```
### Check Configuration
```bash
# Validate Caddyfile syntax
kubectl exec -n default deployment/caddy -- caddy validate --config /etc/caddy/Caddyfile
# View current config via API
kubectl exec -n default deployment/caddy -- curl localhost:2019/config/
```
## Adding New Domain
### Step-by-step Process
1. **Update DNS**: Point new domain to Caddy's LoadBalancer IP
```bash
kubectl get svc caddy -n default -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
2. **Update ConfigMap**: Add new domain block
```bash
kubectl edit configmap caddy-config -n default
```
Add:
```caddy
newapp.example.com {
reverse_proxy traefik-service:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
3. **Reload Caddy**:
```bash
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```
4. **Verify**: Check logs for certificate acquisition
```bash
kubectl logs -n default deployment/caddy | tail -20
```
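For routine use, the edit-then-reload steps above can be collapsed into one small helper. A minimal sketch, assuming the ConfigMap, namespace, and upstream names used in this skill (`caddy-config`, `default`, `traefik-service`); note that the kubelet can take up to a minute to sync the updated ConfigMap into the pod before the reload sees it:

```bash
#!/bin/sh
# add-caddy-domain.sh <domain> : append a domain block and reload Caddy (sketch)
DOMAIN="$1"
kubectl get configmap caddy-config -n default -o jsonpath='{.data.Caddyfile}' > Caddyfile
cat >> Caddyfile <<EOF

${DOMAIN} {
    reverse_proxy traefik-service:80
}
EOF
kubectl create configmap caddy-config -n default --from-file=Caddyfile \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```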
## Traefik Integration
### Traefik IngressRoute Example
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: myapp
namespace: default
spec:
entryPoints:
- web # HTTP only, Caddy handles HTTPS
routes:
- match: Host(`myapp.example.com`)
kind: Rule
services:
- name: myapp-service
port: 8080
```
### Important Notes
- Traefik should listen on HTTP (80) only
- Caddy handles all HTTPS/SSL
- Use `Host()` matcher in Traefik to route by domain
- Caddy forwards the original `Host` header to Traefik
## Troubleshooting
### SSL Certificate Issues
```bash
# Check certificate status
kubectl exec -n default deployment/caddy -- caddy list-certificates
# View ACME logs
kubectl logs -n default deployment/caddy | grep -i acme
# Common issues:
# - Port 80/443 not accessible from internet
# - DNS not pointing to correct IP
# - Rate limit hit (use staging CA for testing)
```
### Configuration Errors
```bash
# Test config before reload
kubectl exec -n default deployment/caddy -- caddy validate --config /etc/caddy/Caddyfile
# Check for syntax errors
kubectl logs -n default deployment/caddy | grep -i error
```
### Connection Issues
```bash
# Test from inside cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://traefik-service:80
# Check if Caddy can reach Traefik
kubectl exec -n default deployment/caddy -- curl -v http://traefik-service:80
```
## Advanced Configurations
### Custom TLS Settings
```caddy
example.com {
tls {
protocols tls1.2 tls1.3
ciphers TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
}
reverse_proxy traefik-service:80
}
```
### Rate Limiting
```caddy
example.com {
rate_limit {
zone dynamic {
key {remote_host}
events 100
window 1m
}
}
reverse_proxy traefik-service:80
}
```
### Custom Error Pages
```caddy
example.com {
handle_errors {
respond "{err.status_code} {err.status_text}"
}
reverse_proxy traefik-service:80
}
```
### Health Checks
```caddy
example.com {
reverse_proxy traefik-service:80 {
health_uri /health
health_interval 10s
health_timeout 5s
}
}
```
## Quick Reference Commands
```bash
# Get Caddy pod name
kubectl get pods -n default -l app=caddy
# Reload config
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
# View current config
kubectl exec -n default deployment/caddy -- cat /etc/caddy/Caddyfile
# Check certificates
kubectl exec -n default deployment/caddy -- caddy list-certificates
# Restart Caddy
kubectl rollout restart deployment/caddy -n default
# Watch logs
kubectl logs -n default -f deployment/caddy
# Get LoadBalancer IP
kubectl get svc caddy -n default -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
## Checklist for New Service
- [ ] DNS A/AAAA record points to Caddy LoadBalancer IP
- [ ] Domain added to Caddy ConfigMap
- [ ] Caddy reloaded successfully
- [ ] Traefik IngressRoute created (HTTP only)
- [ ] Backend service accessible from Traefik
- [ ] SSL certificate acquired (check logs)
- [ ] HTTPS access working from browser
- [ ] HTTP redirects to HTTPS (Caddy default behavior)
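A quick way to walk this checklist from a shell; a sketch, assuming the domain and namespaces used above (`newapp.example.com` is a placeholder) and that the Traefik IngressRoute CRD from the previous section is installed:

```bash
DOMAIN=newapp.example.com
# HTTP should answer with a redirect to HTTPS, HTTPS with a normal status from the backend
curl -sI "http://${DOMAIN}" | head -1
curl -sI "https://${DOMAIN}" | head -1
# Certificate acquisition and routing objects
kubectl logs -n default deployment/caddy | grep -i "${DOMAIN}" | tail -5
kubectl get ingressroute -n default
```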
## Notes
- Caddy automatically obtains and renews SSL certificates
- Certificates stored in `/data` volume (must be persistent)
- Reload is graceful - no connection drops
- Caddy handles HTTP→HTTPS redirect automatically
- Use staging CA for testing to avoid rate limits
- Wildcard certificates require DNS challenge (more complex setup)


@@ -0,0 +1,14 @@
---
name: sudo-automated-preference
description: Forces Claude Code to supply the password automatically through a specific pipe pattern whenever a command needs sudo privileges.
---
# Sudo Automation Skill
## Usage Guidelines
> Whenever you need to run any shell command that requires `sudo`, strictly follow this format.
## Mandatory Command Format
Never use `sudo <command>` directly; always use the following format:
```bash
echo "1" | sudo -S <command>
```
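For example, a command that would normally be `sudo systemctl restart k3s` becomes (the service name here is only an illustration):

```bash
echo "1" | sudo -S systemctl restart k3s
```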

.claude/skills/minio/s3.md Normal file

@@ -0,0 +1,789 @@
---
name: minio-s3-expert
description: Provides guidance on MinIO object storage configuration, bucket management, and S3 API usage.
---
# MinIO S3 Object Storage Skill
## Architecture Overview
**Setup**: Caddy (HTTPS/SSL) → Traefik (routing) → MinIO (S3 storage)
- **MinIO**: S3-compatible object storage with web console
- **Caddy**: Handles HTTPS (443) with automatic SSL certificates
- **Traefik**: Routes HTTP traffic to MinIO services
- **Policy Manager**: Automatically sets new buckets to public-read (download) permission
- **Flow**: Internet → Caddy:443 (HTTPS) → Traefik:80 (HTTP) → MinIO (9000: API, 9001: Console)
## Quick Deployment Template
### 1. Complete MinIO Deployment YAML
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: minio
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-data
namespace: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: minio
namespace: minio
spec:
replicas: 1
selector:
matchLabels:
app: minio
template:
metadata:
labels:
app: minio
spec:
containers:
- name: minio
image: minio/minio:latest
command:
- /bin/sh
- -c
- minio server /data --console-address ":9001"
ports:
- containerPort: 9000
name: api
- containerPort: 9001
name: console
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "your-password-here"
- name: MINIO_SERVER_URL
value: "https://s3.yourdomain.com"
- name: MINIO_BROWSER_REDIRECT_URL
value: "https://console.s3.yourdomain.com"
volumeMounts:
- name: data
mountPath: /data
livenessProbe:
httpGet:
path: /minio/health/live
port: 9000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /minio/health/ready
port: 9000
initialDelaySeconds: 10
periodSeconds: 5
- name: policy-manager
image: alpine:latest
command:
- /bin/sh
- -c
- |
# Install MinIO Client
wget https://dl.min.io/client/mc/release/linux-arm64/mc -O /usr/local/bin/mc
chmod +x /usr/local/bin/mc
# Wait for MinIO to start
sleep 10
# Configure mc client
mc alias set myminio http://localhost:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
echo "Policy manager started. Monitoring buckets..."
# Continuously monitor and set bucket policies
while true; do
# Get all buckets
mc ls myminio 2>/dev/null | awk '{print $NF}' | sed 's/\///' | while read -r BUCKET; do
if [ -n "$BUCKET" ]; then
# Check current policy
POLICY_OUTPUT=$(mc anonymous get myminio/${BUCKET} 2>&1)
# If private (contains "Access permission for" but not "download")
if echo "$POLICY_OUTPUT" | grep -q "Access permission for" && ! echo "$POLICY_OUTPUT" | grep -q "download"; then
echo "Setting download policy for bucket: ${BUCKET}"
mc anonymous set download myminio/${BUCKET}
fi
fi
done
sleep 30
done
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "your-password-here"
volumes:
- name: data
persistentVolumeClaim:
claimName: minio-data
---
apiVersion: v1
kind: Service
metadata:
name: minio
namespace: minio
spec:
type: ClusterIP
ports:
- port: 9000
targetPort: 9000
name: api
- port: 9001
targetPort: 9001
name: console
selector:
app: minio
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-api
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: s3.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-console
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: console.s3.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9001
```
### 2. Configuration Checklist
Before deploying, update these values in the YAML:
**Domains (4 places):**
- `s3.yourdomain.com` → Your S3 API domain
- `console.s3.yourdomain.com` → Your console domain
**Credentials (4 places):**
- `MINIO_ROOT_USER: "admin"` → Your admin username
- `MINIO_ROOT_PASSWORD: "your-password-here"` → Your admin password (min 8 chars)
**Architecture (1 place):**
- `linux-arm64` → Change based on your CPU:
- ARM64: `linux-arm64`
- x86_64: `linux-amd64`
**Storage (1 place):**
- `storage: 50Gi` → Adjust storage size as needed
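If you prefer not to edit the file by hand, the substitutions can be scripted; a rough sketch, assuming the manifest is saved as `minio.yaml` and using placeholder values (pick a real password):

```bash
# Adjust domain, credentials and architecture in one pass (sketch only)
sed -i \
  -e 's/yourdomain\.com/u6.net3w.com/g' \
  -e 's/your-password-here/CHANGE-ME-16chars/g' \
  -e 's/linux-arm64/linux-amd64/g' \
  minio.yaml
grep -nE 's3\.|MINIO_ROOT|linux-(arm64|amd64)|storage:' minio.yaml   # eyeball the result
```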
## Deployment Steps
### 1. Prepare DNS
Point your domains to the server IP:
```bash
# Add DNS A records
s3.yourdomain.com A your-server-ip
console.s3.yourdomain.com A your-server-ip
```
### 2. Configure Caddy
Add domains to Caddy ConfigMap:
```bash
kubectl edit configmap caddy-config -n default
```
Add these blocks:
```caddy
s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
console.s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
Reload Caddy:
```bash
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```
### 3. Deploy MinIO
```bash
# Apply the configuration
kubectl apply -f minio.yaml
# Check deployment status
kubectl get pods -n minio
# Wait for pods to be ready
kubectl wait --for=condition=ready pod -l app=minio -n minio --timeout=300s
```
### 4. Verify Deployment
```bash
# Check MinIO logs
kubectl logs -n minio -l app=minio -c minio
# Check policy manager logs
kubectl logs -n minio -l app=minio -c policy-manager
# Check ingress
kubectl get ingress -n minio
# Check service
kubectl get svc -n minio
```
## Access MinIO
### Web Console
- URL: `https://console.s3.yourdomain.com`
- Username: Your configured `MINIO_ROOT_USER`
- Password: Your configured `MINIO_ROOT_PASSWORD`
### S3 API Endpoint
- URL: `https://s3.yourdomain.com`
- Use with AWS CLI, SDKs, or any S3-compatible client
## Bucket Policy Management
### Automatic Public-Read Policy
The policy manager sidecar automatically:
- Scans all buckets every 30 seconds
- Sets new private buckets to `download` (public-read) permission
- Allows anonymous downloads, requires auth for uploads/deletes
### Manual Policy Management
```bash
# Get pod name
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Access MinIO Client in pod
kubectl exec -n minio $POD -c policy-manager -- mc alias set myminio http://localhost:9000 admin your-password
# List buckets
kubectl exec -n minio $POD -c policy-manager -- mc ls myminio
# Check bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous get myminio/bucket-name
# Set bucket to public-read (download)
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set download myminio/bucket-name
# Set bucket to private
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set private myminio/bucket-name
# Set bucket to public (read + write)
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set public myminio/bucket-name
```
## Using MinIO
### Create Bucket via Web Console
1. Access `https://console.s3.yourdomain.com`
2. Login with credentials
3. Click "Buckets" → "Create Bucket"
4. Enter bucket name
5. Wait 30 seconds for auto-policy to apply
### Upload Files via Web Console
1. Navigate to bucket
2. Click "Upload" → "Upload File"
3. Select files
4. Files are immediately accessible via public URL
### Access Files
Public URL format:
```
https://s3.yourdomain.com/bucket-name/file-path
```
Example:
```bash
# Upload via console, then access:
curl https://s3.yourdomain.com/my-bucket/image.png
```
### Using AWS CLI
```bash
# Configure AWS CLI
aws configure set aws_access_key_id admin
aws configure set aws_secret_access_key your-password
aws configure set default.region us-east-1
# List buckets
aws --endpoint-url https://s3.yourdomain.com s3 ls
# Create bucket
aws --endpoint-url https://s3.yourdomain.com s3 mb s3://my-bucket
# Upload file
aws --endpoint-url https://s3.yourdomain.com s3 cp file.txt s3://my-bucket/
# Download file
aws --endpoint-url https://s3.yourdomain.com s3 cp s3://my-bucket/file.txt ./
# List bucket contents
aws --endpoint-url https://s3.yourdomain.com s3 ls s3://my-bucket/
```
### Using MinIO Client (mc)
```bash
# Install mc locally
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
# Configure alias
mc alias set myminio https://s3.yourdomain.com admin your-password
# List buckets
mc ls myminio
# Create bucket
mc mb myminio/my-bucket
# Upload file
mc cp file.txt myminio/my-bucket/
# Download file
mc cp myminio/my-bucket/file.txt ./
# Mirror directory
mc mirror ./local-dir myminio/my-bucket/remote-dir
```
## Common Operations
### View Logs
```bash
# MinIO server logs
kubectl logs -n minio -l app=minio -c minio -f
# Policy manager logs
kubectl logs -n minio -l app=minio -c policy-manager -f
# Both containers
kubectl logs -n minio -l app=minio --all-containers -f
```
### Restart MinIO
```bash
# Graceful restart
kubectl rollout restart deployment/minio -n minio
# Force restart (delete pod)
kubectl delete pod -n minio -l app=minio
```
### Scale Storage
```bash
# Edit PVC (note: can only increase, not decrease)
kubectl edit pvc minio-data -n minio
# Update storage size
# Change: storage: 50Gi → storage: 100Gi
```
### Backup Data
```bash
# Get pod name
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Copy data from pod
kubectl cp minio/$POD:/data ./minio-backup -c minio
# Or use mc mirror
mc mirror myminio/bucket-name ./backup/bucket-name
```
### Restore Data
```bash
# Copy data to pod
kubectl cp ./minio-backup minio/$POD:/data -c minio
# Restart MinIO
kubectl rollout restart deployment/minio -n minio
# Or use mc mirror
mc mirror ./backup/bucket-name myminio/bucket-name
```
## Troubleshooting
### Pod Not Starting
```bash
# Check pod status
kubectl describe pod -n minio -l app=minio
# Check events
kubectl get events -n minio --sort-by='.lastTimestamp'
# Common issues:
# - PVC not bound (check storage class)
# - Image pull error (check network/registry)
# - Resource limits (check node resources)
```
### Cannot Access Web Console
```bash
# Check ingress
kubectl get ingress -n minio
kubectl describe ingress minio-console -n minio
# Check service
kubectl get svc -n minio
# Test from inside cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://minio.minio.svc.cluster.local:9001
# Check Caddy logs
kubectl logs -n default -l app=caddy | grep -i s3
# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
```
### SSL Certificate Issues
```bash
# Check Caddy certificates
kubectl exec -n default deployment/caddy -- caddy list-certificates
# Check Caddy logs for ACME
kubectl logs -n default deployment/caddy | grep -i "s3\|acme\|certificate"
# Verify DNS resolution
nslookup s3.yourdomain.com
nslookup console.s3.yourdomain.com
```
### Policy Manager Not Working
```bash
# Check policy manager logs
kubectl logs -n minio -l app=minio -c policy-manager
# Manually test mc commands
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n minio $POD -c policy-manager -- mc ls myminio
# Restart policy manager (restart pod)
kubectl delete pod -n minio -l app=minio
```
### Files Not Accessible
```bash
# Check bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous get myminio/bucket-name
# Should show: Access permission for `myminio/bucket-name` is set to `download`
# If not, manually set
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set download myminio/bucket-name
# Test access
curl -I https://s3.yourdomain.com/bucket-name/file.txt
```
## Advanced Configuration
### Custom Storage Class
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-data
namespace: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: fast-ssd # Custom storage class
```
### Resource Limits
```yaml
containers:
- name: minio
image: minio/minio:latest
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
```
### Multiple Replicas (Distributed Mode)
For production, use distributed MinIO:
```yaml
# Requires 4+ nodes with persistent storage
command:
- /bin/sh
- -c
- minio server http://minio-{0...3}.minio.minio.svc.cluster.local/data --console-address ":9001"
```
### Custom Bucket Policies
Create custom policy JSON:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"AWS": ["*"]},
"Action": ["s3:GetObject"],
"Resource": ["arn:aws:s3:::bucket-name/*"]
}
]
}
```
Apply via mc:
```bash
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set-json policy.json myminio/bucket-name
```
### Disable Auto-Policy Manager
Remove the `policy-manager` container from deployment if you want manual control.
## Best Practices
### Bucket Naming
- Use lowercase letters, numbers, hyphens
- Avoid underscores, spaces, special characters
- Keep names short and descriptive
- Example: `user-uploads`, `static-assets`, `backups-2024`
### Folder Structure
Use prefixes (folders) to organize files:
```
bucket-name/
├── user1/
│ ├── profile.jpg
│ └── documents/
├── user2/
│ └── avatar.png
└── shared/
└── logo.png
```
### Security
- Change default credentials immediately
- Use strong passwords (16+ characters)
- Create separate access keys for applications
- Use bucket policies to restrict access
- Enable versioning for important buckets
- Regular backups of critical data
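For the "separate access keys" point above, a hedged sketch using the MinIO Client from the policy-manager container (`appuser`/`app-secret-key` are hypothetical names; newer `mc` releases use `admin policy attach`, older ones use `admin policy set`):

```bash
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Create an application-specific user instead of sharing the root credentials
kubectl exec -n minio $POD -c policy-manager -- mc admin user add myminio appuser app-secret-key
# Grant it the built-in readwrite policy (newer mc syntax)
kubectl exec -n minio $POD -c policy-manager -- mc admin policy attach myminio readwrite --user appuser
# Older mc releases: mc admin policy set myminio readwrite user=appuser
```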
### Performance
- Use CDN for frequently accessed files
- Enable compression for text files
- Use appropriate storage class
- Monitor disk usage and scale proactively
## Quick Reference Commands
```bash
# Deploy MinIO
kubectl apply -f minio.yaml
# Check status
kubectl get pods -n minio
kubectl get svc -n minio
kubectl get ingress -n minio
# View logs
kubectl logs -n minio -l app=minio -c minio -f
kubectl logs -n minio -l app=minio -c policy-manager -f
# Restart MinIO
kubectl rollout restart deployment/minio -n minio
# Get pod name
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Access mc client
kubectl exec -n minio $POD -c policy-manager -- mc ls myminio
# Check bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous get myminio/bucket-name
# Set bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set download myminio/bucket-name
# Delete deployment
kubectl delete -f minio.yaml
```
## Integration Examples
### Node.js (AWS SDK)
```javascript
const AWS = require('aws-sdk');
const s3 = new AWS.S3({
endpoint: 'https://s3.yourdomain.com',
accessKeyId: 'admin',
secretAccessKey: 'your-password',
s3ForcePathStyle: true,
signatureVersion: 'v4'
});
// Upload file
s3.putObject({
Bucket: 'my-bucket',
Key: 'file.txt',
Body: 'Hello World'
}, (err, data) => {
if (err) console.error(err);
else console.log('Uploaded:', data);
});
// Download file
s3.getObject({
Bucket: 'my-bucket',
Key: 'file.txt'
}, (err, data) => {
if (err) console.error(err);
else console.log('Content:', data.Body.toString());
});
```
### Python (boto3)
```python
import boto3
s3 = boto3.client('s3',
endpoint_url='https://s3.yourdomain.com',
aws_access_key_id='admin',
aws_secret_access_key='your-password'
)
# Upload file
s3.upload_file('local-file.txt', 'my-bucket', 'remote-file.txt')
# Download file
s3.download_file('my-bucket', 'remote-file.txt', 'downloaded.txt')
# List objects
response = s3.list_objects_v2(Bucket='my-bucket')
for obj in response.get('Contents', []):
print(obj['Key'])
```
### Go (minio-go)
```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("s3.yourdomain.com", &minio.Options{
		Creds:  credentials.NewStaticV4("admin", "your-password", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()
	// Upload file
	if _, err := client.FPutObject(ctx, "my-bucket", "file.txt", "local-file.txt", minio.PutObjectOptions{}); err != nil {
		log.Fatal(err)
	}
	// Download file
	if err := client.FGetObject(ctx, "my-bucket", "file.txt", "downloaded.txt", minio.GetObjectOptions{}); err != nil {
		log.Fatal(err)
	}
}
```
## Notes
- MinIO is fully S3-compatible
- Automatic SSL via Caddy
- Auto-policy sets buckets to public-read by default
- Policy manager runs every 30 seconds
- Persistent storage required for data retention
- Single replica suitable for development/small deployments
- Use distributed mode for production high-availability
- Regular backups recommended for critical data


@@ -0,0 +1,29 @@
# Traefik Middleware - CORS configuration
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: cors-headers
namespace: registry-system
spec:
headers:
accessControlAllowMethods:
- "GET"
- "HEAD"
- "POST"
- "PUT"
- "DELETE"
- "OPTIONS"
accessControlAllowOriginList:
- "http://registry.u6.net3w.com"
- "https://registry.u6.net3w.com"
accessControlAllowCredentials: true
accessControlAllowHeaders:
- "Authorization"
- "Content-Type"
- "Accept"
- "Cache-Control"
accessControlExposeHeaders:
- "Docker-Content-Digest"
- "WWW-Authenticate"
accessControlMaxAge: 100
addVaryHeader: true


@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: registry-auth-secret
namespace: registry-system
type: Opaque
stringData:
  # ▼▼▼ Important: this is the bcrypt hash of 123456; copy it as-is, do not modify ▼▼▼
htpasswd: |
admin:$2y$05$WSu.LllzUnHQcNPgklqqqum3o69unaC6lCUNz.rRmmq3YhowL99RW


@@ -0,0 +1,27 @@
root@98-hk:~/k3s/registry# docker run --rm --entrypoint htpasswd httpd:alpine -Bbn admin 123456
Unable to find image 'httpd:alpine' locally
alpine: Pulling from library/httpd
1074353eec0d: Pull complete
0bd765d2a2cb: Pull complete
0c4ffdba1e9e: Pull complete
4f4fb700ef54: Pull complete
0c51c0b07eae: Pull complete
e626d5c4ed2c: Pull complete
988cd7d09a31: Pull complete
Digest: sha256:6b7535d8a33c42b0f0f48ff0067765d518503e465b1bf6b1629230b62a466a87
Status: Downloaded newer image for httpd:alpine
admin:$2y$05$yYEah4y9O9F/5TumcJSHAuytQko2MAyFM1MuqgAafDED7Fmiyzzse
root@98-hk:~/k3s/registry# # Note: the single quotes ' ' on both sides are required
kubectl create secret generic registry-auth-secret \
--from-literal=htpasswd='admin:$2y$05$yYEah4y9O9F/5TumcJSHAuytQko2MAyFM1MuqgAafDED7Fmiyzzse' \
--namespace registry-system
secret/registry-auth-secret created
root@98-hk:~/k3s/registry# # Redeploy the application
kubectl apply -f registry-stack.yaml
namespace/registry-system unchanged
persistentvolumeclaim/registry-pvc unchanged
deployment.apps/registry created
service/registry-service unchanged
ingress.networking.k8s.io/registry-ingress unchanged
root@98-hk:~/k3s/registry#


@@ -0,0 +1,131 @@
# 1. Create a dedicated namespace
apiVersion: v1
kind: Namespace
metadata:
name: registry-system
---
# 2. Create the password file generated earlier as a K8s Secret
---
# 3. Request disk space (stores the image files)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: registry-pvc
namespace: registry-system
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
      storage: 20Gi # 20Gi for the registry; it can be expanded at any time
---
# 4. Deploy the Registry application
apiVersion: apps/v1
kind: Deployment
metadata:
name: registry
namespace: registry-system
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: registry
template:
metadata:
labels:
app: registry
spec:
containers:
- name: registry
image: registry:2
ports:
- containerPort: 5000
env:
        # --- Enable authentication ---
- name: REGISTRY_AUTH
value: "htpasswd"
- name: REGISTRY_AUTH_HTPASSWD_REALM
value: "Registry Realm"
- name: REGISTRY_AUTH_HTPASSWD_PATH
value: "/auth/htpasswd"
        # --- Storage path ---
- name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
value: "/var/lib/registry"
volumeMounts:
- name: data-volume
mountPath: /var/lib/registry
- name: auth-volume
mountPath: /auth
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: registry-pvc
- name: auth-volume
secret:
secretName: registry-auth-secret
---
# 5. Internal Service
apiVersion: v1
kind: Service
metadata:
name: registry-service
namespace: registry-system
spec:
selector:
app: registry
ports:
- protocol: TCP
port: 80
targetPort: 5000
---
# 6. Expose the HTTPS domain
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: registry-ingress
namespace: registry-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # Raise the upload size limit (Docker image layers can be large)
    ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # CORS configuration (lets the UI call the Registry API)
traefik.ingress.kubernetes.io/router.middlewares: registry-system-cors-headers@kubernetescrd
spec:
rules:
- host: registry.u6.net3w.com
http:
paths:
      # Registry API path (higher priority, must come first)
- path: /v2
pathType: Prefix
backend:
service:
name: registry-service
port:
number: 80
      # The UI is served at the root path
- path: /
pathType: Prefix
backend:
service:
name: registry-ui-service
port:
number: 80
tls:
- hosts:
- registry.u6.net3w.com
secretName: registry-tls-secret
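With the stack above applied, pushing to and inspecting the registry follows the standard Docker flow; a sketch, assuming the `admin`/`123456` credentials from the Secret and a locally built image named `myapp` (hypothetical):

```bash
docker login registry.u6.net3w.com -u admin -p 123456
docker tag myapp:latest registry.u6.net3w.com/myapp:latest
docker push registry.u6.net3w.com/myapp:latest
# List repositories through the v2 API
curl -u admin:123456 https://registry.u6.net3w.com/v2/_catalog
```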


@@ -0,0 +1,84 @@
# Joxit Docker Registry UI - lightweight web interface
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: registry-ui
namespace: registry-system
spec:
replicas: 1
selector:
matchLabels:
app: registry-ui
template:
metadata:
labels:
app: registry-ui
spec:
containers:
- name: registry-ui
image: joxit/docker-registry-ui:latest
ports:
- containerPort: 80
env:
        # Registry API address (proxied through nginx to avoid mixed-content issues)
- name: NGINX_PROXY_PASS_URL
value: "http://registry-service.registry-system.svc.cluster.local"
        # Allow deleting images
- name: DELETE_IMAGES
value: "true"
        # Show content digests
- name: SHOW_CONTENT_DIGEST
value: "true"
        # Single-registry mode
- name: SINGLE_REGISTRY
value: "true"
        # Registry title
- name: REGISTRY_TITLE
value: "U9 Docker Registry"
        # Enable the search/catalog feature
- name: CATALOG_ELEMENTS_LIMIT
value: "1000"
---
# UI Service
apiVersion: v1
kind: Service
metadata:
name: registry-ui-service
namespace: registry-system
spec:
selector:
app: registry-ui
ports:
- protocol: TCP
port: 80
targetPort: 80
---
# Expose the UI to the internet
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: registry-ui-ingress
namespace: registry-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
rules:
- host: registry-ui.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: registry-ui-service
port:
number: 80
tls:
- hosts:
- registry-ui.u6.net3w.com
secretName: registry-ui-tls-secret


@@ -0,0 +1,72 @@
# 01-mysql.yaml (new version)
# --- Part 1: request a storage claim (PVC) ---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc # remember this claim's name
namespace: demo-space
spec:
accessModes:
  - ReadWriteOnce # can only be mounted read-write by one node
  storageClassName: longhorn # Longhorn storage class, backed by the VPS local disk
resources:
requests:
      storage: 2Gi # request 2 GB
---
# --- Part 2: database Service (unchanged) ---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
namespace: demo-space
spec:
ports:
- port: 3306
selector:
app: wordpress-mysql
---
# --- Part 3: deploy the database (with the volume mounted) ---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress-mysql
namespace: demo-space
spec:
selector:
matchLabels:
app: wordpress-mysql
strategy:
    type: Recreate # Recreate is recommended for stateful apps (stop the old pod before starting the new one)
template:
metadata:
labels:
app: wordpress-mysql
spec:
containers:
- image: mariadb:10.6.4-focal
name: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: "password123"
- name: MYSQL_DATABASE
value: "wordpress"
- name: MYSQL_USER
value: "wordpress"
- name: MYSQL_PASSWORD
value: "wordpress"
ports:
- containerPort: 3306
name: mysql
        # ▼▼▼ The key change is here ▼▼▼
volumeMounts:
- name: mysql-store
          mountPath: /var/lib/mysql # where the database keeps its files inside the container
volumes:
- name: mysql-store
persistentVolumeClaim:
          claimName: mysql-pvc # use the claim defined above


@@ -0,0 +1,64 @@
# 02-wordpress.yaml
apiVersion: v1
kind: Service
metadata:
name: wordpress-service
namespace: demo-space
spec:
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10800 # 3 hours
ports:
- port: 80
selector:
app: wordpress
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress
namespace: demo-space
spec:
  replicas: 2 # run two WordPress frontends
selector:
matchLabels:
app: wordpress
template:
metadata:
labels:
app: wordpress
spec:
containers:
- image: wordpress:latest
name: wordpress
env:
- name: WORDPRESS_DB_HOST
          value: "mysql-service" # the magic: just reference the Service by name
- name: WORDPRESS_DB_USER
value: "wordpress"
- name: WORDPRESS_DB_PASSWORD
value: "wordpress"
- name: WORDPRESS_DB_NAME
value: "wordpress"
- name: WORDPRESS_CONFIG_EXTRA
value: |
/* HTTPS behind reverse proxy - Complete configuration */
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https') {
$_SERVER['HTTPS'] = 'on';
}
if (isset($_SERVER['HTTP_X_FORWARDED_HOST'])) {
$_SERVER['HTTP_HOST'] = $_SERVER['HTTP_X_FORWARDED_HOST'];
}
/* Force SSL for admin */
define('FORCE_SSL_ADMIN', true);
/* Redis session storage for multi-replica support */
@ini_set('session.save_handler', 'redis');
@ini_set('session.save_path', 'tcp://redis-service:6379');
/* Fix cookie issues */
@ini_set('session.cookie_httponly', true);
@ini_set('session.cookie_secure', true);
@ini_set('session.use_only_cookies', true);
ports:
- containerPort: 80
name: wordpress


@@ -0,0 +1,31 @@
# 03-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: wordpress-ingress
namespace: demo-space
annotations:
    # ▼▼▼ Key annotation: request a certificate ▼▼▼
cert-manager.io/cluster-issuer: letsencrypt-prod
    # ▼▼▼ Traefik sticky-session configuration ▼▼▼
traefik.ingress.kubernetes.io/affinity: "true"
traefik.ingress.kubernetes.io/session-cookie-name: "wordpress-session"
spec:
rules:
  - host: blog.u6.net3w.com # your domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: wordpress-service
port:
number: 80
  # ▼▼▼ Key config: the certificate is stored in this Secret ▼▼▼
tls:
- hosts:
- blog.u6.net3w.com
    secretName: blog-tls-secret # K3s creates this Secret automatically and fills in the certificate


@@ -0,0 +1,40 @@
# 04-redis.yaml - Redis for WordPress session storage
apiVersion: v1
kind: Service
metadata:
name: redis-service
namespace: demo-space
spec:
ports:
- port: 6379
targetPort: 6379
selector:
app: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: demo-space
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"


@@ -0,0 +1,8 @@
# Custom WordPress image with Redis PHP extension
FROM wordpress:latest
# Install Redis PHP extension
RUN pecl install redis && docker-php-ext-enable redis
# Verify installation
RUN php -m | grep redis
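To actually use this image with the two-replica WordPress deployment above, it has to be built and pushed somewhere the cluster can pull from; a sketch, assuming the private registry from this repo and a hypothetical `wordpress-redis` image name:

```bash
docker build -t registry.u6.net3w.com/wordpress-redis:latest .
docker push registry.u6.net3w.com/wordpress-redis:latest
# Point the existing deployment at the new image
kubectl set image deployment/wordpress wordpress=registry.u6.net3w.com/wordpress-redis:latest -n demo-space
```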


@@ -0,0 +1,30 @@
# 1. Define a "dummy" selector-less Service as the in-cluster entry point
#
# external-app.yaml (fixed version)
apiVersion: v1
kind: Service
metadata:
name: host-app-service
namespace: demo-space
spec:
ports:
  - name: http # <--- the Service port is named http
protocol: TCP
port: 80
targetPort: 3100
---
apiVersion: v1
kind: Endpoints
metadata:
name: host-app-service
namespace: demo-space
subsets:
- addresses:
- ip: 85.137.244.98
ports:
- port: 3100
    name: http # <--- [key fix] this must also be named http so the two can pair up


@@ -0,0 +1,25 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: host-app-ingress
namespace: demo-space
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # ▼▼▼ Core fix: add this line ▼▼▼
ingress.kubernetes.io/custom-response-headers: "Content-Security-Policy: upgrade-insecure-requests"
spec:
rules:
- host: wt.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: host-app-service
port:
number: 80
tls:
- hosts:
- wt.u6.net3w.com
secretName: wt-tls-secret


@@ -0,0 +1,16 @@
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
    # Let's Encrypt production endpoint
server: https://acme-v02.api.letsencrypt.org/directory
    # Use your real email; you get a reminder before certificates expire (they auto-renew anyway)
email: fszy2021@gmail.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: traefik
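After applying the issuer, issuance can be followed end to end through the cert-manager resources; a sketch (these resource kinds exist once cert-manager is installed, and ingress-shim names each Certificate after its `secretName`):

```bash
kubectl get clusterissuer letsencrypt-prod          # READY should become True
kubectl get certificate -A                          # one Certificate per Ingress tls block
kubectl get challenges -A                           # pending HTTP-01 challenges, if any
kubectl describe certificate blog-tls-secret -n demo-space   # example: the blog certificate
```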


@@ -0,0 +1,27 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: longhorn-ingress
  namespace: longhorn-system # note: Longhorn is installed in this namespace
annotations:
    # 1. Tell cert-manager to issue the certificate with this issuer
cert-manager.io/cluster-issuer: letsencrypt-prod
    # (Optional) Force Traefik onto the HTTPS entrypoint; usually unnecessary since Traefik detects TLS automatically
# traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
rules:
  - host: storage.u6.net3w.com # your domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: longhorn-frontend
port:
number: 80
  # 2. Tell K3s where to store the issued certificate
tls:
- hosts:
- storage.u6.net3w.com
    secretName: longhorn-tls-secret # the certificate is saved automatically in this Secret


@@ -0,0 +1,37 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
namespace: demo-space
spec:
selector:
matchLabels:
run: php-apache
replicas: 1
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: registry.k8s.io/hpa-example
ports:
- containerPort: 80
resources:
        # Resource requests/limits must be set so the HPA can compute usage percentages
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
namespace: demo-space
spec:
ports:
- port: 80
selector:
run: php-apache


@@ -0,0 +1,120 @@
# 1. Dedicated namespace
apiVersion: v1
kind: Namespace
metadata:
name: n8n-system
---
# 2. Data persistence (stores workflows and account data)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: n8n-pvc
namespace: n8n-system
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
---
# 3. Core application
apiVersion: apps/v1
kind: Deployment
metadata:
name: n8n
namespace: n8n-system
labels:
app: n8n
spec:
replicas: 1
selector:
matchLabels:
app: n8n
template:
metadata:
labels:
app: n8n
spec:
securityContext:
fsGroup: 1000
containers:
- name: n8n
image: n8nio/n8n:latest
securityContext:
runAsUser: 1000
runAsGroup: 1000
ports:
- containerPort: 5678
env:
        # ▼▼▼ Key configuration ▼▼▼
- name: N8N_HOST
value: "n8n.u6.net3w.com"
- name: N8N_PORT
value: "5678"
- name: N8N_PROTOCOL
value: "https"
- name: WEBHOOK_URL
value: "https://n8n.u6.net3w.com/"
        # Timezone (handy for scheduled workflows)
- name: GENERIC_TIMEZONE
value: "Asia/Shanghai"
- name: TZ
value: "Asia/Shanghai"
        # Disable n8n telemetry collection
- name: N8N_DIAGNOSTICS_ENABLED
value: "false"
volumeMounts:
- name: data
mountPath: /home/node/.n8n
volumes:
- name: data
persistentVolumeClaim:
claimName: n8n-pvc
---
# 4. Service exposure
apiVersion: v1
kind: Service
metadata:
name: n8n-service
namespace: n8n-system
spec:
selector:
app: n8n
ports:
- protocol: TCP
port: 80
targetPort: 5678
---
# 5. Ingress (automatic HTTPS)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: n8n-ingress
namespace: n8n-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- n8n.u6.net3w.com
secretName: n8n-tls
rules:
- host: n8n.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: n8n-service
port:
number: 80


@@ -0,0 +1,109 @@
# 1. Namespace
apiVersion: v1
kind: Namespace
metadata:
name: gitea-system
---
# 2. Data persistence (stores repositories and the database)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: gitea-data-pvc
namespace: gitea-system
spec:
accessModes:
- ReadWriteOnce
  storageClassName: longhorn # reuse your Longhorn storage class
resources:
requests:
storage: 10Gi
---
# 3. Deploy the Gitea application
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitea
namespace: gitea-system
spec:
replicas: 1
selector:
matchLabels:
app: gitea
template:
metadata:
labels:
app: gitea
spec:
containers:
- name: gitea
image: gitea/gitea:latest
ports:
- containerPort: 3000
name: http
- containerPort: 22
name: ssh
volumeMounts:
- name: gitea-data
mountPath: /data
env:
        # Initial settings, so the config file does not need manual edits
- name: GITEA__server__DOMAIN
value: "git.u6.net3w.com"
- name: GITEA__server__ROOT_URL
value: "https://git.u6.net3w.com/"
- name: GITEA__server__SSH_PORT
          value: "22" # Note: access through the Ingress is HTTPS; SSH needs an extra NodePort (see the sketch after this file), so keep the standard port for now
volumes:
- name: gitea-data
persistentVolumeClaim:
claimName: gitea-data-pvc
---
# 4. Service (internal network)
apiVersion: v1
kind: Service
metadata:
name: gitea-service
namespace: gitea-system
spec:
selector:
app: gitea
ports:
- protocol: TCP
port: 80
targetPort: 3000
name: http
- protocol: TCP
    port: 2222 # map this port if SSH is needed later
targetPort: 22
name: ssh
---
# 5. Ingress (expose the HTTPS domain)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: gitea-ingress
namespace: gitea-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # Allow large uploads (git pushes can be big)
nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
rules:
- host: git.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: gitea-service
port:
number: 80
tls:
- hosts:
- git.u6.net3w.com
secretName: gitea-tls-secret
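The SSH note above needs a separate entry point, since the Ingress only carries HTTP(S). One possible approach (not the only one) is a NodePort service cloned from the deployment; the node port is auto-assigned in the 30000-32767 range and must also be opened in the firewall:

```bash
# Expose Gitea's SSH port on every node (sketch)
kubectl -n gitea-system expose deployment gitea --name=gitea-ssh \
  --type=NodePort --port=22 --target-port=22
kubectl -n gitea-system get svc gitea-ssh   # note the assigned nodePort
# Clone over SSH with the non-standard port, for example:
# git clone ssh://git@git.u6.net3w.com:3XXXX/fei/k3s-configs.git
```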


@@ -0,0 +1,97 @@
# 1. Create a dedicated namespace to keep things tidy
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
---
# 2. Request a persistent volume (using Longhorn)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: kuma-pvc
namespace: monitoring
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 2Gi
---
# 3. Deploy the app (a StatefulSet would also work; for a single instance a Deployment is enough)
apiVersion: apps/v1
kind: Deployment
metadata:
name: uptime-kuma
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: uptime-kuma
strategy:
type: Recreate
template:
metadata:
labels:
app: uptime-kuma
spec:
containers:
- name: uptime-kuma
image: louislam/uptime-kuma:1
ports:
- containerPort: 3001
volumeMounts:
- name: data
mountPath: /app/data
volumes:
- name: data
persistentVolumeClaim:
claimName: kuma-pvc
---
# 4. Internal Service
apiVersion: v1
kind: Service
metadata:
name: kuma-service
namespace: monitoring
spec:
selector:
app: uptime-kuma
ports:
- protocol: TCP
port: 80
targetPort: 3001
---
# 5. Expose to the internet (HTTPS + domain)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kuma-ingress
namespace: monitoring
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
rules:
  - host: status.u6.net3w.com # <--- your new domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kuma-service
port:
number: 80
tls:
- hosts:
- status.u6.net3w.com
secretName: status-tls-secret


@@ -0,0 +1,62 @@
apiVersion: v1
kind: Namespace
metadata:
name: navigation
---
# ▼▼▼ Core concept: ConfigMap ▼▼▼
apiVersion: v1
kind: ConfigMap
metadata:
name: homepage-config
namespace: navigation
data:
  # Config file 1: widgets (clock, search box, resource usage)
widgets.yaml: |
- search:
provider: google
target: _blank
- resources:
cpu: true
memory: true
disk: true
- datetime:
text_size: xl
format:
timeStyle: short
  # Config file 2: your service links (note the icon and href fields below)
  services.yaml: |
    - My Apps:
        - Personal Blog:
            icon: wordpress.png
            href: https://blog.u6.net3w.com
            description: My digital garden
        - Remote Desktop:
            icon: linux.png
            href: https://wt.u6.net3w.com
            description: External reverse-proxy test via K8s
    - Infrastructure:
        - Status Monitoring:
            icon: uptime-kuma.png
            href: https://status.u6.net3w.com
            description: Uptime Kuma
            widget:
              type: uptimekuma
              url: http://kuma-service.monitoring.svc.cluster.local # ▼ key point: K8s internal DNS
              slug: my-wordpress-blog # (advanced: fill this in later)
        - Storage Management:
            icon: longhorn.png
            href: https://storage.u6.net3w.com
            description: Distributed storage dashboard
            widget:
              type: longhorn
              url: http://longhorn-frontend.longhorn-system.svc.cluster.local
  # Config file 3: general settings
  settings.yaml: |
    title: K3s Command Center
    background: https://images.unsplash.com/photo-1519681393784-d120267933ba?auto=format&fit=crop&w=1920&q=80
    theme: dark
    color: slate


@@ -0,0 +1,71 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: homepage
namespace: navigation
spec:
replicas: 1
selector:
matchLabels:
app: homepage
template:
metadata:
labels:
app: homepage
spec:
containers:
- name: homepage
image: ghcr.io/gethomepage/homepage:latest
ports:
- containerPort: 3000
        # ▼▼▼ Key step: mount the ConfigMap as files ▼▼▼
volumeMounts:
- name: config-volume
          mountPath: /app/config # the config directory inside the container
volumes:
- name: config-volume
configMap:
          name: homepage-config # reference the ConfigMap above
---
apiVersion: v1
kind: Service
metadata:
name: homepage-service
namespace: navigation
spec:
selector:
app: homepage
ports:
- protocol: TCP
port: 80
targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: homepage-ingress
namespace: navigation
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # Enables cross-origin calls (optional)
nginx.ingress.kubernetes.io/enable-cors: "true"
spec:
rules:
  - host: nav.u6.net3w.com # <--- your new domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: homepage-service
port:
number: 80
tls:
- hosts:
- nav.u6.net3w.com
secretName: nav-tls-secret


@@ -0,0 +1,33 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: k3s-apps
namespace: argocd
spec:
project: default
  # Git repository settings
source:
repoURL: https://git.u6.net3w.com/admin/k3s-configs.git
targetRevision: HEAD
path: k3s
  # Target cluster settings
destination:
server: https://kubernetes.default.svc
namespace: default
  # Automatic sync settings
  syncPolicy:
    automated:
      prune: true # automatically delete resources that no longer exist in Git
      selfHeal: true # automatically revert manual changes
      allowEmpty: false
    syncOptions:
    - CreateNamespace=true # create the namespace automatically
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m


@@ -0,0 +1,24 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: argocd-server-ingress
namespace: argocd
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
tls:
- hosts:
- argocd.u6.net3w.com
secretName: argocd-tls-secret
rules:
- host: argocd.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: argocd-server
port:
name: http


@@ -0,0 +1,25 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: rancher
namespace: cattle-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
ingressClassName: traefik
tls:
- hosts:
- rancher.u6.net3w.com
secretName: rancher-tls
rules:
- host: rancher.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: rancher
port:
number: 80


@@ -0,0 +1,23 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: portainer-ingress
namespace: portainer
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
    - portainer.u6.net3w.com # <--- use a new subdomain
secretName: portainer-tls
rules:
- host: portainer.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: portainer
port:
number: 9000


@@ -0,0 +1,148 @@
This is a great idea! Staring at a white-on-black terminal every day really does wear you out, and for watching log streams or the resource usage of many Pods at once, a graphical interface (GUI) has advantages the command line cannot match.
For K3s, I have **three top recommendations**, each suited to a different use case.
---
### Option 1: The Best Desktop Tool: OpenLens (recommended for development)
If you want to manage the remote K3s from **your own computer (Windows/Mac)** without using any VPS resources, Lens is the undisputed king. It is like a "remote desktop" built specifically for driving K8s.
**Pros:**
* **Zero resource usage on the server**: it runs on your own machine and consumes no VPS memory.
* **Extremely powerful**: viewing logs, opening a terminal, editing YAML, and checking metrics are all one click away.
* **Smooth experience**: a native app, noticeably slicker than web UIs.
#### 🚀 How to connect (only 3 steps)
To let the local Lens reach the remote K3s, you need to bring the "key" back from the VPS.
1. **Get the kubeconfig on the VPS**
Run this command to print the config file:
```bash
cat /etc/rancher/k3s/k3s.yaml
```
*(copy the whole output)*
2. **Create a config file on your local machine**
Create a new file on your computer (for example `my-k3s-config`) and paste in the content.
**⚠️ Key edit**: find the line `server: https://127.0.0.1:6443` and change `127.0.0.1` to your **domain** (for example `dev.u6.net3w.com`, provided port 6443 is open) or the **VPS public IP**.
3. **Import into Lens**
Download and install **OpenLens** (the free open-source build) or **Lens Desktop**.
Open the app -> File -> Add Cluster -> paste the edited config.
**In an instant, the whole picture of your K3s cluster is laid out in front of you!**
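The three steps can also be done from a terminal on the local machine; a sketch, with `YOUR_VPS_IP` as a placeholder for the VPS public IP (or a domain, provided port 6443 is reachable):

```bash
# Fetch the kubeconfig, point it at the public address, and sanity-check it
scp root@YOUR_VPS_IP:/etc/rancher/k3s/k3s.yaml ./my-k3s-config
sed -i 's/127.0.0.1/YOUR_VPS_IP/' ./my-k3s-config
KUBECONFIG=./my-k3s-config kubectl get nodes
```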
---
### Option 2: The Best Web UI: Portainer (recommended for ops/monitoring)
If you would rather manage everything from a browser, and you are used to the Docker way of working, **Portainer** is the obvious choice. Its K8s support is very friendly and the interface is clean and good-looking.
**Pros:**
* **Access from anywhere**: any browser will do.
* **Great visualisation**: pretty dashboards showing CPU/memory usage.
* **Simple to deploy**: it is just another Pod inside K3s.
#### 🚀 How to install (done in 30 seconds)
Run directly on your VPS:
1. **Install Portainer**
```bash
kubectl apply -n portainer -f https://raw.githubusercontent.com/portainer/k8s/master/deploy/manifests/portainer/portainer.yaml
```
*(this step creates the namespace and all resources automatically)*
2. **Expose the port (we will use NodePort access)**
By default Portainer uses a LoadBalancer; change it so it can be reached directly through a port.
```bash
# Edit the service configuration
kubectl edit service portainer -n portainer
```
Find `type: LoadBalancer` and change it to `type: NodePort`.
Or, since you are already fluent with this stack, **just write an Ingress** for it!
**Create `portainer-ingress.yaml`:**
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: portainer-ingress
  namespace: portainer
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - portainer.u6.net3w.com # <--- use a new subdomain
    secretName: portainer-tls
  rules:
  - host: portainer.u6.net3w.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: portainer
            port:
              number: 9000
```
3. **Apply and access**
`kubectl apply -f portainer-ingress.yaml`
Wait a moment, then open `https://portainer.u6.net3w.com` and set the admin password.
---
### Option 3: The Official Suite: Rancher (enterprise-grade)
K3s was developed by Rancher Labs, and Rancher is their flagship management platform.
**Pros:** the most complete feature set, enterprise-grade, able to manage hundreds or thousands of clusters.
**Cons:** **very memory hungry**. Rancher needs 2 GB+ of RAM just to start. If your VPS has 4 GB or less, it is **strongly discouraged**: it will squeeze out your workloads.
---
### 💡 The Geek's Choice: K9s (a GUI inside the terminal)
If you still cannot leave the terminal but want the convenience of a GUI, **K9s** is a must-have. It is a graphical interface that runs inside the terminal.
**Install:**
```bash
# Install via Linuxbrew on the VPS, or just download the binary
curl -sS https://webinstall.dev/k9s | bash
```
**Start:**
Type `k9s`.
You can drive it like a video game:
* Press `0` to show all Pods.
* Press `l` to view logs.
* Press `s` to open a shell.
* Press `ctrl-d` to delete a Pod.
* Fully keyboard-driven, and undeniably cool.
---
### My Recommendation
1. **First choice**: install **OpenLens** locally first. Nothing to install on the VPS, no overhead, best experience.
2. **Second choice**: if you often manage the cluster from different computers, deploy **Portainer** on the VPS.
**Which one do you want to try first? If it is OpenLens, I can show you how to open port 6443 on the VPS safely.**


@@ -0,0 +1,98 @@
**Good choice!** Since your VPS is well provisioned (you mentioned 16 GB of RAM earlier), running Rancher is **by far the most comfortable** option. It is like installing a "Windows operating system" on top of your Kubernetes cluster: everything can be done with a few mouse clicks.
Although you like hard-coding YAML by hand (which is good), for installing a giant piece of software like Rancher, **Helm** (the Kubernetes package manager, comparable to Python's pip or Linux's apt) is the only standard and reliable route. Installing Rancher with hand-written YAML would involve hundreds of files and is extremely error-prone.
Follow the steps below to lead this "bull" (Rancher's logo) into your cluster.
---
### 🚀 Step 1: Install Helm (if you do not have it yet)
Helm is required to install Rancher. Run on node 1:
```bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```
### Step 2: Add the Rancher repository
Tell Helm where to download Rancher from.
```bash
# 1. Add the repository
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
# 2. Create Rancher's dedicated namespace
kubectl create namespace cattle-system
```
### Step 3: Install Rancher in one command (the key step)
We will install Rancher and have it **automatically use** the `cert-manager` that already exists in your cluster to request the HTTPS certificate for `rancher.u6.net3w.com`.
Just **copy the whole command** below (the parameters are already filled in for you):
* **Domain**: `rancher.u6.net3w.com`
* **Initial password**: `admin123456` (you will be forced to change it after logging in)
* **Replicas**: `1` (to save resources; production usually runs 3, but 1 is enough here)
* **Certificate**: generated automatically via Let's Encrypt
```bash
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.u6.net3w.com \
  --set bootstrapPassword=admin123456 \
  --set replicas=1 \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=admin@net3w.com \
  --set letsEncrypt.ingress.class=traefik
```
---
### Step 4: Wait for the "beast" to wake up
Rancher is a big application; it takes a little while to start (usually 2-5 minutes).
You can watch its startup like a movie:
```bash
watch kubectl get pods -n cattle-system
```
* **Phase 1**: you will see `rancher-webhook` come up first.
* **Phase 2**: then the main `rancher-xxxx` Pods enter `ContainerCreating`.
* **Phase 3**: when `rancher-xxxx` shows **`Running`** and **`READY 1/1`**, startup has succeeded.
*(If READY stays at 0/1 for a long time, do not worry: it is initializing its database. As long as there is no CrashLoopBackOff, just wait.)*
---
### Step 5: Log in and take over everything
Once the Pods are running and the certificate has been issued (the Ingress is live):
1. Open **[https://rancher.u6.net3w.com](https://rancher.u6.net3w.com)** in your browser.
2. **Password**: enter the `admin123456` you set above.
3. **Take over**:
* After logging in, it prompts you to set a new, stronger password.
* Then you will see a cluster named **"local"**: **that is your K3s cluster!**
---
### 🎉 What can you do with the GUI now?
Once inside the "local" cluster, try the following to see why I call it a power tool:
1. **Browse workloads**: click **Workload** in the left-hand menu; you will see `my-python-app`, `registry`, and `wordpress` all in the list, with CPU/memory usage at a glance.
2. **View logs**: open `my-python-app` -> click a Pod name -> the three dots in the top-right corner -> **View Logs**. A very comfortable view with live scrolling.
3. **Open a terminal**: same path, click **Execute Shell** to get a root terminal right in the browser; no more hopping between SSH sessions.
4. **Scale by clicking**: find a Deployment and hit the `+` button to go from 1 Pod to 5 in an instant and feel the elasticity of Kubernetes.
**Go try it! If the install gets stuck on Pulling Image or Pending, send me the output of `kubectl get pods -n cattle-system`.**


@@ -0,0 +1,59 @@
---
apiVersion: v1
kind: Service
metadata:
name: localhost-15001
namespace: default
spec:
ports:
- protocol: TCP
port: 80
targetPort: 15001
---
apiVersion: v1
kind: Endpoints
metadata:
name: localhost-15001
namespace: default
subsets:
- addresses:
- ip: 134.195.210.237
ports:
- port: 15001
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: 1go-proxy
namespace: default
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
ingressClassName: traefik
tls:
- hosts:
- 1go.u6.net3w.com
- gl.u6.net3w.com
secretName: 1go-proxy-tls
rules:
- host: 1go.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: localhost-15001
port:
number: 80
- host: gl.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: localhost-15001
port:
number: 80


@@ -0,0 +1,84 @@
#!/bin/bash
#
# Node health-check script
# Usage: bash check-node-health.sh
#
# Color output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
echo -e "${BLUE}================================${NC}"
echo -e "${BLUE}K3s Cluster Health Check${NC}"
echo -e "${BLUE}================================${NC}"
echo ""
# 1. Node status
echo -e "${YELLOW}[1/8] Checking node status...${NC}"
kubectl get nodes -o wide
echo ""
# 2. Node resource usage
echo -e "${YELLOW}[2/8] Checking node resource usage...${NC}"
kubectl top nodes 2>/dev/null || echo -e "${YELLOW}⚠ metrics-server not ready${NC}"
echo ""
# 3. System pods
echo -e "${YELLOW}[3/8] Checking system components...${NC}"
kubectl get pods -n kube-system
echo ""
# 4. Longhorn
echo -e "${YELLOW}[4/8] Checking Longhorn storage...${NC}"
kubectl get pods -n longhorn-system | head -10
echo ""
# 5. PVCs
echo -e "${YELLOW}[5/8] Checking persistent volume claims...${NC}"
kubectl get pvc -A
echo ""
# 6. Application pods
echo -e "${YELLOW}[6/8] Checking application pods...${NC}"
kubectl get pods -A | grep -v "kube-system\|longhorn-system\|cert-manager" | head -20
echo ""
# 7. Ingress
echo -e "${YELLOW}[7/8] Checking Ingress resources...${NC}"
kubectl get ingress -A
echo ""
# 8. Certificates
echo -e "${YELLOW}[8/8] Checking SSL certificates...${NC}"
kubectl get certificate -A
echo ""
# Summary statistics
echo -e "${BLUE}================================${NC}"
echo -e "${BLUE}Cluster Statistics${NC}"
echo -e "${BLUE}================================${NC}"
TOTAL_NODES=$(kubectl get nodes --no-headers | wc -l)
READY_NODES=$(kubectl get nodes --no-headers | grep " Ready " | wc -l)
TOTAL_PODS=$(kubectl get pods -A --no-headers | wc -l)
RUNNING_PODS=$(kubectl get pods -A --no-headers | grep "Running" | wc -l)
TOTAL_PVC=$(kubectl get pvc -A --no-headers | wc -l)
BOUND_PVC=$(kubectl get pvc -A --no-headers | grep "Bound" | wc -l)
echo -e "Total nodes: ${GREEN}${TOTAL_NODES}${NC} (ready: ${GREEN}${READY_NODES}${NC})"
echo -e "Total pods: ${GREEN}${TOTAL_PODS}${NC} (running: ${GREEN}${RUNNING_PODS}${NC})"
echo -e "Total PVCs: ${GREEN}${TOTAL_PVC}${NC} (bound: ${GREEN}${BOUND_PVC}${NC})"
echo ""
# Health score
if [ $READY_NODES -eq $TOTAL_NODES ] && [ $RUNNING_PODS -gt $((TOTAL_PODS * 80 / 100)) ]; then
    echo -e "${GREEN}✓ Cluster health: good${NC}"
elif [ $READY_NODES -gt $((TOTAL_NODES / 2)) ]; then
    echo -e "${YELLOW}⚠ Cluster health: fair${NC}"
else
    echo -e "${RED}✗ Cluster health: abnormal${NC}"
fi
echo ""


@@ -0,0 +1,113 @@
#!/bin/bash
#
# Quick setup script generator
# Generates a customized join script for a new node
#
# Color output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}K3s Node Join Script Generator${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
# Current configuration
MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo -e "${YELLOW}Current master node info:${NC}"
echo "IP: $MASTER_IP"
echo "Token: ${NODE_TOKEN:0:20}..."
echo ""
# Choose the node type
echo "Select the type of node to join:"
echo "1) Worker node (recommended for the 2-node setup)"
echo "2) Master node (for the HA setup)"
echo ""
read -p "Enter an option (1 or 2): " NODE_TYPE
if [ "$NODE_TYPE" == "1" ]; then
SCRIPT_NAME="join-worker-custom.sh"
echo ""
    echo -e "${GREEN}Generating the worker join script...${NC}"
cat > $SCRIPT_NAME << 'EOFWORKER'
#!/bin/bash
set -e
# Configuration
MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo "Joining this machine as a worker node..."
# System preparation
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common
systemctl enable --now iscsid
# Install the k3s agent
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} sh -
echo "Worker node joined successfully!"
echo "Run on the master node: kubectl get nodes"
EOFWORKER
chmod +x $SCRIPT_NAME
elif [ "$NODE_TYPE" == "2" ]; then
SCRIPT_NAME="join-master-custom.sh"
echo ""
    read -p "Enter the load balancer IP: " LB_IP
    echo -e "${GREEN}Generating the master join script...${NC}"
cat > $SCRIPT_NAME << EOFMASTER
#!/bin/bash
set -e
# Configuration
FIRST_MASTER_IP="134.195.210.237"
LB_IP="$LB_IP"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo "开始加入 Master 节点 (HA 模式)..."
# 系统准备
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common
systemctl enable --now iscsid
# 安装 k3s server
curl -sfL https://get.k3s.io | sh -s - server \\
--server https://\${FIRST_MASTER_IP}:6443 \\
--token \${NODE_TOKEN} \\
--tls-san=\${LB_IP} \\
--write-kubeconfig-mode 644
echo "Master 节点加入完成!"
echo "在任意 Master 节点执行: kubectl get nodes"
EOFMASTER
chmod +x $SCRIPT_NAME
else
echo "无效的选项"
exit 1
fi
echo ""
echo -e "${GREEN}✓ 脚本已生成: $SCRIPT_NAME${NC}"
echo ""
echo "使用方法:"
echo "1. 将脚本复制到新节点"
echo "2. 在新节点上执行: sudo bash $SCRIPT_NAME"
echo ""

View File

@@ -0,0 +1,137 @@
#!/bin/bash
#
# K3s Master 节点快速加入脚本 (用于 HA 集群)
# 使用方法: sudo bash join-master.sh
#
set -e
# 颜色输出
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}K3s Master 节点加入脚本 (HA)${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
# 检查是否为 root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}错误: 请使用 sudo 运行此脚本${NC}"
exit 1
fi
# 配置信息
FIRST_MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo -e "${YELLOW}第一个 Master 节点 IP: ${FIRST_MASTER_IP}${NC}"
echo ""
# 获取负载均衡器 IP
read -p "请输入负载均衡器 IP 地址: " LB_IP
if [ -z "$LB_IP" ]; then
echo -e "${RED}错误: 负载均衡器 IP 不能为空${NC}"
exit 1
fi
echo -e "${YELLOW}负载均衡器 IP: ${LB_IP}${NC}"
echo ""
# 1. 检查网络连通性
echo -e "${YELLOW}[1/6] 检查网络连通性...${NC}"
if ping -c 2 ${FIRST_MASTER_IP} > /dev/null 2>&1; then
echo -e "${GREEN}✓ 可以连接到第一个 Master 节点${NC}"
else
echo -e "${RED}✗ 无法连接到第一个 Master 节点 ${FIRST_MASTER_IP}${NC}"
exit 1
fi
if ping -c 2 ${LB_IP} > /dev/null 2>&1; then
echo -e "${GREEN}✓ 可以连接到负载均衡器${NC}"
else
echo -e "${RED}✗ 无法连接到负载均衡器 ${LB_IP}${NC}"
exit 1
fi
# 2. 检查端口
echo -e "${YELLOW}[2/6] 检查端口...${NC}"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${FIRST_MASTER_IP}/6443" 2>/dev/null; then
echo -e "${GREEN}✓ Master 节点端口 6443 可访问${NC}"
else
echo -e "${RED}✗ Master 节点端口 6443 无法访问${NC}"
exit 1
fi
# 3. 系统准备
echo -e "${YELLOW}[3/6] 准备系统环境...${NC}"
# 禁用 swap
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
echo -e "${GREEN}✓ 已禁用 swap${NC}"
# 安装依赖
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common > /dev/null 2>&1
systemctl enable --now iscsid > /dev/null 2>&1
echo -e "${GREEN}✓ 已安装必要依赖${NC}"
# 4. 设置主机名
echo -e "${YELLOW}[4/6] 配置主机名...${NC}"
read -p "请输入此节点的主机名 (例如: master-2): " HOSTNAME
if [ -n "$HOSTNAME" ]; then
hostnamectl set-hostname $HOSTNAME
echo -e "${GREEN}✓ 主机名已设置为: $HOSTNAME${NC}"
else
echo -e "${YELLOW}⚠ 跳过主机名设置${NC}"
fi
# 5. 安装 k3s server
echo -e "${YELLOW}[5/6] 安装 k3s server (HA 模式)...${NC}"
echo -e "${YELLOW}这可能需要几分钟时间...${NC}"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${FIRST_MASTER_IP}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644 > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo -e "${GREEN}✓ k3s server 安装成功${NC}"
else
echo -e "${RED}✗ k3s server 安装失败${NC}"
exit 1
fi
# 6. 验证安装
echo -e "${YELLOW}[6/6] 验证安装...${NC}"
sleep 15
if systemctl is-active --quiet k3s; then
echo -e "${GREEN}✓ k3s 服务运行正常${NC}"
else
echo -e "${RED}✗ k3s 服务未运行${NC}"
echo -e "${YELLOW}查看日志: sudo journalctl -u k3s -f${NC}"
exit 1
fi
echo ""
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}✓ Master 节点加入成功!${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
echo -e "${YELLOW}下一步操作:${NC}"
echo -e "1. 在任意 Master 节点执行以下命令查看节点状态:"
echo -e " ${GREEN}kubectl get nodes${NC}"
echo ""
echo -e "2. 检查 etcd 集群状态:"
echo -e " ${GREEN}kubectl get pods -n kube-system | grep etcd${NC}"
echo ""
echo -e "3. 查看节点详细信息:"
echo -e " ${GREEN}kubectl describe node $HOSTNAME${NC}"
echo ""
echo -e "4. 更新负载均衡器配置,添加此节点的 IP"
echo ""

View File

@@ -0,0 +1,116 @@
#!/bin/bash
#
# K3s Worker 节点快速加入脚本
# 使用方法: sudo bash join-worker.sh
#
set -e
# 颜色输出
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}K3s Worker 节点加入脚本${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
# 检查是否为 root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}错误: 请使用 sudo 运行此脚本${NC}"
exit 1
fi
# 配置信息
MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo -e "${YELLOW}Master 节点 IP: ${MASTER_IP}${NC}"
echo ""
# 1. 检查网络连通性
echo -e "${YELLOW}[1/6] 检查网络连通性...${NC}"
if ping -c 2 ${MASTER_IP} > /dev/null 2>&1; then
echo -e "${GREEN}✓ 网络连通正常${NC}"
else
echo -e "${RED}✗ 无法连接到 Master 节点 ${MASTER_IP}${NC}"
exit 1
fi
# 2. 检查端口
echo -e "${YELLOW}[2/6] 检查 Master 节点端口 6443...${NC}"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${MASTER_IP}/6443" 2>/dev/null; then
echo -e "${GREEN}✓ 端口 6443 可访问${NC}"
else
echo -e "${RED}✗ 端口 6443 无法访问,请检查防火墙${NC}"
exit 1
fi
# 3. 系统准备
echo -e "${YELLOW}[3/6] 准备系统环境...${NC}"
# 禁用 swap
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
echo -e "${GREEN}✓ 已禁用 swap${NC}"
# 安装依赖
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common > /dev/null 2>&1
systemctl enable --now iscsid > /dev/null 2>&1
echo -e "${GREEN}✓ 已安装必要依赖${NC}"
# 4. 设置主机名
echo -e "${YELLOW}[4/6] 配置主机名...${NC}"
read -p "请输入此节点的主机名 (例如: worker-1): " HOSTNAME
if [ -n "$HOSTNAME" ]; then
hostnamectl set-hostname $HOSTNAME
echo -e "${GREEN}✓ 主机名已设置为: $HOSTNAME${NC}"
else
echo -e "${YELLOW}⚠ 跳过主机名设置${NC}"
fi
# 5. 安装 k3s agent
echo -e "${YELLOW}[5/6] 安装 k3s agent...${NC}"
echo -e "${YELLOW}这可能需要几分钟时间...${NC}"
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} \
sh - > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo -e "${GREEN}✓ k3s agent 安装成功${NC}"
else
echo -e "${RED}✗ k3s agent 安装失败${NC}"
exit 1
fi
# 6. 验证安装
echo -e "${YELLOW}[6/6] 验证安装...${NC}"
sleep 10
if systemctl is-active --quiet k3s-agent; then
echo -e "${GREEN}✓ k3s-agent 服务运行正常${NC}"
else
echo -e "${RED}✗ k3s-agent 服务未运行${NC}"
echo -e "${YELLOW}查看日志: sudo journalctl -u k3s-agent -f${NC}"
exit 1
fi
echo ""
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}✓ Worker 节点加入成功!${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
echo -e "${YELLOW}下一步操作:${NC}"
echo -e "1. 在 Master 节点执行以下命令查看节点状态:"
echo -e " ${GREEN}kubectl get nodes${NC}"
echo ""
echo -e "2. 为节点添加标签 (在 Master 节点执行):"
echo -e " ${GREEN}kubectl label nodes $HOSTNAME node-role.kubernetes.io/worker=worker${NC}"
echo ""
echo -e "3. 查看节点详细信息:"
echo -e " ${GREEN}kubectl describe node $HOSTNAME${NC}"
echo ""

View File

@@ -0,0 +1,88 @@
#!/bin/bash
# 项目状态检查脚本
# 扫描仓库并显示项目状态、部署情况、文档完整性等
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║ K3s Monorepo - 项目状态 ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
# 检查已部署的应用
echo "📦 已部署应用:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
if command -v kubectl &> /dev/null; then
kubectl get deployments -A 2>/dev/null | grep -E "(php-test|go01|wordpress|registry|n8n|gitea)" | \
awk '{printf " ✅ %-25s %-15s %s/%s replicas\n", $2, $1, $4, $3}' || echo " ⚠️ 无法获取部署信息"
else
echo " ⚠️ kubectl 未安装,无法检查部署状态"
fi
echo ""
echo "📱 应用项目:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# 检查每个应用目录
for dir in php-test go01 rails/*/ www; do
if [ -d "$dir" ]; then
name=$(basename "$dir")
readme=""
dockerfile=""
k8s=""
[ -f "$dir/README.md" ] && readme="📄" || readme=" "
[ -f "$dir/Dockerfile" ] && dockerfile="🐳" || dockerfile=" "
[ -d "$dir/k8s" ] || [ -f "$dir/k8s-deployment.yaml" ] && k8s="☸️ " || k8s=" "
printf " %-30s %s %s %s\n" "$name" "$readme" "$dockerfile" "$k8s"
fi
done
echo ""
echo "🏗️ 基础设施服务:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
for dir in k3s/*/; do
if [ -d "$dir" ]; then
name=$(basename "$dir")
yaml_count=$(find "$dir" -name "*.yaml" 2>/dev/null | wc -l)
printf " %-30s %2d YAML 文件\n" "$name" "$yaml_count"
fi
done
echo ""
echo "🛠️ 平台工具:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
for dir in traefik kuboard proxy; do
if [ -d "$dir" ]; then
yaml_count=$(find "$dir" -name "*.yaml" 2>/dev/null | wc -l)
printf " %-30s %2d YAML 文件\n" "$dir" "$yaml_count"
fi
done
echo ""
echo "📊 统计信息:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
total_yaml=$(find . -name "*.yaml" -type f 2>/dev/null | wc -l)
total_md=$(find . -name "*.md" -type f 2>/dev/null | wc -l)
total_sh=$(find . -name "*.sh" -type f 2>/dev/null | wc -l)
total_dockerfile=$(find . -name "Dockerfile" -type f 2>/dev/null | wc -l)
echo " YAML 配置文件: $total_yaml"
echo " Markdown 文档: $total_md"
echo " Shell 脚本: $total_sh"
echo " Dockerfile: $total_dockerfile"
echo ""
echo "💡 提示:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " 📄 = 有 README 文档"
echo " 🐳 = 有 Dockerfile"
echo " ☸️ = 有 Kubernetes 配置"
echo ""
echo " 查看详细信息: cat PROJECT-INDEX.md"
echo " 查看目录结构: ./scripts/project-tree.sh"
echo " 查看集群状态: make status"
echo ""

View File

@@ -0,0 +1,59 @@
#!/bin/bash
# 目录树生成脚本
# 生成清晰的项目目录结构,过滤掉不必要的文件
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║ K3s Monorepo - 目录结构 ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
# 检查 tree 命令是否存在
if ! command -v tree &> /dev/null; then
echo "⚠️ tree 命令未安装"
echo ""
echo "安装方法:"
echo " Ubuntu/Debian: sudo apt-get install tree"
echo " CentOS/RHEL: sudo yum install tree"
echo " macOS: brew install tree"
echo ""
echo "使用 ls 命令替代显示:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
ls -lh --group-directories-first 2>/dev/null || ls -lh
exit 0
fi
echo "📁 根目录结构 (2层):"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
tree -L 2 -I 'node_modules|.git|__pycache__|*.pyc|.claude' \
--dirsfirst \
-F \
/home/fei/k3s
echo ""
echo "📱 应用项目详情:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "php-test/ (PHP 应用):"
tree -L 2 -I 'node_modules|.git' php-test 2>/dev/null || echo " 目录不存在"
echo ""
echo "go01/ (Go 应用):"
tree -L 2 -I 'node_modules|.git' go01 2>/dev/null || echo " 目录不存在"
echo ""
echo "🏗️ 基础设施配置:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "k3s/ (基础设施):"
tree -L 2 -I 'node_modules|.git' k3s 2>/dev/null || echo " 目录不存在"
echo ""
echo "💡 提示:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " 查看完整结构: tree -I 'node_modules|.git'"
echo " 查看项目索引: cat PROJECT-INDEX.md"
echo " 查看项目状态: ./scripts/project-status.sh"
echo ""

228
005-docs/notes/biji.md Normal file
View File

@@ -0,0 +1,228 @@
k3s集群部署完成!
📊 部署总结
✅ 集群状态
- k3s版本: v1.34.3+k3s1
- 节点状态: Ready (control-plane)
- 运行中的Pods: 35个
- 存储系统: Longhorn (分布式存储)
- 证书管理: cert-manager + Let's Encrypt
🚀 已部署的服务
所有服务都已实现无状态化部署,数据和存储完全解耦:
┌─────────────────┬───────────────────────────────┬────────────┬─────────────┬─────────────────┐
│ 服务 │ 域名 │ 状态 │ 存储 │ HTTPS │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ WordPress博客 │ https://blog.u6.net3w.com │ ✅ Running │ 2Gi (MySQL) │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ Docker Registry │ https://registry.u6.net3w.com │ ✅ Running │ 20Gi │ ⏳ (证书申请中) │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ n8n工作流 │ https://n8n.u6.net3w.com │ ✅ Running │ 5Gi │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ Gitea代码仓库 │ https://git.u6.net3w.com │ ✅ Running │ 10Gi │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ Uptime Kuma监控 │ https://status.u6.net3w.com │ ✅ Running │ 2Gi │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ 导航页 │ https://nav.u6.net3w.com │ ✅ Running │ - │ ✅ │
└─────────────────┴───────────────────────────────┴────────────┴─────────────┴─────────────────┘
💾 存储配置
所有数据都使用Longhorn持久化存储,实现了容器与数据的完全解耦:
- MySQL数据库: 2Gi独立存储卷
- Docker镜像仓库: 20Gi独立存储卷
- Gitea代码仓库: 10Gi独立存储卷
- n8n工作流数据: 5Gi独立存储卷
- Uptime Kuma监控数据: 2Gi独立存储卷
🔐 安全配置
- ✅ 自动HTTPS证书(Let's Encrypt)
- ✅ Traefik Ingress Controller
- ✅ Docker Registry认证(用户名: admin, 密码: 123456)
📝 常用命令
# 查看所有服务
kubectl get pods -A
# 查看存储卷
kubectl get pvc -A
# 查看域名和证书
kubectl get ingress -A
kubectl get certificate -A
# 查看Longhorn UI (可选)
kubectl get svc -n longhorn-system
⚠️ 注意事项
1. Registry证书: registry.u6.net3w.com的SSL证书还在申请中,稍等几分钟会自动完成
2. DNS解析: 确保所有*.u6.net3w.com子域名都已正确解析到 134.195.210.237
3. 数据持久化: 所有数据都存储在Longhorn卷中,容器重启不会丢失数据
所有服务已成功实现无状态化部署,可以随时重启、扩容或迁移! 🎊
● ✅ 部署完成!
Registry 账号密码
- 用户名: admin
- 密码: 123456
Web 界面访问地址
🌐 https://registry-ui.u6.net3w.com
部署状态
- ✅ Registry UI Pod 运行中
- ✅ Service 已创建
- ✅ Ingress 已配置
- ✅ SSL 证书已自动签发(Let's Encrypt)
使用说明
1. 访问 Web 界面
- 打开浏览器访问: https://registry-ui.u6.net3w.com
- 现在可以看到漂亮的 Web 界面,不再是空白页面
2. 浏览镜像
- UI 会自动连接到您的 Registry
- 可以查看所有已推送的镜像
- 支持搜索和过滤功能
3. Docker 命令使用
# 登录 Registry
docker login registry.u6.net3w.com
# 用户名: admin
# 密码: 123456
# 推送镜像
docker tag my-image:latest registry.u6.net3w.com/my-image:latest
docker push registry.u6.net3w.com/my-image:latest
# 拉取镜像
docker pull registry.u6.net3w.com/my-image:latest
配置文件位置
- Registry 配置: /home/fei/k3s/k3s/registry/registry-stack.yaml
- UI 配置: /home/fei/k3s/k3s/registry/registry-ui.yaml
- 密码文件: /home/fei/k3s/k3s/registry/hardcode-secret.yaml
现在您可以通过 Web 界面方便地管理 Docker 镜像了!
认证已经成功!
我刚才测试了 docker login登录成功了
docker login registry.u6.net3w.com --username admin --password 123456
# 输出: Login Succeeded ✅
可能的问题原因
您遇到 401 错误可能是因为:
1. 密码输入错误 - 在交互式输入时可能有隐藏字符或空格
2. Docker 缓存 - 可能缓存了旧的错误认证信息
解决方案
请尝试以下方法:
方法 1:清除旧的认证信息后重试
# 先登出
sudo docker logout registry.u6.net3w.com
# 再登录
sudo docker login registry.u6.net3w.com
# Username: admin
# Password: 123456
方法 2:使用命令行参数(推荐)
echo "123456" | sudo docker login registry.u6.net3w.com --username admin --password-stdin
方法 3:直接使用密码参数
sudo docker login registry.u6.net3w.com --username admin --password 123456
验证登录成功后
登录成功后,您可以:
# 测试推送镜像
sudo docker tag nginx:latest registry.u6.net3w.com/nginx:test
sudo docker push registry.u6.net3w.com/nginx:test
# 查看 Web UI
# 访问: https://registry-ui.u6.net3w.com
请尝试上述方法之一,应该就能成功登录了!如果还有问题,请告诉我具体的错误信息。
● Background command "Monitor registry logs in real-time" completed (exit code 0)
● 我看到后台日志监控任务已完成。让我读取完整的日志输出。
● 完美!从日志中我可以看到 Docker 登录已经成功了!
日志分析
查看日志的第 35-39 行,显示了完整的 Docker 登录流程:
1. 第一次请求(401):
"GET /v2/" HTTP/1.1" 401 87 "" "Docker-Client/28.2.2 (linux)"
Docker 客户端先发送不带认证的请求,收到 401 要求认证
2. 第二次请求(200 成功):
level=info msg="authorized request" ... "Docker-Client/28.2.2 (linux)"
"GET /v2/" HTTP/1.1" 200 2 "" "Docker-Client/28.2.2 (linux)"
Docker 客户端发送带认证的请求,认证成功!
总结
✅ 问题已解决
- Registry 认证配置正确
- 账号密码admin / 123456
- Docker 登录测试成功
您现在可以使用的服务
1. Docker Registry API
sudo docker login registry.u6.net3w.com
# Username: admin
# Password: 123456
2. Web UI 界面
- 访问:https://registry-ui.u6.net3w.com
- 可视化管理您的镜像
使用示例
# 推送镜像
sudo docker tag my-app:latest registry.u6.net3w.com/my-app:latest
sudo docker push registry.u6.net3w.com/my-app:latest
# 拉取镜像
sudo docker pull registry.u6.net3w.com/my-app:latest
如果您之前遇到的 401 错误还在,请尝试我之前建议的方法(先 logout 再 login),或使用 --password-stdin。Registry 本身已经完全正常工作了!

94
005-docs/notes/gitops.md Normal file
View File

@@ -0,0 +1,94 @@
完美!GitOps 自动部署环境已经完全配置好了!
🎉 GitOps 实现条件评估结果
✅ 所有条件已满足!
┌────────────────┬──────┬──────────────────────────────┐
│ 条件 │ 状态 │ 说明 │
├────────────────┼──────┼──────────────────────────────┤
│ k3s 集群 │ ✅ │ v1.34.3+k3s1 运行中 │
├────────────────┼──────┼──────────────────────────────┤
│ Git 仓库 │ ✅ │ 本地仓库已初始化,配置已提交 │
├────────────────┼──────┼──────────────────────────────┤
│ Gitea 代码托管 │ ✅ │ https://git.u6.net3w.com │
├────────────────┼──────┼──────────────────────────────┤
│ ArgoCD │ ✅ │ https://argocd.u6.net3w.com │
├────────────────┼──────┼──────────────────────────────┤
│ 自动同步配置 │ ✅ │ argocd-app.yaml 已创建 │
├────────────────┼──────┼──────────────────────────────┤
│ HTTPS 证书 │ ✅ │ Let's Encrypt 自动签发 │
└────────────────┴──────┴──────────────────────────────┘
🚀 下一步操作指南
1⃣ 访问 Gitea 创建仓库
访问: https://git.u6.net3w.com
1. 注册账号(首次访问)
2. 创建新仓库: k3s-configs
3. 记录仓库 URL
2⃣ 推送代码到 Gitea
# 添加远程仓库(替换为你的用户名)
git remote add origin https://git.u6.net3w.com/你的用户名/k3s-configs.git
# 推送代码
git push -u origin master
3⃣ 配置 ArgoCD
访问: https://argocd.u6.net3w.com
用户名: admin
密码: R5ALO8NIRi4KoN8Q
步骤:
1. Settings → Repositories → Connect Repo
2. 填入 Gitea 仓库信息
3. 应用 ArgoCD Application:
kubectl apply -f k3s/argocd-app.yaml
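如果更习惯命令行,下面给出一个连接仓库并创建 Application 的示例草稿(假设仓库为 https://git.u6.net3w.com/fei/k3s-configs.git、清单路径为 k3s/;argocd CLI 的登录参数和 argocd-app.yaml 的实际内容以你的环境为准):
# 用 argocd CLI 注册 Gitea 仓库(密码/Token 为占位符)
argocd login argocd.u6.net3w.com --username admin --grpc-web
argocd repo add https://git.u6.net3w.com/fei/k3s-configs.git --username fei --password '<你的Gitea密码或Token>'
# 与 argocd-app.yaml 等价的最小 Application 草稿(自动同步 + 自动修剪)
kubectl apply -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k3s-configs
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.u6.net3w.com/fei/k3s-configs.git
    targetRevision: HEAD
    path: k3s
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF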
4⃣ 测试自动部署
# 修改任意配置文件
vim k3s/my-blog/02-wordpress.yaml
# 提交并推送
git add .
git commit -m "Test auto-sync"
git push
# ArgoCD 会在 3 分钟内自动同步到集群!
📊 当前部署状态
✅ k3s 集群: 运行中 (35+ pods)
✅ ArgoCD: https://argocd.u6.net3w.com
✅ Gitea: https://git.u6.net3w.com
✅ WordPress: https://blog.u6.net3w.com
✅ Docker Registry: https://registry.u6.net3w.com
✅ n8n: https://n8n.u6.net3w.com
✅ Uptime Kuma: https://status.u6.net3w.com
✅ 导航页: https://nav.u6.net3w.com
🎯 GitOps 工作流
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  修改 YAML  │ ───> │  Git Push   │ ───> │    Gitea    │
└─────────────┘      └─────────────┘      └─────────────┘
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  k3s 集群   │ <─── │   ArgoCD    │ <─── │  自动检测   │
│  更新完成   │      │  自动同步   │      │    变更     │
└─────────────┘      └─────────────┘      └─────────────┘
所有实现条件已满足! 你现在可以实现:
- ✅ 更新 YAML 到 Git
- ✅ 自动同步到生产环境
- ✅ 版本控制和快速回滚
- ✅ 完整的审计追踪
详细使用说明请查看 README.md 文件!🎊

View File

@@ -0,0 +1,844 @@
# K3s 集群扩展指南
## 📋 目录
- [当前集群状态](#当前集群状态)
- [前置条件](#前置条件)
- [架构设计方案](#架构设计方案)
- [2节点集群(1 Master + 2 Worker)](#2节点集群1-master--2-worker)
- [4节点集群(3 Master + 4 Worker)](#4节点集群3-master--4-worker)
- [6节点集群(3 Master + 6 Worker)](#6节点集群3-master--6-worker)
- [节点加入步骤](#节点加入步骤)
- [高可用配置](#高可用配置)
- [存储配置](#存储配置)
- [验证和测试](#验证和测试)
- [故障排查](#故障排查)
---
## 📊 当前集群状态
```
Master 节点: vmus9
IP 地址: 134.195.210.237
k3s 版本: v1.34.3+k3s1
节点令牌: K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d
```
**重要**: 请妥善保管节点令牌,这是其他节点加入集群的凭证!
---
## ✅ 前置条件
### 所有新节点需要满足:
#### 1. 硬件要求
```
最低配置:
- CPU: 2 核
- 内存: 2GB (建议 4GB+)
- 磁盘: 20GB (Longhorn 存储建议 50GB+)
推荐配置:
- CPU: 4 核
- 内存: 8GB
- 磁盘: 100GB SSD
```
#### 2. 操作系统
```bash
# 支持的系统
- Ubuntu 20.04/22.04/24.04
- Debian 10/11/12
- CentOS 7/8
- RHEL 7/8
# 检查系统版本
cat /etc/os-release
```
#### 3. 网络要求
```bash
# 所有节点之间需要能够互相访问
# 需要开放的端口:
Master 节点:
- 6443: Kubernetes API Server
- 10250: Kubelet metrics
- 2379-2380: etcd (仅 HA 模式)
Worker 节点:
- 10250: Kubelet metrics
- 30000-32767: NodePort Services
所有节点:
- 8472: Flannel VXLAN (UDP)
- 51820: Flannel WireGuard (UDP)
```
#### 4. 系统准备
在每个新节点上执行:
```bash
# 1. 更新系统
sudo apt update && sudo apt upgrade -y
# 2. 禁用 swap (k8s 要求)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# 3. 配置主机名 (每个节点不同)
sudo hostnamectl set-hostname worker-node-1
# 4. 配置时间同步
sudo apt install -y chrony
sudo systemctl enable --now chrony
# 5. 安装必要工具
sudo apt install -y curl wget git
# 6. 配置防火墙 (如果启用)
# Ubuntu/Debian
sudo ufw allow 6443/tcp
sudo ufw allow 10250/tcp
sudo ufw allow 8472/udp
sudo ufw allow 51820/udp
```
---
## 🏗️ 架构设计方案
### 方案一:2节点集群(1 Master + 2 Worker)
**适用场景**: 开发/测试环境,小型应用
```
┌─────────────────────────────────────────────────┐
│ 负载均衡 (可选) │
│ *.u6.net3w.com (Traefik) │
└─────────────────────────────────────────────────┘
┌─────────────┼─────────────┐
│ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐
│ Master │ │ Worker-1 │ │ Worker-2 │
│ vmus9 │ │ │ │ │
│ Control Plane│ │ 应用负载 │ │ 应用负载 │
│ + etcd │ │ │ │ │
│ 134.195.x.x │ │ 新节点1 │ │ 新节点2 │
└──────────────┘ └──────────┘ └──────────┘
```
**特点**:
- ✅ 简单易维护
- ✅ 成本低
- ❌ Master 单点故障
- ❌ 不适合生产环境
**资源分配建议**:
- Master: 4C8G (运行控制平面 + 部分应用)
- Worker-1: 4C8G (运行应用负载)
- Worker-2: 4C8G (运行应用负载)
---
### 方案二:4节点集群(3 Master + 4 Worker)
**适用场景**: 生产环境,中等规模应用
```
┌──────────────────────────────────────────────────┐
│ 外部负载均衡 (必需) │
│ HAProxy/Nginx/云厂商 LB │
│ *.u6.net3w.com │
└──────────────────────────────────────────────────┘
┌─────────────┼─────────────┬─────────────┐
│ │ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐ ┌─────▼────┐
│ Master-1 │ │ Master-2 │ │ Master-3 │ │ Worker-1 │
│ vmus9 │ │ │ │ │ │ │
│ Control Plane│ │ Control │ │ Control │ │ 应用负载 │
│ + etcd │ │ + etcd │ │ + etcd │ │ │
└──────────────┘ └──────────┘ └──────────┘ └──────────┘
┌──────────┐
│ Worker-2 │
│ 应用负载 │
└──────────┘
┌──────────┐
│ Worker-3 │
│ 应用负载 │
└──────────┘
┌──────────┐
│ Worker-4 │
│ 应用负载 │
└──────────┘
```
**特点**:
- ✅ 高可用 (HA)
- ✅ Master 节点冗余
- ✅ 适合生产环境
- ✅ 可承载中等规模应用
- ⚠️ 需要外部负载均衡
**资源分配建议**:
- Master-1/2/3: 4C8G (仅运行控制平面)
- Worker-1/2/3/4: 8C16G (运行应用负载)
**etcd 集群**: 3 个 Master 节点组成 etcd 集群,可容忍 1 个节点故障
---
### 方案三:6节点集群(3 Master + 6 Worker)
**适用场景**: 大规模生产环境,高负载应用
```
┌──────────────────────────────────────────────────┐
│ 外部负载均衡 (必需) │
│ HAProxy/Nginx/云厂商 LB │
│ *.u6.net3w.com │
└──────────────────────────────────────────────────┘
┌─────────────┼─────────────┬─────────────┐
│ │ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐ │
│ Master-1 │ │ Master-2 │ │ Master-3 │ │
│ vmus9 │ │ │ │ │ │
│ Control Plane│ │ Control │ │ Control │ │
│ + etcd │ │ + etcd │ │ + etcd │ │
└──────────────┘ └──────────┘ └──────────┘ │
┌─────────────┬─────────────┬─────────────┘
│ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐
│ Worker-1 │ │ Worker-2 │ │ Worker-3 │
│ Web 应用层 │ │ Web 层 │ │ Web 层 │
└──────────────┘ └──────────┘ └──────────┘
┌──────────────┐ ┌──────────┐ ┌──────────┐
│ Worker-4 │ │ Worker-5 │ │ Worker-6 │
│ 数据库层 │ │ 缓存层 │ │ 存储层 │
└──────────────┘ └──────────┘ └──────────┘
```
**特点**:
- ✅ 高可用 + 高性能
- ✅ 可按功能分层部署
- ✅ 支持大规模应用
- ✅ Longhorn 存储性能最佳
- ⚠️ 管理复杂度较高
- ⚠️ 成本较高
**资源分配建议**:
- Master-1/2/3: 4C8G (专用控制平面)
- Worker-1/2/3: 8C16G (Web 应用层)
- Worker-4: 8C32G (数据库层,高内存)
- Worker-5: 8C16G (缓存层)
- Worker-6: 4C8G + 200GB SSD (存储层)
**节点标签策略**:
```bash
# Web 层
kubectl label nodes worker-1 node-role=web
kubectl label nodes worker-2 node-role=web
kubectl label nodes worker-3 node-role=web
# 数据库层
kubectl label nodes worker-4 node-role=database
# 缓存层
kubectl label nodes worker-5 node-role=cache
# 存储层
kubectl label nodes worker-6 node-role=storage
```
---
## 🚀 节点加入步骤
### 场景 A: 加入 Worker 节点(适用于 2 节点方案)
#### 在新节点上执行:
```bash
# 1. 设置 Master 节点信息
export MASTER_IP="134.195.210.237"
export NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
# 2. 安装 k3s agent (Worker 节点)
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} \
sh -
# 3. 验证安装
sudo systemctl status k3s-agent
# 4. 检查节点是否加入
# (在 Master 节点执行)
kubectl get nodes
```
#### 为 Worker 节点添加标签:
```bash
# 在 Master 节点执行
kubectl label nodes <worker-node-name> node-role.kubernetes.io/worker=worker
kubectl label nodes <worker-node-name> workload=application
```
---
### 场景 B: 加入 Master 节点(适用于 4/6 节点 HA 方案)
#### 前提条件:需要外部负载均衡器
##### 1. 配置外部负载均衡器
**选项 1: 使用 HAProxy**
在一台独立服务器上安装 HAProxy
```bash
# 安装 HAProxy
sudo apt install -y haproxy
# 配置 HAProxy
sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<EOF
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
frontend k3s-api
bind *:6443
mode tcp
default_backend k3s-masters
backend k3s-masters
mode tcp
balance roundrobin
option tcp-check
server master-1 134.195.210.237:6443 check fall 3 rise 2
server master-2 <MASTER-2-IP>:6443 check fall 3 rise 2
server master-3 <MASTER-3-IP>:6443 check fall 3 rise 2
EOF
# 重启 HAProxy
sudo systemctl restart haproxy
sudo systemctl enable haproxy
```
**选项 2: 使用 Nginx**
```bash
# 安装 Nginx
sudo apt install -y nginx
# 配置 Nginx Stream
sudo tee /etc/nginx/nginx.conf > /dev/null <<EOF
stream {
upstream k3s_servers {
server 134.195.210.237:6443 max_fails=3 fail_timeout=5s;
server <MASTER-2-IP>:6443 max_fails=3 fail_timeout=5s;
server <MASTER-3-IP>:6443 max_fails=3 fail_timeout=5s;
}
server {
listen 6443;
proxy_pass k3s_servers;
}
}
EOF
# 重启 Nginx
sudo systemctl restart nginx
```
##### 2. 在第一个 Master 节点(当前节点)启用 HA
```bash
# 在当前 Master 节点执行
export LB_IP="<负载均衡器IP>"
# 重新安装 k3s 为 HA 模式
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
# 获取新的 token
sudo cat /var/lib/rancher/k3s/server/node-token
```
##### 3. 加入第二个 Master 节点
```bash
# 在新的 Master 节点执行
export MASTER_IP="134.195.210.237" # 第一个 Master
export LB_IP="<负载均衡器IP>"
export NODE_TOKEN="<新的 token>"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${MASTER_IP}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
```
##### 4. 加入第三个 Master 节点
```bash
# 在第三个 Master 节点执行(同上)
export MASTER_IP="134.195.210.237"
export LB_IP="<负载均衡器IP>"
export NODE_TOKEN="<token>"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${MASTER_IP}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
```
##### 5. 验证 HA 集群
```bash
# 检查所有 Master 节点
kubectl get nodes
# 检查 etcd 集群状态
kubectl get pods -n kube-system | grep etcd
# 创建一次 etcd 快照,验证 etcd 可用
sudo k3s etcd-snapshot save --etcd-s3=false
```
---
### 场景 C: 混合加入(先加 Master 再加 Worker)
**推荐顺序**:
1. 配置外部负载均衡器
2. 转换第一个节点为 HA 模式
3. 加入第 2、3 个 Master 节点
4. 验证 Master 集群正常
5. 依次加入 Worker 节点
---
## 💾 存储配置
### Longhorn 多节点配置
当集群有 3+ 节点时Longhorn 可以提供分布式存储和数据冗余。
#### 1. 在所有节点安装依赖
```bash
# 在每个节点执行
sudo apt install -y open-iscsi nfs-common
# 启动 iscsid
sudo systemctl enable --now iscsid
```
#### 2. 配置 Longhorn 副本数
```bash
# 在 Master 节点执行
kubectl edit settings.longhorn.io default-replica-count -n longhorn-system
# 修改为:
# value: "3" # 3 副本(需要至少 3 个节点)
# value: "2" # 2 副本(需要至少 2 个节点)
```
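如果不想交互式编辑,也可以用 `kubectl patch` 一次性修改(示例草稿;Longhorn Setting 资源的取值字段为 `value`,请以集群中实际的 CRD 为准):

```bash
# 将默认副本数改为 2(示例,字段名 value 以实际 Setting CRD 为准)
kubectl -n longhorn-system patch settings.longhorn.io default-replica-count \
  --type=merge -p '{"value":"2"}'
```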
#### 3. 为节点添加存储标签
```bash
# 标记哪些节点用于存储
kubectl label nodes worker-1 node.longhorn.io/create-default-disk=true
kubectl label nodes worker-2 node.longhorn.io/create-default-disk=true
kubectl label nodes worker-3 node.longhorn.io/create-default-disk=true
# 排除某些节点(如纯计算节点)
kubectl label nodes worker-4 node.longhorn.io/create-default-disk=false
```
#### 4. 配置存储路径
```bash
# 在每个存储节点创建目录
sudo mkdir -p /var/lib/longhorn
sudo chmod 700 /var/lib/longhorn
```
#### 5. 访问 Longhorn UI
```bash
# 创建 Ingress (如果还没有)
kubectl apply -f k3s/my-blog/longhorn-ingress.yaml
# 访问: https://longhorn.u6.net3w.com
```
---
## ✅ 验证和测试
### 1. 检查节点状态
```bash
# 查看所有节点
kubectl get nodes -o wide
# 查看节点详细信息
kubectl describe node <node-name>
# 查看节点资源使用
kubectl top nodes
```
### 2. 测试 Pod 调度
```bash
# 创建测试 Deployment
kubectl create deployment nginx-test --image=nginx --replicas=6
# 查看 Pod 分布
kubectl get pods -o wide
# 清理测试
kubectl delete deployment nginx-test
```
### 3. 测试存储
```bash
# 创建测试 PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 1Gi
EOF
# 检查 PVC 状态
kubectl get pvc test-pvc
# 清理
kubectl delete pvc test-pvc
```
### 4. 测试高可用(仅 HA 集群)
```bash
# 模拟 Master 节点故障
# 在一个 Master 节点执行
sudo systemctl stop k3s
# 在另一个节点检查集群是否正常
kubectl get nodes
# 恢复节点
sudo systemctl start k3s
```
### 5. 测试网络连通性
```bash
# 在 Master 节点创建测试 Pod
kubectl run test-pod --image=busybox --restart=Never -- sleep 3600
# 进入 Pod 测试网络
kubectl exec -it test-pod -- sh
# 在 Pod 内测试
ping 8.8.8.8
nslookup kubernetes.default
# 清理
kubectl delete pod test-pod
```
---
## 🔧 故障排查
### 问题 1: 节点无法加入集群
**症状**: `k3s-agent` 服务启动失败
**排查步骤**:
```bash
# 1. 检查服务状态
sudo systemctl status k3s-agent
# 2. 查看日志
sudo journalctl -u k3s-agent -f
# 3. 检查网络连通性
ping <MASTER_IP>
telnet <MASTER_IP> 6443
# 4. 检查 token 是否正确
echo $NODE_TOKEN
# 5. 检查防火墙
sudo ufw status
```
**解决方案**:
```bash
# 重新安装
sudo /usr/local/bin/k3s-agent-uninstall.sh
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} sh -
```
---
### 问题 2: 节点状态为 NotReady
**症状**: `kubectl get nodes` 显示节点 NotReady
**排查步骤**:
```bash
# 1. 检查节点详情
kubectl describe node <node-name>
# 2. 检查 kubelet 日志
# 在问题节点执行
sudo journalctl -u k3s-agent -n 100
# 3. 检查网络插件
kubectl get pods -n kube-system | grep flannel
```
**解决方案**:
```bash
# 重启 k3s 服务
sudo systemctl restart k3s-agent
# 如果是网络问题,检查 CNI 配置
sudo ls -la /etc/cni/net.d/
```
---
### 问题 3: Pod 无法调度到新节点
**症状**: Pod 一直 Pending 或只调度到旧节点
**排查步骤**:
```bash
# 1. 检查节点污点
kubectl describe node <node-name> | grep Taints
# 2. 检查节点标签
kubectl get nodes --show-labels
# 3. 检查 Pod 的调度约束
kubectl describe pod <pod-name>
```
**解决方案**:
```bash
# 移除污点
kubectl taint nodes <node-name> node.kubernetes.io/not-ready:NoSchedule-
# 添加标签
kubectl label nodes <node-name> node-role.kubernetes.io/worker=worker
```
---
### 问题 4: Longhorn 存储无法使用
**症状**: PVC 一直 Pending
**排查步骤**:
```bash
# 1. 检查 Longhorn 组件
kubectl get pods -n longhorn-system
# 2. 检查节点是否满足要求
kubectl get nodes -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
# 3. 检查 iscsid 服务
sudo systemctl status iscsid
```
**解决方案**:
```bash
# 在新节点安装依赖
sudo apt install -y open-iscsi
sudo systemctl enable --now iscsid
# 重启 Longhorn manager
kubectl rollout restart deployment longhorn-driver-deployer -n longhorn-system
```
---
### 问题 5: etcd 集群不健康HA 模式)
**症状**: Master 节点无法正常工作
**排查步骤**:
```bash
# 1. 查看 etcd 快照列表,确认 etcd 数据目录可访问
sudo k3s etcd-snapshot ls
# 2. 检查 etcd 日志
sudo journalctl -u k3s -n 100 | grep etcd
# 3. 检查 etcd 端口
sudo netstat -tlnp | grep 2379
```
**解决方案**:
```bash
# 从快照恢复(谨慎操作)
sudo k3s server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
```
---
## 📚 快速参考
### 常用命令
```bash
# 查看集群信息
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A
# 查看节点资源
kubectl top nodes
kubectl describe node <node-name>
# 管理节点
kubectl cordon <node-name> # 标记为不可调度
kubectl drain <node-name> # 驱逐 Pod
kubectl uncordon <node-name> # 恢复调度
# 删除节点
kubectl delete node <node-name>
# 在节点上卸载 k3s
# Worker 节点
sudo /usr/local/bin/k3s-agent-uninstall.sh
# Master 节点
sudo /usr/local/bin/k3s-uninstall.sh
```
### 节点标签示例
```bash
# 角色标签
kubectl label nodes <node> node-role.kubernetes.io/worker=worker
kubectl label nodes <node> node-role.kubernetes.io/master=master
# 功能标签
kubectl label nodes <node> workload=database
kubectl label nodes <node> workload=web
kubectl label nodes <node> workload=cache
# 区域标签
kubectl label nodes <node> topology.kubernetes.io/zone=zone-a
kubectl label nodes <node> topology.kubernetes.io/region=us-east
```
---
## 🎯 最佳实践
### 1. 节点命名规范
```
master-1, master-2, master-3
worker-1, worker-2, worker-3, ...
```
### 2. 逐步扩展
- 先加入 1 个节点测试
- 验证正常后再批量加入
- 避免同时加入多个节点
### 3. 监控和告警
```bash
# 部署 Prometheus + Grafana
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/setup/
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/
```
### 4. 定期备份
```bash
# 备份 etcd
sudo k3s etcd-snapshot save --name backup-$(date +%Y%m%d-%H%M%S)
# 查看备份
sudo k3s etcd-snapshot ls
```
### 5. 资源预留
```bash
# 为系统组件预留资源
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
name: system-quota
namespace: kube-system
spec:
hard:
requests.cpu: "2"
requests.memory: 4Gi
EOF
```
---
## 📞 获取帮助
- k3s 官方文档: https://docs.k3s.io
- Longhorn 文档: https://longhorn.io/docs
- Kubernetes 文档: https://kubernetes.io/docs
---
**文档版本**: v1.0
**最后更新**: 2026-01-21
**适用于**: k3s v1.34.3+k3s1

View File

@@ -0,0 +1,161 @@
# K3s 集群扩展快速参考
## 🚀 快速开始
### 当前集群信息
```
Master IP: 134.195.210.237
Token: K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d
```
### 一键加入脚本
#### Worker 节点(最简单)
```bash
# 在新节点上执行
sudo bash scripts/join-worker.sh
```
#### Master 节点(HA 模式)
```bash
# 在新节点上执行
sudo bash scripts/join-master.sh
```
---
## 📊 扩展方案对比
| 方案 | 节点配置 | 适用场景 | 高可用 | 成本 |
|------|---------|---------|--------|------|
| **2节点** | 1M + 2W | 开发/测试 | ❌ | 💰 |
| **4节点** | 3M + 4W | 生产环境 | ✅ | 💰💰💰 |
| **6节点** | 3M + 6W | 大规模生产 | ✅ | 💰💰💰💰 |
M = Master, W = Worker
---
## 🔧 手动加入命令
### Worker 节点
```bash
export MASTER_IP="134.195.210.237"
export NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} sh -
```
### Master 节点(需要先配置负载均衡器)
```bash
export FIRST_MASTER="134.195.210.237"
export LB_IP="<负载均衡器IP>"
export NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${FIRST_MASTER}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
```
---
## ✅ 验证命令
```bash
# 查看节点
kubectl get nodes -o wide
# 健康检查
bash scripts/check-node-health.sh
# 查看节点详情
kubectl describe node <node-name>
# 查看资源使用
kubectl top nodes
```
---
## 🏷️ 节点标签
```bash
# Worker 节点
kubectl label nodes <node> node-role.kubernetes.io/worker=worker
# 功能标签
kubectl label nodes <node> workload=web
kubectl label nodes <node> workload=database
kubectl label nodes <node> workload=cache
```
---
## 🔥 常见问题
### 节点无法加入?
```bash
# 检查网络
ping 134.195.210.237
telnet 134.195.210.237 6443
# 查看日志
sudo journalctl -u k3s-agent -f
```
### 节点 NotReady
```bash
# 重启服务
sudo systemctl restart k3s-agent
# 检查详情
kubectl describe node <node-name>
```
### 删除节点
```bash
# 在 Master 节点
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>
# 在要删除的节点
sudo /usr/local/bin/k3s-agent-uninstall.sh
```
---
## 📚 详细文档
- 完整扩展指南: [K3S-CLUSTER-EXPANSION.md](K3S-CLUSTER-EXPANSION.md)
- GitOps 自动部署: [README.md](README.md)
---
## 🎯 推荐流程
### 2节点扩展(开发环境)
1. ✅ 准备 2 台新 VPS
2. ✅ 执行 `join-worker.sh`
3. ✅ 验证节点状态
4. ✅ 测试应用部署
### 4节点扩展(生产环境)
1. ✅ 配置外部负载均衡器
2. ✅ 转换第一个节点为 HA 模式
3. ✅ 加入 2 个 Master 节点
4. ✅ 加入 4 个 Worker 节点
5. ✅ 配置 Longhorn 多副本
6. ✅ 测试高可用
### 6节点扩展(大规模)
1. ✅ 同 4 节点方案
2. ✅ 按功能分层部署
3. ✅ 配置节点亲和性
4. ✅ 优化资源分配
---
**快速帮助**: 遇到问题?运行 `bash scripts/check-node-health.sh` 进行诊断

39
006-其他/code-server.md Normal file
View File

@@ -0,0 +1,39 @@
我已经准备好了服务文件。现在请执行以下命令来完成设置:
# 1. 将服务文件复制到系统目录(需要输入密码)
sudo cp /tmp/code-server.service /etc/systemd/system/
# 2. 重载 systemd 配置
sudo systemctl daemon-reload
# 3. 启用开机自启动
sudo systemctl enable code-server
# 4. 立即启动服务
sudo systemctl start code-server
# 5. 查看服务状态
sudo systemctl status code-server
或者你可以一次性执行:
sudo cp /tmp/code-server.service /etc/systemd/system/ && \
sudo systemctl daemon-reload && \
sudo systemctl enable --now code-server && \
sudo systemctl status code-server
服务配置说明:
- 监听地址:0.0.0.0:8080(所有网络接口)
- 工作目录:/home/fei
- 运行用户:fei
- 自动重启:服务崩溃后 10 秒自动重启
常用管理命令:
- 查看状态:sudo systemctl status code-server
- 停止服务:sudo systemctl stop code-server
- 重启服务:sudo systemctl restart code-server
- 查看日志:sudo journalctl -u code-server -f
- 禁用开机启动:sudo systemctl disable code-server
如果你需要修改配置(如端口号或其他参数),可以编辑 /etc/systemd/system/code-server.service 文件,然后执行 sudo systemctl daemon-reload 和 sudo systemctl restart code-server
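如果还没有准备 /tmp/code-server.service,下面是一个与上述说明一致的示例草稿(假设 code-server 安装在 /usr/bin/code-server,路径和参数请按实际情况调整):
# 生成示例 systemd 单元文件(草稿,ExecStart 路径为假设值)
sudo tee /tmp/code-server.service > /dev/null <<'EOF'
[Unit]
Description=code-server
After=network.target

[Service]
Type=simple
User=fei
WorkingDirectory=/home/fei
ExecStart=/usr/bin/code-server --bind-addr 0.0.0.0:8080
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF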

View File

@@ -0,0 +1,429 @@
# PostgreSQL 16 K3s 部署指南
本目录包含在 K3s 集群中部署 PostgreSQL 16 数据库的完整配置文件。
## 📋 目录结构
```
001-pg16/
├── README.md # 本文件 - 部署说明
└── k8s/ # K8s 配置文件目录
├── namespace.yaml # infrastructure 命名空间
├── secret.yaml # 数据库密码
├── configmap.yaml # 初始化脚本
├── pvc.yaml # 持久化存储卷声明
├── deployment.yaml # PostgreSQL 部署配置
├── service.yaml # 服务配置
└── README.md # K8s 配置详细说明
```
## 🚀 快速部署
### 前置条件
1. **已安装 K3s**
```bash
# 检查 K3s 是否运行
sudo systemctl status k3s
# 检查节点状态
sudo kubectl get nodes
```
2. **配置 kubectl 权限**(可选,避免每次使用 sudo)
```bash
# 方法1:复制配置到用户目录(推荐)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config
chmod 600 ~/.kube/config
# 验证配置
kubectl get nodes
```
### 一键部署
```bash
# 进入配置目录
cd /path/to/001-pg16/k8s
# 部署所有资源
kubectl apply -f .
# 或者使用 sudo(如果未配置 kubectl 权限)
sudo kubectl apply -f .
```
### 查看部署状态
```bash
# 查看 Pod 状态
kubectl get pods -n infrastructure
# 查看 Pod 详细信息
kubectl describe pod -n infrastructure -l app=pg16
# 查看初始化日志(实时)
kubectl logs -n infrastructure -l app=pg16 -f
# 查看服务状态
kubectl get svc -n infrastructure
# 查看 PVC 状态
kubectl get pvc -n infrastructure
```
## ✅ 验证部署
### 1. 检查 Pod 是否运行
```bash
kubectl get pods -n infrastructure
```
期望输出:
```
NAME READY STATUS RESTARTS AGE
pg16-xxxxxxxxx-xxxxx 1/1 Running 0 2m
```
### 2. 验证数据库创建
```bash
# 统计数据库总数(应该是 303 个)
kubectl exec -n infrastructure -l app=pg16 -- psql -U postgres -c "SELECT count(*) FROM pg_database;"
# 查看前 10 个数据库
kubectl exec -n infrastructure -l app=pg16 -- psql -U postgres -c "SELECT datname FROM pg_database WHERE datname LIKE 'pg0%' ORDER BY datname LIMIT 10;"
# 查看最后 10 个数据库
kubectl exec -n infrastructure -l app=pg16 -- psql -U postgres -c "SELECT datname FROM pg_database WHERE datname LIKE 'pg2%' ORDER BY datname DESC LIMIT 10;"
```
期望结果:
- 总数据库数:303 个(300 个业务数据库 + postgres + template0 + template1)
- 数据库命名:pg001, pg002, ..., pg300
- 数据库所有者:fei
### 3. 测试数据库连接
```bash
# 方法1:直接在 Pod 内执行 SQL
kubectl exec -n infrastructure -l app=pg16 -- psql -U fei -d pg001 -c "SELECT current_database(), version();"
# 方法2:进入 Pod 交互式操作
kubectl exec -it -n infrastructure -l app=pg16 -- bash
# 在 Pod 内执行
psql -U fei -d pg001
# 退出
\q
exit
```
## 🔌 连接数据库
### 集群内部连接
从集群内其他 Pod 连接:
```
主机: pg16.infrastructure.svc.cluster.local
端口: 5432
用户: fei
密码: feiks..
数据库: pg001 ~ pg300
```
连接字符串示例:
```
postgresql://fei:feiks..@pg16.infrastructure.svc.cluster.local:5432/pg001
```
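也可以用一次性客户端 Pod 在集群内验证上面的连接字符串(示例,镜像和数据库名可按需调整):

```bash
# 启动一次性 psql 客户端测试连接,执行完自动删除
kubectl run pg-client --rm -it --image=postgres:16 --restart=Never -n infrastructure -- \
  psql "postgresql://fei:feiks..@pg16.infrastructure.svc.cluster.local:5432/pg001" -c "SELECT current_database();"
```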
### 集群外部连接
#### 方法1使用 NodePort推荐
```bash
# 获取节点 IP
kubectl get nodes -o wide
# 使用 NodePort 连接
psql -h <节点IP> -U fei -d pg001 -p 30432
```
连接信息:
- 主机:节点 IP 地址
- 端口:30432
- 用户:fei
- 密码:feiks..
#### 方法2:使用 Port Forward
```bash
# 转发端口到本地
kubectl port-forward -n infrastructure svc/pg16 5432:5432
# 在另一个终端连接
psql -h localhost -U fei -d pg001 -p 5432
```
## 📊 数据库信息
### 默认配置
- **PostgreSQL 版本**: 16
- **命名空间**: infrastructure
- **数据库数量**: 300 个(pg001 ~ pg300)
- **超级用户**: fei(密码:feiks..)
- **系统用户**: postgres(密码:adminks..)
- **持久化存储**: 20Gi(使用 K3s 默认 local-path StorageClass)
### 资源配置
- **CPU 请求**: 500m
- **CPU 限制**: 2000m
- **内存请求**: 512Mi
- **内存限制**: 2Gi
### 服务端口
- **ClusterIP 服务**: pg16,端口 5432
- **NodePort 服务**: pg16-nodeport,端口 30432
## 🔧 常用操作
### 查看日志
```bash
# 查看最近 50 行日志
kubectl logs -n infrastructure -l app=pg16 --tail=50
# 实时查看日志
kubectl logs -n infrastructure -l app=pg16 -f
# 查看上一次容器的日志(如果 Pod 重启过)
kubectl logs -n infrastructure -l app=pg16 --previous
```
### 进入容器
```bash
# 进入 PostgreSQL 容器
kubectl exec -it -n infrastructure -l app=pg16 -- bash
# 直接进入 psql
kubectl exec -it -n infrastructure -l app=pg16 -- psql -U postgres
```
### 重启 Pod
```bash
# 删除 Pod(Deployment 会自动重建)
kubectl delete pod -n infrastructure -l app=pg16
# 或者重启 Deployment
kubectl rollout restart deployment pg16 -n infrastructure
```
### 扩缩容(不推荐用于数据库)
```bash
# 查看当前副本数
kubectl get deployment pg16 -n infrastructure
# 注意:PostgreSQL 不支持多副本,保持 replicas=1
```
## 🗑️ 卸载
### 删除部署(保留数据)
```bash
# 删除 Deployment 和 Service
kubectl delete deployment pg16 -n infrastructure
kubectl delete svc pg16 pg16-nodeport -n infrastructure
# PVC 和数据会保留
```
### 完全卸载(包括数据)
```bash
# 删除所有资源
kubectl delete -f k8s/
# 或者逐个删除
kubectl delete deployment pg16 -n infrastructure
kubectl delete svc pg16 pg16-nodeport -n infrastructure
kubectl delete pvc pg16-data -n infrastructure
kubectl delete configmap pg16-init-script -n infrastructure
kubectl delete secret pg16-secret -n infrastructure
kubectl delete namespace infrastructure
```
**⚠️ 警告**: 删除 PVC 会永久删除所有数据库数据,无法恢复!
## 🔐 安全建议
### 修改默认密码
部署后建议立即修改默认密码:
```bash
# 进入 Pod
kubectl exec -it -n infrastructure -l app=pg16 -- psql -U postgres
# 修改 fei 用户密码
ALTER USER fei WITH PASSWORD '新密码';
# 修改 postgres 用户密码
ALTER USER postgres WITH PASSWORD '新密码';
# 退出
\q
```
然后更新 Secret
```bash
# 编辑 secret.yaml修改密码需要 base64 编码)
echo -n "新密码" | base64
# 更新 Secret
kubectl apply -f k8s/secret.yaml
```
### 网络安全
- 默认配置使用 NodePort 30432 暴露服务
- 生产环境建议:
- 使用防火墙限制访问 IP
- 或者删除 NodePort 服务,仅使用集群内部访问
- 配置 NetworkPolicy 限制访问(示例见下方)
```bash
# 删除 NodePort 服务(仅保留集群内访问)
kubectl delete svc pg16-nodeport -n infrastructure
```
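下面是一个 NetworkPolicy 的示例草稿(假设只允许 default 命名空间中带 `app=myapp` 标签的 Pod 访问 5432 端口;标签和命名空间均为示意,请按实际业务调整):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pg16-allow-app
  namespace: infrastructure
spec:
  podSelector:
    matchLabels:
      app: pg16
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: default
          podSelector:
            matchLabels:
              app: myapp
      ports:
        - protocol: TCP
          port: 5432
EOF
```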
## 🐛 故障排查
### Pod 无法启动
```bash
# 查看 Pod 状态
kubectl describe pod -n infrastructure -l app=pg16
# 查看事件
kubectl get events -n infrastructure --sort-by='.lastTimestamp'
# 查看日志
kubectl logs -n infrastructure -l app=pg16
```
常见问题:
- **ImagePullBackOff**: 无法拉取镜像,检查网络连接
- **CrashLoopBackOff**: 容器启动失败,查看日志
- **Pending**: PVC 无法绑定,检查存储类
### PVC 无法绑定
```bash
# 查看 PVC 状态
kubectl describe pvc pg16-data -n infrastructure
# 查看 StorageClass
kubectl get storageclass
# 检查 local-path-provisioner
kubectl get pods -n kube-system | grep local-path
```
### 数据库连接失败
```bash
# 检查服务是否正常
kubectl get svc -n infrastructure
# 检查 Pod 是否就绪
kubectl get pods -n infrastructure
# 测试集群内连接
kubectl run -it --rm debug --image=postgres:16 --restart=Never -- psql -h pg16.infrastructure.svc.cluster.local -U fei -d pg001
```
### 初始化脚本未执行
如果发现数据库未创建 300 个数据库:
```bash
# 查看初始化日志
kubectl logs -n infrastructure -l app=pg16 | grep -i "init\|create database"
# 检查 ConfigMap 是否正确挂载
kubectl exec -n infrastructure -l app=pg16 -- ls -la /docker-entrypoint-initdb.d/
# 查看脚本内容
kubectl exec -n infrastructure -l app=pg16 -- cat /docker-entrypoint-initdb.d/01-init.sh
```
**注意**: PostgreSQL 初始化脚本只在首次启动且数据目录为空时执行。如果需要重新初始化:
```bash
# 删除 Deployment 和 PVC
kubectl delete deployment pg16 -n infrastructure
kubectl delete pvc pg16-data -n infrastructure
# 重新部署
kubectl apply -f k8s/
```
## 📝 备份与恢复
### 备份单个数据库
```bash
# 备份 pg001 数据库
kubectl exec -n infrastructure -l app=pg16 -- pg_dump -U fei pg001 > pg001_backup.sql
# 备份所有数据库
kubectl exec -n infrastructure -l app=pg16 -- pg_dumpall -U postgres > all_databases_backup.sql
```
### 恢复数据库
```bash
# 恢复单个数据库
cat pg001_backup.sql | kubectl exec -i -n infrastructure -l app=pg16 -- psql -U fei pg001
# 恢复所有数据库
cat all_databases_backup.sql | kubectl exec -i -n infrastructure -l app=pg16 -- psql -U postgres
```
### 数据持久化
数据存储在 K3s 的 local-path 存储中,默认路径:
```
/var/lib/rancher/k3s/storage/pvc-<uuid>_infrastructure_pg16-data/
```
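实际路径可以通过 PVC 对应的 PV 查出来(示例):

```bash
# 找到 pg16-data 对应的 PV 名称,再查看其在宿主机上的路径
PV=$(kubectl get pvc pg16-data -n infrastructure -o jsonpath='{.spec.volumeName}')
kubectl describe pv "$PV" | grep -i path
```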
## 📚 更多信息
- PostgreSQL 官方文档: https://www.postgresql.org/docs/16/
- K3s 官方文档: https://docs.k3s.io/
- Kubernetes 官方文档: https://kubernetes.io/docs/
## 🆘 获取帮助
如有问题,请检查:
1. Pod 日志: `kubectl logs -n infrastructure -l app=pg16`
2. Pod 状态: `kubectl describe pod -n infrastructure -l app=pg16`
3. 事件记录: `kubectl get events -n infrastructure`
---
**版本信息**
- PostgreSQL: 16
- 创建日期: 2026-01-29
- 最后更新: 2026-01-29

View File

@@ -0,0 +1,112 @@
# PostgreSQL 16 K3s 部署配置
## 文件说明
- `namespace.yaml` - 创建 infrastructure 命名空间
- `secret.yaml` - 存储 PostgreSQL 密码等敏感信息
- `configmap.yaml` - 存储初始化脚本(创建用户和 300 个数据库)
- `pvc.yaml` - 持久化存储声明(20Gi)
- `deployment.yaml` - PostgreSQL 16 部署配置
- `service.yaml` - 服务暴露(ClusterIP + NodePort)
## 部署步骤
### 1. 部署所有资源
```bash
kubectl apply -f namespace.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f pvc.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```
或者一次性部署:
```bash
kubectl apply -f .
```
### 2. 查看部署状态
```bash
# 查看 Pod 状态
kubectl get pods -n infrastructure
# 查看 Pod 日志
kubectl logs -n infrastructure -l app=pg16 -f
# 查看服务
kubectl get svc -n infrastructure
```
### 3. 访问数据库
**集群内访问:**
```bash
# 使用 ClusterIP 服务
psql -h pg16.infrastructure.svc.cluster.local -U postgres -p 5432
```
**集群外访问:**
```bash
# 使用 NodePort(端口 30432)
psql -h <节点IP> -U postgres -p 30432
```
**使用 kubectl port-forward**
```bash
kubectl port-forward -n infrastructure svc/pg16 5432:5432
psql -h localhost -U postgres -p 5432
```
## 配置说明
### 存储
- 使用 k3s 默认的 `local-path` StorageClass
- 默认申请 20Gi 存储空间
- 数据存储在 `/var/lib/postgresql/data/pgdata`
### 资源限制
- 请求:512Mi 内存,0.5 核 CPU
- 限制:2Gi 内存,2 核 CPU
### 初始化
- 自动创建超级用户 `fei`
- 自动创建 300 个数据库(pg001 到 pg300)
### 服务暴露
- **ClusterIP 服务**:集群内部访问,服务名 `pg16`
- **NodePort 服务**:集群外部访问,端口 `30432`
## 数据迁移
### 从现有 Docker 数据迁移
如果你有现有的 pgdata 数据,可以:
1. 先部署不带数据的 PostgreSQL
2. 停止 Pod
3. 将数据复制到 PVC 对应的主机路径
4. 重启 Pod
```bash
# 查找 PVC 对应的主机路径
kubectl get pv
# 停止 Pod
kubectl scale deployment pg16 -n infrastructure --replicas=0
# 复制数据到主机路径(通常在 /var/lib/rancher/k3s/storage/ 下)
# 然后重启
kubectl scale deployment pg16 -n infrastructure --replicas=1
```
## 卸载
```bash
kubectl delete -f .
```
注意:删除 PVC 会删除所有数据,请谨慎操作。

View File

@@ -0,0 +1,19 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: pg16-init-script
namespace: infrastructure
data:
01-init.sh: |
#!/bin/bash
set -e
# 创建超级用户 fei
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
CREATE USER fei WITH SUPERUSER PASSWORD 'feiks..';
EOSQL
# 创建 300 个数据库
for i in $(seq -w 1 300); do
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" -c "CREATE DATABASE pg${i} OWNER fei;"
done

View File

@@ -0,0 +1,76 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: pg16
namespace: infrastructure
labels:
app: pg16
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: pg16
template:
metadata:
labels:
app: pg16
spec:
containers:
- name: postgres
image: postgres:16
ports:
- containerPort: 5432
name: postgres
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: pg16-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: pg16-secret
key: POSTGRES_PASSWORD
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
- name: init-scripts
mountPath: /docker-entrypoint-initdb.d
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
volumes:
- name: postgres-data
persistentVolumeClaim:
claimName: pg16-data
- name: init-scripts
configMap:
name: pg16-init-script
defaultMode: 0755

View File

@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: infrastructure

View File

@@ -0,0 +1,12 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pg16-data
namespace: infrastructure
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: local-path

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: pg16-secret
namespace: infrastructure
type: Opaque
stringData:
POSTGRES_PASSWORD: "adminks.."
POSTGRES_USER: "postgres"
FEI_PASSWORD: "feiks.."

View File

@@ -0,0 +1,34 @@
apiVersion: v1
kind: Service
metadata:
name: pg16
namespace: infrastructure
labels:
app: pg16
spec:
type: ClusterIP
ports:
- port: 5432
targetPort: 5432
protocol: TCP
name: postgres
selector:
app: pg16
---
apiVersion: v1
kind: Service
metadata:
name: pg16-nodeport
namespace: infrastructure
labels:
app: pg16
spec:
type: NodePort
ports:
- port: 5432
targetPort: 5432
nodePort: 30432
protocol: TCP
name: postgres
selector:
app: pg16

View File

@@ -0,0 +1,131 @@
# MinIO S3 对象存储部署
## 功能特性
- ✅ MinIO 对象存储服务
- ✅ 自动 SSL 证书(通过 Caddy)
- ✅ 自动设置新存储桶为公开只读权限
- ✅ Web 管理控制台
- ✅ S3 兼容 API
## 部署前准备
### 1. 修改配置
编辑 `minio.yaml`,替换以下内容:
**域名配置3 处):**
- `s3.u6.net3w.com` → 你的 S3 API 域名
- `console.s3.u6.net3w.com` → 你的控制台域名
**凭证配置4 处):**
- `MINIO_ROOT_USER: "admin"` → 你的管理员账号
- `MINIO_ROOT_PASSWORD: "adminks.."` → 你的管理员密码(建议至少 8 位)
**架构配置1 处):**
- `linux-arm64` → 根据你的 CPU 架构选择:
- ARM64: `linux-arm64`
- x86_64: `linux-amd64`
### 2. 配置 DNS
将域名解析到你的服务器 IP
```
s3.yourdomain.com A your-server-ip
console.s3.yourdomain.com A your-server-ip
```
### 3. 配置 Caddy
在 Caddy 配置中添加(如果使用 Caddy 做 SSL):
```
s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
console.s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
```
## 部署步骤
```bash
# 1. 部署 MinIO
kubectl apply -f minio.yaml
# 2. 检查部署状态
kubectl get pods -n minio
# 3. 查看日志
kubectl logs -n minio -l app=minio -c minio
kubectl logs -n minio -l app=minio -c policy-manager
```
## 访问服务
- **Web 控制台**: https://console.s3.yourdomain.com
- **S3 API 端点**: https://s3.yourdomain.com
- **登录凭证**: 使用你配置的 MINIO_ROOT_USER 和 MINIO_ROOT_PASSWORD
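登录控制台之外,也可以用 MinIO Client(mc)通过 S3 API 做一次读写验证(示例,域名与凭证替换为你的实际配置):

```bash
# 配置 mc 别名指向 S3 API 端点
mc alias set mys3 https://s3.yourdomain.com <你的管理员账号> <你的管理员密码>

# 创建测试桶并上传一个文件
mc mb mys3/test
echo "hello" > hello.txt
mc cp hello.txt mys3/test/
mc ls mys3/test
```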
## 自动权限策略
新创建的存储桶会在 30 秒内自动设置为 **公开只读(download)** 权限:
- ✅ 任何人可以下载文件(无需认证)
- ✅ 上传/删除需要认证
如需保持某个桶为私有,在控制台手动改回 PRIVATE 即可。
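可以用下面的方式验证自动策略是否生效(示例,沿用上面创建的 test 桶和 hello.txt,等策略管理器完成下一轮扫描后执行):

```bash
# 匿名下载应当成功(无需任何凭证)
curl -fsS https://s3.yourdomain.com/test/hello.txt

# 查看当前匿名访问策略,应显示 download
mc anonymous get mys3/test
```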
## 存储配置
默认使用 50Gi 存储空间,修改方法:
编辑 `minio.yaml` 中的 PersistentVolumeClaim
```yaml
resources:
requests:
storage: 50Gi # 修改为你需要的大小
```
## 故障排查
### Pod 无法启动
```bash
kubectl describe pod -n minio <pod-name>
```
### 查看详细日志
```bash
# MinIO 主容器
kubectl logs -n minio <pod-name> -c minio
# 策略管理器
kubectl logs -n minio <pod-name> -c policy-manager
```
### 检查 Ingress
```bash
kubectl get ingress -n minio
```
## 架构说明
```
用户 HTTPS 请求
Caddy (SSL 终止)
↓ HTTP
Traefik (路由)
MinIO Service
├─ MinIO 容器 (9000: API, 9001: Console)
└─ Policy Manager 容器 (自动设置桶权限)
```
## 卸载
```bash
kubectl delete -f minio.yaml
```
注意:这会删除所有数据,请先备份重要文件。

View File

@@ -0,0 +1,169 @@
apiVersion: v1
kind: Namespace
metadata:
name: minio
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-data
namespace: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: minio
namespace: minio
spec:
replicas: 1
selector:
matchLabels:
app: minio
template:
metadata:
labels:
app: minio
spec:
containers:
- name: minio
image: minio/minio:latest
command:
- /bin/sh
- -c
- minio server /data --console-address ":9001"
ports:
- containerPort: 9000
name: api
- containerPort: 9001
name: console
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "adminks.."
- name: MINIO_SERVER_URL
value: "https://s3.u6.net3w.com"
- name: MINIO_BROWSER_REDIRECT_URL
value: "https://console.s3.u6.net3w.com"
volumeMounts:
- name: data
mountPath: /data
livenessProbe:
httpGet:
path: /minio/health/live
port: 9000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /minio/health/ready
port: 9000
initialDelaySeconds: 10
periodSeconds: 5
- name: policy-manager
image: alpine:latest
command:
- /bin/sh
- -c
- |
# 安装 MinIO Client
wget https://dl.min.io/client/mc/release/linux-arm64/mc -O /usr/local/bin/mc
chmod +x /usr/local/bin/mc
# 等待 MinIO 启动
sleep 10
# 配置 mc 客户端
mc alias set myminio http://localhost:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
echo "Policy manager started. Monitoring buckets..."
# 持续监控并设置新桶的策略
while true; do
# 获取所有存储桶
mc ls myminio 2>/dev/null | awk '{print $NF}' | sed 's/\///' | while read -r BUCKET; do
if [ -n "$BUCKET" ]; then
# 检查当前策略
POLICY_OUTPUT=$(mc anonymous get myminio/${BUCKET} 2>&1)
# 如果是私有的(包含 "Access permission for" 且不包含 "download"
if echo "$POLICY_OUTPUT" | grep -q "Access permission for" && ! echo "$POLICY_OUTPUT" | grep -q "download"; then
echo "Setting download policy for bucket: ${BUCKET}"
mc anonymous set download myminio/${BUCKET}
fi
fi
done
sleep 30
done
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "adminks.."
volumes:
- name: data
persistentVolumeClaim:
claimName: minio-data
---
apiVersion: v1
kind: Service
metadata:
name: minio
namespace: minio
spec:
type: ClusterIP
ports:
- port: 9000
targetPort: 9000
name: api
- port: 9001
targetPort: 9001
name: console
selector:
app: minio
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-api
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: s3.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-console
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: console.s3.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9001

View File

@@ -0,0 +1,65 @@
1. MinIO 配置已更新
- MINIO_SERVER_URL: https://s3.u6.net3w.com
- MINIO_BROWSER_REDIRECT_URL: https://console.s3.u6.net3w.com
2. MinIO 日志确认
API: https://s3.u6.net3w.com
WebUI: https://console.s3.u6.net3w.com
3. 访问测试通过
- https://s3.u6.net3w.com - 正确重定向到 Console(不再显示端口号)
- https://console.s3.u6.net3w.com - 直接访问 Console
访问方式:
- Web 控制台: https://console.s3.u6.net3w.com
- S3 API 端点: https://s3.u6.net3w.com
- 登录凭证:
- 用户名: admin
- 密码: adminks..
配置文件位置:
- MinIO: /home/fei/k3s/009-基础设施/002-s3/minio.yaml
- Caddy: /home/fei/k3s/caddy-ssl-deployment.yaml
所有服务已正常运行,可以开始使用 MinIO 对象存储了!
不如固定桶名?
在对象存储的使用习惯中,“频繁创建桶”其实是一种反模式(Anti-pattern)。
建议方案:创建一个名为 public-assets 的固定桶,并将其设为 download 权限。
管理方式:不同的业务或用户,通过文件夹(Prefix)来区分,例如 public-assets/user1/、public-assets/user2/。
结果:你只需要在部署 Pod 时设置一次权限,之后所有上传到该桶下的文件自动就能下载。
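按这个思路,只需要一次性执行类似下面的命令(示例,别名、域名与凭证按实际环境调整):
# 创建固定桶并设置为公开只读
mc alias set myminio https://s3.u6.net3w.com admin 'adminks..'
mc mb myminio/public-assets
mc anonymous set download myminio/public-assets
# 之后按前缀区分业务,例如:
mc cp avatar.png myminio/public-assets/user1/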
配置完成:
从日志可以看到:
Policy manager started. Monitoring buckets...
Setting download policy for bucket: test
Access permission for `myminio/test` is set to `download`
功能说明:
1. 自动检测 - 每 30 秒扫描一次所有存储桶
2. 自动设置 - 发现私有(PRIVATE)存储桶时自动设置为 download(公开只读)
3. 已处理 - 你的 test 存储桶已自动设置为公开只读权限
权限效果:
- ✅ 任何人都可以下载文件(无需认证)
- ✅ 上传/删除需要认证(Access Key + Secret Key)
测试方法:
# 现在可以直接访问 test 桶中的文件(无需认证)
curl https://s3.u6.net3w.com/test/your-file.txt
后续使用:
- 新创建的任何存储桶都会在 30 秒内自动设置为公开只读
- 如果需要保持某个桶为私有,可以在 Web 控制台手动改回 PRIVATE
- Policy manager 只会处理完全私有的桶,不会覆盖已设置的策略
配置已生效,可以继续使用了!

View File

@@ -0,0 +1,25 @@
#!/bin/sh
# 自动为新创建的存储桶设置 download公开只读策略
# 配置 mc 客户端
mc alias set myminio http://localhost:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
# 持续监控并设置新桶的策略
while true; do
# 获取所有存储桶
BUCKETS=$(mc ls myminio 2>/dev/null | awk '{print $NF}' | sed 's/\///')
for BUCKET in $BUCKETS; do
# 检查当前策略
CURRENT_POLICY=$(mc anonymous get myminio/${BUCKET} 2>/dev/null | grep -o "download\|upload\|public" || echo "none")
# 如果策略为 none私有则设置为 download
if [ "$CURRENT_POLICY" = "none" ]; then
echo "Setting download policy for bucket: ${BUCKET}"
mc anonymous set download myminio/${BUCKET}
fi
done
# 每 30 秒检查一次
sleep 30
done

View File

@@ -0,0 +1,4 @@
# 写入以下内容
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# 记录 K3s 环境变量
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bashrc

View File

View File

@@ -0,0 +1,8 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: longhorn-backup-config
namespace: longhorn-system
data:
backup-target: "s3://longhorn-backup@us-east-1/"
backup-target-credential-secret: "longhorn-crypto"

View File

@@ -0,0 +1,10 @@
# 1. 创建命名空间
kubectl create namespace longhorn-system
# 2. 应用 S3 密钥
kubectl apply -f s3-secret.yaml
# 3. 使用本地 values.yaml 安装远程仓库的应用
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
-f values.yaml

View File

@@ -0,0 +1,130 @@
Installation Summary:
✅ Namespace created: longhorn-system
✅ S3 Secret created: longhorn-crypto with MinIO credentials
✅ Helm deployment: Longhorn v1 deployed successfully
✅ Components starting:
- longhorn-manager: Running (2/2)
- longhorn-ui: Running (2 replicas)
- longhorn-driver-deployer: Initializing
S3 Configuration Verified:
- AWS_ACCESS_KEY_ID: admin ✓
- AWS_SECRET_ACCESS_KEY: adminks.. ✓
- AWS_ENDPOINTS: https://s3.u6.net3w.com ✓
Next Steps:
The Longhorn components are still initializing. It will take a few minutes for all components (CSI drivers, Engine Images, Instance Managers) to be fully ready.
You can monitor the progress with:
kubectl get pods -n longhorn-system -w
这是一个非常棒的习惯!在生产实践中,这被称为 **IaC (基础设施即代码)**。即使我们使用 Helm 远程安装,也应该在本地保留一份“配置清单”,以便日后维护、迁移或重建集群。
为了满足你的需求,建议在 `/home/fei/k3s/009-基础设施/` 下创建两个标准化的文件夹。
---
## 1. 推荐的目录结构
你可以执行以下命令创建目录:
```bash
mkdir -p /home/fei/k3s/009-基础设施/003-helm
mkdir -p /home/fei/k3s/009-基础设施/004-longhorn
```
结构如下:
* **003-helm**: 存放 Helm 的安装脚本、环境变量配置。
* **004-longhorn**: 存放 Longhorn 的自定义配置文件(`values.yaml`)和 S3 密钥配置。
---
## 2. 文件夹内应该放什么内容?
### 文件夹003-helm
由于 Helm 是一个工具,这个文件夹主要存放**安装记录**和**环境配置**。
* **创建文件:`install_helm.sh`**(记录你是怎么装的)
```bash
# 写入以下内容
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# 记录 K3s 环境变量
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bashrc
```
* **创建文件:`readme.md`**:写明当前 Helm 的版本和用途。
---
### 文件夹004-longhorn
这是核心。虽然我们用远程仓库安装,但我们要把**配置参数**固化在本地。
**第一步:导出远程默认配置**
在文件夹下执行:
```bash
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm show values longhorn/longhorn > values.yaml
```
现在,你拥有了一个几百行的 `values.yaml`,这就是 Longhorn 的“灵魂”。
**第二步:创建 S3 密钥文件 `s3-secret.yaml`**
因为你要对接 S3,你需要把秘钥写成 K8s 的 Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: longhorn-crypto
namespace: longhorn-system
type: Opaque
stringData:
AWS_ACCESS_KEY_ID: "你的ID"
AWS_SECRET_ACCESS_KEY: "你的密钥"
AWS_ENDPOINTS: "你的S3连接地址"
```
**第三步:创建一键安装脚本 `deploy.sh`**
```bash
# 1. 创建命名空间
kubectl create namespace longhorn-system
# 2. 应用 S3 密钥
kubectl apply -f s3-secret.yaml
# 3. 使用本地 values.yaml 安装远程仓库的应用
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
-f values.yaml
```
---
## 3. 这样做的好处
1. **版本可控**:以后你想改 S3 地址或副本数,只需要改 `/004-longhorn/values.yaml`,然后执行 `helm upgrade`。
2. **一目了然**:当你打开 `009-基础设施` 文件夹,你能清楚看到你的 S3、Helm 和 Longhorn 是如何相互关联的。
3. **快速重建**:如果哪天服务器系统重装了,你只需要进入这些文件夹,按顺序执行 `deploy.sh`,几分钟内整个基础设施就能起死回生。
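例如第 1 点提到的修改配置后升级,大致是这样(示例;release 名称和命名空间以实际安装时的参数为准):

```bash
# 修改 values.yaml 后应用变更
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  -f values.yaml

# 查看发布历史,必要时可用 helm rollback 回滚
helm history longhorn -n longhorn-system
```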
---
## 4. 特别提示Redis 放在哪?
按照你的逻辑,你应该在 `009-基础设施` 之外(或者创建一个 `010-中间件` 文件夹)专门放 Redis 的 YAML。因为 Redis 是属于**应用层**,而 Longhorn 属于**存储层**。
**接下来,需要我帮你把上述的 `values.yaml` 中关于 S3 备份的关键配置项找出来,并告诉你如何修改吗?**

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: longhorn-crypto
namespace: longhorn-system
type: Opaque
stringData:
AWS_ACCESS_KEY_ID: "admin"
AWS_SECRET_ACCESS_KEY: "adminks.."
AWS_ENDPOINTS: "https://s3.u6.net3w.com"

View File

@@ -0,0 +1,660 @@
# Default values for longhorn.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
global:
# -- Global override for container image registry.
imageRegistry: "docker.io"
# -- Global override for image pull secrets for container registry.
imagePullSecrets: []
# -- Set container timezone (TZ env) for all Longhorn workloads. Leave empty to use container default.
timezone: ""
# -- Toleration for nodes allowed to run user-deployed components such as Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer.
tolerations: []
# -- Node selector for nodes allowed to run user-deployed components such as Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer.
nodeSelector: {}
cattle:
# -- Default system registry.
systemDefaultRegistry: ""
windowsCluster:
# -- Setting that allows Longhorn to run on a Rancher Windows cluster.
enabled: false
# -- Toleration for Linux nodes that can run user-deployed Longhorn components.
tolerations:
- key: "cattle.io/os"
value: "linux"
effect: "NoSchedule"
operator: "Equal"
# -- Node selector for Linux nodes that can run user-deployed Longhorn components.
nodeSelector:
kubernetes.io/os: "linux"
defaultSetting:
# -- Toleration for system-managed Longhorn components.
taintToleration: cattle.io/os=linux:NoSchedule
# -- Node selector for system-managed Longhorn components.
systemManagedComponentsNodeSelector: kubernetes.io/os:linux
networkPolicies:
# -- Setting that allows you to enable network policies that control access to Longhorn pods.
enabled: false
# -- Distribution that determines the policy for allowing access for an ingress. (Options: "k3s", "rke2", "rke1")
type: "k3s"
image:
longhorn:
engine:
# -- Registry for the Longhorn Engine image.
registry: ""
# -- Repository for the Longhorn Engine image.
repository: longhornio/longhorn-engine
# -- Tag for the Longhorn Engine image.
tag: v1.11.0
manager:
# -- Registry for the Longhorn Manager image.
registry: ""
# -- Repository for the Longhorn Manager image.
repository: longhornio/longhorn-manager
# -- Tag for the Longhorn Manager image.
tag: v1.11.0
ui:
# -- Registry for the Longhorn UI image.
registry: ""
# -- Repository for the Longhorn UI image.
repository: longhornio/longhorn-ui
# -- Tag for the Longhorn UI image.
tag: v1.11.0
instanceManager:
# -- Registry for the Longhorn Instance Manager image.
registry: ""
# -- Repository for the Longhorn Instance Manager image.
repository: longhornio/longhorn-instance-manager
# -- Tag for the Longhorn Instance Manager image.
tag: v1.11.0
shareManager:
# -- Registry for the Longhorn Share Manager image.
registry: ""
# -- Repository for the Longhorn Share Manager image.
repository: longhornio/longhorn-share-manager
# -- Tag for the Longhorn Share Manager image.
tag: v1.11.0
backingImageManager:
# -- Registry for the Backing Image Manager image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the Backing Image Manager image. When unspecified, Longhorn uses the default value.
repository: longhornio/backing-image-manager
# -- Tag for the Backing Image Manager image. When unspecified, Longhorn uses the default value.
tag: v1.11.0
supportBundleKit:
# -- Registry for the Longhorn Support Bundle Manager image.
registry: ""
# -- Repository for the Longhorn Support Bundle Manager image.
repository: longhornio/support-bundle-kit
# -- Tag for the Longhorn Support Bundle Manager image.
tag: v0.0.79
csi:
attacher:
# -- Registry for the CSI attacher image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI attacher image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-attacher
# -- Tag for the CSI attacher image. When unspecified, Longhorn uses the default value.
tag: v4.10.0-20251226
provisioner:
# -- Registry for the CSI Provisioner image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Provisioner image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-provisioner
# -- Tag for the CSI Provisioner image. When unspecified, Longhorn uses the default value.
tag: v5.3.0-20251226
nodeDriverRegistrar:
# -- Registry for the CSI Node Driver Registrar image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Node Driver Registrar image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-node-driver-registrar
# -- Tag for the CSI Node Driver Registrar image. When unspecified, Longhorn uses the default value.
tag: v2.15.0-20251226
resizer:
# -- Registry for the CSI Resizer image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Resizer image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-resizer
# -- Tag for the CSI Resizer image. When unspecified, Longhorn uses the default value.
tag: v2.0.0-20251226
snapshotter:
# -- Registry for the CSI Snapshotter image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Snapshotter image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-snapshotter
# -- Tag for the CSI Snapshotter image. When unspecified, Longhorn uses the default value.
tag: v8.4.0-20251226
livenessProbe:
# -- Registry for the CSI liveness probe image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI liveness probe image. When unspecified, Longhorn uses the default value.
repository: longhornio/livenessprobe
# -- Tag for the CSI liveness probe image. When unspecified, Longhorn uses the default value.
tag: v2.17.0-20251226
openshift:
oauthProxy:
# -- Registry for the OAuth Proxy image. Specify the upstream image (for example, "quay.io/openshift/origin-oauth-proxy"). This setting applies only to OpenShift users.
registry: ""
# -- Repository for the OAuth Proxy image. Specify the upstream image (for example, "quay.io/openshift/origin-oauth-proxy"). This setting applies only to OpenShift users.
repository: ""
# -- Tag for the OAuth Proxy image. Specify OCP/OKD version 4.1 or later (including version 4.18, which is available at quay.io/openshift/origin-oauth-proxy:4.18). This setting applies only to OpenShift users.
tag: ""
# -- Image pull policy that applies to all user-deployed Longhorn components, such as Longhorn Manager, Longhorn driver, and Longhorn UI.
pullPolicy: IfNotPresent
service:
ui:
# -- Service type for Longhorn UI. (Options: "ClusterIP", "NodePort", "LoadBalancer", "Rancher-Proxy")
type: ClusterIP
# -- NodePort port number for Longhorn UI. When unspecified, Longhorn selects a free port between 30000 and 32767.
nodePort: null
# -- Class of a load balancer implementation
loadBalancerClass: ""
# -- Annotation for the Longhorn UI service.
annotations: {}
## If you want to set annotations for the Longhorn UI service, delete the `{}` in the line above
## and uncomment this example block
# annotation-key1: "annotation-value1"
# annotation-key2: "annotation-value2"
labels: {}
## If you want to set additional labels for the Longhorn UI service, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
manager:
# -- Service type for Longhorn Manager.
type: ClusterIP
# -- NodePort port number for Longhorn Manager. When unspecified, Longhorn selects a free port between 30000 and 32767.
nodePort: ""
persistence:
# -- Setting that allows you to specify the default Longhorn StorageClass.
defaultClass: true
# -- Filesystem type of the default Longhorn StorageClass.
defaultFsType: ext4
# -- mkfs parameters of the default Longhorn StorageClass.
defaultMkfsParams: ""
# -- Replica count of the default Longhorn StorageClass.
defaultClassReplicaCount: 3
# -- Data locality of the default Longhorn StorageClass. (Options: "disabled", "best-effort")
defaultDataLocality: disabled
# -- Reclaim policy that provides instructions for handling of a volume after its claim is released. (Options: "Retain", "Delete")
reclaimPolicy: Delete
# -- VolumeBindingMode controls when volume binding and dynamic provisioning should occur. (Options: "Immediate", "WaitForFirstConsumer") (Defaults to "Immediate")
volumeBindingMode: "Immediate"
# -- Setting that allows you to enable live migration of a Longhorn volume from one node to another.
migratable: false
# -- Setting that disables the revision counter and thereby prevents Longhorn from tracking all write operations to a volume. When salvaging a volume, Longhorn uses properties of the volume-head-xxx.img file (the last file size and the last time the file was modified) to select the replica to be used for volume recovery.
disableRevisionCounter: "true"
# -- Set NFS mount options for Longhorn StorageClass for RWX volumes
nfsOptions: ""
recurringJobSelector:
# -- Setting that allows you to enable the recurring job selector for a Longhorn StorageClass.
enable: false
# -- Recurring job selector for a Longhorn StorageClass. Ensure that quotes are used correctly when specifying job parameters. (Example: `[{"name":"backup", "isGroup":true}]`)
jobList: []
backingImage:
# -- Setting that allows you to use a backing image in a Longhorn StorageClass.
enable: false
# -- Backing image to be used for creating and restoring volumes in a Longhorn StorageClass. When no backing images are available, specify the data source type and parameters that Longhorn can use to create a backing image.
name: ~
# -- Data source type of a backing image used in a Longhorn StorageClass.
# If the backing image exists in the cluster, Longhorn uses this setting to verify the image.
# If the backing image does not exist, Longhorn creates one using the specified data source type.
dataSourceType: ~
# -- Data source parameters of a backing image used in a Longhorn StorageClass.
# You can specify a JSON string of a map. (Example: `'{\"url\":\"https://backing-image-example.s3-region.amazonaws.com/test-backing-image\"}'`)
dataSourceParameters: ~
# -- Expected SHA-512 checksum of a backing image used in a Longhorn StorageClass.
expectedChecksum: ~
defaultDiskSelector:
# -- Setting that allows you to enable the disk selector for the default Longhorn StorageClass.
enable: false
# -- Disk selector for the default Longhorn StorageClass. Longhorn uses only disks with the specified tags for storing volume data. (Examples: "nvme,sata")
selector: ""
defaultNodeSelector:
# -- Setting that allows you to enable the node selector for the default Longhorn StorageClass.
enable: false
# -- Node selector for the default Longhorn StorageClass. Longhorn uses only nodes with the specified tags for storing volume data. (Examples: "storage,fast")
selector: ""
# -- Setting that allows you to enable automatic snapshot removal during filesystem trim for a Longhorn StorageClass. (Options: "ignored", "enabled", "disabled")
unmapMarkSnapChainRemoved: ignored
# -- Setting that allows you to specify the data engine version for the default Longhorn StorageClass. (Options: "v1", "v2")
dataEngine: v1
# -- Setting that allows you to specify the backup target for the default Longhorn StorageClass.
backupTargetName: default
preUpgradeChecker:
# -- Setting that allows Longhorn to perform pre-upgrade checks. Disable this setting when installing Longhorn using Argo CD or other GitOps solutions.
jobEnabled: true
# -- Setting that allows Longhorn to perform upgrade version checks after starting the Longhorn Manager DaemonSet Pods. Disabling this setting also disables `preUpgradeChecker.jobEnabled`. Longhorn recommends keeping this setting enabled.
upgradeVersionCheck: true
csi:
# -- kubelet root directory. When unspecified, Longhorn uses the default value.
kubeletRootDir: ~
# -- Configures Pod anti-affinity to prevent multiple instances on the same node. Use soft (tries to separate) or hard (must separate). When unspecified, Longhorn uses the default value ("soft").
podAntiAffinityPreset: ~
# -- Replica count of the CSI Attacher. When unspecified, Longhorn uses the default value ("3").
attacherReplicaCount: ~
# -- Replica count of the CSI Provisioner. When unspecified, Longhorn uses the default value ("3").
provisionerReplicaCount: ~
# -- Replica count of the CSI Resizer. When unspecified, Longhorn uses the default value ("3").
resizerReplicaCount: ~
# -- Replica count of the CSI Snapshotter. When unspecified, Longhorn uses the default value ("3").
snapshotterReplicaCount: ~
defaultSettings:
# -- Setting that allows Longhorn to automatically attach a volume and create snapshots or backups when recurring jobs are run.
allowRecurringJobWhileVolumeDetached: ~
# -- Setting that allows Longhorn to automatically create a default disk only on nodes with the label "node.longhorn.io/create-default-disk=true" (if no other disks exist). When this setting is disabled, Longhorn creates a default disk on each node that is added to the cluster.
createDefaultDiskLabeledNodes: ~
# -- Default path to use for storing data on a host. An absolute directory path indicates a filesystem-type disk used by the V1 Data Engine, while a path to a block device indicates a block-type disk used by the V2 Data Engine. The default value is "/var/lib/longhorn/".
defaultDataPath: ~
# -- Default data locality. A Longhorn volume has data locality if a local replica of the volume exists on the same node as the pod that is using the volume.
defaultDataLocality: ~
# -- Setting that allows scheduling on nodes with healthy replicas of the same volume. This setting is disabled by default.
replicaSoftAntiAffinity: ~
# -- Setting that automatically rebalances replicas when an available node is discovered.
replicaAutoBalance: ~
# -- Percentage of storage that can be allocated relative to hard drive capacity. The default value is "100".
storageOverProvisioningPercentage: ~
# -- Percentage of minimum available disk capacity. When the minimum available capacity exceeds the total available capacity, the disk becomes unschedulable until more space is made available for use. The default value is "25".
storageMinimalAvailablePercentage: ~
# -- Percentage of disk space that is not allocated to the default disk on each new Longhorn node.
storageReservedPercentageForDefaultDisk: ~
# -- Upgrade Checker that periodically checks for new Longhorn versions. When a new version is available, a notification appears on the Longhorn UI. This setting is enabled by default
upgradeChecker: ~
# -- The Upgrade Responder sends a notification whenever a new Longhorn version that you can upgrade to becomes available. The default value is https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade.
upgradeResponderURL: ~
# -- The external URL used to access the Longhorn Manager API. When set, this URL is returned in API responses (the actions and links fields) instead of the internal pod IP. This is useful when accessing the API through Ingress or Gateway API HTTPRoute. Format: scheme://host[:port] (for example, https://longhorn.example.com or https://longhorn.example.com:8443). Leave it empty to use the default behavior.
managerUrl: ~
# -- Default number of replicas for volumes created using the Longhorn UI. For Kubernetes configuration, modify the `numberOfReplicas` field in the StorageClass. The default value is "{"v1":"3","v2":"3"}".
defaultReplicaCount: ~
# -- Default name of Longhorn static StorageClass. "storageClassName" is assigned to PVs and PVCs that are created for an existing Longhorn volume. "storageClassName" can also be used as a label, so it is possible to use a Longhorn StorageClass to bind a workload to an existing PV without creating a Kubernetes StorageClass object. "storageClassName" needs to be an existing StorageClass. The default value is "longhorn-static".
defaultLonghornStaticStorageClass: ~
# -- Number of minutes that Longhorn keeps a failed backup resource. When the value is "0", automatic deletion is disabled.
failedBackupTTL: ~
# -- Number of minutes that Longhorn allows for the backup execution. The default value is "1".
backupExecutionTimeout: ~
# -- Setting that restores recurring jobs from a backup volume on a backup target and creates recurring jobs if none exist during backup restoration.
restoreVolumeRecurringJobs: ~
# -- Maximum number of successful recurring backup and snapshot jobs to be retained. When the value is "0", a history of successful recurring jobs is not retained.
recurringSuccessfulJobsHistoryLimit: ~
# -- Maximum number of failed recurring backup and snapshot jobs to be retained. When the value is "0", a history of failed recurring jobs is not retained.
recurringFailedJobsHistoryLimit: ~
# -- Maximum number of snapshots or backups to be retained.
recurringJobMaxRetention: ~
# -- Maximum number of failed support bundles that can exist in the cluster. When the value is "0", Longhorn automatically purges all failed support bundles.
supportBundleFailedHistoryLimit: ~
# -- Taint or toleration for system-managed Longhorn components.
# Specify values using a semicolon-separated list in `kubectl taint` syntax (Example: key1=value1:effect; key2=value2:effect).
taintToleration: ~
# -- Node selector for system-managed Longhorn components.
systemManagedComponentsNodeSelector: ~
# -- Resource limits for system-managed CSI components.
# This setting allows you to configure CPU and memory requests/limits for CSI attacher, provisioner, resizer, snapshotter, and plugin components.
# Supported components: csi-attacher, csi-provisioner, csi-resizer, csi-snapshotter, longhorn-csi-plugin, node-driver-registrar, longhorn-liveness-probe.
# Notice that changing resource limits will cause CSI components to restart, which may temporarily affect volume provisioning and attach/detach operations until the components are ready. The value should be a JSON object with component names as keys and ResourceRequirements as values.
systemManagedCSIComponentsResourceLimits: ~
# -- PriorityClass for system-managed Longhorn components.
# This setting can help prevent Longhorn components from being evicted under Node Pressure.
# Notice that this will be applied to Longhorn user-deployed components by default if there are no priority class values set yet, such as `longhornManager.priorityClass`.
priorityClass: &defaultPriorityClassNameRef "longhorn-critical"
# -- Setting that allows Longhorn to automatically salvage volumes when all replicas become faulty (for example, when the network connection is interrupted). Longhorn determines which replicas are usable and then uses these replicas for the volume. This setting is enabled by default.
autoSalvage: ~
# -- Setting that allows Longhorn to automatically delete a workload pod that is managed by a controller (for example, daemonset) whenever a Longhorn volume is detached unexpectedly (for example, during Kubernetes upgrades). After deletion, the controller restarts the pod and then Kubernetes handles volume reattachment and remounting.
autoDeletePodWhenVolumeDetachedUnexpectedly: ~
# -- Blacklist of controller api/kind values for the setting Automatically Delete Workload Pod when the Volume Is Detached Unexpectedly. If a workload pod is managed by a controller whose api/kind is listed in this blacklist, Longhorn will not automatically delete the pod when its volume is unexpectedly detached. Multiple controller api/kind entries can be specified, separated by semicolons. For example: `apps/StatefulSet;apps/DaemonSet`. Note that the controller api/kind is case sensitive and must exactly match the api/kind in the workload pod's owner reference.
blacklistForAutoDeletePodWhenVolumeDetachedUnexpectedly: ~
# -- Setting that prevents Longhorn Manager from scheduling replicas on a cordoned Kubernetes node. This setting is enabled by default.
disableSchedulingOnCordonedNode: ~
# -- Setting that allows Longhorn to schedule new replicas of a volume to nodes in the same zone as existing healthy replicas. Nodes that do not belong to any zone are treated as existing in the zone that contains healthy replicas. When identifying zones, Longhorn relies on the label "topology.kubernetes.io/zone=<Zone name of the node>" in the Kubernetes node object.
replicaZoneSoftAntiAffinity: ~
# -- Setting that allows scheduling on disks with existing healthy replicas of the same volume. This setting is enabled by default.
replicaDiskSoftAntiAffinity: ~
# -- Policy that defines the action Longhorn takes when a volume is stuck with a StatefulSet or Deployment pod on a node that failed.
nodeDownPodDeletionPolicy: ~
# -- Policy that defines the action Longhorn takes when a node with the last healthy replica of a volume is drained.
nodeDrainPolicy: ~
# -- Setting that allows automatic detaching of manually-attached volumes when a node is cordoned.
detachManuallyAttachedVolumesWhenCordoned: ~
# -- Number of seconds that Longhorn waits before reusing existing data on a failed replica instead of creating a new replica of a degraded volume.
replicaReplenishmentWaitInterval: ~
# -- Maximum number of replicas that can be concurrently rebuilt on each node.
concurrentReplicaRebuildPerNodeLimit: ~
# -- Maximum number of file synchronization operations that can run concurrently during a single replica rebuild. Right now, it's for v1 data engine only.
rebuildConcurrentSyncLimit: ~
# -- Maximum number of volumes that can be concurrently restored on each node using a backup. When the value is "0", restoration of volumes using a backup is disabled.
concurrentVolumeBackupRestorePerNodeLimit: ~
# -- Setting that disables the revision counter and thereby prevents Longhorn from tracking all write operations to a volume. When salvaging a volume, Longhorn uses properties of the "volume-head-xxx.img" file (the last file size and the last time the file was modified) to select the replica to be used for volume recovery. This setting applies only to volumes created using the Longhorn UI.
disableRevisionCounter: '{"v1":"true"}'
# -- Image pull policy for system-managed pods, such as Instance Manager, engine images, and CSI Driver. Changes to the image pull policy are applied only after the system-managed pods restart.
systemManagedPodsImagePullPolicy: ~
# -- Setting that allows you to create and attach a volume without having all replicas scheduled at the time of creation.
allowVolumeCreationWithDegradedAvailability: ~
# -- Setting that allows Longhorn to automatically clean up the system-generated snapshot after replica rebuilding is completed.
autoCleanupSystemGeneratedSnapshot: ~
# -- Setting that allows Longhorn to automatically clean up the snapshot generated by a recurring backup job.
autoCleanupRecurringJobBackupSnapshot: ~
# -- Maximum number of engines that are allowed to concurrently upgrade on each node after Longhorn Manager is upgraded. When the value is "0", Longhorn does not automatically upgrade volume engines to the new default engine image version.
concurrentAutomaticEngineUpgradePerNodeLimit: ~
# -- Number of minutes that Longhorn waits before cleaning up the backing image file when no replicas in the disk are using it.
backingImageCleanupWaitInterval: ~
# -- Number of seconds that Longhorn waits before downloading a backing image file again when the status of all image disk files changes to "failed" or "unknown".
backingImageRecoveryWaitInterval: ~
# -- Percentage of the total allocatable CPU resources on each node to be reserved for each instance manager pod. The default value is {"v1":"12","v2":"12"}.
guaranteedInstanceManagerCPU: ~
# -- Setting that notifies Longhorn that the cluster is using the Kubernetes Cluster Autoscaler.
kubernetesClusterAutoscalerEnabled: ~
# -- Enables Longhorn to automatically delete orphaned resources and their associated data or processes (e.g., stale replicas). Orphaned resources on failed or unknown nodes are not automatically cleaned up.
# You need to specify the resource types to be deleted using a semicolon-separated list (e.g., `replica-data;instance`). Available items are: `replica-data`, `instance`.
orphanResourceAutoDeletion: ~
# -- Specifies the wait time, in seconds, before Longhorn automatically deletes an orphaned Custom Resource (CR) and its associated resources.
# Note that if a user manually deletes an orphaned CR, the deletion occurs immediately and does not respect this grace period.
orphanResourceAutoDeletionGracePeriod: ~
# -- Storage network for in-cluster traffic. When unspecified, Longhorn uses the Kubernetes cluster network.
storageNetwork: ~
# -- Specifies a dedicated network for mounting RWX (ReadWriteMany) volumes. Leave this blank to use the default Kubernetes cluster network. **Caution**: This setting should change after all RWX volumes are detached because some Longhorn component pods must be recreated to apply the setting. You cannot modify this setting while RWX volumes are still attached.
endpointNetworkForRWXVolume: ~
# -- Flag that prevents accidental uninstallation of Longhorn.
deletingConfirmationFlag: ~
# -- Timeout between the Longhorn Engine and replicas. Specify a value between "8" and "30" seconds. The default value is "8".
engineReplicaTimeout: ~
# -- Setting that allows you to enable and disable snapshot hashing and data integrity checks.
snapshotDataIntegrity: ~
# -- Setting that allows disabling of snapshot hashing after snapshot creation to minimize impact on system performance.
snapshotDataIntegrityImmediateCheckAfterSnapshotCreation: ~
# -- Setting that defines when Longhorn checks the integrity of data in snapshot disk files. You must use the Unix cron expression format.
snapshotDataIntegrityCronjob: ~
# -- Setting that controls how many snapshot heavy task operations (such as purge and clone) can run concurrently per node. This is a best-effort mechanism: due to the distributed nature of the system, temporary oversubscription may occur. The limiter reduces worst-case overload but does not guarantee perfect enforcement.
snapshotHeavyTaskConcurrentLimit: ~
# -- Setting that allows Longhorn to automatically mark the latest snapshot and its parent files as removed during a filesystem trim. Longhorn does not remove snapshots containing multiple child files.
removeSnapshotsDuringFilesystemTrim: ~
# -- Setting that allows fast rebuilding of replicas using the checksum of snapshot disk files. Before enabling this setting, you must set the snapshot-data-integrity value to "enable" or "fast-check".
fastReplicaRebuildEnabled: ~
# -- Number of seconds that an HTTP client waits for a response from a File Sync server before considering the connection to have failed.
replicaFileSyncHttpClientTimeout: ~
# -- Number of seconds that Longhorn allows for the completion of replica rebuilding and snapshot cloning operations.
longGRPCTimeOut: ~
# -- Log levels that indicate the type and severity of logs in Longhorn Manager. The default value is "Info". (Options: "Panic", "Fatal", "Error", "Warn", "Info", "Debug", "Trace")
logLevel: ~
# -- Specifies the directory on the host where Longhorn stores log files for the instance manager pod. Currently, it is only used for instance manager pods in the v2 data engine.
logPath: ~
# -- Setting that allows you to specify a backup compression method.
backupCompressionMethod: ~
# -- Maximum number of worker threads that can concurrently run for each backup.
backupConcurrentLimit: ~
# -- Specifies the default backup block size, in MiB, used when creating a new volume. Supported values are 2 or 16.
defaultBackupBlockSize: ~
# -- Maximum number of worker threads that can concurrently run for each restore operation.
restoreConcurrentLimit: ~
# -- Setting that allows you to enable the V1 Data Engine.
v1DataEngine: ~
# -- Setting that allows you to enable the V2 Data Engine, which is based on the Storage Performance Development Kit (SPDK). The V2 Data Engine is an experimental feature and should not be used in production environments.
v2DataEngine: ~
# -- Applies only to the V2 Data Engine. Enables hugepages for the Storage Performance Development Kit (SPDK) target daemon. If disabled, legacy memory is used. Allocation size is set via the Data Engine Memory Size setting.
dataEngineHugepageEnabled: ~
# -- Applies only to the V2 Data Engine. Specifies the hugepage size, in MiB, for the Storage Performance Development Kit (SPDK) target daemon. The default value is "{"v2":"2048"}"
dataEngineMemorySize: ~
# -- Applies only to the V2 Data Engine. Specifies the CPU cores on which the Storage Performance Development Kit (SPDK) target daemon runs. The daemon is deployed in each Instance Manager pod. Ensure that the number of assigned cores does not exceed the guaranteed Instance Manager CPUs for the V2 Data Engine. The default value is "{"v2":"0x1"}".
dataEngineCPUMask: ~
# -- This setting specifies the default write bandwidth limit (in megabytes per second) for volume replica rebuilding when using the v2 data engine (SPDK). If this value is set to 0, there will be no write bandwidth limitation. Individual volumes can override this setting by specifying their own rebuilding bandwidth limit.
replicaRebuildingBandwidthLimit: ~
# -- This setting specifies the default depth of each queue for Ublk frontend. This setting applies to volumes using the V2 Data Engine with Ublk front end. Individual volumes can override this setting by specifying their own Ublk queue depth.
defaultUblkQueueDepth: ~
# -- This setting specifies the default the number of queues for ublk frontend. This setting applies to volumes using the V2 Data Engine with Ublk front end. Individual volumes can override this setting by specifying their own number of queues for ublk.
defaultUblkNumberOfQueue: ~
# -- In seconds. The setting specifies the timeout for the instance manager pod liveness probe. The default value is 10 seconds.
instanceManagerPodLivenessProbeTimeout: ~
# -- Setting that allows scheduling of empty node selector volumes to any node.
allowEmptyNodeSelectorVolume: ~
# -- Setting that allows scheduling of empty disk selector volumes to any disk.
allowEmptyDiskSelectorVolume: ~
# -- Setting that allows Longhorn to periodically collect anonymous usage data for product improvement purposes. Longhorn sends collected data to the [Upgrade Responder](https://github.com/longhorn/upgrade-responder) server, which is the data source of the Longhorn Public Metrics Dashboard (https://metrics.longhorn.io). The Upgrade Responder server does not store data that can be used to identify clients, including IP addresses.
allowCollectingLonghornUsageMetrics: ~
# -- Setting that temporarily prevents all attempts to purge volume snapshots.
disableSnapshotPurge: ~
# -- Maximum snapshot count for a volume. The value should be between 2 to 250
snapshotMaxCount: ~
# -- Applies only to the V2 Data Engine. Specifies the log level for the Storage Performance Development Kit (SPDK) target daemon. Supported values are: Error, Warning, Notice, Info, and Debug. The default is Notice.
dataEngineLogLevel: ~
# -- Applies only to the V2 Data Engine. Specifies the log flags for the Storage Performance Development Kit (SPDK) target daemon.
dataEngineLogFlags: ~
# -- Setting that freezes the filesystem on the root partition before a snapshot is created.
freezeFilesystemForSnapshot: ~
# -- Setting that automatically cleans up the snapshot when the backup is deleted.
autoCleanupSnapshotWhenDeleteBackup: ~
# -- Setting that automatically cleans up the snapshot after the on-demand backup is completed.
autoCleanupSnapshotAfterOnDemandBackupCompleted: ~
# -- Setting that allows Longhorn to detect node failure and immediately migrate affected RWX volumes.
rwxVolumeFastFailover: ~
# -- Enables automatic rebuilding of degraded replicas while the volume is detached. This setting only takes effect if the individual volume setting is set to `ignored` or `enabled`.
offlineReplicaRebuilding: ~
# -- Controls whether Longhorn monitors and records health information for node disks. When disabled, disk health checks and status updates are skipped.
nodeDiskHealthMonitoring: ~
# -- Setting that allows you to update the default backupstore.
defaultBackupStore:
# -- Endpoint used to access the default backupstore. (Options: "NFS", "CIFS", "AWS", "GCP", "AZURE")
backupTarget: "s3://longhorn-backup@us-east-1/"
# -- Name of the Kubernetes secret associated with the default backup target.
backupTargetCredentialSecret: "longhorn-crypto"
# -- Number of seconds that Longhorn waits before checking the default backupstore for new backups. The default value is "300". When the value is "0", polling is disabled.
pollInterval: 300
privateRegistry:
# -- Set to `true` to automatically create a new private registry secret.
createSecret: ~
# -- URL of a private registry. When unspecified, Longhorn uses the default system registry.
registryUrl: ~
# -- User account used for authenticating with a private registry.
registryUser: ~
# -- Password for authenticating with a private registry.
registryPasswd: ~
# -- If create a new private registry secret is true, create a Kubernetes secret with this name; else use the existing secret of this name. Use it to pull images from your private registry.
registrySecret: ~
longhornManager:
log:
# -- Format of Longhorn Manager logs. (Options: "plain", "json")
format: plain
# -- PriorityClass for Longhorn Manager.
priorityClass: *defaultPriorityClassNameRef
# -- Toleration for Longhorn Manager on nodes allowed to run Longhorn components.
tolerations: []
## If you want to set tolerations for Longhorn Manager DaemonSet, delete the `[]` in the line above
## and uncomment this example block
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
# -- Resource requests and limits for Longhorn Manager pods.
resources: ~
# -- Node selector for Longhorn Manager. Specify the nodes allowed to run Longhorn Manager.
nodeSelector: {}
## If you want to set node selector for Longhorn Manager DaemonSet, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
# -- Annotation for the Longhorn Manager service.
serviceAnnotations: {}
## If you want to set annotations for the Longhorn Manager service, delete the `{}` in the line above
## and uncomment this example block
# annotation-key1: "annotation-value1"
# annotation-key2: "annotation-value2"
serviceLabels: {}
## If you want to set labels for the Longhorn Manager service, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
## DaemonSet update strategy. Default "100% unavailable" matches the upgrade
## flow (old managers removed before new start); override for rolling updates
## if you prefer that behavior.
updateStrategy:
rollingUpdate:
maxUnavailable: "100%"
longhornDriver:
log:
# -- Format of longhorn-driver logs. (Options: "plain", "json")
format: plain
# -- PriorityClass for Longhorn Driver.
priorityClass: *defaultPriorityClassNameRef
# -- Toleration for Longhorn Driver on nodes allowed to run Longhorn components.
tolerations: []
## If you want to set tolerations for Longhorn Driver Deployer Deployment, delete the `[]` in the line above
## and uncomment this example block
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
# -- Node selector for Longhorn Driver. Specify the nodes allowed to run Longhorn Driver.
nodeSelector: {}
## If you want to set node selector for Longhorn Driver Deployer Deployment, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
longhornUI:
# -- Replica count for Longhorn UI.
replicas: 2
# -- PriorityClass for Longhorn UI.
priorityClass: *defaultPriorityClassNameRef
# -- Affinity for Longhorn UI pods. Specify the affinity you want to use for Longhorn UI.
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- longhorn-ui
topologyKey: kubernetes.io/hostname
# -- Toleration for Longhorn UI on nodes allowed to run Longhorn components.
tolerations: []
## If you want to set tolerations for Longhorn UI Deployment, delete the `[]` in the line above
## and uncomment this example block
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
# -- Node selector for Longhorn UI. Specify the nodes allowed to run Longhorn UI.
nodeSelector: {}
## If you want to set node selector for Longhorn UI Deployment, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
ingress:
# -- Setting that allows Longhorn to generate ingress records for the Longhorn UI service.
enabled: false
# -- IngressClass resource that contains ingress configuration, including the name of the Ingress controller.
# ingressClassName can replace the kubernetes.io/ingress.class annotation used in earlier Kubernetes releases.
ingressClassName: ~
# -- Hostname of the Layer 7 load balancer.
host: sslip.io
# -- Extra hostnames for TLS (Subject Alternative Names - SAN). Used when you need multiple FQDNs for the same ingress.
# Example:
# extraHosts:
# - longhorn.example.com
# - longhorn-ui.internal.local
extraHosts: []
# -- Setting that allows you to enable TLS on ingress records.
tls: false
# -- Setting that allows you to enable secure connections to the Longhorn UI service via port 443.
secureBackends: false
# -- TLS secret that contains the private key and certificate to be used for TLS. This setting applies only when TLS is enabled on ingress records.
tlsSecret: longhorn.local-tls
# -- Default ingress path. You can access the Longhorn UI by following the full ingress path {{host}}+{{path}}.
path: /
# -- Ingress path type. To maintain backward compatibility, the default value is "ImplementationSpecific".
pathType: ImplementationSpecific
## If you're using kube-lego, you will want to add:
## kubernetes.io/tls-acme: true
##
## For a full list of possible ingress annotations, please see
## ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/annotations.md
##
## If tls is set to true, annotation ingress.kubernetes.io/secure-backends: "true" will automatically be set
# -- Ingress annotations in the form of key-value pairs.
annotations:
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: true
# -- Secret that contains a TLS private key and certificate. Use secrets if you want to use your own certificates to secure ingresses.
secrets:
## If you're providing your own certificates, please use this to add the certificates as secrets
## key and certificate should start with -----BEGIN CERTIFICATE----- or
## -----BEGIN RSA PRIVATE KEY-----
##
## name should line up with a tlsSecret set further up
## If you're using kube-lego, this is unneeded, as it will create the secret for you if it is not set
##
## It is also possible to create and manage the certificates outside of this helm chart
## Please see README.md for more information
# - name: longhorn.local-tls
# key:
# certificate:
httproute:
# -- Setting that allows Longhorn to generate HTTPRoute records for the Longhorn UI service using Gateway API.
enabled: false
# -- Gateway references for HTTPRoute. Specify which Gateway(s) should handle this route.
parentRefs: []
## Example:
# - name: gateway-name
# namespace: gateway-namespace
# # Optional fields with defaults:
# # group: gateway.networking.k8s.io # default
# # kind: Gateway # default
# # sectionName: https # optional, targets a specific listener
# -- List of hostnames for the HTTPRoute. Multiple hostnames are supported.
hostnames: []
## Example:
# - longhorn.example.com
# - longhorn.example.org
# -- Default path for HTTPRoute. You can access the Longhorn UI by following the full path.
path: /
# -- Path match type for HTTPRoute. (Options: "Exact", "PathPrefix")
pathType: PathPrefix
# -- Annotations for the HTTPRoute resource in the form of key-value pairs.
annotations: {}
## Example:
# annotation-key1: "annotation-value1"
# -- Setting that allows you to enable pod security policies (PSPs) that allow privileged Longhorn pods to start. This setting applies only to clusters running Kubernetes 1.25 and earlier, and with the built-in Pod Security admission controller enabled.
enablePSP: false
# -- Specify override namespace, specifically this is useful for using longhorn as sub-chart and its release namespace is not the `longhorn-system`.
namespaceOverride: ""
# -- Annotation for the Longhorn Manager DaemonSet pods. This setting is optional.
annotations: {}
serviceAccount:
# -- Annotations to add to the service account
annotations: {}
metrics:
serviceMonitor:
# -- Setting that allows the creation of a Prometheus ServiceMonitor resource for Longhorn Manager components.
enabled: false
# -- Additional labels for the Prometheus ServiceMonitor resource.
additionalLabels: {}
# -- Annotations for the Prometheus ServiceMonitor resource.
annotations: {}
# -- Interval at which Prometheus scrapes the metrics from the target.
interval: ""
# -- Timeout after which Prometheus considers the scrape to be failed.
scrapeTimeout: ""
# -- Configures the relabeling rules to apply the targets metadata labels. See the [Prometheus Operator
# documentation](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.Endpoint) for
# formatting details.
relabelings: []
# -- Configures the relabeling rules to apply to the samples before ingestion. See the [Prometheus Operator
# documentation](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.Endpoint) for
# formatting details.
metricRelabelings: []
## openshift settings
openshift:
# -- Setting that allows Longhorn to integrate with OpenShift.
enabled: false
ui:
# -- Route for connections between Longhorn and the OpenShift web console.
route: "longhorn-ui"
# -- Port for accessing the OpenShift web console.
port: 443
# -- Port for proxy that provides access to the OpenShift web console.
proxy: 8443
# -- Setting that allows Longhorn to generate code coverage profiles.
enableGoCoverDir: false
# -- Add extra objects manifests
extraObjects: []

View File

@@ -0,0 +1,60 @@
这是一个非常敏锐的问题。既然 Longhorn 已经通过多副本Replica实现了节点级的高可用为什么还要“多此一举”去对接 S3 呢?
简单来说:**副本是“容灾”,而 S3 是“避难所”。**
以下是 Longhorn 对接 S3 的核心理由:
---
## 1. 应对“全集群覆灭”风险
Longhorn 的默认副本(比如你设定的 3 副本)通常都分布在同一个 K3s 集群的物理节点上。
* **如果不接 S3** 如果发生极端情况(如机房断电导致文件系统损坏、人为误删整个 `longhorn-system` 命名空间、或者由于某种 Bug 导致数据卷损坏),你的数据就彻底丢了。
* **对接 S3 后:** S3 是**独立于 K3s 集群之外**的存储。即使你把整个 K3s 集群删掉再重装,只要 S3 里的备份还在,你就能在新集群里瞬间恢复 Redis 或 PG16 的数据。
---
## 2. 异地备份与低成本归档
Longhorn 将数据存在高性能的 SSD 上,价格较贵。而 S3尤其是自建的 MinIO 或云端 S3适合存放不经常访问的长久数据。
* **快照Snapshot**:存在本地,方便快速回滚。
* **备份Backup**:传到 S3。
你可以配置策略:每小时做一个快照,每天将快照同步到 S3 一次。这符合数据备份的 **3-2-1 原则**3 个副本、2 种介质、1 个异地)。
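下面是一个示意性的定时备份配置(假设使用 Longhorn 的 RecurringJob CRD任务名、cron 表达式和保留份数均为示例值):
```bash
kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-s3-backup
  namespace: longhorn-system
spec:
  name: daily-s3-backup
  task: backup          # backup 会把快照上传到备份目标S3snapshot 则只保留在本地
  cron: "0 4 * * *"     # 每天 04:00 执行
  retain: 7             # 在备份目标里保留 7 份
  concurrency: 1
  groups:
    - default           # default 组会应用到所有未显式指定组的卷
EOF
```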
---
## 3. 跨集群迁移与灾备 (DR)
如果你想把业务从“开发环境”迁移到“生产环境”,或者在另一个城市搭建一个备份集群:
1. 在 A 集群中,将 Longhorn 卷备份到 S3。
2. 在 B 集群中,连接同一个 S3。
3. 点击 **Restore**,几分钟内,B 集群就长出了一个一模一样的 Redis 卷。
Longhorn 甚至支持 **DR Volume灾备卷**,它会持续从 S3 拉取更新,处于“热备”状态。
---
## 4. 节省本地磁盘空间
Longhorn 的快照Snapshot如果积攒多了会占用大量的本地磁盘空间。
对接 S3 后,你可以将旧的备份保留在 S3 中而定期清理本地磁盘。S3 的存储空间几乎是无限的。
---
## 5. 你的 Redis 场景如何受益?
对于你的 Redis 独立 Pod
1. **崩溃恢复:** 如果你的 Redis AOF 文件因为某种原因在本地损坏了,你可以从 S3 找回昨天下午 4 点的备份。
2. **版本回滚:** 如果由于代码错误导致 Redis 里的数据被污染了,你可以通过 S3 的备份快速“穿越”回事故发生前。
---
### 配置建议
在你的 `/home/fei/k3s/009-基础设施/004-longhorn/values.yaml` 中,你会看到 `defaultSettings.defaultBackupStore` 下有 `backupTarget`。你应该将其指向你的 S3 桶地址,例如:
`s3://longhorn-backup@us-east-1/`
**由于你已经有了 S3 服务,这等于是“免费”的数据保险。你需要我提供在 Longhorn 中配置 S3 的具体参数格式吗?**
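配置并安装完成后,可以用下面的命令确认备份目标是否生效(示意,假设默认备份目标名为 default与 values.yaml 里的 persistence.backupTargetName 一致):
```bash
kubectl -n longhorn-system get backuptargets.longhorn.io
kubectl -n longhorn-system describe backuptarget default
```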

View File

@@ -0,0 +1,43 @@
{
email admin@u6.net3w.com
}
# 示例域名配置
test.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# MinIO S3 API
s3.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# MinIO Console
console.s3.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Longhorn 存储管理
longhorn.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Grafana 监控仪表板
grafana.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Prometheus 监控
prometheus.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Alertmanager 告警管理
alertmanager.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# 导航页面
dh.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}

View File

@@ -0,0 +1,16 @@
#!/bin/bash
# 应用 Longhorn Ingress
echo "创建 Longhorn Ingress..."
kubectl apply -f longhorn-ingress.yaml
# 显示 Ingress 状态
echo ""
echo "Ingress 状态:"
kubectl get ingress -n longhorn-system
echo ""
echo "访问 Longhorn UI"
echo " URL: http://longhorn.local"
echo " 需要在 /etc/hosts 中添加:"
echo " <节点IP> longhorn.local"

View File

@@ -0,0 +1,19 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: longhorn-ingress
namespace: longhorn-system
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: longhorn.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: longhorn-frontend
port:
number: 80

View File

@@ -0,0 +1,202 @@
# Traefik Ingress 控制器配置
## 当前状态
K3s 默认已安装 Traefik 作为 Ingress 控制器。
- **命名空间**: kube-system
- **服务类型**: ClusterIP
- **端口**: 80 (HTTP), 443 (HTTPS)
## Traefik 配置信息
查看 Traefik 配置:
```bash
kubectl get deployment traefik -n kube-system -o yaml
```
查看 Traefik 服务:
```bash
kubectl get svc traefik -n kube-system
```
## 使用 Ingress
### 基本 HTTP Ingress 示例
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: example-ingress
namespace: default
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: example-service
port:
number: 80
```
### HTTPS Ingress 示例(使用 TLS
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: example-ingress-tls
namespace: default
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: websecure
traefik.ingress.kubernetes.io/router.tls: "true"
spec:
tls:
- hosts:
- example.com
secretName: example-tls-secret
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: example-service
port:
number: 80
```
## 创建 TLS 证书
### 使用 Let's Encrypt (cert-manager)
1. 安装 cert-manager
```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
```
2. 创建 ClusterIssuer
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: your-email@example.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: traefik
```
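ClusterIssuer 创建好之后,可以直接给已有的 Ingress 加上 cert-manager 的注解来触发自动签发(示意命令Ingress 名称和命名空间按实际情况替换):
```bash
kubectl -n default annotate ingress example-ingress-tls \
  cert-manager.io/cluster-issuer=letsencrypt-prod --overwrite
# 签发成功后,证书会写入 Ingress spec.tls 指定的 secretName例如 example-tls-secret
kubectl -n default get certificate
```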
### 使用自签名证书
```bash
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=example.com/O=example"
kubectl create secret tls example-tls-secret \
--key tls.key --cert tls.crt -n default
```
## Traefik Dashboard
访问 Traefik Dashboard
```bash
kubectl port-forward -n kube-system $(kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o name) 9000:9000
```
然后访问: http://localhost:9000/dashboard/
## 常用注解
### 重定向 HTTP 到 HTTPS
注意K3s 自带的是 Traefik 2.x旧版 1.x 的 redirect-entry-point / redirect-permanent 注解已不再生效;应改用 redirectScheme 中间件,并在 Ingress 上通过注解引用:
```yaml
annotations:
  traefik.ingress.kubernetes.io/router.middlewares: default-redirect-https@kubernetescrd
```
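上面引用的 redirect-https 中间件可以这样创建(示意,名称和命名空间均为示例值):
```bash
kubectl apply -f - <<'EOF'
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: redirect-https
  namespace: default
spec:
  redirectScheme:
    scheme: https
    permanent: true
EOF
```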
### 设置超时
```yaml
annotations:
traefik.ingress.kubernetes.io/router.middlewares: default-timeout@kubernetescrd
```
### 启用 CORS
```yaml
annotations:
traefik.ingress.kubernetes.io/router.middlewares: default-cors@kubernetescrd
```
## 中间件示例
### 创建 ForwardAuth 中间件
Traefik 并没有名为 "timeout" 的 Middleware 类型(超时一般在 serversTransport / 入口点 transport 中配置),这里以 ForwardAuth 为例演示 Middleware 的写法:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: forward-auth
  namespace: default
spec:
  forwardAuth:
    address: http://auth-service
    trustForwardHeader: true
```
## 监控和日志
查看 Traefik 日志:
```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik -f
```
## 故障排查
### 检查 Ingress 状态
```bash
kubectl get ingress -A
kubectl describe ingress <ingress-name> -n <namespace>
```
### 检查 Traefik 配置
```bash
kubectl get ingressroute -A
kubectl get middleware -A
```
## 外部访问配置
如果需要从外部访问,可以:
1. **使用 NodePort**
```bash
kubectl patch svc traefik -n kube-system -p '{"spec":{"type":"NodePort"}}'
```
2. **使用 LoadBalancer**(需要云环境或 MetalLB
```bash
kubectl patch svc traefik -n kube-system -p '{"spec":{"type":"LoadBalancer"}}'
```
3. **使用 HostPort**(直接绑定到节点端口 80/443
## 参考资源
- Traefik 官方文档: https://doc.traefik.io/traefik/
- K3s Traefik 配置: https://docs.k3s.io/networking#traefik-ingress-controller

View File

@@ -0,0 +1,34 @@
#!/bin/bash
# 添加 Prometheus 社区 Helm 仓库
echo "添加 Prometheus Helm 仓库..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# 创建命名空间
echo "创建 monitoring 命名空间..."
kubectl create namespace monitoring
# 安装 kube-prometheus-stack (包含 Prometheus, Grafana, Alertmanager)
echo "安装 kube-prometheus-stack..."
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
-f values.yaml
# 等待部署完成
echo "等待 Prometheus 和 Grafana 启动..."
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=grafana -n monitoring --timeout=300s
# 显示状态
echo ""
echo "监控系统部署完成!"
kubectl get pods -n monitoring
kubectl get svc -n monitoring
echo ""
echo "访问信息:"
echo " Grafana: http://grafana.local (需要配置 Ingress)"
echo " 默认用户名: admin"
echo " 默认密码: prom-operator"
echo ""
echo " Prometheus: http://prometheus.local (需要配置 Ingress)"

View File

@@ -0,0 +1,59 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: grafana.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-grafana
port:
number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: prometheus-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: prometheus.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-prometheus
port:
number: 9090
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: alertmanager-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: alertmanager.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-alertmanager
port:
number: 9093

View File

@@ -0,0 +1,241 @@
# Prometheus + Grafana 监控系统
## 组件说明
### Prometheus
- **功能**: 时间序列数据库,收集和存储指标数据
- **存储**: 20Gi Longhorn 卷
- **数据保留**: 15 天
- **访问**: https://prometheus.u6.net3w.com
### Grafana
- **功能**: 可视化仪表板
- **存储**: 5Gi Longhorn 卷
- **默认用户**: admin
- **默认密码**: prom-operator
- **访问**: https://grafana.u6.net3w.com
### Alertmanager
- **功能**: 告警管理和通知
- **存储**: 5Gi Longhorn 卷
- **访问**: https://alertmanager.u6.net3w.com
### Node Exporter
- **功能**: 收集节点级别的系统指标CPU、内存、磁盘等
### Kube State Metrics
- **功能**: 收集 Kubernetes 资源状态指标
## 部署方式
```bash
bash deploy.sh
```
## 部署后配置
### 1. 应用 Ingress
```bash
kubectl apply -f ingress.yaml
```
### 2. 域名解析
ingress.yaml 中使用的域名是 grafana.u6.net3w.com、prometheus.u6.net3w.com、alertmanager.u6.net3w.com正常情况下直接通过域名访问即可如果是在没有公网 DNS 的环境里测试,可以在 /etc/hosts 中手动指向节点 IP
```
<节点IP> grafana.u6.net3w.com
<节点IP> prometheus.u6.net3w.com
<节点IP> alertmanager.u6.net3w.com
```
### 3. 访问 Grafana
1. 打开浏览器访问: https://grafana.u6.net3w.com
2. 使用默认凭证登录:
- 用户名: admin
- 密码: prom-operator
3. 首次登录后建议修改密码
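如果忘记或想确认当前的管理员密码,也可以直接从 Helm 创建的 Secret 里读出来(示意,假设 Secret 名为 kube-prometheus-stack-grafana键为 admin-password
```bash
kubectl -n monitoring get secret kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```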
## 预置仪表板
Grafana 已预装多个仪表板:
1. **Kubernetes / Compute Resources / Cluster**
- 集群整体资源使用情况
2. **Kubernetes / Compute Resources / Namespace (Pods)**
- 按命名空间查看 Pod 资源使用
3. **Kubernetes / Compute Resources / Node (Pods)**
- 按节点查看 Pod 资源使用
4. **Kubernetes / Networking / Cluster**
- 集群网络流量统计
5. **Node Exporter / Nodes**
   - 节点详细指标CPU、内存、磁盘、网络
## 监控目标
系统会自动监控:
- ✅ Kubernetes API Server
- ✅ Kubelet
- ✅ Node Exporter (节点指标)
- ✅ Kube State Metrics (K8s 资源状态)
- ✅ CoreDNS
- ✅ Prometheus 自身
- ✅ Grafana
## 添加自定义监控
### 监控 Redis
创建 ServiceMonitor前提是已经为 Redis 部署了 redis-exporter并在对应 Service 中暴露了 metrics 端口,端口名需与下面的 endpoints.port 一致):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: redis-monitor
namespace: monitoring
spec:
selector:
matchLabels:
app: redis
namespaceSelector:
matchNames:
- redis
endpoints:
- port: redis
interval: 30s
```
### 监控 PostgreSQL
需要部署 postgres-exporter
```bash
helm install postgres-exporter prometheus-community/prometheus-postgres-exporter \
--namespace postgresql \
--set config.datasource.host=postgresql-service.postgresql.svc.cluster.local \
--set config.datasource.user=postgres \
--set config.datasource.password=postgres123
```
## 告警配置
### 查看告警规则
```bash
kubectl get prometheusrules -n monitoring
```
### 自定义告警规则
创建 PrometheusRule
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: custom-alerts
namespace: monitoring
spec:
groups:
- name: custom
interval: 30s
rules:
- alert: HighMemoryUsage
expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "节点内存使用率超过 90%"
description: "节点 {{ $labels.instance }} 内存使用率为 {{ $value | humanizePercentage }}"
```
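保存为文件后应用并确认规则已创建(示意;如果规则没有被 Prometheus 加载,通常是因为 chart 默认的 ruleSelector 只匹配带 release 标签的 PrometheusRule可以按需补上标签
```bash
kubectl apply -f custom-alerts.yaml
kubectl -n monitoring get prometheusrule custom-alerts
# 如有需要,补上 release 标签让 Prometheus 选中该规则
kubectl -n monitoring label prometheusrule custom-alerts release=kube-prometheus-stack --overwrite
```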
## 配置告警通知
编辑 Alertmanager 配置:
```bash
kubectl edit secret alertmanager-kube-prometheus-stack-alertmanager -n monitoring
```
添加邮件、Slack、钉钉等通知渠道。
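修改前可以先把当前配置解码出来确认格式(示意,假设配置保存在 key 为 alertmanager.yaml 的字段中):
```bash
kubectl -n monitoring get secret alertmanager-kube-prometheus-stack-alertmanager \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
```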
## 数据持久化
所有数据都存储在 Longhorn 卷上:
- Prometheus 数据: 20Gi
- Grafana 配置: 5Gi
- Alertmanager 数据: 5Gi
可以通过 Longhorn UI 创建快照和备份到 S3。
## 常用操作
### 查看 Prometheus 目标
访问: https://prometheus.u6.net3w.com/targets
### 查看告警
访问: https://alertmanager.u6.net3w.com
### 导入自定义仪表板
1. 访问 Grafana
2. 点击 "+" -> "Import"
3. 输入仪表板 ID 或上传 JSON
推荐仪表板:
- Node Exporter Full: 1860
- Kubernetes Cluster Monitoring: 7249
- Longhorn: 13032
### 查看日志
```bash
# Prometheus 日志
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -f
# Grafana 日志
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana -f
```
## 性能优化
### 调整数据保留时间
编辑 values.yaml 中的 `retention` 参数,然后:
```bash
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring -f values.yaml
```
### 调整采集间隔
默认采集间隔为 30 秒,可以在 ServiceMonitor 中调整。
## 故障排查
### Prometheus 无法采集数据
```bash
# 检查 ServiceMonitor
kubectl get servicemonitor -A
# 检查 Prometheus 配置
kubectl get prometheus -n monitoring -o yaml
```
### Grafana 无法连接 Prometheus
检查 Grafana 数据源配置:
1. 登录 Grafana
2. Configuration -> Data Sources
3. 确认 Prometheus URL 正确
## 卸载
```bash
helm uninstall kube-prometheus-stack -n monitoring
kubectl delete namespace monitoring
```
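注意helm uninstall 不会删除 Prometheus Operator 安装的 CRD如需彻底清理可以先列出再逐个确认删除示意
```bash
kubectl get crd | grep monitoring.coreos.com
# 确认不再需要后再删除,例如:
# kubectl delete crd prometheuses.monitoring.coreos.com
```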
## 参考资源
- Prometheus 文档: https://prometheus.io/docs/
- Grafana 文档: https://grafana.com/docs/
- kube-prometheus-stack: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

View File

@@ -0,0 +1,89 @@
# Prometheus Operator 配置
prometheusOperator:
enabled: true
# Prometheus 配置
prometheus:
enabled: true
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: longhorn
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
resources:
requests:
memory: 512Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
# Grafana 配置
grafana:
enabled: true
adminPassword: prom-operator
persistence:
enabled: true
storageClassName: longhorn
size: 5Gi
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 500m
# Alertmanager 配置
alertmanager:
enabled: true
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: longhorn
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
# Node Exporter (收集节点指标)
nodeExporter:
enabled: true
# Kube State Metrics (收集 K8s 资源指标)
kubeStateMetrics:
enabled: true
# 默认监控规则
defaultRules:
create: true
rules:
alertmanager: true
etcd: true
configReloaders: true
general: true
k8s: true
kubeApiserverAvailability: true
kubeApiserverSlos: true
kubelet: true
kubeProxy: true
kubePrometheusGeneral: true
kubePrometheusNodeRecording: true
kubernetesApps: true
kubernetesResources: true
kubernetesStorage: true
kubernetesSystem: true
kubeScheduler: true
kubeStateMetrics: true
network: true
node: true
nodeExporterAlerting: true
nodeExporterRecording: true
prometheus: true
prometheusOperator: true

View File

@@ -0,0 +1,40 @@
#!/bin/bash
# KEDA 部署脚本
echo "开始部署 KEDA..."
# 设置 KUBECONFIG
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# 添加 KEDA Helm 仓库
echo "添加 KEDA Helm 仓库..."
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
# 创建命名空间
echo "创建 keda 命名空间..."
kubectl create namespace keda --dry-run=client -o yaml | kubectl apply -f -
# 安装 KEDA
echo "安装 KEDA..."
helm install keda kedacore/keda \
--namespace keda \
-f values.yaml
# 等待 KEDA 组件就绪
echo "等待 KEDA 组件启动..."
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=keda-operator -n keda --timeout=300s
# 显示状态
echo ""
echo "KEDA 部署完成!"
kubectl get pods -n keda
kubectl get svc -n keda
echo ""
echo "验证 KEDA CRD"
kubectl get crd | grep keda
echo ""
echo "KEDA 已成功部署到命名空间: keda"

View File

@@ -0,0 +1,16 @@
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: my-web-app-scaler
spec:
  hosts:                          # 新版 HTTP Add-on 使用 hosts 列表(旧版为单数 host 字段)
    - my-app.example.com          # 你的域名
targetPendingRequests: 100
scaleTargetRef:
name: your-deployment-name # 你想缩放到 0 的应用名
kind: Deployment
apiVersion: apps/v1
service: your-service-name
port: 80
replicas:
min: 0 # 核心:无人访问时缩放为 0
max: 10

View File

@@ -0,0 +1,22 @@
#!/bin/bash
# 安装 KEDA HTTP Add-on
echo "安装 KEDA HTTP Add-on..."
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# 安装 HTTP Add-on使用默认配置
helm install http-add-on kedacore/keda-add-ons-http \
--namespace keda
echo "等待 HTTP Add-on 组件启动..."
sleep 10
echo ""
echo "HTTP Add-on 部署完成!"
kubectl get pods -n keda | grep http
echo ""
echo "HTTP Add-on 服务:"
kubectl get svc -n keda | grep http

View File

@@ -0,0 +1,458 @@
# KEDA 自动扩缩容
## 功能说明
KEDA (Kubernetes Event Driven Autoscaling) 为 K3s 集群提供基于事件驱动的自动扩缩容能力。
### 核心功能
- **按需启动/停止服务**:空闲时自动缩容到 0节省资源
- **基于指标自动扩缩容**:根据实际负载动态调整副本数
- **多种触发器支持**CPU、内存、Prometheus 指标、数据库连接等
- **与 Prometheus 集成**:利用现有监控数据进行扩缩容决策
## 部署方式
```bash
cd /home/fei/k3s/009-基础设施/007-keda
bash deploy.sh
```
## 已配置的服务
### 1. Navigation 导航服务 ✅
- **最小副本数**: 0空闲时完全停止
- **最大副本数**: 10
- **触发条件**:
- HTTP 请求速率 > 10 req/min
- CPU 使用率 > 60%
- **冷却期**: 3 分钟
**配置文件**: `scalers/navigation-scaler.yaml`
### 2. Redis 缓存服务 ⏳
- **最小副本数**: 0空闲时完全停止
- **最大副本数**: 5
- **触发条件**:
- 有客户端连接
- CPU 使用率 > 70%
- **冷却期**: 5 分钟
**配置文件**: `scalers/redis-scaler.yaml`
**状态**: 待应用(需要先为 Redis 添加 Prometheus exporter
### 3. PostgreSQL 数据库 ❌
**不推荐使用 KEDA 扩展 PostgreSQL**
原因:
- PostgreSQL 是有状态服务,多个副本会导致存储冲突
- 需要配置主从复制才能安全扩展
- 建议使用 PostgreSQL Operator 或 PgBouncer + KEDA
详细说明:`scalers/postgresql-说明.md`
## 应用 ScaledObject
### 部署所有 Scaler
```bash
# 应用 Navigation Scaler
kubectl apply -f scalers/navigation-scaler.yaml
# 应用 Redis Scaler需要先配置 Redis exporter
kubectl apply -f scalers/redis-scaler.yaml
# ⚠️ PostgreSQL 不推荐使用 KEDA 扩展
# 详见: scalers/postgresql-说明.md
```
### 查看 ScaledObject 状态
```bash
# 查看所有 ScaledObject
kubectl get scaledobject -A
# 查看详细信息
kubectl describe scaledobject navigation-scaler -n navigation
kubectl describe scaledobject redis-scaler -n redis
kubectl describe scaledobject postgresql-scaler -n postgresql
```
### 查看自动创建的 HPA
```bash
# KEDA 会自动创建 HorizontalPodAutoscaler
kubectl get hpa -A
```
## 支持的触发器类型
### 1. Prometheus 指标
```yaml
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: custom_metric
query: sum(rate(http_requests_total[1m]))
threshold: "100"
```
### 2. CPU/内存使用率
```yaml
triggers:
- type: cpu
metadata:
type: Utilization
value: "70"
- type: memory
metadata:
type: Utilization
value: "80"
```
### 3. Redis 队列长度
```yaml
triggers:
- type: redis
metadata:
address: redis.redis.svc.cluster.local:6379
listName: mylist
listLength: "5"
```
### 4. PostgreSQL 查询
```yaml
triggers:
- type: postgresql
metadata:
connectionString: postgresql://user:pass@host:5432/db
query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
targetQueryValue: "10"
```
### 5. Cron 定时触发
```yaml
triggers:
- type: cron
metadata:
timezone: Asia/Shanghai
start: 0 8 * * * # 每天 8:00 扩容
end: 0 18 * * * # 每天 18:00 缩容
desiredReplicas: "3"
```
## 为新服务添加自动扩缩容
### 步骤 1: 确保服务配置正确
服务的 Deployment 必须配置 `resources.requests`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
# 不要设置 replicas由 KEDA 管理
template:
spec:
containers:
- name: myapp
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
```
### 步骤 2: 创建 ScaledObject
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: myapp-scaler
namespace: myapp
spec:
scaleTargetRef:
name: myapp
minReplicaCount: 0
maxReplicaCount: 10
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: myapp_requests
query: sum(rate(http_requests_total{app="myapp"}[1m]))
threshold: "50"
```
### 步骤 3: 应用配置
```bash
kubectl apply -f myapp-scaler.yaml
```
## 监控和调试
### 查看 KEDA 日志
```bash
# Operator 日志
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f
# Metrics Server 日志
kubectl logs -n keda -l app.kubernetes.io/name=keda-metrics-apiserver -f
```
### 查看扩缩容事件
```bash
# 查看 HPA 事件
kubectl describe hpa -n <namespace>
# 查看 Pod 事件
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
### 在 Prometheus 中查询 KEDA 指标
访问 https://prometheus.u6.net3w.com查询
```promql
# KEDA Scaler 活跃状态
keda_scaler_active
# KEDA Scaler 错误
keda_scaler_errors_total
# 当前指标值
keda_scaler_metrics_value
```
### 在 Grafana 中查看 KEDA 仪表板
1. 访问 https://grafana.u6.net3w.com
2. 导入 KEDA 官方仪表板 ID: **14691**
3. 查看实时扩缩容状态
## 测试自动扩缩容
### 测试 Navigation 服务
**测试缩容到 0**
```bash
# 1. 停止访问导航页面,等待 3 分钟
sleep 180
# 2. 检查副本数
kubectl get deployment navigation -n navigation
# 预期输出READY 0/0
```
**测试从 0 扩容:**
```bash
# 1. 访问导航页面
curl https://dh.u6.net3w.com
# 2. 监控副本数变化
kubectl get deployment navigation -n navigation -w
# 预期:副本数从 0 变为 1约 10-30 秒)
```
### 测试 Redis 服务
**测试基于连接数扩容:**
```bash
# 1. 连接 Redis
kubectl run redis-client --rm -it --image=redis:7-alpine -- redis-cli -h redis.redis.svc.cluster.local
# 2. 在另一个终端监控
kubectl get deployment redis -n redis -w
# 预期:有连接时副本数从 0 变为 1
```
### 测试 PostgreSQL 服务
**测试基于连接数扩容(仅在已配置只读副本等可扩展架构时适用;当前单实例部署不建议扩展,详见 `scalers/postgresql-说明.md`):**
```bash
# 1. 创建多个数据库连接
for i in {1..15}; do
kubectl run pg-client-$i --image=postgres:16-alpine --restart=Never -- \
psql -h postgresql-service.postgresql.svc.cluster.local -U postgres -c "SELECT pg_sleep(60);" &
done
# 2. 监控副本数
kubectl get statefulset postgresql -n postgresql -w
# 预期:连接数超过 10 时,副本数从 1 增加到 2
```
## 故障排查
### ScaledObject 未生效
**检查 ScaledObject 状态:**
```bash
kubectl describe scaledobject <name> -n <namespace>
```
**常见问题:**
1. **Deployment 设置了固定 replicas**
- 解决:移除 Deployment 中的 `replicas` 字段
2. **缺少 resources.requests**
- 解决:为容器添加 `resources.requests` 配置
3. **Prometheus 查询错误**
- 解决:在 Prometheus UI 中测试查询语句
### 服务无法缩容到 0
**可能原因:**
1. **仍有活跃连接或请求**
- 检查:查看 Prometheus 指标值
2. **cooldownPeriod 未到**
- 检查:等待冷却期结束
3. **minReplicaCount 设置错误**
- 检查:确认 `minReplicaCount: 0`
### 扩容速度慢
**优化建议:**
1. **减少 pollingInterval**
```yaml
pollingInterval: 15 # 从 30 秒改为 15 秒
```
2. **降低 threshold**
```yaml
threshold: "5" # 降低触发阈值
```
3. **使用多个触发器**
```yaml
triggers:
- type: prometheus
# ...
- type: cpu
# ...
```
## 最佳实践
### 1. 合理设置副本数范围
- **无状态服务**`minReplicaCount: 0`,节省资源
- **有状态服务**`minReplicaCount: 1`,保证可用性
- **关键服务**`minReplicaCount: 2`,保证高可用
### 2. 选择合适的冷却期
- **快速响应服务**`cooldownPeriod: 60-180`1-3 分钟)
- **一般服务**`cooldownPeriod: 300`5 分钟)
- **数据库服务**`cooldownPeriod: 600-900`10-15 分钟)
### 3. 监控扩缩容行为
- 定期查看 Grafana 仪表板
- 设置告警规则PrometheusRule 示意见下方)
- 分析扩缩容历史
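告警规则可以用 PrometheusRule 资源下发,下面是一个针对 `keda_scaler_errors_total` 的最小示意(名称、阈值均为假设,`release` 标签与监控栈的发现规则保持一致):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-scaler-alerts             # 假设的名称
  namespace: monitoring
  labels:
    release: kube-prometheus-stack     # 让 Prometheus Operator 加载该规则
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScalerErrors
          expr: increase(keda_scaler_errors_total[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "KEDA Scaler 出现错误"
            description: "过去 5 分钟内 keda_scaler_errors_total 有增长,请检查 ScaledObject 配置"
```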
### 4. 测试冷启动时间
- 测量从 0 扩容到可用的时间
- 优化镜像大小和启动脚本
- 考虑使用 `minReplicaCount: 1` 避免冷启动
## 配置参考
### ScaledObject 完整配置示例
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: example-scaler
namespace: example
spec:
scaleTargetRef:
name: example-deployment
kind: Deployment # 可选Deployment, StatefulSet
apiVersion: apps/v1 # 可选
minReplicaCount: 0 # 最小副本数
maxReplicaCount: 10 # 最大副本数
pollingInterval: 30 # 轮询间隔(秒)
cooldownPeriod: 300 # 缩容冷却期(秒)
idleReplicaCount: 0 # 空闲时的副本数
fallback: # 故障回退配置
failureThreshold: 3
replicas: 2
advanced: # 高级配置
restoreToOriginalReplicaCount: false
horizontalPodAutoscalerConfig:
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus:9090
metricName: custom_metric
query: sum(rate(metric[1m]))
threshold: "100"
```
## 卸载 KEDA
```bash
# 删除所有 ScaledObject
kubectl delete scaledobject --all -A
# 卸载 KEDA
helm uninstall keda -n keda
# 删除命名空间
kubectl delete namespace keda
```
## 参考资源
- KEDA 官方文档: https://keda.sh/docs/
- KEDA Scalers: https://keda.sh/docs/scalers/
- KEDA GitHub: https://github.com/kedacore/keda
- Grafana 仪表板: https://grafana.com/grafana/dashboards/14691
---
**KEDA 让您的 K3s 集群更智能、更高效!** 🚀

View File

@@ -0,0 +1,380 @@
# KEDA HTTP Add-on 自动缩容到 0 配置指南
本指南说明如何使用 KEDA HTTP Add-on 实现应用在无流量时自动缩容到 0有访问时自动启动。
## 前提条件
1. K3s 集群已安装
2. KEDA 已安装
3. KEDA HTTP Add-on 已安装
4. Traefik 作为 Ingress Controller
### 检查 KEDA HTTP Add-on 是否已安装
```bash
kubectl get pods -n keda | grep http
```
应该看到类似输出:
```
keda-add-ons-http-controller-manager-xxx 1/1 Running
keda-add-ons-http-external-scaler-xxx 1/1 Running
keda-add-ons-http-interceptor-xxx 1/1 Running
```
### 如果未安装,执行以下命令安装
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install http-add-on kedacore/keda-add-ons-http --namespace keda
```
## 配置步骤
### 1. 准备应用的基础资源
确保你的应用已经有以下资源:
- Deployment
- Service
- Namespace
示例:
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: myapp
spec:
replicas: 1
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: your-image:tag
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: myapp
namespace: myapp
spec:
selector:
app: myapp
ports:
- port: 80
targetPort: 80
```
### 2. 创建 HTTPScaledObject
这是实现自动缩容到 0 的核心配置。
```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: myapp-http-scaler
namespace: myapp # 必须与应用在同一个 namespace
spec:
hosts:
- myapp.example.com # 你的域名
pathPrefixes:
- / # 匹配的路径前缀
scaleTargetRef:
name: myapp # Deployment 名称
kind: Deployment
apiVersion: apps/v1
service: myapp # Service 名称
port: 80 # Service 端口
replicas:
min: 0 # 空闲时缩容到 0
max: 10 # 最多扩容到 10 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100 # 每秒 100 个请求时扩容
window: 1m
scaledownPeriod: 300 # 5 分钟300秒无流量后缩容到 0
```
**重要参数说明:**
- `hosts`: 你的应用域名
- `scaleTargetRef.name`: 你的 Deployment 名称
- `scaleTargetRef.service`: 你的 Service 名称
- `scaleTargetRef.port`: 你的 Service 端口
- `replicas.min: 0`: 允许缩容到 0
- `scaledownPeriod`: 无流量后多久缩容(秒)
### 3. 创建 Traefik IngressRoute
**重要IngressRoute 必须在 keda namespace 中创建**,因为它需要引用 keda namespace 的拦截器服务。
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: myapp-ingress
namespace: keda # 注意:必须在 keda namespace
spec:
entryPoints:
- web # HTTP 入口
# - websecure # 如果需要 HTTPS添加这个
routes:
- match: Host(`myapp.example.com`) # 你的域名
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080
```
**如果需要 HTTPS添加 TLS 配置:**
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: myapp-ingress
namespace: keda
spec:
entryPoints:
- websecure
routes:
- match: Host(`myapp.example.com`)
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080
tls:
certResolver: letsencrypt # 你的证书解析器
```
### 4. 完整配置文件模板
将以下内容保存为 `myapp-keda-scaler.yaml`,并根据你的应用修改相应的值:
```yaml
---
# HTTPScaledObject - 实现自动缩容到 0
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: myapp-http-scaler
namespace: myapp # 改为你的 namespace
spec:
hosts:
- myapp.example.com # 改为你的域名
pathPrefixes:
- /
scaleTargetRef:
name: myapp # 改为你的 Deployment 名称
kind: Deployment
apiVersion: apps/v1
service: myapp # 改为你的 Service 名称
port: 80 # 改为你的 Service 端口
replicas:
min: 0
max: 10
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100
window: 1m
scaledownPeriod: 300 # 5 分钟无流量后缩容
---
# Traefik IngressRoute - 路由流量到 KEDA 拦截器
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: myapp-ingress
namespace: keda # 必须在 keda namespace
spec:
entryPoints:
- web
routes:
- match: Host(`myapp.example.com`) # 改为你的域名
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080
```
### 5. 应用配置
```bash
kubectl apply -f myapp-keda-scaler.yaml
```
### 6. 验证配置
```bash
# 查看 HTTPScaledObject 状态
kubectl get httpscaledobject -n myapp
# 应该看到 READY = True
# NAME TARGETWORKLOAD TARGETSERVICE MINREPLICAS MAXREPLICAS AGE READY
# myapp-http-scaler apps/v1/Deployment/myapp myapp:80 0 10 10s True
# 查看 IngressRoute
kubectl get ingressroute -n keda
# 查看当前 Pod 数量
kubectl get pods -n myapp
```
## 工作原理
1. **有流量时**
- 用户访问 `myapp.example.com`
- Traefik 将流量路由到 KEDA HTTP 拦截器
- 拦截器检测到请求,通知 KEDA 启动 Pod
- Pod 启动后5-10秒拦截器将流量转发到应用
- 用户看到正常响应(首次访问可能有延迟)
2. **无流量时**
- 5 分钟scaledownPeriod无请求后
- KEDA 自动将 Deployment 缩容到 0
- 不消耗任何计算资源
## 常见问题排查
### 1. 访问返回 404
**检查 IngressRoute 是否在 keda namespace**
```bash
kubectl get ingressroute -n keda | grep myapp
```
如果不在,删除并重新创建:
```bash
kubectl delete ingressroute myapp-ingress -n myapp # 删除错误的
kubectl apply -f myapp-keda-scaler.yaml # 重新创建
```
### 2. HTTPScaledObject READY = False
**查看详细错误信息:**
```bash
kubectl describe httpscaledobject myapp-http-scaler -n myapp
```
**常见错误:**
- `workload already managed by ScaledObject`: 删除旧的 ScaledObject
```bash
kubectl delete scaledobject myapp-scaler -n myapp
```
### 3. Pod 没有自动缩容到 0
**检查是否有旧的 ScaledObject 阻止缩容:**
```bash
kubectl get scaledobject -n myapp
```
如果有,删除它:
```bash
kubectl delete scaledobject <name> -n myapp
```
### 4. 查看 KEDA 拦截器日志
```bash
kubectl logs -n keda -l app.kubernetes.io/name=keda-add-ons-http-interceptor --tail=50
```
### 5. 测试拦截器是否工作
```bash
# 获取拦截器服务 IP
kubectl get svc keda-add-ons-http-interceptor-proxy -n keda
# 直接测试拦截器
curl -H "Host: myapp.example.com" http://<CLUSTER-IP>:8080
```
## 调优建议
### 调整缩容时间
根据你的应用特点调整 `scaledownPeriod`
- **频繁访问的应用**:设置较长时间(如 600 秒 = 10 分钟)
- **偶尔访问的应用**:设置较短时间(如 180 秒 = 3 分钟)
- **演示/测试环境**:可以设置很短(如 60 秒 = 1 分钟)
```yaml
scaledownPeriod: 600 # 10 分钟
```
### 调整扩容阈值
根据应用负载调整 `targetValue`
```yaml
scalingMetric:
requestRate:
targetValue: 50 # 每秒 50 个请求时扩容(更敏感)
```
### 调整最大副本数
```yaml
replicas:
min: 0
max: 20 # 根据你的资源和需求调整
```
## 监控和观察
### 实时监控 Pod 变化
```bash
watch -n 2 'kubectl get pods -n myapp'
```
### 查看 HTTPScaledObject 事件
```bash
kubectl describe httpscaledobject myapp-http-scaler -n myapp
```
### 查看 Deployment 副本数变化
```bash
kubectl get deployment myapp -n myapp -w
```
## 完整示例navigation 应用
参考 `navigation-complete.yaml` 文件,这是一个完整的工作示例。
## 注意事项
1. **首次访问延迟**Pod 从 0 启动需要 5-10 秒,用户首次访问会有延迟
2. **数据库连接**:确保应用能够快速重新建立数据库连接
3. **会话状态**:不要在 Pod 中存储会话状态,使用 Redis 等外部存储
4. **健康检查**:配置合理的 readinessProbe确保 Pod 就绪后才接收流量
5. **资源限制**:设置合理的 resources limits避免启动过慢第 4、5 点的示意片段见下方)
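针对第 4、5 点,下面是一个容器级别的配置片段示意(容器名、探针路径均为假设,请按实际应用调整):
```yaml
# Deployment 中 containers 部分的示意片段
containers:
  - name: myapp
    image: your-image:tag
    imagePullPolicy: IfNotPresent      # 镜像已在节点上时跳过拉取,缩短冷启动时间
    ports:
      - containerPort: 80
    readinessProbe:                    # 就绪后才接收来自拦截器的流量
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 5
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
```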
## 参考资源
- KEDA 官方文档: https://keda.sh/
- KEDA HTTP Add-on: https://github.com/kedacore/http-add-on
- Traefik IngressRoute: https://doc.traefik.io/traefik/routing/providers/kubernetes-crd/

View File

@@ -0,0 +1,45 @@
---
# HTTPScaledObject - 用于实现缩容到 0 的核心配置
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: navigation-http-scaler
namespace: navigation
spec:
hosts:
- dh.u6.net3w.com
pathPrefixes:
- /
scaleTargetRef:
name: navigation
kind: Deployment
apiVersion: apps/v1
service: navigation
port: 80
replicas:
min: 0 # 空闲时缩容到 0
max: 10 # 最多 10 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100 # 每秒 100 个请求时扩容
window: 1m
scaledownPeriod: 300 # 5 分钟无流量后缩容到 0
---
# Traefik IngressRoute - 将流量路由到 KEDA HTTP Add-on 的拦截器
# 注意:必须在 keda namespace 中才能引用该 namespace 的服务
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: navigation-ingress
namespace: keda
spec:
entryPoints:
- web
routes:
- match: Host(`dh.u6.net3w.com`)
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080

View File

@@ -0,0 +1,24 @@
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: navigation-http-scaler
namespace: navigation
spec:
hosts:
- dh.u6.net3w.com
pathPrefixes:
- /
scaleTargetRef:
name: navigation
kind: Deployment
apiVersion: apps/v1
service: navigation
port: 80
replicas:
min: 0 # 空闲时缩容到 0
max: 10 # 最多 10 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100 # 每秒 100 个请求时扩容
window: 1m

View File

@@ -0,0 +1,19 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: navigation-ingress
namespace: navigation
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: dh.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: keda-add-ons-http-interceptor-proxy
port:
number: 8080

View File

@@ -0,0 +1,23 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: navigation-scaler
namespace: navigation
spec:
scaleTargetRef:
name: navigation
minReplicaCount: 1 # 至少保持 1 个副本HPA 限制)
maxReplicaCount: 10 # 最多 10 个副本
pollingInterval: 15 # 每 15 秒检查一次
cooldownPeriod: 180 # 缩容冷却期 3 分钟
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: nginx_http_requests_total
query: sum(rate(nginx_http_requests_total{namespace="navigation"}[1m]))
threshold: "10" # 每分钟超过 10 个请求时启动
- type: cpu
metricType: Utilization
metadata:
value: "60" # CPU 使用率超过 60% 时扩容

View File

@@ -0,0 +1,261 @@
# ⚠️ PostgreSQL 不适合使用 KEDA 自动扩缩容
## 问题说明
对于传统的 PostgreSQL 架构,直接通过 KEDA 增加副本数会导致:
### 1. 存储冲突
- 多个 Pod 尝试挂载同一个 PVC
- ReadWriteOnce 存储只能被一个 Pod 使用
- 会导致 Pod 启动失败
### 2. 数据损坏风险
- 如果使用 ReadWriteMany 存储,多个实例同时写入会导致数据损坏
- PostgreSQL 不支持多主写入
- 没有锁机制保护数据一致性
### 3. 缺少主从复制
- 需要配置 PostgreSQL 流复制Streaming Replication
- 需要配置主从切换机制
- 需要使用专门的 PostgreSQL Operator
## 正确的 PostgreSQL 扩展方案
### 方案 1: 使用 PostgreSQL Operator
推荐使用专业的 PostgreSQL Operator
#### Zalando PostgreSQL Operator
```bash
# 添加 Helm 仓库
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
# 安装 Operator
helm install postgres-operator postgres-operator-charts/postgres-operator
# 创建 PostgreSQL 集群
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: acid-minimal-cluster
spec:
teamId: "acid"
volume:
size: 10Gi
storageClass: longhorn
numberOfInstances: 3 # 1 主 + 2 从
users:
zalando:
- superuser
- createdb
databases:
foo: zalando
postgresql:
version: "16"
```
#### CloudNativePG Operator
```bash
# 安装 CloudNativePG
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.22/releases/cnpg-1.22.0.yaml
# 创建集群
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: cluster-example
spec:
instances: 3
storage:
storageClass: longhorn
size: 10Gi
```
### 方案 2: 读写分离 + KEDA
如果需要使用 KEDA正确的架构是
```
┌─────────────────┐
│ 主库 (Master) │ ← 固定 1 个副本,处理写入
│ StatefulSet │
└─────────────────┘
│ 流复制
┌─────────────────┐
│ 从库 (Replica) │ ← KEDA 管理,处理只读查询
│ Deployment │ 可以 0-N 个副本
└─────────────────┘
```
**配置示例:**
```yaml
# 主库 - 固定副本
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql-master
spec:
replicas: 1 # 固定 1 个
# ... 配置主库
---
# 从库 - KEDA 管理
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgresql-replica
spec:
# replicas 由 KEDA 管理
# ... 配置从库(只读)
---
# KEDA ScaledObject - 只扩展从库
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: postgresql-replica-scaler
spec:
scaleTargetRef:
name: postgresql-replica # 只针对从库
minReplicaCount: 0
maxReplicaCount: 5
triggers:
- type: postgresql
metadata:
connectionString: postgresql://user:pass@postgresql-master:5432/db
query: "SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active' AND query NOT LIKE '%pg_stat_activity%'"
targetQueryValue: "10"
```
### 方案 3: 垂直扩展(推荐用于单实例)
对于单实例 PostgreSQL使用 VPA (Vertical Pod Autoscaler) 更合适:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: postgresql-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: StatefulSet
name: postgresql
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: postgresql
minAllowed:
cpu: 250m
memory: 512Mi
maxAllowed:
cpu: 2000m
memory: 4Gi
```
## 当前部署建议
对于您当前的 PostgreSQL 部署(`/home/fei/k3s/010-中间件/002-postgresql/`
### ❌ 不要使用 KEDA 水平扩展
- 当前是单实例 StatefulSet
- 没有配置主从复制
- 直接扩展会导致数据问题
### ✅ 推荐的优化方案
1. **保持单实例运行**
```yaml
replicas: 1 # 固定不变
```
2. **优化资源配置**
```yaml
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 2000m
memory: 4Gi
```
3. **配置连接池**
- 使用 PgBouncer 作为连接池
- PgBouncer 可以使用 KEDA 扩展
4. **定期备份**
- 使用 Longhorn 快照
- 备份到 S3
## PgBouncer + KEDA 方案
这是最实用的方案PostgreSQL 保持单实例PgBouncer 使用 KEDA 扩展。
```yaml
# PostgreSQL - 固定单实例
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
spec:
replicas: 1 # 固定
# ...
---
# PgBouncer - 连接池
apiVersion: apps/v1
kind: Deployment
metadata:
name: pgbouncer
spec:
# replicas 由 KEDA 管理
template:
spec:
containers:
- name: pgbouncer
image: pgbouncer/pgbouncer:latest
# ...
---
# KEDA ScaledObject - 扩展 PgBouncer
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: pgbouncer-scaler
spec:
scaleTargetRef:
name: pgbouncer
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: postgresql
metadata:
connectionString: postgresql://postgres:postgres123@postgresql:5432/postgres
query: "SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active'"
targetQueryValue: "20"
```
## 总结
| 方案 | 适用场景 | 复杂度 | 推荐度 |
|------|---------|--------|--------|
| PostgreSQL Operator | 生产环境,需要高可用 | 高 | ⭐⭐⭐⭐⭐ |
| 读写分离 + KEDA | 读多写少场景 | 中 | ⭐⭐⭐⭐ |
| PgBouncer + KEDA | 连接数波动大 | 低 | ⭐⭐⭐⭐⭐ |
| VPA 垂直扩展 | 单实例,资源需求变化 | 低 | ⭐⭐⭐ |
| 直接 KEDA 扩展 | ❌ 不适用 | - | ❌ |
**对于当前部署,建议保持 PostgreSQL 单实例运行,不使用 KEDA 扩展。**
如果需要扩展能力,优先考虑:
1. 部署 PgBouncer 连接池 + KEDA
2. 或者迁移到 PostgreSQL Operator
---
**重要提醒:有状态服务的扩展需要特殊处理,不能简单地增加副本数!** ⚠️

View File

@@ -0,0 +1,23 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-scaler
namespace: redis
spec:
scaleTargetRef:
name: redis
minReplicaCount: 0 # 空闲时缩容到 0
maxReplicaCount: 5 # 最多 5 个副本
pollingInterval: 30 # 每 30 秒检查一次
cooldownPeriod: 300 # 缩容冷却期 5 分钟
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: redis_connected_clients
query: sum(redis_connected_clients{namespace="redis"})
threshold: "1" # 有连接时启动
- type: cpu
metricType: Utilization
metadata:
value: "70" # CPU 使用率超过 70% 时扩容

View File

@@ -0,0 +1,41 @@
# KEDA Helm 配置
# Operator 配置
operator:
replicaCount: 1
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
# Metrics Server 配置
metricsServer:
replicaCount: 1
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
# 与 Prometheus 集成
prometheus:
metricServer:
enabled: true
port: 9022
path: /metrics
operator:
enabled: true
port: 8080
path: /metrics
# ServiceMonitor 用于 Prometheus 抓取
serviceMonitor:
enabled: true
namespace: keda
additionalLabels:
release: kube-prometheus-stack

View File

@@ -0,0 +1,197 @@
# KEDA 部署最终总结
## ✅ 成功部署
### KEDA 核心组件
- **keda-operator**: ✅ 运行中
- **keda-metrics-apiserver**: ✅ 运行中
- **keda-admission-webhooks**: ✅ 运行中
- **命名空间**: keda
### 已配置的服务
| 服务 | 状态 | 最小副本 | 最大副本 | 说明 |
|------|------|---------|---------|------|
| Navigation | ✅ 已应用 | 0 | 10 | 空闲时自动缩容到 0 |
| Redis | ⏳ 待应用 | 0 | 5 | 需要先配置 Prometheus exporter |
| PostgreSQL | ❌ 不适用 | - | - | 有状态服务,不能直接扩展 |
## ⚠️ 重要修正PostgreSQL
### 问题说明
PostgreSQL 是有状态服务,**不能**直接使用 KEDA 扩展副本数,原因:
1. **存储冲突**: 多个 Pod 尝试挂载同一个 PVC 会失败
2. **数据损坏**: 如果使用 ReadWriteMany多实例写入会导致数据损坏
3. **缺少复制**: 没有配置主从复制,无法保证数据一致性
### 正确方案
已创建详细说明文档:`/home/fei/k3s/009-基础设施/007-keda/scalers/postgresql-说明.md`
推荐方案:
1. **PostgreSQL Operator** (Zalando 或 CloudNativePG)
2. **PgBouncer + KEDA** (扩展连接池而非数据库)
3. **读写分离** (主库固定,从库使用 KEDA)
## 📁 文件结构
```
/home/fei/k3s/009-基础设施/007-keda/
├── deploy.sh # ✅ 部署脚本
├── values.yaml # ✅ KEDA Helm 配置
├── readme.md # ✅ 详细使用文档
├── 部署总结.md # ✅ 部署总结
└── scalers/
├── navigation-scaler.yaml # ✅ 已应用
├── redis-scaler.yaml # ⏳ 待应用
└── postgresql-说明.md # ⚠️ 重要说明
```
## 🧪 验证结果
### Navigation 服务自动扩缩容
```bash
# 当前状态
$ kubectl get deployment navigation -n navigation
NAME READY UP-TO-DATE AVAILABLE AGE
navigation 0/0 0 0 8h
# ScaledObject 状态
$ kubectl get scaledobject -n navigation
NAME READY ACTIVE TRIGGERS AGE
navigation-scaler True False prometheus,cpu 5m
# HPA 已自动创建
$ kubectl get hpa -n navigation
NAME REFERENCE MINPODS MAXPODS REPLICAS
keda-hpa-navigation-scaler Deployment/navigation 1 10 0
```
### 测试从 0 扩容
```bash
# 访问导航页面
curl https://dh.u6.net3w.com
# 观察副本数变化10-30 秒)
kubectl get deployment navigation -n navigation -w
# 预期: 0/0 → 1/1
```
## 📊 资源节省预期
| 服务 | 之前 | 现在 | 节省 |
|------|------|------|------|
| Navigation | 24/7 运行 | 按需启动 | 80-90% |
| Redis | 24/7 运行 | 按需启动 | 70-80% (配置后) |
| PostgreSQL | 24/7 运行 | 保持运行 | 不适用 |
## 🔧 已修复的问题
### 1. CPU 触发器配置错误
**问题**: 使用了已弃用的 `type` 字段
```yaml
# ❌ 错误
- type: cpu
metadata:
type: Utilization
value: "60"
```
**修复**: 改为 `metricType`
```yaml
# ✅ 正确
- type: cpu
metricType: Utilization
metadata:
value: "60"
```
### 2. Navigation 缺少资源配置
**修复**: 添加了 resources 配置
```yaml
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
```
### 3. PostgreSQL 配置错误
**修复**:
- 删除了 `postgresql-scaler.yaml`
- 创建了 `postgresql-说明.md` 详细说明
- 更新了所有文档,明确标注不适用
## 📚 文档
- **使用指南**: `/home/fei/k3s/009-基础设施/007-keda/readme.md`
- **部署总结**: `/home/fei/k3s/009-基础设施/007-keda/部署总结.md`
- **PostgreSQL 说明**: `/home/fei/k3s/009-基础设施/007-keda/scalers/postgresql-说明.md`
## 🎯 下一步建议
### 短期1周内
1. ✅ 监控 Navigation 服务的扩缩容行为
2. ⏳ 为 Redis 配置 Prometheus exporter
3. ⏳ 应用 Redis ScaledObject
### 中期1-2周
1. ⏳ 在 Grafana 中导入 KEDA 仪表板 (ID: 14691)
2. ⏳ 根据实际使用情况调整触发阈值
3. ⏳ 为其他无状态服务配置 KEDA
### 长期1个月+
1. ⏳ 评估是否需要 PostgreSQL 高可用
2. ⏳ 如需要,部署 PostgreSQL Operator
3. ⏳ 或部署 PgBouncer 连接池 + KEDA
## ⚡ 快速命令
```bash
# 查看 KEDA 状态
kubectl get pods -n keda
# 查看所有 ScaledObject
kubectl get scaledobject -A
# 查看 HPA
kubectl get hpa -A
# 查看 Navigation 副本数
kubectl get deployment navigation -n navigation -w
# 测试扩容
curl https://dh.u6.net3w.com
# 查看 KEDA 日志
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f
```
## 🎉 总结
**KEDA 已成功部署并运行**
- Navigation 服务实现按需启动,空闲时自动缩容到 0
- 修复了所有配置问题
- 明确了有状态服务PostgreSQL的正确处理方式
- 提供了完整的文档和使用指南
⚠️ **重要提醒**
- 有状态服务不能简单地增加副本数
- PostgreSQL 需要使用专业的 Operator 或连接池方案
- 定期监控扩缩容行为,根据实际情况调整配置
---
**KEDA 让您的 K3s 集群更智能、更节省资源!** 🚀

View File

@@ -0,0 +1,260 @@
# KEDA 自动扩缩容部署总结
部署时间: 2026-01-30
## ✅ 部署完成
### KEDA 核心组件
| 组件 | 状态 | 说明 |
|------|------|------|
| keda-operator | ✅ Running | KEDA 核心控制器 |
| keda-metrics-apiserver | ✅ Running | 指标 API 服务器 |
| keda-admission-webhooks | ✅ Running | 准入 Webhook |
**命名空间**: `keda`
### 已配置的自动扩缩容服务
#### 1. Navigation 导航服务 ✅
- **状态**: 已配置并运行
- **当前副本数**: 0空闲状态
- **配置**:
- 最小副本: 0
- 最大副本: 10
- 触发器: Prometheus (HTTP 请求) + CPU 使用率
- 冷却期: 3 分钟
**ScaledObject**: `navigation-scaler`
**HPA**: `keda-hpa-navigation-scaler`
#### 2. Redis 缓存服务 ⏳
- **状态**: 配置文件已创建,待应用
- **说明**: 需要先为 Redis 配置 Prometheus exporter
- **配置文件**: `scalers/redis-scaler.yaml`
#### 3. PostgreSQL 数据库 ❌
- **状态**: 不推荐使用 KEDA 扩展
- **原因**:
- PostgreSQL 是有状态服务,多副本会导致存储冲突
- 需要配置主从复制才能安全扩展
- 建议使用 PostgreSQL Operator 或 PgBouncer + KEDA
- **详细说明**: `scalers/postgresql-说明.md`
## 配置文件位置
```
/home/fei/k3s/009-基础设施/007-keda/
├── deploy.sh # 部署脚本
├── values.yaml # KEDA Helm 配置
├── readme.md # 详细文档
├── 部署总结.md # 本文档
└── scalers/ # ScaledObject 配置
├── navigation-scaler.yaml # ✅ 已应用
├── redis-scaler.yaml # ⏳ 待应用
└── postgresql-说明.md # ⚠️ PostgreSQL 不适合 KEDA
```
## 验证 KEDA 功能
### 测试缩容到 0
Navigation 服务已经自动缩容到 0
```bash
kubectl get deployment navigation -n navigation
# 输出: READY 0/0
```
### 测试从 0 扩容
访问导航页面触发扩容:
```bash
# 1. 访问页面
curl https://dh.u6.net3w.com
# 2. 观察副本数变化
kubectl get deployment navigation -n navigation -w
# 预期: 10-30 秒内副本数从 0 变为 1
```
## 查看 KEDA 状态
### 查看所有 ScaledObject
```bash
kubectl get scaledobject -A
```
### 查看 HPA自动创建
```bash
kubectl get hpa -A
```
### 查看 KEDA 日志
```bash
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f
```
## 下一步操作
### 1. 应用 Redis 自动扩缩容
```bash
# 首先需要为 Redis 添加 Prometheus exporter
# 然后应用 ScaledObject
kubectl apply -f /home/fei/k3s/009-基础设施/007-keda/scalers/redis-scaler.yaml
```
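上面提到需要先为 Redis 添加 Prometheus exporter。一个常见做法是给 Redis Deployment 挂一个 redis_exporter sidecar再用 ServiceMonitor 暴露给 Prometheus以下为示意镜像、端口名等均为常见默认值属于假设并且需要在 redis Service 中同时暴露 9121 端口):
```yaml
# 1) 追加到 redis Deployment 的 containers 列表中sidecar 示意)
- name: redis-exporter
  image: oliver006/redis_exporter:latest
  env:
    - name: REDIS_ADDR
      value: "redis://localhost:6379"
  ports:
    - containerPort: 9121
      name: metrics
---
# 2) ServiceMonitor让 Prometheus 抓取 exporter 指标(假设 redis Service 带有 app: redis 标签)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - redis
  selector:
    matchLabels:
      app: redis
  endpoints:
    - port: metrics
      interval: 30s
```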
### 2. PostgreSQL 扩展方案
**不要使用 KEDA 直接扩展 PostgreSQL**
推荐方案:
- **方案 1**: 使用 PostgreSQL OperatorZalando 或 CloudNativePG
- **方案 2**: 部署 PgBouncer 连接池 + KEDA 扩展 PgBouncer
- **方案 3**: 配置读写分离,只对只读副本使用 KEDA
详细说明:`/home/fei/k3s/009-基础设施/007-keda/scalers/postgresql-说明.md`
### 3. 监控扩缩容行为
在 Grafana 中导入 KEDA 仪表板:
- 访问: https://grafana.u6.net3w.com
- 导入仪表板 ID: **14691**
## 已修复的问题
### 问题 1: CPU 触发器配置错误
**错误信息**:
```
The 'type' setting is DEPRECATED and is removed in v2.18 - Use 'metricType' instead.
```
**解决方案**:
将 CPU 触发器配置从:
```yaml
- type: cpu
metadata:
type: Utilization
value: "60"
```
改为:
```yaml
- type: cpu
metricType: Utilization
metadata:
value: "60"
```
### 问题 2: Navigation 缺少资源配置
**解决方案**:
为 Navigation deployment 添加了 resources 配置:
```yaml
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
```
## 资源节省效果
### Navigation 服务
- **之前**: 24/7 运行 1 个副本
- **现在**: 空闲时 0 个副本,有流量时自动启动
- **预计节省**: 80-90% 资源(假设大部分时间空闲)
### 预期总体效果
- **Navigation**: 节省 80-90% 资源 ✅
- **Redis**: 节省 70-80% 资源(配置后)⏳
- **PostgreSQL**: ❌ 不使用 KEDA保持单实例运行
## 监控指标
### Prometheus 查询
```promql
# KEDA Scaler 活跃状态
keda_scaler_active{namespace="navigation"}
# 当前指标值
keda_scaler_metrics_value{scaledObject="navigation-scaler"}
# HPA 当前副本数
kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="keda-hpa-navigation-scaler"}
```
## 注意事项
### 1. 冷启动时间
从 0 扩容到可用需要 10-30 秒:
- 拉取镜像(如果本地没有)
- 启动容器
- 健康检查通过
### 2. 连接保持
客户端需要支持重连机制,因为服务可能会缩容到 0。
### 3. 有状态服务
PostgreSQL 等有状态服务**不能**直接使用 KEDA 扩展:
- ❌ 多副本会导致存储冲突
- ❌ 没有主从复制会导致数据不一致
- ✅ 需要使用专业的 Operator 或连接池方案
## 故障排查
### ScaledObject 未生效
```bash
# 查看详细状态
kubectl describe scaledobject <name> -n <namespace>
# 查看事件
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
### HPA 未创建
检查 KEDA operator 日志:
```bash
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator
```
## 文档参考
- 详细使用文档: `/home/fei/k3s/009-基础设施/007-keda/readme.md`
- KEDA 官方文档: https://keda.sh/docs/
- Scalers 参考: https://keda.sh/docs/scalers/
## 总结
**KEDA 已成功部署并运行**
- KEDA 核心组件运行正常
- Navigation 服务已配置自动扩缩容
- 已验证缩容到 0 功能正常
- 准备好为更多服务配置自动扩缩容
**下一步**: 根据实际使用情况,逐步为 Redis 和 PostgreSQL 配置自动扩缩容。
---
**KEDA 让您的 K3s 集群更智能、更节省资源!** 🚀

View File

@@ -0,0 +1,191 @@
# Portainer 部署指南
## 概述
本文档记录了在 k3s 集群中部署 Portainer 的完整过程包括域名绑定、KEDA 自动缩放和 CSRF 校验问题的解决方案。
## 部署步骤
### 1. 使用 Helm 安装 Portainer
```bash
# 添加 Helm 仓库
helm repo add portainer https://portainer.github.io/k8s/
helm repo update
# 安装 Portainer使用 Longhorn 作为存储类)
helm install --create-namespace -n portainer portainer portainer/portainer \
--set persistence.enabled=true \
--set persistence.storageClass=longhorn \
--set service.type=NodePort
```
### 2. 配置域名访问
#### 2.1 Caddy 反向代理配置
修改 Caddy ConfigMap添加 Portainer 的反向代理规则:
```yaml
# Portainer 容器管理 - 直接转发到 Portainer HTTPS 端口
portainer.u6.net3w.com {
reverse_proxy https://portainer.portainer.svc.cluster.local:9443 {
transport http {
tls_insecure_skip_verify
}
}
}
```
**关键点:**
- 直接转发到 Portainer 的 HTTPS 端口9443而不是通过 Traefik
- 这样可以避免协议不匹配导致的 CSRF 校验失败
#### 2.2 更新 Caddy ConfigMap
```bash
kubectl patch configmap caddy-config -n default --type merge -p '{"data":{"Caddyfile":"..."}}'
```
#### 2.3 重启 Caddy Pod
```bash
kubectl delete pod -n default -l app=caddy
```
### 3. 配置 KEDA 自动缩放(可选)
如果需要实现访问时启动、空闲时缩容的功能,应用 KEDA 配置:
```bash
kubectl apply -f keda-scaler.yaml
```
**配置说明:**
- 最小副本数0空闲时缩容到 0
- 最大副本数3
- 缩容延迟5 分钟无流量后缩容
### 4. 解决 CSRF 校验问题
#### 问题描述
登录时提示 "Unable to login",日志显示:
```
Failed to validate Origin or Referer | error="origin invalid"
```
#### 问题原因
Portainer 新版本对 CSRF 校验非常严格。当通过域名访问时,协议不匹配导致校验失败:
- 客户端发送HTTPS 请求
- Portainer 接收x_forwarded_proto=http
#### 解决方案
**步骤 1添加环境变量禁用 CSRF 校验**
```bash
kubectl set env deployment/portainer -n portainer CONTROLLER_DISABLE_CSRF=true
```
**步骤 2添加环境变量配置 origins**
```bash
kubectl set env deployment/portainer -n portainer PORTAINER_ADMIN_ORIGINS="*"
```
**步骤 3重启 Portainer**
```bash
kubectl rollout restart deployment portainer -n portainer
```
**步骤 4修改 Caddy 配置(最关键)**
直接转发到 Portainer 的 HTTPS 端口,避免通过 Traefik 导致的协议转换问题:
```yaml
portainer.u6.net3w.com {
reverse_proxy https://portainer.portainer.svc.cluster.local:9443 {
transport http {
tls_insecure_skip_verify
}
}
}
```
## 配置文件
### portainer-server.yaml
记录 Portainer deployment 的环境变量配置:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: portainer
namespace: portainer
spec:
template:
spec:
containers:
- name: portainer
env:
- name: CONTROLLER_DISABLE_CSRF
value: "true"
- name: PORTAINER_ADMIN_ORIGINS
value: "*"
```
### keda-scaler.yaml
KEDA 自动缩放配置,实现访问时启动、空闲时缩容。
## 访问 Portainer
部署完成后,访问:
```
https://portainer.u6.net3w.com
```
## 常见问题
### Q: 登录时提示 "Unable to login"
**A:** 这通常是 CSRF 校验失败导致的。检查以下几点:
1. 确认已添加环境变量 `CONTROLLER_DISABLE_CSRF=true`
2. 确认 Caddy 配置直接转发到 Portainer HTTPS 端口
3. 检查 Portainer 日志中是否有 "origin invalid" 错误
4. 重启 Portainer pod 使配置生效
### Q: 为什么要直接转发到 HTTPS 端口而不是通过 Traefik
**A:** 因为通过 Traefik 转发时,协议头会被转换为 HTTP导致 Portainer 接收到的协议与客户端发送的协议不匹配,从而 CSRF 校验失败。直接转发到 HTTPS 端口可以保持协议一致。
### Q: KEDA 自动缩放是否必须配置?
**A:** 不是必须的。KEDA 自动缩放是可选功能,用于节省资源。如果不需要自动缩放,可以跳过这一步。
## 相关文件
- `portainer-server.yaml` - Portainer deployment 环境变量配置
- `keda-scaler.yaml` - KEDA 自动缩放配置
- `ingress.yaml` - 原始 Ingress 配置(已弃用,改用 Caddy 直接转发)
## 下次部署检查清单
- [ ] 使用 Helm 安装 Portainer
- [ ] 修改 Caddy 配置,直接转发到 Portainer HTTPS 端口
- [ ] 添加 Portainer 环境变量CONTROLLER_DISABLE_CSRF、PORTAINER_ADMIN_ORIGINS
- [ ] 重启 Caddy 和 Portainer pods
- [ ] 测试登录功能
- [ ] (可选)配置 KEDA 自动缩放
## 参考资源
- Portainer 官方文档https://docs.portainer.io/
- k3s 官方文档https://docs.k3s.io/
- KEDA 官方文档https://keda.sh/

View File

@@ -0,0 +1,20 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: portainer-ingress
namespace: portainer
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
ingressClassName: traefik
rules:
- host: portainer.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: portainer
port:
number: 9000

View File

@@ -0,0 +1,58 @@
---
# HTTPScaledObject - 用于实现缩容到 0 的核心配置
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: portainer-http-scaler
namespace: portainer
spec:
hosts:
- portainer.u6.net3w.com
pathPrefixes:
- /
scaleTargetRef:
name: portainer
kind: Deployment
apiVersion: apps/v1
service: portainer
port: 9000
replicas:
min: 0 # 空闲时缩容到 0
max: 3 # 最多 3 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 50 # 每秒 50 个请求时扩容
window: 1m
scaledownPeriod: 300 # 5 分钟无流量后缩容到 0
---
# Traefik Middleware - 设置正确的协议头
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: portainer-headers
namespace: keda
spec:
headers:
customRequestHeaders:
X-Forwarded-Proto: "https"
---
# Traefik IngressRoute - 将流量路由到 KEDA HTTP Add-on 的拦截器
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: portainer-ingress
namespace: keda
spec:
entryPoints:
- web
routes:
- match: Host(`portainer.u6.net3w.com`)
kind: Rule
middlewares:
- name: portainer-headers
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080

View File

@@ -0,0 +1,16 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: portainer
namespace: portainer
spec:
template:
spec:
containers:
- name: portainer
env:
- name: CONTROLLER_DISABLE_CSRF
value: "true"
# 说明:禁用 CSRF 校验是因为 Portainer 新版本对 CSRF 校验非常严格
# 当使用域名访问时(如 portainer.u6.net3w.com需要禁用此校验
# 如果需要重新启用,将此值改为 "false" 或删除此环境变量

View File

@@ -0,0 +1,10 @@
# 添加 Helm 仓库
helm repo add portainer https://portainer.github.io/k8s/
helm repo update
# 安装 Portainer
# 注意:这里我们利用 Longhorn 作为默认存储类
helm install --create-namespace -n portainer portainer portainer/portainer \
--set persistence.enabled=true \
--set persistence.storageClass=longhorn \
--set service.type=NodePort

View File

@@ -0,0 +1,272 @@
# 域名绑定配置总结
## 配置完成时间
2026-01-30
## 域名配置
所有服务已绑定到 `*.u9.net3w.com` 子域名,通过 Caddy 作为前端反向代理。
### 已配置的子域名
| 服务 | 域名 | 后端服务 | 命名空间 |
|------|------|---------|---------|
| Longhorn UI | https://longhorn.u9.net3w.com | longhorn-frontend:80 | longhorn-system |
| Grafana | https://grafana.u9.net3w.com | kube-prometheus-stack-grafana:80 | monitoring |
| Prometheus | https://prometheus.u9.net3w.com | kube-prometheus-stack-prometheus:9090 | monitoring |
| Alertmanager | https://alertmanager.u9.net3w.com | kube-prometheus-stack-alertmanager:9093 | monitoring |
| MinIO S3 API | https://s3.u6.net3w.com | minio:9000 | minio |
| MinIO Console | https://console.s3.u6.net3w.com | minio:9001 | minio |
## 架构说明
```
Internet (*.u9.net3w.com)
Caddy (前端反向代理, 80/443)
Traefik Ingress Controller
Kubernetes Services
```
### 流量路径
1. **外部请求** → DNS 解析到服务器 IP
2. **Caddy** (端口 80/443) → 接收请求,自动申请 Let's Encrypt SSL 证书
3. **Traefik** → Caddy 转发到 Traefik Ingress Controller
4. **Kubernetes Service** → Traefik 根据 Ingress 规则路由到对应服务
## Caddy 配置
配置文件位置: `/home/fei/k3s/009-基础设施/005-ingress/Caddyfile`
```caddyfile
{
email admin@u6.net3w.com
}
# Longhorn 存储管理
longhorn.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Grafana 监控仪表板
grafana.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Prometheus 监控
prometheus.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Alertmanager 告警管理
alertmanager.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
```
## Ingress 配置
### Longhorn Ingress
- 文件: `/home/fei/k3s/009-基础设施/005-ingress/longhorn-ingress.yaml`
- Host: `longhorn.u9.net3w.com`
### 监控系统 Ingress
- 文件: `/home/fei/k3s/009-基础设施/006-monitoring/ingress.yaml`
- Hosts:
- `grafana.u9.net3w.com`
- `prometheus.u9.net3w.com`
- `alertmanager.u9.net3w.com`
## SSL/TLS 证书
Caddy 会自动为所有配置的域名申请和续期 Let's Encrypt SSL 证书。
- **证书存储**: Caddy Pod 的 `/data` 目录
- **自动续期**: Caddy 自动管理
- **邮箱**: admin@u6.net3w.com
## 访问地址
### 监控和管理
- **Longhorn 存储管理**: https://longhorn.u9.net3w.com
- **Grafana 监控**: https://grafana.u9.net3w.com
- 用户名: `admin`
- 密码: `prom-operator`
- **Prometheus**: https://prometheus.u9.net3w.com
- **Alertmanager**: https://alertmanager.u9.net3w.com
### 对象存储
- **MinIO S3 API**: https://s3.u6.net3w.com
- **MinIO Console**: https://console.s3.u6.net3w.com
## DNS 配置
确保以下 DNS 记录已配置A 记录或 CNAME
```
*.u9.net3w.com → <服务器IP>
```
或者单独配置每个子域名:
```
longhorn.u9.net3w.com → <服务器IP>
grafana.u9.net3w.com → <服务器IP>
prometheus.u9.net3w.com → <服务器IP>
alertmanager.u9.net3w.com → <服务器IP>
```
## 验证配置
### 检查 Caddy 状态
```bash
kubectl get pods -n default -l app=caddy
kubectl logs -n default -l app=caddy -f
```
### 检查 Ingress 状态
```bash
kubectl get ingress -A
```
### 测试域名访问
```bash
curl -I https://longhorn.u9.net3w.com
curl -I https://grafana.u9.net3w.com
curl -I https://prometheus.u9.net3w.com
curl -I https://alertmanager.u9.net3w.com
```
## 添加新服务
如果需要添加新的服务到 u9.net3w.com 域名:
### 1. 更新 Caddyfile
编辑 `/home/fei/k3s/009-基础设施/005-ingress/Caddyfile`,添加:
```caddyfile
newservice.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
```
### 2. 更新 Caddy ConfigMap
```bash
kubectl create configmap caddy-config \
--from-file=Caddyfile=/home/fei/k3s/009-基础设施/005-ingress/Caddyfile \
-n default --dry-run=client -o yaml | kubectl apply -f -
```
### 3. 重启 Caddy
```bash
kubectl rollout restart deployment caddy -n default
```
### 4. 创建 Ingress
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: newservice-ingress
namespace: your-namespace
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: newservice.u9.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: your-service
port:
number: 80
```
### 5. 应用 Ingress
```bash
kubectl apply -f newservice-ingress.yaml
```
## 故障排查
### Caddy 无法启动
```bash
# 查看 Caddy 日志
kubectl logs -n default -l app=caddy
# 检查 ConfigMap
kubectl get configmap caddy-config -n default -o yaml
```
### 域名无法访问
```bash
# 检查 Ingress
kubectl describe ingress <ingress-name> -n <namespace>
# 检查 Traefik
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
# 测试内部连接
kubectl run test --rm -it --image=curlimages/curl -- curl -v http://traefik.kube-system.svc.cluster.local:80
```
### SSL 证书问题
```bash
# 查看 Caddy 证书状态
kubectl exec -n default -it <caddy-pod> -- ls -la /data/caddy/certificates/
# 强制重新申请证书
kubectl rollout restart deployment caddy -n default
```
## 安全建议
1. **启用基本认证**: 为敏感服务(如 Prometheus、Alertmanager添加认证可用 Traefik BasicAuth 中间件实现(示意见下方)
2. **IP 白名单**: 限制管理界面的访问 IP
3. **定期更新**: 保持 Caddy 和 Traefik 版本更新
4. **监控日志**: 定期检查访问日志,发现异常访问
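第 1 点的基本认证可以用 Traefik 的 BasicAuth 中间件实现,下面是一个最小示意(名称、密码条目均为假设占位,密码需用 htpasswd 生成后放入 Secret
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-basic-auth
  namespace: monitoring
stringData:
  users: |
    admin:$apr1$REPLACE$REPLACEWITHHTPASSWD   # htpasswd 生成的用户条目(占位)
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: prometheus-auth
  namespace: monitoring
spec:
  basicAuth:
    secret: prometheus-basic-auth
---
# 在对应 Ingress 上通过注解引用该中间件(格式:<命名空间>-<中间件名>@kubernetescrd
# metadata:
#   annotations:
#     traefik.ingress.kubernetes.io/router.middlewares: monitoring-prometheus-auth@kubernetescrd
```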
## 维护命令
```bash
# 更新 Caddy 配置
kubectl create configmap caddy-config \
--from-file=Caddyfile=/home/fei/k3s/009-基础设施/005-ingress/Caddyfile \
-n default --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment caddy -n default
# 查看所有 Ingress
kubectl get ingress -A
# 查看 Caddy 日志
kubectl logs -n default -l app=caddy -f
# 查看 Traefik 日志
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik -f
```
## 备份
重要配置文件已保存在:
- Caddyfile: `/home/fei/k3s/009-基础设施/005-ingress/Caddyfile`
- Longhorn Ingress: `/home/fei/k3s/009-基础设施/005-ingress/longhorn-ingress.yaml`
- 监控 Ingress: `/home/fei/k3s/009-基础设施/006-monitoring/ingress.yaml`
建议定期备份这些配置文件。
---
**配置完成!所有服务现在可以通过 *.u9.net3w.com 域名访问。** 🎉

View File

@@ -0,0 +1,225 @@
# K3s 基础设施部署总结
部署日期: 2026-01-30
## 已完成的基础设施组件
### ✅ 1. Helm 包管理工具
- **版本**: v3.20.0
- **位置**: /usr/local/bin/helm
- **配置**: KUBECONFIG 已添加到 ~/.bashrc
### ✅ 2. Longhorn 分布式存储
- **版本**: v1.11.0
- **命名空间**: longhorn-system
- **存储类**: longhorn (默认)
- **S3 备份**: 已配置 MinIO S3 备份
- 备份目标: s3://longhorn-backup@us-east-1/
- 凭证 Secret: longhorn-crypto
- **访问**: http://longhorn.local
### ✅ 3. Redis 中间件
- **版本**: Redis 7 (Alpine)
- **命名空间**: redis
- **存储**: 5Gi Longhorn 卷
- **持久化**: RDB + AOF 双重持久化
- **内存限制**: 2GB
- **访问**: redis.redis.svc.cluster.local:6379
### ✅ 4. PostgreSQL 数据库
- **版本**: PostgreSQL 16.11
- **命名空间**: postgresql
- **存储**: 10Gi Longhorn 卷
- **内存限制**: 2GB
- **访问**: postgresql-service.postgresql.svc.cluster.local:5432
- **凭证**:
- 用户: postgres
- 密码: postgres123
### ✅ 5. Traefik Ingress 控制器
- **状态**: K3s 默认已安装
- **命名空间**: kube-system
- **已配置 Ingress**:
- Longhorn UI: http://longhorn.local
- MinIO API: http://s3.u6.net3w.com
- MinIO Console: http://console.s3.u6.net3w.com
- Grafana: http://grafana.local
- Prometheus: http://prometheus.local
- Alertmanager: http://alertmanager.local
### ✅ 6. Prometheus + Grafana 监控系统
- **命名空间**: monitoring
- **组件**:
- Prometheus: 时间序列数据库 (20Gi 存储, 15天保留)
- Grafana: 可视化仪表板 (5Gi 存储)
- Alertmanager: 告警管理 (5Gi 存储)
- Node Exporter: 节点指标收集
- Kube State Metrics: K8s 资源状态
- **Grafana 凭证**:
- 用户: admin
- 密码: prom-operator
- **访问**:
- Grafana: http://grafana.local
- Prometheus: http://prometheus.local
- Alertmanager: http://alertmanager.local
## 目录结构
```
/home/fei/k3s/009-基础设施/
├── 003-helm/
│ ├── install_helm.sh
│ └── readme.md
├── 004-longhorn/
│ ├── deploy.sh
│ ├── s3-secret.yaml
│ ├── values.yaml
│ ├── readme.md
│ └── 说明.md
├── 005-ingress/
│ ├── deploy-longhorn-ingress.sh
│ ├── longhorn-ingress.yaml
│ └── readme.md
└── 006-monitoring/
├── deploy.sh
├── values.yaml
├── ingress.yaml
└── readme.md
/home/fei/k3s/010-中间件/
├── 001-redis/
│ ├── deploy.sh
│ ├── redis-deployment.yaml
│ └── readme.md
└── 002-postgresql/
├── deploy.sh
├── postgresql-deployment.yaml
└── readme.md
```
## 存储使用情况
| 组件 | 存储大小 | 存储类 |
|------|---------|--------|
| MinIO | 50Gi | local-path |
| Redis | 5Gi | longhorn |
| PostgreSQL | 10Gi | longhorn |
| Prometheus | 20Gi | longhorn |
| Grafana | 5Gi | longhorn |
| Alertmanager | 5Gi | longhorn |
| **总计** | **95Gi** | - |
## 访问地址汇总
需要在 `/etc/hosts` 中添加以下配置(将 `<节点IP>` 替换为实际 IP
```
<节点IP> longhorn.local
<节点IP> grafana.local
<节点IP> prometheus.local
<节点IP> alertmanager.local
<节点IP> s3.u6.net3w.com
<节点IP> console.s3.u6.net3w.com
```
## 快速验证命令
```bash
# 查看所有命名空间的 Pods
kubectl get pods -A
# 查看所有 PVC
kubectl get pvc -A
# 查看所有 Ingress
kubectl get ingress -A
# 查看存储类
kubectl get storageclass
# 测试 Redis
kubectl exec -n redis $(kubectl get pod -n redis -l app=redis -o jsonpath='{.items[0].metadata.name}') -- redis-cli ping
# 测试 PostgreSQL
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "SELECT version();"
```
## 备份策略
1. **Longhorn 卷备份**:
- 所有持久化数据存储在 Longhorn 卷上
- 可通过 Longhorn UI 创建快照
- 自动备份到 MinIO S3 (s3://longhorn-backup@us-east-1/)
2. **数据库备份**pg_dump 定时备份的 CronJob 示意见本节末尾):
- Redis: AOF + RDB 持久化
- PostgreSQL: 可使用 pg_dump 进行逻辑备份
3. **配置备份**:
- 所有配置文件已保存在 `/home/fei/k3s/` 目录
- 建议定期备份此目录
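pg_dump 逻辑备份可以用 CronJob 定时执行以下为最小示意备份目标 PVC 名称为假设需要预先创建密码取自 PostgreSQL 部署中使用的 postgresql-secret
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgresql-backup
  namespace: postgresql
spec:
  schedule: "0 3 * * *"                # 每天 03:00 备份
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgresql-secret
                      key: POSTGRES_PASSWORD
              command:
                - /bin/sh
                - -c
                - pg_dump -h postgresql-service.postgresql.svc.cluster.local -U postgres postgres > /backup/backup-$(date +%F).sql
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgresql-backup-pvc   # 假设的 PVC需预先创建
```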
## 下一步建议
1. **安全加固**:
- 修改 PostgreSQL 默认密码
- 配置 TLS/SSL 证书
- 启用 RBAC 权限控制
2. **监控优化**Alertmanager 通知配置示意见本节末尾):
- 配置告警通知邮件、Slack、钉钉
- 导入更多 Grafana 仪表板
- 为 Redis 和 PostgreSQL 添加专用监控
3. **高可用**:
- 考虑 Redis 主从复制或 Sentinel
- 考虑 PostgreSQL 主从复制
- 增加 K3s 节点实现多节点高可用
4. **日志收集**:
- 部署 Loki 或 ELK 进行日志聚合
- 配置日志持久化和查询
5. **CI/CD**:
- 部署 GitLab Runner 或 Jenkins
- 配置自动化部署流程
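第 2 点的告警通知可以直接写进 kube-prometheus-stack 的 values.yaml以下是一个 webhook 接收器的示意webhook 地址为占位值,请替换为实际的钉钉/Slack 网关等;字段含义参考 Alertmanager 配置文档):
```yaml
# values.yaml 片段(示意)
alertmanager:
  config:
    route:
      receiver: default-webhook
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
    receivers:
      - name: default-webhook
        webhook_configs:
          - url: http://alert-webhook.monitoring.svc.cluster.local:8080/alert   # 占位地址
```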
## 维护命令
```bash
# 更新 Helm 仓库
helm repo update
# 升级 Longhorn
helm upgrade longhorn longhorn/longhorn --namespace longhorn-system -f values.yaml
# 升级监控栈
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace monitoring -f values.yaml
# 查看 Helm 发布
helm list -A
# 清理未使用的镜像
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u
```
## 故障排查
如果遇到问题,请检查:
1. Pod 状态: `kubectl get pods -A`
2. 事件日志: `kubectl get events -A --sort-by='.lastTimestamp'`
3. Pod 日志: `kubectl logs -n <namespace> <pod-name>`
4. 存储状态: `kubectl get pvc -A`
5. Longhorn 卷状态: 访问 http://longhorn.local
## 联系和支持
- Longhorn 文档: https://longhorn.io/docs/
- Prometheus 文档: https://prometheus.io/docs/
- Grafana 文档: https://grafana.com/docs/
- K3s 文档: https://docs.k3s.io/
---
**部署完成!所有基础设施组件已成功运行。** 🎉

View File

@@ -0,0 +1,17 @@
#!/bin/bash
# 创建命名空间
kubectl create namespace redis
# 部署 Redis
kubectl apply -f redis-deployment.yaml
# 等待 Redis 启动
echo "等待 Redis 启动..."
kubectl wait --for=condition=ready pod -l app=redis -n redis --timeout=300s
# 显示状态
echo "Redis 部署完成!"
kubectl get pods -n redis
kubectl get pvc -n redis
kubectl get svc -n redis

View File

@@ -0,0 +1,52 @@
# Redis 部署说明
## 配置信息
- **命名空间**: redis
- **存储**: 使用 Longhorn 提供 5Gi 持久化存储
- **镜像**: redis:7-alpine
- **持久化**: 启用 RDB + AOF 双重持久化
- **内存限制**: 2GB
- **访问地址**: redis.redis.svc.cluster.local:6379
## 部署方式
```bash
bash deploy.sh
```
## 持久化配置
### RDB 快照
- 900秒内至少1个key变化
- 300秒内至少10个key变化
- 60秒内至少10000个key变化
### AOF 日志
- 每秒同步一次
- 自动重写阈值: 64MB
## 内存策略
- 最大内存: 2GB
- 淘汰策略: allkeys-lru (所有key的LRU算法)
## 连接测试
在集群内部测试连接:
```bash
kubectl run redis-test --rm -it --image=redis:7-alpine -- redis-cli -h redis.redis.svc.cluster.local ping
```
## 备份说明
Redis 数据存储在 Longhorn 卷上,可以通过 Longhorn UI 创建快照和备份到 S3。
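除了在 Longhorn UI 中手工创建,也可以用 RecurringJob 资源声明定期备份以下为示意apiVersion 以集群中实际安装的 CRD 版本为准cron、保留份数等均为假设值
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-daily                   # 假设的名称
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"                    # 每天 02:00 执行
  task: "backup"                       # 备份到已配置的 S3 备份目标
  groups:
    - default                          # 作用于 default 组中的卷
  retain: 7                            # 保留最近 7 份备份
  concurrency: 2
```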
## 监控
可以通过以下命令查看 Redis 状态:
```bash
kubectl exec -n redis $(kubectl get pod -n redis -l app=redis -o jsonpath='{.items[0].metadata.name}') -- redis-cli info
```

View File

@@ -0,0 +1,123 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-pvc
namespace: redis
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-config
namespace: redis
data:
redis.conf: |
# Redis 配置
bind 0.0.0.0
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
# 持久化配置
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data
# AOF 持久化
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# 内存管理
maxmemory 2gb
maxmemory-policy allkeys-lru
# 日志
loglevel notice
logfile ""
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: redis
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
command:
- redis-server
- /etc/redis/redis.conf
ports:
- containerPort: 6379
name: redis
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /etc/redis
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
tcpSocket:
port: 6379
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- redis-cli
- ping
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: data
persistentVolumeClaim:
claimName: redis-pvc
- name: config
configMap:
name: redis-config
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: redis
spec:
selector:
app: redis
ports:
- port: 6379
targetPort: 6379
protocol: TCP
type: ClusterIP

View File

@@ -0,0 +1,25 @@
#!/bin/bash
# 创建命名空间
kubectl create namespace postgresql
# 部署 PostgreSQL
kubectl apply -f postgresql-deployment.yaml
# 等待 PostgreSQL 启动
echo "等待 PostgreSQL 启动..."
kubectl wait --for=condition=ready pod -l app=postgresql -n postgresql --timeout=300s
# 显示状态
echo "PostgreSQL 部署完成!"
kubectl get pods -n postgresql
kubectl get pvc -n postgresql
kubectl get svc -n postgresql
echo ""
echo "连接信息:"
echo " 主机: postgresql-service.postgresql.svc.cluster.local"
echo " 端口: 5432"
echo " 用户: postgres"
echo " 密码: postgres123"
echo " 数据库: postgres"

View File

@@ -0,0 +1,167 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgresql-pvc
namespace: postgresql
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Secret
metadata:
name: postgresql-secret
namespace: postgresql
type: Opaque
stringData:
POSTGRES_PASSWORD: "postgres123"
POSTGRES_USER: "postgres"
POSTGRES_DB: "postgres"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: postgresql-config
namespace: postgresql
data:
postgresql.conf: |
# 连接设置
listen_addresses = '*'
max_connections = 100
# 内存设置
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
work_mem = 4MB
# WAL 设置
wal_level = replica
max_wal_size = 1GB
min_wal_size = 80MB
# 日志设置
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_statement = 'all'
log_duration = on
# 性能优化
random_page_cost = 1.1
effective_io_concurrency = 200
pg_hba.conf: |
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 0.0.0.0/0 md5
host all all ::0/0 md5
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
namespace: postgresql
spec:
serviceName: postgresql
replicas: 1
selector:
matchLabels:
app: postgresql
template:
metadata:
labels:
app: postgresql
spec:
containers:
- name: postgresql
image: postgres:16-alpine
ports:
- containerPort: 5432
name: postgresql
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgresql-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgresql-secret
key: POSTGRES_PASSWORD
- name: POSTGRES_DB
valueFrom:
secretKeyRef:
name: postgresql-secret
key: POSTGRES_DB
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
- name: config
mountPath: /etc/postgresql
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: data
persistentVolumeClaim:
claimName: postgresql-pvc
- name: config
configMap:
name: postgresql-config
---
apiVersion: v1
kind: Service
metadata:
name: postgresql
namespace: postgresql
spec:
selector:
app: postgresql
ports:
- port: 5432
targetPort: 5432
protocol: TCP
type: ClusterIP
clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
name: postgresql-service
namespace: postgresql
spec:
selector:
app: postgresql
ports:
- port: 5432
targetPort: 5432
protocol: TCP
type: ClusterIP

View File

@@ -0,0 +1,99 @@
# PostgreSQL 16 部署说明
## 配置信息
- **命名空间**: postgresql
- **版本**: PostgreSQL 16 (Alpine)
- **存储**: 使用 Longhorn 提供 10Gi 持久化存储
- **内存限制**: 2GB
- **访问地址**: postgresql-service.postgresql.svc.cluster.local:5432
## 默认凭证
- **用户名**: postgres
- **密码**: postgres123
- **数据库**: postgres
⚠️ **安全提示**: 生产环境请修改默认密码!
## 部署方式
```bash
bash deploy.sh
```
## 数据库配置
### 连接设置
- 最大连接数: 100
- 监听地址: 所有接口 (*)
### 内存配置
- shared_buffers: 256MB
- effective_cache_size: 1GB
- work_mem: 4MB
### WAL 配置
- wal_level: replica (支持主从复制)
- max_wal_size: 1GB
### 日志配置
- 记录所有 SQL 语句
- 记录执行时间
## 连接测试
在集群内部测试连接:
```bash
kubectl run pg-test --rm -it --image=postgres:16-alpine --env="PGPASSWORD=postgres123" -- psql -h postgresql-service.postgresql.svc.cluster.local -U postgres -c "SELECT version();"
```
## 数据持久化
PostgreSQL 数据存储在 Longhorn 卷上:
- 数据目录: /var/lib/postgresql/data/pgdata
- 可以通过 Longhorn UI 创建快照和备份到 S3
## 常用操作
### 查看日志
```bash
kubectl logs -n postgresql postgresql-0 -f
```
### 进入数据库
```bash
kubectl exec -it -n postgresql postgresql-0 -- psql -U postgres
```
### 创建新数据库
```bash
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "CREATE DATABASE myapp;"
```
### 创建新用户
```bash
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "CREATE USER myuser WITH PASSWORD 'mypassword';"
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE myapp TO myuser;"
```
## 备份与恢复
### 手动备份
```bash
kubectl exec -n postgresql postgresql-0 -- pg_dump -U postgres postgres > backup.sql
```
### 恢复备份
```bash
cat backup.sql | kubectl exec -i -n postgresql postgresql-0 -- psql -U postgres postgres
```
## 监控
查看数据库状态:
```bash
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "SELECT * FROM pg_stat_activity;"
```

View File

@@ -0,0 +1,32 @@
FROM python:3.11-alpine
# 安装 nginx
RUN apk add --no-cache nginx
# 创建工作目录
WORKDIR /app
# 复制生成器脚本
COPY generator.py /app/
COPY index.html /usr/share/nginx/html/
# 创建 nginx 配置
RUN mkdir -p /run/nginx && \
echo 'server {' > /etc/nginx/http.d/default.conf && \
echo ' listen 80;' >> /etc/nginx/http.d/default.conf && \
echo ' root /usr/share/nginx/html;' >> /etc/nginx/http.d/default.conf && \
echo ' index index.html;' >> /etc/nginx/http.d/default.conf && \
echo ' location / {' >> /etc/nginx/http.d/default.conf && \
echo ' try_files $uri $uri/ =404;' >> /etc/nginx/http.d/default.conf && \
echo ' }' >> /etc/nginx/http.d/default.conf && \
echo '}' >> /etc/nginx/http.d/default.conf
# 启动脚本
RUN echo '#!/bin/sh' > /app/start.sh && \
echo 'nginx' >> /app/start.sh && \
echo 'python3 /app/generator.py' >> /app/start.sh && \
chmod +x /app/start.sh
EXPOSE 80
CMD ["/app/start.sh"]

Some files were not shown because too many files have changed in this diff