First commit: initialize the project

This commit is contained in:
fei
2026-02-05 00:11:05 +08:00
commit 26eaf8110b
171 changed files with 17105 additions and 0 deletions

.claude/.claude.md Normal file

@@ -0,0 +1,30 @@
Talk to me in Chinese.
When sudo is needed, run commands as `echo "1" | sudo -S <command>`.
Install a single-node k3s cluster on this host.
I have already pointed *.u6.net3w.com at this host; when you deploy a new project, add the subdomain mapping yourself.
Everything is served over HTTPS by default.
When you create documentation, automatically prefix the filename with 001, 002, and so on, numbered by the count of md files already in the same folder.
YAML files go into a new directory under the matching category to hold the config files; new project folders inside it are also named with a 001/002-style prefix.
One PostgreSQL instance holding 300 databases; the usernames and database names are pg001 through pg300 (see the sketch after this list).
Caddy only does SSL; Traefik does the routing.
Caddy receives HTTPS, then forwards to Traefik over HTTP (80).
Private git:
git remote add origin https://git.u6.net3w.com/fei/k3s-configs.git
git push -u origin main
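For the 300-database requirement above, a minimal bash sketch (assumptions: a PostgreSQL pod labelled `app=postgres` in a `databases` namespace, a `postgres` superuser, and each user's password equal to its name; all of these names are hypothetical and should be adjusted to the actual deployment):

```bash
# Create pg001 .. pg300, each owned by a same-named user (sketch only)
POD=$(kubectl get pod -n databases -l app=postgres -o jsonpath='{.items[0].metadata.name}')
for i in $(seq -w 1 300); do
  NAME="pg${i}"
  kubectl exec -n databases "$POD" -- psql -U postgres -c "CREATE USER ${NAME} WITH PASSWORD '${NAME}';"
  kubectl exec -n databases "$POD" -- psql -U postgres -c "CREATE DATABASE ${NAME} OWNER ${NAME};"
done
```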

.claude/settings.json Normal file

@@ -0,0 +1,12 @@
{
"alwaysThinkingEnabled": true,
"env": {
"ANTHROPIC_AUTH_TOKEN": "sk-5WAPtYaCjxXgoJiOz9kVR7Wg0MUTpDNY2MDASCNaNYdtdDxC",
"ANTHROPIC_BASE_URL": "https://new-api.yuyugod.top",
"ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5",
"ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-5-20251101",
"ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-5-20250929",
"ANTHROPIC_MODEL": "claude-sonnet-4-5-20250929"
},
"model": "claude-sonnet-4-5-20250929"
}


@@ -0,0 +1,389 @@
---
name: caddy-ssl-termination
description: Architecture configuration for SSL termination with Caddy in front of Traefik, intended for K3s environments.
---
# Caddy SSL Termination Skill
## Architecture Overview
**Setup**: Caddy (HTTPS/SSL termination) → Traefik (routing) → HTTP backend services
- **Caddy**: Handles HTTPS (443) with automatic SSL certificates, forwards to Traefik on HTTP (80)
- **Traefik**: Routes HTTP traffic to appropriate backend services
- **Flow**: Internet → Caddy:443 (HTTPS) → Traefik:80 (HTTP) → Backend Pods
## Quick Configuration Template
### 1. Basic Caddyfile Structure
```caddy
# /etc/caddy/Caddyfile
# Domain configuration
example.com {
reverse_proxy traefik-service:80
}
# Multiple domains
app1.example.com {
reverse_proxy traefik-service:80
}
app2.example.com {
reverse_proxy traefik-service:80
}
# Wildcard subdomain (requires DNS wildcard)
*.example.com {
reverse_proxy traefik-service:80
}
```
### 2. ConfigMap for Caddyfile
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: caddy-config
namespace: default
data:
Caddyfile: |
# Global options
{
email your-email@example.com
# Use Let's Encrypt staging for testing
# acme_ca https://acme-staging-v02.api.letsencrypt.org/directory
}
# Your domains
example.com {
reverse_proxy traefik-service:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
### 3. Caddy Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: caddy
namespace: default
spec:
replicas: 1
selector:
matchLabels:
app: caddy
  template:
    metadata:
labels:
app: caddy
spec:
containers:
- name: caddy
image: caddy:latest
ports:
- containerPort: 80
- containerPort: 443
- containerPort: 2019 # Admin API
volumeMounts:
- name: config
mountPath: /etc/caddy
- name: data
mountPath: /data
- name: config-cache
mountPath: /config
volumes:
- name: config
configMap:
name: caddy-config
- name: data
persistentVolumeClaim:
claimName: caddy-data
- name: config-cache
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: caddy
namespace: default
spec:
type: LoadBalancer # or NodePort
ports:
- name: http
port: 80
targetPort: 80
- name: https
port: 443
targetPort: 443
selector:
app: caddy
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: caddy-data
namespace: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
```
## Common Operations
### Reload Configuration
After updating the ConfigMap:
```bash
# Method 1: Reload via exec (preserves connections)
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
# Method 2: Restart pod (brief downtime)
kubectl rollout restart deployment/caddy -n default
# Method 3: Delete pod (auto-recreates)
kubectl delete pod -n default -l app=caddy
```
### Update Caddyfile
```bash
# Edit ConfigMap
kubectl edit configmap caddy-config -n default
# Or apply updated file
kubectl apply -f caddy-configmap.yaml
# Then reload
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```
### View Logs
```bash
# Follow logs
kubectl logs -n default -f deployment/caddy
# Check SSL certificate issues
kubectl logs -n default deployment/caddy | grep -i "certificate\|acme\|tls"
```
### Check Configuration
```bash
# Validate Caddyfile syntax
kubectl exec -n default deployment/caddy -- caddy validate --config /etc/caddy/Caddyfile
# View current config via API
kubectl exec -n default deployment/caddy -- curl localhost:2019/config/
```
## Adding New Domain
### Step-by-step Process
1. **Update DNS**: Point new domain to Caddy's LoadBalancer IP
```bash
kubectl get svc caddy -n default -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
2. **Update ConfigMap**: Add new domain block
```bash
kubectl edit configmap caddy-config -n default
```
Add:
```caddy
newapp.example.com {
reverse_proxy traefik-service:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
3. **Reload Caddy**:
```bash
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```
4. **Verify**: Check logs for certificate acquisition
```bash
kubectl logs -n default deployment/caddy | tail -20
```
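For routine use, the edit-then-reload steps above can be collapsed into one small helper. A minimal sketch, assuming the ConfigMap, namespace, and upstream names used in this skill (`caddy-config`, `default`, `traefik-service`); note that the kubelet can take up to a minute to sync the updated ConfigMap into the pod before the reload sees it:

```bash
#!/bin/sh
# add-caddy-domain.sh <domain> : append a domain block and reload Caddy (sketch)
DOMAIN="$1"
kubectl get configmap caddy-config -n default -o jsonpath='{.data.Caddyfile}' > Caddyfile
cat >> Caddyfile <<EOF

${DOMAIN} {
    reverse_proxy traefik-service:80
}
EOF
kubectl create configmap caddy-config -n default --from-file=Caddyfile \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```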
## Traefik Integration
### Traefik IngressRoute Example
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
name: myapp
namespace: default
spec:
entryPoints:
- web # HTTP only, Caddy handles HTTPS
routes:
- match: Host(`myapp.example.com`)
kind: Rule
services:
- name: myapp-service
port: 8080
```
### Important Notes
- Traefik should listen on HTTP (80) only
- Caddy handles all HTTPS/SSL
- Use `Host()` matcher in Traefik to route by domain
- Caddy forwards the original `Host` header to Traefik
## Troubleshooting
### SSL Certificate Issues
```bash
# Check certificate status
kubectl exec -n default deployment/caddy -- caddy list-certificates
# View ACME logs
kubectl logs -n default deployment/caddy | grep -i acme
# Common issues:
# - Port 80/443 not accessible from internet
# - DNS not pointing to correct IP
# - Rate limit hit (use staging CA for testing)
```
### Configuration Errors
```bash
# Test config before reload
kubectl exec -n default deployment/caddy -- caddy validate --config /etc/caddy/Caddyfile
# Check for syntax errors
kubectl logs -n default deployment/caddy | grep -i error
```
### Connection Issues
```bash
# Test from inside cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://traefik-service:80
# Check if Caddy can reach Traefik
kubectl exec -n default deployment/caddy -- curl -v http://traefik-service:80
```
## Advanced Configurations
### Custom TLS Settings
```caddy
example.com {
tls {
protocols tls1.2 tls1.3
ciphers TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
}
reverse_proxy traefik-service:80
}
```
### Rate Limiting
```caddy
example.com {
rate_limit {
zone dynamic {
key {remote_host}
events 100
window 1m
}
}
reverse_proxy traefik-service:80
}
```
### Custom Error Pages
```caddy
example.com {
handle_errors {
respond "{err.status_code} {err.status_text}"
}
reverse_proxy traefik-service:80
}
```
### Health Checks
```caddy
example.com {
reverse_proxy traefik-service:80 {
health_uri /health
health_interval 10s
health_timeout 5s
}
}
```
## Quick Reference Commands
```bash
# Get Caddy pod name
kubectl get pods -n default -l app=caddy
# Reload config
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
# View current config
kubectl exec -n default deployment/caddy -- cat /etc/caddy/Caddyfile
# Check certificates
kubectl exec -n default deployment/caddy -- caddy list-certificates
# Restart Caddy
kubectl rollout restart deployment/caddy -n default
# Watch logs
kubectl logs -n default -f deployment/caddy
# Get LoadBalancer IP
kubectl get svc caddy -n default -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
## Checklist for New Service
- [ ] DNS A/AAAA record points to Caddy LoadBalancer IP
- [ ] Domain added to Caddy ConfigMap
- [ ] Caddy reloaded successfully
- [ ] Traefik IngressRoute created (HTTP only)
- [ ] Backend service accessible from Traefik
- [ ] SSL certificate acquired (check logs)
- [ ] HTTPS access working from browser
- [ ] HTTP redirects to HTTPS (Caddy default behavior)
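A quick way to walk this checklist from a shell; a sketch, assuming the domain and namespaces used above (`newapp.example.com` is a placeholder) and that the Traefik IngressRoute CRD from the previous section is installed:

```bash
DOMAIN=newapp.example.com
# HTTP should answer with a redirect to HTTPS, HTTPS with a normal status from the backend
curl -sI "http://${DOMAIN}" | head -1
curl -sI "https://${DOMAIN}" | head -1
# Certificate acquisition and routing objects
kubectl logs -n default deployment/caddy | grep -i "${DOMAIN}" | tail -5
kubectl get ingressroute -n default
```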
## Notes
- Caddy automatically obtains and renews SSL certificates
- Certificates stored in `/data` volume (must be persistent)
- Reload is graceful - no connection drops
- Caddy handles HTTP→HTTPS redirect automatically
- Use staging CA for testing to avoid rate limits
- Wildcard certificates require DNS challenge (more complex setup)


@@ -0,0 +1,14 @@
---
name: sudo-automated-preference
description: Forces Claude Code to supply the password automatically through a specific pipe pattern whenever a command needs sudo privileges.
---
# Sudo Automation Skill
## Usage Guidelines
> Whenever you need to run any shell command that requires `sudo`, strictly follow this format.
## Mandatory Command Format
Never use `sudo <command>` directly; always use the following format:
```bash
echo "1" | sudo -S <command>
```
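For example, a command that would normally be `sudo systemctl restart k3s` becomes (the service name here is only an illustration):

```bash
echo "1" | sudo -S systemctl restart k3s
```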

.claude/skills/minio/s3.md Normal file

@@ -0,0 +1,789 @@
---
name: minio-s3-expert
description: Provides guidance on MinIO object storage configuration, bucket management, and S3 API usage.
---
# MinIO S3 Object Storage Skill
## Architecture Overview
**Setup**: Caddy (HTTPS/SSL) → Traefik (routing) → MinIO (S3 storage)
- **MinIO**: S3-compatible object storage with web console
- **Caddy**: Handles HTTPS (443) with automatic SSL certificates
- **Traefik**: Routes HTTP traffic to MinIO services
- **Policy Manager**: Automatically sets new buckets to public-read (download) permission
- **Flow**: Internet → Caddy:443 (HTTPS) → Traefik:80 (HTTP) → MinIO (9000: API, 9001: Console)
## Quick Deployment Template
### 1. Complete MinIO Deployment YAML
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: minio
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-data
namespace: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: minio
namespace: minio
spec:
replicas: 1
selector:
matchLabels:
app: minio
template:
metadata:
labels:
app: minio
spec:
containers:
- name: minio
image: minio/minio:latest
command:
- /bin/sh
- -c
- minio server /data --console-address ":9001"
ports:
- containerPort: 9000
name: api
- containerPort: 9001
name: console
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "your-password-here"
- name: MINIO_SERVER_URL
value: "https://s3.yourdomain.com"
- name: MINIO_BROWSER_REDIRECT_URL
value: "https://console.s3.yourdomain.com"
volumeMounts:
- name: data
mountPath: /data
livenessProbe:
httpGet:
path: /minio/health/live
port: 9000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /minio/health/ready
port: 9000
initialDelaySeconds: 10
periodSeconds: 5
- name: policy-manager
image: alpine:latest
command:
- /bin/sh
- -c
- |
# Install MinIO Client
wget https://dl.min.io/client/mc/release/linux-arm64/mc -O /usr/local/bin/mc
chmod +x /usr/local/bin/mc
# Wait for MinIO to start
sleep 10
# Configure mc client
mc alias set myminio http://localhost:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
echo "Policy manager started. Monitoring buckets..."
# Continuously monitor and set bucket policies
while true; do
# Get all buckets
mc ls myminio 2>/dev/null | awk '{print $NF}' | sed 's/\///' | while read -r BUCKET; do
if [ -n "$BUCKET" ]; then
# Check current policy
POLICY_OUTPUT=$(mc anonymous get myminio/${BUCKET} 2>&1)
# If private (contains "Access permission for" but not "download")
if echo "$POLICY_OUTPUT" | grep -q "Access permission for" && ! echo "$POLICY_OUTPUT" | grep -q "download"; then
echo "Setting download policy for bucket: ${BUCKET}"
mc anonymous set download myminio/${BUCKET}
fi
fi
done
sleep 30
done
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "your-password-here"
volumes:
- name: data
persistentVolumeClaim:
claimName: minio-data
---
apiVersion: v1
kind: Service
metadata:
name: minio
namespace: minio
spec:
type: ClusterIP
ports:
- port: 9000
targetPort: 9000
name: api
- port: 9001
targetPort: 9001
name: console
selector:
app: minio
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-api
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: s3.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-console
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: console.s3.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9001
```
### 2. Configuration Checklist
Before deploying, update these values in the YAML:
**Domains (4 places):**
- `s3.yourdomain.com` → Your S3 API domain
- `console.s3.yourdomain.com` → Your console domain
**Credentials (4 places):**
- `MINIO_ROOT_USER: "admin"` → Your admin username
- `MINIO_ROOT_PASSWORD: "your-password-here"` → Your admin password (min 8 chars)
**Architecture (1 place):**
- `linux-arm64` → Change based on your CPU:
- ARM64: `linux-arm64`
- x86_64: `linux-amd64`
**Storage (1 place):**
- `storage: 50Gi` → Adjust storage size as needed
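If you prefer not to edit the file by hand, the substitutions can be scripted; a rough sketch, assuming the manifest is saved as `minio.yaml` and using placeholder values (pick a real password):

```bash
# Adjust domain, credentials and architecture in one pass (sketch only)
sed -i \
  -e 's/yourdomain\.com/u6.net3w.com/g' \
  -e 's/your-password-here/CHANGE-ME-16chars/g' \
  -e 's/linux-arm64/linux-amd64/g' \
  minio.yaml
grep -nE 's3\.|MINIO_ROOT|linux-(arm64|amd64)|storage:' minio.yaml   # eyeball the result
```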
## Deployment Steps
### 1. Prepare DNS
Point your domains to the server IP:
```bash
# Add DNS A records
s3.yourdomain.com A your-server-ip
console.s3.yourdomain.com A your-server-ip
```
### 2. Configure Caddy
Add domains to Caddy ConfigMap:
```bash
kubectl edit configmap caddy-config -n default
```
Add these blocks:
```caddy
s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
console.s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80 {
header_up Host {host}
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
Reload Caddy:
```bash
kubectl exec -n default deployment/caddy -- caddy reload --config /etc/caddy/Caddyfile
```
### 3. Deploy MinIO
```bash
# Apply the configuration
kubectl apply -f minio.yaml
# Check deployment status
kubectl get pods -n minio
# Wait for pods to be ready
kubectl wait --for=condition=ready pod -l app=minio -n minio --timeout=300s
```
### 4. Verify Deployment
```bash
# Check MinIO logs
kubectl logs -n minio -l app=minio -c minio
# Check policy manager logs
kubectl logs -n minio -l app=minio -c policy-manager
# Check ingress
kubectl get ingress -n minio
# Check service
kubectl get svc -n minio
```
## Access MinIO
### Web Console
- URL: `https://console.s3.yourdomain.com`
- Username: Your configured `MINIO_ROOT_USER`
- Password: Your configured `MINIO_ROOT_PASSWORD`
### S3 API Endpoint
- URL: `https://s3.yourdomain.com`
- Use with AWS CLI, SDKs, or any S3-compatible client
## Bucket Policy Management
### Automatic Public-Read Policy
The policy manager sidecar automatically:
- Scans all buckets every 30 seconds
- Sets new private buckets to `download` (public-read) permission
- Allows anonymous downloads, requires auth for uploads/deletes
### Manual Policy Management
```bash
# Get pod name
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Access MinIO Client in pod
kubectl exec -n minio $POD -c policy-manager -- mc alias set myminio http://localhost:9000 admin your-password
# List buckets
kubectl exec -n minio $POD -c policy-manager -- mc ls myminio
# Check bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous get myminio/bucket-name
# Set bucket to public-read (download)
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set download myminio/bucket-name
# Set bucket to private
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set private myminio/bucket-name
# Set bucket to public (read + write)
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set public myminio/bucket-name
```
## Using MinIO
### Create Bucket via Web Console
1. Access `https://console.s3.yourdomain.com`
2. Login with credentials
3. Click "Buckets" → "Create Bucket"
4. Enter bucket name
5. Wait 30 seconds for auto-policy to apply
### Upload Files via Web Console
1. Navigate to bucket
2. Click "Upload" → "Upload File"
3. Select files
4. Files are immediately accessible via public URL
### Access Files
Public URL format:
```
https://s3.yourdomain.com/bucket-name/file-path
```
Example:
```bash
# Upload via console, then access:
curl https://s3.yourdomain.com/my-bucket/image.png
```
### Using AWS CLI
```bash
# Configure AWS CLI
aws configure set aws_access_key_id admin
aws configure set aws_secret_access_key your-password
aws configure set default.region us-east-1
# List buckets
aws --endpoint-url https://s3.yourdomain.com s3 ls
# Create bucket
aws --endpoint-url https://s3.yourdomain.com s3 mb s3://my-bucket
# Upload file
aws --endpoint-url https://s3.yourdomain.com s3 cp file.txt s3://my-bucket/
# Download file
aws --endpoint-url https://s3.yourdomain.com s3 cp s3://my-bucket/file.txt ./
# List bucket contents
aws --endpoint-url https://s3.yourdomain.com s3 ls s3://my-bucket/
```
### Using MinIO Client (mc)
```bash
# Install mc locally
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
# Configure alias
mc alias set myminio https://s3.yourdomain.com admin your-password
# List buckets
mc ls myminio
# Create bucket
mc mb myminio/my-bucket
# Upload file
mc cp file.txt myminio/my-bucket/
# Download file
mc cp myminio/my-bucket/file.txt ./
# Mirror directory
mc mirror ./local-dir myminio/my-bucket/remote-dir
```
## Common Operations
### View Logs
```bash
# MinIO server logs
kubectl logs -n minio -l app=minio -c minio -f
# Policy manager logs
kubectl logs -n minio -l app=minio -c policy-manager -f
# Both containers
kubectl logs -n minio -l app=minio --all-containers -f
```
### Restart MinIO
```bash
# Graceful restart
kubectl rollout restart deployment/minio -n minio
# Force restart (delete pod)
kubectl delete pod -n minio -l app=minio
```
### Scale Storage
```bash
# Edit PVC (note: can only increase, not decrease)
kubectl edit pvc minio-data -n minio
# Update storage size
# Change: storage: 50Gi → storage: 100Gi
```
### Backup Data
```bash
# Get pod name
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Copy data from pod
kubectl cp minio/$POD:/data ./minio-backup -c minio
# Or use mc mirror
mc mirror myminio/bucket-name ./backup/bucket-name
```
### Restore Data
```bash
# Copy data to pod
kubectl cp ./minio-backup minio/$POD:/data -c minio
# Restart MinIO
kubectl rollout restart deployment/minio -n minio
# Or use mc mirror
mc mirror ./backup/bucket-name myminio/bucket-name
```
## Troubleshooting
### Pod Not Starting
```bash
# Check pod status
kubectl describe pod -n minio -l app=minio
# Check events
kubectl get events -n minio --sort-by='.lastTimestamp'
# Common issues:
# - PVC not bound (check storage class)
# - Image pull error (check network/registry)
# - Resource limits (check node resources)
```
### Cannot Access Web Console
```bash
# Check ingress
kubectl get ingress -n minio
kubectl describe ingress minio-console -n minio
# Check service
kubectl get svc -n minio
# Test from inside cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl -v http://minio.minio.svc.cluster.local:9001
# Check Caddy logs
kubectl logs -n default -l app=caddy | grep -i s3
# Check Traefik logs
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
```
### SSL Certificate Issues
```bash
# Check Caddy certificates
kubectl exec -n default deployment/caddy -- caddy list-certificates
# Check Caddy logs for ACME
kubectl logs -n default deployment/caddy | grep -i "s3\|acme\|certificate"
# Verify DNS resolution
nslookup s3.yourdomain.com
nslookup console.s3.yourdomain.com
```
### Policy Manager Not Working
```bash
# Check policy manager logs
kubectl logs -n minio -l app=minio -c policy-manager
# Manually test mc commands
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n minio $POD -c policy-manager -- mc ls myminio
# Restart policy manager (restart pod)
kubectl delete pod -n minio -l app=minio
```
### Files Not Accessible
```bash
# Check bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous get myminio/bucket-name
# Should show: Access permission for `myminio/bucket-name` is set to `download`
# If not, manually set
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set download myminio/bucket-name
# Test access
curl -I https://s3.yourdomain.com/bucket-name/file.txt
```
## Advanced Configuration
### Custom Storage Class
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-data
namespace: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
storageClassName: fast-ssd # Custom storage class
```
### Resource Limits
```yaml
containers:
- name: minio
image: minio/minio:latest
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
```
### Multiple Replicas (Distributed Mode)
For production, use distributed MinIO:
```yaml
# Requires 4+ nodes with persistent storage
command:
- /bin/sh
- -c
- minio server http://minio-{0...3}.minio.minio.svc.cluster.local/data --console-address ":9001"
```
### Custom Bucket Policies
Create custom policy JSON:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"AWS": ["*"]},
"Action": ["s3:GetObject"],
"Resource": ["arn:aws:s3:::bucket-name/*"]
}
]
}
```
Apply via mc:
```bash
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set-json policy.json myminio/bucket-name
```
### Disable Auto-Policy Manager
Remove the `policy-manager` container from deployment if you want manual control.
## Best Practices
### Bucket Naming
- Use lowercase letters, numbers, hyphens
- Avoid underscores, spaces, special characters
- Keep names short and descriptive
- Example: `user-uploads`, `static-assets`, `backups-2024`
### Folder Structure
Use prefixes (folders) to organize files:
```
bucket-name/
├── user1/
│ ├── profile.jpg
│ └── documents/
├── user2/
│ └── avatar.png
└── shared/
└── logo.png
```
### Security
- Change default credentials immediately
- Use strong passwords (16+ characters)
- Create separate access keys for applications
- Use bucket policies to restrict access
- Enable versioning for important buckets
- Regular backups of critical data
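For the "separate access keys" point above, a hedged sketch using the MinIO Client from the policy-manager container (`appuser`/`app-secret-key` are hypothetical names; newer `mc` releases use `admin policy attach`, older ones use `admin policy set`):

```bash
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Create an application-specific user instead of sharing the root credentials
kubectl exec -n minio $POD -c policy-manager -- mc admin user add myminio appuser app-secret-key
# Grant it the built-in readwrite policy (newer mc syntax)
kubectl exec -n minio $POD -c policy-manager -- mc admin policy attach myminio readwrite --user appuser
# Older mc releases: mc admin policy set myminio readwrite user=appuser
```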
### Performance
- Use CDN for frequently accessed files
- Enable compression for text files
- Use appropriate storage class
- Monitor disk usage and scale proactively
## Quick Reference Commands
```bash
# Deploy MinIO
kubectl apply -f minio.yaml
# Check status
kubectl get pods -n minio
kubectl get svc -n minio
kubectl get ingress -n minio
# View logs
kubectl logs -n minio -l app=minio -c minio -f
kubectl logs -n minio -l app=minio -c policy-manager -f
# Restart MinIO
kubectl rollout restart deployment/minio -n minio
# Get pod name
POD=$(kubectl get pod -n minio -l app=minio -o jsonpath='{.items[0].metadata.name}')
# Access mc client
kubectl exec -n minio $POD -c policy-manager -- mc ls myminio
# Check bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous get myminio/bucket-name
# Set bucket policy
kubectl exec -n minio $POD -c policy-manager -- mc anonymous set download myminio/bucket-name
# Delete deployment
kubectl delete -f minio.yaml
```
## Integration Examples
### Node.js (AWS SDK)
```javascript
const AWS = require('aws-sdk');
const s3 = new AWS.S3({
endpoint: 'https://s3.yourdomain.com',
accessKeyId: 'admin',
secretAccessKey: 'your-password',
s3ForcePathStyle: true,
signatureVersion: 'v4'
});
// Upload file
s3.putObject({
Bucket: 'my-bucket',
Key: 'file.txt',
Body: 'Hello World'
}, (err, data) => {
if (err) console.error(err);
else console.log('Uploaded:', data);
});
// Download file
s3.getObject({
Bucket: 'my-bucket',
Key: 'file.txt'
}, (err, data) => {
if (err) console.error(err);
else console.log('Content:', data.Body.toString());
});
```
### Python (boto3)
```python
import boto3
s3 = boto3.client('s3',
endpoint_url='https://s3.yourdomain.com',
aws_access_key_id='admin',
aws_secret_access_key='your-password'
)
# Upload file
s3.upload_file('local-file.txt', 'my-bucket', 'remote-file.txt')
# Download file
s3.download_file('my-bucket', 'remote-file.txt', 'downloaded.txt')
# List objects
response = s3.list_objects_v2(Bucket='my-bucket')
for obj in response.get('Contents', []):
print(obj['Key'])
```
### Go (minio-go)
```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("s3.yourdomain.com", &minio.Options{
		Creds:  credentials.NewStaticV4("admin", "your-password", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()
	// Upload file
	if _, err := client.FPutObject(ctx, "my-bucket", "file.txt", "local-file.txt", minio.PutObjectOptions{}); err != nil {
		log.Fatal(err)
	}
	// Download file
	if err := client.FGetObject(ctx, "my-bucket", "file.txt", "downloaded.txt", minio.GetObjectOptions{}); err != nil {
		log.Fatal(err)
	}
}
```
## Notes
- MinIO is fully S3-compatible
- Automatic SSL via Caddy
- Auto-policy sets buckets to public-read by default
- Policy manager runs every 30 seconds
- Persistent storage required for data retention
- Single replica suitable for development/small deployments
- Use distributed mode for production high-availability
- Regular backups recommended for critical data


@@ -0,0 +1,29 @@
# Traefik Middleware - CORS configuration
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: cors-headers
namespace: registry-system
spec:
headers:
accessControlAllowMethods:
- "GET"
- "HEAD"
- "POST"
- "PUT"
- "DELETE"
- "OPTIONS"
accessControlAllowOriginList:
- "http://registry.u6.net3w.com"
- "https://registry.u6.net3w.com"
accessControlAllowCredentials: true
accessControlAllowHeaders:
- "Authorization"
- "Content-Type"
- "Accept"
- "Cache-Control"
accessControlExposeHeaders:
- "Docker-Content-Digest"
- "WWW-Authenticate"
accessControlMaxAge: 100
addVaryHeader: true


@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: registry-auth-secret
namespace: registry-system
type: Opaque
stringData:
  # ▼▼▼ Important: this is the bcrypt hash of 123456; copy it as-is, do not modify ▼▼▼
htpasswd: |
admin:$2y$05$WSu.LllzUnHQcNPgklqqqum3o69unaC6lCUNz.rRmmq3YhowL99RW


@@ -0,0 +1,27 @@
root@98-hk:~/k3s/registry# docker run --rm --entrypoint htpasswd httpd:alpine -Bbn admin 123456
Unable to find image 'httpd:alpine' locally
alpine: Pulling from library/httpd
1074353eec0d: Pull complete
0bd765d2a2cb: Pull complete
0c4ffdba1e9e: Pull complete
4f4fb700ef54: Pull complete
0c51c0b07eae: Pull complete
e626d5c4ed2c: Pull complete
988cd7d09a31: Pull complete
Digest: sha256:6b7535d8a33c42b0f0f48ff0067765d518503e465b1bf6b1629230b62a466a87
Status: Downloaded newer image for httpd:alpine
admin:$2y$05$yYEah4y9O9F/5TumcJSHAuytQko2MAyFM1MuqgAafDED7Fmiyzzse
root@98-hk:~/k3s/registry# # Note: the single quotes ' ' on both sides are required
kubectl create secret generic registry-auth-secret \
--from-literal=htpasswd='admin:$2y$05$yYEah4y9O9F/5TumcJSHAuytQko2MAyFM1MuqgAafDED7Fmiyzzse' \
--namespace registry-system
secret/registry-auth-secret created
root@98-hk:~/k3s/registry# # Redeploy the application
kubectl apply -f registry-stack.yaml
namespace/registry-system unchanged
persistentvolumeclaim/registry-pvc unchanged
deployment.apps/registry created
service/registry-service unchanged
ingress.networking.k8s.io/registry-ingress unchanged
root@98-hk:~/k3s/registry#


@@ -0,0 +1,131 @@
# 1. Create a dedicated namespace
apiVersion: v1
kind: Namespace
metadata:
name: registry-system
---
# 2. Create the password file generated earlier as a K8s Secret
---
# 3. Request disk space (stores the image files)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: registry-pvc
namespace: registry-system
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
      storage: 20Gi # 20Gi for the registry; it can be expanded at any time
---
# 4. Deploy the Registry application
apiVersion: apps/v1
kind: Deployment
metadata:
name: registry
namespace: registry-system
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: registry
template:
metadata:
labels:
app: registry
spec:
containers:
- name: registry
image: registry:2
ports:
- containerPort: 5000
env:
        # --- Enable authentication ---
- name: REGISTRY_AUTH
value: "htpasswd"
- name: REGISTRY_AUTH_HTPASSWD_REALM
value: "Registry Realm"
- name: REGISTRY_AUTH_HTPASSWD_PATH
value: "/auth/htpasswd"
        # --- Storage path ---
- name: REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY
value: "/var/lib/registry"
volumeMounts:
- name: data-volume
mountPath: /var/lib/registry
- name: auth-volume
mountPath: /auth
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: registry-pvc
- name: auth-volume
secret:
secretName: registry-auth-secret
---
# 5. Internal Service
apiVersion: v1
kind: Service
metadata:
name: registry-service
namespace: registry-system
spec:
selector:
app: registry
ports:
- protocol: TCP
port: 80
targetPort: 5000
---
# 6. Expose the HTTPS domain
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: registry-ingress
namespace: registry-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # Raise the upload size limit (Docker image layers can be large)
    ingress.kubernetes.io/proxy-body-size: "0"
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
    # CORS configuration (lets the UI call the Registry API)
traefik.ingress.kubernetes.io/router.middlewares: registry-system-cors-headers@kubernetescrd
spec:
rules:
- host: registry.u6.net3w.com
http:
paths:
      # Registry API path (higher priority, must come first)
- path: /v2
pathType: Prefix
backend:
service:
name: registry-service
port:
number: 80
      # The UI is served at the root path
- path: /
pathType: Prefix
backend:
service:
name: registry-ui-service
port:
number: 80
tls:
- hosts:
- registry.u6.net3w.com
secretName: registry-tls-secret
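With the stack above applied, pushing to and inspecting the registry follows the standard Docker flow; a sketch, assuming the `admin`/`123456` credentials from the Secret and a locally built image named `myapp` (hypothetical):

```bash
docker login registry.u6.net3w.com -u admin -p 123456
docker tag myapp:latest registry.u6.net3w.com/myapp:latest
docker push registry.u6.net3w.com/myapp:latest
# List repositories through the v2 API
curl -u admin:123456 https://registry.u6.net3w.com/v2/_catalog
```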


@@ -0,0 +1,84 @@
# Joxit Docker Registry UI - lightweight web interface
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: registry-ui
namespace: registry-system
spec:
replicas: 1
selector:
matchLabels:
app: registry-ui
template:
metadata:
labels:
app: registry-ui
spec:
containers:
- name: registry-ui
image: joxit/docker-registry-ui:latest
ports:
- containerPort: 80
env:
        # Registry API address (proxied through nginx to avoid mixed-content issues)
- name: NGINX_PROXY_PASS_URL
value: "http://registry-service.registry-system.svc.cluster.local"
        # Allow deleting images
- name: DELETE_IMAGES
value: "true"
        # Show content digests
- name: SHOW_CONTENT_DIGEST
value: "true"
        # Single-registry mode
- name: SINGLE_REGISTRY
value: "true"
        # Registry title
- name: REGISTRY_TITLE
value: "U9 Docker Registry"
        # Enable the search/catalog feature
- name: CATALOG_ELEMENTS_LIMIT
value: "1000"
---
# UI Service
apiVersion: v1
kind: Service
metadata:
name: registry-ui-service
namespace: registry-system
spec:
selector:
app: registry-ui
ports:
- protocol: TCP
port: 80
targetPort: 80
---
# Expose the UI to the internet
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: registry-ui-ingress
namespace: registry-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
rules:
- host: registry-ui.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: registry-ui-service
port:
number: 80
tls:
- hosts:
- registry-ui.u6.net3w.com
secretName: registry-ui-tls-secret


@@ -0,0 +1,72 @@
# 01-mysql.yaml (new version)
# --- Part 1: request a storage claim (PVC) ---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc # remember this claim's name
namespace: demo-space
spec:
accessModes:
  - ReadWriteOnce # can only be mounted read-write by one node
  storageClassName: longhorn # Longhorn storage class, backed by the VPS local disk
resources:
requests:
      storage: 2Gi # request 2 GB
---
# --- Part 2: database Service (unchanged) ---
apiVersion: v1
kind: Service
metadata:
name: mysql-service
namespace: demo-space
spec:
ports:
- port: 3306
selector:
app: wordpress-mysql
---
# --- Part 3: deploy the database (with the volume mounted) ---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress-mysql
namespace: demo-space
spec:
selector:
matchLabels:
app: wordpress-mysql
strategy:
    type: Recreate # Recreate is recommended for stateful apps (stop the old pod before starting the new one)
template:
metadata:
labels:
app: wordpress-mysql
spec:
containers:
- image: mariadb:10.6.4-focal
name: mysql
env:
- name: MYSQL_ROOT_PASSWORD
value: "password123"
- name: MYSQL_DATABASE
value: "wordpress"
- name: MYSQL_USER
value: "wordpress"
- name: MYSQL_PASSWORD
value: "wordpress"
ports:
- containerPort: 3306
name: mysql
        # ▼▼▼ The key change is here ▼▼▼
volumeMounts:
- name: mysql-store
          mountPath: /var/lib/mysql # where the database keeps its files inside the container
volumes:
- name: mysql-store
persistentVolumeClaim:
          claimName: mysql-pvc # use the claim defined above


@@ -0,0 +1,64 @@
# 02-wordpress.yaml
apiVersion: v1
kind: Service
metadata:
name: wordpress-service
namespace: demo-space
spec:
sessionAffinity: ClientIP
sessionAffinityConfig:
clientIP:
timeoutSeconds: 10800 # 3 hours
ports:
- port: 80
selector:
app: wordpress
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress
namespace: demo-space
spec:
  replicas: 2 # run two WordPress frontends
selector:
matchLabels:
app: wordpress
template:
metadata:
labels:
app: wordpress
spec:
containers:
- image: wordpress:latest
name: wordpress
env:
- name: WORDPRESS_DB_HOST
          value: "mysql-service" # the magic: just reference the Service by name
- name: WORDPRESS_DB_USER
value: "wordpress"
- name: WORDPRESS_DB_PASSWORD
value: "wordpress"
- name: WORDPRESS_DB_NAME
value: "wordpress"
- name: WORDPRESS_CONFIG_EXTRA
value: |
/* HTTPS behind reverse proxy - Complete configuration */
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https') {
$_SERVER['HTTPS'] = 'on';
}
if (isset($_SERVER['HTTP_X_FORWARDED_HOST'])) {
$_SERVER['HTTP_HOST'] = $_SERVER['HTTP_X_FORWARDED_HOST'];
}
/* Force SSL for admin */
define('FORCE_SSL_ADMIN', true);
/* Redis session storage for multi-replica support */
@ini_set('session.save_handler', 'redis');
@ini_set('session.save_path', 'tcp://redis-service:6379');
/* Fix cookie issues */
@ini_set('session.cookie_httponly', true);
@ini_set('session.cookie_secure', true);
@ini_set('session.use_only_cookies', true);
ports:
- containerPort: 80
name: wordpress


@@ -0,0 +1,31 @@
# 03-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: wordpress-ingress
namespace: demo-space
annotations:
    # ▼▼▼ Key annotation: request a certificate ▼▼▼
cert-manager.io/cluster-issuer: letsencrypt-prod
    # ▼▼▼ Traefik sticky-session configuration ▼▼▼
traefik.ingress.kubernetes.io/affinity: "true"
traefik.ingress.kubernetes.io/session-cookie-name: "wordpress-session"
spec:
rules:
  - host: blog.u6.net3w.com # your domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: wordpress-service
port:
number: 80
  # ▼▼▼ Key config: the certificate is stored in this Secret ▼▼▼
tls:
- hosts:
- blog.u6.net3w.com
    secretName: blog-tls-secret # K3s creates this Secret automatically and fills in the certificate


@@ -0,0 +1,40 @@
# 04-redis.yaml - Redis for WordPress session storage
apiVersion: v1
kind: Service
metadata:
name: redis-service
namespace: demo-space
spec:
ports:
- port: 6379
targetPort: 6379
selector:
app: redis
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: demo-space
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"


@@ -0,0 +1,8 @@
# Custom WordPress image with Redis PHP extension
FROM wordpress:latest
# Install Redis PHP extension
RUN pecl install redis && docker-php-ext-enable redis
# Verify installation
RUN php -m | grep redis
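To actually use this image with the two-replica WordPress deployment above, it has to be built and pushed somewhere the cluster can pull from; a sketch, assuming the private registry from this repo and a hypothetical `wordpress-redis` image name:

```bash
docker build -t registry.u6.net3w.com/wordpress-redis:latest .
docker push registry.u6.net3w.com/wordpress-redis:latest
# Point the existing deployment at the new image
kubectl set image deployment/wordpress wordpress=registry.u6.net3w.com/wordpress-redis:latest -n demo-space
```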


@@ -0,0 +1,30 @@
# 1. Define a "dummy" selector-less Service as the in-cluster entry point
#
# external-app.yaml (fixed version)
apiVersion: v1
kind: Service
metadata:
name: host-app-service
namespace: demo-space
spec:
ports:
  - name: http # <--- the Service port is named http
protocol: TCP
port: 80
targetPort: 3100
---
apiVersion: v1
kind: Endpoints
metadata:
name: host-app-service
namespace: demo-space
subsets:
- addresses:
- ip: 85.137.244.98
ports:
- port: 3100
    name: http # <--- [key fix] this must also be named http so the two can pair up


@@ -0,0 +1,25 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: host-app-ingress
namespace: demo-space
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # ▼▼▼ Core fix: add this line ▼▼▼
ingress.kubernetes.io/custom-response-headers: "Content-Security-Policy: upgrade-insecure-requests"
spec:
rules:
- host: wt.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: host-app-service
port:
number: 80
tls:
- hosts:
- wt.u6.net3w.com
secretName: wt-tls-secret


@@ -0,0 +1,16 @@
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
    # Let's Encrypt production endpoint
server: https://acme-v02.api.letsencrypt.org/directory
    # Use your real email; you get a reminder before certificates expire (they auto-renew anyway)
email: fszy2021@gmail.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: traefik
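After applying the issuer, issuance can be followed end to end through the cert-manager resources; a sketch (these resource kinds exist once cert-manager is installed, and ingress-shim names each Certificate after its `secretName`):

```bash
kubectl get clusterissuer letsencrypt-prod          # READY should become True
kubectl get certificate -A                          # one Certificate per Ingress tls block
kubectl get challenges -A                           # pending HTTP-01 challenges, if any
kubectl describe certificate blog-tls-secret -n demo-space   # example: the blog certificate
```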


@@ -0,0 +1,27 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: longhorn-ingress
  namespace: longhorn-system # note: Longhorn is installed in this namespace
annotations:
    # 1. Tell cert-manager to issue the certificate with this issuer
cert-manager.io/cluster-issuer: letsencrypt-prod
    # (Optional) Force Traefik onto the HTTPS entrypoint; usually unnecessary since Traefik detects TLS automatically
# traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
rules:
  - host: storage.u6.net3w.com # your domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: longhorn-frontend
port:
number: 80
  # 2. Tell K3s where to store the issued certificate
tls:
- hosts:
- storage.u6.net3w.com
    secretName: longhorn-tls-secret # the certificate is saved automatically in this Secret


@@ -0,0 +1,37 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
namespace: demo-space
spec:
selector:
matchLabels:
run: php-apache
replicas: 1
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: registry.k8s.io/hpa-example
ports:
- containerPort: 80
resources:
        # Resource requests/limits must be set so the HPA can compute usage percentages
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
namespace: demo-space
spec:
ports:
- port: 80
selector:
run: php-apache


@@ -0,0 +1,120 @@
# 1. Dedicated namespace
apiVersion: v1
kind: Namespace
metadata:
name: n8n-system
---
# 2. Data persistence (stores workflows and account data)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: n8n-pvc
namespace: n8n-system
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
---
# 3. Core application
apiVersion: apps/v1
kind: Deployment
metadata:
name: n8n
namespace: n8n-system
labels:
app: n8n
spec:
replicas: 1
selector:
matchLabels:
app: n8n
template:
metadata:
labels:
app: n8n
spec:
securityContext:
fsGroup: 1000
containers:
- name: n8n
image: n8nio/n8n:latest
securityContext:
runAsUser: 1000
runAsGroup: 1000
ports:
- containerPort: 5678
env:
        # ▼▼▼ Key configuration ▼▼▼
- name: N8N_HOST
value: "n8n.u6.net3w.com"
- name: N8N_PORT
value: "5678"
- name: N8N_PROTOCOL
value: "https"
- name: WEBHOOK_URL
value: "https://n8n.u6.net3w.com/"
        # Timezone (handy for scheduled workflows)
- name: GENERIC_TIMEZONE
value: "Asia/Shanghai"
- name: TZ
value: "Asia/Shanghai"
        # Disable n8n telemetry collection
- name: N8N_DIAGNOSTICS_ENABLED
value: "false"
volumeMounts:
- name: data
mountPath: /home/node/.n8n
volumes:
- name: data
persistentVolumeClaim:
claimName: n8n-pvc
---
# 4. Service exposure
apiVersion: v1
kind: Service
metadata:
name: n8n-service
namespace: n8n-system
spec:
selector:
app: n8n
ports:
- protocol: TCP
port: 80
targetPort: 5678
---
# 5. Ingress (automatic HTTPS)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: n8n-ingress
namespace: n8n-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- n8n.u6.net3w.com
secretName: n8n-tls
rules:
- host: n8n.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: n8n-service
port:
number: 80


@@ -0,0 +1,109 @@
# 1. Namespace
apiVersion: v1
kind: Namespace
metadata:
name: gitea-system
---
# 2. Data persistence (stores repositories and the database)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: gitea-data-pvc
namespace: gitea-system
spec:
accessModes:
- ReadWriteOnce
  storageClassName: longhorn # reuse your Longhorn storage class
resources:
requests:
storage: 10Gi
---
# 3. Deploy the Gitea application
apiVersion: apps/v1
kind: Deployment
metadata:
name: gitea
namespace: gitea-system
spec:
replicas: 1
selector:
matchLabels:
app: gitea
template:
metadata:
labels:
app: gitea
spec:
containers:
- name: gitea
image: gitea/gitea:latest
ports:
- containerPort: 3000
name: http
- containerPort: 22
name: ssh
volumeMounts:
- name: gitea-data
mountPath: /data
env:
        # Initial settings, so the config file does not need manual edits
- name: GITEA__server__DOMAIN
value: "git.u6.net3w.com"
- name: GITEA__server__ROOT_URL
value: "https://git.u6.net3w.com/"
- name: GITEA__server__SSH_PORT
          value: "22" # Note: access through the Ingress is HTTPS; SSH needs an extra NodePort (see the sketch after this file), so keep the standard port for now
volumes:
- name: gitea-data
persistentVolumeClaim:
claimName: gitea-data-pvc
---
# 4. Service (internal network)
apiVersion: v1
kind: Service
metadata:
name: gitea-service
namespace: gitea-system
spec:
selector:
app: gitea
ports:
- protocol: TCP
port: 80
targetPort: 3000
name: http
- protocol: TCP
    port: 2222 # map this port if SSH is needed later
targetPort: 22
name: ssh
---
# 5. Ingress (expose the HTTPS domain)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: gitea-ingress
namespace: gitea-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # Allow large uploads (git pushes can be big)
nginx.ingress.kubernetes.io/proxy-body-size: "0"
spec:
rules:
- host: git.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: gitea-service
port:
number: 80
tls:
- hosts:
- git.u6.net3w.com
secretName: gitea-tls-secret
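The SSH note above needs a separate entry point, since the Ingress only carries HTTP(S). One possible approach (not the only one) is a NodePort service cloned from the deployment; the node port is auto-assigned in the 30000-32767 range and must also be opened in the firewall:

```bash
# Expose Gitea's SSH port on every node (sketch)
kubectl -n gitea-system expose deployment gitea --name=gitea-ssh \
  --type=NodePort --port=22 --target-port=22
kubectl -n gitea-system get svc gitea-ssh   # note the assigned nodePort
# Clone over SSH with the non-standard port, for example:
# git clone ssh://git@git.u6.net3w.com:3XXXX/fei/k3s-configs.git
```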


@@ -0,0 +1,97 @@
# 1. Create a dedicated namespace to keep things tidy
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
---
# 2. Request a persistent volume (using Longhorn)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: kuma-pvc
namespace: monitoring
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 2Gi
---
# 3. Deploy the app (a StatefulSet would also work; for a single instance a Deployment is enough)
apiVersion: apps/v1
kind: Deployment
metadata:
name: uptime-kuma
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: uptime-kuma
strategy:
type: Recreate
template:
metadata:
labels:
app: uptime-kuma
spec:
containers:
- name: uptime-kuma
image: louislam/uptime-kuma:1
ports:
- containerPort: 3001
volumeMounts:
- name: data
mountPath: /app/data
volumes:
- name: data
persistentVolumeClaim:
claimName: kuma-pvc
---
# 4. Internal Service
apiVersion: v1
kind: Service
metadata:
name: kuma-service
namespace: monitoring
spec:
selector:
app: uptime-kuma
ports:
- protocol: TCP
port: 80
targetPort: 3001
---
# 5. Expose to the internet (HTTPS + domain)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kuma-ingress
namespace: monitoring
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
rules:
  - host: status.u6.net3w.com # <--- your new domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kuma-service
port:
number: 80
tls:
- hosts:
- status.u6.net3w.com
secretName: status-tls-secret


@@ -0,0 +1,62 @@
apiVersion: v1
kind: Namespace
metadata:
name: navigation
---
# ▼▼▼ Core concept: ConfigMap ▼▼▼
apiVersion: v1
kind: ConfigMap
metadata:
name: homepage-config
namespace: navigation
data:
  # Config file 1: widgets (clock, search box, resource usage)
widgets.yaml: |
- search:
provider: google
target: _blank
- resources:
cpu: true
memory: true
disk: true
- datetime:
text_size: xl
format:
timeStyle: short
  # Config file 2: your service links (note the icon and href fields below)
  services.yaml: |
    - My Apps:
        - Personal Blog:
            icon: wordpress.png
            href: https://blog.u6.net3w.com
            description: My digital garden
        - Remote Desktop:
            icon: linux.png
            href: https://wt.u6.net3w.com
            description: External reverse-proxy test via K8s
    - Infrastructure:
        - Status Monitoring:
            icon: uptime-kuma.png
            href: https://status.u6.net3w.com
            description: Uptime Kuma
            widget:
              type: uptimekuma
              url: http://kuma-service.monitoring.svc.cluster.local # ▼ key point: K8s internal DNS
              slug: my-wordpress-blog # (advanced: fill this in later)
        - Storage Management:
            icon: longhorn.png
            href: https://storage.u6.net3w.com
            description: Distributed storage dashboard
            widget:
              type: longhorn
              url: http://longhorn-frontend.longhorn-system.svc.cluster.local
  # Config file 3: general settings
  settings.yaml: |
    title: K3s Command Center
    background: https://images.unsplash.com/photo-1519681393784-d120267933ba?auto=format&fit=crop&w=1920&q=80
    theme: dark
    color: slate


@@ -0,0 +1,71 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: homepage
namespace: navigation
spec:
replicas: 1
selector:
matchLabels:
app: homepage
template:
metadata:
labels:
app: homepage
spec:
containers:
- name: homepage
image: ghcr.io/gethomepage/homepage:latest
ports:
- containerPort: 3000
        # ▼▼▼ Key step: mount the ConfigMap as files ▼▼▼
volumeMounts:
- name: config-volume
          mountPath: /app/config # the config directory inside the container
volumes:
- name: config-volume
configMap:
          name: homepage-config # reference the ConfigMap above
---
apiVersion: v1
kind: Service
metadata:
name: homepage-service
namespace: navigation
spec:
selector:
app: homepage
ports:
- protocol: TCP
port: 80
targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: homepage-ingress
namespace: navigation
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    # Enables cross-origin calls (optional)
nginx.ingress.kubernetes.io/enable-cors: "true"
spec:
rules:
  - host: nav.u6.net3w.com # <--- your new domain
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: homepage-service
port:
number: 80
tls:
- hosts:
- nav.u6.net3w.com
secretName: nav-tls-secret


@@ -0,0 +1,33 @@
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: k3s-apps
namespace: argocd
spec:
project: default
  # Git repository settings
source:
repoURL: https://git.u6.net3w.com/admin/k3s-configs.git
targetRevision: HEAD
path: k3s
  # Target cluster settings
destination:
server: https://kubernetes.default.svc
namespace: default
  # Automatic sync settings
  syncPolicy:
    automated:
      prune: true # automatically delete resources that no longer exist in Git
      selfHeal: true # automatically revert manual changes
      allowEmpty: false
    syncOptions:
    - CreateNamespace=true # create the namespace automatically
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m


@@ -0,0 +1,24 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: argocd-server-ingress
namespace: argocd
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
tls:
- hosts:
- argocd.u6.net3w.com
secretName: argocd-tls-secret
rules:
- host: argocd.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: argocd-server
port:
name: http


@@ -0,0 +1,25 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: rancher
namespace: cattle-system
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
ingressClassName: traefik
tls:
- hosts:
- rancher.u6.net3w.com
secretName: rancher-tls
rules:
- host: rancher.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: rancher
port:
number: 80


@@ -0,0 +1,23 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: portainer-ingress
namespace: portainer
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
    - portainer.u6.net3w.com # <--- use a new subdomain
secretName: portainer-tls
rules:
- host: portainer.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: portainer
port:
number: 9000


@@ -0,0 +1,148 @@
This is a great idea! Staring at a white-on-black terminal every day really does wear you out, and for watching log streams or the resource usage of many Pods at once, a graphical interface (GUI) has advantages the command line cannot match.
For K3s, I have **three top recommendations**, each suited to a different use case.
---
### Option 1: The Best Desktop Tool: OpenLens (recommended for development)
If you want to manage the remote K3s from **your own computer (Windows/Mac)** without using any VPS resources, Lens is the undisputed king. It is like a "remote desktop" built specifically for driving K8s.
**Pros:**
* **Zero resource usage on the server**: it runs on your own machine and consumes no VPS memory.
* **Extremely powerful**: viewing logs, opening a terminal, editing YAML, and checking metrics are all one click away.
* **Smooth experience**: a native app, noticeably slicker than web UIs.
#### 🚀 How to connect (only 3 steps)
To let the local Lens reach the remote K3s, you need to bring the "key" back from the VPS.
1. **Get the kubeconfig on the VPS**
Run this command to print the config file:
```bash
cat /etc/rancher/k3s/k3s.yaml
```
*(copy the whole output)*
2. **Create a config file on your local machine**
Create a new file on your computer (for example `my-k3s-config`) and paste in the content.
**⚠️ Key edit**: find the line `server: https://127.0.0.1:6443` and change `127.0.0.1` to your **domain** (for example `dev.u6.net3w.com`, provided port 6443 is open) or the **VPS public IP**.
3. **Import into Lens**
Download and install **OpenLens** (the free open-source build) or **Lens Desktop**.
Open the app -> File -> Add Cluster -> paste the edited config.
**In an instant, the whole picture of your K3s cluster is laid out in front of you!**
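The three steps can also be done from a terminal on the local machine; a sketch, with `YOUR_VPS_IP` as a placeholder for the VPS public IP (or a domain, provided port 6443 is reachable):

```bash
# Fetch the kubeconfig, point it at the public address, and sanity-check it
scp root@YOUR_VPS_IP:/etc/rancher/k3s/k3s.yaml ./my-k3s-config
sed -i 's/127.0.0.1/YOUR_VPS_IP/' ./my-k3s-config
KUBECONFIG=./my-k3s-config kubectl get nodes
```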
---
### Option 2: The Best Web UI: Portainer (recommended for ops/monitoring)
If you would rather manage everything from a browser, and you are used to the Docker way of working, **Portainer** is the obvious choice. Its K8s support is very friendly and the interface is clean and good-looking.
**Pros:**
* **Access from anywhere**: any browser will do.
* **Great visualisation**: pretty dashboards showing CPU/memory usage.
* **Simple to deploy**: it is just another Pod inside K3s.
#### 🚀 How to install (done in 30 seconds)
Run directly on your VPS:
1. **Install Portainer**
```bash
kubectl apply -n portainer -f https://raw.githubusercontent.com/portainer/k8s/master/deploy/manifests/portainer/portainer.yaml
```
*(this step creates the namespace and all resources automatically)*
2. **Expose the port (we will use NodePort access)**
By default Portainer uses a LoadBalancer; change it so it can be reached directly through a port.
```bash
# Edit the service configuration
kubectl edit service portainer -n portainer
```
Find `type: LoadBalancer` and change it to `type: NodePort`.
Or, since you are already fluent with this stack, **just write an Ingress** for it!
**Create `portainer-ingress.yaml`:**
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: portainer-ingress
  namespace: portainer
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - portainer.u6.net3w.com # <--- use a new subdomain
    secretName: portainer-tls
  rules:
  - host: portainer.u6.net3w.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: portainer
            port:
              number: 9000
```
3. **Apply and access**
`kubectl apply -f portainer-ingress.yaml`
Wait a moment, then open `https://portainer.u6.net3w.com` and set the admin password.
---
### Option 3: The Official Suite: Rancher (enterprise-grade)
K3s was developed by Rancher Labs, and Rancher is their flagship management platform.
**Pros:** the most complete feature set, enterprise-grade, able to manage hundreds or thousands of clusters.
**Cons:** **very memory hungry**. Rancher needs 2 GB+ of RAM just to start. If your VPS has 4 GB or less, it is **strongly discouraged**: it will squeeze out your workloads.
---
### 💡 The Geek's Choice: K9s (a GUI inside the terminal)
If you still cannot leave the terminal but want the convenience of a GUI, **K9s** is a must-have. It is a graphical interface that runs inside the terminal.
**Install:**
```bash
# Install via Linuxbrew on the VPS, or just download the binary
curl -sS https://webinstall.dev/k9s | bash
```
**Start:**
Type `k9s`.
You can drive it like a video game:
* Press `0` to show all Pods.
* Press `l` to view logs.
* Press `s` to open a shell.
* Press `ctrl-d` to delete a Pod.
* Fully keyboard-driven, and undeniably cool.
---
### My Recommendation
1. **First choice**: install **OpenLens** locally first. Nothing to install on the VPS, no overhead, best experience.
2. **Second choice**: if you often manage the cluster from different computers, deploy **Portainer** on the VPS.
**Which one do you want to try first? If it is OpenLens, I can show you how to open port 6443 on the VPS safely.**


@@ -0,0 +1,98 @@
**Good choice!** Since your VPS is well provisioned (you mentioned 16 GB of RAM earlier), running Rancher is **by far the most comfortable** option. It is like installing a "Windows operating system" on top of your Kubernetes cluster: everything can be done with a few mouse clicks.
Although you like hard-coding YAML by hand (which is good), for installing a giant piece of software like Rancher, **Helm** (the Kubernetes package manager, comparable to Python's pip or Linux's apt) is the only standard and reliable route. Installing Rancher with hand-written YAML would involve hundreds of files and is extremely error-prone.
Follow the steps below to lead this "bull" (Rancher's logo) into your cluster.
---
### 🚀 Step 1: Install Helm (if you do not have it yet)
Helm is required to install Rancher. Run on node 1:
```bash
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
```
### Step 2: Add the Rancher repository
Tell Helm where to download Rancher from.
```bash
# 1. Add the repository
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
# 2. Create Rancher's dedicated namespace
kubectl create namespace cattle-system
```
### Step 3: Install Rancher in one command (the key step)
We will install Rancher and have it **automatically use** the `cert-manager` that already exists in your cluster to request the HTTPS certificate for `rancher.u6.net3w.com`.
Just **copy the whole command** below (the parameters are already filled in for you):
* **Domain**: `rancher.u6.net3w.com`
* **Initial password**: `admin123456` (you will be forced to change it after logging in)
* **Replicas**: `1` (to save resources; production usually runs 3, but 1 is enough here)
* **Certificate**: generated automatically via Let's Encrypt
```bash
helm install rancher rancher-latest/rancher \
  --namespace cattle-system \
  --set hostname=rancher.u6.net3w.com \
  --set bootstrapPassword=admin123456 \
  --set replicas=1 \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=admin@net3w.com \
  --set letsEncrypt.ingress.class=traefik
```
---
### Step 4: Wait for the "beast" to wake up
Rancher is a big application; it takes a little while to start (usually 2-5 minutes).
You can watch its startup like a movie:
```bash
watch kubectl get pods -n cattle-system
```
* **Phase 1**: you will see `rancher-webhook` come up first.
* **Phase 2**: then the main `rancher-xxxx` Pods enter `ContainerCreating`.
* **Phase 3**: when `rancher-xxxx` shows **`Running`** and **`READY 1/1`**, startup has succeeded.
*(If READY stays at 0/1 for a long time, do not worry: it is initializing its database. As long as there is no CrashLoopBackOff, just wait.)*
---
### Step 5: Log in and take over everything
Once the Pods are running and the certificate has been issued (the Ingress is live):
1. Open **[https://rancher.u6.net3w.com](https://rancher.u6.net3w.com)** in your browser.
2. **Password**: enter the `admin123456` you set above.
3. **Take over**:
* After logging in, it prompts you to set a new, stronger password.
* Then you will see a cluster named **"local"**: **that is your K3s cluster!**
---
### 🎉 What can you do with the GUI now?
Once inside the "local" cluster, try the following to see why I call it a power tool:
1. **Browse workloads**: click **Workload** in the left-hand menu; you will see `my-python-app`, `registry`, and `wordpress` all in the list, with CPU/memory usage at a glance.
2. **View logs**: open `my-python-app` -> click a Pod name -> the three dots in the top-right corner -> **View Logs**. A very comfortable view with live scrolling.
3. **Open a terminal**: same path, click **Execute Shell** to get a root terminal right in the browser; no more hopping between SSH sessions.
4. **Scale by clicking**: find a Deployment and hit the `+` button to go from 1 Pod to 5 in an instant and feel the elasticity of Kubernetes.
**Go try it! If the install gets stuck on Pulling Image or Pending, send me the output of `kubectl get pods -n cattle-system`.**


@@ -0,0 +1,59 @@
---
apiVersion: v1
kind: Service
metadata:
name: localhost-15001
namespace: default
spec:
ports:
- protocol: TCP
port: 80
targetPort: 15001
---
apiVersion: v1
kind: Endpoints
metadata:
name: localhost-15001
namespace: default
subsets:
- addresses:
- ip: 134.195.210.237
ports:
- port: 15001
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: 1go-proxy
namespace: default
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
traefik.ingress.kubernetes.io/router.entrypoints: websecure
spec:
ingressClassName: traefik
tls:
- hosts:
- 1go.u6.net3w.com
- gl.u6.net3w.com
secretName: 1go-proxy-tls
rules:
- host: 1go.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: localhost-15001
port:
number: 80
- host: gl.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: localhost-15001
port:
number: 80


@@ -0,0 +1,84 @@
#!/bin/bash
#
# Node health-check script
# Usage: bash check-node-health.sh
#
# Color output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
echo -e "${BLUE}================================${NC}"
echo -e "${BLUE}K3s Cluster Health Check${NC}"
echo -e "${BLUE}================================${NC}"
echo ""
# 1. Node status
echo -e "${YELLOW}[1/8] Checking node status...${NC}"
kubectl get nodes -o wide
echo ""
# 2. Node resource usage
echo -e "${YELLOW}[2/8] Checking node resource usage...${NC}"
kubectl top nodes 2>/dev/null || echo -e "${YELLOW}⚠ metrics-server not ready${NC}"
echo ""
# 3. System pods
echo -e "${YELLOW}[3/8] Checking system components...${NC}"
kubectl get pods -n kube-system
echo ""
# 4. Longhorn
echo -e "${YELLOW}[4/8] Checking Longhorn storage...${NC}"
kubectl get pods -n longhorn-system | head -10
echo ""
# 5. PVCs
echo -e "${YELLOW}[5/8] Checking persistent volume claims...${NC}"
kubectl get pvc -A
echo ""
# 6. Application pods
echo -e "${YELLOW}[6/8] Checking application pods...${NC}"
kubectl get pods -A | grep -v "kube-system\|longhorn-system\|cert-manager" | head -20
echo ""
# 7. Ingress
echo -e "${YELLOW}[7/8] Checking Ingress resources...${NC}"
kubectl get ingress -A
echo ""
# 8. Certificates
echo -e "${YELLOW}[8/8] Checking SSL certificates...${NC}"
kubectl get certificate -A
echo ""
# Summary statistics
echo -e "${BLUE}================================${NC}"
echo -e "${BLUE}Cluster Statistics${NC}"
echo -e "${BLUE}================================${NC}"
TOTAL_NODES=$(kubectl get nodes --no-headers | wc -l)
READY_NODES=$(kubectl get nodes --no-headers | grep " Ready " | wc -l)
TOTAL_PODS=$(kubectl get pods -A --no-headers | wc -l)
RUNNING_PODS=$(kubectl get pods -A --no-headers | grep "Running" | wc -l)
TOTAL_PVC=$(kubectl get pvc -A --no-headers | wc -l)
BOUND_PVC=$(kubectl get pvc -A --no-headers | grep "Bound" | wc -l)
echo -e "Total nodes: ${GREEN}${TOTAL_NODES}${NC} (ready: ${GREEN}${READY_NODES}${NC})"
echo -e "Total pods: ${GREEN}${TOTAL_PODS}${NC} (running: ${GREEN}${RUNNING_PODS}${NC})"
echo -e "Total PVCs: ${GREEN}${TOTAL_PVC}${NC} (bound: ${GREEN}${BOUND_PVC}${NC})"
echo ""
# Health score
if [ $READY_NODES -eq $TOTAL_NODES ] && [ $RUNNING_PODS -gt $((TOTAL_PODS * 80 / 100)) ]; then
    echo -e "${GREEN}✓ Cluster health: good${NC}"
elif [ $READY_NODES -gt $((TOTAL_NODES / 2)) ]; then
    echo -e "${YELLOW}⚠ Cluster health: fair${NC}"
else
    echo -e "${RED}✗ Cluster health: abnormal${NC}"
fi
echo ""


@@ -0,0 +1,113 @@
#!/bin/bash
#
# Quick setup script generator
# Generates a customized join script for a new node
#
# Color output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}K3s Node Join Script Generator${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
# Current configuration
MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo -e "${YELLOW}Current master node info:${NC}"
echo "IP: $MASTER_IP"
echo "Token: ${NODE_TOKEN:0:20}..."
echo ""
# Choose the node type
echo "Select the type of node to join:"
echo "1) Worker node (recommended for the 2-node setup)"
echo "2) Master node (for the HA setup)"
echo ""
read -p "Enter an option (1 or 2): " NODE_TYPE
if [ "$NODE_TYPE" == "1" ]; then
SCRIPT_NAME="join-worker-custom.sh"
echo ""
    echo -e "${GREEN}Generating the worker join script...${NC}"
cat > $SCRIPT_NAME << 'EOFWORKER'
#!/bin/bash
set -e
# Configuration
MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo "Joining this machine as a worker node..."
# System preparation
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common
systemctl enable --now iscsid
# Install the k3s agent
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} sh -
echo "Worker node joined successfully!"
echo "Run on the master node: kubectl get nodes"
EOFWORKER
chmod +x $SCRIPT_NAME
elif [ "$NODE_TYPE" == "2" ]; then
SCRIPT_NAME="join-master-custom.sh"
echo ""
    read -p "Enter the load balancer IP: " LB_IP
    echo -e "${GREEN}Generating the master join script...${NC}"
cat > $SCRIPT_NAME << EOFMASTER
#!/bin/bash
set -e
# Configuration
FIRST_MASTER_IP="134.195.210.237"
LB_IP="$LB_IP"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo "开始加入 Master 节点 (HA 模式)..."
# 系统准备
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common
systemctl enable --now iscsid
# 安装 k3s server
curl -sfL https://get.k3s.io | sh -s - server \\
--server https://\${FIRST_MASTER_IP}:6443 \\
--token \${NODE_TOKEN} \\
--tls-san=\${LB_IP} \\
--write-kubeconfig-mode 644
echo "Master 节点加入完成!"
echo "在任意 Master 节点执行: kubectl get nodes"
EOFMASTER
chmod +x $SCRIPT_NAME
else
echo "无效的选项"
exit 1
fi
echo ""
echo -e "${GREEN}✓ 脚本已生成: $SCRIPT_NAME${NC}"
echo ""
echo "使用方法:"
echo "1. 将脚本复制到新节点"
echo "2. 在新节点上执行: sudo bash $SCRIPT_NAME"
echo ""

View File

@@ -0,0 +1,137 @@
#!/bin/bash
#
# K3s Master 节点快速加入脚本 (用于 HA 集群)
# 使用方法: sudo bash join-master.sh
#
set -e
# 颜色输出
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}K3s Master 节点加入脚本 (HA)${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
# 检查是否为 root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}错误: 请使用 sudo 运行此脚本${NC}"
exit 1
fi
# 配置信息
FIRST_MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo -e "${YELLOW}第一个 Master 节点 IP: ${FIRST_MASTER_IP}${NC}"
echo ""
# 获取负载均衡器 IP
read -p "请输入负载均衡器 IP 地址: " LB_IP
if [ -z "$LB_IP" ]; then
echo -e "${RED}错误: 负载均衡器 IP 不能为空${NC}"
exit 1
fi
echo -e "${YELLOW}负载均衡器 IP: ${LB_IP}${NC}"
echo ""
# 1. 检查网络连通性
echo -e "${YELLOW}[1/6] 检查网络连通性...${NC}"
if ping -c 2 ${FIRST_MASTER_IP} > /dev/null 2>&1; then
echo -e "${GREEN}✓ 可以连接到第一个 Master 节点${NC}"
else
echo -e "${RED}✗ 无法连接到第一个 Master 节点 ${FIRST_MASTER_IP}${NC}"
exit 1
fi
if ping -c 2 ${LB_IP} > /dev/null 2>&1; then
echo -e "${GREEN}✓ 可以连接到负载均衡器${NC}"
else
echo -e "${RED}✗ 无法连接到负载均衡器 ${LB_IP}${NC}"
exit 1
fi
# 2. 检查端口
echo -e "${YELLOW}[2/6] 检查端口...${NC}"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${FIRST_MASTER_IP}/6443" 2>/dev/null; then
echo -e "${GREEN}✓ Master 节点端口 6443 可访问${NC}"
else
echo -e "${RED}✗ Master 节点端口 6443 无法访问${NC}"
exit 1
fi
# 3. 系统准备
echo -e "${YELLOW}[3/6] 准备系统环境...${NC}"
# 禁用 swap
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
echo -e "${GREEN}✓ 已禁用 swap${NC}"
# 安装依赖
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common > /dev/null 2>&1
systemctl enable --now iscsid > /dev/null 2>&1
echo -e "${GREEN}✓ 已安装必要依赖${NC}"
# 4. 设置主机名
echo -e "${YELLOW}[4/6] 配置主机名...${NC}"
read -p "请输入此节点的主机名 (例如: master-2): " HOSTNAME
if [ -n "$HOSTNAME" ]; then
hostnamectl set-hostname $HOSTNAME
echo -e "${GREEN}✓ 主机名已设置为: $HOSTNAME${NC}"
else
echo -e "${YELLOW}⚠ 跳过主机名设置${NC}"
fi
# 5. 安装 k3s server
echo -e "${YELLOW}[5/6] 安装 k3s server (HA 模式)...${NC}"
echo -e "${YELLOW}这可能需要几分钟时间...${NC}"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${FIRST_MASTER_IP}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644 > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo -e "${GREEN}✓ k3s server 安装成功${NC}"
else
echo -e "${RED}✗ k3s server 安装失败${NC}"
exit 1
fi
# 6. 验证安装
echo -e "${YELLOW}[6/6] 验证安装...${NC}"
sleep 15
if systemctl is-active --quiet k3s; then
echo -e "${GREEN}✓ k3s 服务运行正常${NC}"
else
echo -e "${RED}✗ k3s 服务未运行${NC}"
echo -e "${YELLOW}查看日志: sudo journalctl -u k3s -f${NC}"
exit 1
fi
echo ""
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}✓ Master 节点加入成功!${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
echo -e "${YELLOW}下一步操作:${NC}"
echo -e "1. 在任意 Master 节点执行以下命令查看节点状态:"
echo -e " ${GREEN}kubectl get nodes${NC}"
echo ""
echo -e "2. 检查 etcd 集群状态:"
echo -e " ${GREEN}kubectl get pods -n kube-system | grep etcd${NC}"
echo ""
echo -e "3. 查看节点详细信息:"
echo -e " ${GREEN}kubectl describe node $HOSTNAME${NC}"
echo ""
echo -e "4. 更新负载均衡器配置,添加此节点的 IP"
echo ""

View File

@@ -0,0 +1,116 @@
#!/bin/bash
#
# K3s Worker 节点快速加入脚本
# 使用方法: sudo bash join-worker.sh
#
set -e
# 颜色输出
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}K3s Worker 节点加入脚本${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
# 检查是否为 root
if [ "$EUID" -ne 0 ]; then
echo -e "${RED}错误: 请使用 sudo 运行此脚本${NC}"
exit 1
fi
# 配置信息
MASTER_IP="134.195.210.237"
NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
echo -e "${YELLOW}Master 节点 IP: ${MASTER_IP}${NC}"
echo ""
# 1. 检查网络连通性
echo -e "${YELLOW}[1/6] 检查网络连通性...${NC}"
if ping -c 2 ${MASTER_IP} > /dev/null 2>&1; then
echo -e "${GREEN}✓ 网络连通正常${NC}"
else
echo -e "${RED}✗ 无法连接到 Master 节点 ${MASTER_IP}${NC}"
exit 1
fi
# 2. 检查端口
echo -e "${YELLOW}[2/6] 检查 Master 节点端口 6443...${NC}"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${MASTER_IP}/6443" 2>/dev/null; then
echo -e "${GREEN}✓ 端口 6443 可访问${NC}"
else
echo -e "${RED}✗ 端口 6443 无法访问,请检查防火墙${NC}"
exit 1
fi
# 3. 系统准备
echo -e "${YELLOW}[3/6] 准备系统环境...${NC}"
# 禁用 swap
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
echo -e "${GREEN}✓ 已禁用 swap${NC}"
# 安装依赖
apt-get update -qq
apt-get install -y curl open-iscsi nfs-common > /dev/null 2>&1
systemctl enable --now iscsid > /dev/null 2>&1
echo -e "${GREEN}✓ 已安装必要依赖${NC}"
# 4. 设置主机名
echo -e "${YELLOW}[4/6] 配置主机名...${NC}"
read -p "请输入此节点的主机名 (例如: worker-1): " HOSTNAME
if [ -n "$HOSTNAME" ]; then
hostnamectl set-hostname $HOSTNAME
echo -e "${GREEN}✓ 主机名已设置为: $HOSTNAME${NC}"
else
echo -e "${YELLOW}⚠ 跳过主机名设置${NC}"
fi
# 5. 安装 k3s agent
echo -e "${YELLOW}[5/6] 安装 k3s agent...${NC}"
echo -e "${YELLOW}这可能需要几分钟时间...${NC}"
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} \
sh - > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo -e "${GREEN}✓ k3s agent 安装成功${NC}"
else
echo -e "${RED}✗ k3s agent 安装失败${NC}"
exit 1
fi
# 6. 验证安装
echo -e "${YELLOW}[6/6] 验证安装...${NC}"
sleep 10
if systemctl is-active --quiet k3s-agent; then
echo -e "${GREEN}✓ k3s-agent 服务运行正常${NC}"
else
echo -e "${RED}✗ k3s-agent 服务未运行${NC}"
echo -e "${YELLOW}查看日志: sudo journalctl -u k3s-agent -f${NC}"
exit 1
fi
echo ""
echo -e "${GREEN}================================${NC}"
echo -e "${GREEN}✓ Worker 节点加入成功!${NC}"
echo -e "${GREEN}================================${NC}"
echo ""
echo -e "${YELLOW}下一步操作:${NC}"
echo -e "1. 在 Master 节点执行以下命令查看节点状态:"
echo -e " ${GREEN}kubectl get nodes${NC}"
echo ""
echo -e "2. 为节点添加标签 (在 Master 节点执行):"
echo -e " ${GREEN}kubectl label nodes $HOSTNAME node-role.kubernetes.io/worker=worker${NC}"
echo ""
echo -e "3. 查看节点详细信息:"
echo -e " ${GREEN}kubectl describe node $HOSTNAME${NC}"
echo ""

View File

@@ -0,0 +1,88 @@
#!/bin/bash
# 项目状态检查脚本
# 扫描仓库并显示项目状态、部署情况、文档完整性等
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║ K3s Monorepo - 项目状态 ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
# 检查已部署的应用
echo "📦 已部署应用:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
if command -v kubectl &> /dev/null; then
kubectl get deployments -A 2>/dev/null | grep -E "(php-test|go01|wordpress|registry|n8n|gitea)" | \
awk '{printf " ✅ %-25s %-15s %s/%s replicas\n", $2, $1, $4, $3}' || echo " ⚠️ 无法获取部署信息"
else
echo " ⚠️ kubectl 未安装,无法检查部署状态"
fi
echo ""
echo "📱 应用项目:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# 检查每个应用目录
for dir in php-test go01 rails/*/ www; do
if [ -d "$dir" ]; then
name=$(basename "$dir")
readme=""
dockerfile=""
k8s=""
[ -f "$dir/README.md" ] && readme="📄" || readme=" "
[ -f "$dir/Dockerfile" ] && dockerfile="🐳" || dockerfile=" "
[ -d "$dir/k8s" ] || [ -f "$dir/k8s-deployment.yaml" ] && k8s="☸️ " || k8s=" "
printf " %-30s %s %s %s\n" "$name" "$readme" "$dockerfile" "$k8s"
fi
done
echo ""
echo "🏗️ 基础设施服务:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
for dir in k3s/*/; do
if [ -d "$dir" ]; then
name=$(basename "$dir")
yaml_count=$(find "$dir" -name "*.yaml" 2>/dev/null | wc -l)
printf " %-30s %2d YAML 文件\n" "$name" "$yaml_count"
fi
done
echo ""
echo "🛠️ 平台工具:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
for dir in traefik kuboard proxy; do
if [ -d "$dir" ]; then
yaml_count=$(find "$dir" -name "*.yaml" 2>/dev/null | wc -l)
printf " %-30s %2d YAML 文件\n" "$dir" "$yaml_count"
fi
done
echo ""
echo "📊 统计信息:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
total_yaml=$(find . -name "*.yaml" -type f 2>/dev/null | wc -l)
total_md=$(find . -name "*.md" -type f 2>/dev/null | wc -l)
total_sh=$(find . -name "*.sh" -type f 2>/dev/null | wc -l)
total_dockerfile=$(find . -name "Dockerfile" -type f 2>/dev/null | wc -l)
echo " YAML 配置文件: $total_yaml"
echo " Markdown 文档: $total_md"
echo " Shell 脚本: $total_sh"
echo " Dockerfile: $total_dockerfile"
echo ""
echo "💡 提示:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " 📄 = 有 README 文档"
echo " 🐳 = 有 Dockerfile"
echo " ☸️ = 有 Kubernetes 配置"
echo ""
echo " 查看详细信息: cat PROJECT-INDEX.md"
echo " 查看目录结构: ./scripts/project-tree.sh"
echo " 查看集群状态: make status"
echo ""

View File

@@ -0,0 +1,59 @@
#!/bin/bash
# 目录树生成脚本
# 生成清晰的项目目录结构,过滤掉不必要的文件
echo "╔════════════════════════════════════════════════════════════════╗"
echo "║ K3s Monorepo - 目录结构 ║"
echo "╚════════════════════════════════════════════════════════════════╝"
echo ""
# 检查 tree 命令是否存在
if ! command -v tree &> /dev/null; then
echo "⚠️ tree 命令未安装"
echo ""
echo "安装方法:"
echo " Ubuntu/Debian: sudo apt-get install tree"
echo " CentOS/RHEL: sudo yum install tree"
echo " macOS: brew install tree"
echo ""
echo "使用 ls 命令替代显示:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
ls -lh --group-directories-first 2>/dev/null || ls -lh
exit 0
fi
echo "📁 根目录结构 (2层):"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
tree -L 2 -I 'node_modules|.git|__pycache__|*.pyc|.claude' \
--dirsfirst \
-F \
/home/fei/k3s
echo ""
echo "📱 应用项目详情:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "php-test/ (PHP 应用):"
tree -L 2 -I 'node_modules|.git' php-test 2>/dev/null || echo " 目录不存在"
echo ""
echo "go01/ (Go 应用):"
tree -L 2 -I 'node_modules|.git' go01 2>/dev/null || echo " 目录不存在"
echo ""
echo "🏗️ 基础设施配置:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "k3s/ (基础设施):"
tree -L 2 -I 'node_modules|.git' k3s 2>/dev/null || echo " 目录不存在"
echo ""
echo "💡 提示:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo " 查看完整结构: tree -I 'node_modules|.git'"
echo " 查看项目索引: cat PROJECT-INDEX.md"
echo " 查看项目状态: ./scripts/project-status.sh"
echo ""

228
005-docs/notes/biji.md Normal file
View File

@@ -0,0 +1,228 @@
k3s集群部署完成!
📊 部署总结
✅ 集群状态
- k3s版本: v1.34.3+k3s1
- 节点状态: Ready (control-plane)
- 运行中的Pods: 35个
- 存储系统: Longhorn (分布式存储)
- 证书管理: cert-manager + Let's Encrypt
🚀 已部署的服务
所有服务都已实现无状态化部署,数据和存储完全解耦:
┌─────────────────┬───────────────────────────────┬────────────┬─────────────┬─────────────────┐
│ 服务 │ 域名 │ 状态 │ 存储 │ HTTPS │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ WordPress博客 │ https://blog.u6.net3w.com │ ✅ Running │ 2Gi (MySQL) │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ Docker Registry │ https://registry.u6.net3w.com │ ✅ Running │ 20Gi │ ⏳ (证书申请中) │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ n8n工作流 │ https://n8n.u6.net3w.com │ ✅ Running │ 5Gi │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ Gitea代码仓库 │ https://git.u6.net3w.com │ ✅ Running │ 10Gi │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ Uptime Kuma监控 │ https://status.u6.net3w.com │ ✅ Running │ 2Gi │ ✅ │
├─────────────────┼───────────────────────────────┼────────────┼─────────────┼─────────────────┤
│ 导航页 │ https://nav.u6.net3w.com │ ✅ Running │ - │ ✅ │
└─────────────────┴───────────────────────────────┴────────────┴─────────────┴─────────────────┘
💾 存储配置
所有数据都使用Longhorn持久化存储,实现了容器与数据的完全解耦:
- MySQL数据库: 2Gi独立存储卷
- Docker镜像仓库: 20Gi独立存储卷
- Gitea代码仓库: 10Gi独立存储卷
- n8n工作流数据: 5Gi独立存储卷
- Uptime Kuma监控数据: 2Gi独立存储卷
🔐 安全配置
- ✅ 自动HTTPS证书(Let's Encrypt)
- ✅ Traefik Ingress Controller
- ✅ Docker Registry认证(用户名: admin, 密码: 123456)
📝 常用命令
# 查看所有服务
kubectl get pods -A
# 查看存储卷
kubectl get pvc -A
# 查看域名和证书
kubectl get ingress -A
kubectl get certificate -A
# 查看Longhorn UI (可选)
kubectl get svc -n longhorn-system
⚠️ 注意事项
1. Registry证书: registry.u6.net3w.com的SSL证书还在申请中,稍等几分钟会自动完成
2. DNS解析: 确保所有*.u6.net3w.com子域名都已正确解析到 134.195.210.237
3. 数据持久化: 所有数据都存储在Longhorn卷中,容器重启不会丢失数据
所有服务已成功实现无状态化部署,可以随时重启、扩容或迁移! 🎊
● ✅ 部署完成!
Registry 账号密码
- 用户名: admin
- 密码: 123456
Web 界面访问地址
🌐 https://registry-ui.u6.net3w.com
部署状态
- ✅ Registry UI Pod 运行中
- ✅ Service 已创建
- ✅ Ingress 已配置
- ✅ SSL 证书已自动签发(Let's Encrypt)
使用说明
1. 访问 Web 界面
- 打开浏览器访问: https://registry-ui.u6.net3w.com
- 现在可以看到漂亮的 Web 界面,不再是空白页面
2. 浏览镜像
- UI 会自动连接到您的 Registry
- 可以查看所有已推送的镜像
- 支持搜索和过滤功能
3. Docker 命令使用
# 登录 Registry
docker login registry.u6.net3w.com
# 用户名: admin
# 密码: 123456
# 推送镜像
docker tag my-image:latest registry.u6.net3w.com/my-image:latest
docker push registry.u6.net3w.com/my-image:latest
# 拉取镜像
docker pull registry.u6.net3w.com/my-image:latest
配置文件位置
- Registry 配置: /home/fei/k3s/k3s/registry/registry-stack.yaml
- UI 配置: /home/fei/k3s/k3s/registry/registry-ui.yaml
- 密码文件: /home/fei/k3s/k3s/registry/hardcode-secret.yaml
现在您可以通过 Web 界面方便地管理 Docker 镜像了!
认证已经成功!
我刚才测试了 docker login登录成功了
docker login registry.u6.net3w.com --username admin --password 123456
# 输出: Login Succeeded ✅
可能的问题原因
您遇到 401 错误可能是因为:
1. 密码输入错误 - 在交互式输入时可能有隐藏字符或空格
2. Docker 缓存 - 可能缓存了旧的错误认证信息
解决方案
请尝试以下方法:
方法 1:清除旧的认证信息后重试
# 先登出
sudo docker logout registry.u6.net3w.com
# 再登录
sudo docker login registry.u6.net3w.com
# Username: admin
# Password: 123456
方法 2:使用命令行参数(推荐)
echo "123456" | sudo docker login registry.u6.net3w.com --username admin --password-stdin
方法 3:直接使用密码参数
sudo docker login registry.u6.net3w.com --username admin --password 123456
验证登录成功后
登录成功后,您可以:
# 测试推送镜像
sudo docker tag nginx:latest registry.u6.net3w.com/nginx:test
sudo docker push registry.u6.net3w.com/nginx:test
# 查看 Web UI
# 访问: https://registry-ui.u6.net3w.com
请尝试上述方法之一,应该就能成功登录了!如果还有问题,请告诉我具体的错误信息。
● Background command "Monitor registry logs in real-time" completed (exit code 0)
● 我看到后台日志监控任务已完成。让我读取完整的日志输出。
● 完美!从日志中我可以看到 Docker 登录已经成功了!
日志分析
查看日志的第 35-39 行,显示了完整的 Docker 登录流程:
1. 第一次请求(401):
"GET /v2/" HTTP/1.1" 401 87 "" "Docker-Client/28.2.2 (linux)"
Docker 客户端先发送不带认证的请求,收到 401 要求认证
2. 第二次请求(200 成功):
level=info msg="authorized request" ... "Docker-Client/28.2.2 (linux)"
"GET /v2/" HTTP/1.1" 200 2 "" "Docker-Client/28.2.2 (linux)"
Docker 客户端发送带认证的请求,认证成功!
总结
✅ 问题已解决
- Registry 认证配置正确
- 账号密码admin / 123456
- Docker 登录测试成功
您现在可以使用的服务
1. Docker Registry API
sudo docker login registry.u6.net3w.com
# Username: admin
# Password: 123456
2. Web UI 界面
- 访问:https://registry-ui.u6.net3w.com
- 可视化管理您的镜像
使用示例
# 推送镜像
sudo docker tag my-app:latest registry.u6.net3w.com/my-app:latest
sudo docker push registry.u6.net3w.com/my-app:latest
# 拉取镜像
sudo docker pull registry.u6.net3w.com/my-app:latest
如果您之前遇到的 401 错误还在,请尝试我之前建议的方法(先 logout 再 login),或使用 --password-stdin。Registry 本身已经完全正常工作了!

94
005-docs/notes/gitops.md Normal file
View File

@@ -0,0 +1,94 @@
完美!GitOps 自动部署环境已经完全配置好了!
🎉 GitOps 实现条件评估结果
✅ 所有条件已满足!
┌────────────────┬──────┬──────────────────────────────┐
│ 条件 │ 状态 │ 说明 │
├────────────────┼──────┼──────────────────────────────┤
│ k3s 集群 │ ✅ │ v1.34.3+k3s1 运行中 │
├────────────────┼──────┼──────────────────────────────┤
│ Git 仓库 │ ✅ │ 本地仓库已初始化,配置已提交 │
├────────────────┼──────┼──────────────────────────────┤
│ Gitea 代码托管 │ ✅ │ https://git.u6.net3w.com │
├────────────────┼──────┼──────────────────────────────┤
│ ArgoCD │ ✅ │ https://argocd.u6.net3w.com │
├────────────────┼──────┼──────────────────────────────┤
│ 自动同步配置 │ ✅ │ argocd-app.yaml 已创建 │
├────────────────┼──────┼──────────────────────────────┤
│ HTTPS 证书 │ ✅ │ Let's Encrypt 自动签发 │
└────────────────┴──────┴──────────────────────────────┘
🚀 下一步操作指南
1⃣ 访问 Gitea 创建仓库
访问: https://git.u6.net3w.com
1. 注册账号(首次访问)
2. 创建新仓库: k3s-configs
3. 记录仓库 URL
2⃣ 推送代码到 Gitea
# 添加远程仓库(替换为你的用户名)
git remote add origin https://git.u6.net3w.com/你的用户名/k3s-configs.git
# 推送代码
git push -u origin master
3⃣ 配置 ArgoCD
访问: https://argocd.u6.net3w.com
用户名: admin
密码: R5ALO8NIRi4KoN8Q
步骤:
1. Settings → Repositories → Connect Repo
2. 填入 Gitea 仓库信息
3. 应用 ArgoCD Application:
kubectl apply -f k3s/argocd-app.yaml
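如果更习惯命令行,下面给出一个连接仓库并创建 Application 的示例草稿(假设仓库为 https://git.u6.net3w.com/fei/k3s-configs.git、清单路径为 k3s/;argocd CLI 的登录参数和 argocd-app.yaml 的实际内容以你的环境为准):
# 用 argocd CLI 注册 Gitea 仓库(密码/Token 为占位符)
argocd login argocd.u6.net3w.com --username admin --grpc-web
argocd repo add https://git.u6.net3w.com/fei/k3s-configs.git --username fei --password '<你的Gitea密码或Token>'
# 与 argocd-app.yaml 等价的最小 Application 草稿(自动同步 + 自动修剪)
kubectl apply -f - <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: k3s-configs
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.u6.net3w.com/fei/k3s-configs.git
    targetRevision: HEAD
    path: k3s
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF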
4⃣ 测试自动部署
# 修改任意配置文件
vim k3s/my-blog/02-wordpress.yaml
# 提交并推送
git add .
git commit -m "Test auto-sync"
git push
# ArgoCD 会在 3 分钟内自动同步到集群!
📊 当前部署状态
✅ k3s 集群: 运行中 (35+ pods)
✅ ArgoCD: https://argocd.u6.net3w.com
✅ Gitea: https://git.u6.net3w.com
✅ WordPress: https://blog.u6.net3w.com
✅ Docker Registry: https://registry.u6.net3w.com
✅ n8n: https://n8n.u6.net3w.com
✅ Uptime Kuma: https://status.u6.net3w.com
✅ 导航页: https://nav.u6.net3w.com
🎯 GitOps 工作流
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  修改 YAML  │ ───> │  Git Push   │ ───> │    Gitea    │
└─────────────┘      └─────────────┘      └─────────────┘
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  k3s 集群   │ <─── │   ArgoCD    │ <─── │  自动检测   │
│  更新完成   │      │  自动同步   │      │    变更     │
└─────────────┘      └─────────────┘      └─────────────┘
所有实现条件已满足! 你现在可以实现:
- ✅ 更新 YAML 到 Git
- ✅ 自动同步到生产环境
- ✅ 版本控制和快速回滚
- ✅ 完整的审计追踪
详细使用说明请查看 README.md 文件!🎊

View File

@@ -0,0 +1,844 @@
# K3s 集群扩展指南
## 📋 目录
- [当前集群状态](#当前集群状态)
- [前置条件](#前置条件)
- [架构设计方案](#架构设计方案)
- [2节点集群(1 Master + 2 Worker)](#2节点集群1-master--2-worker)
- [4节点集群(3 Master + 4 Worker)](#4节点集群3-master--4-worker)
- [6节点集群(3 Master + 6 Worker)](#6节点集群3-master--6-worker)
- [节点加入步骤](#节点加入步骤)
- [高可用配置](#高可用配置)
- [存储配置](#存储配置)
- [验证和测试](#验证和测试)
- [故障排查](#故障排查)
---
## 📊 当前集群状态
```
Master 节点: vmus9
IP 地址: 134.195.210.237
k3s 版本: v1.34.3+k3s1
节点令牌: K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d
```
**重要**: 请妥善保管节点令牌,这是其他节点加入集群的凭证!
---
## ✅ 前置条件
### 所有新节点需要满足:
#### 1. 硬件要求
```
最低配置:
- CPU: 2 核
- 内存: 2GB (建议 4GB+)
- 磁盘: 20GB (Longhorn 存储建议 50GB+)
推荐配置:
- CPU: 4 核
- 内存: 8GB
- 磁盘: 100GB SSD
```
#### 2. 操作系统
```bash
# 支持的系统
- Ubuntu 20.04/22.04/24.04
- Debian 10/11/12
- CentOS 7/8
- RHEL 7/8
# 检查系统版本
cat /etc/os-release
```
#### 3. 网络要求
```bash
# 所有节点之间需要能够互相访问
# 需要开放的端口:
Master 节点:
- 6443: Kubernetes API Server
- 10250: Kubelet metrics
- 2379-2380: etcd (仅 HA 模式)
Worker 节点:
- 10250: Kubelet metrics
- 30000-32767: NodePort Services
所有节点:
- 8472: Flannel VXLAN (UDP)
- 51820: Flannel WireGuard (UDP)
```
#### 4. 系统准备
在每个新节点上执行:
```bash
# 1. 更新系统
sudo apt update && sudo apt upgrade -y
# 2. 禁用 swap (k8s 要求)
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# 3. 配置主机名 (每个节点不同)
sudo hostnamectl set-hostname worker-node-1
# 4. 配置时间同步
sudo apt install -y chrony
sudo systemctl enable --now chrony
# 5. 安装必要工具
sudo apt install -y curl wget git
# 6. 配置防火墙 (如果启用)
# Ubuntu/Debian
sudo ufw allow 6443/tcp
sudo ufw allow 10250/tcp
sudo ufw allow 8472/udp
sudo ufw allow 51820/udp
```
---
## 🏗️ 架构设计方案
### 方案一:2节点集群(1 Master + 2 Worker)
**适用场景**: 开发/测试环境,小型应用
```
┌─────────────────────────────────────────────────┐
│ 负载均衡 (可选) │
│ *.u6.net3w.com (Traefik) │
└─────────────────────────────────────────────────┘
┌─────────────┼─────────────┐
│ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐
│ Master │ │ Worker-1 │ │ Worker-2 │
│ vmus9 │ │ │ │ │
│ Control Plane│ │ 应用负载 │ │ 应用负载 │
│ + etcd │ │ │ │ │
│ 134.195.x.x │ │ 新节点1 │ │ 新节点2 │
└──────────────┘ └──────────┘ └──────────┘
```
**特点**:
- ✅ 简单易维护
- ✅ 成本低
- ❌ Master 单点故障
- ❌ 不适合生产环境
**资源分配建议**:
- Master: 4C8G (运行控制平面 + 部分应用)
- Worker-1: 4C8G (运行应用负载)
- Worker-2: 4C8G (运行应用负载)
---
### 方案二:4节点集群(3 Master + 4 Worker)
**适用场景**: 生产环境,中等规模应用
```
┌──────────────────────────────────────────────────┐
│ 外部负载均衡 (必需) │
│ HAProxy/Nginx/云厂商 LB │
│ *.u6.net3w.com │
└──────────────────────────────────────────────────┘
┌─────────────┼─────────────┬─────────────┐
│ │ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐ ┌─────▼────┐
│ Master-1 │ │ Master-2 │ │ Master-3 │ │ Worker-1 │
│ vmus9 │ │ │ │ │ │ │
│ Control Plane│ │ Control │ │ Control │ │ 应用负载 │
│ + etcd │ │ + etcd │ │ + etcd │ │ │
└──────────────┘ └──────────┘ └──────────┘ └──────────┘
┌──────────┐
│ Worker-2 │
│ 应用负载 │
└──────────┘
┌──────────┐
│ Worker-3 │
│ 应用负载 │
└──────────┘
┌──────────┐
│ Worker-4 │
│ 应用负载 │
└──────────┘
```
**特点**:
- ✅ 高可用 (HA)
- ✅ Master 节点冗余
- ✅ 适合生产环境
- ✅ 可承载中等规模应用
- ⚠️ 需要外部负载均衡
**资源分配建议**:
- Master-1/2/3: 4C8G (仅运行控制平面)
- Worker-1/2/3/4: 8C16G (运行应用负载)
**etcd 集群**: 3 个 Master 节点组成 etcd 集群,可容忍 1 个节点故障
---
### 方案三:6节点集群(3 Master + 6 Worker)
**适用场景**: 大规模生产环境,高负载应用
```
┌──────────────────────────────────────────────────┐
│ 外部负载均衡 (必需) │
│ HAProxy/Nginx/云厂商 LB │
│ *.u6.net3w.com │
└──────────────────────────────────────────────────┘
┌─────────────┼─────────────┬─────────────┐
│ │ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐ │
│ Master-1 │ │ Master-2 │ │ Master-3 │ │
│ vmus9 │ │ │ │ │ │
│ Control Plane│ │ Control │ │ Control │ │
│ + etcd │ │ + etcd │ │ + etcd │ │
└──────────────┘ └──────────┘ └──────────┘ │
┌─────────────┬─────────────┬─────────────┘
│ │ │
┌───────▼──────┐ ┌────▼─────┐ ┌────▼─────┐
│ Worker-1 │ │ Worker-2 │ │ Worker-3 │
│ Web 应用层 │ │ Web 层 │ │ Web 层 │
└──────────────┘ └──────────┘ └──────────┘
┌──────────────┐ ┌──────────┐ ┌──────────┐
│ Worker-4 │ │ Worker-5 │ │ Worker-6 │
│ 数据库层 │ │ 缓存层 │ │ 存储层 │
└──────────────┘ └──────────┘ └──────────┘
```
**特点**:
- ✅ 高可用 + 高性能
- ✅ 可按功能分层部署
- ✅ 支持大规模应用
- ✅ Longhorn 存储性能最佳
- ⚠️ 管理复杂度较高
- ⚠️ 成本较高
**资源分配建议**:
- Master-1/2/3: 4C8G (专用控制平面)
- Worker-1/2/3: 8C16G (Web 应用层)
- Worker-4: 8C32G (数据库层,高内存)
- Worker-5: 8C16G (缓存层)
- Worker-6: 4C8G + 200GB SSD (存储层)
**节点标签策略**:
```bash
# Web 层
kubectl label nodes worker-1 node-role=web
kubectl label nodes worker-2 node-role=web
kubectl label nodes worker-3 node-role=web
# 数据库层
kubectl label nodes worker-4 node-role=database
# 缓存层
kubectl label nodes worker-5 node-role=cache
# 存储层
kubectl label nodes worker-6 node-role=storage
```
---
## 🚀 节点加入步骤
### 场景 A: 加入 Worker 节点(适用于 2 节点方案)
#### 在新节点上执行:
```bash
# 1. 设置 Master 节点信息
export MASTER_IP="134.195.210.237"
export NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
# 2. 安装 k3s agent (Worker 节点)
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} \
sh -
# 3. 验证安装
sudo systemctl status k3s-agent
# 4. 检查节点是否加入
# (在 Master 节点执行)
kubectl get nodes
```
#### 为 Worker 节点添加标签:
```bash
# 在 Master 节点执行
kubectl label nodes <worker-node-name> node-role.kubernetes.io/worker=worker
kubectl label nodes <worker-node-name> workload=application
```
---
### 场景 B: 加入 Master 节点(适用于 4/6 节点 HA 方案)
#### 前提条件:需要外部负载均衡器
##### 1. 配置外部负载均衡器
**选项 1: 使用 HAProxy**
在一台独立服务器上安装 HAProxy
```bash
# 安装 HAProxy
sudo apt install -y haproxy
# 配置 HAProxy
sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<EOF
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode tcp
option tcplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
frontend k3s-api
bind *:6443
mode tcp
default_backend k3s-masters
backend k3s-masters
mode tcp
balance roundrobin
option tcp-check
server master-1 134.195.210.237:6443 check fall 3 rise 2
server master-2 <MASTER-2-IP>:6443 check fall 3 rise 2
server master-3 <MASTER-3-IP>:6443 check fall 3 rise 2
EOF
# 重启 HAProxy
sudo systemctl restart haproxy
sudo systemctl enable haproxy
```
**选项 2: 使用 Nginx**
```bash
# 安装 Nginx
sudo apt install -y nginx
# 配置 Nginx Stream
sudo tee /etc/nginx/nginx.conf > /dev/null <<EOF
stream {
upstream k3s_servers {
server 134.195.210.237:6443 max_fails=3 fail_timeout=5s;
server <MASTER-2-IP>:6443 max_fails=3 fail_timeout=5s;
server <MASTER-3-IP>:6443 max_fails=3 fail_timeout=5s;
}
server {
listen 6443;
proxy_pass k3s_servers;
}
}
EOF
# 重启 Nginx
sudo systemctl restart nginx
```
##### 2. 在第一个 Master 节点(当前节点)启用 HA
```bash
# 在当前 Master 节点执行
export LB_IP="<负载均衡器IP>"
# 重新安装 k3s 为 HA 模式
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
# 获取新的 token
sudo cat /var/lib/rancher/k3s/server/node-token
```
##### 3. 加入第二个 Master 节点
```bash
# 在新的 Master 节点执行
export MASTER_IP="134.195.210.237" # 第一个 Master
export LB_IP="<负载均衡器IP>"
export NODE_TOKEN="<新的 token>"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${MASTER_IP}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
```
##### 4. 加入第三个 Master 节点
```bash
# 在第三个 Master 节点执行(同上)
export MASTER_IP="134.195.210.237"
export LB_IP="<负载均衡器IP>"
export NODE_TOKEN="<token>"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${MASTER_IP}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
```
##### 5. 验证 HA 集群
```bash
# 检查所有 Master 节点
kubectl get nodes
# 检查 etcd 集群状态
kubectl get pods -n kube-system | grep etcd
# 创建一次 etcd 快照,验证 etcd 可用
sudo k3s etcd-snapshot save --etcd-s3=false
```
---
### 场景 C: 混合加入(先加 Master 再加 Worker)
**推荐顺序**:
1. 配置外部负载均衡器
2. 转换第一个节点为 HA 模式
3. 加入第 2、3 个 Master 节点
4. 验证 Master 集群正常
5. 依次加入 Worker 节点
---
## 💾 存储配置
### Longhorn 多节点配置
当集群有 3+ 节点时Longhorn 可以提供分布式存储和数据冗余。
#### 1. 在所有节点安装依赖
```bash
# 在每个节点执行
sudo apt install -y open-iscsi nfs-common
# 启动 iscsid
sudo systemctl enable --now iscsid
```
#### 2. 配置 Longhorn 副本数
```bash
# 在 Master 节点执行
kubectl edit settings.longhorn.io default-replica-count -n longhorn-system
# 修改为:
# value: "3" # 3 副本(需要至少 3 个节点)
# value: "2" # 2 副本(需要至少 2 个节点)
```
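如果不想交互式编辑,也可以用 `kubectl patch` 一次性修改(示例草稿;Longhorn Setting 资源的取值字段为 `value`,请以集群中实际的 CRD 为准):

```bash
# 将默认副本数改为 2(示例,字段名 value 以实际 Setting CRD 为准)
kubectl -n longhorn-system patch settings.longhorn.io default-replica-count \
  --type=merge -p '{"value":"2"}'
```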
#### 3. 为节点添加存储标签
```bash
# 标记哪些节点用于存储
kubectl label nodes worker-1 node.longhorn.io/create-default-disk=true
kubectl label nodes worker-2 node.longhorn.io/create-default-disk=true
kubectl label nodes worker-3 node.longhorn.io/create-default-disk=true
# 排除某些节点(如纯计算节点)
kubectl label nodes worker-4 node.longhorn.io/create-default-disk=false
```
#### 4. 配置存储路径
```bash
# 在每个存储节点创建目录
sudo mkdir -p /var/lib/longhorn
sudo chmod 700 /var/lib/longhorn
```
#### 5. 访问 Longhorn UI
```bash
# 创建 Ingress (如果还没有)
kubectl apply -f k3s/my-blog/longhorn-ingress.yaml
# 访问: https://longhorn.u6.net3w.com
```
---
## ✅ 验证和测试
### 1. 检查节点状态
```bash
# 查看所有节点
kubectl get nodes -o wide
# 查看节点详细信息
kubectl describe node <node-name>
# 查看节点资源使用
kubectl top nodes
```
### 2. 测试 Pod 调度
```bash
# 创建测试 Deployment
kubectl create deployment nginx-test --image=nginx --replicas=6
# 查看 Pod 分布
kubectl get pods -o wide
# 清理测试
kubectl delete deployment nginx-test
```
### 3. 测试存储
```bash
# 创建测试 PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 1Gi
EOF
# 检查 PVC 状态
kubectl get pvc test-pvc
# 清理
kubectl delete pvc test-pvc
```
### 4. 测试高可用(仅 HA 集群)
```bash
# 模拟 Master 节点故障
# 在一个 Master 节点执行
sudo systemctl stop k3s
# 在另一个节点检查集群是否正常
kubectl get nodes
# 恢复节点
sudo systemctl start k3s
```
### 5. 测试网络连通性
```bash
# 在 Master 节点创建测试 Pod
kubectl run test-pod --image=busybox --restart=Never -- sleep 3600
# 进入 Pod 测试网络
kubectl exec -it test-pod -- sh
# 在 Pod 内测试
ping 8.8.8.8
nslookup kubernetes.default
# 清理
kubectl delete pod test-pod
```
---
## 🔧 故障排查
### 问题 1: 节点无法加入集群
**症状**: `k3s-agent` 服务启动失败
**排查步骤**:
```bash
# 1. 检查服务状态
sudo systemctl status k3s-agent
# 2. 查看日志
sudo journalctl -u k3s-agent -f
# 3. 检查网络连通性
ping <MASTER_IP>
telnet <MASTER_IP> 6443
# 4. 检查 token 是否正确
echo $NODE_TOKEN
# 5. 检查防火墙
sudo ufw status
```
**解决方案**:
```bash
# 重新安装
sudo /usr/local/bin/k3s-agent-uninstall.sh
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} sh -
```
---
### 问题 2: 节点状态为 NotReady
**症状**: `kubectl get nodes` 显示节点 NotReady
**排查步骤**:
```bash
# 1. 检查节点详情
kubectl describe node <node-name>
# 2. 检查 kubelet 日志
# 在问题节点执行
sudo journalctl -u k3s-agent -n 100
# 3. 检查网络插件
kubectl get pods -n kube-system | grep flannel
```
**解决方案**:
```bash
# 重启 k3s 服务
sudo systemctl restart k3s-agent
# 如果是网络问题,检查 CNI 配置
sudo ls -la /etc/cni/net.d/
```
---
### 问题 3: Pod 无法调度到新节点
**症状**: Pod 一直 Pending 或只调度到旧节点
**排查步骤**:
```bash
# 1. 检查节点污点
kubectl describe node <node-name> | grep Taints
# 2. 检查节点标签
kubectl get nodes --show-labels
# 3. 检查 Pod 的调度约束
kubectl describe pod <pod-name>
```
**解决方案**:
```bash
# 移除污点
kubectl taint nodes <node-name> node.kubernetes.io/not-ready:NoSchedule-
# 添加标签
kubectl label nodes <node-name> node-role.kubernetes.io/worker=worker
```
---
### 问题 4: Longhorn 存储无法使用
**症状**: PVC 一直 Pending
**排查步骤**:
```bash
# 1. 检查 Longhorn 组件
kubectl get pods -n longhorn-system
# 2. 检查节点是否满足要求
kubectl get nodes -o jsonpath='{.items[*].status.conditions[?(@.type=="Ready")].status}'
# 3. 检查 iscsid 服务
sudo systemctl status iscsid
```
**解决方案**:
```bash
# 在新节点安装依赖
sudo apt install -y open-iscsi
sudo systemctl enable --now iscsid
# 重启 Longhorn manager
kubectl rollout restart deployment longhorn-driver-deployer -n longhorn-system
```
---
### 问题 5: etcd 集群不健康HA 模式)
**症状**: Master 节点无法正常工作
**排查步骤**:
```bash
# 1. 查看 etcd 快照列表,确认 etcd 数据目录可访问
sudo k3s etcd-snapshot ls
# 2. 检查 etcd 日志
sudo journalctl -u k3s -n 100 | grep etcd
# 3. 检查 etcd 端口
sudo netstat -tlnp | grep 2379
```
**解决方案**:
```bash
# 从快照恢复(谨慎操作)
sudo k3s server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
```
---
## 📚 快速参考
### 常用命令
```bash
# 查看集群信息
kubectl cluster-info
kubectl get nodes -o wide
kubectl get pods -A
# 查看节点资源
kubectl top nodes
kubectl describe node <node-name>
# 管理节点
kubectl cordon <node-name> # 标记为不可调度
kubectl drain <node-name> # 驱逐 Pod
kubectl uncordon <node-name> # 恢复调度
# 删除节点
kubectl delete node <node-name>
# 在节点上卸载 k3s
# Worker 节点
sudo /usr/local/bin/k3s-agent-uninstall.sh
# Master 节点
sudo /usr/local/bin/k3s-uninstall.sh
```
### 节点标签示例
```bash
# 角色标签
kubectl label nodes <node> node-role.kubernetes.io/worker=worker
kubectl label nodes <node> node-role.kubernetes.io/master=master
# 功能标签
kubectl label nodes <node> workload=database
kubectl label nodes <node> workload=web
kubectl label nodes <node> workload=cache
# 区域标签
kubectl label nodes <node> topology.kubernetes.io/zone=zone-a
kubectl label nodes <node> topology.kubernetes.io/region=us-east
```
---
## 🎯 最佳实践
### 1. 节点命名规范
```
master-1, master-2, master-3
worker-1, worker-2, worker-3, ...
```
### 2. 逐步扩展
- 先加入 1 个节点测试
- 验证正常后再批量加入
- 避免同时加入多个节点
### 3. 监控和告警
```bash
# 部署 Prometheus + Grafana
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/setup/
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/kube-prometheus/main/manifests/
```
### 4. 定期备份
```bash
# 备份 etcd
sudo k3s etcd-snapshot save --name backup-$(date +%Y%m%d-%H%M%S)
# 查看备份
sudo k3s etcd-snapshot ls
```
### 5. 资源预留
```bash
# 为系统组件预留资源
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
name: system-quota
namespace: kube-system
spec:
hard:
requests.cpu: "2"
requests.memory: 4Gi
EOF
```
---
## 📞 获取帮助
- k3s 官方文档: https://docs.k3s.io
- Longhorn 文档: https://longhorn.io/docs
- Kubernetes 文档: https://kubernetes.io/docs
---
**文档版本**: v1.0
**最后更新**: 2026-01-21
**适用于**: k3s v1.34.3+k3s1

View File

@@ -0,0 +1,161 @@
# K3s 集群扩展快速参考
## 🚀 快速开始
### 当前集群信息
```
Master IP: 134.195.210.237
Token: K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d
```
### 一键加入脚本
#### Worker 节点(最简单)
```bash
# 在新节点上执行
sudo bash scripts/join-worker.sh
```
#### Master 节点(HA 模式)
```bash
# 在新节点上执行
sudo bash scripts/join-master.sh
```
---
## 📊 扩展方案对比
| 方案 | 节点配置 | 适用场景 | 高可用 | 成本 |
|------|---------|---------|--------|------|
| **2节点** | 1M + 2W | 开发/测试 | ❌ | 💰 |
| **4节点** | 3M + 4W | 生产环境 | ✅ | 💰💰💰 |
| **6节点** | 3M + 6W | 大规模生产 | ✅ | 💰💰💰💰 |
M = Master, W = Worker
---
## 🔧 手动加入命令
### Worker 节点
```bash
export MASTER_IP="134.195.210.237"
export NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
curl -sfL https://get.k3s.io | K3S_URL=https://${MASTER_IP}:6443 \
K3S_TOKEN=${NODE_TOKEN} sh -
```
### Master 节点(需要先配置负载均衡器)
```bash
export FIRST_MASTER="134.195.210.237"
export LB_IP="<负载均衡器IP>"
export NODE_TOKEN="K109d35a131f48b4d40b162398a828b766d60735f29dd7b4a37b030c1d1c0e26b23::server:72e04c3a9e3e762cbdefffc96f348a2d"
curl -sfL https://get.k3s.io | sh -s - server \
--server https://${FIRST_MASTER}:6443 \
--token ${NODE_TOKEN} \
--tls-san=${LB_IP} \
--write-kubeconfig-mode 644
```
---
## ✅ 验证命令
```bash
# 查看节点
kubectl get nodes -o wide
# 健康检查
bash scripts/check-node-health.sh
# 查看节点详情
kubectl describe node <node-name>
# 查看资源使用
kubectl top nodes
```
---
## 🏷️ 节点标签
```bash
# Worker 节点
kubectl label nodes <node> node-role.kubernetes.io/worker=worker
# 功能标签
kubectl label nodes <node> workload=web
kubectl label nodes <node> workload=database
kubectl label nodes <node> workload=cache
```
---
## 🔥 常见问题
### 节点无法加入?
```bash
# 检查网络
ping 134.195.210.237
telnet 134.195.210.237 6443
# 查看日志
sudo journalctl -u k3s-agent -f
```
### 节点 NotReady
```bash
# 重启服务
sudo systemctl restart k3s-agent
# 检查详情
kubectl describe node <node-name>
```
### 删除节点
```bash
# 在 Master 节点
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>
# 在要删除的节点
sudo /usr/local/bin/k3s-agent-uninstall.sh
```
---
## 📚 详细文档
- 完整扩展指南: [K3S-CLUSTER-EXPANSION.md](K3S-CLUSTER-EXPANSION.md)
- GitOps 自动部署: [README.md](README.md)
---
## 🎯 推荐流程
### 2节点扩展(开发环境)
1. ✅ 准备 2 台新 VPS
2. ✅ 执行 `join-worker.sh`
3. ✅ 验证节点状态
4. ✅ 测试应用部署
### 4节点扩展(生产环境)
1. ✅ 配置外部负载均衡器
2. ✅ 转换第一个节点为 HA 模式
3. ✅ 加入 2 个 Master 节点
4. ✅ 加入 4 个 Worker 节点
5. ✅ 配置 Longhorn 多副本
6. ✅ 测试高可用
### 6节点扩展(大规模)
1. ✅ 同 4 节点方案
2. ✅ 按功能分层部署
3. ✅ 配置节点亲和性
4. ✅ 优化资源分配
---
**快速帮助**: 遇到问题?运行 `bash scripts/check-node-health.sh` 进行诊断

39
006-其他/code-server.md Normal file
View File

@@ -0,0 +1,39 @@
我已经准备好了服务文件。现在请执行以下命令来完成设置:
# 1. 将服务文件复制到系统目录(需要输入密码)
sudo cp /tmp/code-server.service /etc/systemd/system/
# 2. 重载 systemd 配置
sudo systemctl daemon-reload
# 3. 启用开机自启动
sudo systemctl enable code-server
# 4. 立即启动服务
sudo systemctl start code-server
# 5. 查看服务状态
sudo systemctl status code-server
或者你可以一次性执行:
sudo cp /tmp/code-server.service /etc/systemd/system/ && \
sudo systemctl daemon-reload && \
sudo systemctl enable --now code-server && \
sudo systemctl status code-server
服务配置说明:
- 监听地址:0.0.0.0:8080(所有网络接口)
- 工作目录:/home/fei
- 运行用户:fei
- 自动重启:服务崩溃后 10 秒自动重启
常用管理命令:
- 查看状态:sudo systemctl status code-server
- 停止服务:sudo systemctl stop code-server
- 重启服务:sudo systemctl restart code-server
- 查看日志:sudo journalctl -u code-server -f
- 禁用开机启动:sudo systemctl disable code-server
如果你需要修改配置(如端口号或其他参数),可以编辑 /etc/systemd/system/code-server.service 文件,然后执行 sudo systemctl daemon-reload 和 sudo systemctl restart code-server
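如果还没有准备 /tmp/code-server.service,下面是一个与上述说明一致的示例草稿(假设 code-server 安装在 /usr/bin/code-server,路径和参数请按实际情况调整):
# 生成示例 systemd 单元文件(草稿,ExecStart 路径为假设值)
sudo tee /tmp/code-server.service > /dev/null <<'EOF'
[Unit]
Description=code-server
After=network.target

[Service]
Type=simple
User=fei
WorkingDirectory=/home/fei
ExecStart=/usr/bin/code-server --bind-addr 0.0.0.0:8080
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF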

View File

@@ -0,0 +1,429 @@
# PostgreSQL 16 K3s 部署指南
本目录包含在 K3s 集群中部署 PostgreSQL 16 数据库的完整配置文件。
## 📋 目录结构
```
001-pg16/
├── README.md # 本文件 - 部署说明
└── k8s/ # K8s 配置文件目录
├── namespace.yaml # infrastructure 命名空间
├── secret.yaml # 数据库密码
├── configmap.yaml # 初始化脚本
├── pvc.yaml # 持久化存储卷声明
├── deployment.yaml # PostgreSQL 部署配置
├── service.yaml # 服务配置
└── README.md # K8s 配置详细说明
```
## 🚀 快速部署
### 前置条件
1. **已安装 K3s**
```bash
# 检查 K3s 是否运行
sudo systemctl status k3s
# 检查节点状态
sudo kubectl get nodes
```
2. **配置 kubectl 权限**(可选,避免每次使用 sudo)
```bash
# 方法1:复制配置到用户目录(推荐)
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config
chmod 600 ~/.kube/config
# 验证配置
kubectl get nodes
```
### 一键部署
```bash
# 进入配置目录
cd /path/to/001-pg16/k8s
# 部署所有资源
kubectl apply -f .
# 或者使用 sudo(如果未配置 kubectl 权限)
sudo kubectl apply -f .
```
### 查看部署状态
```bash
# 查看 Pod 状态
kubectl get pods -n infrastructure
# 查看 Pod 详细信息
kubectl describe pod -n infrastructure -l app=pg16
# 查看初始化日志(实时)
kubectl logs -n infrastructure -l app=pg16 -f
# 查看服务状态
kubectl get svc -n infrastructure
# 查看 PVC 状态
kubectl get pvc -n infrastructure
```
## ✅ 验证部署
### 1. 检查 Pod 是否运行
```bash
kubectl get pods -n infrastructure
```
期望输出:
```
NAME READY STATUS RESTARTS AGE
pg16-xxxxxxxxx-xxxxx 1/1 Running 0 2m
```
### 2. 验证数据库创建
```bash
# 统计数据库总数(应该是 303 个)
kubectl exec -n infrastructure -l app=pg16 -- psql -U postgres -c "SELECT count(*) FROM pg_database;"
# 查看前 10 个数据库
kubectl exec -n infrastructure -l app=pg16 -- psql -U postgres -c "SELECT datname FROM pg_database WHERE datname LIKE 'pg0%' ORDER BY datname LIMIT 10;"
# 查看最后 10 个数据库
kubectl exec -n infrastructure -l app=pg16 -- psql -U postgres -c "SELECT datname FROM pg_database WHERE datname LIKE 'pg2%' ORDER BY datname DESC LIMIT 10;"
```
期望结果:
- 总数据库数:303 个(300 个业务数据库 + postgres + template0 + template1)
- 数据库命名:pg001, pg002, ..., pg300
- 数据库所有者:fei
### 3. 测试数据库连接
```bash
# 方法1:直接在 Pod 内执行 SQL
kubectl exec -n infrastructure -l app=pg16 -- psql -U fei -d pg001 -c "SELECT current_database(), version();"
# 方法2:进入 Pod 交互式操作
kubectl exec -it -n infrastructure -l app=pg16 -- bash
# 在 Pod 内执行
psql -U fei -d pg001
# 退出
\q
exit
```
## 🔌 连接数据库
### 集群内部连接
从集群内其他 Pod 连接:
```
主机: pg16.infrastructure.svc.cluster.local
端口: 5432
用户: fei
密码: feiks..
数据库: pg001 ~ pg300
```
连接字符串示例:
```
postgresql://fei:feiks..@pg16.infrastructure.svc.cluster.local:5432/pg001
```
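也可以用一次性客户端 Pod 在集群内验证上面的连接字符串(示例,镜像和数据库名可按需调整):

```bash
# 启动一次性 psql 客户端测试连接,执行完自动删除
kubectl run pg-client --rm -it --image=postgres:16 --restart=Never -n infrastructure -- \
  psql "postgresql://fei:feiks..@pg16.infrastructure.svc.cluster.local:5432/pg001" -c "SELECT current_database();"
```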
### 集群外部连接
#### 方法1使用 NodePort推荐
```bash
# 获取节点 IP
kubectl get nodes -o wide
# 使用 NodePort 连接
psql -h <节点IP> -U fei -d pg001 -p 30432
```
连接信息:
- 主机:节点 IP 地址
- 端口:30432
- 用户:fei
- 密码:feiks..
#### 方法2:使用 Port Forward
```bash
# 转发端口到本地
kubectl port-forward -n infrastructure svc/pg16 5432:5432
# 在另一个终端连接
psql -h localhost -U fei -d pg001 -p 5432
```
## 📊 数据库信息
### 默认配置
- **PostgreSQL 版本**: 16
- **命名空间**: infrastructure
- **数据库数量**: 300 个(pg001 ~ pg300)
- **超级用户**: fei(密码:feiks..)
- **系统用户**: postgres(密码:adminks..)
- **持久化存储**: 20Gi(使用 K3s 默认 local-path StorageClass)
### 资源配置
- **CPU 请求**: 500m
- **CPU 限制**: 2000m
- **内存请求**: 512Mi
- **内存限制**: 2Gi
### 服务端口
- **ClusterIP 服务**: pg16,端口 5432
- **NodePort 服务**: pg16-nodeport,端口 30432
## 🔧 常用操作
### 查看日志
```bash
# 查看最近 50 行日志
kubectl logs -n infrastructure -l app=pg16 --tail=50
# 实时查看日志
kubectl logs -n infrastructure -l app=pg16 -f
# 查看上一次容器的日志(如果 Pod 重启过)
kubectl logs -n infrastructure -l app=pg16 --previous
```
### 进入容器
```bash
# 进入 PostgreSQL 容器
kubectl exec -it -n infrastructure -l app=pg16 -- bash
# 直接进入 psql
kubectl exec -it -n infrastructure -l app=pg16 -- psql -U postgres
```
### 重启 Pod
```bash
# 删除 Pod(Deployment 会自动重建)
kubectl delete pod -n infrastructure -l app=pg16
# 或者重启 Deployment
kubectl rollout restart deployment pg16 -n infrastructure
```
### 扩缩容(不推荐用于数据库)
```bash
# 查看当前副本数
kubectl get deployment pg16 -n infrastructure
# 注意:PostgreSQL 不支持多副本,保持 replicas=1
```
## 🗑️ 卸载
### 删除部署(保留数据)
```bash
# 删除 Deployment 和 Service
kubectl delete deployment pg16 -n infrastructure
kubectl delete svc pg16 pg16-nodeport -n infrastructure
# PVC 和数据会保留
```
### 完全卸载(包括数据)
```bash
# 删除所有资源
kubectl delete -f k8s/
# 或者逐个删除
kubectl delete deployment pg16 -n infrastructure
kubectl delete svc pg16 pg16-nodeport -n infrastructure
kubectl delete pvc pg16-data -n infrastructure
kubectl delete configmap pg16-init-script -n infrastructure
kubectl delete secret pg16-secret -n infrastructure
kubectl delete namespace infrastructure
```
**⚠️ 警告**: 删除 PVC 会永久删除所有数据库数据,无法恢复!
## 🔐 安全建议
### 修改默认密码
部署后建议立即修改默认密码:
```bash
# 进入 Pod
kubectl exec -it -n infrastructure -l app=pg16 -- psql -U postgres
# 修改 fei 用户密码
ALTER USER fei WITH PASSWORD '新密码';
# 修改 postgres 用户密码
ALTER USER postgres WITH PASSWORD '新密码';
# 退出
\q
```
然后更新 Secret
```bash
# 编辑 secret.yaml修改密码需要 base64 编码)
echo -n "新密码" | base64
# 更新 Secret
kubectl apply -f k8s/secret.yaml
```
### 网络安全
- 默认配置使用 NodePort 30432 暴露服务
- 生产环境建议:
- 使用防火墙限制访问 IP
- 或者删除 NodePort 服务,仅使用集群内部访问
- 配置 NetworkPolicy 限制访问(示例见下方)
```bash
# 删除 NodePort 服务(仅保留集群内访问)
kubectl delete svc pg16-nodeport -n infrastructure
```
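下面是一个 NetworkPolicy 的示例草稿(假设只允许 default 命名空间中带 `app=myapp` 标签的 Pod 访问 5432 端口;标签和命名空间均为示意,请按实际业务调整):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: pg16-allow-app
  namespace: infrastructure
spec:
  podSelector:
    matchLabels:
      app: pg16
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: default
          podSelector:
            matchLabels:
              app: myapp
      ports:
        - protocol: TCP
          port: 5432
EOF
```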
## 🐛 故障排查
### Pod 无法启动
```bash
# 查看 Pod 状态
kubectl describe pod -n infrastructure -l app=pg16
# 查看事件
kubectl get events -n infrastructure --sort-by='.lastTimestamp'
# 查看日志
kubectl logs -n infrastructure -l app=pg16
```
常见问题:
- **ImagePullBackOff**: 无法拉取镜像,检查网络连接
- **CrashLoopBackOff**: 容器启动失败,查看日志
- **Pending**: PVC 无法绑定,检查存储类
### PVC 无法绑定
```bash
# 查看 PVC 状态
kubectl describe pvc pg16-data -n infrastructure
# 查看 StorageClass
kubectl get storageclass
# 检查 local-path-provisioner
kubectl get pods -n kube-system | grep local-path
```
### 数据库连接失败
```bash
# 检查服务是否正常
kubectl get svc -n infrastructure
# 检查 Pod 是否就绪
kubectl get pods -n infrastructure
# 测试集群内连接
kubectl run -it --rm debug --image=postgres:16 --restart=Never -- psql -h pg16.infrastructure.svc.cluster.local -U fei -d pg001
```
### 初始化脚本未执行
如果发现数据库未创建 300 个数据库:
```bash
# 查看初始化日志
kubectl logs -n infrastructure -l app=pg16 | grep -i "init\|create database"
# 检查 ConfigMap 是否正确挂载
kubectl exec -n infrastructure -l app=pg16 -- ls -la /docker-entrypoint-initdb.d/
# 查看脚本内容
kubectl exec -n infrastructure -l app=pg16 -- cat /docker-entrypoint-initdb.d/01-init.sh
```
**注意**: PostgreSQL 初始化脚本只在首次启动且数据目录为空时执行。如果需要重新初始化:
```bash
# 删除 Deployment 和 PVC
kubectl delete deployment pg16 -n infrastructure
kubectl delete pvc pg16-data -n infrastructure
# 重新部署
kubectl apply -f k8s/
```
## 📝 备份与恢复
### 备份单个数据库
```bash
# 备份 pg001 数据库
kubectl exec -n infrastructure -l app=pg16 -- pg_dump -U fei pg001 > pg001_backup.sql
# 备份所有数据库
kubectl exec -n infrastructure -l app=pg16 -- pg_dumpall -U postgres > all_databases_backup.sql
```
### 恢复数据库
```bash
# 恢复单个数据库
cat pg001_backup.sql | kubectl exec -i -n infrastructure -l app=pg16 -- psql -U fei pg001
# 恢复所有数据库
cat all_databases_backup.sql | kubectl exec -i -n infrastructure -l app=pg16 -- psql -U postgres
```
### 数据持久化
数据存储在 K3s 的 local-path 存储中,默认路径:
```
/var/lib/rancher/k3s/storage/pvc-<uuid>_infrastructure_pg16-data/
```
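实际路径可以通过 PVC 对应的 PV 查出来(示例):

```bash
# 找到 pg16-data 对应的 PV 名称,再查看其在宿主机上的路径
PV=$(kubectl get pvc pg16-data -n infrastructure -o jsonpath='{.spec.volumeName}')
kubectl describe pv "$PV" | grep -i path
```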
## 📚 更多信息
- PostgreSQL 官方文档: https://www.postgresql.org/docs/16/
- K3s 官方文档: https://docs.k3s.io/
- Kubernetes 官方文档: https://kubernetes.io/docs/
## 🆘 获取帮助
如有问题,请检查:
1. Pod 日志: `kubectl logs -n infrastructure -l app=pg16`
2. Pod 状态: `kubectl describe pod -n infrastructure -l app=pg16`
3. 事件记录: `kubectl get events -n infrastructure`
---
**版本信息**
- PostgreSQL: 16
- 创建日期: 2026-01-29
- 最后更新: 2026-01-29

View File

@@ -0,0 +1,112 @@
# PostgreSQL 16 K3s 部署配置
## 文件说明
- `namespace.yaml` - 创建 infrastructure 命名空间
- `secret.yaml` - 存储 PostgreSQL 密码等敏感信息
- `configmap.yaml` - 存储初始化脚本(创建用户和 300 个数据库)
- `pvc.yaml` - 持久化存储声明(20Gi)
- `deployment.yaml` - PostgreSQL 16 部署配置
- `service.yaml` - 服务暴露(ClusterIP + NodePort)
## 部署步骤
### 1. 部署所有资源
```bash
kubectl apply -f namespace.yaml
kubectl apply -f secret.yaml
kubectl apply -f configmap.yaml
kubectl apply -f pvc.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```
或者一次性部署:
```bash
kubectl apply -f .
```
### 2. 查看部署状态
```bash
# 查看 Pod 状态
kubectl get pods -n infrastructure
# 查看 Pod 日志
kubectl logs -n infrastructure -l app=pg16 -f
# 查看服务
kubectl get svc -n infrastructure
```
### 3. 访问数据库
**集群内访问:**
```bash
# 使用 ClusterIP 服务
psql -h pg16.infrastructure.svc.cluster.local -U postgres -p 5432
```
**集群外访问:**
```bash
# 使用 NodePort(端口 30432)
psql -h <节点IP> -U postgres -p 30432
```
**使用 kubectl port-forward**
```bash
kubectl port-forward -n infrastructure svc/pg16 5432:5432
psql -h localhost -U postgres -p 5432
```
## 配置说明
### 存储
- 使用 k3s 默认的 `local-path` StorageClass
- 默认申请 20Gi 存储空间
- 数据存储在 `/var/lib/postgresql/data/pgdata`
### 资源限制
- 请求:512Mi 内存,0.5 核 CPU
- 限制:2Gi 内存,2 核 CPU
### 初始化
- 自动创建超级用户 `fei`
- 自动创建 300 个数据库(pg001 到 pg300)
### 服务暴露
- **ClusterIP 服务**:集群内部访问,服务名 `pg16`
- **NodePort 服务**:集群外部访问,端口 `30432`
## 数据迁移
### 从现有 Docker 数据迁移
如果你有现有的 pgdata 数据,可以:
1. 先部署不带数据的 PostgreSQL
2. 停止 Pod
3. 将数据复制到 PVC 对应的主机路径
4. 重启 Pod
```bash
# 查找 PVC 对应的主机路径
kubectl get pv
# 停止 Pod
kubectl scale deployment pg16 -n infrastructure --replicas=0
# 复制数据到主机路径(通常在 /var/lib/rancher/k3s/storage/ 下)
# 然后重启
kubectl scale deployment pg16 -n infrastructure --replicas=1
```
## 卸载
```bash
kubectl delete -f .
```
注意:删除 PVC 会删除所有数据,请谨慎操作。

View File

@@ -0,0 +1,19 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: pg16-init-script
namespace: infrastructure
data:
01-init.sh: |
#!/bin/bash
set -e
# 创建超级用户 fei
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
CREATE USER fei WITH SUPERUSER PASSWORD 'feiks..';
EOSQL
# 创建 300 个数据库
for i in $(seq -w 1 300); do
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" -c "CREATE DATABASE pg${i} OWNER fei;"
done

View File

@@ -0,0 +1,76 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: pg16
namespace: infrastructure
labels:
app: pg16
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: pg16
template:
metadata:
labels:
app: pg16
spec:
containers:
- name: postgres
image: postgres:16
ports:
- containerPort: 5432
name: postgres
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: pg16-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: pg16-secret
key: POSTGRES_PASSWORD
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
- name: init-scripts
mountPath: /docker-entrypoint-initdb.d
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "2000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
readinessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
volumes:
- name: postgres-data
persistentVolumeClaim:
claimName: pg16-data
- name: init-scripts
configMap:
name: pg16-init-script
defaultMode: 0755

View File

@@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
name: infrastructure

View File

@@ -0,0 +1,12 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pg16-data
namespace: infrastructure
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20Gi
storageClassName: local-path

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: pg16-secret
namespace: infrastructure
type: Opaque
stringData:
POSTGRES_PASSWORD: "adminks.."
POSTGRES_USER: "postgres"
FEI_PASSWORD: "feiks.."

View File

@@ -0,0 +1,34 @@
apiVersion: v1
kind: Service
metadata:
name: pg16
namespace: infrastructure
labels:
app: pg16
spec:
type: ClusterIP
ports:
- port: 5432
targetPort: 5432
protocol: TCP
name: postgres
selector:
app: pg16
---
apiVersion: v1
kind: Service
metadata:
name: pg16-nodeport
namespace: infrastructure
labels:
app: pg16
spec:
type: NodePort
ports:
- port: 5432
targetPort: 5432
nodePort: 30432
protocol: TCP
name: postgres
selector:
app: pg16

View File

@@ -0,0 +1,131 @@
# MinIO S3 对象存储部署
## 功能特性
- ✅ MinIO 对象存储服务
- ✅ 自动 SSL 证书(通过 Caddy)
- ✅ 自动设置新存储桶为公开只读权限
- ✅ Web 管理控制台
- ✅ S3 兼容 API
## 部署前准备
### 1. 修改配置
编辑 `minio.yaml`,替换以下内容:
**域名配置3 处):**
- `s3.u6.net3w.com` → 你的 S3 API 域名
- `console.s3.u6.net3w.com` → 你的控制台域名
**凭证配置4 处):**
- `MINIO_ROOT_USER: "admin"` → 你的管理员账号
- `MINIO_ROOT_PASSWORD: "adminks.."` → 你的管理员密码(建议至少 8 位)
**架构配置1 处):**
- `linux-arm64` → 根据你的 CPU 架构选择:
- ARM64: `linux-arm64`
- x86_64: `linux-amd64`
### 2. 配置 DNS
将域名解析到你的服务器 IP
```
s3.yourdomain.com A your-server-ip
console.s3.yourdomain.com A your-server-ip
```
### 3. 配置 Caddy
在 Caddy 配置中添加(如果使用 Caddy 做 SSL):
```
s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
console.s3.yourdomain.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
```
## 部署步骤
```bash
# 1. 部署 MinIO
kubectl apply -f minio.yaml
# 2. 检查部署状态
kubectl get pods -n minio
# 3. 查看日志
kubectl logs -n minio -l app=minio -c minio
kubectl logs -n minio -l app=minio -c policy-manager
```
## 访问服务
- **Web 控制台**: https://console.s3.yourdomain.com
- **S3 API 端点**: https://s3.yourdomain.com
- **登录凭证**: 使用你配置的 MINIO_ROOT_USER 和 MINIO_ROOT_PASSWORD
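登录控制台之外,也可以用 MinIO Client(mc)通过 S3 API 做一次读写验证(示例,域名与凭证替换为你的实际配置):

```bash
# 配置 mc 别名指向 S3 API 端点
mc alias set mys3 https://s3.yourdomain.com <你的管理员账号> <你的管理员密码>

# 创建测试桶并上传一个文件
mc mb mys3/test
echo "hello" > hello.txt
mc cp hello.txt mys3/test/
mc ls mys3/test
```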
## 自动权限策略
新创建的存储桶会在 30 秒内自动设置为 **公开只读(download)** 权限:
- ✅ 任何人可以下载文件(无需认证)
- ✅ 上传/删除需要认证
如需保持某个桶为私有,在控制台手动改回 PRIVATE 即可。
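可以用下面的方式验证自动策略是否生效(示例,沿用上面创建的 test 桶和 hello.txt,等策略管理器完成下一轮扫描后执行):

```bash
# 匿名下载应当成功(无需任何凭证)
curl -fsS https://s3.yourdomain.com/test/hello.txt

# 查看当前匿名访问策略,应显示 download
mc anonymous get mys3/test
```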
## 存储配置
默认使用 50Gi 存储空间,修改方法:
编辑 `minio.yaml` 中的 PersistentVolumeClaim
```yaml
resources:
requests:
storage: 50Gi # 修改为你需要的大小
```
## 故障排查
### Pod 无法启动
```bash
kubectl describe pod -n minio <pod-name>
```
### 查看详细日志
```bash
# MinIO 主容器
kubectl logs -n minio <pod-name> -c minio
# 策略管理器
kubectl logs -n minio <pod-name> -c policy-manager
```
### 检查 Ingress
```bash
kubectl get ingress -n minio
```
## 架构说明
```
用户 HTTPS 请求
Caddy (SSL 终止)
↓ HTTP
Traefik (路由)
MinIO Service
├─ MinIO 容器 (9000: API, 9001: Console)
└─ Policy Manager 容器 (自动设置桶权限)
```
## 卸载
```bash
kubectl delete -f minio.yaml
```
注意:这会删除所有数据,请先备份重要文件。

View File

@@ -0,0 +1,169 @@
apiVersion: v1
kind: Namespace
metadata:
name: minio
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-data
namespace: minio
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: minio
namespace: minio
spec:
replicas: 1
selector:
matchLabels:
app: minio
template:
metadata:
labels:
app: minio
spec:
containers:
- name: minio
image: minio/minio:latest
command:
- /bin/sh
- -c
- minio server /data --console-address ":9001"
ports:
- containerPort: 9000
name: api
- containerPort: 9001
name: console
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "adminks.."
- name: MINIO_SERVER_URL
value: "https://s3.u6.net3w.com"
- name: MINIO_BROWSER_REDIRECT_URL
value: "https://console.s3.u6.net3w.com"
volumeMounts:
- name: data
mountPath: /data
livenessProbe:
httpGet:
path: /minio/health/live
port: 9000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /minio/health/ready
port: 9000
initialDelaySeconds: 10
periodSeconds: 5
- name: policy-manager
image: alpine:latest
command:
- /bin/sh
- -c
- |
# 安装 MinIO Client
wget https://dl.min.io/client/mc/release/linux-arm64/mc -O /usr/local/bin/mc
chmod +x /usr/local/bin/mc
# 等待 MinIO 启动
sleep 10
# 配置 mc 客户端
mc alias set myminio http://localhost:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
echo "Policy manager started. Monitoring buckets..."
# 持续监控并设置新桶的策略
while true; do
# 获取所有存储桶
mc ls myminio 2>/dev/null | awk '{print $NF}' | sed 's/\///' | while read -r BUCKET; do
if [ -n "$BUCKET" ]; then
# 检查当前策略
POLICY_OUTPUT=$(mc anonymous get myminio/${BUCKET} 2>&1)
# 如果是私有的(包含 "Access permission for" 且不包含 "download"
if echo "$POLICY_OUTPUT" | grep -q "Access permission for" && ! echo "$POLICY_OUTPUT" | grep -q "download"; then
echo "Setting download policy for bucket: ${BUCKET}"
mc anonymous set download myminio/${BUCKET}
fi
fi
done
sleep 30
done
env:
- name: MINIO_ROOT_USER
value: "admin"
- name: MINIO_ROOT_PASSWORD
value: "adminks.."
volumes:
- name: data
persistentVolumeClaim:
claimName: minio-data
---
apiVersion: v1
kind: Service
metadata:
name: minio
namespace: minio
spec:
type: ClusterIP
ports:
- port: 9000
targetPort: 9000
name: api
- port: 9001
targetPort: 9001
name: console
selector:
app: minio
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-api
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: s3.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: minio-console
namespace: minio
spec:
ingressClassName: traefik
rules:
- host: console.s3.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: minio
port:
number: 9001

View File

@@ -0,0 +1,65 @@
1. MinIO 配置已更新
- MINIO_SERVER_URL: https://s3.u6.net3w.com
- MINIO_BROWSER_REDIRECT_URL: https://console.s3.u6.net3w.com
2. MinIO 日志确认
API: https://s3.u6.net3w.com
WebUI: https://console.s3.u6.net3w.com
3. 访问测试通过
- https://s3.u6.net3w.com - 正确重定向到 Console(不再显示端口号)
- https://console.s3.u6.net3w.com - 直接访问 Console
访问方式:
- Web 控制台: https://console.s3.u6.net3w.com
- S3 API 端点: https://s3.u6.net3w.com
- 登录凭证:
- 用户名: admin
- 密码: adminks..
配置文件位置:
- MinIO: /home/fei/k3s/009-基础设施/002-s3/minio.yaml
- Caddy: /home/fei/k3s/caddy-ssl-deployment.yaml
所有服务已正常运行,可以开始使用 MinIO 对象存储了!
不如固定桶名?
在对象存储的使用习惯中,“频繁创建桶”其实是一种反模式(Anti-pattern)。
建议方案:创建一个名为 public-assets 的固定桶,并将其设为 download 权限。
管理方式:不同的业务或用户,通过文件夹(Prefix)来区分,例如 public-assets/user1/、public-assets/user2/。
结果:你只需要在部署 Pod 时设置一次权限,之后所有上传到该桶下的文件自动就能下载。
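按这个思路,只需要一次性执行类似下面的命令(示例,别名、域名与凭证按实际环境调整):
# 创建固定桶并设置为公开只读
mc alias set myminio https://s3.u6.net3w.com admin 'adminks..'
mc mb myminio/public-assets
mc anonymous set download myminio/public-assets
# 之后按前缀区分业务,例如:
mc cp avatar.png myminio/public-assets/user1/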
配置完成:
从日志可以看到:
Policy manager started. Monitoring buckets...
Setting download policy for bucket: test
Access permission for `myminio/test` is set to `download`
功能说明:
1. 自动检测 - 每 30 秒扫描一次所有存储桶
2. 自动设置 - 发现私有(PRIVATE)存储桶时自动设置为 download(公开只读)
3. 已处理 - 你的 test 存储桶已自动设置为公开只读权限
权限效果:
- ✅ 任何人都可以下载文件(无需认证)
- ✅ 上传/删除需要认证(Access Key + Secret Key)
测试方法:
# 现在可以直接访问 test 桶中的文件(无需认证)
curl https://s3.u6.net3w.com/test/your-file.txt
后续使用:
- 新创建的任何存储桶都会在 30 秒内自动设置为公开只读
- 如果需要保持某个桶为私有,可以在 Web 控制台手动改回 PRIVATE
- Policy manager 只会处理完全私有的桶,不会覆盖已设置的策略
配置已生效,可以继续使用了!

View File

@@ -0,0 +1,25 @@
#!/bin/sh
# 自动为新创建的存储桶设置 download公开只读策略
# 配置 mc 客户端
mc alias set myminio http://localhost:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
# 持续监控并设置新桶的策略
while true; do
# 获取所有存储桶
BUCKETS=$(mc ls myminio 2>/dev/null | awk '{print $NF}' | sed 's/\///')
for BUCKET in $BUCKETS; do
# 检查当前策略
CURRENT_POLICY=$(mc anonymous get myminio/${BUCKET} 2>/dev/null | grep -o "download\|upload\|public" || echo "none")
# 如果策略为 none私有则设置为 download
if [ "$CURRENT_POLICY" = "none" ]; then
echo "Setting download policy for bucket: ${BUCKET}"
mc anonymous set download myminio/${BUCKET}
fi
done
# 每 30 秒检查一次
sleep 30
done

View File

@@ -0,0 +1,4 @@
# 写入以下内容
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# 记录 K3s 环境变量
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bashrc

View File

View File

@@ -0,0 +1,8 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: longhorn-backup-config
namespace: longhorn-system
data:
backup-target: "s3://longhorn-backup@us-east-1/"
backup-target-credential-secret: "longhorn-crypto"

View File

@@ -0,0 +1,10 @@
# 1. 创建命名空间
kubectl create namespace longhorn-system
# 2. 应用 S3 密钥
kubectl apply -f s3-secret.yaml
# 3. 使用本地 values.yaml 安装远程仓库的应用
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
-f values.yaml

View File

@@ -0,0 +1,130 @@
Installation Summary:
✅ Namespace created: longhorn-system
✅ S3 Secret created: longhorn-crypto with MinIO credentials
✅ Helm deployment: Longhorn v1 deployed successfully
✅ Components starting:
- longhorn-manager: Running (2/2)
- longhorn-ui: Running (2 replicas)
- longhorn-driver-deployer: Initializing
S3 Configuration Verified:
- AWS_ACCESS_KEY_ID: admin ✓
- AWS_SECRET_ACCESS_KEY: adminks.. ✓
- AWS_ENDPOINTS: https://s3.u6.net3w.com ✓
Next Steps:
The Longhorn components are still initializing. It will take a few minutes for all components (CSI drivers, Engine Images, Instance Managers) to be fully ready.
You can monitor the progress with:
kubectl get pods -n longhorn-system -w
这是一个非常棒的习惯!在生产实践中,这被称为 **IaC (基础设施即代码)**。即使我们使用 Helm 远程安装,也应该在本地保留一份“配置清单”,以便日后维护、迁移或重建集群。
为了满足你的需求,建议在 `/home/fei/k3s/009-基础设施/` 下创建两个标准化的文件夹。
---
## 1. 推荐的目录结构
你可以执行以下命令创建目录:
```bash
mkdir -p /home/fei/k3s/009-基础设施/003-helm
mkdir -p /home/fei/k3s/009-基础设施/004-longhorn
```
结构如下:
* **003-helm**: 存放 Helm 的安装脚本、环境变量配置。
* **004-longhorn**: 存放 Longhorn 的自定义配置文件(`values.yaml`)和 S3 密钥配置。
---
## 2. 文件夹内应该放什么内容?
### 文件夹003-helm
由于 Helm 是一个工具,这个文件夹主要存放**安装记录**和**环境配置**。
* **创建文件:`install_helm.sh`**(记录你是怎么装的)
```bash
# 写入以下内容
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# 记录 K3s 环境变量
echo "export KUBECONFIG=/etc/rancher/k3s/k3s.yaml" >> ~/.bashrc
```
* **创建文件:`readme.md`**:写明当前 Helm 的版本和用途。
---
### 文件夹004-longhorn
这是核心。虽然我们用远程仓库安装,但我们要把**配置参数**固化在本地。
**第一步:导出远程默认配置**
在文件夹下执行:
```bash
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm show values longhorn/longhorn > values.yaml
```
现在,你拥有了一个几百行的 `values.yaml`,这就是 Longhorn 的“灵魂”。
**第二步:创建 S3 密钥文件 `s3-secret.yaml`**
因为你要对接 S3,你需要把秘钥写成 K8s 的 Secret:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: longhorn-crypto
namespace: longhorn-system
type: Opaque
stringData:
AWS_ACCESS_KEY_ID: "你的ID"
AWS_SECRET_ACCESS_KEY: "你的密钥"
AWS_ENDPOINTS: "你的S3连接地址"
```
**第三步:创建一键安装脚本 `deploy.sh`**
```bash
# 1. 创建命名空间
kubectl create namespace longhorn-system
# 2. 应用 S3 密钥
kubectl apply -f s3-secret.yaml
# 3. 使用本地 values.yaml 安装远程仓库的应用
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
-f values.yaml
```
---
## 3. 这样做的好处
1. **版本可控**:以后你想改 S3 地址或副本数,只需要改 `/004-longhorn/values.yaml`,然后执行 `helm upgrade`。
2. **一目了然**:当你打开 `009-基础设施` 文件夹,你能清楚看到你的 S3、Helm 和 Longhorn 是如何相互关联的。
3. **快速重建**:如果哪天服务器系统重装了,你只需要进入这些文件夹,按顺序执行 `deploy.sh`,几分钟内整个基础设施就能起死回生。
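例如第 1 点提到的修改配置后升级,大致是这样(示例;release 名称和命名空间以实际安装时的参数为准):

```bash
# 修改 values.yaml 后应用变更
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  -f values.yaml

# 查看发布历史,必要时可用 helm rollback 回滚
helm history longhorn -n longhorn-system
```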
---
## 4. 特别提示Redis 放在哪?
按照你的逻辑,你应该在 `009-基础设施` 之外(或者创建一个 `010-中间件` 文件夹)专门放 Redis 的 YAML。因为 Redis 是属于**应用层**,而 Longhorn 属于**存储层**。
**接下来,需要我帮你把上述的 `values.yaml` 中关于 S3 备份的关键配置项找出来,并告诉你如何修改吗?**

View File

@@ -0,0 +1,10 @@
apiVersion: v1
kind: Secret
metadata:
name: longhorn-crypto
namespace: longhorn-system
type: Opaque
stringData:
AWS_ACCESS_KEY_ID: "admin"
AWS_SECRET_ACCESS_KEY: "adminks.."
AWS_ENDPOINTS: "https://s3.u6.net3w.com"

View File

@@ -0,0 +1,660 @@
# Default values for longhorn.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
global:
# -- Global override for container image registry.
imageRegistry: "docker.io"
# -- Global override for image pull secrets for container registry.
imagePullSecrets: []
# -- Set container timezone (TZ env) for all Longhorn workloads. Leave empty to use container default.
timezone: ""
# -- Toleration for nodes allowed to run user-deployed components such as Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer.
tolerations: []
# -- Node selector for nodes allowed to run user-deployed components such as Longhorn Manager, Longhorn UI, and Longhorn Driver Deployer.
nodeSelector: {}
cattle:
# -- Default system registry.
systemDefaultRegistry: ""
windowsCluster:
# -- Setting that allows Longhorn to run on a Rancher Windows cluster.
enabled: false
# -- Toleration for Linux nodes that can run user-deployed Longhorn components.
tolerations:
- key: "cattle.io/os"
value: "linux"
effect: "NoSchedule"
operator: "Equal"
# -- Node selector for Linux nodes that can run user-deployed Longhorn components.
nodeSelector:
kubernetes.io/os: "linux"
defaultSetting:
# -- Toleration for system-managed Longhorn components.
taintToleration: cattle.io/os=linux:NoSchedule
# -- Node selector for system-managed Longhorn components.
systemManagedComponentsNodeSelector: kubernetes.io/os:linux
networkPolicies:
# -- Setting that allows you to enable network policies that control access to Longhorn pods.
enabled: false
# -- Distribution that determines the policy for allowing access for an ingress. (Options: "k3s", "rke2", "rke1")
type: "k3s"
image:
longhorn:
engine:
# -- Registry for the Longhorn Engine image.
registry: ""
# -- Repository for the Longhorn Engine image.
repository: longhornio/longhorn-engine
# -- Tag for the Longhorn Engine image.
tag: v1.11.0
manager:
# -- Registry for the Longhorn Manager image.
registry: ""
# -- Repository for the Longhorn Manager image.
repository: longhornio/longhorn-manager
# -- Tag for the Longhorn Manager image.
tag: v1.11.0
ui:
# -- Registry for the Longhorn UI image.
registry: ""
# -- Repository for the Longhorn UI image.
repository: longhornio/longhorn-ui
# -- Tag for the Longhorn UI image.
tag: v1.11.0
instanceManager:
# -- Registry for the Longhorn Instance Manager image.
registry: ""
# -- Repository for the Longhorn Instance Manager image.
repository: longhornio/longhorn-instance-manager
# -- Tag for the Longhorn Instance Manager image.
tag: v1.11.0
shareManager:
# -- Registry for the Longhorn Share Manager image.
registry: ""
# -- Repository for the Longhorn Share Manager image.
repository: longhornio/longhorn-share-manager
# -- Tag for the Longhorn Share Manager image.
tag: v1.11.0
backingImageManager:
# -- Registry for the Backing Image Manager image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the Backing Image Manager image. When unspecified, Longhorn uses the default value.
repository: longhornio/backing-image-manager
# -- Tag for the Backing Image Manager image. When unspecified, Longhorn uses the default value.
tag: v1.11.0
supportBundleKit:
# -- Registry for the Longhorn Support Bundle Manager image.
registry: ""
# -- Repository for the Longhorn Support Bundle Manager image.
repository: longhornio/support-bundle-kit
# -- Tag for the Longhorn Support Bundle Manager image.
tag: v0.0.79
csi:
attacher:
# -- Registry for the CSI attacher image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI attacher image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-attacher
# -- Tag for the CSI attacher image. When unspecified, Longhorn uses the default value.
tag: v4.10.0-20251226
provisioner:
# -- Registry for the CSI Provisioner image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Provisioner image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-provisioner
# -- Tag for the CSI Provisioner image. When unspecified, Longhorn uses the default value.
tag: v5.3.0-20251226
nodeDriverRegistrar:
# -- Registry for the CSI Node Driver Registrar image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Node Driver Registrar image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-node-driver-registrar
# -- Tag for the CSI Node Driver Registrar image. When unspecified, Longhorn uses the default value.
tag: v2.15.0-20251226
resizer:
# -- Registry for the CSI Resizer image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Resizer image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-resizer
# -- Tag for the CSI Resizer image. When unspecified, Longhorn uses the default value.
tag: v2.0.0-20251226
snapshotter:
# -- Registry for the CSI Snapshotter image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI Snapshotter image. When unspecified, Longhorn uses the default value.
repository: longhornio/csi-snapshotter
# -- Tag for the CSI Snapshotter image. When unspecified, Longhorn uses the default value.
tag: v8.4.0-20251226
livenessProbe:
# -- Registry for the CSI liveness probe image. When unspecified, Longhorn uses the default value.
registry: ""
# -- Repository for the CSI liveness probe image. When unspecified, Longhorn uses the default value.
repository: longhornio/livenessprobe
# -- Tag for the CSI liveness probe image. When unspecified, Longhorn uses the default value.
tag: v2.17.0-20251226
openshift:
oauthProxy:
# -- Registry for the OAuth Proxy image. Specify the upstream image (for example, "quay.io/openshift/origin-oauth-proxy"). This setting applies only to OpenShift users.
registry: ""
# -- Repository for the OAuth Proxy image. Specify the upstream image (for example, "quay.io/openshift/origin-oauth-proxy"). This setting applies only to OpenShift users.
repository: ""
# -- Tag for the OAuth Proxy image. Specify OCP/OKD version 4.1 or later (including version 4.18, which is available at quay.io/openshift/origin-oauth-proxy:4.18). This setting applies only to OpenShift users.
tag: ""
# -- Image pull policy that applies to all user-deployed Longhorn components, such as Longhorn Manager, Longhorn driver, and Longhorn UI.
pullPolicy: IfNotPresent
service:
ui:
# -- Service type for Longhorn UI. (Options: "ClusterIP", "NodePort", "LoadBalancer", "Rancher-Proxy")
type: ClusterIP
# -- NodePort port number for Longhorn UI. When unspecified, Longhorn selects a free port between 30000 and 32767.
nodePort: null
# -- Class of a load balancer implementation
loadBalancerClass: ""
# -- Annotation for the Longhorn UI service.
annotations: {}
## If you want to set annotations for the Longhorn UI service, delete the `{}` in the line above
## and uncomment this example block
# annotation-key1: "annotation-value1"
# annotation-key2: "annotation-value2"
labels: {}
## If you want to set additional labels for the Longhorn UI service, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
manager:
# -- Service type for Longhorn Manager.
type: ClusterIP
# -- NodePort port number for Longhorn Manager. When unspecified, Longhorn selects a free port between 30000 and 32767.
nodePort: ""
persistence:
# -- Setting that allows you to specify the default Longhorn StorageClass.
defaultClass: true
# -- Filesystem type of the default Longhorn StorageClass.
defaultFsType: ext4
# -- mkfs parameters of the default Longhorn StorageClass.
defaultMkfsParams: ""
# -- Replica count of the default Longhorn StorageClass.
defaultClassReplicaCount: 3
# -- Data locality of the default Longhorn StorageClass. (Options: "disabled", "best-effort")
defaultDataLocality: disabled
# -- Reclaim policy that provides instructions for handling of a volume after its claim is released. (Options: "Retain", "Delete")
reclaimPolicy: Delete
# -- VolumeBindingMode controls when volume binding and dynamic provisioning should occur. (Options: "Immediate", "WaitForFirstConsumer") (Defaults to "Immediate")
volumeBindingMode: "Immediate"
# -- Setting that allows you to enable live migration of a Longhorn volume from one node to another.
migratable: false
# -- Setting that disables the revision counter and thereby prevents Longhorn from tracking all write operations to a volume. When salvaging a volume, Longhorn uses properties of the volume-head-xxx.img file (the last file size and the last time the file was modified) to select the replica to be used for volume recovery.
disableRevisionCounter: "true"
# -- Set NFS mount options for Longhorn StorageClass for RWX volumes
nfsOptions: ""
recurringJobSelector:
# -- Setting that allows you to enable the recurring job selector for a Longhorn StorageClass.
enable: false
# -- Recurring job selector for a Longhorn StorageClass. Ensure that quotes are used correctly when specifying job parameters. (Example: `[{"name":"backup", "isGroup":true}]`)
jobList: []
backingImage:
# -- Setting that allows you to use a backing image in a Longhorn StorageClass.
enable: false
# -- Backing image to be used for creating and restoring volumes in a Longhorn StorageClass. When no backing images are available, specify the data source type and parameters that Longhorn can use to create a backing image.
name: ~
# -- Data source type of a backing image used in a Longhorn StorageClass.
# If the backing image exists in the cluster, Longhorn uses this setting to verify the image.
# If the backing image does not exist, Longhorn creates one using the specified data source type.
dataSourceType: ~
# -- Data source parameters of a backing image used in a Longhorn StorageClass.
# You can specify a JSON string of a map. (Example: `'{\"url\":\"https://backing-image-example.s3-region.amazonaws.com/test-backing-image\"}'`)
dataSourceParameters: ~
# -- Expected SHA-512 checksum of a backing image used in a Longhorn StorageClass.
expectedChecksum: ~
defaultDiskSelector:
# -- Setting that allows you to enable the disk selector for the default Longhorn StorageClass.
enable: false
# -- Disk selector for the default Longhorn StorageClass. Longhorn uses only disks with the specified tags for storing volume data. (Examples: "nvme,sata")
selector: ""
defaultNodeSelector:
# -- Setting that allows you to enable the node selector for the default Longhorn StorageClass.
enable: false
# -- Node selector for the default Longhorn StorageClass. Longhorn uses only nodes with the specified tags for storing volume data. (Examples: "storage,fast")
selector: ""
# -- Setting that allows you to enable automatic snapshot removal during filesystem trim for a Longhorn StorageClass. (Options: "ignored", "enabled", "disabled")
unmapMarkSnapChainRemoved: ignored
# -- Setting that allows you to specify the data engine version for the default Longhorn StorageClass. (Options: "v1", "v2")
dataEngine: v1
# -- Setting that allows you to specify the backup target for the default Longhorn StorageClass.
backupTargetName: default
preUpgradeChecker:
# -- Setting that allows Longhorn to perform pre-upgrade checks. Disable this setting when installing Longhorn using Argo CD or other GitOps solutions.
jobEnabled: true
# -- Setting that allows Longhorn to perform upgrade version checks after starting the Longhorn Manager DaemonSet Pods. Disabling this setting also disables `preUpgradeChecker.jobEnabled`. Longhorn recommends keeping this setting enabled.
upgradeVersionCheck: true
csi:
# -- kubelet root directory. When unspecified, Longhorn uses the default value.
kubeletRootDir: ~
# -- Configures Pod anti-affinity to prevent multiple instances on the same node. Use soft (tries to separate) or hard (must separate). When unspecified, Longhorn uses the default value ("soft").
podAntiAffinityPreset: ~
# -- Replica count of the CSI Attacher. When unspecified, Longhorn uses the default value ("3").
attacherReplicaCount: ~
# -- Replica count of the CSI Provisioner. When unspecified, Longhorn uses the default value ("3").
provisionerReplicaCount: ~
# -- Replica count of the CSI Resizer. When unspecified, Longhorn uses the default value ("3").
resizerReplicaCount: ~
# -- Replica count of the CSI Snapshotter. When unspecified, Longhorn uses the default value ("3").
snapshotterReplicaCount: ~
defaultSettings:
# -- Setting that allows Longhorn to automatically attach a volume and create snapshots or backups when recurring jobs are run.
allowRecurringJobWhileVolumeDetached: ~
# -- Setting that allows Longhorn to automatically create a default disk only on nodes with the label "node.longhorn.io/create-default-disk=true" (if no other disks exist). When this setting is disabled, Longhorn creates a default disk on each node that is added to the cluster.
createDefaultDiskLabeledNodes: ~
# -- Default path to use for storing data on a host. An absolute directory path indicates a filesystem-type disk used by the V1 Data Engine, while a path to a block device indicates a block-type disk used by the V2 Data Engine. The default value is "/var/lib/longhorn/".
defaultDataPath: ~
# -- Default data locality. A Longhorn volume has data locality if a local replica of the volume exists on the same node as the pod that is using the volume.
defaultDataLocality: ~
# -- Setting that allows scheduling on nodes with healthy replicas of the same volume. This setting is disabled by default.
replicaSoftAntiAffinity: ~
# -- Setting that automatically rebalances replicas when an available node is discovered.
replicaAutoBalance: ~
# -- Percentage of storage that can be allocated relative to hard drive capacity. The default value is "100".
storageOverProvisioningPercentage: ~
# -- Percentage of minimum available disk capacity. When the minimum available capacity exceeds the total available capacity, the disk becomes unschedulable until more space is made available for use. The default value is "25".
storageMinimalAvailablePercentage: ~
# -- Percentage of disk space that is not allocated to the default disk on each new Longhorn node.
storageReservedPercentageForDefaultDisk: ~
# -- Upgrade Checker that periodically checks for new Longhorn versions. When a new version is available, a notification appears on the Longhorn UI. This setting is enabled by default
upgradeChecker: ~
# -- The Upgrade Responder sends a notification whenever a new Longhorn version that you can upgrade to becomes available. The default value is https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade.
upgradeResponderURL: ~
# -- The external URL used to access the Longhorn Manager API. When set, this URL is returned in API responses (the actions and links fields) instead of the internal pod IP. This is useful when accessing the API through Ingress or Gateway API HTTPRoute. Format: scheme://host[:port] (for example, https://longhorn.example.com or https://longhorn.example.com:8443). Leave it empty to use the default behavior.
managerUrl: ~
# -- Default number of replicas for volumes created using the Longhorn UI. For Kubernetes configuration, modify the `numberOfReplicas` field in the StorageClass. The default value is "{"v1":"3","v2":"3"}".
defaultReplicaCount: ~
# -- Default name of Longhorn static StorageClass. "storageClassName" is assigned to PVs and PVCs that are created for an existing Longhorn volume. "storageClassName" can also be used as a label, so it is possible to use a Longhorn StorageClass to bind a workload to an existing PV without creating a Kubernetes StorageClass object. "storageClassName" needs to be an existing StorageClass. The default value is "longhorn-static".
defaultLonghornStaticStorageClass: ~
# -- Number of minutes that Longhorn keeps a failed backup resource. When the value is "0", automatic deletion is disabled.
failedBackupTTL: ~
# -- Number of minutes that Longhorn allows for the backup execution. The default value is "1".
backupExecutionTimeout: ~
# -- Setting that restores recurring jobs from a backup volume on a backup target and creates recurring jobs if none exist during backup restoration.
restoreVolumeRecurringJobs: ~
# -- Maximum number of successful recurring backup and snapshot jobs to be retained. When the value is "0", a history of successful recurring jobs is not retained.
recurringSuccessfulJobsHistoryLimit: ~
# -- Maximum number of failed recurring backup and snapshot jobs to be retained. When the value is "0", a history of failed recurring jobs is not retained.
recurringFailedJobsHistoryLimit: ~
# -- Maximum number of snapshots or backups to be retained.
recurringJobMaxRetention: ~
# -- Maximum number of failed support bundles that can exist in the cluster. When the value is "0", Longhorn automatically purges all failed support bundles.
supportBundleFailedHistoryLimit: ~
# -- Taint or toleration for system-managed Longhorn components.
# Specify values using a semicolon-separated list in `kubectl taint` syntax (Example: key1=value1:effect; key2=value2:effect).
taintToleration: ~
# -- Node selector for system-managed Longhorn components.
systemManagedComponentsNodeSelector: ~
# -- Resource limits for system-managed CSI components.
# This setting allows you to configure CPU and memory requests/limits for CSI attacher, provisioner, resizer, snapshotter, and plugin components.
# Supported components: csi-attacher, csi-provisioner, csi-resizer, csi-snapshotter, longhorn-csi-plugin, node-driver-registrar, longhorn-liveness-probe.
# Notice that changing resource limits will cause CSI components to restart, which may temporarily affect volume provisioning and attach/detach operations until the components are ready. The value should be a JSON object with component names as keys and ResourceRequirements as values.
systemManagedCSIComponentsResourceLimits: ~
# -- PriorityClass for system-managed Longhorn components.
# This setting can help prevent Longhorn components from being evicted under Node Pressure.
# Notice that this will be applied to Longhorn user-deployed components by default if there are no priority class values set yet, such as `longhornManager.priorityClass`.
priorityClass: &defaultPriorityClassNameRef "longhorn-critical"
# -- Setting that allows Longhorn to automatically salvage volumes when all replicas become faulty (for example, when the network connection is interrupted). Longhorn determines which replicas are usable and then uses these replicas for the volume. This setting is enabled by default.
autoSalvage: ~
# -- Setting that allows Longhorn to automatically delete a workload pod that is managed by a controller (for example, daemonset) whenever a Longhorn volume is detached unexpectedly (for example, during Kubernetes upgrades). After deletion, the controller restarts the pod and then Kubernetes handles volume reattachment and remounting.
autoDeletePodWhenVolumeDetachedUnexpectedly: ~
# -- Blacklist of controller api/kind values for the setting Automatically Delete Workload Pod when the Volume Is Detached Unexpectedly. If a workload pod is managed by a controller whose api/kind is listed in this blacklist, Longhorn will not automatically delete the pod when its volume is unexpectedly detached. Multiple controller api/kind entries can be specified, separated by semicolons. For example: `apps/StatefulSet;apps/DaemonSet`. Note that the controller api/kind is case sensitive and must exactly match the api/kind in the workload pod's owner reference.
blacklistForAutoDeletePodWhenVolumeDetachedUnexpectedly: ~
# -- Setting that prevents Longhorn Manager from scheduling replicas on a cordoned Kubernetes node. This setting is enabled by default.
disableSchedulingOnCordonedNode: ~
# -- Setting that allows Longhorn to schedule new replicas of a volume to nodes in the same zone as existing healthy replicas. Nodes that do not belong to any zone are treated as existing in the zone that contains healthy replicas. When identifying zones, Longhorn relies on the label "topology.kubernetes.io/zone=<Zone name of the node>" in the Kubernetes node object.
replicaZoneSoftAntiAffinity: ~
# -- Setting that allows scheduling on disks with existing healthy replicas of the same volume. This setting is enabled by default.
replicaDiskSoftAntiAffinity: ~
# -- Policy that defines the action Longhorn takes when a volume is stuck with a StatefulSet or Deployment pod on a node that failed.
nodeDownPodDeletionPolicy: ~
# -- Policy that defines the action Longhorn takes when a node with the last healthy replica of a volume is drained.
nodeDrainPolicy: ~
# -- Setting that allows automatic detaching of manually-attached volumes when a node is cordoned.
detachManuallyAttachedVolumesWhenCordoned: ~
# -- Number of seconds that Longhorn waits before reusing existing data on a failed replica instead of creating a new replica of a degraded volume.
replicaReplenishmentWaitInterval: ~
# -- Maximum number of replicas that can be concurrently rebuilt on each node.
concurrentReplicaRebuildPerNodeLimit: ~
# -- Maximum number of file synchronization operations that can run concurrently during a single replica rebuild. Right now, it's for v1 data engine only.
rebuildConcurrentSyncLimit: ~
# -- Maximum number of volumes that can be concurrently restored on each node using a backup. When the value is "0", restoration of volumes using a backup is disabled.
concurrentVolumeBackupRestorePerNodeLimit: ~
# -- Setting that disables the revision counter and thereby prevents Longhorn from tracking all write operations to a volume. When salvaging a volume, Longhorn uses properties of the "volume-head-xxx.img" file (the last file size and the last time the file was modified) to select the replica to be used for volume recovery. This setting applies only to volumes created using the Longhorn UI.
disableRevisionCounter: '{"v1":"true"}'
# -- Image pull policy for system-managed pods, such as Instance Manager, engine images, and CSI Driver. Changes to the image pull policy are applied only after the system-managed pods restart.
systemManagedPodsImagePullPolicy: ~
# -- Setting that allows you to create and attach a volume without having all replicas scheduled at the time of creation.
allowVolumeCreationWithDegradedAvailability: ~
# -- Setting that allows Longhorn to automatically clean up the system-generated snapshot after replica rebuilding is completed.
autoCleanupSystemGeneratedSnapshot: ~
# -- Setting that allows Longhorn to automatically clean up the snapshot generated by a recurring backup job.
autoCleanupRecurringJobBackupSnapshot: ~
# -- Maximum number of engines that are allowed to concurrently upgrade on each node after Longhorn Manager is upgraded. When the value is "0", Longhorn does not automatically upgrade volume engines to the new default engine image version.
concurrentAutomaticEngineUpgradePerNodeLimit: ~
# -- Number of minutes that Longhorn waits before cleaning up the backing image file when no replicas in the disk are using it.
backingImageCleanupWaitInterval: ~
# -- Number of seconds that Longhorn waits before downloading a backing image file again when the status of all image disk files changes to "failed" or "unknown".
backingImageRecoveryWaitInterval: ~
# -- Percentage of the total allocatable CPU resources on each node to be reserved for each instance manager pod. The default value is {"v1":"12","v2":"12"}.
guaranteedInstanceManagerCPU: ~
# -- Setting that notifies Longhorn that the cluster is using the Kubernetes Cluster Autoscaler.
kubernetesClusterAutoscalerEnabled: ~
# -- Enables Longhorn to automatically delete orphaned resources and their associated data or processes (e.g., stale replicas). Orphaned resources on failed or unknown nodes are not automatically cleaned up.
# You need to specify the resource types to be deleted using a semicolon-separated list (e.g., `replica-data;instance`). Available items are: `replica-data`, `instance`.
orphanResourceAutoDeletion: ~
# -- Specifies the wait time, in seconds, before Longhorn automatically deletes an orphaned Custom Resource (CR) and its associated resources.
# Note that if a user manually deletes an orphaned CR, the deletion occurs immediately and does not respect this grace period.
orphanResourceAutoDeletionGracePeriod: ~
# -- Storage network for in-cluster traffic. When unspecified, Longhorn uses the Kubernetes cluster network.
storageNetwork: ~
# -- Specifies a dedicated network for mounting RWX (ReadWriteMany) volumes. Leave this blank to use the default Kubernetes cluster network. **Caution**: This setting should change after all RWX volumes are detached because some Longhorn component pods must be recreated to apply the setting. You cannot modify this setting while RWX volumes are still attached.
endpointNetworkForRWXVolume: ~
# -- Flag that prevents accidental uninstallation of Longhorn.
deletingConfirmationFlag: ~
# -- Timeout between the Longhorn Engine and replicas. Specify a value between "8" and "30" seconds. The default value is "8".
engineReplicaTimeout: ~
# -- Setting that allows you to enable and disable snapshot hashing and data integrity checks.
snapshotDataIntegrity: ~
# -- Setting that allows disabling of snapshot hashing after snapshot creation to minimize impact on system performance.
snapshotDataIntegrityImmediateCheckAfterSnapshotCreation: ~
# -- Setting that defines when Longhorn checks the integrity of data in snapshot disk files. You must use the Unix cron expression format.
snapshotDataIntegrityCronjob: ~
# -- Setting that controls how many snapshot heavy task operations (such as purge and clone) can run concurrently per node. This is a best-effort mechanism: due to the distributed nature of the system, temporary oversubscription may occur. The limiter reduces worst-case overload but does not guarantee perfect enforcement.
snapshotHeavyTaskConcurrentLimit: ~
# -- Setting that allows Longhorn to automatically mark the latest snapshot and its parent files as removed during a filesystem trim. Longhorn does not remove snapshots containing multiple child files.
removeSnapshotsDuringFilesystemTrim: ~
# -- Setting that allows fast rebuilding of replicas using the checksum of snapshot disk files. Before enabling this setting, you must set the snapshot-data-integrity value to "enable" or "fast-check".
fastReplicaRebuildEnabled: ~
# -- Number of seconds that an HTTP client waits for a response from a File Sync server before considering the connection to have failed.
replicaFileSyncHttpClientTimeout: ~
# -- Number of seconds that Longhorn allows for the completion of replica rebuilding and snapshot cloning operations.
longGRPCTimeOut: ~
# -- Log levels that indicate the type and severity of logs in Longhorn Manager. The default value is "Info". (Options: "Panic", "Fatal", "Error", "Warn", "Info", "Debug", "Trace")
logLevel: ~
# -- Specifies the directory on the host where Longhorn stores log files for the instance manager pod. Currently, it is only used for instance manager pods in the v2 data engine.
logPath: ~
# -- Setting that allows you to specify a backup compression method.
backupCompressionMethod: ~
# -- Maximum number of worker threads that can concurrently run for each backup.
backupConcurrentLimit: ~
# -- Specifies the default backup block size, in MiB, used when creating a new volume. Supported values are 2 or 16.
defaultBackupBlockSize: ~
# -- Maximum number of worker threads that can concurrently run for each restore operation.
restoreConcurrentLimit: ~
# -- Setting that allows you to enable the V1 Data Engine.
v1DataEngine: ~
# -- Setting that allows you to enable the V2 Data Engine, which is based on the Storage Performance Development Kit (SPDK). The V2 Data Engine is an experimental feature and should not be used in production environments.
v2DataEngine: ~
# -- Applies only to the V2 Data Engine. Enables hugepages for the Storage Performance Development Kit (SPDK) target daemon. If disabled, legacy memory is used. Allocation size is set via the Data Engine Memory Size setting.
dataEngineHugepageEnabled: ~
# -- Applies only to the V2 Data Engine. Specifies the hugepage size, in MiB, for the Storage Performance Development Kit (SPDK) target daemon. The default value is "{"v2":"2048"}"
dataEngineMemorySize: ~
# -- Applies only to the V2 Data Engine. Specifies the CPU cores on which the Storage Performance Development Kit (SPDK) target daemon runs. The daemon is deployed in each Instance Manager pod. Ensure that the number of assigned cores does not exceed the guaranteed Instance Manager CPUs for the V2 Data Engine. The default value is "{"v2":"0x1"}".
dataEngineCPUMask: ~
# -- This setting specifies the default write bandwidth limit (in megabytes per second) for volume replica rebuilding when using the v2 data engine (SPDK). If this value is set to 0, there will be no write bandwidth limitation. Individual volumes can override this setting by specifying their own rebuilding bandwidth limit.
replicaRebuildingBandwidthLimit: ~
# -- This setting specifies the default depth of each queue for Ublk frontend. This setting applies to volumes using the V2 Data Engine with Ublk front end. Individual volumes can override this setting by specifying their own Ublk queue depth.
defaultUblkQueueDepth: ~
# -- This setting specifies the default the number of queues for ublk frontend. This setting applies to volumes using the V2 Data Engine with Ublk front end. Individual volumes can override this setting by specifying their own number of queues for ublk.
defaultUblkNumberOfQueue: ~
# -- In seconds. The setting specifies the timeout for the instance manager pod liveness probe. The default value is 10 seconds.
instanceManagerPodLivenessProbeTimeout: ~
# -- Setting that allows scheduling of empty node selector volumes to any node.
allowEmptyNodeSelectorVolume: ~
# -- Setting that allows scheduling of empty disk selector volumes to any disk.
allowEmptyDiskSelectorVolume: ~
# -- Setting that allows Longhorn to periodically collect anonymous usage data for product improvement purposes. Longhorn sends collected data to the [Upgrade Responder](https://github.com/longhorn/upgrade-responder) server, which is the data source of the Longhorn Public Metrics Dashboard (https://metrics.longhorn.io). The Upgrade Responder server does not store data that can be used to identify clients, including IP addresses.
allowCollectingLonghornUsageMetrics: ~
# -- Setting that temporarily prevents all attempts to purge volume snapshots.
disableSnapshotPurge: ~
# -- Maximum snapshot count for a volume. The value should be between 2 to 250
snapshotMaxCount: ~
# -- Applies only to the V2 Data Engine. Specifies the log level for the Storage Performance Development Kit (SPDK) target daemon. Supported values are: Error, Warning, Notice, Info, and Debug. The default is Notice.
dataEngineLogLevel: ~
# -- Applies only to the V2 Data Engine. Specifies the log flags for the Storage Performance Development Kit (SPDK) target daemon.
dataEngineLogFlags: ~
# -- Setting that freezes the filesystem on the root partition before a snapshot is created.
freezeFilesystemForSnapshot: ~
# -- Setting that automatically cleans up the snapshot when the backup is deleted.
autoCleanupSnapshotWhenDeleteBackup: ~
# -- Setting that automatically cleans up the snapshot after the on-demand backup is completed.
autoCleanupSnapshotAfterOnDemandBackupCompleted: ~
# -- Setting that allows Longhorn to detect node failure and immediately migrate affected RWX volumes.
rwxVolumeFastFailover: ~
# -- Enables automatic rebuilding of degraded replicas while the volume is detached. This setting only takes effect if the individual volume setting is set to `ignored` or `enabled`.
offlineReplicaRebuilding: ~
# -- Controls whether Longhorn monitors and records health information for node disks. When disabled, disk health checks and status updates are skipped.
nodeDiskHealthMonitoring: ~
# -- Setting that allows you to update the default backupstore.
defaultBackupStore:
# -- Endpoint used to access the default backupstore. (Options: "NFS", "CIFS", "AWS", "GCP", "AZURE")
backupTarget: "s3://longhorn-backup@us-east-1/"
# -- Name of the Kubernetes secret associated with the default backup target.
backupTargetCredentialSecret: "longhorn-crypto"
# -- Number of seconds that Longhorn waits before checking the default backupstore for new backups. The default value is "300". When the value is "0", polling is disabled.
pollInterval: 300
privateRegistry:
# -- Set to `true` to automatically create a new private registry secret.
createSecret: ~
# -- URL of a private registry. When unspecified, Longhorn uses the default system registry.
registryUrl: ~
# -- User account used for authenticating with a private registry.
registryUser: ~
# -- Password for authenticating with a private registry.
registryPasswd: ~
# -- If create a new private registry secret is true, create a Kubernetes secret with this name; else use the existing secret of this name. Use it to pull images from your private registry.
registrySecret: ~
longhornManager:
log:
# -- Format of Longhorn Manager logs. (Options: "plain", "json")
format: plain
# -- PriorityClass for Longhorn Manager.
priorityClass: *defaultPriorityClassNameRef
# -- Toleration for Longhorn Manager on nodes allowed to run Longhorn components.
tolerations: []
## If you want to set tolerations for Longhorn Manager DaemonSet, delete the `[]` in the line above
## and uncomment this example block
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
# -- Resource requests and limits for Longhorn Manager pods.
resources: ~
# -- Node selector for Longhorn Manager. Specify the nodes allowed to run Longhorn Manager.
nodeSelector: {}
## If you want to set node selector for Longhorn Manager DaemonSet, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
# -- Annotation for the Longhorn Manager service.
serviceAnnotations: {}
## If you want to set annotations for the Longhorn Manager service, delete the `{}` in the line above
## and uncomment this example block
# annotation-key1: "annotation-value1"
# annotation-key2: "annotation-value2"
serviceLabels: {}
## If you want to set labels for the Longhorn Manager service, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
## DaemonSet update strategy. Default "100% unavailable" matches the upgrade
## flow (old managers removed before new start); override for rolling updates
## if you prefer that behavior.
updateStrategy:
rollingUpdate:
maxUnavailable: "100%"
longhornDriver:
log:
# -- Format of longhorn-driver logs. (Options: "plain", "json")
format: plain
# -- PriorityClass for Longhorn Driver.
priorityClass: *defaultPriorityClassNameRef
# -- Toleration for Longhorn Driver on nodes allowed to run Longhorn components.
tolerations: []
## If you want to set tolerations for Longhorn Driver Deployer Deployment, delete the `[]` in the line above
## and uncomment this example block
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
# -- Node selector for Longhorn Driver. Specify the nodes allowed to run Longhorn Driver.
nodeSelector: {}
## If you want to set node selector for Longhorn Driver Deployer Deployment, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
longhornUI:
# -- Replica count for Longhorn UI.
replicas: 2
# -- PriorityClass for Longhorn UI.
priorityClass: *defaultPriorityClassNameRef
# -- Affinity for Longhorn UI pods. Specify the affinity you want to use for Longhorn UI.
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- longhorn-ui
topologyKey: kubernetes.io/hostname
# -- Toleration for Longhorn UI on nodes allowed to run Longhorn components.
tolerations: []
## If you want to set tolerations for Longhorn UI Deployment, delete the `[]` in the line above
## and uncomment this example block
# - key: "key"
# operator: "Equal"
# value: "value"
# effect: "NoSchedule"
# -- Node selector for Longhorn UI. Specify the nodes allowed to run Longhorn UI.
nodeSelector: {}
## If you want to set node selector for Longhorn UI Deployment, delete the `{}` in the line above
## and uncomment this example block
# label-key1: "label-value1"
# label-key2: "label-value2"
ingress:
# -- Setting that allows Longhorn to generate ingress records for the Longhorn UI service.
enabled: false
# -- IngressClass resource that contains ingress configuration, including the name of the Ingress controller.
# ingressClassName can replace the kubernetes.io/ingress.class annotation used in earlier Kubernetes releases.
ingressClassName: ~
# -- Hostname of the Layer 7 load balancer.
host: sslip.io
# -- Extra hostnames for TLS (Subject Alternative Names - SAN). Used when you need multiple FQDNs for the same ingress.
# Example:
# extraHosts:
# - longhorn.example.com
# - longhorn-ui.internal.local
extraHosts: []
# -- Setting that allows you to enable TLS on ingress records.
tls: false
# -- Setting that allows you to enable secure connections to the Longhorn UI service via port 443.
secureBackends: false
# -- TLS secret that contains the private key and certificate to be used for TLS. This setting applies only when TLS is enabled on ingress records.
tlsSecret: longhorn.local-tls
# -- Default ingress path. You can access the Longhorn UI by following the full ingress path {{host}}+{{path}}.
path: /
# -- Ingress path type. To maintain backward compatibility, the default value is "ImplementationSpecific".
pathType: ImplementationSpecific
## If you're using kube-lego, you will want to add:
## kubernetes.io/tls-acme: true
##
## For a full list of possible ingress annotations, please see
## ref: https://github.com/kubernetes/ingress-nginx/blob/master/docs/annotations.md
##
## If tls is set to true, annotation ingress.kubernetes.io/secure-backends: "true" will automatically be set
# -- Ingress annotations in the form of key-value pairs.
annotations:
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: true
# -- Secret that contains a TLS private key and certificate. Use secrets if you want to use your own certificates to secure ingresses.
secrets:
## If you're providing your own certificates, please use this to add the certificates as secrets
## key and certificate should start with -----BEGIN CERTIFICATE----- or
## -----BEGIN RSA PRIVATE KEY-----
##
## name should line up with a tlsSecret set further up
## If you're using kube-lego, this is unneeded, as it will create the secret for you if it is not set
##
## It is also possible to create and manage the certificates outside of this helm chart
## Please see README.md for more information
# - name: longhorn.local-tls
# key:
# certificate:
httproute:
# -- Setting that allows Longhorn to generate HTTPRoute records for the Longhorn UI service using Gateway API.
enabled: false
# -- Gateway references for HTTPRoute. Specify which Gateway(s) should handle this route.
parentRefs: []
## Example:
# - name: gateway-name
# namespace: gateway-namespace
# # Optional fields with defaults:
# # group: gateway.networking.k8s.io # default
# # kind: Gateway # default
# # sectionName: https # optional, targets a specific listener
# -- List of hostnames for the HTTPRoute. Multiple hostnames are supported.
hostnames: []
## Example:
# - longhorn.example.com
# - longhorn.example.org
# -- Default path for HTTPRoute. You can access the Longhorn UI by following the full path.
path: /
# -- Path match type for HTTPRoute. (Options: "Exact", "PathPrefix")
pathType: PathPrefix
# -- Annotations for the HTTPRoute resource in the form of key-value pairs.
annotations: {}
## Example:
# annotation-key1: "annotation-value1"
# -- Setting that allows you to enable pod security policies (PSPs) that allow privileged Longhorn pods to start. This setting applies only to clusters running Kubernetes 1.25 and earlier, and with the built-in Pod Security admission controller enabled.
enablePSP: false
# -- Specify override namespace, specifically this is useful for using longhorn as sub-chart and its release namespace is not the `longhorn-system`.
namespaceOverride: ""
# -- Annotation for the Longhorn Manager DaemonSet pods. This setting is optional.
annotations: {}
serviceAccount:
# -- Annotations to add to the service account
annotations: {}
metrics:
serviceMonitor:
# -- Setting that allows the creation of a Prometheus ServiceMonitor resource for Longhorn Manager components.
enabled: false
# -- Additional labels for the Prometheus ServiceMonitor resource.
additionalLabels: {}
# -- Annotations for the Prometheus ServiceMonitor resource.
annotations: {}
# -- Interval at which Prometheus scrapes the metrics from the target.
interval: ""
# -- Timeout after which Prometheus considers the scrape to be failed.
scrapeTimeout: ""
# -- Configures the relabeling rules to apply the targets metadata labels. See the [Prometheus Operator
# documentation](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.Endpoint) for
# formatting details.
relabelings: []
# -- Configures the relabeling rules to apply to the samples before ingestion. See the [Prometheus Operator
# documentation](https://prometheus-operator.dev/docs/api-reference/api/#monitoring.coreos.com/v1.Endpoint) for
# formatting details.
metricRelabelings: []
## openshift settings
openshift:
# -- Setting that allows Longhorn to integrate with OpenShift.
enabled: false
ui:
# -- Route for connections between Longhorn and the OpenShift web console.
route: "longhorn-ui"
# -- Port for accessing the OpenShift web console.
port: 443
# -- Port for proxy that provides access to the OpenShift web console.
proxy: 8443
# -- Setting that allows Longhorn to generate code coverage profiles.
enableGoCoverDir: false
# -- Add extra objects manifests
extraObjects: []

View File

@@ -0,0 +1,60 @@
这是一个非常敏锐的问题。既然 Longhorn 已经通过多副本Replica实现了节点级的高可用为什么还要“多此一举”去对接 S3 呢?
简单来说:**副本是“容灾”,而 S3 是“避难所”。**
以下是 Longhorn 对接 S3 的核心理由:
---
## 1. 应对“全集群覆灭”风险
Longhorn 的默认副本(比如你设定的 3 副本)通常都分布在同一个 K3s 集群的物理节点上。
* **如果不接 S3** 如果发生极端情况(如机房断电导致文件系统损坏、人为误删整个 `longhorn-system` 命名空间、或者由于某种 Bug 导致数据卷损坏),你的数据就彻底丢了。
* **对接 S3 后:** S3 是**独立于 K3s 集群之外**的存储。即使你把整个 K3s 集群删掉再重装,只要 S3 里的备份还在,你就能在新集群里瞬间恢复 Redis 或 PG16 的数据。
---
## 2. 异地备份与低成本归档
Longhorn 将数据存在高性能的 SSD 上,价格较贵。而 S3尤其是自建的 MinIO 或云端 S3适合存放不经常访问的长久数据。
* **快照Snapshot**:存在本地,方便快速回滚。
* **备份Backup**:传到 S3。
你可以配置策略:每小时做一个快照,每天将快照同步到 S3 一次。这符合数据备份的 **3-2-1 原则**3 个副本、2 种介质、1 个异地)。
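下面是一个示意性的定时备份配置(假设使用 Longhorn 的 RecurringJob CRD任务名、cron 表达式和保留份数均为示例值):
```bash
kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-s3-backup
  namespace: longhorn-system
spec:
  name: daily-s3-backup
  task: backup          # backup 会把快照上传到备份目标S3snapshot 则只保留在本地
  cron: "0 4 * * *"     # 每天 04:00 执行
  retain: 7             # 在备份目标里保留 7 份
  concurrency: 1
  groups:
    - default           # default 组会应用到所有未显式指定组的卷
EOF
```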
---
## 3. 跨集群迁移与灾备 (DR)
如果你想把业务从“开发环境”迁移到“生产环境”,或者在另一个城市搭建一个备份集群:
1. 在 A 集群中,将 Longhorn 卷备份到 S3。
2. 在 B 集群中,连接同一个 S3。
3. 点击 **Restore**,几分钟内,B 集群就长出了一个一模一样的 Redis 卷。
Longhorn 甚至支持 **DR Volume灾备卷**,它会持续从 S3 拉取更新,处于“热备”状态。
---
## 4. 节省本地磁盘空间
Longhorn 的快照Snapshot如果积攒多了会占用大量的本地磁盘空间。
对接 S3 后,你可以将旧的备份保留在 S3 中而定期清理本地磁盘。S3 的存储空间几乎是无限的。
---
## 5. 你的 Redis 场景如何受益?
对于你的 Redis 独立 Pod
1. **崩溃恢复:** 如果你的 Redis AOF 文件因为某种原因在本地损坏了,你可以从 S3 找回昨天下午 4 点的备份。
2. **版本回滚:** 如果由于代码错误导致 Redis 里的数据被污染了,你可以通过 S3 的备份快速“穿越”回事故发生前。
---
### 配置建议
在你的 `/home/fei/k3s/009-基础设施/004-longhorn/values.yaml` 中,你会看到 `defaultSettings.defaultBackupStore` 下有 `backupTarget`。你应该将其指向你的 S3 桶地址,例如:
`s3://longhorn-backup@us-east-1/`
**由于你已经有了 S3 服务,这等于是“免费”的数据保险。你需要我提供在 Longhorn 中配置 S3 的具体参数格式吗?**
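配置并安装完成后,可以用下面的命令确认备份目标是否生效(示意,假设默认备份目标名为 default与 values.yaml 里的 persistence.backupTargetName 一致):
```bash
kubectl -n longhorn-system get backuptargets.longhorn.io
kubectl -n longhorn-system describe backuptarget default
```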

View File

@@ -0,0 +1,43 @@
{
email admin@u6.net3w.com
}
# 示例域名配置
test.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# MinIO S3 API
s3.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# MinIO Console
console.s3.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Longhorn 存储管理
longhorn.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Grafana 监控仪表板
grafana.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Prometheus 监控
prometheus.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Alertmanager 告警管理
alertmanager.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# 导航页面
dh.u6.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}

View File

@@ -0,0 +1,16 @@
#!/bin/bash
# 应用 Longhorn Ingress
echo "创建 Longhorn Ingress..."
kubectl apply -f longhorn-ingress.yaml
# 显示 Ingress 状态
echo ""
echo "Ingress 状态:"
kubectl get ingress -n longhorn-system
echo ""
echo "访问 Longhorn UI"
echo " URL: http://longhorn.local"
echo " 需要在 /etc/hosts 中添加:"
echo " <节点IP> longhorn.local"

View File

@@ -0,0 +1,19 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: longhorn-ingress
namespace: longhorn-system
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: longhorn.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: longhorn-frontend
port:
number: 80

View File

@@ -0,0 +1,202 @@
# Traefik Ingress 控制器配置
## 当前状态
K3s 默认已安装 Traefik 作为 Ingress 控制器。
- **命名空间**: kube-system
- **服务类型**: ClusterIP
- **端口**: 80 (HTTP), 443 (HTTPS)
## Traefik 配置信息
查看 Traefik 配置:
```bash
kubectl get deployment traefik -n kube-system -o yaml
```
查看 Traefik 服务:
```bash
kubectl get svc traefik -n kube-system
```
## 使用 Ingress
### 基本 HTTP Ingress 示例
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: example-ingress
namespace: default
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: example-service
port:
number: 80
```
### HTTPS Ingress 示例(使用 TLS
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: example-ingress-tls
namespace: default
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: websecure
traefik.ingress.kubernetes.io/router.tls: "true"
spec:
tls:
- hosts:
- example.com
secretName: example-tls-secret
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: example-service
port:
number: 80
```
## 创建 TLS 证书
### 使用 Let's Encrypt (cert-manager)
1. 安装 cert-manager
```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml
```
2. 创建 ClusterIssuer
```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: your-email@example.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: traefik
```
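ClusterIssuer 创建好之后,可以直接给已有的 Ingress 加上 cert-manager 的注解来触发自动签发(示意命令Ingress 名称和命名空间按实际情况替换):
```bash
kubectl -n default annotate ingress example-ingress-tls \
  cert-manager.io/cluster-issuer=letsencrypt-prod --overwrite
# 签发成功后,证书会写入 Ingress spec.tls 指定的 secretName例如 example-tls-secret
kubectl -n default get certificate
```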
### 使用自签名证书
```bash
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout tls.key -out tls.crt \
-subj "/CN=example.com/O=example"
kubectl create secret tls example-tls-secret \
--key tls.key --cert tls.crt -n default
```
## Traefik Dashboard
访问 Traefik Dashboard
```bash
kubectl port-forward -n kube-system $(kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o name) 9000:9000
```
然后访问: http://localhost:9000/dashboard/
## 常用注解
### 重定向 HTTP 到 HTTPS
注意K3s 自带的是 Traefik 2.x旧版 1.x 的 redirect-entry-point / redirect-permanent 注解已不再生效;应改用 redirectScheme 中间件,并在 Ingress 上通过注解引用:
```yaml
annotations:
  traefik.ingress.kubernetes.io/router.middlewares: default-redirect-https@kubernetescrd
```
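上面引用的 redirect-https 中间件可以这样创建(示意,名称和命名空间均为示例值):
```bash
kubectl apply -f - <<'EOF'
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: redirect-https
  namespace: default
spec:
  redirectScheme:
    scheme: https
    permanent: true
EOF
```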
### 设置超时
```yaml
annotations:
traefik.ingress.kubernetes.io/router.middlewares: default-timeout@kubernetescrd
```
### 启用 CORS
```yaml
annotations:
traefik.ingress.kubernetes.io/router.middlewares: default-cors@kubernetescrd
```
## 中间件示例
### 创建 ForwardAuth 中间件
Traefik 并没有名为 "timeout" 的 Middleware 类型(超时一般在 serversTransport / 入口点 transport 中配置),这里以 ForwardAuth 为例演示 Middleware 的写法:
```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: forward-auth
  namespace: default
spec:
  forwardAuth:
    address: http://auth-service
    trustForwardHeader: true
```
## 监控和日志
查看 Traefik 日志:
```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik -f
```
## 故障排查
### 检查 Ingress 状态
```bash
kubectl get ingress -A
kubectl describe ingress <ingress-name> -n <namespace>
```
### 检查 Traefik 配置
```bash
kubectl get ingressroute -A
kubectl get middleware -A
```
## 外部访问配置
如果需要从外部访问,可以:
1. **使用 NodePort**
```bash
kubectl patch svc traefik -n kube-system -p '{"spec":{"type":"NodePort"}}'
```
2. **使用 LoadBalancer**(需要云环境或 MetalLB
```bash
kubectl patch svc traefik -n kube-system -p '{"spec":{"type":"LoadBalancer"}}'
```
3. **使用 HostPort**(直接绑定到节点端口 80/443
## 参考资源
- Traefik 官方文档: https://doc.traefik.io/traefik/
- K3s Traefik 配置: https://docs.k3s.io/networking#traefik-ingress-controller

View File

@@ -0,0 +1,34 @@
#!/bin/bash
# 添加 Prometheus 社区 Helm 仓库
echo "添加 Prometheus Helm 仓库..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# 创建命名空间
echo "创建 monitoring 命名空间..."
kubectl create namespace monitoring
# 安装 kube-prometheus-stack (包含 Prometheus, Grafana, Alertmanager)
echo "安装 kube-prometheus-stack..."
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
-f values.yaml
# 等待部署完成
echo "等待 Prometheus 和 Grafana 启动..."
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=grafana -n monitoring --timeout=300s
# 显示状态
echo ""
echo "监控系统部署完成!"
kubectl get pods -n monitoring
kubectl get svc -n monitoring
echo ""
echo "访问信息:"
echo " Grafana: http://grafana.local (需要配置 Ingress)"
echo " 默认用户名: admin"
echo " 默认密码: prom-operator"
echo ""
echo " Prometheus: http://prometheus.local (需要配置 Ingress)"

View File

@@ -0,0 +1,59 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: grafana.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-grafana
port:
number: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: prometheus-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: prometheus.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-prometheus
port:
number: 9090
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: alertmanager-ingress
namespace: monitoring
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: alertmanager.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-alertmanager
port:
number: 9093

View File

@@ -0,0 +1,241 @@
# Prometheus + Grafana 监控系统
## 组件说明
### Prometheus
- **功能**: 时间序列数据库,收集和存储指标数据
- **存储**: 20Gi Longhorn 卷
- **数据保留**: 15 天
- **访问**: https://prometheus.u6.net3w.com
### Grafana
- **功能**: 可视化仪表板
- **存储**: 5Gi Longhorn 卷
- **默认用户**: admin
- **默认密码**: prom-operator
- **访问**: https://grafana.u6.net3w.com
### Alertmanager
- **功能**: 告警管理和通知
- **存储**: 5Gi Longhorn 卷
- **访问**: https://alertmanager.u6.net3w.com
### Node Exporter
- **功能**: 收集节点级别的系统指标CPU、内存、磁盘等
### Kube State Metrics
- **功能**: 收集 Kubernetes 资源状态指标
## 部署方式
```bash
bash deploy.sh
```
## 部署后配置
### 1. 应用 Ingress
```bash
kubectl apply -f ingress.yaml
```
### 2. 域名解析
ingress.yaml 中使用的域名是 grafana.u6.net3w.com、prometheus.u6.net3w.com、alertmanager.u6.net3w.com正常情况下直接通过域名访问即可如果是在没有公网 DNS 的环境里测试,可以在 /etc/hosts 中手动指向节点 IP
```
<节点IP> grafana.u6.net3w.com
<节点IP> prometheus.u6.net3w.com
<节点IP> alertmanager.u6.net3w.com
```
### 3. 访问 Grafana
1. 打开浏览器访问: https://grafana.u6.net3w.com
2. 使用默认凭证登录:
- 用户名: admin
- 密码: prom-operator
3. 首次登录后建议修改密码
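如果忘记或想确认当前的管理员密码,也可以直接从 Helm 创建的 Secret 里读出来(示意,假设 Secret 名为 kube-prometheus-stack-grafana键为 admin-password
```bash
kubectl -n monitoring get secret kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo
```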
## 预置仪表板
Grafana 已预装多个仪表板:
1. **Kubernetes / Compute Resources / Cluster**
- 集群整体资源使用情况
2. **Kubernetes / Compute Resources / Namespace (Pods)**
- 按命名空间查看 Pod 资源使用
3. **Kubernetes / Compute Resources / Node (Pods)**
- 按节点查看 Pod 资源使用
4. **Kubernetes / Networking / Cluster**
- 集群网络流量统计
5. **Node Exporter / Nodes**
   - 节点详细指标CPU、内存、磁盘、网络
## 监控目标
系统会自动监控:
- ✅ Kubernetes API Server
- ✅ Kubelet
- ✅ Node Exporter (节点指标)
- ✅ Kube State Metrics (K8s 资源状态)
- ✅ CoreDNS
- ✅ Prometheus 自身
- ✅ Grafana
## 添加自定义监控
### 监控 Redis
创建 ServiceMonitor前提是已经为 Redis 部署了 redis-exporter并在对应 Service 中暴露了 metrics 端口,端口名需与下面的 endpoints.port 一致):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: redis-monitor
namespace: monitoring
spec:
selector:
matchLabels:
app: redis
namespaceSelector:
matchNames:
- redis
endpoints:
- port: redis
interval: 30s
```
### 监控 PostgreSQL
需要部署 postgres-exporter
```bash
helm install postgres-exporter prometheus-community/prometheus-postgres-exporter \
--namespace postgresql \
--set config.datasource.host=postgresql-service.postgresql.svc.cluster.local \
--set config.datasource.user=postgres \
--set config.datasource.password=postgres123
```
## 告警配置
### 查看告警规则
```bash
kubectl get prometheusrules -n monitoring
```
### 自定义告警规则
创建 PrometheusRule
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: custom-alerts
namespace: monitoring
spec:
groups:
- name: custom
interval: 30s
rules:
- alert: HighMemoryUsage
expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "节点内存使用率超过 90%"
description: "节点 {{ $labels.instance }} 内存使用率为 {{ $value | humanizePercentage }}"
```
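保存为文件后应用并确认规则已创建(示意;如果规则没有被 Prometheus 加载,通常是因为 chart 默认的 ruleSelector 只匹配带 release 标签的 PrometheusRule可以按需补上标签
```bash
kubectl apply -f custom-alerts.yaml
kubectl -n monitoring get prometheusrule custom-alerts
# 如有需要,补上 release 标签让 Prometheus 选中该规则
kubectl -n monitoring label prometheusrule custom-alerts release=kube-prometheus-stack --overwrite
```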
## 配置告警通知
编辑 Alertmanager 配置:
```bash
kubectl edit secret alertmanager-kube-prometheus-stack-alertmanager -n monitoring
```
添加邮件、Slack、钉钉等通知渠道。
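修改前可以先把当前配置解码出来确认格式(示意,假设配置保存在 key 为 alertmanager.yaml 的字段中):
```bash
kubectl -n monitoring get secret alertmanager-kube-prometheus-stack-alertmanager \
  -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
```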
## 数据持久化
所有数据都存储在 Longhorn 卷上:
- Prometheus 数据: 20Gi
- Grafana 配置: 5Gi
- Alertmanager 数据: 5Gi
可以通过 Longhorn UI 创建快照和备份到 S3。
## 常用操作
### 查看 Prometheus 目标
访问: https://prometheus.u6.net3w.com/targets
### 查看告警
访问: https://alertmanager.u6.net3w.com
### 导入自定义仪表板
1. 访问 Grafana
2. 点击 "+" -> "Import"
3. 输入仪表板 ID 或上传 JSON
推荐仪表板:
- Node Exporter Full: 1860
- Kubernetes Cluster Monitoring: 7249
- Longhorn: 13032
### 查看日志
```bash
# Prometheus 日志
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -f
# Grafana 日志
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana -f
```
## 性能优化
### 调整数据保留时间
编辑 values.yaml 中的 `retention` 参数,然后:
```bash
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring -f values.yaml
```
### 调整采集间隔
默认采集间隔为 30 秒,可以在 ServiceMonitor 中调整。
## 故障排查
### Prometheus 无法采集数据
```bash
# 检查 ServiceMonitor
kubectl get servicemonitor -A
# 检查 Prometheus 配置
kubectl get prometheus -n monitoring -o yaml
```
### Grafana 无法连接 Prometheus
检查 Grafana 数据源配置:
1. 登录 Grafana
2. Configuration -> Data Sources
3. 确认 Prometheus URL 正确
## 卸载
```bash
helm uninstall kube-prometheus-stack -n monitoring
kubectl delete namespace monitoring
```
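注意helm uninstall 不会删除 Prometheus Operator 安装的 CRD如需彻底清理可以先列出再逐个确认删除示意
```bash
kubectl get crd | grep monitoring.coreos.com
# 确认不再需要后再删除,例如:
# kubectl delete crd prometheuses.monitoring.coreos.com
```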
## 参考资源
- Prometheus 文档: https://prometheus.io/docs/
- Grafana 文档: https://grafana.com/docs/
- kube-prometheus-stack: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack

View File

@@ -0,0 +1,89 @@
# Prometheus Operator 配置
prometheusOperator:
enabled: true
# Prometheus 配置
prometheus:
enabled: true
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: longhorn
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
resources:
requests:
memory: 512Mi
cpu: 250m
limits:
memory: 2Gi
cpu: 1000m
# Grafana 配置
grafana:
enabled: true
adminPassword: prom-operator
persistence:
enabled: true
storageClassName: longhorn
size: 5Gi
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 500m
# Alertmanager 配置
alertmanager:
enabled: true
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: longhorn
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
# Node Exporter (收集节点指标)
nodeExporter:
enabled: true
# Kube State Metrics (收集 K8s 资源指标)
kubeStateMetrics:
enabled: true
# 默认监控规则
defaultRules:
create: true
rules:
alertmanager: true
etcd: true
configReloaders: true
general: true
k8s: true
kubeApiserverAvailability: true
kubeApiserverSlos: true
kubelet: true
kubeProxy: true
kubePrometheusGeneral: true
kubePrometheusNodeRecording: true
kubernetesApps: true
kubernetesResources: true
kubernetesStorage: true
kubernetesSystem: true
kubeScheduler: true
kubeStateMetrics: true
network: true
node: true
nodeExporterAlerting: true
nodeExporterRecording: true
prometheus: true
prometheusOperator: true

View File

@@ -0,0 +1,40 @@
#!/bin/bash
# KEDA 部署脚本
echo "开始部署 KEDA..."
# 设置 KUBECONFIG
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# 添加 KEDA Helm 仓库
echo "添加 KEDA Helm 仓库..."
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
# 创建命名空间
echo "创建 keda 命名空间..."
kubectl create namespace keda --dry-run=client -o yaml | kubectl apply -f -
# 安装 KEDA
echo "安装 KEDA..."
helm install keda kedacore/keda \
--namespace keda \
-f values.yaml
# 等待 KEDA 组件就绪
echo "等待 KEDA 组件启动..."
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=keda-operator -n keda --timeout=300s
# 显示状态
echo ""
echo "KEDA 部署完成!"
kubectl get pods -n keda
kubectl get svc -n keda
echo ""
echo "验证 KEDA CRD"
kubectl get crd | grep keda
echo ""
echo "KEDA 已成功部署到命名空间: keda"

View File

@@ -0,0 +1,16 @@
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: my-web-app-scaler
spec:
  hosts:                          # 新版 HTTP Add-on 使用 hosts 列表(旧版为单数 host 字段)
    - my-app.example.com          # 你的域名
targetPendingRequests: 100
scaleTargetRef:
name: your-deployment-name # 你想缩放到 0 的应用名
kind: Deployment
apiVersion: apps/v1
service: your-service-name
port: 80
replicas:
min: 0 # 核心:无人访问时缩放为 0
max: 10

View File

@@ -0,0 +1,22 @@
#!/bin/bash
# 安装 KEDA HTTP Add-on
echo "安装 KEDA HTTP Add-on..."
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# 安装 HTTP Add-on使用默认配置
helm install http-add-on kedacore/keda-add-ons-http \
--namespace keda
echo "等待 HTTP Add-on 组件启动..."
sleep 10
echo ""
echo "HTTP Add-on 部署完成!"
kubectl get pods -n keda | grep http
echo ""
echo "HTTP Add-on 服务:"
kubectl get svc -n keda | grep http

View File

@@ -0,0 +1,458 @@
# KEDA 自动扩缩容
## 功能说明
KEDA (Kubernetes Event Driven Autoscaling) 为 K3s 集群提供基于事件驱动的自动扩缩容能力。
### 核心功能
- **按需启动/停止服务**:空闲时自动缩容到 0节省资源
- **基于指标自动扩缩容**:根据实际负载动态调整副本数
- **多种触发器支持**CPU、内存、Prometheus 指标、数据库连接等
- **与 Prometheus 集成**:利用现有监控数据进行扩缩容决策
## 部署方式
```bash
cd /home/fei/k3s/009-基础设施/007-keda
bash deploy.sh
```
## 已配置的服务
### 1. Navigation 导航服务 ✅
- **最小副本数**: 0空闲时完全停止
- **最大副本数**: 10
- **触发条件**:
- HTTP 请求速率 > 10 req/min
- CPU 使用率 > 60%
- **冷却期**: 3 分钟
**配置文件**: `scalers/navigation-scaler.yaml`
### 2. Redis 缓存服务 ⏳
- **最小副本数**: 0空闲时完全停止
- **最大副本数**: 5
- **触发条件**:
- 有客户端连接
- CPU 使用率 > 70%
- **冷却期**: 5 分钟
**配置文件**: `scalers/redis-scaler.yaml`
**状态**: 待应用(需要先为 Redis 添加 Prometheus exporter
### 3. PostgreSQL 数据库 ❌
**不推荐使用 KEDA 扩展 PostgreSQL**
原因:
- PostgreSQL 是有状态服务,多个副本会导致存储冲突
- 需要配置主从复制才能安全扩展
- 建议使用 PostgreSQL Operator 或 PgBouncer + KEDA
详细说明:`scalers/postgresql-说明.md`
## 应用 ScaledObject
### 部署所有 Scaler
```bash
# 应用 Navigation Scaler
kubectl apply -f scalers/navigation-scaler.yaml
# 应用 Redis Scaler需要先配置 Redis exporter
kubectl apply -f scalers/redis-scaler.yaml
# ⚠️ PostgreSQL 不推荐使用 KEDA 扩展
# 详见: scalers/postgresql-说明.md
```
### 查看 ScaledObject 状态
```bash
# 查看所有 ScaledObject
kubectl get scaledobject -A
# 查看详细信息
kubectl describe scaledobject navigation-scaler -n navigation
kubectl describe scaledobject redis-scaler -n redis
kubectl describe scaledobject postgresql-scaler -n postgresql
```
### 查看自动创建的 HPA
```bash
# KEDA 会自动创建 HorizontalPodAutoscaler
kubectl get hpa -A
```
## 支持的触发器类型
### 1. Prometheus 指标
```yaml
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: custom_metric
query: sum(rate(http_requests_total[1m]))
threshold: "100"
```
### 2. CPU/内存使用率
```yaml
triggers:
- type: cpu
metadata:
type: Utilization
value: "70"
- type: memory
metadata:
type: Utilization
value: "80"
```
### 3. Redis 队列长度
```yaml
triggers:
- type: redis
metadata:
address: redis.redis.svc.cluster.local:6379
listName: mylist
listLength: "5"
```
### 4. PostgreSQL 查询
```yaml
triggers:
- type: postgresql
metadata:
connectionString: postgresql://user:pass@host:5432/db
query: "SELECT COUNT(*) FROM tasks WHERE status='pending'"
targetQueryValue: "10"
```
### 5. Cron 定时触发
```yaml
triggers:
- type: cron
metadata:
timezone: Asia/Shanghai
start: 0 8 * * * # 每天 8:00 扩容
end: 0 18 * * * # 每天 18:00 缩容
desiredReplicas: "3"
```
## 为新服务添加自动扩缩容
### 步骤 1: 确保服务配置正确
服务的 Deployment 必须配置 `resources.requests`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
# 不要设置 replicas由 KEDA 管理
template:
spec:
containers:
- name: myapp
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
```
### 步骤 2: 创建 ScaledObject
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: myapp-scaler
namespace: myapp
spec:
scaleTargetRef:
name: myapp
minReplicaCount: 0
maxReplicaCount: 10
pollingInterval: 30
cooldownPeriod: 300
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: myapp_requests
query: sum(rate(http_requests_total{app="myapp"}[1m]))
threshold: "50"
```
### 步骤 3: 应用配置
```bash
kubectl apply -f myapp-scaler.yaml
```
## 监控和调试
### 查看 KEDA 日志
```bash
# Operator 日志
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f
# Metrics Server 日志
kubectl logs -n keda -l app.kubernetes.io/name=keda-metrics-apiserver -f
```
### 查看扩缩容事件
```bash
# 查看 HPA 事件
kubectl describe hpa -n <namespace>
# 查看 Pod 事件
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
### 在 Prometheus 中查询 KEDA 指标
访问 https://prometheus.u6.net3w.com查询
```promql
# KEDA Scaler 活跃状态
keda_scaler_active
# KEDA Scaler 错误
keda_scaler_errors_total
# 当前指标值
keda_scaler_metrics_value
```
### 在 Grafana 中查看 KEDA 仪表板
1. 访问 https://grafana.u6.net3w.com
2. 导入 KEDA 官方仪表板 ID: **14691**
3. 查看实时扩缩容状态
## 测试自动扩缩容
### 测试 Navigation 服务
**测试缩容到 0**
```bash
# 1. 停止访问导航页面,等待 3 分钟
sleep 180
# 2. 检查副本数
kubectl get deployment navigation -n navigation
# 预期输出READY 0/0
```
**测试从 0 扩容:**
```bash
# 1. 访问导航页面
curl https://dh.u6.net3w.com
# 2. 监控副本数变化
kubectl get deployment navigation -n navigation -w
# 预期:副本数从 0 变为 1约 10-30 秒)
```
### 测试 Redis 服务
**测试基于连接数扩容:**
```bash
# 1. 连接 Redis
kubectl run redis-client --rm -it --image=redis:7-alpine -- redis-cli -h redis.redis.svc.cluster.local
# 2. 在另一个终端监控
kubectl get deployment redis -n redis -w
# 预期:有连接时副本数从 0 变为 1
```
### 测试 PostgreSQL 服务
**测试基于连接数扩容(仅在已配置只读副本等可扩展架构时适用;当前单实例部署不建议扩展,详见 `scalers/postgresql-说明.md`):**
```bash
# 1. 创建多个数据库连接
for i in {1..15}; do
kubectl run pg-client-$i --image=postgres:16-alpine --restart=Never -- \
psql -h postgresql-service.postgresql.svc.cluster.local -U postgres -c "SELECT pg_sleep(60);" &
done
# 2. 监控副本数
kubectl get statefulset postgresql -n postgresql -w
# 预期:连接数超过 10 时,副本数从 1 增加到 2
```
## 故障排查
### ScaledObject 未生效
**检查 ScaledObject 状态:**
```bash
kubectl describe scaledobject <name> -n <namespace>
```
**常见问题:**
1. **Deployment 设置了固定 replicas**
- 解决:移除 Deployment 中的 `replicas` 字段
2. **缺少 resources.requests**
- 解决:为容器添加 `resources.requests` 配置
3. **Prometheus 查询错误**
- 解决:在 Prometheus UI 中测试查询语句
### 服务无法缩容到 0
**可能原因:**
1. **仍有活跃连接或请求**
- 检查:查看 Prometheus 指标值
2. **cooldownPeriod 未到**
- 检查:等待冷却期结束
3. **minReplicaCount 设置错误**
- 检查:确认 `minReplicaCount: 0`
### 扩容速度慢
**优化建议:**
1. **减少 pollingInterval**
```yaml
pollingInterval: 15 # 从 30 秒改为 15 秒
```
2. **降低 threshold**
```yaml
threshold: "5" # 降低触发阈值
```
3. **使用多个触发器**
```yaml
triggers:
- type: prometheus
# ...
- type: cpu
# ...
```
## 最佳实践
### 1. 合理设置副本数范围
- **无状态服务**`minReplicaCount: 0`,节省资源
- **有状态服务**`minReplicaCount: 1`,保证可用性
- **关键服务**`minReplicaCount: 2`,保证高可用
### 2. 选择合适的冷却期
- **快速响应服务**`cooldownPeriod: 60-180`1-3 分钟)
- **一般服务**`cooldownPeriod: 300`5 分钟)
- **数据库服务**`cooldownPeriod: 600-900`10-15 分钟)
### 3. 监控扩缩容行为
- 定期查看 Grafana 仪表板
- 设置告警规则PrometheusRule 示意见下方)
- 分析扩缩容历史
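告警规则可以用 PrometheusRule 资源下发,下面是一个针对 `keda_scaler_errors_total` 的最小示意(名称、阈值均为假设,`release` 标签与监控栈的发现规则保持一致):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-scaler-alerts             # 假设的名称
  namespace: monitoring
  labels:
    release: kube-prometheus-stack     # 让 Prometheus Operator 加载该规则
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScalerErrors
          expr: increase(keda_scaler_errors_total[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "KEDA Scaler 出现错误"
            description: "过去 5 分钟内 keda_scaler_errors_total 有增长,请检查 ScaledObject 配置"
```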
### 4. 测试冷启动时间
- 测量从 0 扩容到可用的时间
- 优化镜像大小和启动脚本
- 考虑使用 `minReplicaCount: 1` 避免冷启动
## 配置参考
### ScaledObject 完整配置示例
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: example-scaler
namespace: example
spec:
scaleTargetRef:
name: example-deployment
kind: Deployment # 可选Deployment, StatefulSet
apiVersion: apps/v1 # 可选
minReplicaCount: 0 # 最小副本数
maxReplicaCount: 10 # 最大副本数
pollingInterval: 30 # 轮询间隔(秒)
cooldownPeriod: 300 # 缩容冷却期(秒)
idleReplicaCount: 0 # 空闲时的副本数
fallback: # 故障回退配置
failureThreshold: 3
replicas: 2
advanced: # 高级配置
restoreToOriginalReplicaCount: false
horizontalPodAutoscalerConfig:
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus:9090
metricName: custom_metric
query: sum(rate(metric[1m]))
threshold: "100"
```
## 卸载 KEDA
```bash
# 删除所有 ScaledObject
kubectl delete scaledobject --all -A
# 卸载 KEDA
helm uninstall keda -n keda
# 删除命名空间
kubectl delete namespace keda
```
## 参考资源
- KEDA 官方文档: https://keda.sh/docs/
- KEDA Scalers: https://keda.sh/docs/scalers/
- KEDA GitHub: https://github.com/kedacore/keda
- Grafana 仪表板: https://grafana.com/grafana/dashboards/14691
---
**KEDA 让您的 K3s 集群更智能、更高效!** 🚀

View File

@@ -0,0 +1,380 @@
# KEDA HTTP Add-on 自动缩容到 0 配置指南
本指南说明如何使用 KEDA HTTP Add-on 实现应用在无流量时自动缩容到 0有访问时自动启动。
## 前提条件
1. K3s 集群已安装
2. KEDA 已安装
3. KEDA HTTP Add-on 已安装
4. Traefik 作为 Ingress Controller
### 检查 KEDA HTTP Add-on 是否已安装
```bash
kubectl get pods -n keda | grep http
```
应该看到类似输出:
```
keda-add-ons-http-controller-manager-xxx 1/1 Running
keda-add-ons-http-external-scaler-xxx 1/1 Running
keda-add-ons-http-interceptor-xxx 1/1 Running
```
### 如果未安装,执行以下命令安装
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install http-add-on kedacore/keda-add-ons-http --namespace keda
```
## 配置步骤
### 1. 准备应用的基础资源
确保你的应用已经有以下资源:
- Deployment
- Service
- Namespace
示例:
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: myapp
spec:
replicas: 1
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: your-image:tag
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: myapp
namespace: myapp
spec:
selector:
app: myapp
ports:
- port: 80
targetPort: 80
```
### 2. 创建 HTTPScaledObject
这是实现自动缩容到 0 的核心配置。
```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: myapp-http-scaler
namespace: myapp # 必须与应用在同一个 namespace
spec:
hosts:
- myapp.example.com # 你的域名
pathPrefixes:
- / # 匹配的路径前缀
scaleTargetRef:
name: myapp # Deployment 名称
kind: Deployment
apiVersion: apps/v1
service: myapp # Service 名称
port: 80 # Service 端口
replicas:
min: 0 # 空闲时缩容到 0
max: 10 # 最多扩容到 10 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100 # 每秒 100 个请求时扩容
window: 1m
scaledownPeriod: 300 # 5 分钟300秒无流量后缩容到 0
```
**重要参数说明:**
- `hosts`: 你的应用域名
- `scaleTargetRef.name`: 你的 Deployment 名称
- `scaleTargetRef.service`: 你的 Service 名称
- `scaleTargetRef.port`: 你的 Service 端口
- `replicas.min: 0`: 允许缩容到 0
- `scaledownPeriod`: 无流量后多久缩容(秒)
### 3. 创建 Traefik IngressRoute
**重要IngressRoute 必须在 keda namespace 中创建**,因为它需要引用 keda namespace 的拦截器服务。
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: myapp-ingress
namespace: keda # 注意:必须在 keda namespace
spec:
entryPoints:
- web # HTTP 入口
# - websecure # 如果需要 HTTPS添加这个
routes:
- match: Host(`myapp.example.com`) # 你的域名
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080
```
**如果需要 HTTPS添加 TLS 配置:**
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: myapp-ingress
namespace: keda
spec:
entryPoints:
- websecure
routes:
- match: Host(`myapp.example.com`)
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080
tls:
certResolver: letsencrypt # 你的证书解析器
```
### 4. 完整配置文件模板
将以下内容保存为 `myapp-keda-scaler.yaml`,并根据你的应用修改相应的值:
```yaml
---
# HTTPScaledObject - 实现自动缩容到 0
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: myapp-http-scaler
namespace: myapp # 改为你的 namespace
spec:
hosts:
- myapp.example.com # 改为你的域名
pathPrefixes:
- /
scaleTargetRef:
name: myapp # 改为你的 Deployment 名称
kind: Deployment
apiVersion: apps/v1
service: myapp # 改为你的 Service 名称
port: 80 # 改为你的 Service 端口
replicas:
min: 0
max: 10
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100
window: 1m
scaledownPeriod: 300 # 5 分钟无流量后缩容
---
# Traefik IngressRoute - 路由流量到 KEDA 拦截器
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: myapp-ingress
namespace: keda # 必须在 keda namespace
spec:
entryPoints:
- web
routes:
- match: Host(`myapp.example.com`) # 改为你的域名
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080
```
### 5. 应用配置
```bash
kubectl apply -f myapp-keda-scaler.yaml
```
### 6. 验证配置
```bash
# 查看 HTTPScaledObject 状态
kubectl get httpscaledobject -n myapp
# 应该看到 READY = True
# NAME TARGETWORKLOAD TARGETSERVICE MINREPLICAS MAXREPLICAS AGE READY
# myapp-http-scaler apps/v1/Deployment/myapp myapp:80 0 10 10s True
# 查看 IngressRoute
kubectl get ingressroute -n keda
# 查看当前 Pod 数量
kubectl get pods -n myapp
```
## 工作原理
1. **有流量时**
- 用户访问 `myapp.example.com`
- Traefik 将流量路由到 KEDA HTTP 拦截器
- 拦截器检测到请求,通知 KEDA 启动 Pod
- Pod 启动后5-10秒拦截器将流量转发到应用
- 用户看到正常响应(首次访问可能有延迟)
2. **无流量时**
- 5 分钟scaledownPeriod无请求后
- KEDA 自动将 Deployment 缩容到 0
- 不消耗任何计算资源
## 常见问题排查
### 1. 访问返回 404
**检查 IngressRoute 是否在 keda namespace**
```bash
kubectl get ingressroute -n keda | grep myapp
```
如果不在,删除并重新创建:
```bash
kubectl delete ingressroute myapp-ingress -n myapp # 删除错误的
kubectl apply -f myapp-keda-scaler.yaml # 重新创建
```
### 2. HTTPScaledObject READY = False
**查看详细错误信息:**
```bash
kubectl describe httpscaledobject myapp-http-scaler -n myapp
```
**常见错误:**
- `workload already managed by ScaledObject`: 删除旧的 ScaledObject
```bash
kubectl delete scaledobject myapp-scaler -n myapp
```
### 3. Pod 没有自动缩容到 0
**检查是否有旧的 ScaledObject 阻止缩容:**
```bash
kubectl get scaledobject -n myapp
```
如果有,删除它:
```bash
kubectl delete scaledobject <name> -n myapp
```
### 4. 查看 KEDA 拦截器日志
```bash
kubectl logs -n keda -l app.kubernetes.io/name=keda-add-ons-http-interceptor --tail=50
```
### 5. 测试拦截器是否工作
```bash
# 获取拦截器服务 IP
kubectl get svc keda-add-ons-http-interceptor-proxy -n keda
# 直接测试拦截器
curl -H "Host: myapp.example.com" http://<CLUSTER-IP>:8080
```
## 调优建议
### 调整缩容时间
根据你的应用特点调整 `scaledownPeriod`
- **频繁访问的应用**:设置较长时间(如 600 秒 = 10 分钟)
- **偶尔访问的应用**:设置较短时间(如 180 秒 = 3 分钟)
- **演示/测试环境**:可以设置很短(如 60 秒 = 1 分钟)
```yaml
scaledownPeriod: 600 # 10 分钟
```
### 调整扩容阈值
根据应用负载调整 `targetValue`
```yaml
scalingMetric:
requestRate:
targetValue: 50 # 每秒 50 个请求时扩容(更敏感)
```
### 调整最大副本数
```yaml
replicas:
min: 0
max: 20 # 根据你的资源和需求调整
```
## 监控和观察
### 实时监控 Pod 变化
```bash
watch -n 2 'kubectl get pods -n myapp'
```
### 查看 HTTPScaledObject 事件
```bash
kubectl describe httpscaledobject myapp-http-scaler -n myapp
```
### 查看 Deployment 副本数变化
```bash
kubectl get deployment myapp -n myapp -w
```
## 完整示例navigation 应用
参考 `navigation-complete.yaml` 文件,这是一个完整的工作示例。
## 注意事项
1. **首次访问延迟**Pod 从 0 启动需要 5-10 秒,用户首次访问会有延迟
2. **数据库连接**:确保应用能够快速重新建立数据库连接
3. **会话状态**:不要在 Pod 中存储会话状态,使用 Redis 等外部存储
4. **健康检查**:配置合理的 readinessProbe确保 Pod 就绪后才接收流量
5. **资源限制**:设置合理的 resources limits避免启动过慢第 4、5 点的示意片段见下方)
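针对第 4、5 点,下面是一个容器级别的配置片段示意(容器名、探针路径均为假设,请按实际应用调整):
```yaml
# Deployment 中 containers 部分的示意片段
containers:
  - name: myapp
    image: your-image:tag
    imagePullPolicy: IfNotPresent      # 镜像已在节点上时跳过拉取,缩短冷启动时间
    ports:
      - containerPort: 80
    readinessProbe:                    # 就绪后才接收来自拦截器的流量
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 3
      periodSeconds: 5
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi
```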
## 参考资源
- KEDA 官方文档: https://keda.sh/
- KEDA HTTP Add-on: https://github.com/kedacore/http-add-on
- Traefik IngressRoute: https://doc.traefik.io/traefik/routing/providers/kubernetes-crd/

View File

@@ -0,0 +1,45 @@
---
# HTTPScaledObject - 用于实现缩容到 0 的核心配置
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: navigation-http-scaler
namespace: navigation
spec:
hosts:
- dh.u6.net3w.com
pathPrefixes:
- /
scaleTargetRef:
name: navigation
kind: Deployment
apiVersion: apps/v1
service: navigation
port: 80
replicas:
min: 0 # 空闲时缩容到 0
max: 10 # 最多 10 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100 # 每秒 100 个请求时扩容
window: 1m
scaledownPeriod: 300 # 5 分钟无流量后缩容到 0
---
# Traefik IngressRoute - 将流量路由到 KEDA HTTP Add-on 的拦截器
# 注意:必须在 keda namespace 中才能引用该 namespace 的服务
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: navigation-ingress
namespace: keda
spec:
entryPoints:
- web
routes:
- match: Host(`dh.u6.net3w.com`)
kind: Rule
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080

View File

@@ -0,0 +1,24 @@
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: navigation-http-scaler
namespace: navigation
spec:
hosts:
- dh.u6.net3w.com
pathPrefixes:
- /
scaleTargetRef:
name: navigation
kind: Deployment
apiVersion: apps/v1
service: navigation
port: 80
replicas:
min: 0 # 空闲时缩容到 0
max: 10 # 最多 10 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 100 # 每秒 100 个请求时扩容
window: 1m

View File

@@ -0,0 +1,19 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: navigation-ingress
namespace: navigation
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: dh.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: keda-add-ons-http-interceptor-proxy
port:
number: 8080

View File

@@ -0,0 +1,23 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: navigation-scaler
namespace: navigation
spec:
scaleTargetRef:
name: navigation
minReplicaCount: 1 # 至少保持 1 个副本HPA 限制)
maxReplicaCount: 10 # 最多 10 个副本
pollingInterval: 15 # 每 15 秒检查一次
cooldownPeriod: 180 # 缩容冷却期 3 分钟
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: nginx_http_requests_total
query: sum(rate(nginx_http_requests_total{namespace="navigation"}[1m]))
threshold: "10" # 每分钟超过 10 个请求时启动
- type: cpu
metricType: Utilization
metadata:
value: "60" # CPU 使用率超过 60% 时扩容

View File

@@ -0,0 +1,261 @@
# ⚠️ PostgreSQL 不适合使用 KEDA 自动扩缩容
## 问题说明
对于传统的 PostgreSQL 架构,直接通过 KEDA 增加副本数会导致:
### 1. 存储冲突
- 多个 Pod 尝试挂载同一个 PVC
- ReadWriteOnce 存储只能被一个 Pod 使用
- 会导致 Pod 启动失败
### 2. 数据损坏风险
- 如果使用 ReadWriteMany 存储,多个实例同时写入会导致数据损坏
- PostgreSQL 不支持多主写入
- 没有锁机制保护数据一致性
### 3. 缺少主从复制
- 需要配置 PostgreSQL 流复制Streaming Replication
- 需要配置主从切换机制
- 需要使用专门的 PostgreSQL Operator
## 正确的 PostgreSQL 扩展方案
### 方案 1: 使用 PostgreSQL Operator
推荐使用专业的 PostgreSQL Operator
#### Zalando PostgreSQL Operator
```bash
# 添加 Helm 仓库
helm repo add postgres-operator-charts https://opensource.zalando.com/postgres-operator/charts/postgres-operator
# 安装 Operator
helm install postgres-operator postgres-operator-charts/postgres-operator
# 创建 PostgreSQL 集群
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
name: acid-minimal-cluster
spec:
teamId: "acid"
volume:
size: 10Gi
storageClass: longhorn
numberOfInstances: 3 # 1 主 + 2 从
users:
zalando:
- superuser
- createdb
databases:
foo: zalando
postgresql:
version: "16"
```
#### CloudNativePG Operator
```bash
# 安装 CloudNativePG
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.22/releases/cnpg-1.22.0.yaml
# 创建集群
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: cluster-example
spec:
instances: 3
storage:
storageClass: longhorn
size: 10Gi
```
### 方案 2: 读写分离 + KEDA
如果需要使用 KEDA正确的架构是
```
┌─────────────────┐
│ 主库 (Master) │ ← 固定 1 个副本,处理写入
│ StatefulSet │
└─────────────────┘
│ 流复制
┌─────────────────┐
│ 从库 (Replica) │ ← KEDA 管理,处理只读查询
│ Deployment │ 可以 0-N 个副本
└─────────────────┘
```
**配置示例:**
```yaml
# 主库 - 固定副本
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql-master
spec:
replicas: 1 # 固定 1 个
# ... 配置主库
---
# 从库 - KEDA 管理
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgresql-replica
spec:
# replicas 由 KEDA 管理
# ... 配置从库(只读)
---
# KEDA ScaledObject - 只扩展从库
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: postgresql-replica-scaler
spec:
scaleTargetRef:
name: postgresql-replica # 只针对从库
minReplicaCount: 0
maxReplicaCount: 5
triggers:
- type: postgresql
metadata:
connectionString: postgresql://user:pass@postgresql-master:5432/db
query: "SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active' AND query NOT LIKE '%pg_stat_activity%'"
targetQueryValue: "10"
```
### 方案 3: 垂直扩展(推荐用于单实例)
对于单实例 PostgreSQL使用 VPA (Vertical Pod Autoscaler) 更合适:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: postgresql-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: StatefulSet
name: postgresql
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: postgresql
minAllowed:
cpu: 250m
memory: 512Mi
maxAllowed:
cpu: 2000m
memory: 4Gi
```
## 当前部署建议
对于您当前的 PostgreSQL 部署(`/home/fei/k3s/010-中间件/002-postgresql/`
### ❌ 不要使用 KEDA 水平扩展
- 当前是单实例 StatefulSet
- 没有配置主从复制
- 直接扩展会导致数据问题
### ✅ 推荐的优化方案
1. **保持单实例运行**
```yaml
replicas: 1 # 固定不变
```
2. **优化资源配置**
```yaml
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 2000m
memory: 4Gi
```
3. **配置连接池**
- 使用 PgBouncer 作为连接池
- PgBouncer 可以使用 KEDA 扩展
4. **定期备份**
- 使用 Longhorn 快照
- 备份到 S3
## PgBouncer + KEDA 方案
这是最实用的方案PostgreSQL 保持单实例PgBouncer 使用 KEDA 扩展。
```yaml
# PostgreSQL - 固定单实例
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
spec:
replicas: 1 # 固定
# ...
---
# PgBouncer - 连接池
apiVersion: apps/v1
kind: Deployment
metadata:
name: pgbouncer
spec:
# replicas 由 KEDA 管理
template:
spec:
containers:
- name: pgbouncer
image: pgbouncer/pgbouncer:latest
# ...
---
# KEDA ScaledObject - 扩展 PgBouncer
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: pgbouncer-scaler
spec:
scaleTargetRef:
name: pgbouncer
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: postgresql
metadata:
connectionString: postgresql://postgres:postgres123@postgresql:5432/postgres
query: "SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active'"
targetQueryValue: "20"
```
## 总结
| 方案 | 适用场景 | 复杂度 | 推荐度 |
|------|---------|--------|--------|
| PostgreSQL Operator | 生产环境,需要高可用 | 高 | ⭐⭐⭐⭐⭐ |
| 读写分离 + KEDA | 读多写少场景 | 中 | ⭐⭐⭐⭐ |
| PgBouncer + KEDA | 连接数波动大 | 低 | ⭐⭐⭐⭐⭐ |
| VPA 垂直扩展 | 单实例,资源需求变化 | 低 | ⭐⭐⭐ |
| 直接 KEDA 扩展 | ❌ 不适用 | - | ❌ |
**对于当前部署,建议保持 PostgreSQL 单实例运行,不使用 KEDA 扩展。**
如果需要扩展能力,优先考虑:
1. 部署 PgBouncer 连接池 + KEDA
2. 或者迁移到 PostgreSQL Operator
---
**重要提醒:有状态服务的扩展需要特殊处理,不能简单地增加副本数!** ⚠️

View File

@@ -0,0 +1,23 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-scaler
namespace: redis
spec:
scaleTargetRef:
name: redis
minReplicaCount: 0 # 空闲时缩容到 0
maxReplicaCount: 5 # 最多 5 个副本
pollingInterval: 30 # 每 30 秒检查一次
cooldownPeriod: 300 # 缩容冷却期 5 分钟
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: redis_connected_clients
query: sum(redis_connected_clients{namespace="redis"})
threshold: "1" # 有连接时启动
- type: cpu
metricType: Utilization
metadata:
value: "70" # CPU 使用率超过 70% 时扩容

View File

@@ -0,0 +1,41 @@
# KEDA Helm 配置
# Operator 配置
operator:
replicaCount: 1
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
# Metrics Server 配置
metricsServer:
replicaCount: 1
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
# 与 Prometheus 集成
prometheus:
metricServer:
enabled: true
port: 9022
path: /metrics
operator:
enabled: true
port: 8080
path: /metrics
# ServiceMonitor 用于 Prometheus 抓取
serviceMonitor:
enabled: true
namespace: keda
additionalLabels:
release: kube-prometheus-stack

View File

@@ -0,0 +1,197 @@
# KEDA 部署最终总结
## ✅ 成功部署
### KEDA 核心组件
- **keda-operator**: ✅ 运行中
- **keda-metrics-apiserver**: ✅ 运行中
- **keda-admission-webhooks**: ✅ 运行中
- **命名空间**: keda
### 已配置的服务
| 服务 | 状态 | 最小副本 | 最大副本 | 说明 |
|------|------|---------|---------|------|
| Navigation | ✅ 已应用 | 0 | 10 | 空闲时自动缩容到 0 |
| Redis | ⏳ 待应用 | 0 | 5 | 需要先配置 Prometheus exporter |
| PostgreSQL | ❌ 不适用 | - | - | 有状态服务,不能直接扩展 |
## ⚠️ 重要修正PostgreSQL
### 问题说明
PostgreSQL 是有状态服务,**不能**直接使用 KEDA 扩展副本数,原因:
1. **存储冲突**: 多个 Pod 尝试挂载同一个 PVC 会失败
2. **数据损坏**: 如果使用 ReadWriteMany多实例写入会导致数据损坏
3. **缺少复制**: 没有配置主从复制,无法保证数据一致性
### 正确方案
已创建详细说明文档:`/home/fei/k3s/009-基础设施/007-keda/scalers/postgresql-说明.md`
推荐方案:
1. **PostgreSQL Operator** (Zalando 或 CloudNativePG)
2. **PgBouncer + KEDA** (扩展连接池而非数据库)
3. **读写分离** (主库固定,从库使用 KEDA)
## 📁 文件结构
```
/home/fei/k3s/009-基础设施/007-keda/
├── deploy.sh # ✅ 部署脚本
├── values.yaml # ✅ KEDA Helm 配置
├── readme.md # ✅ 详细使用文档
├── 部署总结.md # ✅ 部署总结
└── scalers/
├── navigation-scaler.yaml # ✅ 已应用
├── redis-scaler.yaml # ⏳ 待应用
└── postgresql-说明.md # ⚠️ 重要说明
```
## 🧪 验证结果
### Navigation 服务自动扩缩容
```bash
# 当前状态
$ kubectl get deployment navigation -n navigation
NAME READY UP-TO-DATE AVAILABLE AGE
navigation 0/0 0 0 8h
# ScaledObject 状态
$ kubectl get scaledobject -n navigation
NAME READY ACTIVE TRIGGERS AGE
navigation-scaler True False prometheus,cpu 5m
# HPA 已自动创建
$ kubectl get hpa -n navigation
NAME REFERENCE MINPODS MAXPODS REPLICAS
keda-hpa-navigation-scaler Deployment/navigation 1 10 0
```
### 测试从 0 扩容
```bash
# 访问导航页面
curl https://dh.u6.net3w.com
# 观察副本数变化10-30 秒)
kubectl get deployment navigation -n navigation -w
# 预期: 0/0 → 1/1
```
## 📊 资源节省预期
| 服务 | 之前 | 现在 | 节省 |
|------|------|------|------|
| Navigation | 24/7 运行 | 按需启动 | 80-90% |
| Redis | 24/7 运行 | 按需启动 | 70-80% (配置后) |
| PostgreSQL | 24/7 运行 | 保持运行 | 不适用 |
## 🔧 已修复的问题
### 1. CPU 触发器配置错误
**问题**: 使用了已弃用的 `type` 字段
```yaml
# ❌ 错误
- type: cpu
metadata:
type: Utilization
value: "60"
```
**修复**: 改为 `metricType`
```yaml
# ✅ 正确
- type: cpu
metricType: Utilization
metadata:
value: "60"
```
### 2. Navigation 缺少资源配置
**修复**: 添加了 resources 配置
```yaml
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
```
### 3. PostgreSQL 配置错误
**修复**:
- 删除了 `postgresql-scaler.yaml`
- 创建了 `postgresql-说明.md` 详细说明
- 更新了所有文档,明确标注不适用
## 📚 文档
- **使用指南**: `/home/fei/k3s/009-基础设施/007-keda/readme.md`
- **部署总结**: `/home/fei/k3s/009-基础设施/007-keda/部署总结.md`
- **PostgreSQL 说明**: `/home/fei/k3s/009-基础设施/007-keda/scalers/postgresql-说明.md`
## 🎯 下一步建议
### 短期1周内
1. ✅ 监控 Navigation 服务的扩缩容行为
2. ⏳ 为 Redis 配置 Prometheus exporter
3. ⏳ 应用 Redis ScaledObject
### 中期1-2周
1. ⏳ 在 Grafana 中导入 KEDA 仪表板 (ID: 14691)
2. ⏳ 根据实际使用情况调整触发阈值
3. ⏳ 为其他无状态服务配置 KEDA
### 长期1个月+
1. ⏳ 评估是否需要 PostgreSQL 高可用
2. ⏳ 如需要,部署 PostgreSQL Operator
3. ⏳ 或部署 PgBouncer 连接池 + KEDA
## ⚡ 快速命令
```bash
# 查看 KEDA 状态
kubectl get pods -n keda
# 查看所有 ScaledObject
kubectl get scaledobject -A
# 查看 HPA
kubectl get hpa -A
# 查看 Navigation 副本数
kubectl get deployment navigation -n navigation -w
# 测试扩容
curl https://dh.u6.net3w.com
# 查看 KEDA 日志
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f
```
## 🎉 总结
**KEDA 已成功部署并运行**
- Navigation 服务实现按需启动,空闲时自动缩容到 0
- 修复了所有配置问题
- 明确了有状态服务PostgreSQL的正确处理方式
- 提供了完整的文档和使用指南
⚠️ **重要提醒**
- 有状态服务不能简单地增加副本数
- PostgreSQL 需要使用专业的 Operator 或连接池方案
- 定期监控扩缩容行为,根据实际情况调整配置
---
**KEDA 让您的 K3s 集群更智能、更节省资源!** 🚀

View File

@@ -0,0 +1,260 @@
# KEDA 自动扩缩容部署总结
部署时间: 2026-01-30
## ✅ 部署完成
### KEDA 核心组件
| 组件 | 状态 | 说明 |
|------|------|------|
| keda-operator | ✅ Running | KEDA 核心控制器 |
| keda-metrics-apiserver | ✅ Running | 指标 API 服务器 |
| keda-admission-webhooks | ✅ Running | 准入 Webhook |
**命名空间**: `keda`
### 已配置的自动扩缩容服务
#### 1. Navigation 导航服务 ✅
- **状态**: 已配置并运行
- **当前副本数**: 0空闲状态
- **配置**:
- 最小副本: 0
- 最大副本: 10
- 触发器: Prometheus (HTTP 请求) + CPU 使用率
- 冷却期: 3 分钟
**ScaledObject**: `navigation-scaler`
**HPA**: `keda-hpa-navigation-scaler`
#### 2. Redis 缓存服务 ⏳
- **状态**: 配置文件已创建,待应用
- **说明**: 需要先为 Redis 配置 Prometheus exporter
- **配置文件**: `scalers/redis-scaler.yaml`
#### 3. PostgreSQL 数据库 ❌
- **状态**: 不推荐使用 KEDA 扩展
- **原因**:
- PostgreSQL 是有状态服务,多副本会导致存储冲突
- 需要配置主从复制才能安全扩展
- 建议使用 PostgreSQL Operator 或 PgBouncer + KEDA
- **详细说明**: `scalers/postgresql-说明.md`
## 配置文件位置
```
/home/fei/k3s/009-基础设施/007-keda/
├── deploy.sh # 部署脚本
├── values.yaml # KEDA Helm 配置
├── readme.md # 详细文档
├── 部署总结.md # 本文档
└── scalers/ # ScaledObject 配置
├── navigation-scaler.yaml # ✅ 已应用
├── redis-scaler.yaml # ⏳ 待应用
└── postgresql-说明.md # ⚠️ PostgreSQL 不适合 KEDA
```
## 验证 KEDA 功能
### 测试缩容到 0
Navigation 服务已经自动缩容到 0
```bash
kubectl get deployment navigation -n navigation
# 输出: READY 0/0
```
### 测试从 0 扩容
访问导航页面触发扩容:
```bash
# 1. 访问页面
curl https://dh.u6.net3w.com
# 2. 观察副本数变化
kubectl get deployment navigation -n navigation -w
# 预期: 10-30 秒内副本数从 0 变为 1
```
## 查看 KEDA 状态
### 查看所有 ScaledObject
```bash
kubectl get scaledobject -A
```
### 查看 HPA自动创建
```bash
kubectl get hpa -A
```
### 查看 KEDA 日志
```bash
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator -f
```
## 下一步操作
### 1. 应用 Redis 自动扩缩容
```bash
# 首先需要为 Redis 添加 Prometheus exporter
# 然后应用 ScaledObject
kubectl apply -f /home/fei/k3s/009-基础设施/007-keda/scalers/redis-scaler.yaml
```
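上面提到需要先为 Redis 添加 Prometheus exporter。一个常见做法是给 Redis Deployment 挂一个 redis_exporter sidecar再用 ServiceMonitor 暴露给 Prometheus以下为示意镜像、端口名等均为常见默认值属于假设并且需要在 redis Service 中同时暴露 9121 端口):
```yaml
# 1) 追加到 redis Deployment 的 containers 列表中sidecar 示意)
- name: redis-exporter
  image: oliver006/redis_exporter:latest
  env:
    - name: REDIS_ADDR
      value: "redis://localhost:6379"
  ports:
    - containerPort: 9121
      name: metrics
---
# 2) ServiceMonitor让 Prometheus 抓取 exporter 指标(假设 redis Service 带有 app: redis 标签)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-exporter
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  namespaceSelector:
    matchNames:
      - redis
  selector:
    matchLabels:
      app: redis
  endpoints:
    - port: metrics
      interval: 30s
```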
### 2. PostgreSQL 扩展方案
**不要使用 KEDA 直接扩展 PostgreSQL**
推荐方案:
- **方案 1**: 使用 PostgreSQL OperatorZalando 或 CloudNativePG
- **方案 2**: 部署 PgBouncer 连接池 + KEDA 扩展 PgBouncer
- **方案 3**: 配置读写分离,只对只读副本使用 KEDA
详细说明:`/home/fei/k3s/009-基础设施/007-keda/scalers/postgresql-说明.md`
### 3. 监控扩缩容行为
在 Grafana 中导入 KEDA 仪表板:
- 访问: https://grafana.u6.net3w.com
- 导入仪表板 ID: **14691**
## 已修复的问题
### 问题 1: CPU 触发器配置错误
**错误信息**:
```
The 'type' setting is DEPRECATED and is removed in v2.18 - Use 'metricType' instead.
```
**解决方案**:
将 CPU 触发器配置从:
```yaml
- type: cpu
metadata:
type: Utilization
value: "60"
```
改为:
```yaml
- type: cpu
metricType: Utilization
metadata:
value: "60"
```
### 问题 2: Navigation 缺少资源配置
**解决方案**:
为 Navigation deployment 添加了 resources 配置:
```yaml
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
```
## 资源节省效果
### Navigation 服务
- **之前**: 24/7 运行 1 个副本
- **现在**: 空闲时 0 个副本,有流量时自动启动
- **预计节省**: 80-90% 资源(假设大部分时间空闲)
### 预期总体效果
- **Navigation**: 节省 80-90% 资源 ✅
- **Redis**: 节省 70-80% 资源(配置后)⏳
- **PostgreSQL**: ❌ 不使用 KEDA保持单实例运行
## 监控指标
### Prometheus 查询
```promql
# KEDA Scaler 活跃状态
keda_scaler_active{namespace="navigation"}
# 当前指标值
keda_scaler_metrics_value{scaledObject="navigation-scaler"}
# HPA 当前副本数
kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="keda-hpa-navigation-scaler"}
```
## 注意事项
### 1. 冷启动时间
从 0 扩容到可用需要 10-30 秒:
- 拉取镜像(如果本地没有)
- 启动容器
- 健康检查通过
### 2. 连接保持
客户端需要支持重连机制,因为服务可能会缩容到 0。
### 3. 有状态服务
PostgreSQL 等有状态服务**不能**直接使用 KEDA 扩展:
- ❌ 多副本会导致存储冲突
- ❌ 没有主从复制会导致数据不一致
- ✅ 需要使用专业的 Operator 或连接池方案
## 故障排查
### ScaledObject 未生效
```bash
# 查看详细状态
kubectl describe scaledobject <name> -n <namespace>
# 查看事件
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
```
### HPA 未创建
检查 KEDA operator 日志:
```bash
kubectl logs -n keda -l app.kubernetes.io/name=keda-operator
```
## 文档参考
- 详细使用文档: `/home/fei/k3s/009-基础设施/007-keda/readme.md`
- KEDA 官方文档: https://keda.sh/docs/
- Scalers 参考: https://keda.sh/docs/scalers/
## 总结
**KEDA 已成功部署并运行**
- KEDA 核心组件运行正常
- Navigation 服务已配置自动扩缩容
- 已验证缩容到 0 功能正常
- 准备好为更多服务配置自动扩缩容
**下一步**: 根据实际使用情况,逐步为 Redis 和 PostgreSQL 配置自动扩缩容。
---
**KEDA 让您的 K3s 集群更智能、更节省资源!** 🚀

View File

@@ -0,0 +1,191 @@
# Portainer 部署指南
## 概述
本文档记录了在 k3s 集群中部署 Portainer 的完整过程包括域名绑定、KEDA 自动缩放和 CSRF 校验问题的解决方案。
## 部署步骤
### 1. 使用 Helm 安装 Portainer
```bash
# 添加 Helm 仓库
helm repo add portainer https://portainer.github.io/k8s/
helm repo update
# 安装 Portainer使用 Longhorn 作为存储类)
helm install --create-namespace -n portainer portainer portainer/portainer \
--set persistence.enabled=true \
--set persistence.storageClass=longhorn \
--set service.type=NodePort
```
### 2. 配置域名访问
#### 2.1 Caddy 反向代理配置
修改 Caddy ConfigMap添加 Portainer 的反向代理规则:
```yaml
# Portainer 容器管理 - 直接转发到 Portainer HTTPS 端口
portainer.u6.net3w.com {
reverse_proxy https://portainer.portainer.svc.cluster.local:9443 {
transport http {
tls_insecure_skip_verify
}
}
}
```
**关键点:**
- 直接转发到 Portainer 的 HTTPS 端口9443而不是通过 Traefik
- 这样可以避免协议不匹配导致的 CSRF 校验失败
#### 2.2 更新 Caddy ConfigMap
```bash
kubectl patch configmap caddy-config -n default --type merge -p '{"data":{"Caddyfile":"..."}}'
```
#### 2.3 重启 Caddy Pod
```bash
kubectl delete pod -n default -l app=caddy
```
### 3. 配置 KEDA 自动缩放(可选)
如果需要实现访问时启动、空闲时缩容的功能,应用 KEDA 配置:
```bash
kubectl apply -f keda-scaler.yaml
```
**配置说明:**
- 最小副本数0空闲时缩容到 0
- 最大副本数3
- 缩容延迟5 分钟无流量后缩容
### 4. 解决 CSRF 校验问题
#### 问题描述
登录时提示 "Unable to login",日志显示:
```
Failed to validate Origin or Referer | error="origin invalid"
```
#### 问题原因
Portainer 新版本对 CSRF 校验非常严格。当通过域名访问时,协议不匹配导致校验失败:
- 客户端发送HTTPS 请求
- Portainer 接收x_forwarded_proto=http
#### 解决方案
**步骤 1添加环境变量禁用 CSRF 校验**
```bash
kubectl set env deployment/portainer -n portainer CONTROLLER_DISABLE_CSRF=true
```
**步骤 2添加环境变量配置 origins**
```bash
kubectl set env deployment/portainer -n portainer PORTAINER_ADMIN_ORIGINS="*"
```
**步骤 3重启 Portainer**
```bash
kubectl rollout restart deployment portainer -n portainer
```
**步骤 4修改 Caddy 配置(最关键)**
直接转发到 Portainer 的 HTTPS 端口,避免通过 Traefik 导致的协议转换问题:
```yaml
portainer.u6.net3w.com {
reverse_proxy https://portainer.portainer.svc.cluster.local:9443 {
transport http {
tls_insecure_skip_verify
}
}
}
```
## 配置文件
### portainer-server.yaml
记录 Portainer deployment 的环境变量配置:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: portainer
namespace: portainer
spec:
template:
spec:
containers:
- name: portainer
env:
- name: CONTROLLER_DISABLE_CSRF
value: "true"
- name: PORTAINER_ADMIN_ORIGINS
value: "*"
```
### keda-scaler.yaml
KEDA 自动缩放配置,实现访问时启动、空闲时缩容。
## 访问 Portainer
部署完成后,访问:
```
https://portainer.u6.net3w.com
```
## 常见问题
### Q: 登录时提示 "Unable to login"
**A:** 这通常是 CSRF 校验失败导致的。检查以下几点:
1. 确认已添加环境变量 `CONTROLLER_DISABLE_CSRF=true`
2. 确认 Caddy 配置直接转发到 Portainer HTTPS 端口
3. 检查 Portainer 日志中是否有 "origin invalid" 错误
4. 重启 Portainer pod 使配置生效
### Q: 为什么要直接转发到 HTTPS 端口而不是通过 Traefik
**A:** 因为通过 Traefik 转发时,协议头会被转换为 HTTP导致 Portainer 接收到的协议与客户端发送的协议不匹配,从而 CSRF 校验失败。直接转发到 HTTPS 端口可以保持协议一致。
### Q: KEDA 自动缩放是否必须配置?
**A:** 不是必须的。KEDA 自动缩放是可选功能,用于节省资源。如果不需要自动缩放,可以跳过这一步。
## 相关文件
- `portainer-server.yaml` - Portainer deployment 环境变量配置
- `keda-scaler.yaml` - KEDA 自动缩放配置
- `ingress.yaml` - 原始 Ingress 配置(已弃用,改用 Caddy 直接转发)
## 下次部署检查清单
- [ ] 使用 Helm 安装 Portainer
- [ ] 修改 Caddy 配置,直接转发到 Portainer HTTPS 端口
- [ ] 添加 Portainer 环境变量CONTROLLER_DISABLE_CSRF、PORTAINER_ADMIN_ORIGINS
- [ ] 重启 Caddy 和 Portainer pods
- [ ] 测试登录功能
- [ ] (可选)配置 KEDA 自动缩放
## 参考资源
- Portainer 官方文档https://docs.portainer.io/
- k3s 官方文档https://docs.k3s.io/
- KEDA 官方文档https://keda.sh/

View File

@@ -0,0 +1,20 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: portainer-ingress
namespace: portainer
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
ingressClassName: traefik
rules:
- host: portainer.u6.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: portainer
port:
number: 9000

View File

@@ -0,0 +1,58 @@
---
# HTTPScaledObject - 用于实现缩容到 0 的核心配置
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
name: portainer-http-scaler
namespace: portainer
spec:
hosts:
- portainer.u6.net3w.com
pathPrefixes:
- /
scaleTargetRef:
name: portainer
kind: Deployment
apiVersion: apps/v1
service: portainer
port: 9000
replicas:
min: 0 # 空闲时缩容到 0
max: 3 # 最多 3 个副本
scalingMetric:
requestRate:
granularity: 1s
targetValue: 50 # 每秒 50 个请求时扩容
window: 1m
scaledownPeriod: 300 # 5 分钟无流量后缩容到 0
---
# Traefik Middleware - 设置正确的协议头
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
name: portainer-headers
namespace: keda
spec:
headers:
customRequestHeaders:
X-Forwarded-Proto: "https"
---
# Traefik IngressRoute - 将流量路由到 KEDA HTTP Add-on 的拦截器
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
name: portainer-ingress
namespace: keda
spec:
entryPoints:
- web
routes:
- match: Host(`portainer.u6.net3w.com`)
kind: Rule
middlewares:
- name: portainer-headers
services:
- name: keda-add-ons-http-interceptor-proxy
port: 8080

View File

@@ -0,0 +1,16 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: portainer
namespace: portainer
spec:
template:
spec:
containers:
- name: portainer
env:
- name: CONTROLLER_DISABLE_CSRF
value: "true"
# 说明:禁用 CSRF 校验是因为 Portainer 新版本对 CSRF 校验非常严格
# 当使用域名访问时(如 portainer.u6.net3w.com需要禁用此校验
# 如果需要重新启用,将此值改为 "false" 或删除此环境变量

View File

@@ -0,0 +1,10 @@
# 添加 Helm 仓库
helm repo add portainer https://portainer.github.io/k8s/
helm repo update
# 安装 Portainer
# 注意:这里我们利用 Longhorn 作为默认存储类
helm install --create-namespace -n portainer portainer portainer/portainer \
--set persistence.enabled=true \
--set persistence.storageClass=longhorn \
--set service.type=NodePort

View File

@@ -0,0 +1,272 @@
# 域名绑定配置总结
## 配置完成时间
2026-01-30
## 域名配置
所有服务已绑定到 `*.u9.net3w.com` 子域名,通过 Caddy 作为前端反向代理。
### 已配置的子域名
| 服务 | 域名 | 后端服务 | 命名空间 |
|------|------|---------|---------|
| Longhorn UI | https://longhorn.u9.net3w.com | longhorn-frontend:80 | longhorn-system |
| Grafana | https://grafana.u9.net3w.com | kube-prometheus-stack-grafana:80 | monitoring |
| Prometheus | https://prometheus.u9.net3w.com | kube-prometheus-stack-prometheus:9090 | monitoring |
| Alertmanager | https://alertmanager.u9.net3w.com | kube-prometheus-stack-alertmanager:9093 | monitoring |
| MinIO S3 API | https://s3.u6.net3w.com | minio:9000 | minio |
| MinIO Console | https://console.s3.u6.net3w.com | minio:9001 | minio |
## 架构说明
```
Internet (*.u9.net3w.com)
Caddy (前端反向代理, 80/443)
Traefik Ingress Controller
Kubernetes Services
```
### 流量路径
1. **外部请求** → DNS 解析到服务器 IP
2. **Caddy** (端口 80/443) → 接收请求,自动申请 Let's Encrypt SSL 证书
3. **Traefik** → Caddy 转发到 Traefik Ingress Controller
4. **Kubernetes Service** → Traefik 根据 Ingress 规则路由到对应服务
## Caddy 配置
配置文件位置: `/home/fei/k3s/009-基础设施/005-ingress/Caddyfile`
```caddyfile
{
email admin@u6.net3w.com
}
# Longhorn 存储管理
longhorn.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Grafana 监控仪表板
grafana.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Prometheus 监控
prometheus.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
# Alertmanager 告警管理
alertmanager.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
```
## Ingress 配置
### Longhorn Ingress
- 文件: `/home/fei/k3s/009-基础设施/005-ingress/longhorn-ingress.yaml`
- Host: `longhorn.u9.net3w.com`
### 监控系统 Ingress
- 文件: `/home/fei/k3s/009-基础设施/006-monitoring/ingress.yaml`
- Hosts:
- `grafana.u9.net3w.com`
- `prometheus.u9.net3w.com`
- `alertmanager.u9.net3w.com`
## SSL/TLS 证书
Caddy 会自动为所有配置的域名申请和续期 Let's Encrypt SSL 证书。
- **证书存储**: Caddy Pod 的 `/data` 目录
- **自动续期**: Caddy 自动管理
- **邮箱**: admin@u6.net3w.com
## 访问地址
### 监控和管理
- **Longhorn 存储管理**: https://longhorn.u9.net3w.com
- **Grafana 监控**: https://grafana.u9.net3w.com
- 用户名: `admin`
- 密码: `prom-operator`
- **Prometheus**: https://prometheus.u9.net3w.com
- **Alertmanager**: https://alertmanager.u9.net3w.com
### 对象存储
- **MinIO S3 API**: https://s3.u6.net3w.com
- **MinIO Console**: https://console.s3.u6.net3w.com
## DNS 配置
确保以下 DNS 记录已配置A 记录或 CNAME
```
*.u9.net3w.com → <服务器IP>
```
或者单独配置每个子域名:
```
longhorn.u9.net3w.com → <服务器IP>
grafana.u9.net3w.com → <服务器IP>
prometheus.u9.net3w.com → <服务器IP>
alertmanager.u9.net3w.com → <服务器IP>
```
## 验证配置
### 检查 Caddy 状态
```bash
kubectl get pods -n default -l app=caddy
kubectl logs -n default -l app=caddy -f
```
### 检查 Ingress 状态
```bash
kubectl get ingress -A
```
### 测试域名访问
```bash
curl -I https://longhorn.u9.net3w.com
curl -I https://grafana.u9.net3w.com
curl -I https://prometheus.u9.net3w.com
curl -I https://alertmanager.u9.net3w.com
```
## 添加新服务
如果需要添加新的服务到 u9.net3w.com 域名:
### 1. 更新 Caddyfile
编辑 `/home/fei/k3s/009-基础设施/005-ingress/Caddyfile`,添加:
```caddyfile
newservice.u9.net3w.com {
reverse_proxy traefik.kube-system.svc.cluster.local:80
}
```
### 2. 更新 Caddy ConfigMap
```bash
kubectl create configmap caddy-config \
--from-file=Caddyfile=/home/fei/k3s/009-基础设施/005-ingress/Caddyfile \
-n default --dry-run=client -o yaml | kubectl apply -f -
```
### 3. 重启 Caddy
```bash
kubectl rollout restart deployment caddy -n default
```
### 4. 创建 Ingress
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: newservice-ingress
namespace: your-namespace
annotations:
traefik.ingress.kubernetes.io/router.entrypoints: web
spec:
rules:
- host: newservice.u9.net3w.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: your-service
port:
number: 80
```
### 5. 应用 Ingress
```bash
kubectl apply -f newservice-ingress.yaml
```
## 故障排查
### Caddy 无法启动
```bash
# 查看 Caddy 日志
kubectl logs -n default -l app=caddy
# 检查 ConfigMap
kubectl get configmap caddy-config -n default -o yaml
```
### 域名无法访问
```bash
# 检查 Ingress
kubectl describe ingress <ingress-name> -n <namespace>
# 检查 Traefik
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik
# 测试内部连接
kubectl run test --rm -it --image=curlimages/curl -- curl -v http://traefik.kube-system.svc.cluster.local:80
```
### SSL 证书问题
```bash
# 查看 Caddy 证书状态
kubectl exec -n default -it <caddy-pod> -- ls -la /data/caddy/certificates/
# 强制重新申请证书
kubectl rollout restart deployment caddy -n default
```
## 安全建议
1. **启用基本认证**: 为敏感服务(如 Prometheus、Alertmanager添加认证可用 Traefik BasicAuth 中间件实现(示意见下方)
2. **IP 白名单**: 限制管理界面的访问 IP
3. **定期更新**: 保持 Caddy 和 Traefik 版本更新
4. **监控日志**: 定期检查访问日志,发现异常访问
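第 1 点的基本认证可以用 Traefik 的 BasicAuth 中间件实现,下面是一个最小示意(名称、密码条目均为假设占位,密码需用 htpasswd 生成后放入 Secret
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-basic-auth
  namespace: monitoring
stringData:
  users: |
    admin:$apr1$REPLACE$REPLACEWITHHTPASSWD   # htpasswd 生成的用户条目(占位)
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: prometheus-auth
  namespace: monitoring
spec:
  basicAuth:
    secret: prometheus-basic-auth
---
# 在对应 Ingress 上通过注解引用该中间件(格式:<命名空间>-<中间件名>@kubernetescrd
# metadata:
#   annotations:
#     traefik.ingress.kubernetes.io/router.middlewares: monitoring-prometheus-auth@kubernetescrd
```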
## 维护命令
```bash
# 更新 Caddy 配置
kubectl create configmap caddy-config \
--from-file=Caddyfile=/home/fei/k3s/009-基础设施/005-ingress/Caddyfile \
-n default --dry-run=client -o yaml | kubectl apply -f -
kubectl rollout restart deployment caddy -n default
# 查看所有 Ingress
kubectl get ingress -A
# 查看 Caddy 日志
kubectl logs -n default -l app=caddy -f
# 查看 Traefik 日志
kubectl logs -n kube-system -l app.kubernetes.io/name=traefik -f
```
## 备份
重要配置文件已保存在:
- Caddyfile: `/home/fei/k3s/009-基础设施/005-ingress/Caddyfile`
- Longhorn Ingress: `/home/fei/k3s/009-基础设施/005-ingress/longhorn-ingress.yaml`
- 监控 Ingress: `/home/fei/k3s/009-基础设施/006-monitoring/ingress.yaml`
建议定期备份这些配置文件。
---
**配置完成!所有服务现在可以通过 *.u9.net3w.com 域名访问。** 🎉

View File

@@ -0,0 +1,225 @@
# K3s 基础设施部署总结
部署日期: 2026-01-30
## 已完成的基础设施组件
### ✅ 1. Helm 包管理工具
- **版本**: v3.20.0
- **位置**: /usr/local/bin/helm
- **配置**: KUBECONFIG 已添加到 ~/.bashrc
### ✅ 2. Longhorn 分布式存储
- **版本**: v1.11.0
- **命名空间**: longhorn-system
- **存储类**: longhorn (默认)
- **S3 备份**: 已配置 MinIO S3 备份
- 备份目标: s3://longhorn-backup@us-east-1/
- 凭证 Secret: longhorn-crypto
- **访问**: http://longhorn.local
### ✅ 3. Redis 中间件
- **版本**: Redis 7 (Alpine)
- **命名空间**: redis
- **存储**: 5Gi Longhorn 卷
- **持久化**: RDB + AOF 双重持久化
- **内存限制**: 2GB
- **访问**: redis.redis.svc.cluster.local:6379
### ✅ 4. PostgreSQL 数据库
- **版本**: PostgreSQL 16.11
- **命名空间**: postgresql
- **存储**: 10Gi Longhorn 卷
- **内存限制**: 2GB
- **访问**: postgresql-service.postgresql.svc.cluster.local:5432
- **凭证**:
- 用户: postgres
- 密码: postgres123
### ✅ 5. Traefik Ingress 控制器
- **状态**: K3s 默认已安装
- **命名空间**: kube-system
- **已配置 Ingress**:
- Longhorn UI: http://longhorn.local
- MinIO API: http://s3.u6.net3w.com
- MinIO Console: http://console.s3.u6.net3w.com
- Grafana: http://grafana.local
- Prometheus: http://prometheus.local
- Alertmanager: http://alertmanager.local
### ✅ 6. Prometheus + Grafana 监控系统
- **命名空间**: monitoring
- **组件**:
- Prometheus: 时间序列数据库 (20Gi 存储, 15天保留)
- Grafana: 可视化仪表板 (5Gi 存储)
- Alertmanager: 告警管理 (5Gi 存储)
- Node Exporter: 节点指标收集
- Kube State Metrics: K8s 资源状态
- **Grafana 凭证**:
- 用户: admin
- 密码: prom-operator
- **访问**:
- Grafana: http://grafana.local
- Prometheus: http://prometheus.local
- Alertmanager: http://alertmanager.local
## 目录结构
```
/home/fei/k3s/009-基础设施/
├── 003-helm/
│ ├── install_helm.sh
│ └── readme.md
├── 004-longhorn/
│ ├── deploy.sh
│ ├── s3-secret.yaml
│ ├── values.yaml
│ ├── readme.md
│ └── 说明.md
├── 005-ingress/
│ ├── deploy-longhorn-ingress.sh
│ ├── longhorn-ingress.yaml
│ └── readme.md
└── 006-monitoring/
├── deploy.sh
├── values.yaml
├── ingress.yaml
└── readme.md
/home/fei/k3s/010-中间件/
├── 001-redis/
│ ├── deploy.sh
│ ├── redis-deployment.yaml
│ └── readme.md
└── 002-postgresql/
├── deploy.sh
├── postgresql-deployment.yaml
└── readme.md
```
## 存储使用情况
| 组件 | 存储大小 | 存储类 |
|------|---------|--------|
| MinIO | 50Gi | local-path |
| Redis | 5Gi | longhorn |
| PostgreSQL | 10Gi | longhorn |
| Prometheus | 20Gi | longhorn |
| Grafana | 5Gi | longhorn |
| Alertmanager | 5Gi | longhorn |
| **总计** | **95Gi** | - |
## 访问地址汇总
需要在 `/etc/hosts` 中添加以下配置(将 `<节点IP>` 替换为实际 IP
```
<节点IP> longhorn.local
<节点IP> grafana.local
<节点IP> prometheus.local
<节点IP> alertmanager.local
<节点IP> s3.u6.net3w.com
<节点IP> console.s3.u6.net3w.com
```
## 快速验证命令
```bash
# 查看所有命名空间的 Pods
kubectl get pods -A
# 查看所有 PVC
kubectl get pvc -A
# 查看所有 Ingress
kubectl get ingress -A
# 查看存储类
kubectl get storageclass
# 测试 Redis
kubectl exec -n redis $(kubectl get pod -n redis -l app=redis -o jsonpath='{.items[0].metadata.name}') -- redis-cli ping
# 测试 PostgreSQL
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "SELECT version();"
```
## 备份策略
1. **Longhorn 卷备份**:
- 所有持久化数据存储在 Longhorn 卷上
- 可通过 Longhorn UI 创建快照
- 自动备份到 MinIO S3 (s3://longhorn-backup@us-east-1/)
2. **数据库备份**pg_dump 定时备份的 CronJob 示意见本节末尾):
- Redis: AOF + RDB 持久化
- PostgreSQL: 可使用 pg_dump 进行逻辑备份
3. **配置备份**:
- 所有配置文件已保存在 `/home/fei/k3s/` 目录
- 建议定期备份此目录
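pg_dump 逻辑备份可以用 CronJob 定时执行以下为最小示意备份目标 PVC 名称为假设需要预先创建密码取自 PostgreSQL 部署中使用的 postgresql-secret
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgresql-backup
  namespace: postgresql
spec:
  schedule: "0 3 * * *"                # 每天 03:00 备份
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgresql-secret
                      key: POSTGRES_PASSWORD
              command:
                - /bin/sh
                - -c
                - pg_dump -h postgresql-service.postgresql.svc.cluster.local -U postgres postgres > /backup/backup-$(date +%F).sql
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgresql-backup-pvc   # 假设的 PVC需预先创建
```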
## 下一步建议
1. **安全加固**:
- 修改 PostgreSQL 默认密码
- 配置 TLS/SSL 证书
- 启用 RBAC 权限控制
2. **监控优化**Alertmanager 通知配置示意见本节末尾):
- 配置告警通知邮件、Slack、钉钉
- 导入更多 Grafana 仪表板
- 为 Redis 和 PostgreSQL 添加专用监控
3. **高可用**:
- 考虑 Redis 主从复制或 Sentinel
- 考虑 PostgreSQL 主从复制
- 增加 K3s 节点实现多节点高可用
4. **日志收集**:
- 部署 Loki 或 ELK 进行日志聚合
- 配置日志持久化和查询
5. **CI/CD**:
- 部署 GitLab Runner 或 Jenkins
- 配置自动化部署流程
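第 2 点的告警通知可以直接写进 kube-prometheus-stack 的 values.yaml以下是一个 webhook 接收器的示意webhook 地址为占位值,请替换为实际的钉钉/Slack 网关等;字段含义参考 Alertmanager 配置文档):
```yaml
# values.yaml 片段(示意)
alertmanager:
  config:
    route:
      receiver: default-webhook
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
    receivers:
      - name: default-webhook
        webhook_configs:
          - url: http://alert-webhook.monitoring.svc.cluster.local:8080/alert   # 占位地址
```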
## 维护命令
```bash
# 更新 Helm 仓库
helm repo update
# 升级 Longhorn
helm upgrade longhorn longhorn/longhorn --namespace longhorn-system -f values.yaml
# 升级监控栈
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace monitoring -f values.yaml
# 查看 Helm 发布
helm list -A
# 清理未使用的镜像
kubectl get pods -A -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}' | sort -u
```
## 故障排查
如果遇到问题,请检查:
1. Pod 状态: `kubectl get pods -A`
2. 事件日志: `kubectl get events -A --sort-by='.lastTimestamp'`
3. Pod 日志: `kubectl logs -n <namespace> <pod-name>`
4. 存储状态: `kubectl get pvc -A`
5. Longhorn 卷状态: 访问 http://longhorn.local
## 联系和支持
- Longhorn 文档: https://longhorn.io/docs/
- Prometheus 文档: https://prometheus.io/docs/
- Grafana 文档: https://grafana.com/docs/
- K3s 文档: https://docs.k3s.io/
---
**部署完成!所有基础设施组件已成功运行。** 🎉

View File

@@ -0,0 +1,17 @@
#!/bin/bash
# 创建命名空间
kubectl create namespace redis
# 部署 Redis
kubectl apply -f redis-deployment.yaml
# 等待 Redis 启动
echo "等待 Redis 启动..."
kubectl wait --for=condition=ready pod -l app=redis -n redis --timeout=300s
# 显示状态
echo "Redis 部署完成!"
kubectl get pods -n redis
kubectl get pvc -n redis
kubectl get svc -n redis

View File

@@ -0,0 +1,52 @@
# Redis 部署说明
## 配置信息
- **命名空间**: redis
- **存储**: 使用 Longhorn 提供 5Gi 持久化存储
- **镜像**: redis:7-alpine
- **持久化**: 启用 RDB + AOF 双重持久化
- **内存限制**: 2GB
- **访问地址**: redis.redis.svc.cluster.local:6379
## 部署方式
```bash
bash deploy.sh
```
## 持久化配置
### RDB 快照
- 900秒内至少1个key变化
- 300秒内至少10个key变化
- 60秒内至少10000个key变化
### AOF 日志
- 每秒同步一次
- 自动重写阈值: 64MB
## 内存策略
- 最大内存: 2GB
- 淘汰策略: allkeys-lru (所有key的LRU算法)
## 连接测试
在集群内部测试连接:
```bash
kubectl run redis-test --rm -it --image=redis:7-alpine -- redis-cli -h redis.redis.svc.cluster.local ping
```
## 备份说明
Redis 数据存储在 Longhorn 卷上,可以通过 Longhorn UI 创建快照和备份到 S3。
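除了在 Longhorn UI 中手工创建,也可以用 RecurringJob 资源声明定期备份以下为示意apiVersion 以集群中实际安装的 CRD 版本为准cron、保留份数等均为假设值
```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-daily                   # 假设的名称
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"                    # 每天 02:00 执行
  task: "backup"                       # 备份到已配置的 S3 备份目标
  groups:
    - default                          # 作用于 default 组中的卷
  retain: 7                            # 保留最近 7 份备份
  concurrency: 2
```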
## 监控
可以通过以下命令查看 Redis 状态:
```bash
kubectl exec -n redis $(kubectl get pod -n redis -l app=redis -o jsonpath='{.items[0].metadata.name}') -- redis-cli info
```

View File

@@ -0,0 +1,123 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-pvc
namespace: redis
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 5Gi
---
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-config
namespace: redis
data:
redis.conf: |
# Redis 配置
bind 0.0.0.0
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
# 持久化配置
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data
# AOF 持久化
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
# 内存管理
maxmemory 2gb
maxmemory-policy allkeys-lru
# 日志
loglevel notice
logfile ""
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: redis
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
command:
- redis-server
- /etc/redis/redis.conf
ports:
- containerPort: 6379
name: redis
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /etc/redis
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
tcpSocket:
port: 6379
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- redis-cli
- ping
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: data
persistentVolumeClaim:
claimName: redis-pvc
- name: config
configMap:
name: redis-config
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: redis
spec:
selector:
app: redis
ports:
- port: 6379
targetPort: 6379
protocol: TCP
type: ClusterIP

View File

@@ -0,0 +1,25 @@
#!/bin/bash
# 创建命名空间
kubectl create namespace postgresql
# 部署 PostgreSQL
kubectl apply -f postgresql-deployment.yaml
# 等待 PostgreSQL 启动
echo "等待 PostgreSQL 启动..."
kubectl wait --for=condition=ready pod -l app=postgresql -n postgresql --timeout=300s
# 显示状态
echo "PostgreSQL 部署完成!"
kubectl get pods -n postgresql
kubectl get pvc -n postgresql
kubectl get svc -n postgresql
echo ""
echo "连接信息:"
echo " 主机: postgresql-service.postgresql.svc.cluster.local"
echo " 端口: 5432"
echo " 用户: postgres"
echo " 密码: postgres123"
echo " 数据库: postgres"

View File

@@ -0,0 +1,167 @@
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgresql-pvc
namespace: postgresql
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 10Gi
---
apiVersion: v1
kind: Secret
metadata:
name: postgresql-secret
namespace: postgresql
type: Opaque
stringData:
POSTGRES_PASSWORD: "postgres123"
POSTGRES_USER: "postgres"
POSTGRES_DB: "postgres"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: postgresql-config
namespace: postgresql
data:
postgresql.conf: |
# 连接设置
listen_addresses = '*'
max_connections = 100
# 内存设置
shared_buffers = 256MB
effective_cache_size = 1GB
maintenance_work_mem = 64MB
work_mem = 4MB
# WAL 设置
wal_level = replica
max_wal_size = 1GB
min_wal_size = 80MB
# 日志设置
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_statement = 'all'
log_duration = on
# 性能优化
random_page_cost = 1.1
effective_io_concurrency = 200
pg_hba.conf: |
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 0.0.0.0/0 md5
host all all ::0/0 md5
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgresql
namespace: postgresql
spec:
serviceName: postgresql
replicas: 1
selector:
matchLabels:
app: postgresql
template:
metadata:
labels:
app: postgresql
spec:
containers:
- name: postgresql
image: postgres:16-alpine
ports:
- containerPort: 5432
name: postgresql
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgresql-secret
key: POSTGRES_USER
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgresql-secret
key: POSTGRES_PASSWORD
- name: POSTGRES_DB
valueFrom:
secretKeyRef:
name: postgresql-secret
key: POSTGRES_DB
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
- name: config
mountPath: /etc/postgresql
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- pg_isready
- -U
- postgres
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: data
persistentVolumeClaim:
claimName: postgresql-pvc
- name: config
configMap:
name: postgresql-config
---
apiVersion: v1
kind: Service
metadata:
name: postgresql
namespace: postgresql
spec:
selector:
app: postgresql
ports:
- port: 5432
targetPort: 5432
protocol: TCP
type: ClusterIP
clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
name: postgresql-service
namespace: postgresql
spec:
selector:
app: postgresql
ports:
- port: 5432
targetPort: 5432
protocol: TCP
type: ClusterIP

View File

@@ -0,0 +1,99 @@
# PostgreSQL 16 部署说明
## 配置信息
- **命名空间**: postgresql
- **版本**: PostgreSQL 16 (Alpine)
- **存储**: 使用 Longhorn 提供 10Gi 持久化存储
- **内存限制**: 2GB
- **访问地址**: postgresql-service.postgresql.svc.cluster.local:5432
## 默认凭证
- **用户名**: postgres
- **密码**: postgres123
- **数据库**: postgres
⚠️ **安全提示**: 生产环境请修改默认密码!
## 部署方式
```bash
bash deploy.sh
```
## 数据库配置
### 连接设置
- 最大连接数: 100
- 监听地址: 所有接口 (*)
### 内存配置
- shared_buffers: 256MB
- effective_cache_size: 1GB
- work_mem: 4MB
### WAL 配置
- wal_level: replica (支持主从复制)
- max_wal_size: 1GB
### 日志配置
- 记录所有 SQL 语句
- 记录执行时间
## 连接测试
在集群内部测试连接:
```bash
kubectl run pg-test --rm -it --image=postgres:16-alpine --env="PGPASSWORD=postgres123" -- psql -h postgresql-service.postgresql.svc.cluster.local -U postgres -c "SELECT version();"
```
## 数据持久化
PostgreSQL 数据存储在 Longhorn 卷上:
- 数据目录: /var/lib/postgresql/data/pgdata
- 可以通过 Longhorn UI 创建快照和备份到 S3
## 常用操作
### 查看日志
```bash
kubectl logs -n postgresql postgresql-0 -f
```
### 进入数据库
```bash
kubectl exec -it -n postgresql postgresql-0 -- psql -U postgres
```
### 创建新数据库
```bash
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "CREATE DATABASE myapp;"
```
### 创建新用户
```bash
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "CREATE USER myuser WITH PASSWORD 'mypassword';"
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE myapp TO myuser;"
```
## 备份与恢复
### 手动备份
```bash
kubectl exec -n postgresql postgresql-0 -- pg_dump -U postgres postgres > backup.sql
```
### 恢复备份
```bash
cat backup.sql | kubectl exec -i -n postgresql postgresql-0 -- psql -U postgres postgres
```
## 监控
查看数据库状态:
```bash
kubectl exec -n postgresql postgresql-0 -- psql -U postgres -c "SELECT * FROM pg_stat_activity;"
```

View File

@@ -0,0 +1,32 @@
FROM python:3.11-alpine
# 安装 nginx
RUN apk add --no-cache nginx
# 创建工作目录
WORKDIR /app
# 复制生成器脚本
COPY generator.py /app/
COPY index.html /usr/share/nginx/html/
# 创建 nginx 配置
RUN mkdir -p /run/nginx && \
echo 'server {' > /etc/nginx/http.d/default.conf && \
echo ' listen 80;' >> /etc/nginx/http.d/default.conf && \
echo ' root /usr/share/nginx/html;' >> /etc/nginx/http.d/default.conf && \
echo ' index index.html;' >> /etc/nginx/http.d/default.conf && \
echo ' location / {' >> /etc/nginx/http.d/default.conf && \
echo ' try_files $uri $uri/ =404;' >> /etc/nginx/http.d/default.conf && \
echo ' }' >> /etc/nginx/http.d/default.conf && \
echo '}' >> /etc/nginx/http.d/default.conf
# 启动脚本
RUN echo '#!/bin/sh' > /app/start.sh && \
echo 'nginx' >> /app/start.sh && \
echo 'python3 /app/generator.py' >> /app/start.sh && \
chmod +x /app/start.sh
EXPOSE 80
CMD ["/app/start.sh"]

Some files were not shown because too many files have changed in this diff