# QQ Bot WebSocket Instability — CloudCone Server Diagnosis

## Session: 2026-05-17 05:48 UTC

### Symptoms
- QQ shows "灵魂不在线" ("soul offline") for the bot
- The WebSocket connection drops roughly every 60 seconds
- Auto-reconnect usually succeeds but occasionally fails with "Failed to get QQ Bot gateway URL"
- After such a failure, the bot stops responding until the gateway service is restarted

### Environment
- Server: CloudCone VPS, IP 74.48.56.81 (later migrated to the `hermes-ai` host)
- OS: Ubuntu 22.04, 2GB RAM
- Service: `hermes-gateway.service` (systemd)
- QQ Bot AppID: 1903996332
- WebSocket endpoint: `wss://api.sgroup.qq.com/websocket`

### Log Pattern (Normal Disconnect-Reconnect Cycle)
```
WARNING WebSocket error: WebSocket closed
INFO Reconnecting in 2s (attempt 1)...
INFO WebSocket connected to wss://api.sgroup.qq.com/websocket
INFO Reconnected
INFO Resume sent (session_id=..., seq=...)
INFO Session resumed
... (60 seconds later, repeat)
```
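To confirm this cycle across the whole log, rather than from a short tail, you can count each event type. The sketch below is a hypothetical helper (`qq_ws_cycle_counts` is not part of Hermes). It assumes the messages appear verbatim as in the excerpts in this note, and it defaults to `~/.hermes/logs/gateway.log`, the log path used for verification in this note.

```bash
# Hypothetical helper: count disconnects, successful resumes and gateway-URL failures.
# Assumes the log messages match the excerpts in this note verbatim.
qq_ws_cycle_counts() {
  local log="${1:-$HOME/.hermes/logs/gateway.log}"
  local closed resumed failed
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps that harmless.
  closed=$(grep -c "WebSocket closed" "$log" || true)
  resumed=$(grep -c "Session resumed" "$log" || true)
  failed=$(grep -c "Failed to get QQ Bot gateway URL" "$log" || true)
  echo "closed=$closed resumed=$resumed failed=$failed"
}
```

Run it as `qq_ws_cycle_counts` or `qq_ws_cycle_counts /path/to/gateway.log`. While the bot is flapping but still recovering, `closed` and `resumed` should grow together. A non-zero `failed` means at least one reconnect hit the failure pattern.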

### Log Pattern (Failure)
```
WARNING WebSocket error: WebSocket closed
INFO Reconnecting in 2s (attempt 1)...
WARNING Reconnect failed: Failed to get QQ Bot gateway URL:
```
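The bot only stays down when a reconnect fails and nothing succeeds afterwards, so the newest reconnect outcome matters more than whether a failure ever happened. The sketch below uses hypothetical function names. It takes the last line that matches `Reconnected`, `Reconnect failed`, or the `Ready, session_id=` line logged on a fresh start. An optional helper then restarts `hermes-gateway.service` only when that newest line is a failure. The sketch assumes these strings appear in `gateway.log` exactly as quoted in this note.

```bash
# Hypothetical helper: classify the newest reconnect outcome as ok, failed, or unknown.
qq_last_reconnect_status() {
  local log="${1:-$HOME/.hermes/logs/gateway.log}"
  local last
  # Keep only outcome lines (resume success, reconnect failure, fresh Ready) and take the newest.
  last=$(grep -E "Reconnected|Reconnect failed|Ready, session_id=" "$log" | tail -n 1)
  case "$last" in
    *"Reconnect failed"*) echo "failed" ;;
    "")                   echo "unknown" ;;  # no outcome lines logged yet
    *)                    echo "ok" ;;
  esac
}

# Hypothetical watchdog: restart the service only when the newest outcome is a failure.
# systemctl restart needs root; when run as root, pass the bot user's log path explicitly.
qq_restart_if_stuck() {
  if [ "$(qq_last_reconnect_status "$@")" = "failed" ]; then
    systemctl restart hermes-gateway.service
  fi
}
```

For example, after sourcing these functions, a periodic root job could call `qq_restart_if_stuck /home/<user>/.hermes/logs/gateway.log`. Here `<user>` stands for the account that runs Hermes.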

### Short-Term Fix
```bash
sudo systemctl restart hermes-gateway.service
```

After the restart, check the log:
```bash
tail -n 20 ~/.hermes/logs/gateway.log | grep -iE "qq|websocket|connect"
```

Expected healthy output:
```
✓ qqbot connected
[QQBot:1903996332] WebSocket connected
[QQBot:1903996332] Ready, session_id=...
```
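This visual check can also be scripted. The sketch below uses a hypothetical function name. It succeeds only if the last N lines of the log contain both the `WebSocket connected` and `Ready, session_id=` markers from the expected output. It matches on the message text only, not on the `[QQBot:1903996332]` prefix.

```bash
# Hypothetical post-restart check: exit status 0 only if both healthy markers are in the tail.
qq_verify_healthy() {
  local log="${1:-$HOME/.hermes/logs/gateway.log}"
  local lines="${2:-20}"
  local recent
  recent=$(tail -n "$lines" "$log")
  printf '%s\n' "$recent" | grep -q "WebSocket connected" &&
    printf '%s\n' "$recent" | grep -q "Ready, session_id="
}
```

Give the service a few seconds to connect, then run something like `qq_verify_healthy && echo healthy || echo "not connected yet"`.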

### Root Cause
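US datacenter IPs tend to have poor, unstable routing to QQ's infrastructure in mainland China. The long-lived WebSocket connection times out because of the latency and packet loss on that path. The same path presumably also affects the HTTP request the bot makes to fetch the gateway URL before reconnecting, which would explain the occasional "Failed to get QQ Bot gateway URL" error. This is a common problem when budget VPS providers (CloudCone, etc.) connect to Chinese services.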
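To test this explanation from a given machine, you can repeatedly time the TCP and TLS handshakes to the QQ API host and count transport failures. The sketch below relies on curl's `-w` timing variables. The function name, the default of 10 attempts, and probing the host's root path over HTTPS are all assumptions. The request is unauthenticated, so the HTTP status does not matter. Only the timings and curl's exit code (timeouts, resets) count. Running it on both the VPS and the local PC gives a direct comparison.

```bash
# Hypothetical probe: time TCP/TLS setup to the QQ API host and count transport failures.
# Without -f, an HTTP error status (e.g. 401 without a token) still exits 0 and is not a failure.
probe_qq_latency() {
  local host="${1:-api.sgroup.qq.com}"
  local count="${2:-10}"
  local i=0 fails=0
  while [ "$i" -lt "$count" ]; do
    i=$((i + 1))
    if ! curl -sS -o /dev/null --max-time 10 \
        -w "attempt $i: tcp=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n" \
        "https://$host/"; then
      fails=$((fails + 1))   # timeout, DNS error, or connection reset
    fi
    sleep 1
  done
  echo "failed $fails of $count attempts"
}
```

Look for high or erratic `tcp=` and `tls=` values on the VPS, and for any non-zero failure count. Both fit the routing explanation better than a bug in the bot itself.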

### Long-Term Solution
Run Hermes on 张哥's local Windows PC, which has a residential Chinese IP, instead of on a US VPS. The PC has a 4 GB GPU and can still reach cloud AI APIs (OpenRouter, etc.) while keeping a stable connection to QQ.
