---
name: hermes-server-ops
description: "Umbrella for Hermes Agent server operations — gateway troubleshooting, server migration, web dashboard deployment, API server, Nginx, and platform integration."
tags: [hermes, devops, gateway, migration, dashboard, nginx, qq-bot]
---

# Hermes Server Operations

Class-level umbrella for all Hermes Agent server management tasks. Covers four operational domains:

1. [Gateway Troubleshooting](#1-gateway-troubleshooting) — diagnosing/fixing crashes, DB corruption, WebSocket disconnects, API relay switching
2. [Server Migration](#2-server-migration) — moving Hermes installation (config, skills, state, sessions) between VPS servers
3. [Web Dashboard Deployment](#3-web-dashboard-deployment) — web UI, API server, Nginx reverse proxy, auth, QQ bot integration
4. [Hermes Agent Upgrade](#4-hermes-agent-upgrade) — upgrading from PyPI or a GitHub dev build, restarting the gateway, installing new skills

---

## 1. Gateway Troubleshooting

### When to Load This Section
- QQ/Telegram/WeChat bot not responding
- Gateway crashing or in a restart loop
- SQLite database errors (`file is not a database`, `no such column`)
- WebSocket connection timeouts (code=4009)
- Need to switch API relays (OpenRouter → OpenKey)

### Quick Diagnostic

```bash
# 1. Is gateway alive?
hermes gateway status
ps aux | grep hermes | grep -v grep

# 2. Check logs
tail -100 /root/.hermes/logs/gateway.log
tail -50 /root/.hermes/logs/errors.log

# 3. Check DB files
ls -la /root/.hermes/*.db

# 4. Check TCP connections for QQ
ss -tnp | grep hermes
```

**⚠️ This server runs gateway manually (not systemd):** `hermes gateway run` — if the process dies, nothing auto-restarts.
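
Step 3 of the diagnostic only lists the database files. The following is a minimal sketch for going one step further and checking whether each file still starts with the SQLite 3 file header. The helper names `is_sqlite_db` and `check_hermes_dbs` are illustrative and not part of Hermes. A header check catches only gross damage (a truncated or overwritten file), not corrupted pages deeper in the file.

```bash
# Return 0 if the file starts with the SQLite 3 magic string, 1 otherwise.
# A valid SQLite 3 database begins with the 16 bytes "SQLite format 3\0".
# Note: a zero-byte file fails this check, although SQLite itself would
# treat it as an empty database.
is_sqlite_db() {
    local file="$1"
    [ -f "$file" ] || return 1
    # Compare the first 15 bytes; the trailing NUL is skipped because
    # shell variables cannot hold it.
    [ "$(head -c 15 "$file")" = "SQLite format 3" ]
}

# Print one status line per *.db file in a Hermes directory.
check_hermes_dbs() {
    local dir="${1:-/root/.hermes}" db
    for db in "$dir"/*.db; do
        [ -e "$db" ] || continue          # the glob matched nothing
        if is_sqlite_db "$db"; then
            echo "OK  $db"
        else
            echo "BAD $db"
        fi
    done
}
```

Running `check_hermes_dbs /root/.hermes` prints one line per database. A `BAD` line points at the file to handle with the fixes in the next section.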

### Common Fixes

| Symptom | Fix |
|---------|-----|
| `file is not a database` (state.db) | Remove state.db — gateway falls back to JSONL |
| `no such column: session_id` (kanban.db) | ALTER TABLE to add missing column, then restart gateway |
| `kanban.db is not a valid SQLite database` | Remove kanban.db, run `hermes kanban init`, restart gateway |
| QQ WebSocket code=4009 | Restart gateway to force a fresh connection when auto-reconnect does not recover |
| Gateway process dead | `hermes gateway run --replace` via terminal(background=true) |
| QQ silent but gateway "running" | Check TCP connections with `ss -tnp` — likely dead WebSocket |
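
The symptom-to-fix mapping in the table can be sketched as a small log scanner. This assumes the error strings appear in the logs exactly as quoted in the table; the function name `suggest_fixes` is hypothetical, and the suggestions simply restate the table rows.

```bash
# Read log text on stdin and print one suggestion per known error signature.
suggest_fixes() {
    local log
    log=$(cat)        # slurp stdin once so it can be matched several times
    case $log in *"file is not a database"*)
        echo "state.db unreadable: remove state.db (gateway falls back to JSONL)" ;;
    esac
    case $log in *"no such column: session_id"*)
        echo "kanban.db schema outdated: add the column with ALTER TABLE, restart gateway" ;;
    esac
    case $log in *"is not a valid SQLite database"*)
        echo "kanban.db invalid: remove it, run 'hermes kanban init', restart gateway" ;;
    esac
    case $log in *"code=4009"*)
        echo "QQ WebSocket 4009: restart gateway" ;;
    esac
    return 0
}
```

A typical call would be `tail -100 /root/.hermes/logs/gateway.log | suggest_fixes`. No output means none of the known signatures were found in that window.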

### API Relay Switching (OpenRouter → OpenKey)

When OpenRouter KYC blocks top-up:
1. Get key from `https://openkey.cloud` (Alipay, no KYC)
2. Add to `.env`: `OPENKEY_API_KEY=sk-xxx`
3. Add provider to `config.yaml` using `api_key` directly (NOT `api_key_env`)
4. Update `model.provider: openkey` and `model.default: gpt-4o-mini` (no prefix)
5. Restart gateway

**Pitfalls:**
- `api_key_env` does NOT work for custom providers — use `api_key` directly
- Model name must NOT have provider prefix when `model.provider` is set
- Gateway caches environment — restart after .env changes
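
Two of the pitfalls above lend themselves to a quick pre-restart check. The sketch below assumes the provider prefix takes the OpenRouter form `vendor/model` (for example `openai/gpt-4o-mini`) and that `.env` uses plain `KEY=value` lines. Both helper names are illustrative.

```bash
# Return 0 if the model name carries a "vendor/" prefix.
has_provider_prefix() {
    case "$1" in
        */*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Return 0 if KEY is set to a non-empty value in the given .env file.
env_has_key() {
    local key="$1" file="$2"
    grep -Eq "^${key}=.+" "$file" 2>/dev/null
}
```

For instance, `has_provider_prefix "gpt-4o-mini"` fails (no prefix, which is what step 4 wants), and `env_has_key OPENKEY_API_KEY /root/.hermes/.env` confirms that step 2 was applied before the gateway is restarted.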

### Detailed Reference Files

See `references/gateway-diagnostics.md` — comprehensive fix procedures for all DB corruption scenarios.
See `references/qq-websocket-cloudcone.md` — US datacenter QQ WebSocket instability diagnosis.
See `references/api-relay-options.md` — API relay alternatives when OpenRouter is blocked.

---

## 2. Server Migration

### When to Load This Section
- VPS expiring, need to move to new server
- Switching hosting providers
- Re-deploying from scratch

### Core Principles
1. **First confirm what machine you're on** — run `hostname && ip addr show | grep 'inet '` before any SSH
2. **Set up new server first, then copy config** — don't operate two simultaneously
3. **Exclude large files** — `.hermes/` can be 700MB+; exclude node/temp/sessions-state-snapshots
4. **One step at a time** — verify each step before moving to next

### Critical Files to Migrate

| File | Why Critical |
|------|-------------|
| `SOUL.md` | Defines agent personality — missing it = default OWL identity |
| `sessions/sessions.json` | QQ bot session token — missing = QQ won't connect |
| `state.db*` | User memories — missing = no memory |
| `config.yaml` + `.env` | All configuration |
| `skills/` | All custom skills |
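
Before packing, it can help to confirm that every path in the table actually exists on the old server. A minimal sketch, with the hypothetical helper `check_critical_files`; it checks only `state.db` itself, not the companion files matched by `state.db*`.

```bash
# Print each critical path that is missing under a Hermes directory.
# Returns 0 only when nothing is missing.
check_critical_files() {
    local dir="${1:-/root/.hermes}" rel missing=0
    for rel in SOUL.md sessions/sessions.json state.db config.yaml .env skills; do
        if [ ! -e "$dir/$rel" ]; then
            echo "MISSING $dir/$rel"
            missing=1
        fi
    done
    return "$missing"
}
```

Used as `check_critical_files /root/.hermes || echo "fix the above before migrating"`, it stops a migration from silently shipping an installation without its persona or QQ session token.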

### Migration Checklist

```bash
# 1. New server prep
sshpass -p 'pw' ssh root@NEWIP 'apt install -y python3.14-venv'
sshpass -p 'pw' ssh root@NEWIP 'python3 -m venv /opt/hermes-venv && /opt/hermes-venv/bin/pip install -q hermes-agent'

# 2. Pack old server (exclude large files!)
cd /root && tar czf /tmp/hermes-migrate.tar.gz \
  .hermes/config.yaml .hermes/.env .hermes/SOUL.md \
  .hermes/skills/ .hermes/state.db* .hermes/sessions/sessions.json \
  .hermes/logs/ .hermes/cron/ .hermes/kanban.db .hermes/webchat/

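# 3. Transfer the archive from step 2 and unpack it on the new server
sshpass -p 'pw' scp /tmp/hermes-migrate.tar.gz root@NEWIP:/tmp/
sshpass -p 'pw' ssh root@NEWIP 'cd /root && tar xzf /tmp/hermes-migrate.tar.gz'
# Optional: re-sync skills later without re-packing
rsync -avz --timeout=60 -e "sshpass -p 'pw' ssh" /root/.hermes/skills/ root@NEWIP:/root/.hermes/skills/

# 4. Post-migration fixes (run on the NEW server)
cp /root/.hermes/SOUL.md /root/AGENTS.md              # AGENTS.md is what Hermes actually reads
sed -i 's/personality: kawaii/personality: helpful/' /root/.hermes/config.yaml
ln -sf /opt/hermes-venv/bin/hermes /usr/local/bin/hermes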
```
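
Before relying on the archive from step 2, it may be worth confirming that it contains the critical files and, once it has been copied, that it arrived intact. The sketch below assumes GNU coreutils (`sha256sum`) on both servers; `archive_contains` and `sha256_of` are illustrative helper names.

```bash
# Return 0 if the gzip'd tar archive lists exactly the given member path.
archive_contains() {
    local archive="$1" path="$2"
    tar tzf "$archive" | grep -qxF -- "$path"
}

# Print only the SHA-256 hex digest of a file.
sha256_of() {
    sha256sum "$1" | cut -d' ' -f1
}

# Hypothetical cross-host comparison (NEWIP as in the checklist above):
#   local_sum=$(sha256_of /tmp/hermes-migrate.tar.gz)
#   remote_sum=$(ssh root@NEWIP 'sha256sum /tmp/hermes-migrate.tar.gz' | cut -d' ' -f1)
#   [ "$local_sum" = "$remote_sum" ] && echo "transfer verified"
```

Member paths must be written the way `tar` stored them, so with the pack command above the persona file is `.hermes/SOUL.md`, not `/root/.hermes/SOUL.md`.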

### Detailed Reference Files

See `references/server-migration-checkpoints.md` — complete step-by-step with all SSH commands, error handling, and verification.
See `references/qqbot-troubleshooting.md` — why QQ bot won't connect post-migration and how to fix.

---

## 3. Web Dashboard Deployment

### When to Load This Section
- User wants a browser-based chat UI for Hermes
- User wants access from multiple devices via URL
- User wants QQ bot + web chat as dual access paths
- User says "web chat should work like terminal"

### Architecture Options

| Option | Description | Best For |
|--------|-------------|----------|
| A: Built-in Dashboard | `hermes dashboard --host 0.0.0.0` (port 9119) | Quick test, single user |
| B: Standalone Web Chat | Custom server via `hermes chat -Q -q` subprocess | Simple Q&A, no tool calls |
| C: API Server Agent Mode | Full agent via Hermes API Server (port 8080) | Full parity with terminal |

**Option C is recommended** — it provides tools, memory, and session continuity.

### Quick Deploy (Option C)

```bash
# 1. Create server at ~/.hermes/webchat/server.py
#    (use templates/web-chat-server-api.py as base)

# 2. Start gateway with API Server
API_SERVER_ENABLED=true API_SERVER_KEY=$(openssl rand -hex 32) \
  API_SERVER_PORT=8080 API_SERVER_HOST=127.0.0.1 \
  API_SERVER_CORS_ORIGINS="*" \
  hermes gateway run --accept-hooks

# 3. Nginx reverse proxy (8888 must match the port server.py listens on; it is not the API server port 8080)
cat > /etc/nginx/sites-available/hermes-dashboard << 'EOF'
server {
    listen 80;
    server_name _;
    location / {
        proxy_pass http://127.0.0.1:8888;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_read_timeout 120;
    }
}
EOF
ln -sf /etc/nginx/sites-available/hermes-dashboard /etc/nginx/sites-enabled/
rm -f /etc/nginx/sites-enabled/default
nginx -t && nginx -s reload
```
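
The inline `$(openssl rand -hex 32)` in step 2 produces a different key on every start and never records it. If another component has to present the same key to the API server, one option is to generate the key once and reuse it. This is a sketch under that assumption; the helper `ensure_api_key` and the key file path in the usage note are hypothetical, not Hermes conventions.

```bash
# Print the key stored in KEYFILE, generating and saving one first if the
# file is missing or empty. The file is created with mode 600.
ensure_api_key() {
    local keyfile="$1"
    if [ ! -s "$keyfile" ]; then
        (
            umask 077
            # Prefer openssl; fall back to /dev/urandom if it is unavailable.
            { openssl rand -hex 32 2>/dev/null ||
              od -An -N32 -tx1 /dev/urandom | tr -d ' \n'; } > "$keyfile"
        )
    fi
    cat "$keyfile"
}
```

Step 2 could then start with `API_SERVER_KEY=$(ensure_api_key ~/.hermes/api_server.key)`, so that restarts keep the same key and the web chat server can read it from the same file.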

### QQ Bot Integration

Add to `~/.hermes/.env`:
```bash
QQ_APP_ID=your-app-id
QQ_CLIENT_SECRET=your-secret
QQ_ALLOW_ALL_USERS=true
```

QQ bot is **built into hermes-agent** at `/opt/hermes-venv/lib/python3*/site-packages/gateway/platforms/qqbot/` — no extra pip install needed.

**Pitfalls:**
- SOUL.md must be populated BEFORE connecting QQ — otherwise the bot replies as "OWL" instead of your custom persona
- QQ sandbox mode: bot only receives messages from test channel until published
- US datacenter IPs → unstable QQ WebSocket (see Gateway Troubleshooting section)
- `hermes gateway install --system` refuses root — pass `--run-as-user root`
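
The first pitfall can be guarded against with a small check before the QQ credentials are added. A minimal sketch; `soul_ready` is an illustrative name, and "populated" is taken here to mean at least one non-whitespace character.

```bash
# Return 0 if the persona file exists and contains at least one
# non-whitespace character.
soul_ready() {
    local file="${1:-$HOME/.hermes/SOUL.md}"
    [ -f "$file" ] && grep -q '[^[:space:]]' "$file"
}
```

For example, `soul_ready || echo "populate SOUL.md before connecting QQ"` prints the warning when the file is missing, empty, or blank.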

### Detailed Reference Files

See `references/systemd-deployment.md` — gateway + webchat systemd service templates and verification checklist.
See `references/session-notes.md` — deployment history, memory injection, 429 rate limits, Chrome HTTP blocking.
See `references/custom-routes.md` — adding custom pages (video viewer, etc.) to the webchat server.
See `templates/web-chat-server.py` — basic web chat server (CLI subprocess).
See `templates/web-chat-server-api.py` — full agent web chat (API Server, tools + memory).

---

## 4. Hermes Agent Upgrade

### When to Load This Section
- `hermes --version` shows "Update available"
- User asks to upgrade Hermes Agent
- Need to install dev version from GitHub (PyPI is behind)

### Upgrade Paths

#### A. PyPI Stable (simple cases)
```bash
# If system Python is NOT externally-managed:
pip install --upgrade hermes-agent

# If system Python IS externally-managed (Ubuntu 26.04+):
# Use the Hermes venv directly:
/opt/hermes-venv/bin/pip install --upgrade hermes-agent
```

#### B. GitHub Dev Version (when PyPI is behind)
Required when (1) PyPI has not yet published the latest commits, or (2) you need unreleased features.

**⚠️ Python version constraint:** hermes-agent requires Python `>=3.11,<3.14`. A server whose only Python is 3.14 cannot install it directly; use a parallel Python 3.13 venv instead.

```bash
# Step 1: Install uv (if not present)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# Step 2: Install Python 3.13
uv python install 3.13

# Step 3: Create venv with Python 3.13
uv venv /opt/hermes-dev --python 3.13

# Step 4: Clone and install dev version
cd /tmp && git clone https://github.com/NousResearch/hermes-agent.git
cd /tmp/hermes-agent && uv pip install -e . --python /opt/hermes-dev/bin/python

# Step 5: Verify
/opt/hermes-dev/bin/hermes --version
```
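
The `>=3.11,<3.14` constraint noted above can be checked up front, before deciding whether the parallel venv is needed. A minimal sketch with the hypothetical helper `python_version_ok`, which takes a `major.minor[.patch]` string:

```bash
# Return 0 if the version string satisfies >=3.11,<3.14.
python_version_ok() {
    local major minor
    major=${1%%.*}                       # text before the first dot
    minor=${1#*.}; minor=${minor%%.*}    # text between first and second dot
    [ "$major" -eq 3 ] 2>/dev/null &&
        [ "$minor" -ge 11 ] 2>/dev/null &&
        [ "$minor" -lt 14 ] 2>/dev/null
}

# Possible usage against the system interpreter:
#   v=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
#   python_version_ok "$v" || echo "Python $v unsupported: use the 3.13 venv"
```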

**Switching between versions:**
```bash
# Backup old, link new
mv /usr/local/bin/hermes /usr/local/bin/hermes-old
ln -s /opt/hermes-dev/bin/hermes /usr/local/bin/hermes

# Verify
hermes --version
```

**⚠️ uv-managed venvs have NO pip** — always use `uv pip install --python /path/to/venv/bin/python`, never `venv/bin/pip`.

### Post-Upgrade: Restart Gateway

**⚠️ Gateway restart disconnects the current session.** The agent CANNOT restart its own gateway — the user must do it manually.

Tell the user to run in their terminal:
```bash
# If gateway is a foreground process:
# Ctrl+C, then: hermes gateway run --replace

# If gateway is systemd:
systemctl --user restart hermes-gateway

# If gateway was started with nohup/background:
kill $(pgrep -f "hermes gateway" | head -1) && sleep 2 && hermes gateway run --replace
```

### Post-Upgrade: Install New Skills

New Hermes versions may include new skills in `skills/` and `optional-skills/` directories. Compare and copy:

```bash
# See what's new
diff <(ls ~/.hermes/skills/ | sort) <(ls /tmp/hermes-agent/skills/ | sort)
diff <(ls ~/.hermes/skills/ | sort) <(ls /tmp/hermes-agent/optional-skills/ | sort)

# Copy new skills (example)
cp -r /tmp/hermes-agent/optional-skills/research/scrapling ~/.hermes/skills/research/
```
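
`diff` reports differences in both directions, including skills that exist only locally. When only the additions matter, a one-directional listing may be easier to read. A sketch with the illustrative helper `list_new_skills`; it compares a single directory level, so for categorised trees such as `optional-skills/research/` it should be pointed at the category directory.

```bash
# Print the names of subdirectories that exist in UPSTREAM but not in INSTALLED.
list_new_skills() {
    local installed="$1" upstream="$2" path name
    for path in "$upstream"/*/; do
        [ -d "$path" ] || continue        # the glob matched nothing
        name=$(basename "$path")
        [ -e "$installed/$name" ] || echo "$name"
    done
}
```

For example, `list_new_skills ~/.hermes/skills/research /tmp/hermes-agent/optional-skills/research` lists research skills shipped upstream that are not installed yet.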

---

## Cross-Domain Pitfalls

- **Gateway caches DB fingerprints** — replacing a DB file on disk is not enough; you MUST restart the gateway process.
- **`pkill` from a remote session** — blocked by security policy; it must be run locally in the server terminal.
- **Duplicate gateway processes** cause QQ double-replies. Verify that `ps aux | grep hermes | grep -v grep` shows only one gateway process.
- **Ubuntu 26.04 changed Python to 3.14** — `python3.14-venv` package, not `python3-venv`. hermes-agent requires Python <3.14.
- **Hostinger VPS default: no password SSH** — enable it by setting `PasswordAuthentication yes` in `/etc/ssh/sshd_config` (for example with `sed -i`), then restart sshd.
- **Chrome blocks HTTP to non-localhost IPs** — use HTTPS or access via localhost/trusted network.
- **Playwright doesn't support Ubuntu 26.04** — but system Chromium exists; tools like scrapling can use it via `PLAYWRIGHT_BROWSERS_PATH`.
- **Agent cannot restart its own gateway** — it kills the session it's running in. Always tell the user to restart manually.
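
The duplicate-process pitfall can be turned into a count that is easy to script against. This sketch assumes the gateway's command line contains the text `hermes gateway`, as in the `pgrep -f "hermes gateway"` call used elsewhere in this document; `count_gateway_procs` is an illustrative name.

```bash
# Count lines on stdin that look like a gateway command line.
# The [h] bracket keeps this grep's own command line from matching
# when the input comes from a live `ps` listing.
count_gateway_procs() {
    grep -c '[h]ermes gateway' || true    # grep -c exits 1 on zero matches
}

# Possible usage:
#   n=$(ps -eo args | count_gateway_procs)
#   [ "$n" -le 1 ] || echo "WARNING: $n gateway processes running"
```

Reading from stdin keeps the function independent of a live gateway, so it can be exercised with canned `ps` output.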
