---
name: hermes-model-management

description: Combine model management and decision-making skills for effective model selection and deployment within the Hermes framework.

---

## Overview
This umbrella skill covers model selection, deployment, and day-to-day operation, with a focus on the user's preferences and on clear rules for model-related decisions.

## Key Areas
1. **User Preferences**: Understand the user's requirements and tailor model selection to them.
2. **Decision-Making Framework**: Define clear rules for when model changes may be made autonomously (by default, they may not).
3. **Performance Evaluation**: Maintain feedback loops that keep assessing how the selected model performs.

## ⚠️ Critical Rules for 张哥's Setup

### DO NOT change models without explicit user request
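- **✓ Changing models does NOT require restarting the gateway** (张哥 corrected this explicitly on 2026-06-16: "I've switched models many times and never restarted")
- `hermes config set model.default=xxx` and `hermes config set model.provider=xxx` take effect as soon as the config is written; no restart is needed
- Only `model.default` (and `model.provider`, if needed) has to change; the gateway picks up the new config automatically
- If the new model turns out to be unavailable (503, etc.), just change it back; the connection is not interrupted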
- **Rule**: If current model works, DO NOT touch it. Only change when user explicitly asks.

### Free Model Realities (OpenRouter)
- "Free" models on OpenRouter often have hidden daily limits (e.g., "50 uses/day" that deplete in 2-3 actual requests)
- Free models can be throttled or become unreliable after initial use
- Free models are often slow and struggle with browser/web tasks
- User has been burned repeatedly by free model limitations — do not promise "free and stable"
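Why a "50 requests/day" quota can vanish after 2-3 real requests: a tool-using agent typically makes several API calls per user request (roughly one per reasoning or tool step), and each call counts against the quota. A rough sketch of that arithmetic; the figure of 20 calls per task is an assumption for illustration only, so measure the real count where possible:

```python
def tasks_per_day(daily_quota: int, calls_per_task: int) -> int:
    """Estimate how many complete agent tasks fit in a daily request quota.

    calls_per_task is an assumption: each reasoning/tool step of an agent
    usually costs one API request, so one user request can use many units.
    """
    if calls_per_task <= 0:
        raise ValueError("calls_per_task must be positive")
    # Floor division: a task that would run out of quota midway is not counted.
    return daily_quota // calls_per_task


# Hypothetical: 50 requests/day quota, ~20 API calls per agent task.
print(tasks_per_day(50, 20))  # 2
```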

### Paid Model Warnings
- User is extremely cost-averse after being burned by unexpected charges ($50 in minutes, $100-200 overnight)
- NEVER switch to a paid model without explicit user approval
- If a paid model is needed, clearly state the cost structure upfront
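One way to state the cost structure upfront is a per-request estimate built from the provider's per-million-token prices. A minimal sketch; the token counts and prices below are hypothetical placeholders, not real rates for any model:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the USD cost of one request from per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical prices: $3 per 1M input tokens, $15 per 1M output tokens.
per_request = estimate_cost_usd(20_000, 2_000, 3.0, 15.0)
print(f"~${per_request:.3f} per request, ~${per_request * 100:.2f} per 100 requests")
# ~$0.090 per request, ~$9.00 per 100 requests
```

Because an agent can resend its whole context on every call, the input-token count per task may be far larger than the user's message, which is one way costs can climb within minutes.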
- OpenKey (openkey.cloud) was previously used — user has $50 credit but is wary of spending

### Model Change Procedure (only when explicitly requested)
1. Update config.yaml with the new model (`hermes config set model.default=xxx` + `hermes config set model.provider=xxx`)
2. Verify the config change took effect (`grep -A3 '^model:' ~/.hermes/config.yaml`)
3. **No gateway restart needed**: the config takes effect immediately
4. Verify the new model is actually responding (send a test message)
5. If the new model keeps failing (e.g. HTTP 503), immediately switch back to the working model
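6. **⚠️ Do not forward model errors to the user when a model fails**: the user does not need to see raw 503/429 errors; silently switch back to the working model, or simply say "the current model is unavailable"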
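Steps 1 and 4-6 reduce to "switch, probe, roll back on repeated failure". A minimal sketch, assuming the two `hermes config set` commands from step 1 and a hypothetical `probe` callable that sends a test message; the command runner is injectable so the logic can be tried without touching the real config. For simplicity it always writes both keys, although `model.provider` only needs changing when the provider differs:

```python
import subprocess
from typing import Callable, Sequence, Tuple

Runner = Callable[[Sequence[str]], object]


def run_cmd(cmd: Sequence[str]) -> None:
    # Raises subprocess.CalledProcessError if the command exits non-zero.
    subprocess.run(list(cmd), check=True)


def set_model(model: str, provider: str, run: Runner = run_cmd) -> None:
    # Step 1: write both config keys; per the notes above, no gateway restart.
    run(["hermes", "config", "set", f"model.default={model}"])
    run(["hermes", "config", "set", f"model.provider={provider}"])


def switch_model(new: Tuple[str, str], old: Tuple[str, str],
                 probe: Callable[[], bool], run: Runner = run_cmd,
                 attempts: int = 3) -> bool:
    """Switch to `new` (model, provider); if it keeps failing, restore `old`.

    `probe` is a hypothetical health check: send a test message and return
    False on errors such as HTTP 503/429. Returns True if the new model works.
    """
    set_model(*new, run=run)
    for _ in range(attempts):
        if probe():
            return True
    # Steps 5-6: roll back quietly; the caller decides what (if anything)
    # to tell the user, without forwarding the raw error.
    set_model(*old, run=run)
    return False
```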

### ⚠️ Local Model Recommendation Rules (2026-06-07)

**Before recommending ANY local model, you MUST check:**
1. Model file size / VRAM requirement
2. User's actual GPU VRAM (张哥: 4GB)
3. Model must fit in VRAM with headroom (max ~3.5GB for 4GB card)

**张哥's PC**: 16GB RAM + 4GB GPU → models must be ≤ ~3.5GB

**Safe recommendations for 4GB VRAM:**
- Qwen2.5-VL-3B (~2.5GB) — multimodal, Chinese-friendly
- Gemma-3-4B (~2.5GB) — multimodal, OK Chinese
- Qwen2.5-VL-7B-Q4 (quantized, ~3-4GB): only if a quantized version is available and its actual file size stays under the ~3.5GB cap; at ~4GB it will not fit

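**NEVER recommend without checking size first.** If the model is larger than the VRAM budget, recommend a quantized version or a smaller model. A 6GB model will not fit on a 4GB card (at best it runs very slowly with partial CPU offload), so treat it as unusable.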
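The size check can be written as a one-line rule. A minimal sketch whose defaults mirror the numbers above (4GB VRAM, ~0.5GB headroom, so ≤ ~3.5GB); the model sizes are the approximate figures from the list, and real memory use also grows with context length:

```python
def fits_in_vram(model_size_gb: float, vram_gb: float = 4.0,
                 headroom_gb: float = 0.5) -> bool:
    """Return True if a model file fits in VRAM with the given headroom.

    Defaults follow the rule above: a 4GB card leaves ~3.5GB for the model.
    File size is only a lower bound; the context (KV cache) needs extra memory.
    """
    return model_size_gb <= vram_gb - headroom_gb


# Approximate sizes from the list above, plus the 6GB counter-example.
candidates = {
    "Qwen2.5-VL-3B": 2.5,
    "Gemma-3-4B": 2.5,
    "6GB model": 6.0,
}
for name, size in candidates.items():
    verdict = "OK" if fits_in_vram(size) else "too large: use a quantized or smaller model"
    print(f"{name} ({size}GB): {verdict}")
```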

**LM Studio preferred over Ollama** for 张哥 (GUI, easier, less error-prone).

### ⚠️ API Relay vs Direct API Quality (2026-06-07)

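张哥's actual experience: Zhipu's GLM-4V-5V, used through a Tencent relay, performed extremely well (7 of 8 match predictions correct; clear, confident answers), but the same model name called directly through the official site performed poorly (all 5 predictions wrong; vague answers).

**Lessons**:
- An API relay may serve a different model version, configuration, or system prompt
- The same model name can differ greatly in quality depending on the entry point
- When 张哥 says "it used to be smart, now it isn't", the entry point may have changed rather than the model
- When recommending a model, the choice of entry point/relay matters as much as the model itself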
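To tell "the entry point changed" apart from "the model changed", run the same fixed set of test questions through each entry point and compare hit rates. A minimal scoring sketch using the tallies from the anecdote above; note those tallies came from different, small samples (8 vs 5), and the `results` structure is hypothetical:

```python
def hit_rate(correct: int, total: int) -> float:
    """Fraction of test questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Tallies from 张哥's experience: (correct, total) per entry point.
results = {
    "Tencent relay": (7, 8),
    "official direct API": (0, 5),
}
for entry, (correct, total) in results.items():
    print(f"{entry}: {correct}/{total} = {hit_rate(correct, total):.0%}")
```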

## AI Model Relay (中转站) Knowledge

### What 张哥 Means by "中转站"
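- **NOT** a VPN/机场 (a proxy subscription for getting past the Great Firewall, a.k.a. 翻墙梯子)
- **IS** an API aggregation platform that provides one key to access multiple LLMs (GPT-4, Claude, Gemini, etc.)
- Pattern: use a free model for daily work and switch to a paid model (GPT-4/Claude) for hard tasks, only with explicit approval (see Paid Model Warnings)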
- 张哥's current setup: OpenRouter (openrouter.ai) → owl-alpha (free)
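The free-by-default pattern, combined with the rule that paid models need explicit approval, can be sketched as a tiny routing function. The model names are placeholders (`owl-alpha` is the display name from the current setup, not necessarily the exact OpenRouter model ID), and `hard_task`/`user_approved_paid` are hypothetical flags:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    model: str
    paid: bool


# Placeholder names; replace with the real model IDs in use.
FREE_DEFAULT = ModelChoice("owl-alpha", paid=False)
PAID_FOR_HARD = ModelChoice("paid-model-placeholder", paid=True)


def choose_model(hard_task: bool, user_approved_paid: bool) -> ModelChoice:
    """Free model by default; a paid model only for a hard task AND only
    after the user has explicitly approved the cost."""
    if hard_task and user_approved_paid:
        return PAID_FOR_HARD
    return FREE_DEFAULT
```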

### OpenRouter Specifics
- **Supports Alipay and WeChat Pay** for top-ups (充值); multiple Bilibili (B站) and CSDN tutorials confirm this
- Free plan: 25+ free models, 50 reqs/day (depletes fast — 2-3 real requests)
- Paid: 400+ models via pay-as-you-go (5.5% platform fee)
- If 张哥 says he "can't top up", he probably needs help finding the top-up page; do NOT assume the payment method is unsupported
- Payment methods: credit card, crypto, and via third-party channels (EasyPay virtual cards, etc.)
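A quick way to show the real price of a top-up before 张哥 pays. A sketch that assumes the 5.5% fee is added on top of the credit amount; exactly how the fee is applied (and whether a minimum applies) is an assumption here, so confirm on OpenRouter's billing page before quoting numbers:

```python
def topup_charge(credits_usd: float, fee_rate: float = 0.055) -> float:
    """Total charged to buy `credits_usd` of credit, ASSUMING the platform
    fee is added on top of the purchase amount."""
    return round(credits_usd * (1 + fee_rate), 2)


print(topup_charge(10))  # 10.55
```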

### Search Strategy for 张哥's Requests
- When 张哥 says "find X in Hong Kong/Taiwan/Vietnam/Thailand", search DIRECTLY for those regions
- Do NOT start with "US or China" options — this triggers his frustration
- 张哥 explicitly corrected this twice in one session: "It's always either American or domestic; I asked you to look at Hong Kong ones!"
- Expectation: research the SPECIFIC region he named, give ONE best recommendation, not a comparison table

## Cloud Server (VPS) Knowledge

### Current Server: CloudCone
- Identified by `/etc/resolv.conf` containing `CloudCone`
- Hostname: `hermes-ai`
- Specs: 2 CPU cores, 2GB RAM, 40GB SSD
- Location: Los Angeles, US
- Price: ~$59.1/year (~¥430/year), same price on renewal
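The identification check above as a small script. A minimal sketch; it only looks for the literal string `CloudCone` in `/etc/resolv.conf`, the marker noted above, and treats a missing or unreadable file as "not confirmed":

```python
from pathlib import Path


def is_cloudcone(resolv_conf: str = "/etc/resolv.conf") -> bool:
    """Return True if resolv.conf mentions CloudCone."""
    try:
        text = Path(resolv_conf).read_text(encoding="utf-8", errors="replace")
    except OSError:
        # Missing or unreadable file: the provider cannot be confirmed.
        return False
    return "CloudCone" in text


if __name__ == "__main__":
    print("CloudCone server" if is_cloudcone() else "not identified as CloudCone")
```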

### Hong Kong VPS Alternatives
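- 野草云: 1 core / 1GB RAM / 20GB disk ≈ ¥240-360/year; cheaper, but half the specs
- VMiss: 1 core / 1GB RAM / 20GB disk ≈ ¥300-420/year; similar
- 华纳云: 1 core / 1GB RAM / 20GB disk ≈ ¥360-480/year; pricier but stable
- Trade-off: Hong Kong is faster to reach from mainland China, but CloudCone gives more per yuan: double the CPU, RAM, and disk for ~¥430/year (≈¥215 per GB of RAM vs. ¥240-480 for these HK plans)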
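Normalizing by resources makes the trade-off concrete. A quick sketch that divides each annual price by RAM, using the figures listed above; RAM as the single yardstick is an assumption and ignores CPU, disk, network route quality, and stability:

```python
def yuan_per_gb_ram(annual_price_cny: float, ram_gb: float) -> float:
    """Annual price per GB of RAM; lower means more resources per yuan."""
    return annual_price_cny / ram_gb


# (low, high) annual price in CNY and RAM in GB, from the lists above.
plans = {
    "CloudCone (LA)": ((430, 430), 2),
    "野草云 (HK)": ((240, 360), 1),
    "VMiss (HK)": ((300, 420), 1),
    "华纳云 (HK)": ((360, 480), 1),
}

for name, ((low, high), ram_gb) in plans.items():
    print(f"{name}: ¥{yuan_per_gb_ram(low, ram_gb):.0f}-"
          f"{yuan_per_gb_ram(high, ram_gb):.0f} per GB RAM per year")
```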

## References
- [Free Models Reality Check](references/free-models-reality-check.md): 张哥's actual experience with free/paid model traps
- [AI Relay Comparison](references/ai-relay-comparison.md): API relay (中转站) options and pricing

See also [model-management](model-management/SKILL.md)
