Hardware specs, inference configuration, optimization history, and agent infrastructure. Everything documented. Raw numbers only.
Two-machine cluster connected by direct-wire Ethernet on an isolated /30 subnet (10.10.10.0/30). Sub-millisecond latency between nodes. cha0tikhome handles all orchestration and agent processes. cha0tiktower is inference-only. Both connected to Tailscale mesh for remote access.
All inference clients route through a single local proxy on tower:8010. Backends are hot-swappable via config without reconfiguring any consuming agent. OpenAI-compatible API with model aliasing and auth. Switch command: proxy-switch genesis|aeon.
| Model | Quant | Gen | Context | Server | Status | Notes |
|---|---|---|---|---|---|---|
| Qwen3.6-35B-A3B (MoE) | UD-Q4_K_M | ~107 t/s | 65K | llama.cpp | active | Genesis. MoE: only 3B active params per token. f16 KV. Default backend. |
| Qwen3 27B (dense) | GPTQ-Marlin INT4, fp8 KV | ~85 t/s | 160K | vLLM 0.9+ | active | Long-context workloads. fp8 KV halves VRAM vs bf16. |
| AEON (Qwen3 NVFP4) | NVFP4 (ModelOpt) | ~69 t/s | 122K | vLLM 0.9+ | stopped | Blackwell-native quantization. MTP speculative decoding n=3. |
| Qwen3.6-27B (SSM hybrid) | Q4_K_M | ~22 t/s | 65K | llama.cpp | stopped | Mamba/SSM hybrid. Fast prefill (960 t/s), slow gen (SSM bottleneck). |
| Gemma 4 26B (CPU baseline) | Q4_K_M | ~11.7 t/s | 32K | llama.cpp | historical | CPU-only on cha0tikhome. Pre-tower reference. Consistent swap usage 4–8 GB. |
39 automated experiments across two model architectures using a scripted autoresearch loop. Stopping criterion: +5 tok/s improvement per iteration. Full logs at randomchaos7800-hub.
| Configuration | Gen Speed | Prompt Speed | Delta vs Baseline | Notes |
|---|---|---|---|---|
| Dual GPU, all-on-GPU, f16 KV | 107 t/s | 2,436 t/s | +50% vs single GPU | Final config. TP=2, PCIe x8+x4. |
| Single GPU, all-on-GPU, f16 KV | 71 t/s | — | Pre-dual baseline | After CPU offload fix. Clean VRAM fit. |
| Full GPU offload (Exp 2 breakthrough) | 70 t/s | 222 t/s | +118% gen, +200% prompt | vs expert-on-CPU config. |
| Expert tensors on CPU (MoE routing) | 32 t/s | 74 t/s | −55% | CPU bottleneck on MoE expert routing. |
| Metric | Value |
|---|---|
| Sample period | April 2–18, 2026 |
| Model | Gemma 4 26B Q4_K_M |
| Avg gen throughput | 10.49 tok/s |
| Range | 5.02 – 11.72 tok/s |
| Avg time to first token | 616 – 2,224 ms |
| Avg swap used | 4.6 – 7.7 GiB |
| Speedup vs GPU tier | 6.6× (10.5 → 74 tok/s) |
All agents run as systemd user services. Restart=always, StartLimitBurst=3, StartLimitIntervalSec=60. Recovery under 5 seconds on any crash. All inference via tower:8010. Slack Socket Mode (bi-directional) for interactive agents.
| Agent | Framework | Interface | Model tier | Status |
|---|---|---|---|---|
| Frank (harness) | Python / custom | Slack Socket Mode :8890 | Qwen3.6 MoE (local) | live |
| Mike | RelayV3 (Python) | Discord + Telegram + Slack + IRC | Local proxy :8010 | live |
| Kato | TypeScript / Frank persona | Slack Socket Mode | Sonnet 4.6 / local | live |
| Dave (CFO) | TypeScript / Frank persona | Slack Socket Mode / #finance | Local proxy :8010 | live |
| Hermes | Python / hermes-gateway.service | OpenRouter-backed gateway | OpenRouter | live |
| CJ Craig | TypeScript / Frank persona | Slack Socket Mode | Sonnet 4.6 / local | live |
| Morty | TypeScript / Frank persona | Slack Socket Mode | Haiku 4.5 (Agent SDK) | live |
| Sabrina | Python / bot.py | Telegram + Slack (REST) | Local proxy :8010 | live |
| Chronicle | Python / cron | 2am daily cron | Claude API | live |
kill -9 → verified restart under 5 seconds. Mattermost decommissioned 2026-03-01. All agents migrated to Slack Socket Mode. Zero external API dependencies for core agent function at Tier 2 hardware — OpenRouter and Claude API remain available as fallbacks, configured to alert loudly when triggered.All benchmark data and logs referenced in the research paper are available publicly.
| Artifact | Location | Contents |
|---|---|---|
| Live benchmark data | dinovitale.com/benchmarks.html | Inference metrics, MESA scores, LongMemEval results |
| Agent infrastructure | randomchaos7800-hub | Frank harness, benchmark tooling, optimization logs |
| Research paper | papers.html | Full preprint text, tables, references |
| Weekly build logs | dinovitale.com | Weekly recap posts with build decisions and operational notes |