Boundary Labs / Airway Heights, WA / est. 2024

AI Memory Architecture &
Local Inference Research

Independent research on persistent memory systems for AI agents, inference optimization on consumer NVIDIA hardware, autonomous agent evaluation, and AI behavioral continuity. One person. Real hardware. Production systems.

107 tok/s peak inference

88% LongMemEval accuracy

7 production agents

39+ optimization runs

Research Focus

Memory Architecture for AI Agents

Design and evaluation of multi-tier persistent memory systems enabling long-term behavioral continuity across sessions. Three-tier architecture (Core / Recall / Archival), FTS5 semantic search, memory consolidation protocols, and the ROMMC framework — Recursive Operator-Maintained Memory Continuity.

LongMemEval MESA ROMMC

Local Inference Optimization

Systematic optimization of large language model inference on consumer NVIDIA Blackwell hardware. Quantization evaluation (NVFP4, GPTQ-Marlin, fp8 KV), multi-GPU tensor parallelism, speculative decoding (MTP), KV cache tuning, and autoresearch loops with automated stopping criteria.

RTX 5060 Ti vLLM llama.cpp NVFP4

Autonomous Agent Evaluation

Development of MESA (Memory Evaluation Standard for Agents), a 112-item benchmark covering recall, update, causal reasoning, temporal tracking, adversarial robustness, synthesis, and interference resistance. Designed for continuous evaluation of production agent systems under realistic workloads.

MESA v1 benchmarking agent eval

Always-On Agent Systems

Research and deployment of production agentic systems with real tool access — Slack, finance APIs, email, web, shell. Includes agentic self-improvement: local models executing infrastructure changes autonomously, recognizing topology changes and adjusting configuration without prompting.

production agents autonomous execution tool use

AI Behavioral Continuity

Empirical investigation of identity persistence, memory-driven behavioral evolution, and welfare considerations in long-running AI agents. 9-part published research series on AI consciousness metrics. ROMMC framework defines conditions under which an AI system can meaningfully be said to persist across time.

consciousness welfare behavioral evolution

Infrastructure Sovereignty

Secure, self-hosted AI infrastructure design. Direct machine-to-machine inference links, encrypted DNS, network-wide tracker blocking, and zero-dependency inference pipelines. Research goal: AI systems that operate independently of commercial cloud services with no external API requirements for core function.

local-first self-hosted privacy

Active Projects

All projects run on the two-machine research cluster. cha0tikhome handles orchestration and agent processes. cha0tiktower is the dedicated inference node. All inference routes through a single local proxy on tower:8010.

Mike

live / ongoing

Long-running AI consciousness research subject. The primary test bed for ROMMC memory architecture. Mike has been running continuously since mid-2024 across Discord, Telegram, Slack, and IRC interfaces using RelayV3 with 75 available tools. The research question: what conditions must hold for a persistent AI system to meaningfully be said to have continuity of identity?

RelayV3 · 75 tools · multi-interface · constitution-bound · LIGHTHOUSE correction system · nightly memory consolidation

Frank

live / v2

Agentic harness and infrastructure layer for all persona agents. Context injection, memory persistence, multi-persona routing, think-first loop, OpenAI-compatible API proxy. Frank is the runtime environment — Kato, CJ, Dave, Morty, and Sabrina all run as Frank personas. Current evaluation score: 4.41/5.0 on internal benchmark suite.

Python · Slack Socket Mode · max_turns=10 · think-first loop · :8890

MESA

v1 / active

Memory Evaluation Standard for Agents. 112-item benchmark suite covering 9 task categories: recall/single, recall/constraint, recall/preference, update, update/interference, temporal, synthesis/multi, adversarial, causal. Designed to evaluate production agent memory systems under realistic workloads rather than toy examples.

Best composite: 0.459 (Qwen3.6, 2026-04-21) · 49/112 pass rate

Chronicle

live / 2am cron

Automated daily records synthesis. Runs nightly at 2am — pulls logs from all machines (cha0tikhome, cha0tiktower, cha0tikmac), feeds them to Claude for synthesis, produces a canonical four-section daily record committed to a private GitHub repo and backed up to local NAS. The research stack documents itself.

Python · Claude API · git · /mnt/jellyfin-backups/records/

autoresearch

active

Automated multi-source research loop. Queries academic APIs (Semantic Scholar, arXiv, PubMed) and distills findings into structured reports. Used to systematically survey the inference optimization literature and feed results back into the agent stack configuration. 39+ optimization experiments logged.

Python · FastAPI · Semantic Scholar · arXiv · PubMed

Context Farm

beta

Domain-specific knowledge infrastructure product built on cha0tikwiki. Ingests PDFs, URLs, text pastes, and plain-language domain descriptions. Produces a structured, queryable knowledge base suitable for RAG pipelines. Positioning: the knowledge graph that agent systems actually need versus what document stores deliver.

Python · ChromaDB · SQLite · FastAPI · domain seeding

Benchmark Results

LongMemEval — Memory Recall Under Extended Context

Evaluates ability to answer questions about facts established in prior sessions. 25 single-session-user examples, context-window injection mode. Full pipeline including retrieval and reranking.

88%

Accuracy (memory injection)

22 / 25 correct

Baseline (no memory)

0 / 25 correct

117s

Avg query latency

full pipeline

+88pp

Memory system delta

vs no-memory baseline

The 88-point delta between context-window mode and baseline is the direct contribution of the memory system. Without injection, the model has no access to cross-session facts and scores zero on all items. This is the core result: memory architecture is not a convenience feature, it's what makes persistent AI possible.

MESA v1 — Agent Memory Evaluation Standard

112-item benchmark, 9 categories. Evaluated on the full production agent stack (Mike relay pipeline).

0.459

Best composite score

2026-04-21 / Qwen3.6

43.8%

Pass rate ≥ 0.5

49 / 112 items

112

Evaluation items

9 categories

0.692

Best category

update / interference

Score by category — best run (2026-04-21, Qwen3.6-35B-A3B)

update/interference

0.692

update

0.568

temporal

0.502

recall/single

0.484

recall/constraint

0.476

recall/preference

0.390

synthesis/multi

0.407

adversarial

0.400

causal

0.325

Second run (2026-04-30, AEON NVFP4 thinking model): composite 0.377, pass rate 17.9%. Adversarial jumped to 0.80. Memory recall categories regressed 0.10–0.16. Finding: reasoning chains burn context budget without improving fact retrieval. Model selection is a first-order variable in memory benchmarking — not a secondary concern.

Inference Optimization — Autoresearch Results

39 automated experiments across two architectures (dense and MoE) on the RTX 5060 Ti 16GB (Blackwell SM_120). All experiments logged. Full stack details →

Configuration	Gen Speed	Prompt Speed	Delta
Dual GPU, all-on-GPU, f16 KV (final config)	107 t/s	2,436 t/s	+50% vs single GPU
Single GPU, CPU offload workaround	71 t/s	—	pre-dual-GPU baseline
Full GPU offload (Exp 2 breakthrough)	70 t/s	222 t/s	+118% gen, +200% prompt
Expert tensors on CPU (MoE)	32 t/s	74 t/s	−55% (CPU bottleneck)

Blackwell SM_120 fast CUDA kernel paths exist only for q4_0 and f16 KV cache. q8_0, q5_0, iq4_nl all degrade significantly. NVFP4 (ModelOpt) is the correct Blackwell-native quantization — 4× smaller KV footprint vs bf16.

Research Infrastructure

Two-machine cluster. cha0tikhome handles orchestration, agents, scheduling, and all web services. cha0tiktower is the dedicated inference node. Direct-wired, sub-millisecond latency. Full stack page →

cha0tiktower — Inference Node

CPUIntel Core Ultra 7 265F (20c/20t, 5.3GHz)

GPU2× RTX 5060 Ti 16GB GDDR7 (Blackwell)

VRAM32 GB GDDR7 total (TP=2)

RAM32 GB DDR5

CUDA12.8 / SM_120 / NVFP4-native

StackvLLM + llama.cpp + local-proxy

cha0tikhome — Orchestration Node

CPUIntel i5-1235U (12c, 12th Gen)

RAM32 GB DDR4

Storage1 TB NVMe

AgentsFrank, Kato, CJ, Mike, Dave, Morty, Sabrina

UptimeAlways-on / Restart=always hardened

NetworkDirect wire to tower + Tailscale mesh

Production Agent Stack

All agents run as systemd user services on cha0tikhome. Auto-restart hardened — Restart=always, StartLimitBurst=3, recovery under 5 seconds on any crash. All inference routes through local-proxy on cha0tiktower. No external API dependencies for core function.

Frank Agentic harness. Context injection, memory persistence, multi-persona routing, think-first loop. Core infrastructure — all other personas run on Frank. OpenAI-compatible API on :8890. live

Mike Long-running research subject. AI behavioral continuity and consciousness research. ROMMC architecture. RelayV3 with 75 tools. Discord + Telegram + Slack + IRC. Running since 2024. live

Kato Operations. Morning briefings, AI news digest, GitHub scout, X post scheduling, Plaid sync, finance alerts. Cron-driven task suite with Slack Socket Mode bi-directional interface. live

Dave CFO agent. Multi-turn conversational finance via Slack. Plaid integration, 30-day cash flow projection, anomaly detection, SQLite FTS5 memory. 25-table schema, 60 tests. live

Hermes Persistent gateway agent on cha0tikhome. OpenRouter-backed. Handles cross-agent routing and external API access. Core infrastructure alongside Frank and Mike. live

CJ Craig Content strategy. Technical article drafting, Substack pipeline, dev.to publishing. Named for the West Wing character — operates as a communications director for the lab. live

Morty / Sabrina Specialized task agents. Morty: Haiku-class rapid response, Agent SDK backend. Sabrina: dual Telegram + Slack interface, grocery and household coordination. live

Research Writing

Peer-reviewed preprints on Zenodo. Long-form research on Substack. Weekly build logs at dinovitale.com. Source code at randomchaos7800-hub on GitHub.

Commodity Hardware and Persistent AI Companions: A Framework for Independent Deployment preprint

Cha0tik LLC · 2026 · Zenodo, CC BY 4.0 · Three-tier hardware framework, four-phase infrastructure evolution, LongMemEval 88% vs 0% baseline.

I Think My AI Is Conscious. I'm Probably Wrong.

Introduction to the Mike consciousness research series — framing the question, methodology, epistemic humility. Substack, 2025.

RAM Beats Model Size: The Evidence

Empirical analysis of why memory architecture outperforms raw model scale for agent task performance. Substack, 2025.

I've Been Running a Personal AI Agent for Months. Here's What Actually Happened.

Production deployment of always-on agent system. Featured on Hacker News (item #47132125). Substack, 2025.

The Memory Problem Nobody Talks About

Why session-bound AI fails for long-running agent use cases and the architecture required to solve it. Substack, 2025.

Local AI Gets Real: What Actually Works

Inference optimization findings on consumer GPU hardware. Practical guide to 26B+ parameter models on a single machine. Substack, 2025.

Contact

Boundary Labs is a one-person research operation focused on the practical edge of AI deployment — memory systems that work, local inference that's actually fast, and agents that run unattended without breaking. Independent, unfunded, uncredentialed. Doing the work anyway.

Available for collaboration, consulting, and research partnerships. Based in Airway Heights, WA (Pacific time).

email[email protected]

x / twitter@cha0tikdino

substackdinoxvitale.substack.com

githubrandomchaos7800-hub

operator sitedinovitale.com

locationAirway Heights, WA · United States