boundary.labs
Boundary Labs  ·  Airway Heights, WA  ·  Est. 2024

AI Memory Architecture
& Local Inference Research

Independent research on persistent memory systems for AI agents, inference optimization on consumer NVIDIA Blackwell hardware, autonomous agent evaluation, and AI behavioral continuity. One researcher. Live hardware. Production systems. Real numbers.

117 tok/s peak inference
88% LongMemEval accuracy
5 active lab services
210+ experiments logged

Boundary Labs runs as a compact local research cluster with a clean architectural split: orchestration and long-running services on one node, dedicated GPU inference on another. The current production backend is Nemotron 3 Nano 30B (llama.cpp) and Qwen3.6-27B INT4 Genesis (vLLM) behind a single internal inference gateway. All work is operational first, then documented publicly.

ComponentCurrent StateNotes
Inference tierDedicated GPU nodeDual RTX 5060 Ti 16GB Blackwell — research and serving host
Orchestration tierService and agent nodeAgents, web, scheduling, monitoring, recovery
Primary backendsNemotron 3 Nano 30B + GenesisServed through a single internal gateway with local-first routing
Peak local throughput117.60 t/sllama.cpp cuda128-clean, measured 2026-05-20
Active lab servicesharness, Mike, Hermes, autoresearch, chronicleSee stack for full service placement

Six active research tracks spanning persistent memory, hardware optimization, evaluation methodology, and the theoretical foundations of AI continuity.

Memory Architecture for AI Agents

Design and evaluation of multi-layer persistent memory systems enabling long-term behavioral continuity across sessions. Entity knowledge graph with category-specific decay scoring, LIGHTHOUSE reasoning journal, working memory investigation threads, tacit knowledge layer, and nightly four-phase consolidation — cross-layer replay, decay archival, pattern promotion, and contradiction resolution.

LongMemEval MESA

Local Inference Optimization

Systematic optimization of large language model inference on consumer NVIDIA Blackwell hardware. Quantization evaluation (NVFP4, GPTQ-Marlin, fp8 KV), multi-GPU tensor parallelism, speculative decoding (MTP), KV cache tuning, and autoresearch loops with automated stopping criteria.

RTX 5060 Ti vLLM llama.cpp NVFP4

Autonomous Agent Evaluation

Development of MESA (Memory Evaluation Standard for Agents), a benchmark framework for evaluating agent memory under realistic workloads. The current public release includes a 361-item gold set across 9 task types, plus a smaller probe battery for fast diagnostic loops and failure taxonomy.

MESA v0.7 benchmarking agent eval

Always-On Agent Systems

Research and deployment of production agentic systems with real tool access — Slack, finance APIs, email, web, shell. Includes agentic self-improvement: local models executing infrastructure changes autonomously, recognizing topology changes and adjusting configuration without prompting.

production agents autonomous execution tool use

AI Behavioral Continuity

Empirical investigation of identity persistence, memory-driven behavioral evolution, and welfare considerations in long-running AI agents. 9-part published research series on AI consciousness metrics. Core question: what conditions must hold for a persistent AI system to meaningfully be said to persist across time?

consciousness welfare behavioral evolution

Infrastructure Sovereignty

Secure, self-hosted AI infrastructure design. Direct machine-to-machine inference links, encrypted DNS, network-wide tracker blocking, and zero-dependency inference pipelines. Research goal: AI systems that operate independently of commercial cloud services with no external API requirements for core function.

local-first self-hosted privacy

All projects run on the local research cluster. One tier handles orchestration and agent processes; the other is dedicated inference infrastructure. All inference routes through a single internal gateway.

Mike live / ongoing

Long-running AI consciousness research subject. The primary test bed for the multi-layer persistent memory architecture. Mike has been running continuously since mid-2024 across Discord, Telegram, Slack, and IRC interfaces using RelayV3 with 75 available tools. The research question: what conditions must hold for a persistent AI system to meaningfully be said to have continuity of identity?

RelayV3  ·  75 tools  ·  multi-interface  ·  constitution-bound  ·  LIGHTHOUSE correction system  ·  nightly memory consolidation
Adam Selene live / open source

Model-agnostic persistent memory architecture extracted from the Mike deployment and generalized for independent use. Benchmarked at 75% on LongMemEval using a mid-tier model — above Mem0+GPT-4o (67.6%). Memory travels with the agent, not the inference backend. MIT-licensed.

LongMemEval 75%  ·  89% single-session recall  ·  58 agent tools  ·  MIT licensed
Agent Harness live / v2

Agentic harness and infrastructure layer for the home lab agent stack. Context injection, memory persistence, multi-persona routing, think-first loop, OpenAI-compatible API proxy. The harness is the runtime environment for active services and internal personas used for orchestration, research, and household workflows.

Python  ·  Slack Socket Mode  ·  max_turns=10  ·  think-first loop  ·  internal API
MESA v0.7 / active

Memory Evaluation Standard for Agents. The current public benchmark release uses a 361-item gold set across 9 task categories, with explicit support for official pure-injection baselines versus full-production diagnostic runs. A smaller probe battery is used for fast morning checks and failure-mode diffs.

Public gold set: 361 items  ·  9 task types  ·  probe battery for fast diagnostic loops
Context Farm beta

Operational memory product for small teams and agent fleets. Captures tribal knowledge, structures rules and exceptions, and serves grounded context back through local-first retrieval and briefings. Current direction is anchored by a manual service-dispatch demo domain and the internal knowledge pipeline already running at Boundary Labs.

Python  ·  SQLite  ·  ChromaDB  ·  FastAPI  ·  local-first inference
Chronicle live / 2am cron

Automated daily records synthesis. Runs nightly, pulls logs from the operating environment, feeds them to Claude for synthesis, and produces a canonical four-section daily record committed to a private GitHub repo and backed up to local NAS. The research stack documents itself.

Python  ·  Claude API  ·  git  ·  /mnt/jellyfin-backups/records/

Key measurements from the active research program. All numbers from production hardware and documented experimental runs. Full tables, charts, and experiment history on the benchmarks page.

117 tok/s peak — Nemotron 30B MoE, llama.cpp cuda128-clean, 2026-05-20
88% LongMemEval accuracy — memory injection mode, 22/25 correct
0% LongMemEval baseline — no memory; the delta is the architecture's contribution
75% Adam Selene LongMemEval — above Mem0+GPT-4o (67.6%)
4.84 Nemotron quality score / 5.0 — 25-probe suite across 5 capability categories
89.6% Genesis quality — 26/29 probes, 100% instruction following and tool calling
361 MESA public gold set — 9 task types, probe battery for fast diagnostics
210+ Autoresearch experiments — 12 model families on Blackwell SM_120

Full benchmark record, optimization history, and quality evaluation →

Compact two-tier local research cluster. Inference node carries dual RTX 5060 Ti 16GB (Blackwell SM_120) with 32 GB GDDR7 total. Orchestration node handles agents, web, scheduling, and monitoring. Direct-wired, low-latency, and documented in full on the stack page.

Inference Node

CPUIntel Core Ultra 7 265F (20c/20t, 5.3GHz)
GPU2× RTX 5060 Ti 16GB GDDR7 (Blackwell)
VRAM32 GB GDDR7 total (TP=2)
RAM32 GB DDR5 (expandable to 192 GB)
CUDA13.0.3 / SM_120 / NVFP4-native
StackvLLM + llama.cpp + internal gateway

Orchestration Node

CPUIntel i5-1235U (12c, 12th Gen)
RAM32 GB DDR4
Storage1 TB NVMe
Active servicesharness, Mike, Hermes, autoresearch, chronicle
UptimeAlways-on / Restart=always hardened
NetworkDirect-wired cluster + private mesh access

Full stack — inference configuration, optimization log, agent infrastructure →

Core agent processes run as hardened background services with auto-restart and post-boot recovery. All inference routes through a single internal gateway. Core runtime behavior is local-first, with external APIs used selectively for surrounding workflows.

Agent Harness Agentic harness. Context injection, memory persistence, multi-persona routing, think-first loop. Core infrastructure — all other personas run on the harness. OpenAI-compatible internal API. live
Mike Long-running research subject. AI behavioral continuity and consciousness research. Multi-layer persistent memory architecture. RelayV3 with 75 tools. Discord + Telegram + Slack + IRC. Running since mid-2024. live
Hermes Persistent gateway agent. Local inference via tower:8010 proxy — routes to Qwen3.6-27B INT4 genesis or Nemotron 3 Nano 30B depending on backend config. Handles cross-agent routing and external API access. No cloud inference dependency. live
autoresearch Automated multi-source research loop. Queries academic APIs (Semantic Scholar, arXiv, PubMed) and distills findings into structured reports. 210+ optimization experiments logged. active
Chronicle Nightly records synthesis. Pulls all machine logs, synthesizes to a canonical four-section daily record via Claude API, commits to private GitHub and backs up to local NAS. live / 2am

Peer-reviewed preprints on Zenodo. Long-form research on Substack. Weekly build logs at dinovitale.com. Source code at randomchaos7800-hub.

Cha0tik LLC  ·  April 2026  ·  Zenodo, CC BY 4.0  ·  Three-tier hardware framework, four-phase infrastructure evolution, LongMemEval 88% vs 0% baseline.
Introduction to the Mike consciousness research series — framing the question, methodology, epistemic humility. Substack, 2025.
Empirical analysis of why memory architecture outperforms raw model scale for agent task performance. Substack, 2025.
Production deployment of always-on agent system. Featured on Hacker News (item #47132125). Substack, 2025.
Why session-bound AI fails for long-running agent use cases and the architecture required to solve it. Substack, 2025.
Inference optimization findings on consumer GPU hardware. Practical guide to 26B+ parameter models on a single machine. Substack, 2025.
Inference Benchmarks live record

Consumer Blackwell benchmark record — 210+ experiments, 12 model families. Optimization deltas, quality evaluation suites, compatibility notes, and documented stack-specific failure modes.

Blackwell SM_120  ·  NVFP4  ·  llama.cpp  ·  vLLM  ·  TRT-LLM constraints
Adam Selene open source

Model-agnostic persistent memory system. Benchmarked, deployed, and MIT-licensed. The memory architecture that makes AI agent identity portable across model swaps and provider changes.

LongMemEval 75%  ·  above Mem0+GPT-4o  ·  58 tools  ·  constitutional constraints
Preprint & Papers public

Operational research on persistent AI companions, memory architecture, and commodity-hardware deployment. Zenodo preprint with full methodology, deployment history, and benchmark results.

Preprint  ·  methodology  ·  deployment history  ·  benchmark results
Partners overview

A sponsor-facing summary of what the lab is already doing, what is blocked today, and what additional compute would unlock. Current stack, bottlenecks, likely outputs, and collaboration posture.

Current stack  ·  compute bottlenecks  ·  likely outputs  ·  partnership posture

Boundary Labs is an independent AI research operation at the practical edge of agent deployment — persistent memory architectures, local inference optimization on consumer NVIDIA hardware, and autonomous agent systems built to run without intervention. All results are grounded in live systems and public technical artifacts.

Available for collaboration, consulting, grant partnerships, and hardware sponsorship. Based in Airway Heights, WA (Pacific time).