Independent research on persistent memory systems for AI agents, inference optimization on consumer NVIDIA Blackwell hardware, autonomous agent evaluation, and AI behavioral continuity. One researcher. Live hardware. Production systems. Real numbers.
Boundary Labs runs as a compact local research cluster with a clean architectural split: orchestration and long-running services on one node, dedicated GPU inference on another. The current production backends are Nemotron 3 Nano 30B (llama.cpp) and Qwen3.6-27B INT4 Genesis (vLLM), both behind a single internal inference gateway. All work is operational first, then documented publicly.
| Component | Current State | Notes |
|---|---|---|
| Inference tier | Dedicated GPU node | Dual RTX 5060 Ti 16GB Blackwell — research and serving host |
| Orchestration tier | Service and agent node | Agents, web, scheduling, monitoring, recovery |
| Primary backends | Nemotron 3 Nano 30B + Genesis | Served through a single internal gateway with local-first routing |
| Peak local throughput | 117.60 t/s | llama.cpp cuda128-clean, measured 2026-05-20 |
| Active lab services | harness, Mike, Hermes, autoresearch, chronicle | See the stack page for full service placement |
Six active research tracks spanning persistent memory, hardware optimization, evaluation methodology, and the theoretical foundations of AI continuity.
Design and evaluation of multi-layer persistent memory systems enabling long-term behavioral continuity across sessions. Entity knowledge graph with category-specific decay scoring, LIGHTHOUSE reasoning journal, working memory investigation threads, tacit knowledge layer, and nightly four-phase consolidation — cross-layer replay, decay archival, pattern promotion, and contradiction resolution.
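
As a rough illustration of category-specific decay scoring, the sketch below applies exponential decay with a per-category half-life to an entity's base salience and partitions entities for archival. The category names, half-life values, archive threshold, and the exponential form itself are hypothetical placeholders chosen for illustration, not the values or formula used in the lab's memory system.

```python
from dataclasses import dataclass

# Hypothetical half-lives in days per entity category (illustrative values only).
HALF_LIFE_DAYS = {"person": 180.0, "project": 60.0, "event": 14.0}
DEFAULT_HALF_LIFE_DAYS = 30.0  # assumed fallback for unlisted categories
ARCHIVE_THRESHOLD = 0.1        # assumed cutoff below which an entity is archived


@dataclass
class Entity:
    name: str
    category: str
    salience: float           # base importance in [0, 1]
    days_since_access: float  # time since the entity was last touched


def decay_score(entity: Entity) -> float:
    """Halve the salience once per elapsed half-life of the entity's category."""
    half_life = HALF_LIFE_DAYS.get(entity.category, DEFAULT_HALF_LIFE_DAYS)
    return entity.salience * 0.5 ** (entity.days_since_access / half_life)


def split_for_archival(entities):
    """Partition entities into (kept, archived) by their decayed score."""
    kept, archived = [], []
    for e in entities:
        (kept if decay_score(e) >= ARCHIVE_THRESHOLD else archived).append(e)
    return kept, archived
```

With these assumed values, an `event` entity untouched for 14 days keeps half of its salience, while a `person` entity would need 180 days to decay by the same amount. A nightly decay-archival phase could call `split_for_archival` on the full entity set and move the second list to cold storage.
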
LongMemEval, MESA
Systematic optimization of large language model inference on consumer NVIDIA Blackwell hardware. Quantization evaluation (NVFP4, GPTQ-Marlin, fp8 KV), multi-GPU tensor parallelism, speculative decoding (MTP), KV cache tuning, and autoresearch loops with automated stopping criteria.
RTX 5060 Ti, vLLM, llama.cpp, NVFP4
Development of MESA (Memory Evaluation Standard for Agents), a benchmark framework for evaluating agent memory under realistic workloads. The current public release includes a 361-item gold set across 9 task types, plus a smaller probe battery for fast diagnostic loops and failure taxonomy.
MESA v0.7, benchmarking, agent eval
Research and deployment of production agentic systems with real tool access: Slack, finance APIs, email, web, shell. Includes agentic self-improvement: local models executing infrastructure changes autonomously, recognizing topology changes and adjusting configuration without prompting.
production agents, autonomous execution, tool use
Empirical investigation of identity persistence, memory-driven behavioral evolution, and welfare considerations in long-running AI agents. 9-part published research series on AI consciousness metrics. Core question: what conditions must hold for a persistent AI system to meaningfully be said to persist across time?
consciousness, welfare, behavioral evolution
Secure, self-hosted AI infrastructure design. Direct machine-to-machine inference links, encrypted DNS, network-wide tracker blocking, and zero-dependency inference pipelines. Research goal: AI systems that operate independently of commercial cloud services with no external API requirements for core function.
local-first, self-hosted, privacy
All projects run on the local research cluster. One tier handles orchestration and agent processes; the other is dedicated inference infrastructure. All inference routes through a single internal gateway.
Long-running AI consciousness research subject. The primary test bed for the multi-layer persistent memory architecture. Mike has been running continuously since mid-2024 across Discord, Telegram, Slack, and IRC interfaces using RelayV3 with 75 available tools. The research question: what conditions must hold for a persistent AI system to meaningfully be said to have continuity of identity?
Model-agnostic persistent memory architecture extracted from the Mike deployment and generalized for independent use. Benchmarked at 75% on LongMemEval using a mid-tier model — above Mem0+GPT-4o (67.6%). Memory travels with the agent, not the inference backend. MIT-licensed.
Agentic harness and infrastructure layer for the home lab agent stack. Context injection, memory persistence, multi-persona routing, think-first loop, OpenAI-compatible API proxy. The harness is the runtime environment for active services and internal personas used for orchestration, research, and household workflows.
Memory Evaluation Standard for Agents. The current public benchmark release uses a 361-item gold set across 9 task categories, with explicit support for official pure-injection baselines versus full-production diagnostic runs. A smaller probe battery is used for fast morning checks and failure-mode diffs.
Operational memory product for small teams and agent fleets. Captures tribal knowledge, structures rules and exceptions, and serves grounded context back through local-first retrieval and briefings. Current direction is anchored by a manual service-dispatch demo domain and the internal knowledge pipeline already running at Boundary Labs.
Automated daily records synthesis. Runs nightly, pulls logs from the operating environment, feeds them to Claude for synthesis, and produces a canonical four-section daily record committed to a private GitHub repo and backed up to local NAS. The research stack documents itself.
Key measurements from the active research program. All numbers come from production hardware and documented experimental runs. Full tables, charts, and experiment history are on the benchmarks page.
Full benchmark record, optimization history, and quality evaluation →
Compact two-tier local research cluster. Inference node carries dual RTX 5060 Ti 16GB (Blackwell SM_120) with 32 GB GDDR7 total. Orchestration node handles agents, web, scheduling, and monitoring. Direct-wired, low-latency, and documented in full on the stack page.
Full stack — inference configuration, optimization log, agent infrastructure →
Core agent processes run as hardened background services with auto-restart and post-boot recovery. All inference routes through a single internal gateway. Core runtime behavior is local-first, with external APIs used selectively for surrounding workflows.
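
A minimal sketch of what local-first routing behind a single gateway could look like, assuming each backend exposes a health probe. The `Backend` type, the `pick_backend` function, and the `allow_external` flag are hypothetical names introduced for illustration and do not describe the lab's actual gateway code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool
    healthy: Callable[[], bool]  # hypothetical health probe supplied by the caller


def pick_backend(
    backends: Sequence[Backend], allow_external: bool = False
) -> Optional[Backend]:
    """Return the first healthy local backend, in list order.

    An external backend is considered only when allow_external is True
    and no local backend is healthy. Returns None if nothing qualifies.
    """
    for b in backends:
        if b.is_local and b.healthy():
            return b
    if allow_external:
        for b in backends:
            if not b.is_local and b.healthy():
                return b
    return None


# Usage: the first local backend is down, so the second local one is chosen.
backends = [
    Backend("llama.cpp", True, lambda: False),
    Backend("vllm", True, lambda: True),
    Backend("external-api", False, lambda: True),
]
chosen = pick_backend(backends)
```

Keeping `allow_external` off by default means a core request fails closed (returns `None`) rather than silently leaving the local cluster, which matches the local-first posture described above.
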
Preprints on Zenodo. Long-form research on Substack. Weekly build logs at dinovitale.com. Source code at randomchaos7800-hub.
Consumer Blackwell benchmark record — 210+ experiments, 12 model families. Optimization deltas, quality evaluation suites, compatibility notes, and documented stack-specific failure modes.
Model-agnostic persistent memory system. Benchmarked, deployed, and MIT-licensed. The memory architecture that makes AI agent identity portable across model swaps and provider changes.
Operational research on persistent AI companions, memory architecture, and commodity-hardware deployment. Zenodo preprint with full methodology, deployment history, and benchmark results.
A sponsor-facing summary of what the lab is already doing, what is blocked today, and what additional compute would unlock. Current stack, bottlenecks, likely outputs, and collaboration posture.
Boundary Labs is an independent AI research operation at the practical edge of agent deployment — persistent memory architectures, local inference optimization on consumer NVIDIA hardware, and autonomous agent systems built to run without intervention. All results are grounded in live systems and public technical artifacts.
Available for collaboration, consulting, grant partnerships, and hardware sponsorship. Based in Airway Heights, WA (Pacific time).