Boundary Labs: Partnership Overview

Validated local AI systems research with clear next-step compute leverage.

Boundary Labs is an independent research operation focused on persistent agent memory, local inference optimization, and production-grade autonomous systems on constrained hardware. The work is already live, measured, and publicly documented. Additional compute would not create direction; it would accelerate an existing agenda with defined bottlenecks and concrete outputs.

117 tok/s: current local peak
155+: experiments logged
88%: best LongMemEval run
3: active research tracks

The lab already operates as a real system rather than a proposal. One tier handles orchestration, agents, monitoring, and web services; another is dedicated inference infrastructure. Results are generated on live hardware, published publicly, and tied to specific deployment constraints rather than idealized benchmarks.

Persistent Agent Memory

Adam Selene and Mike provide the memory-architecture side of the lab: long-running behavior, continuity across backend changes, nightly consolidation, and evaluation against memory-specific benchmarks including the current MESA public release.

Inference Systems Research

Work on consumer Blackwell GPUs spanning `vLLM` and `llama.cpp` deployments, TRT-LLM constraints, NVFP4 behavior, PCIe tensor parallelism, and stack-specific failure modes under production conditions.

Operational Validation

Services are not only benchmarked but run continuously. Recovery, boot hardening, watchdogs, and topology-aware routing are part of the research surface rather than an afterthought.

Most public AI evaluation still centers on model quality in isolation. Boundary Labs focuses on the layer that determines whether a model can function as part of a durable system: memory architecture, inference deployment, runtime continuity, and cost-realistic operation on accessible hardware.

This work is useful beyond the lab because it yields operational knowledge that other independent labs, small teams, and applied researchers can reuse: what works on consumer Blackwell, where current stacks break, which optimizations hold up in practice, and which architectural patterns preserve agent continuity across changing model substrates.

| Constraint | Current effect | What it limits |
|---|---|---|
| GPU memory ceiling | 32 GB total on the current dual-GPU tower | Larger model sweeps, wider quantization matrices, and replication across higher-context deployments |
| Single-node scope | Most experiments are validated on one local inference cluster | Cross-environment replication and stronger claims about portability |
| Throughput budget for evaluation | Long-form eval suites like the 361-item MESA gold set consume meaningful wall-clock time on local hardware | Larger benchmark matrices, longer longitudinal studies, and more frequent regression testing |
| Cloud-scale comparison gap | Excellent local numbers, limited systematic cloud-side contrast | More complete deployment guidance across local-first and elastic compute tiers |
Modest support: higher-frequency eval reruns, broader quantization and context-window sweeps, publication-quality comparison tables for the current local stack, and tighter regression monitoring across active agent systems.

Mid-tier support: cross-stack replication across local and rented compute, wider model-family coverage, larger benchmark campaigns, and more systematic release of benchmark artifacts, deployment notes, and compatibility findings.

Substantial support: longitudinal multi-model memory studies, stronger cloud-vs-local deployment analysis, full benchmark suites across several inference stacks, and a more complete open reference set for independent labs deploying persistent agents.

The point is not speculative scale. The point is to take an already functioning research program and increase experimental breadth, replication quality, and output cadence.

| Window | Output | Form |
|---|---|---|
| 30 days | Expanded compatibility and optimization findings for the active inference stack | Public benchmark update, technical notes, reproducible configs |
| 60 days | Cross-model or cross-environment evaluation set with documented bottlenecks | Artifact release, benchmark tables, implementation notes |
| 90 days | Publication-grade synthesis of memory, inference, and deployment findings | Preprint, public benchmark corpus, deployment guide |

Benchmarks: live benchmark record for the local inference stack, including Blackwell-specific findings and optimization history.

Stack: current hardware, network topology, inference routing, and service layout.

Preprint: commodity-hardware framework and persistent-agent operational analysis.

Memory system: Adam Selene documents the persistent memory architecture side of the lab, including the benchmark context MESA was built from.

Operational memory: Context Farm documents the lab's local-first operational memory direction for small teams and agent fleets, including the current demo domain and structured retrieval work.

Boundary Labs is not presenting a blank-slate idea in search of resources. The core posture is different: the systems already run, the benchmark record already exists, and the public output is already underway. Additional compute would improve the depth, speed, and portability of that work.

The best fit is support that values transparent technical reporting, operational realism, and open artifacts over inflated claims. The lab is optimized for applied research output: measurements, deployment findings, benchmark corpora, and architecture notes that other operators can actually use.

Boundary Labs is available for research partnerships, compute-backed collaboration, and infrastructure-oriented sponsorship aligned with open technical output.