continuity · substrate · open source

The continuity layer for everything you do with AI.

One open source server. Every tool you use, every device you work from. No cloud rental, no vendor lock-in, and the continuity is yours.

$ gh repo clone nram-ai/nram

Read the philosophy

where it started

I was copying conversations between Claude and ChatGPT, generating handoff docs, re-explaining the same decisions over and over. That's when it hit me: I was the continuity layer between every AI tool, had to be it for them, and worst of all, I'm lossy too.

Brandon Lehmann, creator of Neural Ram

the job you didn't sign up for

You're already doing this by hand.

You've re-explained the same project to a blank chat more times than you can count. You keep a doc of decisions so the next tool can catch up, and it's stale by the time you've switched. The sharpest thing you worked out last week is buried in a conversation you'll never scroll back to. The carrying, the re-explaining, the copy-paste from one window to the next: that's a job, and right now it's yours.

Nobody's good at it. You lose the thread, you lose the nuance, you lose the best version of the idea, and you never notice the moment it slips. Whether you're wiring up agents or just living across a dozen tabs, nram keeps the thread.

a layer, not an app

Your agent reads. nram remembers.

Memory today lives inside one tool: one app, one agent, one vendor. nram is the layer underneath them instead, not another memory app bolted onto one of them. Researching on a laptop, coding on a desktop, drafting on a tablet, picking it back up on your phone, switching between Claude, ChatGPT, Grok, Mistral, Perplexity, Cursor, and your own scripts: none of that should reset the work every time you change rooms.

Your agent already reads the PDF, watches the video, runs the test, scrapes the page. nram's job is to keep what mattered. Across every tool. Across every conversation. On infrastructure that belongs to you.

one substrate, many jobs

One server, not four separate tools.

A single server covers work that today is split across four separate products: conversational memory, document and corpus recall, standing rules, and agent state. One substrate does all of it, so there's nothing to stitch together and nothing to keep in sync.

Conversational continuity

Memory that survives across sessions, tools, and vendors, reachable over MCP.

Document and corpus recall

Semantic search and an entity-deduped knowledge graph over your stored corpus. A substrate, not a chat UI.

Procedural rules

Verbatim standing rules and conventions an assistant loads at session start. Returned byte-for-byte, never paraphrased.

Agent memory

Persistent memory for coding, research, and custom agents, with consolidation and a knowledge graph on top.

like sleeping on it

The best thinking happens offline.

You've felt it. The fix that arrives in the shower, the connection that surfaces on a walk, the problem that's somehow simpler after a night's sleep. Your mind keeps working when you step away from it, sorting what counts from what doesn't, settling what was left unsettled.

nram does the same. What matters carries forward, across tools, across devices, across weeks. While nram sits idle, it dreams: folding in what's new, resolving contradictions instead of stacking them, letting the stale fade. You come back to memory that's been refined while you were gone, not just stored.

what "self-hosted" actually means

A server, not a script.

"Self-hosted" in this space usually means a Python library you embed, a localhost shim with no auth, or an open-source wrapper sitting on rented infrastructure. None of them survive the moment "self-hosted" was supposed to matter. nram is a real server.

not a library
A single MIT binary you run as a server. SQLite with a pure-Go vector index by default, Postgres with pgvector or Qdrant at scale. One command registers it as a real background service on macOS, Linux, or Windows, and it comes back up on its own.
not a localhost shim
OAuth 2.0 with PKCE and dynamic client registration. WebAuthn passkeys. Per-org OIDC SSO. Your laptop, desktop, and phone see the same brain.
not single-user
Organizations, projects, hierarchical namespaces. RBAC across five roles. Your server is shared; your memories stay yours.
not stdio-only
MCP over Streamable HTTP. Plus REST, SSE with reconnect, signed webhooks, Prometheus at /metrics. Every tool you use shares one server.
not locked to one vendor
Runs on OpenAI, Anthropic, Gemini, Ollama, OpenRouter, vLLM, SGLang, llama.cpp, or any OpenAI-compatible endpoint. Swap providers without moving your memory.
not a black box
A Web Console for organizations, projects, providers, the knowledge graph, the dreaming cycle, and per-model cost. A guided setup walks a fresh install through the parts it needs, and after that you can see exactly what your memory is doing.
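The PKCE half of that login flow is standard OAuth (RFC 7636), so the client side can be sketched without guessing at nram's own endpoints. This is a minimal Python sketch assuming the S256 method; the function names are illustrative and are not part of nram.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: BASE64URL(SHA256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = make_code_verifier()
    print("code_verifier: ", verifier)
    print("code_challenge:", code_challenge_s256(verifier))
```

The client keeps the verifier to itself, sends the challenge with the authorization request, and presents the verifier when it exchanges the code for a token.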

under the hood

More than a database with vector search.

nram does the part of memory that's actually hard.

Hybrid recall

Vector + lexical (FTS5, tsvector), fused with reciprocal rank fusion
Seven ranking terms, each tunable per project
Multi-vector facets score multi-topic memories per topic
An MMR diversity pass drops near-duplicate clusters
Multi-part questions split into focused sub-queries
An optional relevance pass demotes off-topic hits
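Two of those steps, reciprocal rank fusion and the MMR diversity pass, are textbook techniques, so they can be sketched without knowing nram's internals. The Python below is a minimal sketch and not nram's code: `rrf_fuse`, `mmr_select`, the constant `k=60`, and the `lam` weight are assumptions chosen for illustration, and the seven ranking terms are not modeled.

```python
from collections import defaultdict
from math import sqrt


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(candidates, relevance, vectors, top_n, lam=0.7):
    """Greedy MMR: favor relevance, penalize similarity to items already picked."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        def mmr_score(doc_id):
            redundancy = max(
                (cosine(vectors[doc_id], vectors[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[doc_id] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # hypothetical vector-search order
    lexical_hits = ["b", "d", "a"]  # hypothetical full-text order
    print(rrf_fuse([vector_hits, lexical_hits]))
```

Rank fusion only needs each list's ordering, which is why it can combine vector distances and lexical scores that sit on different scales.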

Knowledge graph

Entities and relationships, extracted in two passes
An ingestion judge decides add / update / delete / none on near-duplicates
Multi-hop traversal, operator-tunable depth
Visualized in the Web Console
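Multi-hop traversal with a depth limit can be pictured as a bounded breadth-first walk. This is a rough Python sketch under stated assumptions: the adjacency-dict graph, the entity names, and `neighbors_within` are hypothetical, and nram's actual storage and traversal may look nothing like this.

```python
from collections import deque


def neighbors_within(graph, start, max_depth):
    """Breadth-first walk returning {entity: hops} for everything within max_depth hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_depth:
            continue  # depth budget is spent on this branch
        for nxt in graph.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


if __name__ == "__main__":
    # Hypothetical entities and relationships, for illustration only.
    graph = {
        "project-x": ["alice", "billing-service"],
        "billing-service": ["postgres"],
        "postgres": ["backup-job"],
    }
    print(neighbors_within(graph, "project-x", max_depth=2))
```

Raising the depth pulls in more distant context at the cost of more loosely related entities, which is the trade-off a tunable depth leaves to the operator.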

Dreaming

Fourteen phases, only when nram's idle and something changed
Entity dedup; embedding, augmentation, facet backfill
A coverage sweep catches anything left unprocessed
Paraphrase fold, transitive inference, contradiction detection
Consolidation with novelty audit, pruning, weight recalc
Contradictions resolved, not stacked; memory stays current
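The "only when nram's idle and something changed" condition amounts to a two-part gate. Here is a minimal Python sketch of that idea, assuming an idle timer and a dirty flag; `DreamGate`, its fields, and the 300-second default are hypothetical and say nothing about how nram schedules its phases.

```python
from dataclasses import dataclass


@dataclass
class DreamGate:
    idle_threshold_s: float = 300.0  # hypothetical: quiet time that counts as idle
    last_request_at: float = 0.0     # timestamp of the most recent request
    dirty: bool = False              # True once a write lands after the last cycle

    def on_request(self, now: float, wrote: bool = False) -> None:
        """Record activity; remember whether anything was written."""
        self.last_request_at = now
        self.dirty = self.dirty or wrote

    def should_dream(self, now: float) -> bool:
        """Run a cycle only if something changed and the server has been quiet."""
        return self.dirty and (now - self.last_request_at) >= self.idle_threshold_s

    def on_cycle_done(self) -> None:
        """Clear the flag so an unchanged store is not reprocessed."""
        self.dirty = False


if __name__ == "__main__":
    gate = DreamGate(idle_threshold_s=60)
    gate.on_request(now=0, wrote=True)
    print(gate.should_dream(now=30), gate.should_dream(now=60))
```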

Memory tiers

Procedural tier: standing rules stored verbatim, never embedded
Persona (about_me) tier: identity and preferences, on every recall
Global tier: world-knowledge across every project
Project tier: the full enrichment and dreaming pipeline

Grounded synthesis

The ask tool returns one cited answer, not a page of results
Top hits expand along the knowledge graph
Inline citations on every answer, back to the memories behind it
A grounding score says how well it's anchored in what you stored
Ungrounded? It says so, instead of making something up
Opt-in, with its own provider slot

Provenance

Every memory keeps its source
Updates supersede, never overwrite; the history stays
Every dreaming cycle writes an audit log
Runs on your hardware: nothing happens you can't see
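"Supersede, never overwrite" describes an append-only record: each update adds a version that points at the one it replaces. This is a small Python sketch of that shape, assuming an in-memory list; `MemoryRecord`, `MemoryVersion`, and the field names are illustrative and are not nram's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MemoryVersion:
    version: int
    content: str
    source: str                # where this version came from
    supersedes: Optional[int]  # version number it replaces, None for the first


@dataclass
class MemoryRecord:
    versions: List[MemoryVersion] = field(default_factory=list)

    def write(self, content: str, source: str) -> MemoryVersion:
        """Append a new version; earlier versions are never modified."""
        prev = self.versions[-1].version if self.versions else None
        new = MemoryVersion(len(self.versions) + 1, content, source, prev)
        self.versions.append(new)
        return new

    @property
    def current(self) -> MemoryVersion:
        """Latest version; raises IndexError if nothing has been written."""
        return self.versions[-1]

    def history(self) -> List[MemoryVersion]:
        """Full lineage, oldest first."""
        return list(self.versions)


if __name__ == "__main__":
    record = MemoryRecord()
    record.write("Deploys go out on Fridays", source="chat:1")
    record.write("Deploys go out on Tuesdays", source="chat:2")
    print(record.current.content, "| versions kept:", len(record.history()))
```

Because old versions stay in place, answering "why does it think this?" is a walk back through `supersedes` instead of a guess.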

open source. first, last, forever.

Open source. No asterisks.

nram is free and open software, and always will be. You can run it, read it, change it, and build on it, for anything you want. No one, us included, can ever take that back. No bait and switch. No "enterprise edition" that hides the features you actually need behind a contract. The substrate stays open. The economics stay aligned with the people using it.

Because it runs on your infrastructure, nothing happens to your memory you can't see. Every memory keeps its source and lineage, nothing gets quietly overwritten, and each dreaming cycle writes an audit log. You can always trace why nram knows what it knows.

Run it.

Download it and go. SQLite by default, no signup, no cloud. Then hand it to your operating system so it starts with the machine and stays up. Prefer to build it yourself? Go 1.26+, Node 18+, and a few minutes.

$ tar xzf nram_*.tar.gz && ./nram

$ sudo ./nram service install && sudo ./nram service start

Grab a prebuilt package for macOS, Linux, or Windows →

Or clone the repo and build from source →

about the name

Working memory is what your brain holds while you're working. It moves between tools without you thinking about it. Neural Ram is that, for your AI.