Phase 1: The Core — Why We Built memset
Every AI tool you use starts from zero. We set out to build a shared memory layer that changes that — one brain for every AI.

The Problem That Started Everything
I use AI every day. ChatGPT for brainstorming, Claude for writing, Cursor for code, Gemini for research. And every single session starts the same way:
"I prefer concise answers with code examples." "This project uses PostgreSQL with pgvector." "I'm building a SaaS product that..."
Sound familiar? Each tool treats you like a stranger. The conversation you had with Claude yesterday doesn't exist when you open ChatGPT today. That architectural decision you worked through in Cursor? Gone when you switch to VS Code.
AI tools have incredible capabilities, but they have zero memory across sessions and zero awareness of each other. You're the integration layer — and you're doing it manually, every time.
What We Set Out to Build
memset is a shared memory layer for every AI tool you use. The core idea is deceptively simple:
- Store memories — preferences, knowledge, decisions, context — in one place
- Make them searchable — not by filename or date, but by meaning
- Inject them everywhere — so every AI you use already knows what you know
The technical challenge is the "by meaning" part. Traditional search is keyword-based. If you saved "always use connection pooling in PostgreSQL" and later search for "database best practices," keyword search finds nothing. Semantic search finds it instantly because it understands the relationship between concepts.
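The "by meaning" idea boils down to comparing vectors with cosine similarity. A minimal sketch, using hand-picked toy vectors as stand-ins for real 1536-dimension embeddings (the texts in the comments are illustrative, not real embedding output):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: 1.0 means
    same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim vectors standing in for real embeddings. Semantically
# related texts land close together in the vector space even when
# they share no keywords.
memory    = [0.9, 0.1, 0.8]  # "always use connection pooling in PostgreSQL"
query     = [0.8, 0.2, 0.7]  # "database best practices"
unrelated = [0.1, 0.9, 0.0]  # "favorite pizza toppings"

print(cosine_similarity(memory, query))      # high: related concepts
print(cosine_similarity(memory, unrelated))  # low: no semantic overlap
```

Keyword search would score "database best practices" against "connection pooling in PostgreSQL" as zero overlap; in embedding space the two sit close together, which is the whole trick.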
The Architecture
We chose a stack optimized for semantic operations at scale:
- FastAPI (Python) — async, fast, great ecosystem for ML/embedding operations
- PostgreSQL + pgvector — vector similarity search alongside relational data. One database, not two.
- OpenAI embeddings — convert text memories into high-dimensional vectors for semantic matching
- Redis — rate limiting, caching, and idempotency keys
The core data flow is straightforward:
User saves a memory
→ Text is embedded into a vector (1536 dimensions)
→ Vector + metadata stored in PostgreSQL
→ Tags, projects, and relationships auto-generated
User recalls a memory
→ Query text is embedded into the same vector space
→ pgvector finds the nearest neighbors by cosine similarity
→ Results ranked by relevance + recency + importance
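The final ranking step blends more than raw similarity. A sketch of how relevance, recency, and importance might be combined — the weights and half-life below are illustrative assumptions, not memset's actual values:

```python
import math
import time

def rank_score(similarity, created_at, importance,
               w_sim=0.7, w_recency=0.2, w_importance=0.1,
               half_life_days=30.0):
    """Blend cosine similarity with an exponential recency decay and a
    stored importance value (0.0-1.0). Weights are illustrative only."""
    age_days = (time.time() - created_at) / 86400
    # Recency halves every `half_life_days`: 1.0 for a brand-new memory.
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return w_sim * similarity + w_recency * recency + w_importance * importance

now = time.time()
fresh = rank_score(0.8, now, importance=0.5)
stale = rank_score(0.8, now - 90 * 86400, importance=0.5)
print(fresh > stale)  # same similarity, but the fresh memory ranks higher
```

The effect is that two equally relevant memories tie-break on how recent and how important they are, instead of returning in arbitrary order.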
Using pgvector inside PostgreSQL (instead of a separate vector database like Pinecone) was a deliberate choice. It means memories, user accounts, projects, and vectors all live in one database with one set of backups, one connection pool, and full SQL query capability. At our scale, it's simpler and more than fast enough.
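Concretely, that design looks like ordinary SQL. A hypothetical schema and recall query (table and column names are assumptions; `vector(1536)` matches the OpenAI embedding dimensions, and `<=>` is pgvector's cosine-distance operator):

```python
# Hypothetical sketch, not memset's actual schema.

SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    content    text NOT NULL,
    embedding  vector(1536),
    created_at timestamptz NOT NULL DEFAULT now()
);
"""

# Nearest-neighbor recall: order by cosine distance to the query vector,
# and report 1 - distance as a similarity score for the caller.
RECALL_SQL = """
SELECT id,
       content,
       1 - (embedding <=> %(query_vec)s::vector) AS similarity
FROM memories
WHERE user_id = %(user_id)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""
```

Because embeddings live next to `user_id` and `created_at`, filtering, joins, and vector search all happen in one query — the main thing a separate vector database gives up. At larger row counts you would typically add an HNSW or IVFFlat index on the embedding column so the `ORDER BY` doesn't scan every row.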
The First Working Flow
The first thing we built was the simplest possible remember/recall loop:
POST /api/v1/memories
{ "content": "Always use async/await over .then() chains in this project" }
GET /api/v1/memories/recall?q=javascript+promises+approach
→ Returns the memory above with a similarity score of 0.89
No UI, no extension, no fancy features — just an API that stores text and finds it by meaning. But the moment that first semantic search returned the right memory from a completely different set of words, we knew the core concept worked.
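That first loop can be sketched end to end in a few lines. This toy version is an assumption-laden stand-in — a bag-of-letters function in place of OpenAI's 1536-dimension embeddings, a Python list in place of PostgreSQL, hypothetical helper names — but it makes the remember/recall mechanics concrete and runnable:

```python
from math import sqrt

def toy_embed(text):
    """Stand-in for a real embedding model: counts letters a-z.
    Real embeddings capture meaning; this only captures spelling,
    but it is enough to run the loop."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a)) or 1.0
    nb = sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

store = []  # in-memory stand-in for PostgreSQL + pgvector

def remember(content):
    store.append({"content": content, "embedding": toy_embed(content)})

def recall(query, k=1):
    qv = toy_embed(query)
    ranked = sorted(store, key=lambda m: cosine(qv, m["embedding"]),
                    reverse=True)
    return ranked[:k]

remember("Always use async/await over .then() chains in this project")
remember("Team standup is at 9:30 on Mondays")
# Even this crude "embedding" surfaces the right memory for this query:
print(recall("javascript promises approach")[0]["content"])
```

Swap `toy_embed` for a real embedding API and `store` for a pgvector table and this is the shape of the production flow.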
What We Learned
Building the core taught us something important: the hard part isn't the technology — it's capture friction. If saving a memory takes more than 2-3 seconds, people won't do it. If recalling requires leaving your current context, people won't bother.
The API worked, but nobody wants to make raw HTTP calls while they're in the middle of a conversation with an AI. We needed to meet people where they already are — inside their AI tools.
That realization shaped everything that came next.