How a Local RAG Experiment Turned Into an MCP Workflow

April 7, 2026

This project started from a pretty personal use case.

There's a very technical person I follow who goes live on YouTube from time to time. He has a ton of experience and casually drops really good insights about software architecture, engineering tradeoffs, and the kind of lessons you usually only learn after years of doing the work.

He also posts shorter clips, but I wanted something else: I wanted that knowledge to be always available, queryable whenever I needed it.

At the same time, I was trying to understand what RAG (retrieval-augmented generation) actually is in practice, and to learn applied AI by building something real instead of just reading about it.

If you want to take a look first, the project is open source here: github.com/buralog/beyin.

So the idea behind beyin was simple:

Take videos, podcasts, and files I care about, turn them into searchable context, and make that knowledge available while I work.
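I'm not reproducing beyin's actual ingestion code here, but the basic shape of that kind of pipeline is: transcribe the video or podcast, split the transcript into overlapping chunks, then embed and index those chunks. A minimal, hypothetical sketch of the chunking step (sizes and overlap are illustrative, not beyin's real settings):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split a transcript into overlapping chunks so that context
    spanning a chunk boundary is not lost at retrieval time."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks

transcript = "..." * 500  # stand-in for a real video transcript
pieces = chunk_text(transcript, chunk_size=400, overlap=80)
```

The overlap means a sentence that straddles a boundary still shows up whole in at least one chunk, which matters more than you'd think for retrieval quality.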

The first assumption: this has to be fully local

My first thought was that if I wanted to query my own data locally, then I probably needed a local LLM too.

So I looked into Ollama and thought, alright, I can build this on top of that and just run everything on my machine.
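For context, Ollama exposes a local HTTP API on port 11434, so that first version amounted to calling something like this (the model name and prompt are placeholders, and this is a sketch of the idea, not beyin's actual code):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one JSON response
    # instead of a stream of tokens.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama instance.
    Requires `ollama serve` (or the desktop app) to be running."""
    data = json.dumps(build_payload(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (only works with Ollama running locally):
# print(ask_local_model("llama3.1", "Summarize RAG in one sentence."))
```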

At that point I also had some pretty wrong assumptions about local models and resource usage.

The first version worked, but it felt underwhelming

Once I built the first version, it did technically work.

Retrieval itself was useful, but the final answer did not feel as smart as I expected.

I use Codex and Claude Code a lot in my daily workflow, so maybe I was unfairly expecting something that felt more intelligent, or at least looked that way.

That was the point where I started asking a more useful question:

If retrieval is already doing the valuable part, what exactly is the local model adding?

The shift that made it click

After a lot of testing with agents, I realized something kind of obvious.

The Ollama part was mostly just taking the retrieved chunks and turning them into a proper answer. And if that is the job, why couldn't an agent do the same thing?
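That division of labor is easy to make concrete: retrieval finds the chunks, and the model's whole job is to turn them into a grounded answer. A hedged sketch of the prompt-assembly step, which is roughly all the local model was contributing (the wording of the instructions is my own, not beyin's):

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble retrieved chunks into a prompt that asks the model
    to answer strictly from the provided context."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk.strip()}" for i, chunk in enumerate(chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What tradeoff did he mention about microservices?",
    ["Chunk about operational overhead...", "Chunk about team boundaries..."],
)
```

Any capable model can take a prompt like this and write the answer, which is exactly why the local model stopped feeling essential.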

So I tried wiring it through MCP.

That was the moment where the project really clicked.

The answers became much better structured, the whole thing felt smarter, and more importantly, it fit directly into how I already work. Instead of having a separate tool where I go ask questions, the knowledge just becomes available inside the agent workflow itself.

The agent can retrieve it, use it, suggest things, and continue the task.

That was exactly what I wanted, maybe even better than what I had in mind when I started.
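To give a sense of what "wiring it through MCP" looks like from the agent's side: MCP clients like Claude Code read a JSON config that maps a server name to the command that launches it. The server name and command below are illustrative, not beyin's actual ones:

```json
{
  "mcpServers": {
    "beyin": {
      "command": "beyin",
      "args": ["mcp-serve"]
    }
  }
}
```

Once registered, the agent can call the server's retrieval tool mid-task, which is what makes the knowledge feel ambient rather than like a separate app.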

It stopped feeling like a demo

The best part for me is that once it is set up, it kind of disappears.

I just keep adding YouTube videos, podcasts, and files, and then that context becomes available while I am working with AI agents.

It stops feeling like a RAG demo and starts feeling like part of the actual workflow.

That is the part I like most about beyin.

It is not something I open to admire. It is something I keep feeding because it makes my everyday workflow better.

Wrap-up

What started as a small local RAG experiment ended up turning into something much more useful than I originally imagined.

And in the process, it taught me something practical about applied AI: sometimes the interesting part is not the model itself, but where the retrieval lives and how naturally it fits into the tools you already use.
