Riverforge — a local AI coding agent for VS Code

// follow the current

Everything a coding agent should do — on your own machine.

01// private by design

Your code never leaves your PC.

Every model call, edit and memory write runs locally through Ollama. There’s no server to send anything to — and you can prove it with netstat.

$ netstat -ano | findstr :8765
TCP 127.0.0.1:8765 LISTENING — local only
$ curl 127.0.0.1:8765/ready
200 OK (no calls out)
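
The same check can be scripted. This is a minimal sketch, assuming the listener address is available as a `host:port` string like the one netstat prints above; `is_local_only` is a hypothetical helper for illustration, not part of Riverforge.

```python
import ipaddress


def is_local_only(bind_address: str) -> bool:
    """Return True when a 'host:port' listener is bound to a loopback address."""
    host, _, _port = bind_address.rpartition(":")
    host = host.strip("[]")  # allow bracketed IPv6 such as [::1]:8765
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP (for example a hostname): treat it as unverified.
        return False


print(is_local_only("127.0.0.1:8765"))  # loopback listener
print(is_local_only("0.0.0.0:8765"))    # bound to every interface
```

A listener on 127.0.0.1 is only reachable from the same machine, whereas one on 0.0.0.0 accepts connections from the network.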

02// vibe-code it

Say what you want. It builds it.

Vibe-code on your own machine: describe the change in plain English — “add retry logic and a test” — and Riverforge plans it, edits the files, runs the tests and hands you the diff. No incantation-prompts, no copy-paste, no cloud — and it won’t say “done” until the tests actually pass.

recall→plan→edit→verify→remember
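
The plan, edit and verify steps of that loop could look roughly like this. It is a sketch under stated assumptions, not Riverforge's actual pipeline: `run_task`, `RunResult` and the three callables are hypothetical names, and the recall and remember steps are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunResult:
    done: bool
    attempts: int
    log: List[str] = field(default_factory=list)


def run_task(
    request: str,
    plan: Callable[[str, List[str]], str],
    edit: Callable[[str], None],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> RunResult:
    """Plan, edit and verify until the checks pass or attempts run out."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        step = plan(request, log)  # the planner can see earlier failures
        edit(step)
        if verify():  # for example: run the test suite or the linter
            log.append(f"attempt {attempt}: verified")
            return RunResult(True, attempt, log)
        log.append(f"attempt {attempt}: verification failed")
    # Success is never reported without a passing verify step.
    return RunResult(False, max_attempts, log)


# Stubbed usage: the first verification fails, the second passes.
results = iter([False, True])
outcome = run_task(
    "add retry logic and a test",
    plan=lambda req, log: f"{req} (try {len(log) + 1})",
    edit=lambda step: None,
    verify=lambda: next(results),
)
print(outcome.done, outcome.attempts)  # True 2
```

The point of the design is that `done` can only become true on the branch where `verify()` returned true.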

03// it proves its work

An agent that doesn’t bluff.

Every edit shows up as a diff and gets a second, near-deterministic review. A run won’t finish until your tests or linter pass, and it can’t cite a file it never opened or describe a deliverable that’s still a placeholder. Approve each step, or let it run on autopilot — and undo a whole run in one click.

write src/http.py? · +18 −2

Approve · Skip

diff review · tests must pass · no phantom edits · one-click undo

04// a memory that grows

A brain that remembers — and barely slows down.

SQLite is the source of truth and FAISS does the search, split across three scopes — you, this project, your research. Recall is ~0.36 ms, and still ~3 ms across a million memories. Facts strengthen each time they’re used and gently fade when they’re not — and nothing is ever hard-deleted.

recall: you prefer pytest, type hints & small commits

[2 days ago] · strength · recalled 9×

recall: http client now retries (3×, backoff)

[updated from last week] · project brain

brain | project | research | each: knowledge · conversation · codebase

SQLite truth (rebuilds without Ollama) · ~0.36 ms → ~3 ms at 1M · 60-day soft decay
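
One way to picture "strengthen on use, fade when idle, never hard-delete" is the sketch below. It is illustrative only: the table layout, the function names and the halving curve are assumptions (the page states only a 60-day soft decay), times are plain day counts, and the FAISS vector search is omitted.

```python
import sqlite3

DECAY_DAYS = 60.0  # from the "60-day soft decay" above; the curve itself is assumed


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE memories ("
        " id INTEGER PRIMARY KEY, scope TEXT, text TEXT,"
        " strength REAL DEFAULT 1.0, recalls INTEGER DEFAULT 0, last_used REAL)"
    )
    return db


def remember(db: sqlite3.Connection, scope: str, text: str, now: float) -> int:
    cur = db.execute(
        "INSERT INTO memories (scope, text, last_used) VALUES (?, ?, ?)",
        (scope, text, now),
    )
    return cur.lastrowid


def recall(db: sqlite3.Connection, memory_id: int, now: float) -> None:
    """Using a memory reinforces it: bump strength and reset the idle clock."""
    db.execute(
        "UPDATE memories SET strength = strength + 1.0,"
        " recalls = recalls + 1, last_used = ? WHERE id = ?",
        (now, memory_id),
    )


def effective_strength(strength: float, last_used: float, now: float) -> float:
    """Halve the stored strength for every DECAY_DAYS of disuse (assumed curve)."""
    idle_days = max(0.0, now - last_used)
    return strength * 0.5 ** (idle_days / DECAY_DAYS)


db = open_store()
mid = remember(db, "project", "http client now retries (3x, backoff)", now=0.0)
recall(db, mid, now=10.0)
strength, last_used = db.execute(
    "SELECT strength, last_used FROM memories WHERE id = ?", (mid,)
).fetchone()
print(effective_strength(strength, last_used, now=70.0))  # 60 idle days: 2.0 -> 1.0
```

Decay is computed at read time and no row is ever deleted, so a faded memory can still be found and strengthened again.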

05// an identity it earns

An assistant that becomes yours.

It starts almost blank and earns a consistent character from real work with you — learning your style and forming its own opinions, modelling you as richly as itself. The voice has personality; the part that edits your files stays strict and predictable.

seed→grow→reflect→evolve

06// the whole toolbox

79 tools — it can touch your whole stack.

Read and edit files, run shells and tests, drive git, query a database, call an HTTP API, search the web, write to memory — 79 built-in tools, each checked before it runs. You never name them: just say what you want and Riverforge picks the right one. Need more? Register your own HTTP tools.

read · edit | shell | tests · lint · build | git | http · sql | web | memory | + custom HTTP

every call validated before it touches disk
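
"Checked before it runs" can be illustrated with a small registry that validates arguments and only then calls the tool. This is a sketch with hypothetical names (`register`, `call_tool`, the `word_count` tool); the real tool schema and registration API are not described on this page.

```python
from typing import Any, Callable, Dict

_REGISTRY: Dict[str, Dict[str, Any]] = {}


def register(name: str, required: Dict[str, type], fn: Callable[..., Any]) -> None:
    """Register a tool together with the argument names and types it requires."""
    _REGISTRY[name] = {"required": required, "fn": fn}


def call_tool(name: str, **args: Any) -> Any:
    """Validate the call first; run the tool only if every check passes."""
    if name not in _REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    spec = _REGISTRY[name]
    for arg, expected in spec["required"].items():
        if arg not in args:
            raise ValueError(f"{name}: missing argument {arg!r}")
        if not isinstance(args[arg], expected):
            raise TypeError(f"{name}: {arg!r} must be {expected.__name__}")
    extra = set(args) - set(spec["required"])
    if extra:
        raise ValueError(f"{name}: unexpected arguments {sorted(extra)}")
    return spec["fn"](**args)


register("word_count", {"text": str}, lambda text: len(text.split()))
print(call_tool("word_count", text="add retry logic and a test"))  # 6
```

A malformed call such as `call_tool("word_count", text=5)` raises before the tool function is ever invoked.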

07// it reads the whole repo

16K of context that reads like 100K.

Clever context engineering — an append-only evidence menu, large-file outlines and cross-run recall — lets it work across codebases far bigger than its window without losing the thread or guessing at what it hasn’t seen.

evidence menu · large-file outlines · cross-run recall

audits projects 100×–10,000× its context window
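
A large-file outline can be as simple as listing a file's classes and functions with their line numbers, so that only the relevant part needs to be read in full later. The sketch below does this for Python source with the standard `ast` module; it is an assumption about the general technique, not Riverforge's own outliner, and `outline` is a hypothetical name.

```python
import ast
from typing import List


def outline(source: str) -> List[str]:
    """List top-level classes, their methods and top-level functions by line."""
    entries: List[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entries.append(f"{node.lineno}: def {node.name}")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"{node.lineno}: class {node.name}")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    entries.append(f"{item.lineno}:   def {node.name}.{item.name}")
    return entries


sample = '''
class Client:
    def get(self, url):
        return url

def retry(fn, times=3):
    return fn
'''
print("\n".join(outline(sample)))
```

The outline costs a few tokens per definition instead of the whole file, which is what lets a small window cover a large codebase.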

08// research, when you ask

Offline by default. Online on request.

It needs no internet to work. Ask it to research and it searches the web, reads the real pages and cites them — then files what it learned into a separate research brain with the source URL, a hash and a confidence score, ready to reuse.

web search | fetch & read | image search | ingest url · pdf

web tools stay off until you ask · provenance kept
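
A research entry with provenance might be shaped like the record below. This is a sketch only: the field names, the choice of SHA-256 and the 0 to 1 confidence range are assumptions, since the page says only that a source URL, a hash and a confidence score are stored.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchNote:
    source_url: str
    content_sha256: str  # fingerprint of the page text that was actually read
    confidence: float
    summary: str


def make_note(source_url: str, page_text: str, summary: str, confidence: float) -> ResearchNote:
    """Build a note whose hash ties the summary to the exact text it came from."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    digest = hashlib.sha256(page_text.encode("utf-8")).hexdigest()
    return ResearchNote(source_url, digest, confidence, summary)


note = make_note(
    "https://example.com/retries",
    "Retry with backoff.",
    "Back off between retries.",
    0.8,
)
print(note.source_url, note.content_sha256[:12], note.confidence)
```

Storing the hash makes it possible to tell later whether the page has changed since it was cited.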

09// where you use it

VS Code, a tray, a CLI — even your phone.

A dockable VS Code panel with diffs, image cards, attachments, undo and an // AI? comment watcher. A tray app that frees your VRAM for a game in one click. A live memory visualiser, a full command line, and phone access over an outbound relay that opens no ports.

VS Code | tray · pause VRAM | visualiser | CLI | phone relay

10// any model, any rig

Built for 8 GB. Scales as you do.

Point it at any Ollama model — Gemma, Qwen, Llama, your own fine-tune — and it adapts sampling, prompts and reasoning (off / on / adaptive). Designed for an 8 GB-VRAM, 32 GB-RAM machine; give it more and it scales straight up — bigger models, longer context, a deeper brain. It gets smarter as your hardware does.

Gemma 4 Turbo · Qwen · Llama · Phi · DeepSeek-R1 · your fine-tune

8 GB · 32 GB→16 GB→24 GB+→more capable

// the specs

Built lean, on purpose.

runtime: a LangGraph pipeline, run locally
inference: Ollama on your GPU · any local model
memory: SQLite + FAISS · 3 scopes × 3 tiers
identity: an evolving AI brain, earned over time
tools: 79 built-in + custom HTTP
runs in: Windows app · VS Code · tray · CLI
control: approval · diff review · verify · rollback
hardware: from 8 GB VRAM + 32 GB RAM, scales up
data sent out: never
made by: Aurasoft, UK

Code with an agent that’s yours.

One Windows installer and a VS Code extension. It sets up Ollama, pulls a model, and you’re pair-programming with a fully private agent in minutes. Not released yet — free, in private beta, public download coming soon.

Coming soon: Windows installer · Coming soon: VS Code extension