Troubleshooting
Most problems come down to one of three things: the server isn’t running, Ollama isn’t reachable, or the context is mis-tuned. Start with the quick reference, then jump to the section you need.
Quick reference
| Problem | First thing to check |
|---|---|
| Chat says server offline | Open the tray, click Start Server, then check Status |
| VS Code command missing | Reinstall the VSIX and reload VS Code |
| Server starts but the model is slow | First response loads the model — use Resume VRAM to warm it |
| GPU needed for another app | Click Pause VRAM |
| Attachments don’t appear | Drop them on the chat/composer area, or use the + button |
| Test command blocked | Add the command name in the tray under Tool Config |
| Responses look truncated | Set OLLAMA_CONTEXT_LENGTH=16384 and restart Ollama |
| Ollama errors | Restart Ollama, then run tray Status again |
The chat says the server is offline
- Open the Riverforge Tray and click Start Server. Wait for the status log to show online.
- Check the header dot in VS Code — it turns green when the server is reachable. The extension reconnects on its own.
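- Still red? Confirm the server is up from a terminal. A healthy server prints 200; if nothing is listening, the command throws a connection error instead:
PS> (Invoke-WebRequest -UseBasicParsing http://127.0.0.1:8765/ready).StatusCode   # 200
- If the request fails and the tray also shows the server as offline, start it from the tray. If Ollama is down, restart it from its tray icon.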
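If you start the server from a script instead of the tray, a small polling loop saves you from guessing when it is ready. This is a sketch. It assumes the default /ready address shown above and treats a 200 as "ready". The function name Wait-RiverforgeReady and its timeout defaults are hypothetical and not part of Riverforge.

```powershell
# Hypothetical helper: poll the /ready endpoint until it returns 200 or the timeout passes.
function Wait-RiverforgeReady {
    param(
        [string]$Url = 'http://127.0.0.1:8765/ready',  # default address from the check above
        [int]$TimeoutSeconds = 30,
        [int]$IntervalSeconds = 1
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        try {
            $resp = Invoke-WebRequest -UseBasicParsing -Uri $Url -TimeoutSec 5
            if ($resp.StatusCode -eq 200) { return $true }
        } catch {
            # Connection refused or an error status: not ready yet, try again
        }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $false
}

# Usage: if (Wait-RiverforgeReady -TimeoutSeconds 60) { 'online' } else { 'still offline' }
```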
The VS Code command is missing
The installer can occasionally put the extension into a different VS Code profile from the one you use. Install the bundled VSIX by hand into the window you work in: Extensions view → … → Install from VSIX… → pick riverforge-vscode.vsix → reload the window. After an upgrade, if the commands still look old, reinstall the latest VSIX. See Installation.
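You can also install from a terminal with the VS Code command-line tool. This helps when the GUI keeps installing into the wrong profile. The sketch below assumes the code launcher is on PATH. --install-extension and --force are standard VS Code CLI flags. The helper name is hypothetical, and -ProfileName (passed as --profile) assumes a VS Code release with profile support.

```powershell
# Hypothetical helper: install a VSIX with the VS Code CLI.
# --install-extension and --force are real CLI flags; --profile assumes profile support.
function Install-RiverforgeVsix {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$ProfileName
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        throw "VSIX not found: $Path"
    }
    $fullPath = (Resolve-Path -LiteralPath $Path).Path
    # --force lets the install replace an already-installed copy of the extension
    $codeArgs = @('--install-extension', $fullPath, '--force')
    if ($ProfileName) { $codeArgs += @('--profile', $ProfileName) }
    & code @codeArgs
}

# Usage: Install-RiverforgeVsix .\riverforge-vscode.vsix
# then run "Developer: Reload Window" in VS Code.
```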
Responses are slow
- The first response after startup or a VRAM pause includes model warmup. Use Resume VRAM to warm the model before you need it.
- Check nvidia-smi during generation. If GPU utilisation is under 50%, layers are spilling to CPU: the model may be too large for your VRAM.
- If every response cold-loads, confirm OLLAMA_KEEP_ALIVE=-1 and OLLAMA_MAX_LOADED_MODELS=3 are set, then restart Ollama.
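To put a number on the 50% check, you can sample utilisation a few times while a response streams instead of watching nvidia-smi by eye. This sketch uses nvidia-smi's --query-gpu CSV output. The function names are hypothetical, and the 50% threshold comes from the guidance above, not from Ollama. Running ollama ps at the same moment is a useful cross-check, because its PROCESSOR column shows how the loaded model is split between CPU and GPU.

```powershell
# Hypothetical helper: parse lines from
# nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits
function ConvertFrom-GpuCsv {
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        if ([string]::IsNullOrWhiteSpace($line)) { continue }
        $util, $used, $total = $line.Split(',') | ForEach-Object { [int]$_.Trim() }
        [pscustomobject]@{ UtilPercent = $util; MemUsedMiB = $used; MemTotalMiB = $total }
    }
}

# Hypothetical helper: $true when average utilisation across the samples is below the threshold.
function Test-GpuSpill {
    param([object[]]$Samples, [int]$Threshold = 50)
    $avg = ($Samples | Measure-Object -Property UtilPercent -Average).Average
    return $avg -lt $Threshold
}

# Usage: sample once a second for five seconds while a reply is streaming.
# $raw = foreach ($i in 1..5) {
#     nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits
#     Start-Sleep -Seconds 1
# }
# if (Test-GpuSpill (ConvertFrom-GpuCsv $raw)) { 'Under 50% GPU: layers are likely on CPU' }
```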
Output looks cut off
By default Ollama serves a 2048-token context and silently truncates anything longer. Set OLLAMA_CONTEXT_LENGTH=16384 on the Ollama service and restart it. Riverforge sets this for you during install — this only bites on a hand-tuned or source setup. See Models & Hardware.
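On Windows, Ollama reads these settings from environment variables when it starts. The sketch below assumes Ollama runs as the usual per-user tray app, so user-scope variables apply. Set-OllamaEnv is a hypothetical helper. [Environment]::SetEnvironmentVariable is the standard .NET call for persisting a variable at user scope. The same helper covers OLLAMA_KEEP_ALIVE and OLLAMA_MAX_LOADED_MODELS from the slow-response checks.

```powershell
# Hypothetical helper: persist each variable at the given scope.
# 'User' survives restarts; 'Process' only affects the current shell; 'Machine' needs admin.
function Set-OllamaEnv {
    param(
        [Parameter(Mandatory)][hashtable]$Vars,
        [ValidateSet('User', 'Machine', 'Process')][string]$Scope = 'User'
    )
    foreach ($name in $Vars.Keys) {
        [Environment]::SetEnvironmentVariable($name, [string]$Vars[$name], $Scope)
    }
}

# Usage (values from this page), then quit Ollama from its tray icon and start it again:
# Set-OllamaEnv @{ OLLAMA_CONTEXT_LENGTH = 16384; OLLAMA_KEEP_ALIVE = -1; OLLAMA_MAX_LOADED_MODELS = 3 }
```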
“CUDA out of memory” / Ollama crashes
- Click Pause VRAM, close other GPU-heavy apps, then Resume VRAM.
- Confirm OLLAMA_CONTEXT_LENGTH=16384 and OLLAMA_MAX_LOADED_MODELS=3 are set, then restart Ollama.
- Stick with the default model unless you’re deliberately trying a larger one.
- If a large model still won’t load, switch back to the 4B default — it’s sized for an 8 GB card.
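Before you click Resume VRAM, it helps to see what else is holding GPU memory. This sketch parses nvidia-smi's --query-compute-apps CSV output, and the helper name is hypothetical. On Windows the per-process memory figure may be reported as [N/A]. The helper keeps those rows with an empty value instead of dropping them.

```powershell
# Hypothetical helper: parse lines from
# nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits
function ConvertFrom-GpuAppsCsv {
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        if ([string]::IsNullOrWhiteSpace($line)) { continue }
        $parts = $line.Split(',')
        if ($parts.Count -lt 3) { continue }
        # First field is the PID, last is memory; anything between is the name (which may contain commas)
        $name = ($parts[1..($parts.Count - 2)] -join ',').Trim()
        $mib = 0
        $usedMiB = $null
        if ([int]::TryParse($parts[-1].Trim(), [ref]$mib)) { $usedMiB = $mib }
        [pscustomobject]@{ ProcessId = [int]$parts[0].Trim(); Name = $name; UsedMiB = $usedMiB }
    }
}

# Usage: list the biggest GPU users first, then close what you can before Resume VRAM.
# $raw = nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits
# ConvertFrom-GpuAppsCsv $raw | Sort-Object UsedMiB -Descending | Format-Table
```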
Riverforge can’t see Ollama
- Confirm Ollama is answering: curl http://localhost:11434/api/tags should return JSON. In Windows PowerShell 5.1, type curl.exe to get plain JSON, because curl there is an alias for Invoke-WebRequest.
- If a firewall is blocking localhost, add a rule for ollama.exe.
- Windows sometimes binds Ollama to IPv6 only. Run setx OLLAMA_HOST 127.0.0.1:11434, then restart Ollama.
- Restart Ollama from its tray icon, then run the tray Status button again.
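To tell the IPv6-only case apart from a firewall or a stopped Ollama, query /api/tags on both loopback addresses. The sketch below assumes Ollama's default port 11434, and the function name is hypothetical. If only the [::1] row is reachable, the OLLAMA_HOST fix applies. If neither row is reachable, Ollama is not running or something is blocking it.

```powershell
# Hypothetical helper: probe Ollama's /api/tags on IPv4 and IPv6 loopback.
function Test-OllamaEndpoint {
    param([string[]]$BaseUrls = @('http://127.0.0.1:11434', 'http://[::1]:11434'))
    foreach ($base in $BaseUrls) {
        $reachable = $false
        $models = $null
        try {
            # /api/tags lists installed models as { "models": [ ... ] }
            $tags = Invoke-RestMethod -Uri "$base/api/tags" -TimeoutSec 5
            $reachable = $true
            $models = 0
            if ($tags.models) { $models = @($tags.models).Count }
        } catch {
            # Refused, timed out, or blocked: leave Reachable as $false
        }
        [pscustomobject]@{ Url = $base; Reachable = $reachable; Models = $models }
    }
}

# Usage: Test-OllamaEndpoint | Format-Table
```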
It edits files but the tests never pass
- Set the correct test/lint command in the tray under Tool Config.
- Run that command yourself from the project root to confirm it works outside the agent.
- Inspect the chat’s live tool rows for the exact failing command and output.
- If the project needs a specific executable, make sure it’s on PATH — any executable on PATH is allowed; only sensitive system paths are blocked.
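Before you blame the agent, check that each executable your test command needs resolves from PATH in a fresh shell. The sketch below uses a hypothetical helper, Test-ToolCommand, built on the standard Get-Command cmdlet. The tool names in the usage line are only examples.

```powershell
# Hypothetical helper: report whether each named executable resolves from PATH.
function Test-ToolCommand {
    param([Parameter(Mandatory)][string[]]$Name)
    foreach ($n in $Name) {
        # -CommandType Application ignores aliases and shell functions
        $cmd = Get-Command -Name $n -CommandType Application -ErrorAction SilentlyContinue |
            Select-Object -First 1
        $path = $null
        if ($cmd) { $path = $cmd.Path }
        [pscustomobject]@{ Name = $n; OnPath = [bool]$cmd; Path = $path }
    }
}

# Usage, from the project root (example tool names):
# Test-ToolCommand pytest, npm, dotnet | Format-Table
```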
Windows Defender flags Ollama
Defender occasionally quarantines ollama.exe after an update. Add %LOCALAPPDATA%\Programs\Ollama\ to your exclusions list and reinstall or restore Ollama.
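You can also add the exclusion from an elevated PowerShell with the Defender module's Add-MpPreference cmdlet. The sketch below assumes the default per-user install location named above. The helper names are hypothetical.

```powershell
# Hypothetical helpers; Add-MpPreference and Get-MpPreference ship with Microsoft Defender.
function Get-OllamaInstallDir {
    param([string]$LocalAppData = $env:LOCALAPPDATA)
    # Default per-user install path from this page: %LOCALAPPDATA%\Programs\Ollama\
    [System.IO.Path]::Combine($LocalAppData, 'Programs', 'Ollama')
}

function Add-OllamaDefenderExclusion {
    # Must run as Administrator
    $dir = Get-OllamaInstallDir
    Add-MpPreference -ExclusionPath $dir
    # Should return $true once Defender lists the folder among its exclusions
    (Get-MpPreference).ExclusionPath -contains $dir
}

# Usage (elevated): Add-OllamaDefenderExclusion
# Then restore ollama.exe from Protection history, or reinstall Ollama.
```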
Still stuck?
The tray’s Status button gives you a one-look snapshot of the server, Ollama, the loaded model and your tools — start there. For a deeper look, Riverforge keeps logs in your data folder; open it with the tray’s Open Data Folder button. Those logs are the most useful thing to include if you report a problem.