AMF · 10.16.91.74 | MME · 10.16.92.74 | Prometheus · 172.29.10.26:9099 | Grafana · :3000 | gemma4:e4b · Ollama :11434 | mistral:7b · search ready | phi3:mini · router ready | codellama:13b · code ready | llava:13b · vision ready | BB 6630 · GPS synced | NTP · 172.29.10.26 → BB 6630 | nomic-embed · vector DB ready | AMF · 10.16.91.74 | MME · 10.16.92.74 | Prometheus · 172.29.10.26:9099 | Grafana · :3000 | gemma4:e4b · Ollama :11434 | mistral:7b · search ready | phi3:mini · router ready | codellama:13b · code ready | llava:13b · vision ready | BB 6630 · GPS synced | NTP · 172.29.10.26 → BB 6630 | nomic-embed · vector DB ready |

5G AI Lab — System Architecture

AI server (WSL2) · Open5GS core · Ericsson Router 6000 · BB 6630

supptel Ericsson Team

Ahmadreza Majlesara

Open5GS v2.7.6 Ollama v0.20+ OpenClaw 2026.5.12 6 AI Models
Chat History
Network Topology
AI Subsystem
Model Roles
Hardware Requirements
Demo
AI Server
WSL2 / Ubuntu 22.04
localhost · Windows PC
Ollama :11434
gemma4:e4b mistral:7b codellama:13b llava:13b phi3:mini nomic-embed mistral-instruct
orchestrator.py
phi3 routes · Gemma thinks · specialists execute · CLI chat
Tool layer
web_search prom_query open5gs_cmd vector_search vision_analyze
Vector DB (ChromaDB)
Lab docs · configs · logs · manuals
nomic-embed-text
WiFi·SSH
5G Core Laptop
Ubuntu 22.04
172.29.10.26 · WiFi + Eth port 17
Open5GS v2.7.6
MME
10.16.92.74
AMF
10.16.91.74
UPF
user plane
Monitoring stack
NMS :8888 Prometheus :9099 Grafana :3000
NTP server
172.29.10.26 · serves time to BB 6630
active
via core (future) Eth port 17
Network Layer
Ericsson Router 6000
core subnet 10.16.x.72/29 · RAN subnet 10.15.x.72/29
SUP_NR
NR context
SUP_LTE
LTE context
SUP_OM
OAM context
port 17
→ core laptop
port 7
→ BB 6630
Eth port 7 NTP ← core
RAN
Ericsson Baseband BB 6630
RAN subnet 10.15.x.72/29 · router port 7
NR radio
VLAN 2120
LTE radio
VLAN 2130
OM / config
VLAN 2140planned
GPS sync
synced
NTP client
← 172.29.10.26
Legend
Active / reachable
Physical Ethernet
WiFi / SSH / REST / NTP
Planned (future)
Planned / not yet active

Question routing & model delegation flow

👤 User (CLI)
types a question
⚙️ orchestrator.py
manages session · conversation history · logging
⚡ phi3:mini — Intelligent Router
reads question · classifies in ~200ms · picks specialist
code? image? web search? logs? complex?
gemma4:e4b
Complex reasoning · multi-step · final synthesis
mistral:7b
Web search · fetch · summarize for Gemma
codellama:13b
Scripts · CLI configs · log parsers · playbooks
llava:13b
Screenshots · Grafana · alarm images · UI analysis
mistral-instruct
Log parsing · structured JSON · alarm reports
nomic-embed
Vector search · local docs · configs · manuals
results fed back to Gemma for final synthesis
🧠 gemma4:e4b — Final Answer
synthesizes all results · produces coherent response · logs to session
👤 User sees answer
CLI · with tool indicators

Model roles, responsibilities & example prompts

🧠
gemma4:e4b
Main brain · orchestrator · ~4.5B params
Primary reasoning engine. Receives all specialist results and synthesizes a final coherent answer. Handles complex multi-step questions, interprets 5G lab context, and decides when more tool calls are needed.
"Why is UE registration failing based on these logs?"
"Explain what these Prometheus metrics mean for our setup"
click for full model report →
ollama pull gemma4:e4b
Already installed ✓
~4GB · ~5GB VRAM
Role: orchestrator
Temp: 0.7 · Ctx: 8192
phi3:mini
Intelligent router · ~2B params · ~200ms
Reads every incoming question and classifies it in milliseconds — is this a code task, a visual, a web search, a log parse, or a complex reasoning question? Routes to the right specialist instantly, making the system feel fast and responsive.
"this is a code question → codellama"
"this has an image → llava"
click for full routing report →
ollama pull phi3:mini
~2GB · ~2.5GB VRAM
Role: router
Latency target: <200ms
🔍
mistral:7b
Web search specialist · 7B params
Activated when Gemma needs current information from the web. Sends a precise search query, fetches results, and returns a clean 3-5 sentence factual summary. Used for hardware specs, 3GPP standards, software changelogs, and known issues.
"What 3GPP release does Open5GS 2.7.6 support?"
"Ericsson BB 6630 maximum UE capacity"
click for full search engine report →
ollama pull mistral:7b
~4GB · ~6GB VRAM
Role: web search
Temp: 0.3 · concise output
💻
codellama:13b
Code & config specialist · 13B params
Handles all code generation, script writing, config generation, and log parsing with regex. Generates Ericsson router CLI commands, Python scripts, Ansible playbooks, and PromQL queries. Far more reliable than general models for exact CLI syntax.
"Write CLI to configure VLAN 2140 on Ericsson router"
"Parse AMF logs and extract all S1AP errors as JSON"
click for full model report →
ollama pull codellama:13b
~8GB · ~10GB VRAM
Role: code & CLI specialist
Needs GPU for best speed
👁️
llava:13b
Vision specialist · multimodal · 13B params
Reads and analyzes images. Point it at Grafana dashboard screenshots, Ericsson Element Manager UI screenshots, alarm lists, or network diagram photos. Identifies anomalies, reads values, and describes what it sees in technical language.
"Here is a Grafana screenshot — what looks wrong?"
"Analyze this baseband alarm screenshot"
click for full model report →
ollama pull llava:13b
~8GB · ~12GB VRAM
Role: vision & image analysis
Needs 12GB+ VRAM
📋
mistral:7b-instruct
Structured output specialist · 7B params
Instruct-tuned for strict JSON and table output. Used specifically for log analysis, alarm report generation, and any task where the output must be machine-readable or precisely formatted. Ideal for feeding results into dashboards or other tools.
"Parse today's MME logs → JSON list of failed registrations"
"Summarize Prometheus alerts as structured report"
click for full model report →
ollama pull mistral:7b-instruct
~4GB · ~6GB VRAM
Role: structured JSON output
Log parsing · alarm reports
📚
nomic-embed-text
Embeddings · local knowledge base · 270MB
Converts text to vectors for semantic search. Feeds the local ChromaDB vector database. Index your lab documentation, Ericsson manuals, Open5GS configs, router configs, and historical logs. Then search them by meaning, not just keywords.
"What does our router config say about VLAN 2120?"
"Find all handover failures from last week's logs"
click for full knowledge base report →
ollama pull nomic-embed-text
~270MB · ~0.5GB VRAM
Role: embeddings · ChromaDB
Tiny size · huge value

Recommended hardware for the AI server

Every query runs 3 models in sequence — peak VRAM is their sum
phi3:mini
2.5 GB
router · always first
+
specialist
6–12 GB
varies by query type
+
gemma4:e4b
5 GB
synthesis · always last
=
peak demand
13.5–19.5 GB
19.5 GB with llava:13b
Worst-case path: phi3:mini (2.5) + llava:13b (12) + gemma4:e4b (5) = 19.5 GB  ·  Typical path (search/code): phi3:mini (2.5) + specialist (6–10) + gemma4:e4b (5) = 13.5–17.5 GB
ComponentMinimumRecommendedWhyYour risk without it
GPU RTX 3080 Ti (12GB) RTX 4090 (24GB) or A10 (24GB) Pipeline runs 3 models simultaneously. llava:13b alone needs 12GB. codellama:13b needs 10GB. Without GPU, all 7 models fall back to CPU. 60–180s response times on 13B models. llava:13b is completely unusable on CPU. Vision queries time out.
VRAM 16GB 24GB+ The pipeline runs phi3:mini + specialist + gemma4:e4b in sequence — all three overlap in VRAM during synthesis. Peak demand is 19.5GB (llava path). 24GB covers all 7 specialists with no offloading. 16GB handles most queries but llava:13b will partially offload to RAM (~3–4s extra per vision query). 10GB is too small — codellama alone won't fit cleanly.
RAM (system) 32GB 64GB Models overflow VRAM to RAM. SearXNG + Playwright headless Chromium + ChromaDB + WSL2 overhead together consume ~8–12GB just for the support stack. Remaining RAM absorbs VRAM spillover. System thrashes when multiple specialists are called in sequence. Playwright browser fetch crashes. ChromaDB indexing slows to a crawl.
CPU 8 core 16 core (Ryzen 9 / i9 / Xeon) phi3:mini routing runs on CPU even with a GPU present. Playwright headless Chromium, SearXNG Flask server, ChromaDB embeddings, and SSH tool calls all run concurrently on CPU alongside model inference. phi3:mini routing adds 400–800ms instead of <200ms. Parallel search fetch competes with inference. Tool calls feel sluggish.
Storage 512GB NVMe 1TB+ NVMe (PCIe 4.0) Current 7 models total ~31GB. Planned upgrade to qwen2.5-coder:32b adds ~20GB. ChromaDB grows as you index lab docs. SearXNG, Playwright browser (~130MB), logs, and WSL2 image add up fast. Running out of space when pulling the code model upgrade. Slow model load from a SATA SSD adds 5–15s cold-start latency per model.
Network 100Mbps Ethernet 1Gbps Ethernet to lab network SSH to core (172.29.10.26), Prometheus queries (:9099), NMS health (:8888), AMF/MME log streaming — all concurrent when multiple tools are invoked. SearXNG also fetches from upstream search engines. Prometheus queries and log fetches add 1–3s latency. SSH timeout risks during heavy concurrent tool use.
OS WSL2 Ubuntu 22.04 ✓ (current) Native Ubuntu 22.04 (dual boot or dedicated server) Native Linux removes WSL2 memory overhead, fixes GPU passthrough latency, eliminates port-forwarding complexity, and lets Playwright and SearXNG bind ports directly without Windows NAT. WSL2 works but adds ~2GB RAM overhead, occasional GPU driver friction, and Playwright requires --no-sandbox workaround (already applied).

Current model stack — 7 models · ~31 GB total on disk

ModelRoleDiskVRAMActive whenPriority
phi3:mini Router ~2 GB~2.5 GB Every single query — always first ★★★ pull first
gemma4:e4b Brain · synthesis ~4 GB~5 GB Every single query — always last ★★★ pull second
nomic-embed-text Embeddings · KB ~270 MB~0.5 GB KNOWLEDGE queries · knowledge base ingest ★★★ tiny · huge value
mistral:7b Web search ~4 GB~6 GB SEARCH queries · SearXNG result synthesis ★★★ pull third
mistral:7b-instruct-v0.3-q8_0 Log parser ~4 GB~6 GB LOGS queries · structured JSON output ★★☆ pull fourth
codellama:13b Code · CLI ~8 GB~10 GB CODE queries · scripts · configs · PromQL ★★☆ pull fifth
llava:13b Vision · images ~8 GB~12 GB VISION queries only · needs image path from user ★☆☆ needs 24GB VRAM
Total — all 7 models ~30.3 GB peak 19.5 GB not all active simultaneously — Ollama loads on demand

Planned model upgrade — requires RTX 4090 (24GB VRAM)

Current modelPlanned replacementDiskVRAMQuality gainNeeds
codellama:13b qwen2.5-coder:32b ~20 GB~20 GB HumanEval: 38% → 80% · better CLI syntax · better PromQL 24GB VRAM minimum
After upgrade: total disk ~42 GB · peak VRAM 27 GB (phi3 + qwen2.5-coder:32b + gemma4) → requires 24GB GPU with partial RAM offload for gemma4 layers. Prompt documented in CODING_IMPROVEMENTS.md PROMPT 6.

Support stack — RAM & CPU overhead (no VRAM)

ComponentRAM usageCPU threadsNotes
SearXNG (Flask server)~150 MB1–2Must be started manually after WSL reboot · logs to ai-lab/logs/searxng.log
Playwright headless Chromium~300–500 MB2–4Launched per JS-fetch request · auto-closes after · requires --no-sandbox on WSL2
ChromaDB (vector store)~100 MB + index size1Grows as you ingest lab docs · stored at knowledge/chroma_db/
Ollama server~200 MB + model2–8Keeps last-used model warm in VRAM · auto-evicts after idle timeout
WSL2 + Ubuntu overhead~2–3 GB2Fixed OS overhead · more on Windows host with other apps running
Support stack total ~3–4 GB ~8–16 on top of whatever model VRAM is in use
VRAM management — how Ollama handles the 7-model stack
Ollama loads models on demand and keeps them warm until idle timeout (~5 min). It does not pre-load all 7 models.
In practice, only 2–3 models are in VRAM at any moment: phi3:mini (router) and gemma4:e4b (brain) stay warm between queries since they're used every time. The specialist evicts after its query completes.

Priority order to keep warm manually (if you want to pre-load):
phi3:mini (2.5GB · always needed) → gemma4:e4b (5GB · always needed) → nomic-embed-text (0.5GB · tiny, keep warm) → mistral:7b (6GB · most common specialist) → load others on demand

With 16GB VRAM: phi3 + gemma4 + mistral:7b fits (14GB). llava:13b will offload ~3.5GB to RAM — adds ~4s to vision queries.
With 24GB VRAM: all specialists including llava:13b run fully in VRAM. Zero RAM offloading at current model sizes.
Live Demo — real models, real answers ● checking server…
Type a question below or click an example.
Watch phi3:mini route it → specialist → gemma4 synthesis.
TRY AN EXAMPLE
Routing Pipeline
📥
Input received
waiting for question…
phi3:mini — Router
few-shot classify · temp 0 · 5 tokens
🎯
Route decision
label will appear here
🔧
Specialist model
processes the request
🧠
gemma4:e4b — Synthesis
final answer generation
Answer delivered
session log updated
0
Queries
Last route
Route ms
🔍
mistral:7b — Search Engine Report
search_agent.py · full architecture & roadmap
At a glance
3
search backends
2
fetch layers
4
SearXNG engines
0
external API keys
Search stack — priority order
01
SearXNG — self-hosted metasearch
Runs locally on http://localhost:8080 · no rate limits · no API key
Aggregates: Google · DuckDuckGo · Wikipedia · GitHub in one query
Engines: only English-language results · Bing disabled (noise)
Config: /home/ericsson/searxng/searxng-lab.yml
Start: bash /home/ericsson/ai-lab/start_searxng.sh
LIVE
02
OpenClaw → DuckDuckGo
Subprocess call to openclaw CLI · triggers only if SearXNG is down
Provider: duckduckgo · limit 6 results · 30s timeout
Configured in ~/.openclaw/openclaw.json
FALLBACK 1
03
ddgs — DuckDuckGo Python library
Direct Python call · no subprocess · last resort if openclaw also fails
pip package: ddgs · same results as DuckDuckGo
FALLBACK 2
Page fetch stack — enriches thin snippets (<120 chars)
Layer 1 — requests + trafilatura (static)
Fast plain HTTP fetch · ~1–2s · works for any static HTML page
trafilatura extracts main content, strips nav/ads/boilerplate
If extracted text ≥ 150 chars → accepted, no Layer 2 needed
LIVE
Layer 2 — Playwright headless Chromium (JS)
Full browser render · ~4–6s · fires only when Layer 1 returns <150 chars
Handles React/Vue/Angular SPAs, lazy-loaded content, dynamic docs
Waits 1.5s after domcontentloaded for JS to settle
Confirmed working: releasealert.dev, deepwiki-style pages
Browser binary: ~/.cache/ms-playwright/chromium-1223/
LIVE
ALWAYS SKIPPED (bot-protected / unfetchable)
ericsson.com scribd.com youtube.com twitter.com / x.com linkedin.com facebook.com springer.com jstor.org ieee.org .pdf · .doc · .ppt · .zip
Synthesis — after results are collected
mistral:7b · temperature 0.1 · max 800 tokens
System prompt: research assistant · use only provided search results · cite source URL for every key fact
Input: question + up to 6 search results (title · URL · body/extracted content)
Output: cited factual answer in plain prose · streamed token by token to the terminal
Future additions — roadmap
Semantic reranking HIGH
Use nomic-embed-text (already running for ChromaDB) to score each search result by semantic similarity to the query. Reorder results so mistral:7b always sees the most relevant content first, not just what SearXNG ranked first.
PDF extraction HIGH
Install pymupdf (pip install pymupdf) to extract text from PDF URLs — unlocks Ericsson datasheets, 3GPP specs, and archive.org technical documents currently skipped by the fetch stack.
Parallel URL fetch MED
Use concurrent.futures to fetch top-N URLs simultaneously instead of sequentially. Cuts fetch time from N×8s to ~8s regardless of how many URLs are enriched.
Query expansion MED
Use phi3:mini (already running for routing) to rephrase the query before searching. Turns "BB 6630 GPS sync" into "Ericsson Baseband 6630 GPS synchronization IEEE 1588 PTP requirements" — better query, better results.
More SearXNG engines MED
Stack Overflow (technical Q&A) and arXiv (5G research papers) are built into SearXNG — one-line additions to searxng-lab.yml. Also consider enabling a local Brave Search engine for broader coverage.
Better mistral prompt MED
Add lab-specific instructions: prefer open5gs.org and 3gpp.org over generic blogs, always cite version numbers, format commands as code blocks, downrank sources older than 2 years. Direct improvement to answer quality.
🧠
gemma4:e4b — Brain & Synthesis Engine
orchestrator.py · the final step in every single query
At a glance
ALL
queries pass through
2048
max output tokens
0.4
temperature
6
messages of history
Role in the pipeline
Always the last step — synthesis
Every user query ends here regardless of which specialist ran first.
Receives the specialist's raw output and synthesizes a final coherent natural-language answer.
Has the full lab topology baked into its system prompt — knows every IP, device, and VLAN.
ALWAYS ACTIVE
COMPLEX label — reasons directly, no specialist
When phi3:mini labels the query COMPLEX, gemma4:e4b answers alone without invoking any specialist tool.
Used for: 5G/LTE concepts, status checks, architecture explanations, multi-step reasoning.
DIRECT
Lab context baked into the system prompt
Open5GS 5G/LTE core → 172.29.10.26
MME: 10.16.92.74  |  AMF: 10.16.91.74
NMS :8888  ·  Prometheus :9099  ·  Grafana :3000
NTP: 172.29.10.26 → Ericsson BB 6630 (GPS synced)
Ericsson Router 6000 contexts: SUP_NR · SUP_LTE · SUP_OM
BB 6630 VLANs: 2120 (NR) · 2130 (LTE) · 2140 (OM, planned)
Core subnet: 10.16.x.72/29  ·  RAN subnet: 10.15.x.72/29
All infrastructure is on-premises. No cloud dependencies.
Conversation memory
Last 3 exchanges kept in context (6 messages)
Pipeline maintains a rolling history of the last 3 question/answer pairs.
Lets gemma4 refer back to previous answers in the same session.
History resets on process restart — held in RAM only, never written to disk.
phi3:mini — Intelligent Router
router.py · classifies every query before any specialist runs
At a glance
~200
ms routing time
6
routing labels
0
temperature (deterministic)
36
few-shot examples
Routing labels — what goes where
🔧
CODE → codellama:13b
Write scripts · generate CLI commands · create config files · PromQL queries · bash automation
👁
VISION → llava:13b
Analyze images · screenshots · Grafana dashboards · alarm panels · network diagrams
🔍
SEARCH → mistral:7b
Web search · current specs · datasheets · release notes · 3GPP standards
📋
LOGS → mistral:7b-instruct
Parse log text · extract errors/warnings · analyze journalctl output · structured JSON output
🧠
KNOWLEDGE → nomic-embed-text + ChromaDB
Search local ingested documents · manuals · configs · runbooks by semantic meaning
💡
COMPLEX → gemma4:e4b (direct)
5G/LTE concepts · status checks · architecture questions · multi-step reasoning — no specialist needed
How it works
num_predict: 5 — forces a single-word response · impossible to ramble
temperature: 0 — fully deterministic · same input always produces the same label
36 pre-written few-shot conversation examples guide every classification decision
On error or invalid output → silently falls back to COMPLEX
Runs first in every query before any specialist is loaded or invoked
💻
codellama:13b — Code & CLI Specialist
tools/code_agent.py · triggered by CODE label · temperature 0.1
At a glance
13B
parameters
~8GB
disk · ~10GB VRAM
0.1
temperature
1024
max tokens
System prompt — expertise domains
Python · Linux CLI (bash/sh) · Ericsson Router 6000 CLI
Open5GS YAML configuration · PromQL · telecom automation
Rule: no preamble · no filler text · output only the code or command
Inline comments only where strictly necessary for clarity
What it can generate
🐍
Python automation scripts
Prometheus metric fetchers · SSH wrappers · log parsers · Grafana API callers · data exporters
LIVE
🔌
Ericsson Router 6000 CLI
VLAN configuration · context management (SUP_NR / SUP_LTE / SUP_OM) · interface setup · routing
LIVE
⚙️
Open5GS configuration
YAML config editing · AMF/MME/UPF setup · subscriber provisioning · PLMN configuration
LIVE
📊
PromQL queries for Prometheus
Metric queries · alerting rules · recording rules · rate/increase/histogram · threshold expressions
LIVE
What triggers this model
phi3:mini routes to CODE when query contains: write · generate · create · script · command · config · bash · python
Also triggered by: "give me a one-liner" · "cron job" · "parse with regex" · "PromQL for..."
Output flows to gemma4:e4b for final synthesis and formatting before display
👁️
llava:13b — Vision & Image Analyst
tools/vision_agent.py · triggered by VISION label · base64 image input
At a glance
13B
params · multimodal
~12GB
VRAM required
0.2
temperature
512
max tokens
How it works — step by step
01
phi3:mini routes the query as VISION
Triggered by: "analyze this image" · "screenshot" · "diagram" · "front panel" · "look at this"
02
CLI prompts: "Image path:"
User enters path to a local image file (.png · .jpg · .jpeg) on the AI server or mounted drive
03
Image read → base64 encoded → Ollama API
File read in binary · base64 encoded · sent alongside the question in the multimodal chat request
04
Technical analysis streamed back
Focused on: numbers · labels · status indicators · error messages · anomalies · actionable details
What it can read and analyze
Grafana dashboards
Reads metric values · spots anomalies · interprets trends · identifies threshold crossings in panels
Ericsson Element Manager
Alarm lists · status panels · subscriber counts · BB 6630 LED indicators · port status screens
Network topology diagrams
Connection maps · VLAN drawings · RAN/core diagrams · describes paths and identifies gaps
Prometheus graph exports
Exported graph screenshots · rate-of-change trends · anomaly detection in historical data
System prompt focus
Role: technical image analyst · always focus on technical details and exact numbers
Priority: labels · status indicators · error messages · diagrams · actionable information
Rule: be concise and specific — no generic descriptions or vague observations
📋
mistral:7b-instruct — Log Parser
tools/log_agent.py · mistral:7b-instruct-v0.3-q8_0 · triggered by LOGS label
At a glance
7B
params · Q8_0 quant
~4GB
disk · ~6GB VRAM
0
temperature (deterministic)
1024
max tokens
Strict JSON output schema
Always returns valid JSON · parseable by json.loads() with zero post-processing
{
  "summary": "one sentence — what happened overall",
  "errors": ["error message 1", "error message 2", ...],
  "warnings": ["warning 1", "warning 2", ...],
  "events": ["notable event 1", "event 2", ...]
}
System prompt enforces: no markdown · no prose · no code fences · raw JSON only
Auto-strips accidental ``` fences if the model adds them · graceful error JSON on failure
Input sources accepted
📄
Raw log text — paste directly into CLI
Paste journalctl output · syslog lines · Open5GS log lines directly at the CLI prompt
📁
File path — agent reads the file automatically
Enter a path like /var/log/open5gs/amf.log — agent reads and parses the full file
What it parses best
AMF / MME logs
Registration failures · NAS rejections · S1AP errors · UE attach/detach events · NGAP issues
UPF logs
Packet routing errors · PFCP session failures · data plane drops · GTP tunnel problems
PTP4L / chrony logs
Sync loss events · clock offset violations · master clock changes · GPS lock/unlock events
systemd journal
Any journalctl output · service crashes · kernel messages · OOM kills · segfaults
Terminal display
Renders a colored box: ┌── Log Analysis ──────────────────────────────────────────┐
Shows: summary · up to 5 errors (✗) · up to 5 warnings (⚠) · up to 5 events (·)
JSON result also passed to gemma4:e4b for final natural-language synthesis
📚
nomic-embed-text — Knowledge Base
knowledge/search.py · knowledge/ingest.py · ChromaDB at knowledge/chroma_db/
At a glance
270
MB model size
~0.5
GB VRAM
Top 3
chunks returned
100%
offline · no API
Ingest & search pipeline
01
Ingest: ingest <path>
Reads any file or directory · splits documents into overlapping text chunks for better recall
02
Embed: each chunk → float vector via Ollama
POST /api/embed with nomic-embed-text · returns a 768-dimensional float vector per chunk
03
Store: ChromaDB at knowledge/chroma_db/
Persistent local vector database · survives restarts · stored on disk in the project folder
04
Search: query → embed → cosine similarity → top 3
Query embedded with same model · ChromaDB finds closest vectors · top 3 chunks returned with relevance scores shown as ████░░ bars
05
Synthesize: top chunks → gemma4:e4b → answer
The 3 matching chunks are passed to gemma4:e4b, which answers citing the source document and chunk number
What you should index
Equipment manuals
Ericsson BB 6630 admin guide · Router 6000 CLI reference · Element Manager documentation
Open5GS configs
Your YAML configs for AMF/MME/UPF/SMF/NRF · subscriber exports · PLMN and slice settings
Lab runbooks
Step-by-step procedures · maintenance guides · troubleshooting checklists · incident reports
Router CLI exports
Router 6000 running configs · VLAN assignments · context configurations · route tables
CLI commands
ingest /path/to/docs/ # index a whole folder
ingest router_config.txt # index a single file
ingest /mnt/e/ericsson_manuals/ # index from a Windows drive
Then ask naturally: "What does our router config say about VLAN 2120?"
Semantic search finds meaning — no exact keyword match needed