# Quick Start
## 1. Install OpenSearch
## 2. Install Open WebUI
## 3. Install Vibe Analyzer
cargo install vibe-analyzer
## 4. Add a source — project or knowledge base
vibe-analyzer source add {path}
## 5. Index projects
vibe-analyzer scan index
## 6. Start the MCP server
vibe-analyzer serve start
Introduction
What is Vibe Analyzer
Vibe Analyzer is an Agentic RAG engine for codebases and knowledge bases. It extracts structure from source code via AST parsing, enriches it with an LLM, and indexes everything into OpenSearch. AI assistants access this knowledge through 11 MCP tools.
The Problem with Traditional RAG
Traditional RAG works like this:
Query → Embeddings → Find similar documents → Load into prompt → Response
Problems:
- 📈 Found documents are added to the prompt in their entirety
- 💾 The larger the project, the more VRAM is required
- 🔍 Relevance drops as context volume grows
- 💸 Each query becomes more expensive
How Agentic RAG Works
Vibe Analyzer flips the paradigm:
Query → AI model selects an MCP tool → Tool returns a structured response
Advantages:
- 📉 Minimal context — the model receives only what the tool returns
- 🧠 No embeddings — keyword and AST search via OpenSearch
- 🔗 One tool call = complete answer, no document stuffing
- ♾️ Context size stays constant regardless of project size
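To make the flow concrete, here is a minimal sketch of the JSON-RPC 2.0 payload an MCP client sends for a single tool call. The tool name and arguments are illustrative, not Vibe Analyzer's actual schema:

```python
import json

def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    # MCP tool invocations are JSON-RPC 2.0 "tools/call" requests.
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)

# Hypothetical tool name and arguments for illustration only.
body = build_tool_call("search_by_code_functions", {"query": "parse config", "limit": 5})
print(body)
```

The model receives only the tool's structured response, not whole documents, which is why the context stays small.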
Key Features
- 🌳 AST parsing for 13 programming languages
- 💡 LLM enrichment: descriptions and search tags for each file
- 📄 Export AST and AST+LLM to JSON, JSON5, TOON, XML
- 📝 Semantic and morphological search across code and documentation
- ⚡ Incremental indexing (modified files only)
- 📦 Self-contained tools (one call — complete response)
- 🗂️ Multilingual support (RU, EN, ZH)
- 🦀 Built in Rust — fast and memory-efficient
Anti-Hallucination Protection
To prevent AI models from making up parameters and tool names:
- ✅ Soft parameter validation
- 🛡️ Input parameter normalization
- 📋 Optimized tool descriptions
- 🏷️ 150+ aliases for tool names
- 🌐 Automatic query language detection
- 🧪 Full-cycle end-to-end tests
- 📐 Tested on models starting at 3B parameters
Who This Is For
- Development teams — index your entire codebase, and AI assistants can answer questions about architecture, find functions, and explain module connections
- Developers under NDA — the entire stack runs locally: OpenSearch, Ollama, MCP server. No data is sent to external APIs. Index proprietary code without risk of violating agreements
- Private projects — models starting at 3B parameters run on your own hardware. No one sees your code or your queries
- Technical writers — store documentation in Markdown files and search it in any language
- Open-source projects — give contributors a quick way to understand the code
- Startups — lower the entry barrier for new developers without cloud API costs
What’s Next
- Quick Start — installation, setup, and first run
- Architecture — how everything works under the hood
- MCP Tools — complete reference for all 11 tools
Quick Start
Prerequisites
Vibe Analyzer requires two external services:
- OpenSearch — storage and search for indexed data
- Ollama — running LLMs for code enrichment with descriptions and tags
Both services can be run locally via Docker.
Installation
Install via Cargo (recommended)
The package is available on crates.io:
cargo install vibe-analyzer
Build from Source
# 1. Clone the repository
git clone https://gitcode.com/keygenqt_vz/vibe-analyzer.git
# 2. Enter the directory
cd vibe-analyzer
# 3. Build
cargo build --release
Build Dependencies
- Rust toolchain (cargo, rustc)
- libssl-dev (for TLS)
Starting Services
The repository includes two ready-to-use docker-compose files:
- docker/opensearch/docker-compose.yml — OpenSearch for indexing and search
- docker/open-webui/docker-compose.yml — Open WebUI with Ollama for AI assistant connection
Docker (recommended)
OpenSearch:
cd docker/opensearch
docker-compose up -d
Open WebUI (optional, for AI assistant connection):
cd docker/open-webui
docker-compose up -d
Verification
# OpenSearch should respond
curl http://localhost:9200
# Ollama should be accessible
curl http://localhost:11434/api/tags
Configuration
The configuration file is located at ~/.vibe-analyzer/config.json5. It is created automatically with default settings the first time any CLI command is run.
Example Working Configuration
{
  // Configuration version (do not modify)
  "version": "0.0.1",
  // OpenSearch connection
  "opensearch": {
    "host": "http://192.168.1.10:9200"
  },
  // MCP server
  //
  // host — bind address (0.0.0.0 for all interfaces, 127.0.0.1 local only)
  // port — server port (default: 9020)
  // protocol — MCP protocol version (2024-11-05, 2025-03-26, 2025-06-18, or 'latest')
  "mcp": {
    "host": "0.0.0.0",
    "port": 9020,
    "protocol": "latest"
  },
  // Ollama LLM servers
  //
  // host — API endpoint
  // model — model name
  // max_chunk_chars — maximum characters per request
  // max_chunk_files — maximum files per request
  // timeout_secs — request timeout in seconds
  // temperature — generation temperature (0.0 – 1.0)
  // seed — seed for reproducibility
  // num_ctx — context window size
  // num_predict — maximum tokens in response
  // ast_imports — include imports in analysis
  // ast_variables — include variables in analysis
  // ast_functions — include functions in analysis
  // ast_enums — include enums in analysis
  // ast_interfaces — include interfaces in analysis
  "ollama": [
    {
      "host": "http://192.168.1.10:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "max_chunk_chars": 4000,
      "max_chunk_files": 3,
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    },
    {
      "host": "http://localhost:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "max_chunk_chars": 4000,
      "max_chunk_files": 3,
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    }
  ],
  // Knowledge sources for indexing
  "sources": ["/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer"]
}
Checking Configuration
# View current settings
cat ~/.vibe-analyzer/config.json5
Adding a Knowledge Source
A source is anything you want to index: a code project, a documentation folder, or both.
# Add a project
vibe-analyzer source add /path/to/your/project
# Add a documentation directory
vibe-analyzer source add /path/to/docs
# List all added sources
vibe-analyzer source list
Scanning and Indexing
Vibe Analyzer provides three commands for different tasks:
Export AST to File
Code structure extraction only, without LLM. The result is saved in JSON/JSON5/TOON/XML:
# All sources
vibe-analyzer scan ast
# A specific source
vibe-analyzer scan ast --target /path/to/your/project
# With format specified
vibe-analyzer scan ast --target /path/to/your/project --format json5
Export AST with LLM Enrichment to File
AST parsing + enrichment via Ollama (descriptions, tags). The result is saved to a file:
vibe-analyzer scan analyze --target /path/to/your/project
Note: enrichment requires a running Ollama with the selected model. If multiple Ollama hosts are configured, files are distributed among them via competing consumers.
Indexing to OpenSearch
Full cycle — AST parsing, LLM enrichment, and writing to OpenSearch for search via MCP tools:
vibe-analyzer scan index --target /path/to/your/project
After indexing, data is ready for search through the MCP server.
Starting the MCP Server
vibe-analyzer serve start
The server starts on the address and port specified in the configuration (http://0.0.0.0:9020 with the example config above).
Verifying Results
# Project statistics
vibe-analyzer stats info --target /path/to/your/project
# File tree
vibe-analyzer stats tree --target /path/to/your/project
# List all indexed projects
vibe-analyzer stats info
Connecting an AI Assistant
Open WebUI
- Make sure Open WebUI is running (see the “Starting Services” section)
- In Open WebUI settings, add an MCP server:
  - URL: http://<host>:9020 (as specified in the configuration)
  - Transport: Streamable HTTP
- Once connected, the AI model will have 11 tools for searching code and documentation
Incremental Updates
Vibe Analyzer uses BLAKE3 hashes to track changes. When running scan index again, only modified files are processed:
# Reindexing — only changed files are affected
vibe-analyzer scan index --target /path/to/your/project
To force a full reindex, use the --force flag:
vibe-analyzer scan index --target /path/to/your/project --force
The same can be done via the admin_sync MCP tool without restarting the server.
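The incremental logic boils down to a hash comparison. The sketch below is illustrative Python using blake2b from the standard library as a stand-in for BLAKE3 (which needs a third-party package):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    # Stand-in for BLAKE3: hash the file's bytes to detect changes.
    return hashlib.blake2b(path.read_bytes()).hexdigest()

def plan_incremental(on_disk: dict, indexed: dict):
    """Compare current file hashes with those stored in the index.

    on_disk / indexed: path -> hash. Returns which files to (re)index,
    which to delete from the index, and which to skip.
    """
    to_index = [p for p, h in on_disk.items() if indexed.get(p) != h]
    to_delete = [p for p in indexed if p not in on_disk]
    unchanged = [p for p in on_disk if indexed.get(p) == on_disk[p]]
    return to_index, to_delete, unchanged
```

With --force, the comparison is simply skipped and everything lands in the to-index set.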
Troubleshooting
OpenSearch Unreachable
# Check Docker container status
docker ps | grep opensearch
Ollama Not Responding
# Check if Ollama is running
curl http://localhost:11434/api/tags
Model Not Installed
# Download the model
ollama pull qwen2.5-coder:3b-instruct
What’s Next
- Architecture — detailed breakdown of Vibe Analyzer’s internals
- CLI Reference — complete command reference
- Integrations — connecting AI assistants
Architecture
Overview
Vibe Analyzer consists of four main components that work sequentially:
Code sources
│
▼
┌─────────────┐
│ Scanner │ AST parsing, structure extraction
└─────────────┘
│
▼
┌─────────────┐
│ Analyzer │ LLM enrichment, descriptions, tags
└─────────────┘
│
▼
┌─────────────┐
│ Indexer │ Writing to OpenSearch
└─────────────┘
│
▼
┌─────────────┐
│ MCP Server │ HTTP API for AI assistants
└─────────────┘
Components
Scanner
The Scanner handles initial source code processing:
- File system traversal — recursive directory scanning respecting .gitignore and default exclusion patterns (.git, target, node_modules, etc.)
- Language detection — selects the appropriate tree-sitter parser based on file extension
- AST parsing — extracts code structure: functions, classes, imports, variables, enums, interfaces, structs
- Metadata collection — line count, file size, BLAKE3 content hash
- License detection — searches for a LICENSE file and identifies the license type via askalono and SPDX
- README detection — priority: root > subdirectories, .md > .txt > no extension
- Statistics collection — aggregation by language, file count, lines of code
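The traversal step can be sketched as follows. This is illustrative Python (the real scanner is Rust), and it omits .gitignore handling, keeping only the default exclusion lists from this documentation:

```python
import os

# Default exclusions mirrored from the docs.
IGNORED_DIRS = {".git", "target", "node_modules", "__pycache__", ".venv", "venv", ".idea"}
IGNORED_FILES = {".DS_Store", "Thumbs.db"}

def walk_source(root: str):
    """Yield file paths under root, pruning ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place stops os.walk from descending.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            if name not in IGNORED_FILES:
                yield os.path.join(dirpath, name)
```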
Analyzer
The Analyzer enriches scanner results using LLM:
- Prompt generation — builds a request for each file containing the AST structure
- Request distribution — with multiple Ollama hosts configured, files are pushed to a shared channel and workers compete for them (competing consumers). The fastest worker takes the next file, maximizing host utilization
- Batch processing — files are grouped into batches limited by max_chunk_chars and max_chunk_files
- Controlled generation — configurable temperature, seed, num_ctx, num_predict parameters for reproducible results
- Enrichment — LLM adds a description and multilingual search tags to each file
- Project summarization — a separate request generates a brief description of the entire source
Indexer
The Indexer manages writing data to OpenSearch:
- Three indices per source:
  - vibe_meta — project metadata (summary, license, statistics, README)
  - vibe_files_{hash} — full file contents
  - vibe_files_analysis_{hash} — AST, enriched descriptions, and search tags
- Bulk operations — batch writing for maximum performance
- Incremental updates — BLAKE3 hash comparison, only changed files are re-processed
- Cleanup — removes stale data no longer present in the source
MCP Server
The MCP server provides an API for AI assistants:
- Protocol — Model Context Protocol (MCP) via Streamable HTTP transport
- 11 tools — admin, get, search, and show categories
- Anti-Hallucination Protection — parameter normalization, tool name aliases, auto language detection
- Logging — middleware for tracking all requests
- CORS — cross-origin request support for web interfaces
Indexing Lifecycle
Full Indexing
1. source add → save path to config
2. scan index → check OpenSearch → cleanup orphaned data → AST parsing → LLM enrichment → indexing
Incremental Updates
1. scan index → load hashes from OpenSearch → compare with files on disk
2. New/modified → AST parsing → LLM enrichment → indexing
3. Deleted → removal from OpenSearch
4. Unchanged → skip
Export without Indexing
1. scan ast → traverse files → AST parsing → export to file
2. scan analyze → traverse files → AST parsing → LLM enrichment → export to file
scan ast and scan analyze do not touch OpenSearch — file export only.
OpenSearch Indices
vibe_meta
One document per project: summary, license, README, aggregated statistics (files, lines, size).
vibe_files_{hash}
One document per file: full contents. The content field is not indexed for search — only stored for retrieval via get_file_content.
vibe_files_analysis_{hash}
One document per text file. Contains AST (functions, classes, imports, etc.), file metadata, and multilingual search tags. The description and tags fields are added after LLM enrichment.
Ollama Clustering
When multiple Ollama hosts are configured, Vibe Analyzer distributes files via competing consumers:
- All workers read from a single shared channel
- The fastest worker takes the next file
- This maximizes utilization of all hosts
- If any host fails, all workers stop
- At the end, per-host statistics are reported: how many files each host processed
This approach allows:
- Faster enrichment through parallel processing on multiple GPUs/servers
- Scaling by adding more hosts to the configuration
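The competing-consumers pattern can be sketched with a shared queue. This is an illustrative Python model, not the actual Rust implementation:

```python
import queue
import threading

def enrich_with_workers(files, hosts, enrich):
    """One worker per host; all pull from a single shared queue,
    so faster hosts naturally take more files."""
    jobs = queue.Queue()
    for f in files:
        jobs.put(f)

    stats = {h: 0 for h in hosts}   # per-host counts for the final report
    lock = threading.Lock()

    def worker(host):
        while True:
            try:
                f = jobs.get_nowait()
            except queue.Empty:
                return              # channel drained, worker exits
            enrich(host, f)         # send the file's AST to this host's LLM
            with lock:
                stats[host] += 1

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

Adding a host to the configuration just adds another worker on the same queue, which is why scaling out is a config change rather than a code change.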
Anti-Hallucination Protection
Protection against AI model hallucinations when calling tools:
| Mechanism | Description |
|---|---|
| Name aliases | 150+ alternative tool names (e.g., search_code_functions → search_by_code_functions) |
| Parameter normalization | Wildcard replacement, whitespace trimming, type casting |
| Bounds validation | limit always in 1–10 range, level capped |
| Auto language detection | Detects Cyrillic, Latin, and CJK in search queries |
| Soft error handling | Invalid parameters don’t cause errors, they are normalized to safe values |
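In principle, soft validation and alias resolution look like the sketch below. This is illustrative Python; the alias entry and bounds mirror the table above, not the real code:

```python
# One illustrative alias; the real server ships 150+ of them.
ALIASES = {"search_code_functions": "search_by_code_functions"}

def resolve_tool(name: str) -> str:
    """Map a hallucinated or shorthand tool name to the canonical one."""
    return ALIASES.get(name.strip(), name.strip())

def normalize_limit(raw, lo: int = 1, hi: int = 10) -> int:
    """Soft validation: coerce to int and clamp instead of erroring."""
    try:
        value = int(str(raw).strip())
    except (TypeError, ValueError):
        return lo
    return max(lo, min(hi, value))
```

The point is that a model passing limit=25 or a slightly wrong tool name still gets a useful answer instead of a protocol error.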
Performance
- Rust — native execution without GC overhead
- Parallel parsing — each file processed independently
- Bulk OpenSearch writes — thousands of documents per operation
- Streaming processing — files are processed as they are discovered, without waiting for the entire directory
- Incremental updates — only changed files are re-indexed when updating a source
Supported Languages
Vibe Analyzer supports AST parsing and LLM enrichment for 13 programming languages and file formats.
Full List
| Language | Extensions | AST | Enrichment | Parser |
|---|---|---|---|---|
| Rust | .rs | ✅ | ✅ | RustParser |
| Python | .py | ✅ | ✅ | PythonParser |
| JavaScript | .js | ✅ | ✅ | JavaScriptParser |
| TypeScript | .ts | ✅ | ✅ | TypeScriptParser |
| Java | .java | ✅ | ✅ | JavaParser |
| Go | .go | ✅ | ✅ | GoParser |
| C# | .cs | ✅ | ✅ | CSharpParser |
| Kotlin | .kt | ✅ | ✅ | KotlinParser |
| Swift | .swift | ✅ | ✅ | SwiftParser |
| Dart | .dart | ✅ | ✅ | DartParser |
| Bash | .sh | ✅ | ✅ | BashParser |
| Batch | .bat | ✅ | ✅ | BatchParser |
| ArkTS | .ets | ✅ | ✅ | ArkTsParser |
| Markdown | .md | ✅ | ✅ | MarkdownParser |
Note: Markdown is a special case. Headings, links, code blocks, and frontmatter metadata are extracted instead of programmatic constructs. This allows Markdown files to be used as a knowledge base: documentation, guidelines, standards, project legends. They are searchable via search_documentation and search_knowledge.
Extracted Element Categories
For All Programming Languages
| Element | Description | Example (Rust) |
|---|---|---|
| header_comments | Module comment — file purpose | "Application configuration management" |
| functions | Functions and methods | fn add(a: i32, b: i32) -> i32 |
| classes | Classes | class User |
| structs | Structs and records | struct Config |
| enums | Enums | enum Color |
| interfaces | Interfaces, traits, protocols | trait Display |
| variables | Module-level variables and constants | const MAX_SIZE: usize |
| imports | Imports and dependencies | use std::fs |
Markdown Only
| Element | Description | Example |
|---|---|---|
| headings | Headings with level, text, and preview | { level: 1, title: "Vibe Analyzer", preview: "Universal knowledge base..." } |
| links | Links | { text: "documentation", url: "https://example.com/docs" } |
| code_blocks | Code block languages | ["bash", "rust"] |
| frontmatter | YAML metadata | { title: "...", tags: "...", author: "..." } |
Element Support by Language
| Language | Functions | Classes | Structs | Enums | Interfaces | Variables | Imports |
|---|---|---|---|---|---|---|---|
| Rust | ✅ | — | ✅ | ✅ | ✅ | ✅ | ✅ |
| Python | ✅ | ✅ | — | — | — | ✅ | ✅ |
| JavaScript | ✅ | ✅ | — | — | — | ✅ | ✅ |
| TypeScript | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Java | ✅ | ✅ | — | ✅ | ✅ | ✅ | ✅ |
| Go | ✅ | — | ✅ | ✅ | — | ✅ | ✅ |
| C# | ✅ | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| Kotlin | ✅ | ✅ | — | ✅ | ✅ | ✅ | ✅ |
| Swift | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Dart | ✅ | ✅ | — | ✅ | — | ✅ | ✅ |
| Bash | ✅ | — | — | — | — | ✅ | ✅ |
| Batch | ✅ | — | — | — | — | ✅ | ✅ |
| ArkTS | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Multilingual Search Tags
Each extracted element receives tags in three languages. This allows searching for elements in Russian, English, or Chinese — regardless of the source code language.
| Element Type | EN | RU | ZH |
|---|---|---|---|
| Functions | functions | функции | 函数 |
| Classes | classes | классы | 类 |
| Structs | structs | структуры | 结构体 |
| Enums | enums | перечисления | 枚举 |
| Interfaces | interfaces | интерфейсы | 接口 |
| Variables | variables | переменные | 变量 |
| Imports | imports | импорты | 导入 |
| Headings (MD) | headings | заголовки | 标题 |
| Code blocks (MD) | code_blocks | блоки_кода | 代码块 |
| Links (MD) | links | ссылки | 链接 |
| Module comments | header_comments | — | — |
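Detecting which language a search query is written in can be approximated by checking Unicode ranges. A rough illustrative sketch:

```python
def detect_language(query: str) -> str:
    """Classify a query as ru/zh/en by the first non-Latin script found.

    Simplified: real detection would consider extended Cyrillic and
    additional CJK blocks.
    """
    for ch in query:
        cp = ord(ch)
        if 0x0400 <= cp <= 0x04FF:    # basic Cyrillic block
            return "ru"
        if 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs
            return "zh"
    return "en"
```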
Doc Comment Formats
Vibe Analyzer extracts documentation from specially formatted comments. Regular comments (//, #) are ignored.
| Language | Doc Comment | Module Comment | Example |
|---|---|---|---|
| Rust | /// or /** */ | //! or /*! */ | /// Adds two numbers |
| Python | """...""" (docstring) | """...""" at file start | """Adds two numbers""" |
| JavaScript | /** */ (JSDoc) | /** */ at file start | /** @param {number} a */ |
| TypeScript | /** */ (JSDoc) | /** */ at file start | /** @param a First number */ |
| Java | /** */ (Javadoc) | /** */ at file start | /** @param a First number */ |
| Kotlin | /** */ (KDoc) | /** */ at file start | /** @param a First number */ |
| C# | /// or /** */ | /** */ or /// at file start | /// <summary>Adds two numbers</summary> |
| Swift | /// or /** */ | /// or /** */ at file start | /// - Parameters: a: First number |
| Dart | /// | /// at file start | /// Adds two numbers |
| Go | // (any before declaration) | // at file start | // Add adds two numbers |
| Bash | ## or # before function | ## at script start | ## Module documentation for Bash testing |
| Batch | :: before label | :: at script start | :: Module documentation for Batch testing |
| ArkTS | /** */ | /** */ at file start | /** Async function example */ |
AST Example
Source code (sample.py):
"""
Module documentation for Python testing
"""
import os
import sys
from datetime import datetime
from typing import List, Optional
# Regular comment - ignored
def add(a: int, b: int) -> int:
"""Adds two numbers"""
return a + b
def multiply(a: int, b: int) -> int:
"""Multiplies two numbers"""
return a * b
async def fetch_data(url: str) -> str:
"""Async function example"""
return "data"
class User:
"""User class"""
def __init__(self, name: str, age: int):
"""Constructor"""
self.name = name
self.age = age
def get_name(self) -> str:
"""Get user name"""
return self.name
class Config:
"""Config class"""
debug: bool = False
max_size: int = 1024
class Color:
"""Color enum (using class constants)"""
RED = 1
GREEN = 2
BLUE = 3
MAX_SIZE: int = 1024
DEFAULT_TIMEOUT: int = 30
APP_NAME: str = "vibe-analyzer"
# Regular comment at the end - ignored
Extracted AST:
{
  "functions": [
    {
      "signature": "def add(a: int, b: int) -> int",
      "comments": ["Adds two numbers"]
    },
    {
      "signature": "def multiply(a: int, b: int) -> int",
      "comments": ["Multiplies two numbers"]
    },
    {
      "signature": "async def fetch_data(url: str) -> str",
      "comments": ["Async function example"]
    }
  ],
  "classes": [
    {
      "signature": "class User",
      "comments": ["User class"]
    },
    {
      "signature": "class Config",
      "comments": ["Config class"]
    },
    {
      "signature": "class Color",
      "comments": ["Color enum (using class constants)"]
    }
  ],
  "variables": [
    { "signature": "MAX_SIZE: int" },
    { "signature": "DEFAULT_TIMEOUT: int" },
    { "signature": "APP_NAME: str" }
  ],
  "imports": ["os", "sys", "datetime", "typing"],
  "header_comments": ["Module documentation for Python testing"],
  "tags": [
    "header_comments",
    "imports", "импорты", "导入",
    "variables", "переменные", "变量",
    "functions", "функции", "函数",
    "classes", "классы", "类"
  ]
}
Limitations
- Maximum file size for AST parsing: 10 MB (MAX_AST_FILE_SIZE constant)
- Default ignored directories: target, node_modules, __pycache__, .venv, venv, .git, .idea
- Default ignored files: .DS_Store, Thumbs.db, *.hprof, *.log
- Binary files: detected by extension, name, and content analysis, excluded from parsing
- Nested elements: methods inside classes are extracted as functions; variables inside functions/classes are not extracted
Configuration
Vibe Analyzer uses a JSON5 configuration file. JSON5 is an extended version of JSON with support for comments, trailing commas, and other convenient features.
Location
The configuration file is located at:
~/.vibe-analyzer/config.json5
The file is created automatically with default settings the first time any CLI command is run.
Configuration Structure
The configuration consists of five sections: version, opensearch, mcp, ollama, and sources.
Full Example with Comments
{
  // Configuration version — do not modify manually
  "version": "0.0.1",
  // OpenSearch connection
  "opensearch": {
    // OpenSearch server URL
    "host": "http://192.168.1.10:9200"
  },
  // MCP server
  "mcp": {
    // Bind address:
    //   0.0.0.0 — accessible from all interfaces (for Docker, remote connections)
    //   127.0.0.1 — local only
    "host": "0.0.0.0",
    // Server port (default: 9020)
    "port": 9020,
    // MCP protocol version:
    //   '2024-11-05' — stable
    //   '2025-03-26' — improved streaming
    //   '2025-06-18' — latest
    //   'latest' — auto-detect
    "protocol": "latest"
  },
  // Ollama LLM servers — specify multiple for load distribution
  "ollama": [
    {
      // Ollama API endpoint
      "host": "http://192.168.1.10:11434",
      // Model for enrichment
      "model": "qwen2.5-coder:3b-instruct",
      // Maximum characters per LLM request
      // Files are grouped into batches until the total size exceeds this limit
      "max_chunk_chars": 4000,
      // Maximum files per LLM request
      "max_chunk_files": 3,
      // Request timeout in seconds
      "timeout_secs": 60,
      // Generation temperature (0.0 — deterministic, 1.0 — creative)
      "temperature": 0.1,
      // Seed for reproducible results (same seed → same output)
      "seed": 42,
      // Model context window size
      "num_ctx": 4096,
      // Maximum tokens in response
      "num_predict": 2048,
      // Which AST elements to include in the prompt
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    },
    {
      // Second host for load distribution
      "host": "http://localhost:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "max_chunk_chars": 4000,
      "max_chunk_files": 3,
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    }
  ],
  // Knowledge sources for indexing — absolute project paths
  "sources": ["/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer"]
}
Sections in Detail
version
"version": "0.0.1"
Configuration file version. Do not modify manually — updated automatically during config migration between versions.
opensearch
"opensearch": {
"host": "http://192.168.1.10:9200"
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| host | string | http://localhost:9200 | OpenSearch server URL. Can point to a local or remote server |
mcp
"mcp": {
"host": "0.0.0.0",
"port": 9020,
"protocol": "latest"
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| host | string | 127.0.0.1 | Server bind address. 0.0.0.0 — accessible externally (Docker, remote clients), 127.0.0.1 — local only |
| port | integer | 9020 | MCP server port |
| protocol | string | latest | MCP protocol version: 2024-11-05, 2025-03-26, 2025-06-18, or latest |
ollama
The ollama section is an array of Ollama server configurations. One or more hosts can be specified for load distribution.
"ollama": [
{
"host": "http://192.168.1.10:11434",
"model": "qwen2.5-coder:3b-instruct",
"max_chunk_chars": 4000,
"max_chunk_files": 3,
"timeout_secs": 60,
"temperature": 0.1,
"seed": 42,
"num_ctx": 4096,
"num_predict": 2048,
"ast_imports": false,
"ast_variables": false,
"ast_functions": true,
"ast_enums": true,
"ast_interfaces": true
}
]
Main Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| host | string | http://localhost:11434 | Ollama API endpoint |
| model | string | qwen2.5-coder:3b-instruct | Model name for enrichment. Must be pre-loaded via ollama pull |
Batch Processing Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_chunk_chars | integer | 4000 | Maximum characters per LLM request. Files are grouped into batches until the total size exceeds the limit |
| max_chunk_files | integer | 3 | Maximum files per request. Even if the character limit is not reached, no more than this number of files will be in a batch |
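The batching rule can be sketched as follows (illustrative Python; the real implementation is Rust):

```python
def batch_files(files, max_chunk_chars=4000, max_chunk_files=3):
    """Group (path, text) pairs into LLM request batches.

    A batch is closed when adding the next file would exceed the
    character limit, or when it already holds max_chunk_files files.
    """
    batches, current, chars = [], [], 0
    for path, text in files:
        if current and (chars + len(text) > max_chunk_chars
                        or len(current) >= max_chunk_files):
            batches.append(current)
            current, chars = [], 0
        current.append(path)
        chars += len(text)
    if current:
        batches.append(current)
    return batches
```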
Generation Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| timeout_secs | integer | 60 | Ollama request timeout in seconds |
| temperature | float | 0.1 | Generation temperature. 0.0 — maximally deterministic, 1.0 — maximally creative. Low values are recommended for code enrichment |
| seed | integer | 42 | Random generator seed. Same seed guarantees reproducible results |
| num_ctx | integer | 4096 | Model context window size in tokens |
| num_predict | integer | 2048 | Maximum tokens in response |
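These settings map onto the options object of Ollama's /api/generate endpoint. The sketch below only builds the request body; actually sending it requires a running Ollama server:

```python
import json

def ollama_request(prompt: str, cfg: dict) -> str:
    """Build the JSON body for POST /api/generate from config values."""
    body = {
        "model": cfg["model"],
        "prompt": prompt,
        "stream": False,  # one complete response instead of a token stream
        "options": {
            "temperature": cfg["temperature"],
            "seed": cfg["seed"],
            "num_ctx": cfg["num_ctx"],
            "num_predict": cfg["num_predict"],
        },
    }
    return json.dumps(body)

cfg = {"model": "qwen2.5-coder:3b-instruct", "temperature": 0.1,
       "seed": 42, "num_ctx": 4096, "num_predict": 2048}
print(ollama_request("Describe this file", cfg))
```

With temperature 0.1 and a fixed seed, repeated runs over unchanged files produce stable descriptions and tags.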
AST Element Filters
These flags determine which AST elements are included in the LLM prompt. Disabling unnecessary elements reduces request size and speeds up processing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| ast_imports | boolean | false | Include imports in the prompt |
| ast_variables | boolean | false | Include variables in the prompt |
| ast_functions | boolean | true | Include functions in the prompt |
| ast_enums | boolean | true | Include enums in the prompt |
| ast_interfaces | boolean | true | Include interfaces in the prompt |
sources
"sources": [
"/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer",
"/home/user/projects/my-backend",
"/home/user/docs/architecture"
]
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | array of strings | [] | List of absolute paths to knowledge sources. Each source can be a code project, a documentation folder, or both |
Managing sources is easier via CLI rather than editing the config manually:
# Add a source
vibe-analyzer source add /path/to/project
# Remove a source
vibe-analyzer source remove --target /path/to/project
# List all sources
vibe-analyzer source list
Ollama Clustering
When multiple hosts are specified in the ollama section, Vibe Analyzer distributes files via competing consumers:
- All workers read from a single shared channel
- The fastest worker takes the next file
- This maximizes utilization of all hosts
- If any host fails, all workers stop
- At the end, per-host statistics are reported
This approach allows:
- Faster enrichment through parallel processing on multiple GPUs/servers
- Scaling by adding more hosts to the configuration
Configuration Validation
Vibe Analyzer validates the configuration at startup and applies safe defaults if parameters are missing or invalid:
- max_chunk_chars → minimum 1000, maximum 100000
- limit in search queries → always in the 1–10 range
- Invalid paths → normalized to absolute form
- Missing sections → created with default values
Configuration Migration
When updating Vibe Analyzer, the configuration may be automatically migrated to a new format. The version field tracks the current format so that migrations can be applied when necessary.
Overriding the Configuration Directory
For testing or custom scenarios, the configuration directory can be overridden:
# Set a custom directory
vibe-analyzer --config-dir /custom/path source list
The default is ~/.vibe-analyzer/.
Dev Section
An optional section for debugging. Usually absent in production config — added only when needed:
{
"dev": {
"log_level": "trace",
"spdx_data_path": "tests/mcp/fixtures/config/spdx"
}
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| log_level | string | default | Log level: default, trace, debug, info, warn, error. default means info |
| spdx_data_path | string | ~/.vibe-analyzer/spdx | Path to SPDX data for license detection. Downloaded automatically on first run |
CLI Reference
Vibe Analyzer provides a command-line interface for managing knowledge sources, scanning, exporting, indexing, and running the MCP server.
General Syntax
vibe-analyzer [global options] <command> [subcommand] [options]
Global Options
| Option | Description |
|---|---|
| --config-dir <path> | Use a custom config directory instead of ~/.vibe-analyzer/ |
| --help | Show help |
| --version | Show version |
Commands
source — Source Management
Add, remove, and list knowledge sources.
vibe-analyzer source <subcommand>
| Subcommand | Description |
|---|---|
| add <path> | Adds a new directory or file to the sources list. The path is automatically converted to absolute |
| remove --target <path> | Removes a source from the configuration. Accepts a full path or a unique directory name |
| list | Shows all added sources with absolute paths |
Examples:
# Add a project
vibe-analyzer source add /home/user/projects/my-app
# Add a documentation directory
vibe-analyzer source add /home/user/docs
# Remove by full path
vibe-analyzer source remove --target /home/user/projects/my-app
# Remove by directory name (if unique)
vibe-analyzer source remove --target my-app
# List all sources
vibe-analyzer source list
Example source list output:
Configured sources:
- /Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer
- /home/user/projects/my-backend
scan — Scanning and Indexing
Extract code structure via AST parsing, optional LLM enrichment, and OpenSearch indexing.
vibe-analyzer scan <subcommand>
| Subcommand | Description |
|---|---|
| ast | AST parsing only. Fast code structure extraction without LLM. Results can be exported to a file |
| analyze | Full cycle: AST parsing → LLM enrichment. Does not index to OpenSearch automatically. Results can be exported to a file |
| index | OpenSearch indexing. Runs scan analyze with incremental updates and writes results to indices |
scan ast
vibe-analyzer scan ast [options]
| Option | Description |
|---|---|
| --target <path> | Process a specific source. If not specified — all sources are processed |
| --format <format> | Export format: json (default), json5, toon, xml |
| -o, --output <path> | Export path. If not specified — file is created in ~/Downloads/ |
Examples:
# AST for all sources
vibe-analyzer scan ast
# AST for a specific project
vibe-analyzer scan ast --target my-app
# AST with JSON5 export
vibe-analyzer scan ast --target my-app --format json5
# AST export to a specific file
vibe-analyzer scan ast --target my-app --format json --output /path/to/output.json
scan analyze
vibe-analyzer scan analyze [options]
| Option | Description |
|---|---|
| --target <path> | Process a specific source. If not specified — all sources are processed |
| --format <format> | Export format: json (default), json5, toon, xml |
| -o, --output <path> | Export path. If not specified — file is created in ~/Downloads/ |
Before enrichment, it checks:
- That Ollama hosts are configured
- That all Ollama servers are reachable (healthcheck)
- Model warm-up on all servers
If any server is unavailable — the command fails with an error.
Examples:
# Full cycle for all sources
vibe-analyzer scan analyze
# Full cycle for a specific project
vibe-analyzer scan analyze --target my-app
# With export
vibe-analyzer scan analyze --target my-app --format json5
scan index
vibe-analyzer scan index [options]
| Option | Description |
|---|---|
| --target <path> | Index a specific source. If not specified — all sources |
| --force | Force full reindexing. Ignores hashes and processes all files again |
What scan index does:
- Checks OpenSearch availability
- Cleans up orphaned data (OpenSearch documents no longer in the source)
- If not `--force` — loads hashes of already indexed files for incremental update
- Runs `scan_enriches` (AST + LLM), skipping files with unchanged hashes
- Indexes project metadata (meta)
- Indexes file contents (files)
- Indexes file analysis (files_analysis)
- Prints a report
Example scan index output:
Indexing completed. Sources: 1, Files: 287, Analysis: 242 (took 45.3s)
Or, if all files are already indexed:
Index is up to date — all files are already indexed and database is in sync
Examples:
# Incremental indexing for all sources
vibe-analyzer scan index
# Index a specific project
vibe-analyzer scan index --target my-app
# Force full reindexing
vibe-analyzer scan index --target my-app --force
stats — Statistics
View information and statistics for indexed projects.
vibe-analyzer stats <subcommand>
stats info
vibe-analyzer stats info [options]
| Option | Description |
|---|---|
| --target <path> | Show statistics for a specific project. If not specified — all projects |
Requires an active OpenSearch connection and indexed data.
Example output:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language Files Lines AST Objects Size
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Rust 194 15146 1631 498.26 KB
Markdown 33 2884 296 102.64 KB
Python 4 163 23 2.85 KB
TypeScript 1 125 14 1.90 KB
Java 1 115 14 1.71 KB
Swift 1 99 13 1.35 KB
Kotlin 1 98 14 1.38 KB
C# 1 97 13 1.58 KB
ArkTs 1 83 9 1.11 KB
JavaScript 1 73 12 1.07 KB
Dart 1 68 13 1.09 KB
Go 1 63 12 895 B
Bash 1 55 9 882 B
Batch 1 48 9 748 B
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Other 45 5776 137.72 KB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total 287 24893 2082 755.11 KB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stats generated in 0.5s
Columns:
| Column | Description |
|---|---|
| Language | Programming language. Other — files without AST (images, configs, binaries) |
| Files | Number of files for this language |
| Lines | Total line count |
| AST Objects | Number of extracted AST elements (functions, classes, structs, enums, interfaces, variables, imports, Markdown headings) |
| Size | Total file size |
stats tree
vibe-analyzer stats tree --target <project> [options]
| Option | Description |
|---|---|
| --target <path> | Project to display the tree for (required) |
| -L, --level <number> | Maximum tree depth (default: 3) |
Requires an active OpenSearch connection and indexed data. The tree is built from indexed paths, not by traversing the filesystem.
Example output:
vibe-analyzer
|-- Cargo.lock
|-- Cargo.toml
|-- LICENSE
|-- README.crates.md
|-- README.md
|-- book
| |-- book.toml
| `-- src
| |-- SUMMARY.md
| `-- index.html
|-- data
| |-- other
| | `-- logo.png
| |-- preview
| | `-- preview-webui.png
| `-- prompts
| `-- comment-rules.md
|-- docker
| |-- open-webui
| | `-- docker-compose.yml
| `-- opensearch
| `-- docker-compose.yml
|-- src
| |-- analyzer
| | |-- analyzer.rs
| | |-- mod.rs
| | |-- ollama_client.rs
| | `-- ollama_cluster.rs
| |-- cli
| | |-- mod.rs
| | |-- scan.rs
| | |-- serve.rs
| | |-- source.rs
| | `-- stats.rs
| |-- configs
| | `-- ...
| |-- main.rs
| `-- utils
| `-- ...
`-- tests
|-- mcp
| `-- ...
`-- parsers
`-- ...
36 directories, 74 files
The output ends with a summary line showing the directory count, file count, and build time:
Found 36 directories, 74 files in 0.5s
serve — MCP Server
Start, stop, and check MCP server status.
vibe-analyzer serve <subcommand>
serve start
vibe-analyzer serve start [options]
| Option | Description |
|---|---|
| --host <address> | Bind address. Overrides the config value |
| --port <port> | Server port. Overrides the config value |
| --workdir <path> | Working directory (default: current directory) |
| --protocol <version> | MCP protocol version. Overrides the config value |
The server runs in foreground mode. Use a system service manager or terminal multiplexer to run in the background.
All parameters are optional — if not specified, values from config.json5 are used.
Examples:
# Start with config settings
vibe-analyzer serve start
# Start on a specific port
vibe-analyzer serve start --port 9020
# Start on localhost only
vibe-analyzer serve start --host 127.0.0.1 --port 8080
serve stop and serve status
vibe-analyzer serve stop
vibe-analyzer serve status
Note: the `stop` and `status` commands are reserved but not yet implemented.
Export Formats
When using the --format option with scan ast and scan analyze, four formats are available:
| Format | Key | Extension | Description |
|---|---|---|---|
| JSON | json | .json | Compact JSON without extra whitespace — minimal file size |
| JSON5 | json5 | .json5 | JSON5 with comments and trailing commas — human-readable |
| TOON | toon | .toon | TOON format — token-efficient output, optimized for LLMs |
| XML | xml | .xml | XML with pretty-print formatting |
If the export path is not specified via --output, the file is saved to ~/Downloads/ with an auto-generated name.
Exit Codes
| Code | Description |
|---|---|
| 0 | Successful execution |
| 1 | Error (invalid parameters, service unavailable, parsing error) |
MCP Tools
Vibe Analyzer provides 11 MCP tools that AI models can call to search code and documentation. Each tool returns a structured response — no document stuffing into the context.
General Concept
In traditional RAG, a search engine finds documents and adds them to the prompt. Vibe Analyzer works differently:
AI model → selects a tool → calls MCP → receives a structured response
Rules for the AI model (embedded in ServerInfo.instructions):
- Use tools, respond concisely
- One call is enough — no need to call multiple tools in sequence
- Only the listed tools
Tool Categories
| Category | Tools | Purpose |
|---|---|---|
| Admin | admin_sync | Reindex all projects |
| Get | get_file_content, get_file_ast | Retrieve file contents and AST |
| Show | show_projects, show_stats, show_tree | Project info: list, statistics, file tree |
| Search — Code | search_by_code_imports, search_by_code_functions, search_by_code_classes, search_by_code_variables | Code search: imports, functions, classes, variables |
| Search — Docs | search_documentation, search_knowledge | Markdown documentation and knowledge base search |
Admin
admin_sync
Triggers reindexing of all projects in the background.
When to call: the user says “update”, “sync”, “reindex”, “refresh”.
Parameters: none.
Response:
{
"result": "Started",
"message": "Indexing started. Projects are updating now."
}
Or, if indexing is already running:
{
"result": "AlreadyRunning",
"message": "Indexing is already running. Please wait."
}
Get
get_file_content
Returns the full contents of a file.
When to call: the user asks to see file contents, open a file.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| path | string | yes | File path. Supports partial matching and wildcards. Can be relative or absolute |
Response:
{
"root": "/path/to/project",
"path": "src/main.rs",
"language": "Rust",
"content": "fn main() {\n println!(\"Hello\");\n}\n"
}
get_file_ast
Returns the full AST of a file: imports, functions, classes, structs, enums, headings.
When to call: the user asks about file structure, functions in a file, AST.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| path | string | yes | File path. Can be relative or absolute |
Response:
{
"root": "/path/to/project",
"path": "src/main.rs",
"language": "Rust",
"ast": {
"header_comments": ["Vibe Analyzer - Main entry point."],
"imports": ["clap::Parser", "crate::cli::scan::ScanAction"],
"functions": [{ "signature": "async fn main()", "comments": [] }],
"structs": [{ "signature": "struct App", "comments": [] }],
"enums": [{ "signature": "enum Commands", "comments": [] }],
"tags": [
"functions",
"функции",
"函数",
"structs",
"структуры",
"结构体",
"enums",
"перечисления",
"枚举",
"imports",
"импорты",
"导入",
"header_comments"
]
}
}
Show
show_projects
Shows all indexed projects with names and brief descriptions.
When to call: the user asks “what projects are available”, “list projects”.
Parameters: none.
Response:
{
"projects": [
{
"path": "/path/to/project",
"name": "vibe-analyzer",
"summary": "Agentic RAG engine for code and knowledge bases"
}
],
"total": 1
}
show_stats
Shows project statistics: language breakdown, file count, lines of code, AST objects.
When to call: the user asks about statistics, file count, codebase size.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| target | string | no | Project name. If not specified — statistics for all projects |
Response:
{
"target": null,
"languages": [
{
"language": "Rust",
"files": 194,
"lines": 15146,
"ast_objects": 1631,
"size_human": "498.26 KB"
},
{
"language": "Markdown",
"files": 33,
"lines": 2884,
"ast_objects": 296,
"size_human": "102.64 KB"
}
],
"total": { "files": 287, "lines": 24893, "ast_objects": 2082, "size_human": "755.11 KB" }
}
show_tree
Shows the file and directory tree of a project.
When to call: the user asks about project structure, file tree, folders.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| target | string | no | all projects | Project name |
| level | integer | no | 3 | Maximum depth (1–10) |
Response:
{
"target": "vibe-analyzer",
"tree": "vibe-analyzer\n|-- Cargo.toml\n|-- src\n| |-- main.rs\n| |-- cli\n| | |-- mod.rs\n| | `-- scan.rs\n| `-- utils\n| `-- ...\n`-- tests\n `-- ...",
"total_files": 74,
"total_dirs": 36
}
Search — Code
search_by_code_imports
Finds imports and dependencies in code.
When to call: the user asks about imports, dependencies, libraries used. For “all imports”, use an empty query or *.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | — | Search query |
| target | string | no | all projects | Project name |
| limit | integer | no | 3 | Maximum results |
Response:
{
"query": "serde",
"results": [
{
"project": "/path/to/project",
"path": "src/main.rs",
"language": "Rust",
"header_comments": ["Vibe Analyzer - Main entry point."],
"imports": ["serde::Deserialize", "serde::Serialize"]
}
]
}
search_by_code_functions
Finds functions and methods in code.
When to call: the user asks about functions, methods, procedures.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | — | Search query |
| target | string | no | all projects | Project name |
| limit | integer | no | 3 | Maximum results |
Response:
{
"query": "scan_source",
"results": [
{
"project": "/path/to/project",
"path": "src/scanner/scanner.rs",
"language": "Rust",
"header_comments": ["Core scanning functionality for codebase analysis."],
"functions": [
{
"signature": "pub async fn scan_source(...)",
"comments": ["Scans a source and returns complete analysis results"]
}
]
}
]
}
search_by_code_classes
Finds classes, structs, interfaces, and traits.
When to call: the user asks about classes, structs, interfaces, types, traits, abstract classes, implements, extends.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | — | Search query |
| target | string | no | all projects | Project name |
| limit | integer | no | 3 | Maximum results |
Response:
{
"query": "AppConfig",
"results": [
{
"project": "/path/to/project",
"path": "src/configs/app.rs",
"language": "Rust",
"header_comments": ["Application configuration management for vibe-analyzer."],
"classes": [],
"structs": [
{
"signature": "pub struct AppConfig",
"comments": ["Main application configuration structure"]
}
],
"interfaces": []
}
]
}
search_by_code_variables
Finds variables, constants, and enums.
When to call: the user asks about variables, constants, enums, global variables, static fields.
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | — | Search query |
| target | string | no | all projects | Project name |
| limit | integer | no | 3 | Maximum results |
Response:
{
"query": "MAX_SIZE",
"results": [
{
"project": "/path/to/project",
"path": "src/utils/constants.rs",
"language": "Rust",
"header_comments": ["Application constants and configuration defaults."],
"variables": [
{
"signature": "pub const MAX_AST_FILE_SIZE: u64",
"comments": ["Maximum file size for AST parsing (10 MB)"]
}
],
"enums": []
}
]
}
Search — Docs
search_documentation
Searches all Markdown documentation files. This is the default tool for non-code questions.
When to call: “who is”, “what is”, “how does”, “rules”, “processes”, “guides”, “legends” questions.
Search priority: Markdown files with knowledge: true in the frontmatter receive a significant boost (5.0) and appear first. This separates the knowledge base (legends, guidelines) from regular documentation. Example frontmatter:
---
knowledge: true
---
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | — | Search query. Supports Cyrillic, Latin, CJK |
| limit | integer | no | 3 | Maximum results |
Response:
{
"query": "architecture",
"results": [
{
"project": "/path/to/project",
"path": "docs/architecture.md",
"frontmatter": { "title": "Vibe Analyzer Architecture" },
"headings": [
{ "level": 1, "title": "Architecture", "preview": "Overview of Vibe Analyzer's design" }
],
"links": [{ "text": "Quick Start", "url": "./getting-started.md" }],
"code_blocks": ["bash", "rust"]
}
]
}
search_knowledge
Alias for search_documentation. Completely identical in parameters and response.
When to call: the user asks about the knowledge base, guidelines, standards, characters.
Anti-Hallucination Protection
Tool Name Aliases (160+)
Models often distort tool names. AliasHandler intercepts the call and replaces the name with the correct one:
ALIAS_HANDLER: Resolving 'search_functions' -> 'search_by_code_functions'
Parameter Normalization
| Mechanism | Example |
|---|---|
| Wildcard replacement | * and ? in query → space |
| Whitespace trimming | " search query " → "search query" |
| `query: "*"` handling | Returns None (all elements) |
| `limit` capping | Always in 1–10 range. Values ≤ 1 → default (3) |
| Fuzzy path matching | Partial match and wildcards for path in get_file_content |
| `target` normalization | Search by exact path or unique directory name |
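The two normalization rules from the table can be sketched as small helpers. A minimal, illustrative sketch — `normalize_query` and `normalize_limit` are hypothetical names, not the actual internals:

```rust
/// Replace wildcard characters with spaces and trim; an empty result means
/// "match all". Hypothetical helper — the real logic lives in the MCP layer.
fn normalize_query(raw: &str) -> Option<String> {
    let cleaned = raw.replace(|c| c == '*' || c == '?', " ");
    let trimmed = cleaned.trim();
    if trimmed.is_empty() { None } else { Some(trimmed.to_string()) }
}

/// Cap `limit` into the 1–10 range; values of 1 or less fall back to the default (3).
fn normalize_limit(raw: i64) -> usize {
    if raw <= 1 { 3 } else { raw.min(10) as usize }
}

fn main() {
    assert_eq!(normalize_query("  search query  ").as_deref(), Some("search query"));
    assert_eq!(normalize_query("*"), None); // wildcard-only query → "all elements"
    assert_eq!(normalize_limit(50), 10);    // capped to the maximum
    assert_eq!(normalize_limit(0), 3);      // too small → default
    println!("normalization ok");
}
```

The point of this shape is that malformed input is coerced into a valid value rather than rejected, which matches the soft error handling described below.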
Auto Language Detection
When searching documentation, the system detects scripts in the query:
- Cyrillic → search using Russian tags
- Latin → search using English tags
- CJK → search using Chinese tags
Mixed queries search across all detected scripts simultaneously.
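Script detection can be sketched with plain Unicode range checks. This is illustrative only — the real detector may use different ranges or cover more blocks:

```rust
/// Which scripts appear in a query; used to pick analyzer sub-fields.
#[derive(Debug)]
struct Scripts { cyrillic: bool, latin: bool, cjk: bool }

/// Scan each character and flag the scripts it belongs to (simplified ranges).
fn detect_scripts(query: &str) -> Scripts {
    let mut s = Scripts { cyrillic: false, latin: false, cjk: false };
    for c in query.chars() {
        match c {
            '\u{0400}'..='\u{04FF}' => s.cyrillic = true, // basic Cyrillic block
            'a'..='z' | 'A'..='Z' => s.latin = true,
            '\u{4E00}'..='\u{9FFF}' => s.cjk = true,      // CJK Unified Ideographs
            _ => {}
        }
    }
    s
}

fn main() {
    // A mixed query flags all three scripts, so all sub-fields are searched
    let s = detect_scripts("functions функции 函数");
    assert!(s.cyrillic && s.latin && s.cjk);
    println!("{:?}", s);
}
```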
Soft Error Handling
Invalid parameters don’t cause errors; they are normalized to safe values:
- Invalid `target` → search across all projects
- `limit` > 10 → capped to 10
- Non-existent `path` → returns an empty result, not an error
AST Parsing
Vibe Analyzer uses tree-sitter — an incremental parser that builds a concrete syntax tree (CST) from source code. The CST is then traversed to extract meaningful elements: functions, classes, imports, variables, documentation, and multilingual search tags.
How It Works
Source code → tree-sitter → CST → recursive traversal → AstData (structured data + tags)
Process for each file:
- Parser selection — the appropriate `LanguageParser` is chosen from the static `PARSERS` registry based on file extension
- CST parsing — tree-sitter builds the tree
- Recursive traversal — the `visit_node` function traverses all nodes and collects meaningful elements
- Post-processing — deduplication, sorting, trimming function/class bodies
- Tag generation — `with_tags()` adds multilingual tags (EN/RU/ZH)
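The traversal step can be sketched over a toy node type. Real parsing uses tree-sitter's `Node`, but the recursive shape is the same; the node kinds and collected fields here are simplified assumptions:

```rust
/// Toy CST node standing in for a tree-sitter node.
struct Node {
    kind: &'static str,
    text: String,
    children: Vec<Node>,
}

/// Recursively walk the tree, collecting the node kinds the analyzer cares about.
fn visit_node(node: &Node, functions: &mut Vec<String>, imports: &mut Vec<String>) {
    match node.kind {
        "function_item" => functions.push(node.text.clone()),
        "use_declaration" => imports.push(node.text.clone()),
        _ => {} // other kinds are ignored in this sketch
    }
    for child in &node.children {
        visit_node(child, functions, imports);
    }
}

fn main() {
    let tree = Node {
        kind: "source_file",
        text: String::new(),
        children: vec![
            Node { kind: "use_declaration", text: "use clap::Parser;".into(), children: vec![] },
            Node { kind: "function_item", text: "fn main()".into(), children: vec![] },
        ],
    };
    let (mut functions, mut imports) = (Vec::new(), Vec::new());
    visit_node(&tree, &mut functions, &mut imports);
    assert_eq!(functions, vec!["fn main()"]);
    assert_eq!(imports, vec!["use clap::Parser;"]);
}
```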
Parser Registry
| Extensions | Parser |
|---|---|
| rs | RustParser |
| py | PythonParser |
| js, jsx | JavaScriptParser |
| ts, tsx | TypeScriptParser |
| java | JavaParser |
| go | GoParser |
| cs | CSharpParser |
| kt, kts | KotlinParser |
| swift | SwiftParser |
| dart | DartParser |
| sh, bash, zsh | BashParser |
| bat, cmd | BatchParser |
| ets, arkts | ArkTsParser |
| md, markdown | MarkdownParser |
Each parser implements the LanguageParser trait:
pub trait LanguageParser: Send + Sync {
fn parse(&self, content: &str) -> Result<AstData>;
fn language_name(&self) -> &'static str;
}
AstData Structure
pub struct AstData {
pub header_comments: Vec<String>, // Module comments
pub imports: Vec<String>, // Imports
pub variables: Vec<AstDataVariable>, // Variables/constants
pub functions: Vec<AstDataFunction>, // Functions/methods
pub classes: Vec<AstDataClass>, // Classes
pub structs: Vec<AstDataStruct>, // Structs
pub enums: Vec<AstDataEnum>, // Enums
pub interfaces: Vec<AstDataInterface>, // Interfaces/traits
pub frontmatter: Option<HashMap<String, String>>, // Frontmatter (Markdown)
pub headings: Vec<AstHeading>, // Headings (Markdown)
pub links: Vec<AstLink>, // Links (Markdown)
pub code_blocks: Vec<String>, // Code block languages (Markdown)
pub tags: Vec<String>, // Multilingual tags
}
Each element (function, class, etc.) has a signature and doc comments:
pub struct AstDataFunction {
pub signature: String, // "pub async fn scan_source(source_path: &Path, ...)"
pub comments: Vec<String>, // ["Scans a source and returns complete analysis results"]
}
Comment Extraction
Vibe Analyzer distinguishes three types of comments:
1. Header Comments
Describe the purpose of the entire file. Stored in header_comments.
Detection rules:
- The comment is at the beginning of the file (first node in the CST)
- Or all preceding sibling nodes are also comments (`is_module_comment`)
- Not inside a function or class
Syntax by language:
| Language | Syntax | Parser |
|---|---|---|
| Rust | //! or /*! */ | visit_node → line_comment/block_comment starts with //! |
| Python | """...""" at the beginning | visit_node → expression_statement → string (first node) |
| JS/TS/ArkTS | /** */ at the beginning | visit_node → comment starts with /**, first sibling |
| Java/Kotlin | /** */ at the beginning | visit_node → block_comment starts with /**, first sibling |
| C# | /** */ or /// at the beginning | visit_node → comment starts with /** or /// |
| Swift | /// at the beginning | visit_node → comment starts with ///, first or after another comment |
| Dart | /// at the beginning | visit_node → comment starts with ///, is_module_comment |
| Go | // at the beginning | visit_node → comment starts with //, first sibling |
| Bash | ## | visit_node → comment starts with ## |
| Batch | :: | visit_node → comment starts with ::, not inside a label |
2. Doc Comments
Describe a specific element (function, class, struct). Stored in the comments field of the corresponding object.
Extraction algorithm:
- Take the target node (function, class, etc.)
- Walk backwards through sibling nodes
- Collect all doc comments, skipping attributes/annotations
- Stop at the first non-doc node
- Reverse the list (from farthest to closest)
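The backward walk can be sketched over a flat list of sibling nodes. These are toy types — the real implementation walks tree-sitter's `prev_sibling` and the kinds skipped are language-specific:

```rust
/// A sibling node in a toy CST: just a kind and its source text.
struct Sib { kind: &'static str, text: &'static str }

/// Walk backwards from the target node, collecting doc comments, skipping
/// attributes, stopping at the first non-doc node, then reversing the list.
fn extract_doc_comments(siblings: &[Sib], target: usize) -> Vec<String> {
    let mut comments = Vec::new();
    for sib in siblings[..target].iter().rev() {
        match sib.kind {
            "line_comment" if sib.text.starts_with("///") => {
                comments.push(sib.text.trim_start_matches("///").trim().to_string())
            }
            "attribute_item" => continue, // skip #[derive(...)] and friends
            _ => break,                   // first non-doc node ends the walk
        }
    }
    comments.reverse(); // restore top-to-bottom reading order
    comments
}

fn main() {
    let sibs = [
        Sib { kind: "line_comment", text: "/// Scans a source" },
        Sib { kind: "line_comment", text: "/// and returns results" },
        Sib { kind: "attribute_item", text: "#[tracing::instrument]" },
        Sib { kind: "function_item", text: "pub async fn scan_source()" },
    ];
    let docs = extract_doc_comments(&sibs, 3);
    assert_eq!(docs, vec!["Scans a source", "and returns results"]);
}
```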
Syntax by language:
| Language | Syntax | Extraction function |
|---|---|---|
| Rust | ///, /** */, /*! */ | extract_rust_doc_comments — walks prev_sibling, skips attribute_item |
| Python | """...""" (docstring) | extract_python_docstring — finds first string in block → expression_statement |
| JavaScript | /** */ (JSDoc) | extract_js_doc_comments — walks prev_sibling |
| TypeScript | /** */ (JSDoc) | extract_ts_doc_comments — walks prev_sibling |
| Java | /** */ (Javadoc) | extract_java_doc_comments — walks prev_sibling, only block_comment |
| Kotlin | /** */ (KDoc) | extract_kotlin_doc_comments — finds /** */ in prefix before node |
| C# | /// or /** */ | extract_csharp_doc_comments — walks prev_sibling, skips attribute_list |
| Swift | /// or /** */ | extract_swift_doc_comments — finds /** */ in prefix before node |
| Dart | /// | extract_dart_doc_comments — walks prev_sibling |
| Go | // | extract_go_doc_comments — walks lines in prefix before node |
| Bash | # before function | extract_bash_doc_comments — walks prev_sibling, functions only |
| Batch | :: before label | extract_batch_doc_comments — walks prev_sibling |
| ArkTS | /** */ | extract_arkts_doc_comments — walks prev_sibling, skips export_declaration |
3. Regular Comments
Everything else — //, #, REM, ; — is ignored by the parser.
Signature Extraction
Signatures of functions, classes, and other elements are trimmed at the first delimiter ({, =, ;, :) — the body is not stored.
Language-specific details:
| Language | Detail |
|---|---|
| Python | Trailing : is trimmed from the signature |
| JS/TS | Arrow functions — signature is extracted from the variable declaration with => |
| TS | export prefix is added for exported elements |
| Go | Methods with receivers (func (s *Service) Method()) are detected and extracted as functions |
| Kotlin | Trimmed at { or = for expression bodies |
| Batch | Functions are :label, variables are trimmed from set |
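The general trimming rule can be sketched as a cut at the first delimiter. This is a simplification: the `:` delimiter is handled per-language (e.g. Python's trailing colon) and is omitted here so it doesn't cut type annotations:

```rust
/// Trim a declaration at the first body/assignment delimiter so only the
/// signature is stored. Illustrative sketch; real trimming is per-language.
fn trim_signature(decl: &str) -> String {
    let cut = decl
        .find(|c: char| matches!(c, '{' | '=' | ';'))
        .unwrap_or(decl.len());
    decl[..cut].trim_end().to_string()
}

fn main() {
    // Rust: body is dropped at '{'
    assert_eq!(
        trim_signature("pub fn add(a: i32, b: i32) -> i32 { a + b }"),
        "pub fn add(a: i32, b: i32) -> i32"
    );
    // Kotlin-style expression body: dropped at '='
    assert_eq!(trim_signature("fun area(r: Double) = 3.14 * r * r"), "fun area(r: Double)");
    println!("ok");
}
```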
Language-Specific Implementation Details
- Python — does not use `child_by_field_name("body")` due to a tree-sitter bug; manual traversal is used instead
- Swift — classes, structs, and enums come in a single `class_declaration` node, distinguished by text
- TypeScript/ArkTS — doc comments for exported elements are looked up on the parent `export_statement`
- Java — methods inside classes are not extracted as separate top-level functions
- Go — methods with receivers are detected via `is_method` and are not duplicated
- Kotlin — `sealed class` is skipped
- Batch — variables inside labels are not extracted
- Markdown — frontmatter is parsed manually, heading previews are cleaned of formatting
AST Export
Results can be exported in several formats:
# AST only
vibe-analyzer scan ast
# With export
vibe-analyzer scan ast --target my-app --format json5 --output analysis.json5
Supported formats: JSON, JSON5, TOON, XML.
LLM Enrichment
After AST parsing, Vibe Analyzer can enrich results via Ollama: adding a description and search tags to each file, and a brief summary to the project based on the README.
How It Works
AST data → batching → Ollama request → description + tags for each file
1. Project Summarization
The README (if present) is sent to Ollama with the prompt “write 2-3 sentences about the project”. The result is saved in summary.
2. File Enrichment
Batching. Files are grouped into batches based on two config limits:
- `max_chunk_chars` — maximum characters per request (default 4000)
- `max_chunk_files` — maximum files per request (default 3)
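The batching rule can be sketched as greedy grouping under both limits. `FileChunk` and `batch_files` are hypothetical names for illustration; only the two documented limits are real:

```rust
/// One file's AST payload destined for the LLM prompt.
struct FileChunk { path: String, ast_text: String }

/// Group files into batches, flushing when either limit would be exceeded
/// (defaults from the docs: max_chunk_chars = 4000, max_chunk_files = 3).
fn batch_files(files: Vec<FileChunk>, max_chars: usize, max_files: usize) -> Vec<Vec<FileChunk>> {
    let mut batches: Vec<Vec<FileChunk>> = Vec::new();
    let mut current: Vec<FileChunk> = Vec::new();
    let mut chars = 0;
    for file in files {
        let len = file.ast_text.len();
        if !current.is_empty() && (current.len() >= max_files || chars + len > max_chars) {
            batches.push(std::mem::take(&mut current));
            chars = 0;
        }
        chars += len;
        current.push(file);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

fn main() {
    let files: Vec<FileChunk> = (0..5)
        .map(|i| FileChunk { path: format!("src/f{i}.rs"), ast_text: "x".repeat(1500) })
        .collect();
    let batches = batch_files(files, 4000, 3);
    // 1500 chars each: two fit under 4000, a third would exceed it → 2, 2, 1
    assert_eq!(batches.iter().map(|b| b.len()).collect::<Vec<_>>(), vec![2, 2, 1]);
}
```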
The prompt does not contain the files themselves, but their AST: functions, classes, structs, and other elements. Which elements to include is controlled by the flags ast_imports, ast_variables, ast_functions, ast_enums, ast_interfaces.
Prompt. Ollama receives a JSON template:
{
"files": [
{
"path": "src/main.rs",
"description": "FILL_DESCRIPTION",
"tags": ["TAG1", "TAG2", "TAG3"]
}
]
}
The model must fill in description and tags while preserving the structure. The prompt strictly requires: don’t skip files, don’t change paths, copy the JSON as-is.
Response. Ollama returns the completed JSON:
{
"files": [
{
"path": "src/main.rs",
"description": "Main entry point for the CLI application",
"tags": ["entry-point", "cli", "argument-parsing"]
}
]
}
3. Parallel Processing
If multiple Ollama hosts are configured, files are distributed among them:
- All hosts read from a single channel
- The fastest one processes the most
- If any host errors — all stop (`error_flag`)
- At the end, per-host statistics are reported: how many files each processed
4. JSON Repair
LLMs often corrupt JSON: add comments, wrap in markdown blocks, drop quotes. clean_llm_json fixes this:
- Extracts JSON from ``` blocks
- Adds missing key quotes
- Removes trailing commas
- Balances unclosed braces
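A minimal sketch of such cleanup, covering fence extraction, trailing-comma removal, and brace balancing. The real `clean_llm_json` handles more cases (e.g. missing key quotes):

```rust
/// Repair common LLM JSON damage: unwrap a markdown fence, drop trailing
/// commas before } or ], and append missing closing braces. Illustrative only.
fn clean_llm_json(raw: &str) -> String {
    // 1. Extract the body of a ``` fence if the model wrapped its answer in one
    let mut s = raw.trim();
    if let Some(start) = s.find("```") {
        let after = &s[start + 3..];
        let body_start = after.find('\n').map(|i| i + 1).unwrap_or(0);
        let body = &after[body_start..];
        s = body.split("```").next().unwrap_or(body).trim();
    }
    // 2. Remove trailing commas before } or ]
    let mut out = String::with_capacity(s.len());
    let chars: Vec<char> = s.chars().collect();
    for (i, &c) in chars.iter().enumerate() {
        if c == ',' {
            if let Some(&next) = chars[i + 1..].iter().find(|ch| !ch.is_whitespace()) {
                if next == '}' || next == ']' {
                    continue; // skip the trailing comma
                }
            }
        }
        out.push(c);
    }
    // 3. Balance unclosed braces
    let opens = out.matches('{').count();
    let closes = out.matches('}').count();
    out.extend(std::iter::repeat('}').take(opens.saturating_sub(closes)));
    out
}

fn main() {
    let raw = "```json\n{\"files\": [{\"path\": \"src/main.rs\",}]\n```";
    assert_eq!(clean_llm_json(raw), "{\"files\": [{\"path\": \"src/main.rs\"}]}");
}
```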
5. Retries
If Ollama returns fewer files than were in the batch — up to 5 retries with delay. If after 5 attempts it still doesn’t match — an error with recommendations to reduce max_chunk_chars or switch models.
Generation Parameters
The following are passed from config to the Ollama request:
- `temperature: 0.1` — low temperature for stable results
- `seed: 42` — fixed seed for reproducibility
- `num_ctx: 4096` — context window size
- `num_predict: 2048` — maximum tokens in response
- `timeout_secs: 60` — request timeout
Model Warm-Up
Before enrichment begins, for each Ollama host:
- Availability check (`GET /`)
- Model presence check (`GET /api/tags`)
- Empty request to load the model into memory (`POST /api/generate` with empty prompt)
Enrichment Result
After processing, each file receives description and tags from the LLM, and the project receives a summary based on the README.
Exporting Results
AST parsing and LLM enrichment results can be exported to a file for analysis, debugging, or use in other tools:
# AST with LLM enrichment
vibe-analyzer scan analyze --target my-app
# With format and path specified
vibe-analyzer scan ast --target my-app --format json5 --output analysis.json5
Search and Indexing
Vibe Analyzer stores all data in OpenSearch and uses multilingual analyzers for search.
Three Indices
Three indices are created for each project:
| Index | Purpose | Contents |
|---|---|---|
| vibe_meta | Metadata | 1 document per project: summary, license, README, statistics |
| vibe_files_{hash} | Content | One document per file: full contents (not indexed for search, store only) |
| vibe_files_analysis_{hash} | Search | One document per text file: AST, description, tags |
Multilingual Search
OpenSearch is configured with three analyzers:
- `russian_analyzer` (type `russian`) — stemming for Russian
- `english_analyzer` (type `english`) — stemming for English
- `chinese_analyzer` (type `chinese`) — segmentation for Chinese
Each text field in vibe_files_analysis has three sub-fields — one per analyzer. This allows searching for “функции”, “functions”, and “函数” with correct morphology for each language.
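Conceptually, such a multi-analyzer field mapping might look like the fragment below. This is an illustrative sketch, not the actual index template — the field and sub-field names are assumptions, and while OpenSearch ships built-in `russian` and `english` analyzers, Chinese segmentation typically requires an analysis plugin:

```json
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "text",
        "fields": {
          "ru": { "type": "text", "analyzer": "russian" },
          "en": { "type": "text", "analyzer": "english" },
          "zh": { "type": "text", "analyzer": "smartcn" }
        }
      }
    }
  }
}
```

A query for “функции” would then target `tags.ru`, where Russian stemming matches inflected forms.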
Search Mechanics
Documentation Search (search_documentation)
The most complex query. Algorithm:
- Script detection in the query — Cyrillic, Latin, CJK
- Word extraction (longer than 2 characters)
- Wildcard search on headings with 10.0 boost + stemming for long words
- Language-specific match queries — for each detected script, a separate query to the corresponding sub-field with fuzziness
- Boost for knowledge documents — if the frontmatter contains `knowledge: true`, the document gets a 5.0 boost
Ranking priority:
- Headings (`headings.title`) — 10.0 boost
- Preview (`headings.preview`) — 2.0 boost
- Links (`links.text`) — 2.0 boost
- Tags (`tags`) — 1.0 boost
Code Search
Each search type has its own strategy:
- Imports — `match` on the `ast.imports` field + tags
- Functions — `match_phrase_prefix` on signatures + `match` on comments (nested queries)
- Classes/structs/interfaces — three nested queries in `should` with `minimum_should_match: 1`
- Variables/enums — `match` on signatures and comments (nested queries)
All code searches use `fuzziness: AUTO` for fuzzy matching and boost tags higher than specific fields.
Incremental Indexing
Vibe Analyzer doesn’t re-index files unnecessarily:
- Fetching hashes from OpenSearch via the Scroll API — `GET /{index}/_search?scroll=1m`
- Comparison — a BLAKE3 hash is computed for each file and compared against the indexed one
- Skipping unchanged — files with matching hashes are not processed
If the --force flag is passed, hashes are ignored — all files are indexed.
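The skip logic can be sketched as follows. The standard library's `DefaultHasher` stands in for BLAKE3 here purely so the sketch has no external dependencies; `files_to_index` is a hypothetical name:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (the real tool uses BLAKE3).
fn content_hash(content: &str) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

/// Decide which files need re-processing: compare each file's hash with what
/// the index already holds, or process everything under --force.
fn files_to_index<'a>(
    files: &'a [(String, String)],  // (path, content) pairs
    indexed: &HashMap<String, u64>, // hashes fetched via the Scroll API
    force: bool,
) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(path, content)| force || indexed.get(path) != Some(&content_hash(content)))
        .map(|(path, _)| path.as_str())
        .collect()
}

fn main() {
    let files = vec![
        ("src/main.rs".to_string(), "fn main() {}".to_string()),
        ("src/lib.rs".to_string(), "pub mod cli;".to_string()),
    ];
    let mut indexed = HashMap::new();
    indexed.insert("src/main.rs".to_string(), content_hash("fn main() {}"));
    // Only the file that was never indexed (or whose hash changed) is processed
    assert_eq!(files_to_index(&files, &indexed, false), vec!["src/lib.rs"]);
    // --force ignores hashes entirely
    assert_eq!(files_to_index(&files, &indexed, true).len(), 2);
}
```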
Bulk Indexing
All documents are written to OpenSearch in batches via the Bulk API in NDJSON format:
{"index": {"_index": "vibe_files_xxx", "_id": "src/main.rs"}}
{"root": "/project", "path": "src/main.rs", "content": "..."}
{"index": {"_index": "vibe_files_xxx", "_id": "src/lib.rs"}}
{"root": "/project", "path": "src/lib.rs", "content": "..."}
The document ID is the file path (path). This ensures that re-indexing updates the existing document rather than creating a duplicate.
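Building the bulk body can be sketched as plain NDJSON assembly. The strings are hand-rolled for illustration; real code would serialize with serde_json and escape the content properly:

```rust
/// Build an NDJSON bulk body: one action line plus one document line per file.
/// Using the path as _id makes re-indexing an upsert rather than a duplicate.
fn build_bulk_body(index: &str, root: &str, docs: &[(&str, &str)]) -> String {
    let mut body = String::new();
    for (path, content) in docs {
        // Action line: which index and which document ID
        body.push_str(&format!(
            "{{\"index\": {{\"_index\": \"{index}\", \"_id\": \"{path}\"}}}}\n"
        ));
        // Document line: the payload itself (unescaped — sketch only)
        body.push_str(&format!(
            "{{\"root\": \"{root}\", \"path\": \"{path}\", \"content\": \"{content}\"}}\n"
        ));
    }
    body
}

fn main() {
    let body = build_bulk_body(
        "vibe_files_xxx",
        "/project",
        &[("src/main.rs", "..."), ("src/lib.rs", "...")],
    );
    // Four NDJSON lines: action + document for each of the two files
    assert_eq!(body.lines().count(), 4);
    print!("{body}");
}
```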
Orphaned Data Cleanup
cleanup runs automatically during indexing:
- Index removal for deleted projects
- Document removal for files no longer on disk (comparing paths in the index and on the filesystem)
- Meta-document removal for projects removed from the configuration
Project Statistics
show_stats_search collects aggregated statistics across all indexed files via the Scroll API. This enables:
- Project reports — language breakdown, file count, lines, AST objects
- Data presence checks — if statistics are empty, indexing hasn’t been performed or the project hasn’t been added
- Codebase size estimation — total size, text and binary file counts
Aggregation runs across all documents from files_analysis:
- Language grouping (via `get_language_name`)
- AST object counting: sum of functions, classes, structs, enums, interfaces, variables, imports, headings, links, code blocks
- `Other` — files without a detectable language
- Languages sorted by lines of code descending
Integrations
Vibe Analyzer provides an MCP server that AI assistants can connect to via the Model Context Protocol. Once connected, the model gains 11 tools for searching code and documentation.
How It Works from the User’s Perspective
The user communicates with the AI assistant in natural language. The model decides which tool to call. Examples from real testing scenarios:
Code Search
| User Query | Tool | What Happens |
|---|---|---|
| “Find add functions in the samples project” | search_by_code_functions | Searches for functions with add in the signature, returns files and signatures |
| “What classes are in samples?” | search_by_code_classes | Returns all classes, structs, interfaces |
| “Show all enums in samples” | search_by_code_variables | Enums are also searched through this tool |
| “What libraries are used in samples?” | search_by_code_imports | List of all imports in the project |
| “List files that have the MAX_VALUE constant” | search_by_code_variables | Search by constant name |
File Viewing
| User Query | Tool |
|---|---|
| “Show the contents of src/main.rs” | get_file_content |
| “Show the structure of main.py” | get_file_ast |
| “What functions are in src/main.rs?” | get_file_ast |
| “Open utils.py” | get_file_content |
Documentation and Knowledge Base Search
| User Query | Tool |
|---|---|
| “Who is Zizikosh?” | search_documentation |
| “Tell me about Kukyrbur’s abilities” | search_documentation |
| “Find Python coding guidelines” | search_documentation |
| “Show the release process” | search_documentation |
| “Find the code review checklist” | search_documentation |
Project Navigation
| User Query | Tool |
|---|---|
| “What projects are in the database?” | show_projects |
| “Show the tree of the samples project” | show_tree |
| “How many files are in knowledge?” | show_stats |
| “Show overall statistics for all projects” | show_stats |
Administration
| User Query | Tool |
|---|---|
| “Update the index” | admin_sync |
| “Reindex projects” | admin_sync |
How to Phrase Queries
The model understands queries in natural language. You don’t need to use exact tool names — plain language is enough.
Good:
- “Find add functions in the samples project”
- “What classes are in samples?”
- “Show the contents of src/main.rs”
- “Who is Zizikosh?”
Unnecessary (the model will understand via AliasHandler anyway, but it’s better to avoid):
- “Call search_by_code_functions with query=add”
- “Use the get_file_content tool for path=src/main.rs”
Important Notes
- Project names — you can use the full path or directory name: `"samples"` or `"/path/to/samples"`
- File paths — relative to the project root: `"src/main.rs"`; partial matching is supported
- Result limit — default 3, maximum 10. If the model requests “all”, the limit is automatically raised
- One call is enough — the model is trained to respond after a single tool call, no need to ask again
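For illustration, a query like “Find add functions in the samples project” resolves to a single call of roughly this shape. This is a sketch: the `query` and `target` argument names follow the test report example in the Testing section, while the `limit` field is an assumption based on the result-limit note above.

```json
{
  "name": "search_by_code_functions",
  "arguments": {
    "query": "add",
    "target": "samples",
    "limit": 10
  }
}
```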
Connecting to Open WebUI
1. Start the MCP server: `vibe-analyzer serve start`
2. In Open WebUI settings, add a new MCP server:
   - URL: `http://localhost:9020`
   - Transport: Streamable HTTP
3. Tools appear automatically
Connecting to Claude Desktop
Add to the configuration:
{
"mcpServers": {
"vibe-analyzer": {
"url": "http://localhost:9020",
"transport": "streamable-http"
}
}
}
MCP Protocol
Supported versions: 2024-11-05, 2025-03-26, 2025-06-18, latest. Configured in the settings:
{
"mcp": {
"host": "127.0.0.1",
"port": 9020,
"protocol": "latest"
}
}
Security
- Server without authentication — intended for trusted networks or localhost
- Default host is `127.0.0.1` (local access only); set `0.0.0.0` for access from Docker containers or other machines
- The server only reads data; `admin_sync` is the only tool that triggers background indexing
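To expose the server beyond localhost (for example, to Docker containers), only the host in the `mcp` settings block needs to change. This sketch reuses the configuration shown in the MCP Protocol section; since there is no authentication, do this only on a trusted network:

```json
{
  "mcp": {
    "host": "0.0.0.0",
    "port": 9020,
    "protocol": "latest"
  }
}
```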
Testing
Vibe Analyzer uses two types of tests: unit tests for parsers and end-to-end tests for MCP tools.
Parser Unit Tests
Each of the 13 programming languages, plus Markdown, has a test that verifies AST parsing correctness using snapshot testing:
Source file → parser → AST → comparison with reference JSON
Example test (Rust):
#[test]
fn test_rust_parser() {
let code = fs::read_to_string("tests/parsers/fixtures/rust/sample.rs").unwrap();
let json = fs::read_to_string("tests/parsers/fixtures/rust/sample.json").unwrap();
let expected: serde_json::Value = serde_json::from_str(&json).unwrap();
let ast = parse_ast(&code, "rs").unwrap().unwrap();
let actual = serde_json::to_value(&ast).unwrap();
assert_eq!(actual, expected);
}
Fixture structure:
tests/parsers/fixtures/
├── rust/
│ ├── sample.rs ← source code
│ └── sample.json ← expected AST
├── python/
│ ├── sample.py
│ └── sample.json
├── markdown/
│ ├── sample.md
│ └── sample.json
└── ... (a pair of files per language)
All parser tests:
| Test | File | Language |
|---|---|---|
| test_rust_parser | rust_test.rs | Rust (3 tests: sample, sample2, sample3) |
| test_python_parser | python_test.rs | Python |
| test_javascript_parser | javascript_test.rs | JavaScript |
| test_typescript_parser | typescript_test.rs | TypeScript |
| test_java_parser | java_test.rs | Java |
| test_go_parser | go_test.rs | Go |
| test_csharp_parser | csharp_test.rs | C# |
| test_kotlin_parser | kotlin_test.rs | Kotlin |
| test_swift_parser | swift_test.rs | Swift |
| test_dart_parser | dart_test.rs | Dart |
| test_bash_parser | bash_test.rs | Bash |
| test_batch_parser | batch_test.rs | Batch |
| test_arkts_parser | test_arkts.rs | ArkTS |
| test_markdown_parser | markdown_test.rs | Markdown |
Run:
cargo test --test parsers_test
End-to-End MCP Tool Tests
E2E tests verify the full cycle: an AI model receives a query, selects a tool, calls it, and returns a response.
How It Works
Scenario (JSON) → Ollama model → MCP tool call → result verification
Two-turn dialog:
- Turn 1 (with tools): the model receives a query and must call exactly one tool
- Turn 2 (without tools): the model receives the tool result and must provide a final text response
If the model calls a second tool instead of responding — it’s an error.
Test Scenarios
Scenarios are stored in JSON files:
tests/mcp/fixtures/scenarios/
├── admin_sync.json
├── get_file_ast.json
├── get_file_content.json
├── search_by_code_classes.json
├── search_by_code_functions.json
├── search_by_code_imports.json
├── search_by_code_variables.json
├── search_documentation.json
├── show_projects.json
├── show_stats.json
└── show_tree.json
Example scenario (search_by_code_functions.json):
{
"tool": "search_by_code_functions",
"queries": [
"Find add functions in the 'samples' project",
"What methods are in 'samples'",
"Show all main functions in 'samples'",
"Find calculate functions in 'samples'",
"List files that have the multiply function"
]
}
Each scenario contains 5 queries in Russian and English — simple, one-sentence, without specifying the exact tool name.
Models for Testing
const MODELS: &[&str] = &[
"qwen2.5-coder:3b-instruct",
"qwen2.5-coder:7b-instruct",
"qwen2.5-coder:14b-instruct",
];
By default, tests run on qwen2.5-coder:3b-instruct — the smallest model that should work correctly.
Extracting JSON from Model Responses
The model may return a response in different formats. `extract_json` handles all variants:

| Response Format | Handling |
|---|---|
| `` ```json { ... } ``` `` | Extracted from the markdown block |
| `` ``` { ... } ``` `` | Extracted from the block without a language specifier |
| `{ ... }` | Used as-is |
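The fence-stripping step behind this table can be sketched in a few lines of Rust. This is an illustrative reimplementation, not the actual `extract_json` source; the stripped payload would then be handed to `serde_json::from_str`:

```rust
/// Strip an optional markdown code fence (with or without a `json` tag)
/// from a model response, leaving the raw JSON payload.
/// Illustrative sketch; the real extract_json may differ.
fn strip_fences(raw: &str) -> &str {
    let mut s = raw.trim();
    // Fenced block with a language tag: ```json { ... } ```
    if let Some(rest) = s.strip_prefix("```json") {
        s = rest;
    // Fenced block without a language tag: ``` { ... } ```
    } else if let Some(rest) = s.strip_prefix("```") {
        s = rest;
    }
    // Drop the closing fence, if any; bare JSON passes through untouched.
    if let Some(rest) = s.strip_suffix("```") {
        s = rest;
    }
    s.trim()
}
```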
Parsing Tool Calls
parse_tool_call looks for the tool name in several JSON fields (models name them differently):
let name = parsed
.get("name") // standard
.or_else(|| parsed.get("function")) // OpenAI-style
.or_else(|| parsed.get("tool")) // alternative
.or_else(|| parsed.get("method")) // another variant
.or_else(|| parsed.get("call")); // and another
Test Infrastructure
A custom framework was developed for E2E tests that automatically sets up the entire environment:
- OpenSearch — via Docker Compose with fixtures from `tests/mcp/fixtures/opensearch/docker-compose.yml`
- MCP server — started automatically on port 9021
- Fixtures — test projects `samples` and `knowledge` with legendary characters
- Ollama — must be running beforehand with the required model
The framework manages the entire lifecycle: starting services, indexing fixtures, running scenarios, saving reports, and stopping the environment on completion.
Reports
After each query, an intermediate report is saved; after each scenario, a final one:
{
"test_name": "search_by_code_functions",
"model": "qwen2.5-coder:3b-instruct",
"timestamp": "2026-04-28T12:00:00Z",
"queries": [
{
"query": "Find add functions in the 'samples' project",
"tool_calls": [
{
"name": "search_by_code_functions",
"args": "{\"query\":\"add\",\"target\":\"samples\"}",
"result": "[{...}]"
}
],
"response": "Found function add in file src/lib.rs...",
"duration_ms": 1234
}
],
"summary": {
"total_queries": 5,
"successful_tool_calls": 5,
"total_duration_ms": 6170,
"avg_response_time_ms": 1234
}
}
Running
# Parser unit tests only (fast)
cargo test --test parsers_test
# Full E2E tests (require Docker + Ollama)
cargo test --test mcp_test -- --ignored --nocapture
Logging
Tests write a structured log to `tests/reports/<timestamp>/mcp_test.log` and simultaneously output to the terminal. Output is filtered by level: INFO shows progress, DEBUG shows model responses, TRACE shows everything including raw docker and MCP server output.
Expected Model Behavior
The test verifies that the model:
- Called a tool on the first turn — if not, error `Model did not call a tool`
- Did not call a non-existent tool — if it did, error `TOOL_NOT_FOUND`
- The tool returned a non-null result — if null, error `tool returned null`
- Provided a text response on the second turn — if it called another tool, error `Model called second tool`
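Taken together, these checks reduce to a small verdict function. The sketch below is illustrative: the parameter names and harness wiring are assumptions, but the error strings mirror the documented messages.

```rust
/// Illustrative verdict logic for one E2E scenario turn pair.
/// Parameter names are assumptions; error strings mirror the docs.
fn verdict(
    first_turn_tool: Option<&str>,  // tool called on turn 1, if any
    known_tools: &[&str],           // tools exposed by the MCP server
    tool_result: Option<&str>,      // result returned by the tool
    second_turn_tool: Option<&str>, // tool called on turn 2, if any
) -> Result<(), String> {
    // Check 1: a tool must be called on the first turn.
    let tool = first_turn_tool.ok_or("Model did not call a tool")?;
    // Check 2: the tool must actually exist.
    if !known_tools.contains(&tool) {
        return Err("TOOL_NOT_FOUND".into());
    }
    // Check 3: the tool must return a non-null result.
    if tool_result.is_none() {
        return Err("tool returned null".into());
    }
    // Check 4: turn 2 must be a text response, not another tool call.
    if second_turn_tool.is_some() {
        return Err("Model called second tool".into());
    }
    Ok(())
}
```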
Recipes
Practical examples of using Vibe Analyzer through an AI assistant.
Exploring an Unfamiliar Project
Goal: you’re seeing a project for the first time and want to quickly understand its structure.
Assistant dialog:
You: What projects are in the database?
AI: [show_projects] Project "backend" available — payment processing service.
You: Show the structure of backend
AI: [show_tree] backend/
├── src/
│ ├── main.rs
│ ├── api/
│ │ ├── mod.rs
│ │ ├── payments.rs
│ │ └── refunds.rs
│ ├── db/
│ │ ├── mod.rs
│ │ └── models.rs
│ └── config.rs
├── migrations/
├── tests/
└── Cargo.toml
You: Show statistics for backend
AI: [show_stats]
Rust 45 files 8,200 lines 420 AST objects
SQL 12 files 1,500 lines —
Markdown 5 files 300 lines 25 AST objects
Total 62 files 10,000 lines 445 AST objects
You: What are the main functions in api/payments.rs?
AI: [get_file_ast for api/payments.rs]
- process_payment(amount, currency)
- validate_card(card_number)
- create_transaction(order_id)
- refund_payment(transaction_id)
Finding Dependencies
Goal: find out what libraries are used in a project.
You: What libraries are used in backend?
AI: [search_by_code_imports]
- serde (serialization)
- tokio (async runtime)
- sqlx (PostgreSQL)
- reqwest (HTTP client)
- stripe (payment API)
You: Where is stripe used?
AI: [search_by_code_imports with query="stripe"]
- src/api/payments.rs
- src/api/refunds.rs
- src/api/webhooks.rs
Finding an Implementation
Goal: find where a specific function is implemented.
You: Find the calculate_total function
AI: [search_by_code_functions with query="calculate_total"]
Found in src/orders.rs:
pub fn calculate_total(items: &[OrderItem]) -> f64
"Calculates total order amount with discounts applied"
Documentation Search
Goal: find rules or processes.
You: Find Python coding guidelines
AI: [search_documentation with query="Python coding guidelines"]
Found in docs/guidelines/python.md:
# Python Coding Guidelines
Follow PEP 8. Use snake_case, type annotations, docstrings.
You: What is the release process?
AI: [search_documentation with query="release process"]
Found in docs/process/release.md:
# Release Process
1. All milestone tasks closed
2. Create a release branch from develop
3. Update version in Cargo.toml
4. Merge into main
5. Notify the team
Knowledge Base Search
Goal: find information about project characters or legends.
You: Who is Kirikur?
AI: [search_documentation with query="Kirikur"]
Found in knowledge/legends/kirikur_programmer.md:
# Kirikur - Programmer
Backend developer, stack: Rust, Python, PostgreSQL.
Works at a food delivery startup.
Rewrote legacy code in Rust in 3 months.
Favorite phrase: "It works on my machine"
You: What is Bakobur's dream?
AI: [search_documentation with query="Bakobur's dream"]
Found in knowledge/legends/bakobur_homeless.md:
Dream: Warm socks and a doghouse for Sharik
Incremental Update
Goal: update the index after code changes.
You: Update the index
AI: [admin_sync]
Indexing started. Projects are updating.
# After some time, you can check:
You: Show backend statistics
AI: [show_stats]
Rust 47 files 8,500 lines 445 AST objects
...
Inspecting a Specific File
Goal: view the contents or structure of a specific file.
You: Show the contents of Cargo.toml
AI: [get_file_content]
[package]
name = "backend"
version = "0.1.0"
edition = "2021"
[dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio"] }
You: What structs are in db/models.rs?
AI: [get_file_ast]
- struct Order { id, user_id, status, total, created_at }
- struct OrderItem { id, order_id, product_id, quantity, price }
- enum OrderStatus { Pending, Confirmed, Shipped, Delivered, Cancelled }
Tips
- Start broad — `show_projects` → `show_tree` → `show_stats`
- Refine with search — `search_by_code_functions`, `search_documentation`
- Inspect details — `get_file_content`, `get_file_ast`
- Update the index after changes — `admin_sync`
- Use natural language — the model will choose the right tool automatically