
Vibe Analyzer
Agentic RAG engine for code and knowledge bases
AST extraction, LLM enrichment, OpenSearch indexing. 11 MCP tools give AI models agency over search — they decide what to find and how. Minimal context, no embeddings, single tool call = complete answer.

# Quick Start

## 1. Install OpenSearch

## 2. Install Open WebUI

## 3. Install Vibe Analyzer

cargo install vibe-analyzer

## 4. Add a source — project or knowledge

vibe-analyzer source add {path}

## 5. Index projects

vibe-analyzer scan index

## 6. Start MCP

vibe-analyzer serve start

Introduction

What is Vibe Analyzer

Vibe Analyzer is an Agentic RAG engine for codebases and knowledge bases. It extracts structure from source code via AST parsing, enriches it with an LLM, and indexes everything into OpenSearch. AI assistants access the indexed knowledge through 11 MCP tools.

The Problem with Traditional RAG

Traditional RAG works like this:

Query → Embeddings → Find similar documents → Load into prompt → Response

Problems:

  • 📈 Found documents are added to the prompt in their entirety
  • 💾 The larger the project, the more VRAM is required
  • 🔍 Relevance drops as context volume grows
  • 💸 Each query becomes more expensive

How Agentic RAG Works

Vibe Analyzer flips the paradigm:

Query → AI model selects an MCP tool → Tool returns a structured response

Advantages:

  • 📉 Minimal context — the model receives only what the tool returns
  • 🧠 No embeddings — keyword and AST search via OpenSearch
  • 🔗 One tool call = complete answer, no document stuffing
  • ♾️ Context size stays constant regardless of project size
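For concreteness, a tool call in this model is a single MCP `tools/call` request that returns one structured result. The sketch below builds such a request as a JSON-RPC payload; the tool name `search_code_functions` appears in this documentation, but the exact argument names are illustrative assumptions, not the server's verified schema.

```python
import json

# Hypothetical MCP tools/call request. The argument names ("query",
# "limit") are assumptions for illustration; consult the server's
# actual tool list for the real schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_code_functions",
        "arguments": {"query": "parse config", "limit": 5},
    },
}

# The model sends one such call and receives a structured result back;
# only that result enters the context window.
payload = json.dumps(request)
print(payload)
```

The key point is that the context cost of a query is the size of this one response, not the size of the indexed project.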

Key Features

  • 🌳 AST parsing for 13 programming languages
  • 💡 LLM enrichment: descriptions and search tags for each file
  • 📄 Export AST and AST+LLM to JSON, JSON5, TOON, XML
  • 📝 Semantic and morphological search across code and documentation
  • ⚡ Incremental indexing (modified files only)
  • 📦 Self-contained tools (one call — complete response)
  • 🗂️ Multilingual support (RU, EN, ZH)
  • 🦀 Built in Rust — fast and memory-efficient

Anti-Hallucination Protection

To prevent AI models from making up parameters and tool names:

  • ✅ Soft parameter validation
  • 🛡️ Input parameter normalization
  • 📋 Optimized tool descriptions
  • 🏷️ 150+ aliases for tool names
  • 🌐 Automatic query language detection
  • 🧪 Full-cycle end-to-end tests
  • 📐 Tested on models from 3B parameters

Who This Is For

  • Development teams — index your entire codebase, and AI assistants can answer questions about architecture, find functions, and explain module connections
  • Developers under NDA — the entire stack runs locally: OpenSearch, Ollama, MCP server. No data is ever sent to external APIs. Index proprietary code without risk of violating agreements
  • Private projects — models from 3B parameters run on your hardware. No one sees your code or your queries
  • Technical writers — store documentation in Markdown files and search it in any language
  • Open-source projects — give contributors a quick way to understand the code
  • Startups — lower the entry barrier for new developers without cloud API costs

What’s Next

Quick Start

Prerequisites

Vibe Analyzer requires two external services:

  • OpenSearch — storage and search for indexed data
  • Ollama — running LLMs for code enrichment with descriptions and tags

Both services can be run locally via Docker.

Installation

The package is available on crates.io:

cargo install vibe-analyzer

Build from Source

# 1. Clone the repository
git clone https://gitcode.com/keygenqt_vz/vibe-analyzer.git

# 2. Enter the directory
cd vibe-analyzer

# 3. Build
cargo build --release

Build Dependencies

  • Rust toolchain (cargo, rustc)
  • libssl-dev (for TLS)

Starting Services

The repository includes two ready-to-use docker-compose files:

  • docker/opensearch/docker-compose.yml — OpenSearch for indexing and search
  • docker/open-webui/docker-compose.yml — Open WebUI with Ollama for AI assistant connection

OpenSearch:

cd docker/opensearch
docker-compose up -d

Open WebUI (optional, for AI assistant connection):

cd docker/open-webui
docker-compose up -d

Verification

# OpenSearch should respond
curl http://localhost:9200

# Ollama should be accessible
curl http://localhost:11434/api/tags
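The same checks can be scripted. This is a minimal sketch using only the standard library and the default local ports from this guide; adjust the URLs if your services run elsewhere.

```python
import urllib.request

# Minimal health check for the two required services, assuming the
# default ports used throughout this guide (9200 and 11434).
def is_up(url: str) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    print("OpenSearch:", "ok" if is_up("http://localhost:9200") else "unreachable")
    print("Ollama:", "ok" if is_up("http://localhost:11434/api/tags") else "unreachable")
```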

Configuration

The configuration file is located at ~/.vibe-analyzer/config.json5. It is created automatically with default settings the first time any CLI command is run.

Example Working Configuration

{
  // Configuration version (do not modify)
  "version": "0.0.1",

  // OpenSearch connection
  "opensearch": {
    "host": "http://192.168.1.10:9200"
  },

  // MCP server
  //
  // host — bind address (0.0.0.0 for all interfaces, 127.0.0.1 local only)
  // port — server port (default: 9020)
  // protocol — MCP protocol version (2024-11-05, 2025-03-26, 2025-06-18, or 'latest')
  "mcp": {
    "host": "0.0.0.0",
    "port": 9020,
    "protocol": "latest"
  },

  // Ollama LLM servers
  //
  // host — API endpoint
  // model — model name
  // max_chunk_chars — maximum characters per request
  // max_chunk_files — maximum files per request
  // timeout_secs — request timeout in seconds
  // temperature — generation temperature (0.0 – 1.0)
  // seed — seed for reproducibility
  // num_ctx — context window size
  // num_predict — maximum tokens in response
  // ast_imports — include imports in analysis
  // ast_variables — include variables in analysis
  // ast_functions — include functions in analysis
  // ast_enums — include enums in analysis
  // ast_interfaces — include interfaces in analysis
  "ollama": [
    {
      "host": "http://192.168.1.10:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "max_chunk_chars": 4000,
      "max_chunk_files": 3,
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    },
    {
      "host": "http://localhost:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "max_chunk_chars": 4000,
      "max_chunk_files": 3,
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    }
  ],

  // Knowledge sources for indexing
  "sources": ["/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer"]
}

Checking Configuration

# View current settings
cat ~/.vibe-analyzer/config.json5

Adding a Knowledge Source

A source is anything you want to index: a code project, a documentation folder, or both.

# Add a project
vibe-analyzer source add /path/to/your/project

# Add a documentation directory
vibe-analyzer source add /path/to/docs

# List all added sources
vibe-analyzer source list

Scanning and Indexing

Vibe Analyzer provides three commands for different tasks:

Export AST to File

Code structure extraction only, without LLM. The result is saved in JSON/JSON5/TOON/XML:

# All sources
vibe-analyzer scan ast

# A specific source
vibe-analyzer scan ast --target /path/to/your/project

# With format specified
vibe-analyzer scan ast --target /path/to/your/project --format json5

Export AST with LLM Enrichment to File

AST parsing + enrichment via Ollama (descriptions, tags). The result is saved to a file:

vibe-analyzer scan analyze --target /path/to/your/project

Note: enrichment requires a running Ollama with the selected model. If multiple Ollama hosts are configured, files are distributed among them via competing consumers.

Indexing to OpenSearch

Full cycle — AST parsing, LLM enrichment, and writing to OpenSearch for search via MCP tools:

vibe-analyzer scan index --target /path/to/your/project

After indexing, data is ready for search through the MCP server.

Starting the MCP Server

vibe-analyzer serve start

The server starts on the address and port specified in the configuration (default http://0.0.0.0:9020).

Verifying Results

# Project statistics
vibe-analyzer stats info --target /path/to/your/project

# File tree
vibe-analyzer stats tree --target /path/to/your/project

# List all indexed projects
vibe-analyzer stats info

Connecting an AI Assistant

Open WebUI

  1. Make sure Open WebUI is running (see the “Starting Services” section)
  2. In Open WebUI settings, add an MCP server:
    • URL: http://<host>:9020 (as specified in the configuration)
    • Transport: Streamable HTTP
  3. Once connected, the AI model will have 11 tools for searching code and documentation

Incremental Updates

Vibe Analyzer uses BLAKE3 hashes to track changes. When running scan index again, only modified files are processed:

# Reindexing — only changed files are affected
vibe-analyzer scan index --target /path/to/your/project

To force a full reindex, use the --force flag:

vibe-analyzer scan index --target /path/to/your/project --force

The same can be done via the admin_sync MCP tool without restarting the server.
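The change-detection logic described above can be sketched as a comparison between stored hashes and hashes of the files currently on disk. This sketch uses SHA-256 as a stand-in for BLAKE3 (which needs a third-party package); the classification into new, modified, deleted, and unchanged is the same idea.

```python
import hashlib
from pathlib import Path

# Sketch of incremental change detection. SHA-256 stands in for BLAKE3
# here; only the classification logic matters for illustration.
def classify(indexed: dict[str, str], root: Path) -> dict[str, list[str]]:
    """Compare stored hashes against the files currently on disk."""
    changes: dict[str, list[str]] = {"new": [], "modified": [], "deleted": [], "unchanged": []}
    on_disk: dict[str, str] = {}
    for path in root.rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            on_disk[str(path.relative_to(root))] = digest
    for rel, digest in on_disk.items():
        if rel not in indexed:
            changes["new"].append(rel)        # not indexed yet -> process
        elif indexed[rel] != digest:
            changes["modified"].append(rel)   # hash differs -> re-process
        else:
            changes["unchanged"].append(rel)  # hash matches -> skip
    changes["deleted"] = [rel for rel in indexed if rel not in on_disk]
    return changes
```

Only the "new" and "modified" buckets go through AST parsing and LLM enrichment again; "deleted" entries are removed from the index.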

Troubleshooting

OpenSearch Unreachable

# Check Docker container status
docker ps | grep opensearch

Ollama Not Responding

# Check if Ollama is running
curl http://localhost:11434/api/tags

Model Not Installed

# Download the model
ollama pull qwen2.5-coder:3b-instruct

What’s Next

Architecture

Overview

Vibe Analyzer consists of four main components that work sequentially:

Code sources
      │
      ▼
┌─────────────┐
│   Scanner   │  AST parsing, structure extraction
└─────────────┘
      │
      ▼
┌─────────────┐
│   Analyzer  │  LLM enrichment, descriptions, tags
└─────────────┘
      │
      ▼
┌─────────────┐
│   Indexer   │  Writing to OpenSearch
└─────────────┘
      │
      ▼
┌─────────────┐
│  MCP Server │  HTTP API for AI assistants
└─────────────┘

Components

Scanner

The Scanner handles initial source code processing:

  • File system traversal — recursive directory scanning respecting .gitignore and default exclusion patterns (.git, target, node_modules, etc.)
  • Language detection — selects the appropriate tree-sitter parser based on file extension
  • AST parsing — extracts code structure: functions, classes, imports, variables, enums, interfaces, structs
  • Metadata collection — line count, file size, BLAKE3 content hash
  • License detection — searches for a LICENSE file and identifies the license type via askalono and SPDX
  • README detection — priority: root > subdirectories, .md > .txt > no extension
  • Statistics collection — aggregation by language, file count, lines of code
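The traversal step can be sketched as a recursive walk that prunes excluded directories before descending into them. This is an illustrative sketch using the default exclusion names listed in the Limitations section; the real Scanner also honors .gitignore, which is omitted here for brevity.

```python
from pathlib import Path
from typing import Iterator

# Default exclusions taken from this documentation's Limitations section.
IGNORED_DIRS = {".git", "target", "node_modules", "__pycache__", ".venv", "venv", ".idea"}

def walk_sources(root: Path) -> Iterator[Path]:
    """Yield files under root, pruning ignored directories during descent."""
    for path in root.iterdir():
        if path.is_dir():
            if path.name not in IGNORED_DIRS:
                yield from walk_sources(path)
        else:
            yield path
```

Pruning during descent (rather than filtering afterwards) means large trees like node_modules are never entered at all.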

Analyzer

The Analyzer enriches scanner results using LLM:

  • Prompt generation — builds a request for each file containing the AST structure
  • Request distribution — with multiple Ollama hosts configured, files are pushed to a shared channel and workers compete for them (competing consumers). The fastest worker takes the next file, maximizing host utilization
  • Batch processing — files are grouped into batches limited by max_chunk_chars and max_chunk_files
  • Controlled generation — configurable parameters temperature, seed, num_ctx, num_predict for reproducible results
  • Enrichment — LLM adds a description and multilingual search tags to each file
  • Project summarization — a separate request generates a brief description of the entire source

Indexer

The Indexer manages writing data to OpenSearch:

  • Three indices per source:
    • vibe_meta — project metadata (summary, license, statistics, README)
    • vibe_files_{hash} — full file contents
    • vibe_files_analysis_{hash} — AST, enriched descriptions, and search tags
  • Bulk operations — batch writing for maximum performance
  • Incremental updates — BLAKE3 hash comparison, only changed files are re-processed
  • Cleanup — removes stale data no longer present in the source
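OpenSearch's `_bulk` endpoint accepts newline-delimited JSON in which each document is an action line followed by a source line. The sketch below builds such a payload by hand to show the shape; the index name's `abc123` hash suffix is a made-up placeholder following the `vibe_files_{hash}` naming described above.

```python
import json

# Sketch of an OpenSearch _bulk request body: one action line plus one
# source line per document, newline-delimited (NDJSON).
def build_bulk_body(index: str, docs: dict[str, dict]) -> str:
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline
```

Sending thousands of documents in one such request is what makes bulk writes much faster than per-document indexing.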

MCP Server

The MCP server provides an API for AI assistants:

  • Protocol — Model Context Protocol (MCP) via Streamable HTTP transport
  • 11 tools — admin, get, search, and show categories
  • Anti-Hallucination Protection — parameter normalization, tool name aliases, auto language detection
  • Logging — middleware for tracking all requests
  • CORS — cross-origin request support for web interfaces

Indexing Lifecycle

Full Indexing

1. source add → save path to config
2. scan index → check OpenSearch → cleanup orphaned data → AST parsing → LLM enrichment → indexing

Incremental Updates

1. scan index → load hashes from OpenSearch → compare with files on disk
2. New/modified → AST parsing → LLM enrichment → indexing
3. Deleted → removal from OpenSearch
4. Unchanged → skip

Export without Indexing

1. scan ast → traverse files → AST parsing → export to file
2. scan analyze → traverse files → AST parsing → LLM enrichment → export to file

scan ast and scan analyze do not touch OpenSearch — file export only.

OpenSearch Indices

vibe_meta

One document per project: summary, license, README, aggregated statistics (files, lines, size).

vibe_files_{hash}

One document per file: full contents. The content field is not indexed for search — only stored for retrieval via get_file_content.

vibe_files_analysis_{hash}

One document per text file. Contains AST (functions, classes, imports, etc.), file metadata, and multilingual search tags. The description and tags fields are added after LLM enrichment.

Ollama Clustering

When multiple Ollama hosts are configured, Vibe Analyzer distributes files via competing consumers:

  • All workers read from a single shared channel
  • The fastest worker takes the next file
  • This maximizes utilization of all hosts
  • If any host fails, all workers stop
  • At the end, per-host statistics are reported: how many files each host processed

This approach allows:

  • Faster enrichment through parallel processing on multiple GPUs/servers
  • Scaling by adding more hosts to the configuration
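The competing-consumers pattern can be sketched with one shared queue and one worker per host: whichever worker finishes first simply takes the next file. This sketch omits the failure handling described above (in the real system, a failed host stops all workers).

```python
import queue
import threading

# Minimal competing-consumers sketch: all workers pull from one shared
# queue, so a faster "host" naturally ends up processing more files.
def run_workers(files: list[str], hosts: list[str]) -> dict[str, list[str]]:
    work: "queue.Queue[str]" = queue.Queue()
    for f in files:
        work.put(f)
    processed: dict[str, list[str]] = {h: [] for h in hosts}

    def worker(host: str) -> None:
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            # Here the real system would send `item` to the Ollama host.
            processed[host].append(item)

    threads = [threading.Thread(target=worker, args=(h,)) for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

The per-host lists at the end correspond to the per-host statistics the tool reports after enrichment.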

Anti-Hallucination Protection

Protection against AI model hallucinations when calling tools:

| Mechanism | Description |
|---|---|
| Name aliases | 150+ alternative tool names (e.g., search_code_functions → search_by_code_functions) |
| Parameter normalization | Wildcard replacement, whitespace trimming, type casting |
| Bounds validation | limit always in the 1–10 range, level capped |
| Auto language detection | Detects Cyrillic, Latin, and CJK in search queries |
| Soft error handling | Invalid parameters don't cause errors; they are normalized to safe values |
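Two of these mechanisms are easy to illustrate. The sketch below shows soft bounds validation (an out-of-range or non-numeric limit is clamped, never rejected) and rough script detection via Unicode ranges; the function names and exact ranges are illustrative assumptions, not the real implementation.

```python
# Soft parameter normalization: invalid values are coerced to safe
# defaults instead of raising an error back to the model.
def normalize_limit(value, lo: int = 1, hi: int = 10) -> int:
    try:
        n = int(value)
    except (TypeError, ValueError):
        return lo
    return max(lo, min(hi, n))

def detect_script(query: str) -> str:
    """Rough script detection based on Unicode block ranges."""
    for ch in query:
        if "\u0400" <= ch <= "\u04FF":   # Cyrillic block
            return "cyrillic"
        if "\u4E00" <= ch <= "\u9FFF":   # CJK Unified Ideographs
            return "cjk"
    return "latin"
```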

Performance

  • Rust — native execution without GC overhead
  • Parallel parsing — each file processed independently
  • Bulk OpenSearch writes — thousands of documents per operation
  • Streaming processing — files are processed as they are discovered, without waiting for the entire directory
  • Incremental updates — only changed files are re-indexed when updating a source

Supported Languages

Vibe Analyzer supports AST parsing and LLM enrichment for 13 programming languages and file formats.

Full List

| Language | Extensions | AST | Enrichment | Parser |
|---|---|---|---|---|
| Rust | .rs | ✅ | ✅ | RustParser |
| Python | .py | ✅ | ✅ | PythonParser |
| JavaScript | .js | ✅ | ✅ | JavaScriptParser |
| TypeScript | .ts | ✅ | ✅ | TypeScriptParser |
| Java | .java | ✅ | ✅ | JavaParser |
| Go | .go | ✅ | ✅ | GoParser |
| C# | .cs | ✅ | ✅ | CSharpParser |
| Kotlin | .kt | ✅ | ✅ | KotlinParser |
| Swift | .swift | ✅ | ✅ | SwiftParser |
| Dart | .dart | ✅ | ✅ | DartParser |
| Bash | .sh | ✅ | ✅ | BashParser |
| Batch | .bat | ✅ | ✅ | BatchParser |
| ArkTS | .ets | ✅ | ✅ | ArkTsParser |
| Markdown | .md | ✅ | ✅ | MarkdownParser |

Note: Markdown is a special case. Headings, links, code blocks, and frontmatter metadata are extracted instead of programming constructs. This allows Markdown files to be used as a knowledge base: documentation, guidelines, standards, project notes. They are searchable via search_documentation and search_knowledge.

Extracted Element Categories

For All Programming Languages

| Element | Description | Example (Rust) |
|---|---|---|
| header_comments | Module comment — file purpose | "Application configuration management" |
| functions | Functions and methods | fn add(a: i32, b: i32) -> i32 |
| classes | Classes | class User |
| structs | Structs and records | struct Config |
| enums | Enums | enum Color |
| interfaces | Interfaces, traits, protocols | trait Display |
| variables | Module-level variables and constants | const MAX_SIZE: usize |
| imports | Imports and dependencies | use std::fs |

Markdown Only

| Element | Description | Example |
|---|---|---|
| headings | Headings with level, text, and preview | { level: 1, title: "Vibe Analyzer", preview: "Universal knowledge base..." } |
| links | Links | { text: "documentation", url: "https://example.com/docs" } |
| code_blocks | Code block languages | ["bash", "rust"] |
| frontmatter | YAML metadata | { title: "...", tags: "...", author: "..." } |
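The heading and link extraction above can be approximated with a few lines of Python. This is a deliberately simplified sketch using regular expressions; the real parser is tree-sitter based and also handles code blocks, previews, and frontmatter.

```python
import re

# Simplified Markdown extraction sketch: ATX headings and inline links.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$", re.MULTILINE)
LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def extract_markdown(text: str) -> dict:
    headings = [{"level": len(h), "title": t} for h, t in HEADING.findall(text)]
    links = [{"text": t, "url": u} for t, u in LINK.findall(text)]
    return {"headings": headings, "links": links}
```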

Element Support by Language

Each parser extracts the subset of the categories above that exists in its language (Rust, Python, JavaScript, TypeScript, Java, Go, C#, Kotlin, Swift, Dart, Bash, Batch, ArkTS).

Multilingual Search Tags

Each extracted element receives tags in three languages. This allows searching for elements in Russian, English, or Chinese — regardless of the source code language.

| Element Type | EN | RU | ZH |
|---|---|---|---|
| Functions | functions | функции | 函数 |
| Classes | classes | классы | 类 |
| Structs | structs | структуры | 结构体 |
| Enums | enums | перечисления | 枚举 |
| Interfaces | interfaces | интерфейсы | 接口 |
| Variables | variables | переменные | 变量 |
| Imports | imports | импорты | 导入 |
| Headings (MD) | headings | заголовки | 标题 |
| Code blocks (MD) | code_blocks | блоки_кода | 代码块 |
| Links (MD) | links | ссылки | 链接 |
| Module comments | header_comments | | |

Doc Comment Formats

Vibe Analyzer extracts documentation from specially formatted comments. Regular comments (//, #) are ignored.

| Language | Doc Comment | Module Comment | Example |
|---|---|---|---|
| Rust | /// or /** */ | //! or /*! */ | /// Adds two numbers |
| Python | """...""" (docstring) | """...""" at file start | """Adds two numbers""" |
| JavaScript | /** */ (JSDoc) | /** */ at file start | /** @param {number} a */ |
| TypeScript | /** */ (JSDoc) | /** */ at file start | /** @param a First number */ |
| Java | /** */ (Javadoc) | /** */ at file start | /** @param a First number */ |
| Kotlin | /** */ (KDoc) | /** */ at file start | /** @param a First number */ |
| C# | /// or /** */ | /** */ or /// at file start | /// <summary>Adds two numbers</summary> |
| Swift | /// or /** */ | /// or /** */ at file start | /// - Parameters: a: First number |
| Dart | /// | /// at file start | /// Adds two numbers |
| Go | // (any before declaration) | // at file start | // Add adds two numbers |
| Bash | ## or # before function | ## at script start | ## Module documentation for Bash testing |
| Batch | :: before label | :: at script start | :: Module documentation for Batch testing |
| ArkTS | /** */ | /** */ at file start | /** Async function example */ |

AST Example

Source code (sample.py):

"""
Module documentation for Python testing
"""

import os
import sys
from datetime import datetime
from typing import List, Optional

# Regular comment - ignored


def add(a: int, b: int) -> int:
    """Adds two numbers"""
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiplies two numbers"""
    return a * b


async def fetch_data(url: str) -> str:
    """Async function example"""
    return "data"


class User:
    """User class"""

    def __init__(self, name: str, age: int):
        """Constructor"""
        self.name = name
        self.age = age

    def get_name(self) -> str:
        """Get user name"""
        return self.name


class Config:
    """Config class"""
    debug: bool = False
    max_size: int = 1024


class Color:
    """Color enum (using class constants)"""
    RED = 1
    GREEN = 2
    BLUE = 3


MAX_SIZE: int = 1024
DEFAULT_TIMEOUT: int = 30

APP_NAME: str = "vibe-analyzer"

# Regular comment at the end - ignored

Extracted AST:

{
  "functions": [
    {
      "signature": "def add(a: int, b: int) -> int",
      "comments": ["Adds two numbers"]
    },
    {
      "signature": "def multiply(a: int, b: int) -> int",
      "comments": ["Multiplies two numbers"]
    },
    {
      "signature": "async def fetch_data(url: str) -> str",
      "comments": ["Async function example"]
    }
  ],
  "classes": [
    {
      "signature": "class User",
      "comments": ["User class"]
    },
    {
      "signature": "class Config",
      "comments": ["Config class"]
    },
    {
      "signature": "class Color",
      "comments": ["Color enum (using class constants)"]
    }
  ],
  "variables": [
    {
      "signature": "MAX_SIZE: int"
    },
    {
      "signature": "DEFAULT_TIMEOUT: int"
    },
    {
      "signature": "APP_NAME: str"
    }
  ],
  "imports": ["os", "sys", "datetime", "typing"],
  "header_comments": ["Module documentation for Python testing"],
  "tags": [
    "header_comments",
    "imports",
    "импорты",
    "导入",
    "variables",
    "переменные",
    "变量",
    "functions",
    "функции",
    "函数",
    "classes",
    "классы",
    "类"
  ]
}

Limitations

  • Maximum file size for AST parsing: 10 MB (MAX_AST_FILE_SIZE constant)
  • Default ignored directories: target, node_modules, __pycache__, .venv, venv, .git, .idea
  • Default ignored files: .DS_Store, Thumbs.db, *.hprof, *.log
  • Binary files: detected by extension, name, and content analysis, excluded from parsing
  • Nested elements: methods inside classes are extracted as functions, variables inside functions/classes are not extracted

Configuration

Vibe Analyzer uses a JSON5 configuration file. JSON5 is an extended version of JSON with support for comments, trailing commas, and other convenient features.

Location

The configuration file is located at:

~/.vibe-analyzer/config.json5

The file is created automatically with default settings the first time any CLI command is run.

Configuration Structure

The configuration consists of five sections: version, opensearch, mcp, ollama, and sources.

Full Example with Comments

{
  // Configuration version — do not modify manually
  "version": "0.0.1",

  // OpenSearch connection
  "opensearch": {
    // OpenSearch server URL
    "host": "http://192.168.1.10:9200"
  },

  // MCP server
  "mcp": {
    // Bind address:
    // 0.0.0.0 — accessible from all interfaces (for Docker, remote connections)
    // 127.0.0.1 — local only
    "host": "0.0.0.0",

    // Server port (default: 9020)
    "port": 9020,

    // MCP protocol version:
    // '2024-11-05' — stable
    // '2025-03-26' — improved streaming
    // '2025-06-18' — latest
    // 'latest' — auto-detect
    "protocol": "latest"
  },

  // Ollama LLM servers — specify multiple for load distribution
  "ollama": [
    {
      // Ollama API endpoint
      "host": "http://192.168.1.10:11434",

      // Model for enrichment
      "model": "qwen2.5-coder:3b-instruct",

      // Maximum characters per LLM request
      // Files are grouped into batches until the total size exceeds this limit
      "max_chunk_chars": 4000,

      // Maximum files per LLM request
      "max_chunk_files": 3,

      // Request timeout in seconds
      "timeout_secs": 60,

      // Generation temperature (0.0 — deterministic, 1.0 — creative)
      "temperature": 0.1,

      // Seed for reproducible results (same seed → same output)
      "seed": 42,

      // Model context window size
      "num_ctx": 4096,

      // Maximum tokens in response
      "num_predict": 2048,

      // Which AST elements to include in the prompt
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    },
    {
      // Second host for load distribution
      "host": "http://localhost:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "max_chunk_chars": 4000,
      "max_chunk_files": 3,
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "ast_imports": false,
      "ast_variables": false,
      "ast_functions": true,
      "ast_enums": true,
      "ast_interfaces": true
    }
  ],

  // Knowledge sources for indexing — absolute project paths
  "sources": ["/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer"]
}

Sections in Detail

version

"version": "0.0.1"

Configuration file version. Do not modify manually — updated automatically during config migration between versions.

opensearch

"opensearch": {
  "host": "http://192.168.1.10:9200"
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| host | string | http://localhost:9200 | OpenSearch server URL. Can point to a local or remote server |

mcp

"mcp": {
  "host": "0.0.0.0",
  "port": 9020,
  "protocol": "latest"
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| host | string | 127.0.0.1 | Server bind address. 0.0.0.0 — accessible externally (Docker, remote clients), 127.0.0.1 — local only |
| port | integer | 9020 | MCP server port |
| protocol | string | latest | MCP protocol version: 2024-11-05, 2025-03-26, 2025-06-18, or latest |

ollama

The ollama section is an array of Ollama server configurations. One or more hosts can be specified for load distribution.

"ollama": [
  {
    "host": "http://192.168.1.10:11434",
    "model": "qwen2.5-coder:3b-instruct",
    "max_chunk_chars": 4000,
    "max_chunk_files": 3,
    "timeout_secs": 60,
    "temperature": 0.1,
    "seed": 42,
    "num_ctx": 4096,
    "num_predict": 2048,
    "ast_imports": false,
    "ast_variables": false,
    "ast_functions": true,
    "ast_enums": true,
    "ast_interfaces": true
  }
]

Main Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| host | string | http://localhost:11434 | Ollama API endpoint |
| model | string | qwen2.5-coder:3b-instruct | Model name for enrichment. Must be pre-loaded via ollama pull |

Batch Processing Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_chunk_chars | integer | 4000 | Maximum characters per LLM request. Files are grouped into batches until the total size exceeds the limit |
| max_chunk_files | integer | 3 | Maximum files per request. Even if the character limit is not reached, no more than this number of files will be in a batch |
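The interaction of the two limits can be sketched as follows. This is an illustrative approximation of the batching rule (here a batch is closed before either the character budget or the file cap would be exceeded), not the exact algorithm.

```python
# Sketch of batch grouping: files accumulate until adding the next one
# would exceed the character budget or the file-count cap.
def make_batches(files: list[tuple[str, int]], max_chars: int = 4000, max_files: int = 3) -> list[list[str]]:
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for name, chars in files:
        if current and (size + chars > max_chars or len(current) >= max_files):
            batches.append(current)  # close the current batch
            current, size = [], 0
    # start or extend the open batch
        current.append(name)
        size += chars
    if current:
        batches.append(current)
    return batches
```

Lowering either limit produces smaller prompts at the cost of more LLM requests.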

Generation Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| timeout_secs | integer | 60 | Ollama request timeout in seconds |
| temperature | float | 0.1 | Generation temperature. 0.0 — maximally deterministic, 1.0 — maximally creative. Low values are recommended for code enrichment |
| seed | integer | 42 | Random generator seed. The same seed guarantees reproducible results |
| num_ctx | integer | 4096 | Model context window size in tokens |
| num_predict | integer | 2048 | Maximum tokens in response |

AST Element Filters

Determines which AST elements are included in the LLM prompt. Disabling unnecessary elements reduces request size and speeds up processing.

| Parameter | Type | Default | Description |
|---|---|---|---|
| ast_imports | boolean | false | Include imports in the prompt |
| ast_variables | boolean | false | Include variables in the prompt |
| ast_functions | boolean | true | Include functions in the prompt |
| ast_enums | boolean | true | Include enums in the prompt |
| ast_interfaces | boolean | true | Include interfaces in the prompt |

sources

"sources": [
  "/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer",
  "/home/user/projects/my-backend",
  "/home/user/docs/architecture"
]
| Parameter | Type | Default | Description |
|---|---|---|---|
| sources | array of strings | [] | List of absolute paths to knowledge sources. Each source can be a code project, a documentation folder, or both |

Managing sources is easier via CLI rather than editing the config manually:

# Add a source
vibe-analyzer source add /path/to/project

# Remove a source
vibe-analyzer source remove --target /path/to/project

# List all sources
vibe-analyzer source list

Ollama Clustering

When multiple hosts are specified in the ollama section, Vibe Analyzer distributes files via competing consumers:

  • All workers read from a single shared channel
  • The fastest worker takes the next file
  • This maximizes utilization of all hosts
  • If any host fails, all workers stop
  • At the end, per-host statistics are reported

This approach allows:

  • Faster enrichment through parallel processing on multiple GPUs/servers
  • Scaling by adding more hosts to the configuration

Configuration Validation

Vibe Analyzer validates the configuration at startup and applies safe defaults if parameters are missing or invalid:

  • max_chunk_chars → minimum 1000, maximum 100000
  • limit in search queries → always in the 1–10 range
  • Invalid paths → normalized to absolute form
  • Missing sections → created with default values

Configuration Migration

When updating Vibe Analyzer, the configuration may automatically migrate to a new format. The configuration version (version) tracks the current format and applies migrations when necessary.

Overriding the Configuration Directory

For testing or custom scenarios, the configuration directory can be overridden:

# Set a custom directory
vibe-analyzer --config-dir /custom/path source list

The default is ~/.vibe-analyzer/.

Dev Section

An optional section for debugging. Usually absent in production config — added only when needed:

{
  "dev": {
    "log_level": "trace",
    "spdx_data_path": "tests/mcp/fixtures/config/spdx"
  }
}
| Parameter | Type | Default | Description |
|---|---|---|---|
| log_level | string | default | Log level: default, trace, debug, info, warn, error. default means info |
| spdx_data_path | string | ~/.vibe-analyzer/spdx | Path to SPDX data for license detection. Downloaded automatically on first run |

CLI Reference

Vibe Analyzer provides a command-line interface for managing knowledge sources, scanning, exporting, indexing, and running the MCP server.

General Syntax

vibe-analyzer [global options] <command> [subcommand] [options]

Global Options

| Option | Description |
|---|---|
| --config-dir <path> | Use a custom config directory instead of ~/.vibe-analyzer/ |
| --help | Show help |
| --version | Show version |

Commands

source — Source Management

Add, remove, and list knowledge sources.

vibe-analyzer source <subcommand>
| Subcommand | Description |
|---|---|
| add <path> | Adds a new directory or file to the sources list. The path is automatically converted to absolute |
| remove --target <path> | Removes a source from the configuration. Accepts a full path or a unique directory name |
| list | Shows all added sources with absolute paths |

Examples:

# Add a project
vibe-analyzer source add /home/user/projects/my-app

# Add a documentation directory
vibe-analyzer source add /home/user/docs

# Remove by full path
vibe-analyzer source remove --target /home/user/projects/my-app

# Remove by directory name (if unique)
vibe-analyzer source remove --target my-app

# List all sources
vibe-analyzer source list

Example source list output:

Configured sources:
- /Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer
- /home/user/projects/my-backend

scan — Scanning and Indexing

Extract code structure via AST parsing, optional LLM enrichment, and OpenSearch indexing.

vibe-analyzer scan <subcommand>
| Subcommand | Description |
|---|---|
| ast | AST parsing only. Fast code structure extraction without LLM. Results can be exported to a file |
| analyze | Full cycle: AST parsing → LLM enrichment. Does not index to OpenSearch automatically. Results can be exported to a file |
| index | OpenSearch indexing. Runs scan analyze with incremental updates and writes results to indices |

scan ast

vibe-analyzer scan ast [options]
| Option | Description |
|---|---|
| --target <path> | Process a specific source. If not specified — all sources are processed |
| --format <format> | Export format: json (default), json5, toon, xml |
| -o, --output <path> | Export path. If not specified — the file is created in ~/Downloads/ |

Examples:

# AST for all sources
vibe-analyzer scan ast

# AST for a specific project
vibe-analyzer scan ast --target my-app

# AST with JSON5 export
vibe-analyzer scan ast --target my-app --format json5

# AST export to a specific file
vibe-analyzer scan ast --target my-app --format json --output /path/to/output.json

scan analyze

vibe-analyzer scan analyze [options]
| Option | Description |
|---|---|
| --target <path> | Process a specific source. If not specified — all sources are processed |
| --format <format> | Export format: json (default), json5, toon, xml |
| -o, --output <path> | Export path. If not specified — the file is created in ~/Downloads/ |

Before enrichment, it checks:

  1. Ollama hosts are configured
  2. All Ollama servers are reachable (healthcheck)
  3. The model is warmed up on all servers

If any server is unavailable — the command fails with an error.

Examples:

# Full cycle for all sources
vibe-analyzer scan analyze

# Full cycle for a specific project
vibe-analyzer scan analyze --target my-app

# With export
vibe-analyzer scan analyze --target my-app --format json5

scan index

vibe-analyzer scan index [options]
| Option | Description |
| --- | --- |
| `--target <path>` | Index a specific source. If not specified — all sources |
| `--force` | Force full reindexing. Ignores hashes and processes all files again |

What scan index does:

  1. Checks OpenSearch availability
  2. Cleans up orphaned data (OpenSearch documents no longer in the source)
  3. If not --force — loads hashes of already indexed files for incremental update
  4. Runs scan_enriches (AST + LLM), skipping files with unchanged hashes
  5. Indexes project metadata (meta)
  6. Indexes file contents (files)
  7. Indexes file analysis (files_analysis)
  8. Prints a report

Example scan index output:

Indexing completed. Sources: 1, Files: 287, Analysis: 242 (took 45.3s)

Or, if all files are already indexed:

Index is up to date — all files are already indexed and database is in sync

Examples:

# Incremental indexing for all sources
vibe-analyzer scan index

# Index a specific project
vibe-analyzer scan index --target my-app

# Force full reindexing
vibe-analyzer scan index --target my-app --force

stats — Statistics

View information and statistics for indexed projects.

vibe-analyzer stats <subcommand>

stats info

vibe-analyzer stats info [options]
| Option | Description |
| --- | --- |
| `--target <path>` | Show statistics for a specific project. If not specified — all projects |

Requires an active OpenSearch connection and indexed data.

Example output:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language                  Files           Lines     AST Objects            Size
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Rust                        194           15146            1631       498.26 KB
Markdown                     33            2884             296       102.64 KB
Python                        4             163              23         2.85 KB
TypeScript                    1             125              14         1.90 KB
Java                          1             115              14         1.71 KB
Swift                         1              99              13         1.35 KB
Kotlin                        1              98              14         1.38 KB
C#                            1              97              13         1.58 KB
ArkTs                         1              83               9         1.11 KB
JavaScript                    1              73              12         1.07 KB
Dart                          1              68              13         1.09 KB
Go                            1              63              12           895 B
Bash                          1              55               9           882 B
Batch                         1              48               9           748 B
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Other                        45            5776                       137.72 KB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total                       287           24893            2082       755.11 KB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stats generated in 0.5s

Columns:

| Column | Description |
| --- | --- |
| Language | Programming language. Other — files without AST (images, configs, binaries) |
| Files | Number of files for this language |
| Lines | Total line count |
| AST Objects | Number of extracted AST elements (functions, classes, structs, enums, interfaces, variables, imports, Markdown headings) |
| Size | Total file size |

stats tree

vibe-analyzer stats tree --target <project> [options]
| Option | Description |
| --- | --- |
| `--target <path>` | Project to display the tree for (required) |
| `-L, --level <number>` | Maximum tree depth (default: 3) |

Requires an active OpenSearch connection and indexed data. The tree is built from indexed paths, not by traversing the filesystem.

Example output:

vibe-analyzer
|-- Cargo.lock
|-- Cargo.toml
|-- LICENSE
|-- README.crates.md
|-- README.md
|-- book
|   |-- book.toml
|   `-- src
|       |-- SUMMARY.md
|       `-- index.html
|-- data
|   |-- other
|   |   `-- logo.png
|   |-- preview
|   |   `-- preview-webui.png
|   `-- prompts
|       `-- comment-rules.md
|-- docker
|   |-- open-webui
|   |   `-- docker-compose.yml
|   `-- opensearch
|       `-- docker-compose.yml
|-- src
|   |-- analyzer
|   |   |-- analyzer.rs
|   |   |-- mod.rs
|   |   |-- ollama_client.rs
|   |   `-- ollama_cluster.rs
|   |-- cli
|   |   |-- mod.rs
|   |   |-- scan.rs
|   |   |-- serve.rs
|   |   |-- source.rs
|   |   `-- stats.rs
|   |-- configs
|   |   `-- ...
|   |-- main.rs
|   `-- utils
|       `-- ...
`-- tests
    |-- mcp
    |   `-- ...
    `-- parsers
        `-- ...

36 directories, 74 files

The output ends with a summary line showing the directory count, file count, and build time:

Found 36 directories, 74 files in 0.5s

serve — MCP Server

Start, stop, and check MCP server status.

vibe-analyzer serve <subcommand>

serve start

vibe-analyzer serve start [options]
| Option | Description |
| --- | --- |
| `--host <address>` | Bind address. Overrides the config value |
| `--port <port>` | Server port. Overrides the config value |
| `--workdir <path>` | Working directory (default: current directory) |
| `--protocol <version>` | MCP protocol version. Overrides the config value |

The server runs in foreground mode. Use a system service manager or terminal multiplexer to run in the background.

All parameters are optional — if not specified, values from config.json5 are used.

Examples:

# Start with config settings
vibe-analyzer serve start

# Start on a specific port
vibe-analyzer serve start --port 9020

# Start on localhost only
vibe-analyzer serve start --host 127.0.0.1 --port 8080

serve stop and serve status

vibe-analyzer serve stop
vibe-analyzer serve status

Note: the stop and status commands are reserved but not yet implemented.

Export Formats

When using the --format option with scan ast and scan analyze, four formats are available:

| Format | Key | Extension | Description |
| --- | --- | --- | --- |
| JSON | `json` | `.json` | Compact JSON without extra whitespace — minimal file size |
| JSON5 | `json5` | `.json5` | JSON5 with comments and trailing commas — human-readable |
| TOON | `toon` | `.toon` | TOON format — token-efficient output, optimized for LLMs |
| XML | `xml` | `.xml` | XML with pretty-print formatting |

If the export path is not specified via --output, the file is saved to ~/Downloads/ with an auto-generated name.

Exit Codes

| Code | Description |
| --- | --- |
| 0 | Successful execution |
| 1 | Error (invalid parameters, service unavailable, parsing error) |

MCP Tools

Vibe Analyzer provides 11 MCP tools that AI models can call to search code and documentation. Each tool returns a structured response — no document stuffing into the context.

General Concept

In traditional RAG, a search engine finds documents and adds them to the prompt. Vibe Analyzer works differently:

AI model → selects a tool → calls MCP → receives a structured response

Rules for the AI model (embedded in ServerInfo.instructions):

  1. Use tools, respond concisely
  2. One call is enough — no need to call multiple tools in sequence
  3. Only the listed tools

Tool Categories

| Category | Tools | Purpose |
| --- | --- | --- |
| Admin | `admin_sync` | Reindex all projects |
| Get | `get_file_content`, `get_file_ast` | Retrieve file contents and AST |
| Show | `show_projects`, `show_stats`, `show_tree` | Project info: list, statistics, file tree |
| Search — Code | `search_by_code_imports`, `search_by_code_functions`, `search_by_code_classes`, `search_by_code_variables` | Code search: imports, functions, classes, variables |
| Search — Docs | `search_documentation`, `search_knowledge` | Markdown documentation and knowledge base search |

Admin

admin_sync

Triggers reindexing of all projects in the background.

When to call: the user says “update”, “sync”, “reindex”, “refresh”.

Parameters: none.

Response:

{
  "result": "Started",
  "message": "Indexing started. Projects are updating now."
}

Or, if indexing is already running:

{
  "result": "AlreadyRunning",
  "message": "Indexing is already running. Please wait."
}

Get

get_file_content

Returns the full contents of a file.

When to call: the user asks to see file contents, open a file.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `path` | string | yes | File path. Supports partial matching and wildcards. Can be relative or absolute |

Response:

{
  "root": "/path/to/project",
  "path": "src/main.rs",
  "language": "Rust",
  "content": "fn main() {\n println!(\"Hello\");\n}\n"
}

get_file_ast

Returns the full AST of a file: imports, functions, classes, structs, enums, headings.

When to call: the user asks about file structure, functions in a file, AST.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `path` | string | yes | File path. Can be relative or absolute |

Response:

{
  "root": "/path/to/project",
  "path": "src/main.rs",
  "language": "Rust",
  "ast": {
    "header_comments": ["Vibe Analyzer - Main entry point."],
    "imports": ["clap::Parser", "crate::cli::scan::ScanAction"],
    "functions": [{ "signature": "async fn main()", "comments": [] }],
    "structs": [{ "signature": "struct App", "comments": [] }],
    "enums": [{ "signature": "enum Commands", "comments": [] }],
    "tags": [
      "functions",
      "функции",
      "函数",
      "structs",
      "структуры",
      "结构体",
      "enums",
      "перечисления",
      "枚举",
      "imports",
      "импорты",
      "导入",
      "header_comments"
    ]
  }
}

Show

show_projects

Shows all indexed projects with names and brief descriptions.

When to call: the user asks “what projects are available”, “list projects”.

Parameters: none.

Response:

{
  "projects": [
    {
      "path": "/path/to/project",
      "name": "vibe-analyzer",
      "summary": "Agentic RAG engine for code and knowledge bases"
    }
  ],
  "total": 1
}

show_stats

Shows project statistics: language breakdown, file count, lines of code, AST objects.

When to call: the user asks about statistics, file count, codebase size.

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `target` | string | no | Project name. If not specified — statistics for all projects |

Response:

{
  "target": null,
  "languages": [
    {
      "language": "Rust",
      "files": 194,
      "lines": 15146,
      "ast_objects": 1631,
      "size_human": "498.26 KB"
    },
    {
      "language": "Markdown",
      "files": 33,
      "lines": 2884,
      "ast_objects": 296,
      "size_human": "102.64 KB"
    }
  ],
  "total": { "files": 287, "lines": 24893, "ast_objects": 2082, "size_human": "755.11 KB" }
}

show_tree

Shows the file and directory tree of a project.

When to call: the user asks about project structure, file tree, folders.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `target` | string | no | all projects | Project name |
| `level` | integer | no | 3 | Maximum depth (1–10) |

Response:

{
  "target": "vibe-analyzer",
  "tree": "vibe-analyzer\n|-- Cargo.toml\n|-- src\n|   |-- main.rs\n|   |-- cli\n|   |   |-- mod.rs\n|   |   `-- scan.rs\n|   `-- utils\n|       `-- ...\n`-- tests\n    `-- ...",
  "total_files": 74,
  "total_dirs": 36
}

Search — Code

search_by_code_imports

Finds imports and dependencies in code.

When to call: the user asks about imports, dependencies, libraries used. For “all imports”, use an empty query or *.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | string | yes | — | Search query |
| `target` | string | no | all projects | Project name |
| `limit` | integer | no | 3 | Maximum results |

Response:

{
  "query": "serde",
  "results": [
    {
      "project": "/path/to/project",
      "path": "src/main.rs",
      "language": "Rust",
      "header_comments": ["Vibe Analyzer - Main entry point."],
      "imports": ["serde::Deserialize", "serde::Serialize"]
    }
  ]
}

search_by_code_functions

Finds functions and methods in code.

When to call: the user asks about functions, methods, procedures.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | string | yes | — | Search query |
| `target` | string | no | all projects | Project name |
| `limit` | integer | no | 3 | Maximum results |

Response:

{
  "query": "scan_source",
  "results": [
    {
      "project": "/path/to/project",
      "path": "src/scanner/scanner.rs",
      "language": "Rust",
      "header_comments": ["Core scanning functionality for codebase analysis."],
      "functions": [
        {
          "signature": "pub async fn scan_source(...)",
          "comments": ["Scans a source and returns complete analysis results"]
        }
      ]
    }
  ]
}

search_by_code_classes

Finds classes, structs, interfaces, and traits.

When to call: the user asks about classes, structs, interfaces, types, traits, abstract classes, implements, extends.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | string | yes | — | Search query |
| `target` | string | no | all projects | Project name |
| `limit` | integer | no | 3 | Maximum results |

Response:

{
  "query": "AppConfig",
  "results": [
    {
      "project": "/path/to/project",
      "path": "src/configs/app.rs",
      "language": "Rust",
      "header_comments": ["Application configuration management for vibe-analyzer."],
      "classes": [],
      "structs": [
        {
          "signature": "pub struct AppConfig",
          "comments": ["Main application configuration structure"]
        }
      ],
      "interfaces": []
    }
  ]
}

search_by_code_variables

Finds variables, constants, and enums.

When to call: the user asks about variables, constants, enums, global variables, static fields.

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | string | yes | — | Search query |
| `target` | string | no | all projects | Project name |
| `limit` | integer | no | 3 | Maximum results |

Response:

{
  "query": "MAX_SIZE",
  "results": [
    {
      "project": "/path/to/project",
      "path": "src/utils/constants.rs",
      "language": "Rust",
      "header_comments": ["Application constants and configuration defaults."],
      "variables": [
        {
          "signature": "pub const MAX_AST_FILE_SIZE: u64",
          "comments": ["Maximum file size for AST parsing (10 MB)"]
        }
      ],
      "enums": []
    }
  ]
}

Search — Docs

search_documentation

Searches all Markdown documentation files. This is the default tool for non-code questions.

When to call: “who is”, “what is”, “how does”, “rules”, “processes”, “guides”, “legends” questions.

Search priority: Markdown files with knowledge: true in the frontmatter receive a significant boost (5.0) and appear first. This separates the knowledge base (legends, guidelines) from regular documentation. Example frontmatter:

---
knowledge: true
---

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `query` | string | yes | — | Search query. Supports Cyrillic, Latin, CJK |
| `limit` | integer | no | 3 | Maximum results |

Response:

{
  "query": "architecture",
  "results": [
    {
      "project": "/path/to/project",
      "path": "docs/architecture.md",
      "frontmatter": { "title": "Vibe Analyzer Architecture" },
      "headings": [
        { "level": 1, "title": "Architecture", "preview": "Overview of Vibe Analyzer's design" }
      ],
      "links": [{ "text": "Quick Start", "url": "./getting-started.md" }],
      "code_blocks": ["bash", "rust"]
    }
  ]
}

search_knowledge

Alias for search_documentation; the parameters and response are identical.

When to call: the user asks about the knowledge base, guidelines, standards, characters.


Anti-Hallucination Protection

Tool Name Aliases (160+)

Models often distort tool names. AliasHandler intercepts the call and replaces the name with the correct one:

ALIAS_HANDLER: Resolving 'search_functions' -> 'search_by_code_functions'
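A minimal sketch of this resolution step (illustrative only; the real AliasHandler carries 160+ entries, and the helper names `alias_table` and `resolve_alias` are hypothetical):

```rust
use std::collections::HashMap;

// Hypothetical subset of the alias table; the real handler maps 160+
// distorted names to canonical tool names.
fn alias_table() -> HashMap<&'static str, &'static str> {
    [
        ("search_functions", "search_by_code_functions"),
        ("search_classes", "search_by_code_classes"),
        ("file_content", "get_file_content"),
    ]
    .into_iter()
    .collect()
}

// Resolve a tool name to its canonical form; unknown names pass through.
fn resolve_alias<'a>(aliases: &HashMap<&'a str, &'a str>, name: &'a str) -> &'a str {
    aliases.get(name).copied().unwrap_or(name)
}

fn main() {
    let aliases = alias_table();
    assert_eq!(resolve_alias(&aliases, "search_functions"), "search_by_code_functions");
    // Canonical names are returned unchanged.
    assert_eq!(resolve_alias(&aliases, "show_tree"), "show_tree");
    println!("alias resolution ok");
}
```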

Parameter Normalization

| Mechanism | Example |
| --- | --- |
| Wildcard replacement | `*` and `?` in query → space |
| Whitespace trimming | `" search query "` → `"search query"` |
| `query: "*"` handling | Returns `None` (all elements) |
| `limit` capping | Always in 1–10 range. Values ≤ 1 → default (3) |
| Fuzzy path matching | Partial match and wildcards for `path` in `get_file_content` |
| `target` normalization | Search by exact path or unique directory name |
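The query and limit rules above can be sketched as follows (an illustrative approximation, not the actual implementation; the function names are hypothetical):

```rust
// Wildcards become spaces, then surrounding whitespace is trimmed.
// An empty result (e.g. the query "*") means "all elements".
fn normalize_query(query: &str) -> Option<String> {
    let cleaned = query.replace(['*', '?'], " ").trim().to_string();
    if cleaned.is_empty() { None } else { Some(cleaned) }
}

// Cap `limit` to the 1–10 range; values ≤ 1 fall back to the default of 3.
fn normalize_limit(limit: i64) -> i64 {
    if limit <= 1 { 3 } else { limit.min(10) }
}

fn main() {
    assert_eq!(normalize_query(" search query "), Some("search query".to_string()));
    assert_eq!(normalize_query("*"), None);
    assert_eq!(normalize_limit(0), 3);
    assert_eq!(normalize_limit(25), 10);
}
```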

Auto Language Detection

When searching documentation, the system detects scripts in the query:

  • Cyrillic → search using Russian tags
  • Latin → search using English tags
  • CJK → search using Chinese tags

Mixed queries search across all detected scripts simultaneously.
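A rough sketch of script detection by Unicode ranges (the exact ranges used internally may differ):

```rust
#[derive(Debug, PartialEq)]
enum Script {
    Cyrillic,
    Latin,
    Cjk,
}

// Collect every script present in the query, in order of first appearance.
fn detect_scripts(query: &str) -> Vec<Script> {
    let mut found = Vec::new();
    for c in query.chars() {
        let script = match c {
            '\u{0400}'..='\u{04FF}' => Some(Script::Cyrillic),
            'a'..='z' | 'A'..='Z' => Some(Script::Latin),
            '\u{4E00}'..='\u{9FFF}' => Some(Script::Cjk),
            _ => None,
        };
        if let Some(s) = script {
            if !found.contains(&s) {
                found.push(s);
            }
        }
    }
    found
}

fn main() {
    assert_eq!(detect_scripts("函数"), vec![Script::Cjk]);
    assert_eq!(detect_scripts("поиск функций"), vec![Script::Cyrillic]);
    // A mixed query triggers all detected scripts at once.
    assert_eq!(detect_scripts("functions функции").len(), 2);
}
```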

Soft Error Handling

Invalid parameters don’t cause errors; they are normalized to safe values:

  • Invalid target → search across all projects
  • limit > 10 → capped to 10
  • Non-existent path → returns an empty result, not an error

AST Parsing

Vibe Analyzer uses tree-sitter — an incremental parser that builds a concrete syntax tree (CST) from source code. The CST is then traversed to extract meaningful elements: functions, classes, imports, variables, documentation, and multilingual search tags.

How It Works

Source code → tree-sitter → CST → recursive traversal → AstData (structured data + tags)

Process for each file:

  1. Parser selection — the appropriate LanguageParser is chosen from the static PARSERS registry based on file extension
  2. CST parsing — tree-sitter builds the tree
  3. Recursive traversal — the visit_node function traverses all nodes and collects meaningful elements
  4. Post-processing — deduplication, sorting, trimming function/class bodies
  5. Tag generation — with_tags() adds multilingual tags (EN/RU/ZH)

Parser Registry

| Extensions | Parser |
| --- | --- |
| `rs` | RustParser |
| `py` | PythonParser |
| `js`, `jsx` | JavaScriptParser |
| `ts`, `tsx` | TypeScriptParser |
| `java` | JavaParser |
| `go` | GoParser |
| `cs` | CSharpParser |
| `kt`, `kts` | KotlinParser |
| `swift` | SwiftParser |
| `dart` | DartParser |
| `sh`, `bash`, `zsh` | BashParser |
| `bat`, `cmd` | BatchParser |
| `ets`, `arkts` | ArkTsParser |
| `md`, `markdown` | MarkdownParser |

Each parser implements the LanguageParser trait:

pub trait LanguageParser: Send + Sync {
    fn parse(&self, content: &str) -> Result<AstData>;
    fn language_name(&self) -> &'static str;
}

AstData Structure

pub struct AstData {
    pub header_comments: Vec<String>,                 // Module comments
    pub imports: Vec<String>,                         // Imports
    pub variables: Vec<AstDataVariable>,              // Variables/constants
    pub functions: Vec<AstDataFunction>,              // Functions/methods
    pub classes: Vec<AstDataClass>,                   // Classes
    pub structs: Vec<AstDataStruct>,                  // Structs
    pub enums: Vec<AstDataEnum>,                      // Enums
    pub interfaces: Vec<AstDataInterface>,            // Interfaces/traits
    pub frontmatter: Option<HashMap<String, String>>, // Frontmatter (Markdown)
    pub headings: Vec<AstHeading>,                    // Headings (Markdown)
    pub links: Vec<AstLink>,                          // Links (Markdown)
    pub code_blocks: Vec<String>,                     // Code block languages (Markdown)
    pub tags: Vec<String>,                            // Multilingual tags
}

Each element (function, class, etc.) has a signature and doc comments:

pub struct AstDataFunction {
    pub signature: String,     // "pub async fn scan_source(source_path: &Path, ...)"
    pub comments: Vec<String>, // ["Scans a source and returns complete analysis results"]
}

Comment Extraction

Vibe Analyzer distinguishes three types of comments:

1. Header Comments

Describe the purpose of the entire file. Stored in header_comments.

Detection rules:

  • The comment is at the beginning of the file (first node in the CST)
  • Or all preceding sibling nodes are also comments (is_module_comment)
  • Not inside a function or class

Syntax by language:

| Language | Syntax | Parser |
| --- | --- | --- |
| Rust | `//!` or `/*! */` | `visit_node` → `line_comment`/`block_comment` starts with `//!` |
| Python | `"""..."""` at the beginning | `visit_node` → `expression_statement` → `string` (first node) |
| JS/TS/ArkTS | `/** */` at the beginning | `visit_node` → `comment` starts with `/**`, first sibling |
| Java/Kotlin | `/** */` at the beginning | `visit_node` → `block_comment` starts with `/**`, first sibling |
| C# | `/** */` or `///` at the beginning | `visit_node` → `comment` starts with `/**` or `///` |
| Swift | `///` at the beginning | `visit_node` → `comment` starts with `///`, first or after another comment |
| Dart | `///` at the beginning | `visit_node` → `comment` starts with `///`, `is_module_comment` |
| Go | `//` at the beginning | `visit_node` → `comment` starts with `//`, first sibling |
| Bash | `##` | `visit_node` → `comment` starts with `##` |
| Batch | `::` | `visit_node` → `comment` starts with `::`, not inside a label |

2. Doc Comments

Describe a specific element (function, class, struct). Stored in the comments field of the corresponding object.

Extraction algorithm:

  1. Take the target node (function, class, etc.)
  2. Walk backwards through sibling nodes
  3. Collect all doc comments, skipping attributes/annotations
  4. Stop at the first non-doc node
  5. Reverse the list (from farthest to closest)

Syntax by language:

| Language | Syntax | Extraction function |
| --- | --- | --- |
| Rust | `///`, `/** */`, `/*! */` | `extract_rust_doc_comments` — walks `prev_sibling`, skips `attribute_item` |
| Python | `"""..."""` (docstring) | `extract_python_docstring` — finds the first `string` in `block` → `expression_statement` |
| JavaScript | `/** */` (JSDoc) | `extract_js_doc_comments` — walks `prev_sibling` |
| TypeScript | `/** */` (JSDoc) | `extract_ts_doc_comments` — walks `prev_sibling` |
| Java | `/** */` (Javadoc) | `extract_java_doc_comments` — walks `prev_sibling`, only `block_comment` |
| Kotlin | `/** */` (KDoc) | `extract_kotlin_doc_comments` — finds `/** */` in prefix before node |
| C# | `///` or `/** */` | `extract_csharp_doc_comments` — walks `prev_sibling`, skips `attribute_list` |
| Swift | `///` or `/** */` | `extract_swift_doc_comments` — finds `/** */` in prefix before node |
| Dart | `///` | `extract_dart_doc_comments` — walks `prev_sibling` |
| Go | `//` | `extract_go_doc_comments` — walks lines in prefix before node |
| Bash | `#` before function | `extract_bash_doc_comments` — walks `prev_sibling`, functions only |
| Batch | `::` before label | `extract_batch_doc_comments` — walks `prev_sibling` |
| ArkTS | `/** */` | `extract_arkts_doc_comments` — walks `prev_sibling`, skips `export_declaration` |

3. Regular Comments

Everything else — //, #, REM, ; — is ignored by the parser.

Signature Extraction

Signatures of functions, classes, and other elements are trimmed at the first delimiter ({, =, ;, :) — the body is not stored.

Language-specific details:

| Language | Detail |
| --- | --- |
| Python | Trailing `:` is trimmed from the signature |
| JS/TS | Arrow functions — signature is extracted from the variable declaration with `=>` |
| TS | `export` prefix is added for exported elements |
| Go | Methods with receivers (`func (s *Service) Method()`) are detected and extracted as functions |
| Kotlin | Trimmed at `{` or `=` for expression bodies |
| Batch | Functions are `:label`, variables are trimmed from `set` |
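The trimming rule can be sketched like this (a simplified approximation; `trim_signature` is a hypothetical name, and language-specific cases beyond Python's trailing colon are omitted):

```rust
// Cut a declaration at the first body delimiter, keeping only the
// signature. A trailing ':' (Python) is removed separately, since ':'
// also appears inside type annotations.
fn trim_signature(decl: &str) -> String {
    let cut = decl.find(['{', '=', ';']).unwrap_or(decl.len());
    decl[..cut].trim_end().trim_end_matches(':').to_string()
}

fn main() {
    // Rust: trimmed at the body brace.
    assert_eq!(
        trim_signature("pub async fn scan_source(path: &Path) -> Result<()> { ... }"),
        "pub async fn scan_source(path: &Path) -> Result<()>"
    );
    // Python: trailing ':' is trimmed.
    assert_eq!(trim_signature("def scan(path):"), "def scan(path)");
    // Kotlin expression body: trimmed at '='.
    assert_eq!(trim_signature("fun area(r: Double) = 3.14 * r * r"), "fun area(r: Double)");
}
```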

Language-Specific Implementation Details

  • Python — does not use child_by_field_name("body") due to a tree-sitter bug; manual traversal is used instead
  • Swift — classes, structs, and enums come in a single class_declaration node, distinguished by text
  • TypeScript/ArkTS — doc comments for exported elements are looked up on the parent export_statement
  • Java — methods inside classes are not extracted as separate top-level functions
  • Go — methods with receivers are detected via is_method and are not duplicated
  • Kotlinsealed class is skipped
  • Batch — variables inside labels are not extracted
  • Markdown — frontmatter is parsed manually, heading previews are cleaned of formatting

AST Export

Results can be exported in several formats:

# AST only
vibe-analyzer scan ast

# With export
vibe-analyzer scan ast --target my-app --format json5 --output analysis.json5

Supported formats: JSON, JSON5, TOON, XML.

LLM Enrichment

After AST parsing, Vibe Analyzer can enrich results via Ollama: adding a description and search tags to each file, and a brief summary to the project based on the README.

How It Works

AST data → batching → Ollama request → description + tags for each file

1. Project Summarization

The README (if present) is sent to Ollama with the prompt “write 2-3 sentences about the project”. The result is saved in summary.

2. File Enrichment

Batching. Files are grouped into batches based on two config limits:

  • max_chunk_chars — maximum characters per request (default 4000)
  • max_chunk_files — maximum files per request (default 3)
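A greedy batching pass under these two limits might look like this (an illustrative sketch; the real batcher may differ in detail):

```rust
// Group (path, ast_chars) pairs so that no batch exceeds
// `max_chunk_chars` characters or `max_chunk_files` files.
fn batch_files<'a>(
    files: &[(&'a str, usize)],
    max_chunk_chars: usize,
    max_chunk_files: usize,
) -> Vec<Vec<&'a str>> {
    let mut batches: Vec<Vec<&str>> = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut chars = 0usize;
    for &(path, size) in files {
        // Close the current batch when either limit would be exceeded.
        if !current.is_empty()
            && (current.len() >= max_chunk_files || chars + size > max_chunk_chars)
        {
            batches.push(std::mem::take(&mut current));
            chars = 0;
        }
        current.push(path);
        chars += size;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}

fn main() {
    let files = [("a.rs", 3000), ("b.rs", 2000), ("c.rs", 500), ("d.rs", 500), ("e.rs", 100)];
    // Defaults from the config: 4000 chars, 3 files per request.
    let batches = batch_files(&files, 4000, 3);
    assert_eq!(batches, vec![vec!["a.rs"], vec!["b.rs", "c.rs", "d.rs"], vec!["e.rs"]]);
}
```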

The prompt does not contain the files themselves, but their AST: functions, classes, structs, and other elements. Which elements to include is controlled by the flags ast_imports, ast_variables, ast_functions, ast_enums, ast_interfaces.

Prompt. Ollama receives a JSON template:

{
  "files": [
    {
      "path": "src/main.rs",
      "description": "FILL_DESCRIPTION",
      "tags": ["TAG1", "TAG2", "TAG3"]
    }
  ]
}

The model must fill in description and tags while preserving the structure. The prompt strictly requires: don’t skip files, don’t change paths, copy the JSON as-is.

Response. Ollama returns the completed JSON:

{
  "files": [
    {
      "path": "src/main.rs",
      "description": "Main entry point for the CLI application",
      "tags": ["entry-point", "cli", "argument-parsing"]
    }
  ]
}

3. Parallel Processing

If multiple Ollama hosts are configured, files are distributed among them:

  • All hosts read from a single channel
  • The fastest one processes the most
  • If any host errors — all stop (error_flag)
  • At the end, per-host statistics are reported: how many files each processed

4. JSON Repair

LLMs often corrupt JSON: add comments, wrap in markdown blocks, drop quotes. clean_llm_json fixes this:

  • Extracts JSON from ``` blocks
  • Adds missing key quotes
  • Removes trailing commas
  • Balances unclosed braces
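A simplified sketch of two of these repairs, fence extraction and trailing-comma removal (key re-quoting and brace balancing are omitted here):

```rust
// Simplified repair sketch: extract the payload from a markdown fence and
// drop trailing commas. The real clean_llm_json also re-quotes keys and
// balances unclosed braces.
fn clean_llm_json(raw: &str) -> String {
    // If the response is wrapped in ``` fences, keep only the fenced body.
    let body = match raw.find("```") {
        Some(start) => {
            let after = &raw[start + 3..];
            // Skip an optional language tag on the opening fence line.
            let after = after.split_once('\n').map(|(_, rest)| rest).unwrap_or(after);
            after.split("```").next().unwrap_or(after)
        }
        None => raw,
    };
    // Drop commas that directly precede a closing brace or bracket.
    let chars: Vec<char> = body.chars().collect();
    let mut out = String::new();
    for (i, &c) in chars.iter().enumerate() {
        if c == ',' {
            if let Some(&next) = chars[i + 1..].iter().find(|ch| !ch.is_whitespace()) {
                if next == '}' || next == ']' {
                    continue;
                }
            }
        }
        out.push(c);
    }
    out.trim().to_string()
}

fn main() {
    assert_eq!(clean_llm_json("{\"a\": 1,}"), "{\"a\": 1}");
    assert_eq!(
        clean_llm_json("```json\n{\"files\": [1, 2,]}\n```"),
        "{\"files\": [1, 2]}"
    );
}
```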

5. Retries

If Ollama returns fewer files than were in the batch — up to 5 retries with delay. If after 5 attempts it still doesn’t match — an error with recommendations to reduce max_chunk_chars or switch models.

Generation Parameters

The following are passed from config to the Ollama request:

  • temperature: 0.1 — low temperature for stable results
  • seed: 42 — fixed seed for reproducibility
  • num_ctx: 4096 — context window size
  • num_predict: 2048 — maximum tokens in response
  • timeout_secs: 60 — request timeout

Model Warm-Up

Before enrichment begins, for each Ollama host:

  1. Availability check (GET /)
  2. Model presence check (GET /api/tags)
  3. Empty request to load the model into memory (POST /api/generate with empty prompt)

Enrichment Result

After processing, each file receives description and tags from the LLM, and the project receives a summary based on the README.

Exporting Results

AST parsing and LLM enrichment results can be exported to a file for analysis, debugging, or use in other tools:

# AST with LLM enrichment
vibe-analyzer scan analyze --target my-app

# With format and path specified
vibe-analyzer scan ast --target my-app --format json5 --output analysis.json5

Search and Indexing

Vibe Analyzer stores all data in OpenSearch and uses multilingual analyzers for search.

Three Indices

Three indices are created for each project:

| Index | Purpose | Contents |
| --- | --- | --- |
| `vibe_meta` | Metadata | 1 document per project: summary, license, README, statistics |
| `vibe_files_{hash}` | Content | One document per file: full contents (not indexed for search, store only) |
| `vibe_files_analysis_{hash}` | Search | One document per text file: AST, description, tags |

OpenSearch is configured with three analyzers:

  • russian_analyzer (type russian) — stemming for Russian
  • english_analyzer (type english) — stemming for English
  • chinese_analyzer (type chinese) — segmentation for Chinese

Each text field in vibe_files_analysis has three sub-fields — one per analyzer. This allows searching for “функции”, “functions”, and “函数” with correct morphology for each language.

Search Mechanics

Documentation Search (search_documentation)

The most complex query. Algorithm:

  1. Script detection in the query — Cyrillic, Latin, CJK
  2. Word extraction (longer than 2 characters)
  3. Wildcard search on headings with 10.0 boost + stemming for long words
  4. Language-specific match queries — for each detected script, a separate query to the corresponding sub-field with fuzziness
  5. Boost for knowledge documents — if the frontmatter contains knowledge: true, the document gets a 5.0 boost

Ranking priority:

  • Headings (headings.title) — 10.0 boost
  • Preview (headings.preview) — 2.0 boost
  • Links (links.text) — 2.0 boost
  • Tags (tags) — 1.0 boost

Each search type has its own strategy:

  • Imports — match on the ast.imports field + tags
  • Functions — match_phrase_prefix on signatures + match on comments (nested queries)
  • Classes/structs/interfaces — three nested queries in should with minimum_should_match: 1
  • Variables/enums — match on signatures and comments (nested queries)

All code searches use fuzziness: AUTO for fuzzy matching and boost tags higher than specific fields.

Incremental Indexing

Vibe Analyzer doesn’t re-index files unnecessarily:

  1. Fetching hashes from OpenSearch via Scroll API — GET /{index}/_search?scroll=1m
  2. Comparison — a BLAKE3 hash is computed for each file and compared against the indexed one
  3. Skipping unchanged — files with matching hashes are not processed

If the --force flag is passed, hashes are ignored — all files are indexed.
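The comparison step can be sketched as follows (illustrative only; the real implementation hashes contents with BLAKE3, while this std-only sketch substitutes `DefaultHasher`, and `files_to_index` is a hypothetical helper):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Stand-in for the BLAKE3 content hash, using only the standard library.
fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

// Return the paths that must be (re)indexed: new files, changed files,
// or everything when `force` is set.
fn files_to_index<'a>(
    on_disk: &[(&'a str, &'a str)],
    indexed: &HashMap<&'a str, u64>,
    force: bool,
) -> Vec<&'a str> {
    on_disk
        .iter()
        .filter(|(path, content)| force || indexed.get(path) != Some(&content_hash(content)))
        .map(|(path, _)| *path)
        .collect()
}

fn main() {
    let indexed: HashMap<&str, u64> =
        [("src/main.rs", content_hash("fn main() {}"))].into_iter().collect();
    let on_disk = [("src/main.rs", "fn main() {}"), ("src/lib.rs", "pub mod cli;")];
    // Unchanged files are skipped.
    assert_eq!(files_to_index(&on_disk, &indexed, false), vec!["src/lib.rs"]);
    // --force ignores hashes and re-indexes everything.
    assert_eq!(files_to_index(&on_disk, &indexed, true).len(), 2);
}
```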

Bulk Indexing

All documents are written to OpenSearch in batches via the Bulk API in NDJSON format:

{"index": {"_index": "vibe_files_xxx", "_id": "src/main.rs"}}
{"root": "/project", "path": "src/main.rs", "content": "..."}
{"index": {"_index": "vibe_files_xxx", "_id": "src/lib.rs"}}
{"root": "/project", "path": "src/lib.rs", "content": "..."}

The document ID is the file path (path). This ensures that re-indexing updates the existing document rather than creating a duplicate.
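Building such a Bulk body with the path as `_id` can be sketched like this (a dependency-free illustration that assembles JSON by hand; `bulk_body` is a hypothetical helper, and real code would use a JSON library with proper escaping):

```rust
// Build an NDJSON Bulk API body: one action line plus one source line per
// document, with the file path as the document id so re-indexing
// overwrites instead of duplicating.
fn bulk_body(index: &str, docs: &[(&str, &str)]) -> String {
    let mut body = String::new();
    for (path, content) in docs {
        // Action line: target index + document id.
        body.push_str(&format!(
            "{{\"index\": {{\"_index\": \"{}\", \"_id\": \"{}\"}}}}\n",
            index, path
        ));
        // Source line: the document itself.
        body.push_str(&format!(
            "{{\"path\": \"{}\", \"content\": \"{}\"}}\n",
            path, content
        ));
    }
    body
}

fn main() {
    let body = bulk_body("vibe_files_xxx", &[("src/main.rs", "..."), ("src/lib.rs", "...")]);
    // Two lines per document: action + source.
    assert_eq!(body.lines().count(), 4);
    assert!(body.contains("\"_id\": \"src/main.rs\""));
    print!("{body}");
}
```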

Orphaned Data Cleanup

cleanup runs automatically during indexing:

  1. Index removal for deleted projects
  2. Document removal for files no longer on disk (comparing paths in the index and on the filesystem)
  3. Meta-document removal for projects removed from the configuration

Project Statistics

show_stats_search collects aggregated statistics across all indexed files via the Scroll API. This enables:

  • Project reports — language breakdown, file count, lines, AST objects
  • Data presence checks — if statistics are empty, indexing hasn’t been performed or the project hasn’t been added
  • Codebase size estimation — total size, text and binary file counts

Aggregation runs across all documents from files_analysis:

  • Language grouping (via get_language_name)
  • AST object counting: sum of functions, classes, structs, enums, interfaces, variables, imports, headings, links, code blocks
  • Other — files without a detectable language
  • Languages sorted by lines of code descending

Integrations

Vibe Analyzer provides an MCP server that AI assistants can connect to via the Model Context Protocol. Once connected, the model gains 11 tools for searching code and documentation.

How It Works from the User’s Perspective

The user communicates with the AI assistant in natural language. The model decides which tool to call. Examples from real testing scenarios:

Code Search

| User Query | Tool | What Happens |
| --- | --- | --- |
| “Find add functions in the samples project” | `search_by_code_functions` | Searches for functions with `add` in the signature, returns files and signatures |
| “What classes are in samples?” | `search_by_code_classes` | Returns all classes, structs, interfaces |
| “Show all enums in samples” | `search_by_code_variables` | Enums are also searched through this tool |
| “What libraries are used in samples?” | `search_by_code_imports` | List of all imports in the project |
| “List files that have the MAX_VALUE constant” | `search_by_code_variables` | Search by constant name |

File Viewing

| User Query | Tool |
| --- | --- |
| “Show the contents of `src/main.rs`” | `get_file_content` |
| “Show the structure of `main.py`” | `get_file_ast` |
| “What functions are in `src/main.rs`?” | `get_file_ast` |
| “Open `utils.py`” | `get_file_content` |
Documentation Search

| User Query | Tool |
| --- | --- |
| “Who is Zizikosh?” | `search_documentation` |
| “Tell me about Kukyrbur’s abilities” | `search_documentation` |
| “Find Python coding guidelines” | `search_documentation` |
| “Show the release process” | `search_documentation` |
| “Find the code review checklist” | `search_documentation` |

Project Navigation

| User Query | Tool |
| --- | --- |
| “What projects are in the database?” | `show_projects` |
| “Show the tree of the samples project” | `show_tree` |
| “How many files are in knowledge?” | `show_stats` |
| “Show overall statistics for all projects” | `show_stats` |

Administration

| User Query | Tool |
|---|---|
| “Update the index” | admin_sync |
| “Reindex projects” | admin_sync |

How to Phrase Queries

The model understands queries in natural language. You don’t need to use exact tool names — plain language is enough.

Good:

  • “Find add functions in the samples project”
  • “What classes are in samples?”
  • “Show the contents of src/main.rs”
  • “Who is Zizikosh?”

Unnecessary (the model will understand via AliasHandler anyway, but it’s better to avoid):

  • “Call search_by_code_functions with query=add”
  • “Use the get_file_content tool for path=src/main.rs”

Important Notes

  • Project names — you can use the full path or directory name: "samples" or "/path/to/samples"
  • File paths — relative to the project root: "src/main.rs", partial matching is supported
  • Result limit — default 3, maximum 10. If the model requests “all”, the limit is automatically raised
  • One call is enough — the model is trained to respond after a single tool call, no need to ask again
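The result-limit rule above can be sketched as a small helper. The function name is illustrative, and the detail that “all” raises the limit to the maximum of 10 is my assumption; the source only says the limit is raised automatically.

```rust
// Sketch of the result-limit rule: default 3, maximum 10.
// Assumption (not stated in the docs): "all" raises the limit to the maximum.
fn effective_limit(requested: Option<u32>, wants_all: bool) -> u32 {
    const DEFAULT: u32 = 3;
    const MAX: u32 = 10;
    if wants_all {
        MAX
    } else {
        requested.unwrap_or(DEFAULT).min(MAX)
    }
}

fn main() {
    assert_eq!(effective_limit(None, false), 3);      // default
    assert_eq!(effective_limit(Some(25), false), 10); // capped at the maximum
    assert_eq!(effective_limit(Some(5), false), 5);   // explicit request honored
    assert_eq!(effective_limit(None, true), 10);      // "all" raises the limit
}
```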

Connecting to Open WebUI

  1. Start the MCP server:

    vibe-analyzer serve start
    
  2. In Open WebUI settings, add a new MCP server:

    • URL: http://localhost:9020
    • Transport: Streamable HTTP
  3. Tools appear automatically

Connecting to Claude Desktop

Add to the configuration:

{
  "mcpServers": {
    "vibe-analyzer": {
      "url": "http://localhost:9020",
      "transport": "streamable-http"
    }
  }
}

MCP Protocol

Supported versions: 2024-11-05, 2025-03-26, 2025-06-18, latest. Configured in the settings:

{
  "mcp": {
    "host": "127.0.0.1",
    "port": 9020,
    "protocol": "latest"
  }
}

Security

  • Server without authentication — for trusted networks or localhost
  • Default host 127.0.0.1 (local only)
  • 0.0.0.0 — for access from Docker containers or other machines
  • The server only reads data; admin_sync is the only tool that triggers background indexing

Testing

Vibe Analyzer uses two types of tests: unit tests for parsers and end-to-end tests for MCP tools.

Parser Unit Tests

Each of the 13 languages has a test that verifies AST parsing correctness using snapshot testing:

Source file → parser → AST → comparison with reference JSON

Example test (Rust):

#[test]
fn test_rust_parser() {
    let code = fs::read_to_string("tests/parsers/fixtures/rust/sample.rs").unwrap();
    let json = fs::read_to_string("tests/parsers/fixtures/rust/sample.json").unwrap();
    let expected: serde_json::Value = serde_json::from_str(&json).unwrap();

    let ast = parse_ast(&code, "rs").unwrap().unwrap();
    let actual = serde_json::to_value(&ast).unwrap();

    assert_eq!(actual, expected);
}

Fixture structure:

tests/parsers/fixtures/
├── rust/
│   ├── sample.rs      ← source code
│   └── sample.json    ← expected AST
├── python/
│   ├── sample.py
│   └── sample.json
├── markdown/
│   ├── sample.md
│   └── sample.json
└── ... (a pair of files per language)

All parser tests:

| Test | File | Language |
|---|---|---|
| test_rust_parser | rust_test.rs | Rust (3 tests: sample, sample2, sample3) |
| test_python_parser | python_test.rs | Python |
| test_javascript_parser | javascript_test.rs | JavaScript |
| test_typescript_parser | typescript_test.rs | TypeScript |
| test_java_parser | java_test.rs | Java |
| test_go_parser | go_test.rs | Go |
| test_csharp_parser | csharp_test.rs | C# |
| test_kotlin_parser | kotlin_test.rs | Kotlin |
| test_swift_parser | swift_test.rs | Swift |
| test_dart_parser | dart_test.rs | Dart |
| test_bash_parser | bash_test.rs | Bash |
| test_batch_parser | batch_test.rs | Batch |
| test_arkts_parser | test_arkts.rs | ArkTS |
| test_markdown_parser | markdown_test.rs | Markdown |

Run:

cargo test --test parsers_test

End-to-End MCP Tool Tests

E2E tests verify the full cycle: an AI model receives a query, selects a tool, calls it, and returns a response.

How It Works

Scenario (JSON) → Ollama model → MCP tool call → result verification

Two-turn dialog:

  1. Turn 1 (with tools): the model receives a query and must call exactly one tool
  2. Turn 2 (without tools): the model receives the tool result and must provide a final text response

If the model calls a second tool instead of responding — it’s an error.

Test Scenarios

Scenarios are stored in JSON files:

tests/mcp/fixtures/scenarios/
├── admin_sync.json
├── get_file_ast.json
├── get_file_content.json
├── search_by_code_classes.json
├── search_by_code_functions.json
├── search_by_code_imports.json
├── search_by_code_variables.json
├── search_documentation.json
├── show_projects.json
├── show_stats.json
└── show_tree.json

Example scenario (search_by_code_functions.json):

{
  "tool": "search_by_code_functions",
  "queries": [
    "Find add functions in the 'samples' project",
    "What methods are in 'samples'",
    "Show all main functions in 'samples'",
    "Find calculate functions in 'samples'",
    "List files that have the multiply function"
  ]
}

Each scenario contains 5 queries in Russian and English — simple, one-sentence, without specifying the exact tool name.

Models for Testing

const MODELS: &[&str] = &[
    "qwen2.5-coder:3b-instruct",
    "qwen2.5-coder:7b-instruct",
    "qwen2.5-coder:14b-instruct",
];

By default, tests run on qwen2.5-coder:3b-instruct — the smallest model that should work correctly.

Extracting JSON from Model Responses

The model may return a response in different formats. extract_json handles all variants:

| Response Format | Handling |
|---|---|
| ```` ```json { ... } ``` ```` | Extracted from the markdown block |
| ```` ``` { ... } ``` ```` | Extracted from the block without a language specifier |
| `{ ... }` | Used as-is |
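The fence-stripping part of this behavior can be sketched as follows. This is a minimal illustration of the three cases, not the real extract_json implementation; the function name and exact trimming rules are assumptions.

```rust
/// Strip optional markdown code fences (with or without a `json` language
/// tag) from a model response, leaving the raw JSON payload.
/// A sketch of the behavior described above, not the real extract_json.
fn strip_fences(raw: &str) -> &str {
    let s = raw.trim();
    if let Some(rest) = s.strip_prefix("```") {
        // Drop an optional language tag on the opening fence.
        let rest = rest.strip_prefix("json").unwrap_or(rest);
        // Drop the closing fence if present.
        let rest = rest.strip_suffix("```").unwrap_or(rest);
        return rest.trim();
    }
    s // no fence: use as-is
}

fn main() {
    assert_eq!(strip_fences("```json\n{\"a\":1}\n```"), "{\"a\":1}");
    assert_eq!(strip_fences("```\n{\"a\":1}\n```"), "{\"a\":1}");
    assert_eq!(strip_fences("{\"a\":1}"), "{\"a\":1}");
}
```

The cleaned string would then be handed to a JSON parser such as serde_json.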

Parsing Tool Calls

parse_tool_call looks for the tool name in several JSON fields (models name them differently):

let name = parsed
    .get("name")       // standard
    .or_else(|| parsed.get("function"))  // OpenAI-style
    .or_else(|| parsed.get("tool"))      // alternative
    .or_else(|| parsed.get("method"))    // another variant
    .or_else(|| parsed.get("call"));     // and another
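The same fallback chain can be shown as a self-contained function, here sketched over a plain string map instead of a serde_json::Value so it runs without the crate; the function name is illustrative.

```rust
use std::collections::HashMap;

// Try the field names different models use for the tool name, in priority
// order, mirroring the parse_tool_call fallback chain above (sketch only).
fn find_tool_name<'a>(parsed: &HashMap<&str, &'a str>) -> Option<&'a str> {
    ["name", "function", "tool", "method", "call"]
        .iter()
        .find_map(|key| parsed.get(key).copied())
}

fn main() {
    let mut openai_style = HashMap::new();
    openai_style.insert("function", "get_file_content");
    assert_eq!(find_tool_name(&openai_style), Some("get_file_content"));

    let mut standard = HashMap::new();
    standard.insert("name", "show_tree");
    standard.insert("arguments", "{}");
    // "name" wins because it is checked first.
    assert_eq!(find_tool_name(&standard), Some("show_tree"));
}
```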

Test Infrastructure

A custom framework was developed for E2E tests that automatically sets up the entire environment:

  • OpenSearch — via Docker Compose with fixtures from tests/mcp/fixtures/opensearch/docker-compose.yml
  • MCP server — started automatically on port 9021
  • Fixtures — test projects samples and knowledge with legendary characters
  • Ollama — must be running beforehand with the required model

The framework manages the entire lifecycle: starting services, indexing fixtures, running scenarios, saving reports, and stopping the environment on completion.

Reports

After each query, an intermediate report is saved; after each scenario, a final one:

{
  "test_name": "search_by_code_functions",
  "model": "qwen2.5-coder:3b-instruct",
  "timestamp": "2026-04-28T12:00:00Z",
  "queries": [
    {
      "query": "Find add functions in the 'samples' project",
      "tool_calls": [
        {
          "name": "search_by_code_functions",
          "args": "{\"query\":\"add\",\"target\":\"samples\"}",
          "result": "[{...}]"
        }
      ],
      "response": "Found function add in file src/lib.rs...",
      "duration_ms": 1234
    }
  ],
  "summary": {
    "total_queries": 5,
    "successful_tool_calls": 5,
    "total_duration_ms": 6170,
    "avg_response_time_ms": 1234
  }
}

Running

# Parser unit tests only (fast)
cargo test --test parsers_test

# Full E2E tests (require Docker + Ollama)
cargo test --test mcp_test -- --ignored --nocapture

Logging

Tests write a structured log to tests/reports/<timestamp>/mcp_test.log and simultaneously output to the terminal. Output is filtered by level: INFO shows progress, DEBUG shows model responses, TRACE shows everything including raw docker and MCP server output.

Expected Model Behavior

The test verifies that the model:

  1. Called a tool on the first turn — if not, error Model did not call a tool
  2. Did not call a non-existent tool — if it did, error TOOL_NOT_FOUND
  3. The tool returned a non-null result — if null, error tool returned null
  4. Provided a text response on the second turn — if it called another tool, error Model called second tool
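The four checks above can be sketched as one validation function. The signature and types are illustrative assumptions about the harness, but the four error messages are the ones the docs name, checked in the listed order.

```rust
// Sketch of the E2E verdict logic (names and signature are assumptions).
fn validate(
    first_turn_tool: Option<&str>, // tool called on turn 1, if any
    known_tools: &[&str],          // the 11 registered MCP tools
    tool_result: Option<&str>,     // what the tool returned
    second_turn_is_text: bool,     // did turn 2 produce a text answer?
) -> Result<(), String> {
    let tool = first_turn_tool.ok_or_else(|| "Model did not call a tool".to_string())?;
    if !known_tools.contains(&tool) {
        return Err(format!("TOOL_NOT_FOUND: {tool}"));
    }
    if tool_result.is_none() {
        return Err("tool returned null".to_string());
    }
    if !second_turn_is_text {
        return Err("Model called second tool".to_string());
    }
    Ok(())
}

fn main() {
    let tools = ["search_by_code_functions", "show_stats"];
    assert!(validate(Some("show_stats"), &tools, Some("{}"), true).is_ok());
    assert_eq!(
        validate(None, &tools, None, true),
        Err("Model did not call a tool".to_string())
    );
    assert_eq!(
        validate(Some("show_stats"), &tools, Some("{}"), false),
        Err("Model called second tool".to_string())
    );
}
```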

Recipes

Practical examples of using Vibe Analyzer through an AI assistant.

Exploring an Unfamiliar Project

Goal: you’re seeing a project for the first time and want to quickly understand its structure.

Assistant dialog:

You: What projects are in the database?
AI: [show_projects] Project "backend" available — payment processing service.

You: Show the structure of backend
AI: [show_tree] backend/
├── src/
│   ├── main.rs
│   ├── api/
│   │   ├── mod.rs
│   │   ├── payments.rs
│   │   └── refunds.rs
│   ├── db/
│   │   ├── mod.rs
│   │   └── models.rs
│   └── config.rs
├── migrations/
├── tests/
└── Cargo.toml

You: Show statistics for backend
AI: [show_stats]
Rust        45 files    8,200 lines    420 AST objects
SQL         12 files    1,500 lines      —
Markdown     5 files      300 lines     25 AST objects
Total       62 files   10,000 lines    445 AST objects

You: What are the main functions in api/payments.rs?
AI: [get_file_ast for api/payments.rs]
- process_payment(amount, currency)
- validate_card(card_number)
- create_transaction(order_id)
- refund_payment(transaction_id)

Finding Dependencies

Goal: find out what libraries are used in a project.

You: What libraries are used in backend?
AI: [search_by_code_imports]
- serde (serialization)
- tokio (async runtime)
- sqlx (PostgreSQL)
- reqwest (HTTP client)
- stripe (payment API)

You: Where is stripe used?
AI: [search_by_code_imports with query="stripe"]
- src/api/payments.rs
- src/api/refunds.rs
- src/api/webhooks.rs

Finding an Implementation

Goal: find where a specific function is implemented.

You: Find the calculate_total function
AI: [search_by_code_functions with query="calculate_total"]
Found in src/orders.rs:
  pub fn calculate_total(items: &[OrderItem]) -> f64
  "Calculates total order amount with discounts applied"

Searching Documentation

Goal: find rules or processes.

You: Find Python coding guidelines
AI: [search_documentation with query="Python coding guidelines"]
Found in docs/guidelines/python.md:
  # Python Coding Guidelines
  Follow PEP 8. Use snake_case, type annotations, docstrings.

You: What is the release process?
AI: [search_documentation with query="release process"]
Found in docs/process/release.md:
  # Release Process
  1. All milestone tasks closed
  2. Create a release branch from develop
  3. Update version in Cargo.toml
  4. Merge into main
  5. Notify the team

Searching the Knowledge Base

Goal: find information about project characters or legends.

You: Who is Kirikur?
AI: [search_documentation with query="Kirikur"]
Found in knowledge/legends/kirikur_programmer.md:
  # Kirikur - Programmer
  Backend developer, stack: Rust, Python, PostgreSQL.
  Works at a food delivery startup.
  Rewrote legacy code in Rust in 3 months.
  Favorite phrase: "It works on my machine"

You: What is Bakobur's dream?
AI: [search_documentation with query="Bakobur's dream"]
Found in knowledge/legends/bakobur_homeless.md:
  Dream: Warm socks and a doghouse for Sharik

Incremental Update

Goal: update the index after code changes.

You: Update the index
AI: [admin_sync]
Indexing started. Projects are updating.

# After some time, you can check:
You: Show backend statistics
AI: [show_stats]
Rust        47 files    8,500 lines    445 AST objects
...

Inspecting a Specific File

Goal: view the contents or structure of a specific file.

You: Show the contents of Cargo.toml
AI: [get_file_content]
[package]
name = "backend"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1", features = ["derive"] }
tokio = { version = "1", features = ["full"] }
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio"] }

You: What structs are in db/models.rs?
AI: [get_file_ast]
- struct Order { id, user_id, status, total, created_at }
- struct OrderItem { id, order_id, product_id, quantity, price }
- enum OrderStatus { Pending, Confirmed, Shipped, Delivered, Cancelled }

Tips

  • Start broad: show_projects → show_tree → show_stats
  • Refine with search: search_by_code_functions, search_documentation
  • Inspect details: get_file_content, get_file_ast
  • Update the index after changes: admin_sync
  • Use natural language — the model will choose the right tool automatically