Vibe Analyzer v0.0.5
Codebase analysis and Agentic RAG engine
AST extraction, LLM enrichment, static analysis, OpenSearch indexing. 11 MCP tools give AI models agency over search — they decide what to find and how. Minimal context, no embeddings, single tool call = complete answer.

# Quick Start

## 1. Install OpenSearch
## 2. Install Open WebUI

## 3. Install Vibe Analyzer
cargo install vibe-analyzer

## 4. Add source: project or knowledge
vibe-analyzer source add {path}

## 5. Index projects
vibe-analyzer scan index

## 6. Start MCP
vibe-analyzer serve start

Introduction

What is Vibe Analyzer

Vibe Analyzer is a codebase analysis and Agentic RAG engine. It extracts structure from source code via AST parsing, enriches it with an LLM, runs static analysis across 14 languages, and indexes everything into OpenSearch. AI assistants access the indexed knowledge through 11 MCP tools.

The Problem with Traditional RAG

Traditional RAG works like this:

Query → Embeddings → Find similar documents → Load into prompt → Response

Problems:

  • 📈 Found documents are added to the prompt in their entirety
  • 💾 The larger the project, the more VRAM is required
  • 🔍 Relevance drops as context volume grows
  • 💸 Each query becomes more expensive

How Agentic RAG Works

Vibe Analyzer flips the paradigm:

Query → AI model selects an MCP tool → Tool returns a structured response

Advantages:

  • 📉 Minimal context — the model receives only what the tool returns
  • 🧠 No embeddings — keyword and AST search via OpenSearch
  • 🔗 One tool call = complete answer, no document stuffing
  • ♾️ Context size stays constant regardless of project size

Key Features

  • 🌳 AST parsing for 14 programming languages
  • 🔍 Static analysis: code markers (TODO, FIXME, HACK), warnings (unwrap, panic) across all languages
  • 💡 LLM enrichment: technical debt, bugs, refactoring suggestions, file summaries
  • 🖥️ Multi-node LLM cluster — distribute enrichment across local and cloud models
  • 🌐 Multi-provider support: Ollama, DeepSeek, Qwen
  • 📄 Export AST and analysis results to JSON, JSON5, TOML, TOON, XML, YAML
  • 📝 Semantic and morphological search across code and documentation
  • ⚡ Incremental indexing (modified files only)
  • 📦 Self-contained MCP tools (one call — complete response)
  • 🗂️ Multilingual support (RU, EN, ZH)
  • 🦀 Built in Rust — fast and memory-efficient

Who This Is For

  • Development teams: index your codebase and let AI assistants answer architecture questions
  • Developers under NDA: the entire stack (OpenSearch, Ollama, MCP server) can run locally, so no data is sent to external APIs
  • Private projects: models from 3B parameters run on your own hardware
  • Technical writers: store documentation in Markdown files and search it in Russian, English, or Chinese
  • Open-source projects: give contributors a quick way to understand the code

Quick Start

Prerequisites

  • OpenSearch — storage and search for indexed data
  • LLM provider — Ollama (local), DeepSeek, or Qwen for code enrichment

Both can be run locally via Docker.

Installation

# Via Cargo (recommended)
cargo install vibe-analyzer

# Or build from source
git clone https://gitcode.com/keygenqt_vz/vibe-analyzer.git
cd vibe-analyzer
cargo build --release

Starting Services

# OpenSearch
cd docker/opensearch && docker-compose up -d

# Open WebUI (optional, for AI assistant connection)
cd docker/open-webui && docker-compose up -d

Configuration

Config file: ~/.vibe-analyzer/config.json5 (created automatically on first run).

See Configuration for all options and detailed reference.

Usage

# Add a source
vibe-analyzer source add /path/to/project

# AST only (no LLM)
vibe-analyzer analyze export

# AST + LLM enrichment
vibe-analyzer analyze export -m meta,debt,errors

# Code files only, JSON5 format
vibe-analyzer analyze export -m debt -t code -f json5

# Index to OpenSearch
vibe-analyzer analyze index --target my-project

# Start MCP server for AI assistants
vibe-analyzer serve start

# View statistics
vibe-analyzer stats info

Architecture

Overview

Vibe Analyzer consists of four main components that work sequentially:

Code sources
      │
      ▼
┌─────────────┐
│   Scanner   │  AST parsing, structure extraction
└─────────────┘
      │
      ▼
┌─────────────┐
│   Analyzer  │  LLM enrichment, descriptions, tags
└─────────────┘
      │
      ▼
┌─────────────┐
│   Indexer   │  Writing to OpenSearch
└─────────────┘
      │
      ▼
┌─────────────┐
│  MCP Server │  HTTP API for AI assistants
└─────────────┘

Components

Scanner

The Scanner handles initial source code processing:

  • File system traversal — recursive directory scanning respecting .gitignore and default exclusion patterns (.git, target, node_modules, etc.)
  • Language detection — selects the appropriate tree-sitter parser based on file extension
  • AST parsing — extracts code structure: functions, classes, imports, variables, enums, interfaces, structs
  • Metadata collection — line count, file size, BLAKE3 content hash
  • License detection — searches for a LICENSE file and identifies the license type via askalono and SPDX
  • README detection — priority: root > subdirectories, .md > .txt > no extension
  • Statistics collection — aggregation by language, file count, lines of code
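
The README priority rule above can be read as a two-part sort key. The sketch below is only an illustration of that reading, not the Scanner's actual code: the function names are hypothetical, and it assumes that location (root before subdirectories) outranks extension and that file names are matched case-insensitively.

```python
from pathlib import PurePosixPath

# Extension preference taken from the list above: .md > .txt > no extension.
EXT_RANK = {".md": 0, ".txt": 1, "": 2}


def readme_rank(rel_path):
    """Sort key: shallower paths first, then the preferred extension."""
    path = PurePosixPath(rel_path)
    depth = len(path.parts) - 1  # 0 means the file sits in the source root
    return (depth, EXT_RANK[path.suffix.lower()])


def pick_readme(rel_paths):
    """Return the best README candidate, or None when there is none."""
    candidates = [
        p for p in rel_paths
        if PurePosixPath(p).stem.lower() == "readme"
        and PurePosixPath(p).suffix.lower() in EXT_RANK
    ]
    return min(candidates, key=readme_rank, default=None)
```

Under these assumptions a README.txt in the root is preferred over docs/README.md, and README.md is preferred over an extensionless README in the same directory.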

Analyzer

The Analyzer enriches scanner results using LLM and static analysis:

  • Static analysis — finds code markers (TODO, FIXME, HACK) and warnings (unwrap, panic, bare except) across 14 languages
  • LLM enrichment — four modes: Meta (summaries), Debt (technical debt), Errors (bugs), Advice (refactoring)
  • Prompt generation — builds a request for each file containing the AST structure
  • Request distribution — with multiple cluster nodes configured, prompts are distributed across workers via vibe-cluster. Local models (Ollama) get priority, with automatic load balancing across all nodes
  • Batch processing — files are grouped into batches limited by max_chunk_chars (Meta mode). Other modes process one file per prompt
  • Controlled generation — configurable parameters temperature, seed, num_ctx, num_predict for reproducible results
  • Project summarization — a separate request generates a brief description of the entire source

Indexer

The Indexer manages writing data to OpenSearch:

  • Three indices per source:
    • vibe_meta — project metadata (summary, license, statistics, README)
    • vibe_files_{hash} — full file contents
    • vibe_files_analysis_{hash} — AST, enriched descriptions, and search tags
  • Bulk operations — batch writing for maximum performance
  • Incremental updates — BLAKE3 hash comparison, only changed files are re-processed
  • Cleanup — removes stale data no longer present in the source

MCP Server

The MCP server provides an API for AI assistants:

  • Protocol — Model Context Protocol (MCP) via Streamable HTTP transport
  • 11 tools — admin, get, search, and show categories
  • Anti-Hallucination Protection — parameter normalization, tool name aliases, auto language detection
  • Logging — middleware for tracking all requests
  • CORS — cross-origin request support for web interfaces

Indexing Lifecycle

Full Indexing

1. source add → save path to config
2. scan index → check OpenSearch → cleanup orphaned data → AST parsing → LLM enrichment → indexing

Incremental Updates

1. scan index → load hashes from OpenSearch → compare with files on disk
2. New/modified → AST parsing → LLM enrichment → indexing
3. Deleted → removal from OpenSearch
4. Unchanged → skip
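
The four outcomes above amount to a comparison between stored hashes and the files currently on disk. A minimal sketch, assuming both sides are available as plain dictionaries (the names plan_update and content_hash are hypothetical). BLAKE3 is not part of the Python standard library, so BLAKE2b stands in for it here:

```python
import hashlib


def content_hash(data):
    # Stand-in hash: the document specifies BLAKE3, which the standard library lacks.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def plan_update(indexed, on_disk):
    """indexed: path -> stored hash; on_disk: path -> current file bytes."""
    plan = {"process": [], "delete": [], "skip": []}
    for path, data in on_disk.items():
        if indexed.get(path) == content_hash(data):
            plan["skip"].append(path)      # unchanged
        else:
            plan["process"].append(path)   # new or modified
    # Anything indexed but no longer on disk is removed from the index.
    plan["delete"] = [path for path in indexed if path not in on_disk]
    return plan
```

Only the paths under "process" would go through AST parsing, LLM enrichment, and indexing again.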

Export without Indexing

1. scan ast → traverse files → AST parsing → export to file
2. scan analyze → traverse files → AST parsing → LLM enrichment → export to file

scan ast and scan analyze do not touch OpenSearch — file export only.

OpenSearch Indices

vibe_meta

One document per project: summary, license, README, aggregated statistics (files, lines, size).

vibe_files_{hash}

One document per file: full contents. The content field is not indexed for search — only stored for retrieval via get_file_content.

vibe_files_analysis_{hash}

One document per text file. Contains AST (functions, classes, imports, etc.), file metadata, and multilingual search tags. The description and tags fields are added after LLM enrichment.

LLM Cluster

Vibe Analyzer uses vibe-cluster for distributing enrichment prompts across multiple LLM providers:

  • Multi-provider — supports Ollama (local), DeepSeek, and Qwen (cloud) simultaneously
  • Load balancing — prompts are distributed across all available nodes via atomic work-stealing
  • Local-first priority — local Ollama models get prompts before cloud providers for faster response
  • Parallel connections — configurable parallel parameter for multiple workers per provider
  • Automatic retry — network errors and server failures trigger automatic retries
  • Exclusive mode — eject and reload models between prompts for clean context
  • Per-node statistics — after enrichment, reports how many files each node processed

This approach allows:

  • Faster enrichment through parallel processing on multiple GPUs and cloud APIs
  • Cost optimization — local models handle most work, cloud used only when needed
  • Scaling — add more nodes to the cluster configuration as needed

Anti-Hallucination Protection

Protection against AI model hallucinations when calling tools:

| Mechanism | Description |
| --- | --- |
| Name aliases | 150+ alternative tool names (e.g., search_code_functions → search_by_code_functions) |
| Parameter normalization | Wildcard replacement, whitespace trimming, type casting |
| Bounds validation | limit always in 1–10 range, level capped |
| Auto language detection | Detects Cyrillic, Latin, and CJK in search queries |
| Soft error handling | Invalid parameters don’t cause errors; they are normalized to safe values |
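
To make the mechanisms above concrete, here is a small sketch of alias resolution and bounds validation. It is an illustration only: the alias table holds just the one pair quoted above, the function names are hypothetical, lowercasing the tool name is an assumption, and the fallback value of 3 mirrors the default limit listed for the search tools.

```python
# Hypothetical alias table; the real one is described as having 150+ entries.
ALIASES = {"search_code_functions": "search_by_code_functions"}


def normalize_tool_name(name):
    """Trim and lowercase the name, then map known aliases to the real tool."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


def normalize_limit(value, default=3, lo=1, hi=10):
    """Cast to int and clamp into [lo, hi]; bad input falls back to the default."""
    try:
        number = int(str(value).strip())
    except (TypeError, ValueError):
        return default
    return max(lo, min(hi, number))
```

With this shape, a call such as normalize_limit("50") yields 10 instead of an error, which matches the soft error handling described in the table.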

Performance

  • Rust — native execution without GC overhead
  • Parallel parsing — each file processed independently
  • Bulk OpenSearch writes — thousands of documents per operation
  • Streaming processing — files are processed as they are discovered, without waiting for the entire directory
  • Incremental updates — only changed files are re-indexed when updating a source

Supported Languages

Vibe Analyzer supports AST parsing, static analysis, and LLM enrichment for 14 languages.

Full List

| Language | Extensions |
| --- | --- |
| Rust | .rs |
| Python | .py |
| JavaScript | .js |
| TypeScript | .ts |
| Java | .java |
| Go | .go |
| C# | .cs |
| Kotlin | .kt |
| Swift | .swift |
| Dart | .dart |
| Bash | .sh |
| Batch | .bat |
| ArkTS | .ets |
| Markdown | .md |

Extracted Elements

Code

| Element | Description |
| --- | --- |
| functions | Functions and methods with signatures and doc comments |
| classes | Class declarations |
| structs | Struct and record declarations |
| enums | Enum declarations |
| interfaces | Interface, trait, and protocol declarations |
| variables | Module-level variables and constants |
| imports | Import statements and dependencies |
| header_comments | File-level documentation comments |

Markdown

| Element | Description |
| --- | --- |
| headings | Headings with level, title, and preview |
| links | Links with text and URL |
| code_blocks | Fenced code block languages |
| frontmatter | YAML frontmatter metadata |

Static Analysis

| Element | Description |
| --- | --- |
| markers | Code markers: TODO, FIXME, HACK, XXX, and 17 more |
| warnings | Potential issues: unwrap, panic, empty catch, console.log |

Doc Comment Formats

Vibe Analyzer extracts documentation from specially formatted comments. Regular comments (//, #) are ignored, except in the languages where the table below lists them as the documentation form (Go, Bash).

| Language | Doc Comment | Module Comment | Example |
| --- | --- | --- | --- |
| Rust | /// or /** */ | //! or /*! */ | /// Adds two numbers |
| Python | """...""" (docstring) | """...""" at file start | """Adds two numbers""" |
| JavaScript | /** */ (JSDoc) | /** */ at file start | /** @param {number} a */ |
| TypeScript | /** */ (JSDoc) | /** */ at file start | /** @param a First number */ |
| Java | /** */ (Javadoc) | /** */ at file start | /** @param a First number */ |
| Kotlin | /** */ (KDoc) | /** */ at file start | /** @param a First number */ |
| C# | /// or /** */ | /** */ or /// at file start | /// <summary>Adds two numbers</summary> |
| Swift | /// or /** */ | /// or /** */ at file start | /// - Parameters: a: First number |
| Dart | /// | /// at file start | /// Adds two numbers |
| Go | // (any before declaration) | // at file start | // Add adds two numbers |
| Bash | ## or # before function | ## at script start | ## Module documentation for Bash testing |
| Batch | :: before label | :: at script start | :: Module documentation for Batch testing |
| ArkTS | /** */ | /** */ at file start | /** Async function example */ |

Each element receives tags in EN, RU, and ZH for language-agnostic search.

Limitations

  • Maximum file size: 10 MB
  • Ignored directories: target, node_modules, .git, .idea, etc.
  • Binary files: excluded from parsing
  • Nested elements: methods inside classes extracted as functions; variables inside functions are not extracted

Configuration

Vibe Analyzer uses a JSON5 configuration file. JSON5 is an extended version of JSON with support for comments, trailing commas, and other convenient features.

Location

~/.vibe-analyzer/config.json5

The file is created automatically with default settings the first time any CLI command is run.

Configuration Structure

The configuration consists of these sections: version, log_level, opensearch, mcp, analyze, export, cluster, and sources.

Full Example with Comments

{
  // Configuration version (do not modify)
  "version": "0.0.5",

  // Log level: cli, error, warn, info, debug, trace
  "log_level": "cli",

  // OpenSearch connection
  "opensearch": {
    "host": "http://localhost:9200"
  },

  // MCP server
  "mcp": {
    "host": "127.0.0.1",
    "port": 9020,
    "protocol": "latest"
  },

  // Analysis settings
  "analyze": {
    "max_chunk_chars": 3000,
    "include": {
      "meta": false,
      "debt": false,
      "advice": false,
      "errors": false
    }
  },

  // Default export settings
  "export": {
    "format": "json",
    "output_dir": "/Users/keygenqt/Downloads",
    "types": {
      "code": true,
      "markdown": true,
      "text": true,
      "binary": true
    },
    "include": {
      "imports": true,
      "functions": true,
      "variables": true,
      "enums": true,
      "interfaces": true,
      "classes": true,
      "structs": true,
      "comments": true,
      "header_comments": true,
      "headings": true,
      "links": true,
      "code_blocks": true,
      "markers": false,
      "warnings": false,
      "body": false
    }
  },

  // LLM cluster nodes (local and cloud models)
  "cluster": [
    {
      "provider": "ollama",
      "host": "http://localhost:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "parallel": 1
    },
    {
      "provider": "deepseek",
      "host": "https://api.deepseek.com/v1",
      "model": "deepseek-v4-flash",
      "api_key": "sk-...",
      "timeout_secs": 120,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "parallel": 2
    }
  ],

  // Knowledge sources for indexing
  "sources": ["/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer"]
}

Sections in Detail

version

{
  "version": "0.0.5"
}

Configuration file version. Do not modify manually — updated automatically during config migration between versions.

log_level

{
  "log_level": "cli"
}

Application log level. Supported values: cli, error, warn, info, debug, trace. The cli level disables tracing output for clean command-line output.

opensearch

{
  "opensearch": {
    "host": "http://localhost:9200"
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| host | string | http://localhost:9200 | OpenSearch server URL. Can point to a local or remote server |

mcp

{
  "mcp": {
    "host": "127.0.0.1",
    "port": 9020,
    "protocol": "latest"
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| host | string | 127.0.0.1 | Server bind address. 0.0.0.0: accessible externally (Docker, remote clients); 127.0.0.1: local only |
| port | integer | 9020 | MCP server port |
| protocol | string | latest | MCP protocol version: 2024-11-05, 2025-03-26, 2025-06-18, or latest |

analyze

{
  "analyze": {
    "max_chunk_chars": 3000,
    "include": {
      "meta": false,
      "debt": false,
      "advice": false,
      "errors": false
    }
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_chunk_chars | integer | 3000 | Maximum characters per LLM request batch |

Enrichment features:

| Feature | Description |
| --- | --- |
| meta | Generate summary and search tags for the file |
| debt | Detect technical debt: TODO markers, magic numbers |
| advice | Suggest refactoring, naming, and test improvements |
| errors | Find bugs: unsafe calls, swallowed errors, spelling mistakes |

When all features are disabled, only AST data is returned.

export

{
  "export": {
    "format": "json",
    "output_dir": "/Users/keygenqt/Downloads",
    "types": {
      "code": true,
      "markdown": true,
      "text": true,
      "binary": true
    },
    "include": {
      "imports": true,
      "functions": true,
      "variables": true,
      "markers": false,
      "warnings": false,
      "body": false
    }
  }
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | string | json | Default export format: json, json5, toml, toon, xml, yaml |
| output_dir | string | ~/Downloads | Default output directory for exported files |

File types (types):

| Type | Description |
| --- | --- |
| code | Source code files (Rust, Python, etc.) |
| markdown | Markdown documentation files |
| text | Text files without AST parser (configs) |
| binary | Binary files (images, archives) |

AST elements (include):

| Element | Description |
| --- | --- |
| imports | Import statements |
| functions | Function signatures |
| variables | Variable declarations |
| enums | Enum declarations |
| interfaces | Interface/trait declarations |
| classes | Class declarations |
| structs | Struct declarations |
| comments | Documentation comments |
| header_comments | File header comments |
| headings | Markdown headings |
| links | Markdown links |
| code_blocks | Markdown code blocks |
| markers | Static analysis markers (TODO, FIXME, HACK) |
| warnings | Static analysis warnings (unwrap, panic) |
| body | Body content (functions, classes, headings) |

cluster

{
  "cluster": [
    {
      "provider": "ollama",
      "host": "http://localhost:11434",
      "model": "qwen2.5-coder:3b-instruct",
      "timeout_secs": 60,
      "temperature": 0.1,
      "seed": 42,
      "num_ctx": 4096,
      "num_predict": 2048,
      "parallel": 1
    }
  ]
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| provider | string | ollama | Provider type: ollama, deepseek, qwen |
| host | string | http://localhost:11434 | API endpoint URL |
| model | string | qwen2.5-coder:3b-instruct | Model name. Ollama: pre-loaded via ollama pull |
| timeout_secs | integer | 60 | Request timeout in seconds |
| temperature | float | 0.1 | Generation temperature (0.0: deterministic, 1.0: creative) |
| seed | integer | 42 | Random seed for reproducible results |
| num_ctx | integer | 4096 | Context window size in tokens |
| num_predict | integer | 2048 | Maximum tokens in response |
| api_key | string | (none) | API key for cloud providers (not needed for Ollama) |
| parallel | integer | 1 | Number of parallel workers for this provider |

Multiple nodes can be specified for load distribution across local and cloud models.

sources

{
  "sources": ["/Users/keygenqt/Documents/Gitcode/Projects/vibe-analyzer"]
}

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| sources | array of strings | [] | List of absolute paths to sources |

Managing sources via CLI:

vibe-analyzer source add /path/to/project
vibe-analyzer source remove --target /path/to/project
vibe-analyzer source list

Configuration Validation

Vibe Analyzer validates the configuration at startup and applies safe defaults if parameters are missing or invalid:

  • max_chunk_chars → minimum 1000, maximum 100000
  • Invalid paths → normalized to absolute form
  • Missing sections → created with default values
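
The validation rules above can be pictured as a merge with defaults followed by a clamp. This is a sketch under stated assumptions rather than the actual validator: apply_defaults is a hypothetical name, only two sections are modeled, and falling back to 3000 for a non-integer value is an assumption.

```python
# Default values copied from the configuration example; only two sections shown.
DEFAULTS = {
    "opensearch": {"host": "http://localhost:9200"},
    "analyze": {"max_chunk_chars": 3000},
}


def apply_defaults(config):
    """Fill in missing sections and clamp max_chunk_chars to 1000..100000."""
    result = dict(config)
    for section, defaults in DEFAULTS.items():
        merged = dict(defaults)
        current = result.get(section)
        if isinstance(current, dict):
            merged.update(current)  # user values win over defaults
        result[section] = merged

    chunk = result["analyze"]["max_chunk_chars"]
    if isinstance(chunk, bool) or not isinstance(chunk, int):
        chunk = DEFAULTS["analyze"]["max_chunk_chars"]  # assumed fallback
    result["analyze"]["max_chunk_chars"] = max(1000, min(100_000, chunk))
    return result
```

For example, a configured max_chunk_chars of 50 would be raised to 1000, and an empty config would come back with both sections populated.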

Overriding the Configuration File

vibe-analyzer --config /custom/path/config.json5 source list

Default is ~/.vibe-analyzer/config.json5.

CLI Reference

Vibe Analyzer provides a command-line interface for managing knowledge sources, analysis, exporting, indexing, and running the MCP server.

General Syntax

vibe-analyzer [global options] <command> [subcommand] [options]

Global Options

| Option | Description |
| --- | --- |
| --config <path> | Path to config file (default: ~/.vibe-analyzer/config.json5) |
| --help | Show help |
| --version | Show version |

Commands

source — Source Management

Add, remove, and list knowledge sources.

vibe-analyzer source <subcommand>

| Subcommand | Description |
| --- | --- |
| add <path> | Adds a new directory or file to the sources list. The path is automatically converted to absolute |
| remove --target <path> | Removes a source from the configuration. Accepts full path or unique directory name |
| list | Shows all added sources with absolute paths |

Examples:

vibe-analyzer source add /home/user/projects/my-app
vibe-analyzer source remove --target my-app
vibe-analyzer source list

analyze — Analysis and Export

Extract code structure, run LLM enrichment, and export results.

vibe-analyzer analyze export [options]

| Option | Description |
| --- | --- |
| --target <path> | Process a specific source. If not specified, all sources are processed |
| -m, --mode <mode> | Analysis modes: ast, meta, debt, errors, advice (comma-separated) |
| -t, --type <type> | File types to export: code, markdown, text, binary (comma-separated) |
| -f, --format <format> | Export format: json (default), json5, toml, toon, xml, yaml |
| -p, --path <path> | Source path for direct file/directory scanning |
| -o, --output <path> | Export path. If not specified, the file is created in ~/Downloads/ |

Examples:

# AST only, all sources
vibe-analyzer analyze export

# AST + LLM enrichment for code files only
vibe-analyzer analyze export -m meta,debt -t code

# Errors in markdown documentation only
vibe-analyzer analyze export -m errors -t markdown

# Full analysis with JSON5 export
vibe-analyzer analyze export -m meta,debt,errors,advice -f json5

index — OpenSearch Indexing

Full cycle: AST parsing → LLM enrichment → write to OpenSearch.

vibe-analyzer analyze index [options]

| Option | Description |
| --- | --- |
| --target <path> | Index a specific source. If not specified, all sources are indexed |
| --force | Force full reindexing. Ignores hashes and processes all files again |

Examples:

vibe-analyzer analyze index
vibe-analyzer analyze index --target my-app
vibe-analyzer analyze index --target my-app --force

stats — Statistics

View information and statistics for indexed projects.

vibe-analyzer stats <subcommand>

stats info

vibe-analyzer stats info [options]

| Option | Description |
| --- | --- |
| --target <path> | Show statistics for a specific project. If not specified, all projects are shown |

Example output:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Language                  Files           Lines     AST Objects            Size
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Rust                        194           15146            1631       498.26 KB
Markdown                     33            2884             296       102.64 KB
...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total                       287           24893            2082       755.11 KB
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

stats tree

vibe-analyzer stats tree --target <project> [options]

| Option | Description |
| --- | --- |
| --target <path> | Project to display the tree for (required) |
| -L, --level <number> | Maximum tree depth (default: 3) |

serve — MCP Server

Start, stop, and check MCP server status.

vibe-analyzer serve <subcommand>

| Subcommand | Description |
| --- | --- |
| start | Start MCP server |
| stop | Stop running server (reserved) |
| status | Show server status (reserved) |

Examples:

vibe-analyzer serve start
vibe-analyzer serve start --port 9020 --host 127.0.0.1

Export Formats

Six formats available with --format:

| Format | Key | Extension | Description |
| --- | --- | --- | --- |
| JSON | json | .json | Compact JSON without extra whitespace; minimal file size |
| JSON5 | json5 | .json5 | JSON5 with comments and trailing commas; human-readable |
| TOML | toml | .toml | TOML format |
| TOON | toon | .toon | TOON format; token-efficient output, optimized for LLMs |
| XML | xml | .xml | XML with pretty-print formatting |
| YAML | yaml | .yaml | YAML format |

MCP Tools

Vibe Analyzer provides 11 MCP tools that AI models can call to search code and documentation. Each tool returns a structured response — no document stuffing.

How It Works

AI model → selects a tool → calls MCP → receives a structured response

Rules for the AI model: use tools, respond concisely, one call is enough.

Tools

| Tool | Parameters | Description |
| --- | --- | --- |
| admin_sync | (none) | Reindex all projects in the background |
| get_file_content | path (required) | Full file content by path. Supports partial matching and wildcards |
| get_file_ast | path (required) | Full AST: imports, functions, classes, structs, enums, headings |
| search_by_code_imports | query (required), target, limit (3) | Search by imports/dependencies |
| search_by_code_functions | query (required), target, limit (3) | Search by functions/methods |
| search_by_code_classes | query (required), target, limit (3) | Search by classes/structs/interfaces |
| search_by_code_variables | query (required), target, limit (3) | Search by variables/constants/enums |
| search_documentation | query (required), limit (3) | Search all markdown docs. Multilingual (RU/EN/ZH) |
| search_knowledge | query (required), limit (3) | Alias for search_documentation |
| show_projects | size (10) | List all indexed projects with descriptions |
| show_stats | target | Project statistics by language, files, lines of code |
| show_tree | target, level (3) | Directory tree of files and folders |

All limit parameters are clamped to the 1–10 range. target accepts a full path or a unique directory name.

Anti-Hallucination Protection

| Mechanism | Description |
| --- | --- |
| Name aliases (160+) | Fixes distorted tool names automatically |
| Parameter normalization | Wildcards, whitespace, invalid values → safe defaults |
| Auto language detection | Cyrillic → Russian tags, Latin → English, CJK → Chinese |
| Soft error handling | Invalid parameters don’t cause errors; they’re normalized |

LLM Enrichment

After AST parsing, Vibe Analyzer enriches results via vibe-cluster across multiple providers in four analysis modes: Meta, Debt, Errors, and Advice.

How It Works

AST data → static analysis → batching → LLM request → enrichment results

1. Static Analysis

Before LLM processing, code is analyzed programmatically across 14 languages:

  • Markers: TODO, FIXME, HACK, and 18 other markers
  • Warnings: unsafe calls (unwrap, panic), swallowed exceptions, debug statements

These are included in the prompt so the model can build on them.

2. File Enrichment

Batching. Files are grouped into batches limited by max_chunk_chars (default 3000) from the analyze config section.
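
As a rough illustration of the batching step, the sketch below groups files greedily until the next one would push the batch past the limit. The greedy strategy, the function name, and the choice to give an oversized file its own batch are all assumptions; the source only states that batches are limited by max_chunk_chars.

```python
def batch_by_chars(items, max_chunk_chars=3000):
    """items: list of (path, prompt_text) pairs. Returns lists of paths."""
    batches, current, size = [], [], 0
    for path, text in items:
        # Close the current batch if adding this file would exceed the limit.
        if current and size + len(text) > max_chunk_chars:
            batches.append(current)
            current, size = [], 0
        current.append(path)
        size += len(text)
    if current:
        batches.append(current)
    return batches
```

A file whose text alone exceeds the limit still ends up in a batch by itself here, so nothing is silently dropped.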

The prompt contains the AST structure: functions, classes, structs, and other elements. Which elements to include is controlled by export.include in config.

Analysis modes:

| Mode | Description |
| --- | --- |
| meta | Generate summary and search tags for each file |
| debt | Detect technical debt: code duplication, magic numbers, complex logic |
| errors | Find bugs: unsafe calls, swallowed errors |
| advice | Suggest refactoring, naming, and test improvements |

Prompt. For Meta, the model receives a JSON template to fill. For Debt/Errors/Advice, the model returns a JSON array of issues found.

Response. Example for Meta:

{
  "files": [
    {
      "path": "src/main.rs",
      "description": "Main entry point for the CLI application",
      "tags": ["entry", "point", "cli", "argument", "parsing", "rust", "binary"]
    }
  ]
}

3. Parallel Processing

If multiple cluster nodes are configured, prompts are distributed via vibe-cluster:

  • Local models (Ollama) get priority over cloud providers
  • Nodes process prompts in parallel via atomic work-stealing
  • Configurable parallel parameter for multiple workers per provider
  • At the end, per-node statistics are reported
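
One plausible reading of the distribution scheme above is that every worker claims the next unprocessed prompt from a shared counter. The sketch below models only that part and is not vibe-cluster's implementation: run_cluster and handle are hypothetical names, a lock stands in for an atomic fetch-and-add, and local-first priority is not modeled.

```python
import threading


def run_cluster(prompts, nodes, handle):
    """nodes: list of (name, parallel) pairs; handle(name, prompt) returns a result."""
    results = [None] * len(prompts)
    per_node = {name: 0 for name, _ in nodes}
    lock = threading.Lock()
    cursor = 0  # index of the next unclaimed prompt

    def claim():
        nonlocal cursor
        with lock:  # the lock stands in for an atomic counter
            index = cursor
            cursor += 1
        return index

    def worker(name):
        while True:
            index = claim()
            if index >= len(prompts):
                return
            results[index] = handle(name, prompts[index])
            with lock:
                per_node[name] += 1  # per-node statistics

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name, parallel in nodes
        for _ in range(parallel)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, per_node
```

Because faster workers come back for a new index sooner, they naturally process more prompts, and the per_node counts correspond to the statistics reported at the end.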

4. JSON Repair

LLMs often corrupt JSON: add comments, wrap in markdown blocks, drop quotes. clean_llm_json fixes this automatically.
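
To show the kind of cleanup involved, here is a deliberately small sketch. It is not the project's clean_llm_json: the function name is hypothetical, and it handles only a surrounding markdown fence, prose around the payload, and trailing commas. Comment stripping and quote repair are left out, and the trailing-comma pass is naive because it does not look inside string values.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to keep the example readable
FENCED = re.compile(rf"^{FENCE}\w*\s*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)


def clean_llm_json_sketch(raw):
    """Best-effort cleanup of an LLM reply before parsing it as JSON."""
    text = raw.strip()
    fenced = FENCED.match(text)
    if fenced:
        text = fenced.group(1)  # drop the surrounding markdown fence

    # Keep only the span from the first opening bracket to the last closing one.
    starts = [i for i in (text.find("{"), text.find("[")) if i != -1]
    end = max(text.rfind("}"), text.rfind("]"))
    if not starts or end < min(starts):
        raise ValueError("no JSON object or array found")
    text = text[min(starts):end + 1]

    # Remove trailing commas before a closing bracket (ignores string contents).
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)
```

Anything the sketch cannot repair still raises an error from json.loads, which is where a retry would take over.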

5. Retries

If a node returns fewer results than expected, the request is retried automatically after a delay. Once all retries are exhausted, the error is recorded in the result.
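
The retry rule can be sketched as a loop around the request. The retry count, the delay, the result shape, and the function name below are assumptions for illustration; the source does not state them.

```python
import time


def enrich_with_retries(send, expected, retries=3, delay_secs=1.0, sleep=time.sleep):
    """send() returns a list of results; retry while fewer than `expected` come back."""
    results = []
    for attempt in range(1 + retries):
        results = send()
        if len(results) >= expected:
            return {"results": results, "error": None}
        if attempt < retries:
            sleep(delay_secs)  # wait before the next attempt
    # All retries used up: keep what was received and record the error.
    message = f"got {len(results)} of {expected} results after {retries} retries"
    return {"results": results, "error": message}
```

Passing sleep as a parameter keeps the sketch easy to test without real delays.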

Generation Parameters

Configured per cluster node:

| Parameter | Default | Description |
| --- | --- | --- |
| temperature | 0.1 | Low temperature for stable results |
| seed | 42 | Fixed seed for reproducibility |
| num_ctx | 4096 | Context window size in tokens |
| num_predict | 2048 | Maximum tokens in response |
| timeout_secs | 60 | Request timeout in seconds |

Model Warm-Up

Before enrichment, for each cluster node:

  1. Availability check
  2. For Ollama nodes: model presence check and warm-up request to load the model into memory

Exporting Results

# AST with LLM enrichment
vibe-analyzer analyze export -m meta,debt

# Code only, JSON5 format
vibe-analyzer analyze export -m errors -t code -f json5

Supported export formats: JSON, JSON5, TOML, TOON, XML, YAML.

Search and Indexing

Vibe Analyzer stores all data in OpenSearch and uses multilingual analyzers for search.

Three Indices

Three indices are created for each project:

| Index | Purpose | Contents |
| --- | --- | --- |
| vibe_meta | Metadata | One document per project: summary, license, README, statistics |
| vibe_files_{hash} | Content | One document per file: full contents (not indexed for search, store only) |
| vibe_files_analysis_{hash} | Search | One document per text file: AST, description, tags |

Multilingual Search

OpenSearch is configured with three analyzers:

  • russian_analyzer (type russian) — stemming for Russian
  • english_analyzer (type english) — stemming for English
  • chinese_analyzer (type chinese) — segmentation for Chinese

Each text field in vibe_files_analysis has three sub-fields — one per analyzer. This allows searching for “функции”, “functions”, and “函数” with correct morphology for each language.

Search Mechanics

Documentation Search (search_documentation)

The most complex query. Algorithm:

  1. Script detection in the query — Cyrillic, Latin, CJK
  2. Word extraction (longer than 2 characters)
  3. Wildcard search on headings with 10.0 boost + stemming for long words
  4. Language-specific match queries — for each detected script, a separate query to the corresponding sub-field with fuzziness
  5. Boost for knowledge documents — if the frontmatter contains knowledge: true, the document gets a 5.0 boost
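
Steps 1 and 2 of the algorithm can be sketched with plain Unicode range checks. The ranges and function names below are assumptions chosen for illustration (basic Cyrillic, CJK Unified Ideographs, ASCII letters); the actual detection may cover more blocks. The analyzer names come from the list in Multilingual Search.

```python
import re

# Script -> analyzer, following the three analyzers listed earlier.
ANALYZER_FOR_SCRIPT = {
    "cyrillic": "russian_analyzer",
    "latin": "english_analyzer",
    "cjk": "chinese_analyzer",
}


def detect_scripts(query):
    """Return the set of scripts that appear in the query."""
    scripts = set()
    for ch in query:
        code = ord(ch)
        if 0x0400 <= code <= 0x04FF:
            scripts.add("cyrillic")
        elif 0x4E00 <= code <= 0x9FFF:
            scripts.add("cjk")
        elif ch.isascii() and ch.isalpha():
            scripts.add("latin")
    return scripts


def extract_words(query, min_len=3):
    """Keep words longer than 2 characters, lowercased."""
    return [w for w in re.findall(r"\w+", query.lower()) if len(w) >= min_len]
```

Each detected script would then select the sub-field to query. Note that a length filter like this one drops short CJK terms, so a real implementation likely treats CJK text differently.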

Ranking priority:

  • Headings (headings.title) — 10.0 boost
  • Preview (headings.preview) — 2.0 boost
  • Links (links.text) — 2.0 boost
  • Tags (tags) — 1.0 boost

Each search type has its own strategy:

  • Imports: wildcard on the imports keyword field + tags
  • Functions: match_phrase_prefix on signatures + match on comments (nested queries)
  • Classes/structs/interfaces: three nested queries in should with minimum_should_match: 1
  • Variables/enums: match_phrase_prefix on signatures + match on comments (nested queries)

All code searches use fuzziness: AUTO for fuzzy matching and boost tags higher than specific fields.

Incremental Indexing

Vibe Analyzer doesn’t re-index files unnecessarily:

  1. Fetching hashes from OpenSearch via Scroll API — GET /{index}/_search?scroll=1m
  2. Comparison — a BLAKE3 hash is computed for each file and compared against the indexed one
  3. Skipping unchanged — files with matching hashes are not processed

If the --force flag is passed, hashes are ignored — all files are indexed.

Bulk Indexing

All documents are written to OpenSearch in batches via the Bulk API in NDJSON format:

{"index": {"_index": "vibe_files_xxx", "_id": "src/main.rs"}}
{"root": "/project", "path": "src/main.rs", "content": "..."}
{"index": {"_index": "vibe_files_xxx", "_id": "src/lib.rs"}}
{"root": "/project", "path": "src/lib.rs", "content": "..."}

The document ID is the file path (path). This ensures that re-indexing updates the existing document rather than creating a duplicate.
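
A request body in this shape can be assembled with a few lines of code. The sketch below is illustrative (build_bulk_body is a hypothetical name) and only builds the string; sending it to the _bulk endpoint is left out. The Bulk API expects the body to end with a newline.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON bulk body that uses each document's path as its _id."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index, "_id": doc["path"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # trailing newline required by the Bulk API
```

Because the _id is the path, submitting the same file again overwrites its document instead of adding a second one.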

Orphaned Data Cleanup

cleanup runs automatically during indexing:

  1. Index removal for deleted projects
  2. Document removal for files no longer on disk (comparing paths in the index and on the filesystem)
  3. Meta-document removal for projects removed from the configuration

Project Statistics

show_stats_search collects aggregated statistics across all indexed files via the Scroll API. This enables:

  • Project reports — language breakdown, file count, lines, AST objects
  • Data presence checks — if statistics are empty, indexing hasn’t been performed or the project hasn’t been added
  • Codebase size estimation — total size, text and binary file counts

Aggregation runs across all documents from files_analysis:

  • Language grouping (via get_language_name)
  • AST object counting: sum of functions, classes, structs, enums, interfaces, variables, imports, headings, links, code blocks
  • Other — files without a detectable language
  • Languages sorted by lines of code descending
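
The aggregation rules above can be sketched as a group-and-sort over per-file records. The types `FileStats` and `LanguageTotals` and the function `aggregate` are hypothetical; the real document schema is not shown here, so the AST object count is assumed to be precomputed per file.

```rust
use std::collections::HashMap;

/// Per-file input record (hypothetical shape).
struct FileStats {
    language: Option<String>, // None when no language could be detected
    lines: u64,
    ast_objects: u64,
}

/// Totals for one language.
#[derive(Debug, Default, Clone, PartialEq)]
struct LanguageTotals {
    files: u64,
    lines: u64,
    ast_objects: u64,
}

/// Groups files by language ("Other" when undetected) and sorts the groups
/// by lines of code, descending. Ties are broken by language name.
fn aggregate(files: &[FileStats]) -> Vec<(String, LanguageTotals)> {
    let mut by_lang: HashMap<String, LanguageTotals> = HashMap::new();
    for f in files {
        let key = f.language.clone().unwrap_or_else(|| "Other".to_string());
        let totals = by_lang.entry(key).or_default();
        totals.files += 1;
        totals.lines += f.lines;
        totals.ast_objects += f.ast_objects;
    }
    let mut rows: Vec<(String, LanguageTotals)> = by_lang.into_iter().collect();
    rows.sort_by(|a, b| b.1.lines.cmp(&a.1.lines).then_with(|| a.0.cmp(&b.0)));
    rows
}
```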

Integrations

Vibe Analyzer provides an MCP server that AI assistants can connect to via the Model Context Protocol. Once connected, the model gains 11 tools for searching code and documentation.

How It Works from the User’s Perspective

The user communicates with the AI assistant in natural language. The model decides which tool to call. Examples from real testing scenarios:

Code Search

| User Query | Tool | What Happens |
| --- | --- | --- |
| “Find add functions in the samples project” | search_by_code_functions | Searches for functions with add in the signature, returns files and signatures |
| “What classes are in samples?” | search_by_code_classes | Returns all classes, structs, interfaces |
| “Show all enums in samples” | search_by_code_variables | Enums are also searched through this tool |
| “What libraries are used in samples?” | search_by_code_imports | List of all imports in the project |
| “List files that have the MAX_VALUE constant” | search_by_code_variables | Search by constant name |
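
Behind each row, the model issues an MCP `tools/call` request over JSON-RPC. The sketch below shows roughly what such a request looks like on the wire. The helper `tools_call_request` is hypothetical, and the argument names used in the usage example (`query`, `project`) are assumptions rather than the documented schema of the tool.

```rust
/// Builds a JSON-RPC 2.0 `tools/call` request as used by MCP.
/// `arguments_json` must already be a serialized JSON object; the tool name
/// is assumed to be a plain identifier that needs no escaping.
fn tools_call_request(id: u64, tool: &str, arguments_json: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{id},\"method\":\"tools/call\",\"params\":{{\"name\":\"{tool}\",\"arguments\":{arguments_json}}}}}"
    )
}
```

For the first row, a call such as `tools_call_request(1, "search_by_code_functions", "{\"query\":\"add\",\"project\":\"samples\"}")` would produce the request body. In normal use the AI client builds this request itself; users never write it by hand.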

File Viewing

| User Query | Tool |
| --- | --- |
| “Show the contents of src/main.rs” | get_file_content |
| “Show the structure of main.py” | get_file_ast |
| “What functions are in src/main.rs?” | get_file_ast |
| “Open utils.py” | get_file_content |

Documentation Search

| User Query | Tool |
| --- | --- |
| “Who is Zizikosh?” | search_documentation |
| “Tell me about Kukyrbur’s abilities” | search_documentation |
| “Find Python coding guidelines” | search_documentation |
| “Show the release process” | search_documentation |
| “Find the code review checklist” | search_documentation |

Project Navigation

| User Query | Tool |
| --- | --- |
| “What projects are in the database?” | show_projects |
| “Show the tree of the samples project” | show_tree |
| “How many files are in knowledge?” | show_stats |
| “Show overall statistics for all projects” | show_stats |

Administration

| User Query | Tool |
| --- | --- |
| “Update the index” | admin_sync |
| “Reindex projects” | admin_sync |

How to Phrase Queries

The model understands queries in natural language. You don’t need to use exact tool names — plain language is enough.

Good:

  • “Find add functions in the samples project”
  • “What classes are in samples?”
  • “Show the contents of src/main.rs”
  • “Who is Zizikosh?”

Unnecessary (the model will understand via AliasHandler anyway, but it’s better to avoid):

  • “Call search_by_code_functions with query=add”
  • “Use the get_file_content tool for path=src/main.rs”

Important Notes

  • Project names — you can use the full path or directory name: "samples" or "/path/to/samples"
  • File paths — relative to the project root: "src/main.rs", partial matching is supported
  • Result limit — default 3, maximum 10. If the model requests “all”, the limit is automatically raised
  • One call is enough — the model is trained to respond after a single tool call, no need to ask again
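
The result-limit rule can be sketched as a simple clamp. This is an illustration under assumptions: the text does not say what value the limit is raised to for “all”, so the sketch assumes the maximum of 10, and the name `effective_limit` is hypothetical.

```rust
const DEFAULT_LIMIT: usize = 3;
const MAX_LIMIT: usize = 10;

/// Resolves the number of results to return.
/// Assumption: a request for "all" is raised to MAX_LIMIT.
fn effective_limit(requested: Option<usize>, wants_all: bool) -> usize {
    if wants_all {
        return MAX_LIMIT;
    }
    // Fall back to the default, then keep the value within 1..=MAX_LIMIT.
    requested.unwrap_or(DEFAULT_LIMIT).clamp(1, MAX_LIMIT)
}
```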

Connecting to Open WebUI

  1. Start the MCP server:

    vibe-analyzer serve start
    
  2. In Open WebUI settings, add a new MCP server:

    • URL: http://localhost:9020
    • Transport: Streamable HTTP
  3. Tools appear automatically

Connecting to Claude Desktop

Add to the configuration:

{
  "mcpServers": {
    "vibe-analyzer": {
      "url": "http://localhost:9020",
      "transport": "streamable-http"
    }
  }
}

MCP Protocol

Supported versions: 2024-11-05, 2025-03-26, 2025-06-18, latest. Configured in the settings:

{
  "mcp": {
    "host": "127.0.0.1",
    "port": 9020,
    "protocol": "latest"
  }
}
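
A sketch of how the `protocol` setting might be validated against the supported list. It assumes that `latest` resolves to the newest version listed above; both that mapping and the name `resolve_protocol` are assumptions, not documented behavior.

```rust
/// Protocol versions listed as supported, oldest first.
static SUPPORTED: [&str; 3] = ["2024-11-05", "2025-03-26", "2025-06-18"];

/// Maps the configured value to a concrete protocol version.
/// Returns None for versions that are not in the supported list.
fn resolve_protocol(configured: &str) -> Option<&'static str> {
    if configured == "latest" {
        // Assumption: "latest" means the newest supported version.
        return SUPPORTED.last().copied();
    }
    SUPPORTED.iter().copied().find(|v| *v == configured)
}
```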

Security

  • Server without authentication — for trusted networks or localhost
  • Default host 127.0.0.1 (local only)
  • 0.0.0.0 — for access from Docker containers or other machines
  • Server only reads data; the one exception is admin_sync, which triggers background indexing
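
Because the server has no authentication, it can be useful to check how exposed a configured host is before starting it. The sketch below uses only the standard library; the `Exposure` enum and `exposure` function are hypothetical and not part of Vibe Analyzer.

```rust
use std::net::{AddrParseError, IpAddr};

/// How far a bind address is reachable.
#[derive(Debug, PartialEq)]
enum Exposure {
    LocalOnly, // loopback, e.g. 127.0.0.1 or ::1
    Network,   // anything else, including 0.0.0.0
}

/// Classifies a configured host. Only IP literals are accepted;
/// host names such as "localhost" produce a parse error.
fn exposure(host: &str) -> Result<Exposure, AddrParseError> {
    let ip: IpAddr = host.parse()?;
    Ok(if ip.is_loopback() {
        Exposure::LocalOnly
    } else {
        Exposure::Network
    })
}
```

A startup routine could log a warning whenever the result is `Exposure::Network`, since any client that can reach the port can then call the tools.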

Testing

Vibe Analyzer uses end-to-end tests powered by the vibe-tests framework to verify MCP tool relevance.

How It Works

engine_config! → engine.test("natural language query") → LLM selects tool → tool returns result → verify

The framework:

  • Starts the MCP server automatically
  • Runs queries against real Ollama models (3B, 7B)
  • Verifies the model selected the correct tool
  • Saves structured JSON reports with timing, tool calls, and responses

Test Scenarios

All 11 MCP tools are covered, each with 5 queries in Russian and English:

tests/mcp/
├── admin_sync.rs
├── get_file_ast.rs
├── get_file_content.rs
├── search_by_code_classes.rs
├── search_by_code_functions.rs
├── search_by_code_imports.rs
├── search_by_code_variables.rs
├── search_documentation.rs
├── show_projects.rs
├── show_stats.rs
└── show_tree.rs

Total: 60 queries across 11 tools × 2 models = 120 tests.

Example test:

#[tokio::test]
async fn test_search_functions_add_ru() {
    let engine = vibe_tests::engine().await;
    let result = engine.test("Найди функции add в проекте 'samples'").await;
    assert!(result.success);
    assert!(result.models.iter().all(|m| m.tool.as_deref() == Some("search_by_code_functions")));
}

Test Infrastructure

  • OpenSearch — via Docker Compose
  • MCP server — started automatically on port 9021
  • Fixtures — test projects samples (code) and knowledge (docs, legends)
  • Multi-model — each query tested against multiple Ollama models for robustness

Running

# Full E2E tests (require Docker + Ollama)
cargo test --test mcp_test -- --nocapture

Expected Model Behavior

Each test verifies three things:

  1. The model called a tool, and it was the correct tool for the query
  2. The model did not call a non-existent tool, which confirms that alias resolution works
  3. The tool returned a non-empty response
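
These checks can be sketched as one function over a per-model result. The `ModelRun` struct and `verify` function are hypothetical stand-ins for whatever vibe-tests exposes; only the tool names are taken from the test file list above.

```rust
/// Outcome of one query against one model (hypothetical shape).
struct ModelRun {
    tool: Option<String>, // name of the tool the model called, if any
    response: String,     // text returned by the tool
}

/// The 11 MCP tools covered by the test suite.
static KNOWN_TOOLS: [&str; 11] = [
    "admin_sync",
    "get_file_ast",
    "get_file_content",
    "search_by_code_classes",
    "search_by_code_functions",
    "search_by_code_imports",
    "search_by_code_variables",
    "search_documentation",
    "show_projects",
    "show_stats",
    "show_tree",
];

/// Applies the three checks in order and reports the first failure.
fn verify(run: &ModelRun, expected_tool: &str) -> Result<(), String> {
    let tool = run.tool.as_deref().ok_or("model did not call a tool")?;
    if !KNOWN_TOOLS.iter().any(|known| *known == tool) {
        return Err(format!("unknown tool: {tool}"));
    }
    if tool != expected_tool {
        return Err(format!("expected {expected_tool}, got {tool}"));
    }
    if run.response.trim().is_empty() {
        return Err("tool returned an empty response".to_string());
    }
    Ok(())
}
```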

Reports

After each run, a structured JSON report is saved with per-query details: model, tool, args, response, duration, and success status.

Recipes

Practical examples of using Vibe Analyzer through an AI assistant. Based on the test fixtures: samples (code) and knowledge (documentation).

Exploring an Unfamiliar Project

Goal: you’re seeing a project for the first time and want to quickly understand its structure.

You: What projects are available?
AI: [show_projects]
- knowledge — guidelines and processes for code writing
- samples — sample code in Rust and Python

You: Show the tree of samples
AI: [show_tree]
samples/
├── python_sample/
│   ├── main.py
│   ├── models.py
│   └── utils.py
├── rust_lib/
│   ├── Cargo.toml
│   └── src/
│       └── lib.rs
└── rust_sample/
    ├── Cargo.toml
    └── src/
        └── main.rs

You: Show statistics for knowledge
AI: [show_stats]
Markdown    15 files    663 lines    112 AST objects

You: What functions are in utils.py?
AI: [get_file_ast for utils.py]
- add(a: int, b: int) -> int — Returns sum of two numbers
- multiply(a: int, b: int) -> int — Returns product of two numbers
- divide(a: int, b: int) -> float — Divides two numbers safely

Finding Dependencies

Goal: find out what libraries are used in a project.

You: What libraries are used in samples?
AI: [search_by_code_imports]
- thiserror::Error (Rust)
- super::* (Rust)
- dataclasses (Python)
- typing (Python)
- utils, models (Python)

Finding an Implementation

Goal: find where a specific function is implemented.

You: Find the multiply function
AI: [search_by_code_functions with query="multiply"]
Found in 2 files:
- python_sample/utils.py: def multiply(a: int, b: int) -> int
- rust_lib/src/lib.rs: pub fn multiply(a: i32, b: i32) -> i32

Finding Guidelines and Processes

Goal: find rules or processes.

You: Find Python coding guidelines
AI: [search_documentation with query="Python coding rules"]
Found in knowledge/guidelines/code/python.md:
  # Python Coding Guidelines
  Follow PEP 8. Use snake_case, type annotations, docstrings.

You: What is the release process?
AI: [search_documentation with query="release process"]
Found in knowledge/guidelines/process/release.md:
  # Release Process
  1. All milestone tasks closed
  2. Create a release branch from develop
  3. Update version in Cargo.toml/package.json
  4. Merge into main
  5. Notify the team

Finding Characters and Legends

Goal: find information about project characters or legends.

You: Who is Zizikosh?
AI: [search_documentation with query="Zizikosh"]
Found in knowledge/legends/zizikosh_blogger.md:
  # Zizikosh - Blogger
  Video blogger, unpacks packages from AliExpress.
  150K subscribers.

You: What is Bakobur's dream?
AI: [search_documentation with query="Bakobur's dream"]
Found in knowledge/legends/bakobur_homeless.md:
  Dream: Warm socks and a doghouse for Sharik

Inspecting a Specific File

Goal: view the contents or structure of a specific file.

You: Show the contents of Cargo.toml
AI: [get_file_content]
[package]
name = "rust_lib"
version = "0.1.0"
edition = "2021"

[dependencies]
anyhow = "1.0"
thiserror = "1.0"

You: What structs are in lib.rs?
AI: [get_file_ast]
- struct User { name, age }
- struct Calculator { value }
- enum MathError { DivisionByZero, OutOfRange }
- enum Operation { Add, Multiply, Divide }

Incremental Update

You: Update the index
AI: [admin_sync]
Indexing started. Projects are updating.

Tips

  • Start broad: show_projects, show_tree, show_stats
  • Refine with search: search_by_code_functions, search_documentation
  • Inspect details: get_file_content, get_file_ast
  • Update the index after changes: admin_sync
  • Use natural language: the model will choose the right tool automatically