Semantic Code Analysis Plugin

The file_analysis plugin's semantic_code_analysis tool performs a semantic analysis of a source file and returns a structured report with per-symbol analyses, a deduplicated issue list, and a UTC timestamp. It supports two analysis strategies:

- LSP path (default): queries a language server for the file's symbol tree, then analyses each symbol bottom-up (leaves first, roots last) with an LLM, passing prior analyses as context.
- Fast path: when fast_path_max_bytes is greater than 0 and the file is strictly smaller than that threshold, the language server is skipped and the full file is passed to the LLM in a single call, which identifies and analyses symbols itself. Useful for small utility files where LSP startup overhead is not worth the cost.

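Both paths produce the same report structure, so callers can consume either result the same way. The values differ in one respect: on the fast path, line ranges are LLM estimates and column positions are always 0.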

Quick start

Ensure your settings declare an LLM and point the tool at it:

# ~/.config/aifred-tk/settings.yml
llms:
  my-llm:
    provider: openai
    model: gpt-4o-mini

tools:
  semantic_code_analysis:
    lsp_llm:
      type: ref
      ref: my-llm
    fast_llm:
      type: ref
      ref: my-llm

Then invoke the tool:

aifred-tk semantic_code_analysis --file-path src/auth.py --language python \
  --prompt "identify security issues"

Configuration

LLM settings

The tools.semantic_code_analysis.lsp_llm and tools.semantic_code_analysis.fast_llm keys are required. They accept a reference to a named LLM or an inline definition. See LLMs for the full format.

# Reference a named LLM
tools:
  semantic_code_analysis:
    lsp_llm:
      type: ref
      ref: my-llm
    fast_llm:
      type: ref
      ref: my-llm

# Or define inline
tools:
  semantic_code_analysis:
    lsp_llm:
      type: custom
      provider: openai
      model: gpt-4o-mini
    fast_llm:
      type: custom
      provider: ollama
      model: gemma4:e4b
      host: 127.0.0.1
      port: 11434
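
The difference between the two forms can be pictured with a small resolver. This is only a sketch, assuming the settings have already been loaded into plain dictionaries; `resolve_llm` is a hypothetical helper, not part of aifred-tk, and the real loader may validate or merge settings differently.

```python
from typing import Any


def resolve_llm(entry: dict[str, Any], named_llms: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return the concrete LLM definition behind an lsp_llm / fast_llm entry.

    Hypothetical helper: it illustrates the ref/custom semantics only.
    """
    kind = entry.get("type")
    if kind == "ref":
        name = entry["ref"]
        if name not in named_llms:
            raise KeyError(f"Unknown LLM reference: {name!r}")
        return dict(named_llms[name])  # copy so callers cannot mutate the settings
    if kind == "custom":
        # Inline definition: every key except the 'type' discriminator.
        return {key: value for key, value in entry.items() if key != "type"}
    raise ValueError(f"Unsupported LLM entry type: {kind!r}")


# Same values as the YAML examples above, written as Python dictionaries.
settings = {
    "llms": {"my-llm": {"provider": "openai", "model": "gpt-4o-mini"}},
    "tools": {
        "semantic_code_analysis": {
            "lsp_llm": {"type": "ref", "ref": "my-llm"},
            "fast_llm": {
                "type": "custom",
                "provider": "ollama",
                "model": "gemma4:e4b",
                "host": "127.0.0.1",
                "port": 11434,
            },
        }
    },
}

tool_cfg = settings["tools"]["semantic_code_analysis"]
lsp_llm = resolve_llm(tool_cfg["lsp_llm"], settings["llms"])
fast_llm = resolve_llm(tool_cfg["fast_llm"], settings["llms"])
```

Mixing the two forms, as here, lets the LSP path use a hosted model while the fast path uses a local one.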

Enabling and disabling

The tool respects the standard enabled flag:

tools:
  semantic_code_analysis:
    enabled: false

Usage

CLI

aifred-tk semantic_code_analysis --file-path PATH --language LANG --prompt TEXT \
  [--output-file PATH] [--output-format {json,yaml,markdown}] \
  [--max-retries INT] [--local-symbols-only BOOL] [--fast-path-max-bytes INT] \
  [--excluded-symbol-kinds KIND ...]

| Option | Description |
| --- | --- |
| `--file-path PATH` | Path to the source file to analyse. Required. |
| `--language LANG` | Programming language identifier (see Supported languages). Required. |
| `--prompt TEXT` | Analysis request applied to every symbol (max 500 characters). Required. |
| `--output-file PATH` | Optional path to write the report. When omitted the result is returned inline. |
| `--output-format {json,yaml,markdown}` | Output format. `json` (default) returns the full structured report; `yaml` returns the same data in YAML block format; `markdown` returns a human-readable summary. |
| `--max-retries INT` | Maximum LLM call attempts per symbol before skipping it. Default 3, range 1–10. LSP path only. |
| `--local-symbols-only BOOL` | When `true` (default), restricts analysis to symbols defined in the target file. LSP path only. |
| `--fast-path-max-bytes INT` | When greater than `0`, files strictly smaller than this byte count bypass the LSP. Default `0` (disabled). See Fast path. |
| `--excluded-symbol-kinds KIND ...` | Optional symbol kind names to skip (case-insensitive). To specify multiple kinds, repeat the flag (e.g., `--excluded-symbol-kinds Function --excluded-symbol-kinds Class`). Applies to both paths. See Filtering by symbol kind. |
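
When the CLI is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch: `build_command` is a hypothetical helper, and it only uses flags listed in the table. It repeats `--excluded-symbol-kinds` once per kind, as the table describes.

```python
import shlex
from typing import Optional, Sequence


def build_command(
    file_path: str,
    language: str,
    prompt: str,
    output_format: str = "json",
    output_file: Optional[str] = None,
    fast_path_max_bytes: int = 0,
    excluded_symbol_kinds: Sequence[str] = (),
) -> list[str]:
    """Assemble an argv list for the semantic_code_analysis CLI (hypothetical helper)."""
    argv = [
        "aifred-tk", "semantic_code_analysis",
        "--file-path", file_path,
        "--language", language,
        "--prompt", prompt,
        "--output-format", output_format,
    ]
    if output_file is not None:
        argv += ["--output-file", output_file]
    if fast_path_max_bytes > 0:
        argv += ["--fast-path-max-bytes", str(fast_path_max_bytes)]
    for kind in excluded_symbol_kinds:
        # One flag per kind, following the "repeat the flag" rule.
        argv += ["--excluded-symbol-kinds", kind]
    return argv


argv = build_command(
    "src/models.py",
    "python",
    "describe what each symbol does",
    excluded_symbol_kinds=["Property", "Variable"],
)
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

The list can be passed as-is to `subprocess.run(argv, capture_output=True, text=True)`, which avoids shell quoting problems in the prompt text.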

MCP

The tool is registered automatically when the MCP server starts. Invoke it as aifred_semantic_code_analysis with the required file_path, language, and prompt arguments, plus optional output_file, output_format, max_retries, local_symbols_only, fast_path_max_bytes, and excluded_symbol_kinds.
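
A client can sanity-check an argument payload before sending it. The sketch below is a client-side pre-check built only from the limits stated on this page (required arguments, the 500-character prompt limit, the 1 to 10 retry range, the output formats, and the supported languages). `check_arguments` is a hypothetical helper; the server performs its own validation, and its error messages may differ.

```python
SUPPORTED_LANGUAGES = {
    "python", "typescript", "javascript", "java", "rust",
    "go", "kotlin", "csharp", "ruby", "dart",
}
OUTPUT_FORMATS = {"json", "yaml", "markdown"}


def check_arguments(args: dict) -> list[str]:
    """Return a list of problems found in the payload (empty when it looks valid)."""
    problems = []
    for key in ("file_path", "language", "prompt"):
        if not args.get(key):
            problems.append(f"missing required argument: {key}")
    language = args.get("language")
    if language and language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language: {language}")
    if len(args.get("prompt", "")) > 500:
        problems.append("prompt exceeds 500 characters")
    if args.get("output_format", "json") not in OUTPUT_FORMATS:
        problems.append("output_format must be json, yaml, or markdown")
    if not 1 <= args.get("max_retries", 3) <= 10:
        problems.append("max_retries must be between 1 and 10")
    return problems


arguments = {
    "file_path": "src/auth.py",
    "language": "python",
    "prompt": "identify security issues",
    "max_retries": 5,
    "fast_path_max_bytes": 16384,
}
problems = check_arguments(arguments)
```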

Fast path

By default the tool always uses the LSP, regardless of file size. Set fast_path_max_bytes to a positive integer to enable the fast path for files below that threshold:

# Files smaller than 16 KB skip the LSP
aifred-tk semantic_code_analysis --file-path utils.py --language python \
  --prompt "describe each function" --fast-path-max-bytes 16384

# Via MCP (fast_path_max_bytes in bytes)
fast_path_max_bytes: 16384

When the fast path is taken:

- The entire file is read and forwarded to the LLM in a single call.
- The LLM identifies symbols, estimates their line ranges, and produces the analysis in one shot.
- start_col and end_col are always 0 in the returned records (column positions require LSP).
- --max-retries and --local-symbols-only have no effect; only one LLM call is made.
- The report structure (file_path, language, analyzed_at, symbols, issues, errors) is identical to the LSP path; callers do not need to distinguish between the two.

Choosing a threshold — a value between 8 KB and 32 KB works well for most projects. Files in that range are typically small utility modules or single-class files where LSP startup cost (often 2–5 s) outweighs the analysis time. Larger files benefit from the LSP path because the bottom-up context passing improves analysis quality on complex symbol hierarchies.
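
The selection rule can be written down in a few lines. This is a sketch of the documented behaviour, not the tool's actual code: `select_path` is a hypothetical function, and it also folds in the 1,048,576-byte size limit described under Limits and security.

```python
MAX_FILE_BYTES = 1_048_576  # documented size limit; larger files are rejected


def select_path(file_size: int, fast_path_max_bytes: int = 0) -> str:
    """Return 'rejected', 'fast', or 'lsp' for a file of the given size in bytes."""
    if file_size > MAX_FILE_BYTES:
        return "rejected"
    # The fast path needs a positive threshold and a strictly smaller file.
    if fast_path_max_bytes > 0 and file_size < fast_path_max_bytes:
        return "fast"
    return "lsp"


choices = {
    "small file, threshold set": select_path(4_096, 16_384),
    "file exactly at threshold": select_path(16_384, 16_384),
    "small file, default threshold": select_path(4_096),
    "oversized file": select_path(2_000_000, 16_384),
}
```

Note that a file whose size equals the threshold still goes through the LSP, because the comparison is strict.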

Output

Inline JSON result (`status: ok`)

Returned when output_file is not set and output_format is json:

{
  "status": "ok",
  "report": {
    "file_path": "src/auth.py",
    "language": "python",
    "analyzed_at": "2025-05-03T10:22:00.123456+00:00",
    "symbols": [
      {
        "fqn": "AuthService",
        "kind": "Class",
        "start_line": 12,
        "end_line": 80,
        "analysis": "Manages user authentication ...",
        "issues": ["Tokens are stored in plaintext."],
        "error": null,
        "children": [
          {
            "fqn": "AuthService.login",
            "kind": "Function",
            "start_line": 20,
            "end_line": 45,
            "analysis": "Validates credentials and issues a token.",
            "issues": ["Tokens are stored in plaintext."],
            "error": null,
            "children": []
          }
        ]
      }
    ],
    "issues": ["Tokens are stored in plaintext."],
    "errors": []
  }
}
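
Because symbols nest through the children field, consumers usually walk the report recursively. The sketch below parses a trimmed copy of the example above (the analysis and error fields are omitted for brevity) and prints an indented outline; `iter_symbols` is a hypothetical helper.

```python
import json
from typing import Iterator

# Trimmed copy of the example result shown above.
result_json = """
{
  "status": "ok",
  "report": {
    "file_path": "src/auth.py",
    "language": "python",
    "symbols": [
      {
        "fqn": "AuthService", "kind": "Class", "start_line": 12, "end_line": 80,
        "issues": ["Tokens are stored in plaintext."],
        "children": [
          {
            "fqn": "AuthService.login", "kind": "Function",
            "start_line": 20, "end_line": 45,
            "issues": ["Tokens are stored in plaintext."],
            "children": []
          }
        ]
      }
    ],
    "issues": ["Tokens are stored in plaintext."],
    "errors": []
  }
}
"""


def iter_symbols(symbols: list, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, symbol) pairs in document order, parents before children."""
    for symbol in symbols:
        yield depth, symbol
        yield from iter_symbols(symbol["children"], depth + 1)


result = json.loads(result_json)
outline = [
    f"{'  ' * depth}{s['kind']} {s['fqn']} (lines {s['start_line']}-{s['end_line']})"
    for depth, s in iter_symbols(result["report"]["symbols"])
]
print("\n".join(outline))
```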

Inline YAML result

When output_file is not set and output_format is yaml, the tool returns a raw YAML string (no status wrapper). The YAML structure mirrors the JSON report exactly:

file_path: src/auth.py
language: python
analyzed_at: '2025-05-03T10:22:00.123456+00:00'
symbols:
- fqn: AuthService
  kind: Class
  start_line: 12
  end_line: 80
  analysis: Manages user authentication ...
  issues:
  - Tokens are stored in plaintext.
  error: null
  children:
  - fqn: AuthService.login
    kind: Function
    start_line: 20
    end_line: 45
    analysis: Validates credentials and issues a token.
    issues:
    - Tokens are stored in plaintext.
    error: null
    children: []
issues:
- Tokens are stored in plaintext.
errors: []

Inline markdown result

When output_file is not set and output_format is markdown, the tool returns a raw markdown string directly (no status wrapper).

Written to file (`status: written`)

Returned when output_file is set and the write succeeds:

{
  "status": "written",
  "output_file": "/tmp/report.json",
  "report": { "..." : "..." }
}

When output_file is set and output_format is yaml or markdown, the rendered content is written to the file, and the returned dict still carries the structured report under the report key.

Error (`status: error`)

{"status": "error", "message": "File not found: src/missing.py"}

{"status": "error", "message": "File is ignored by .aiignore: secrets/db.env"}

{"status": "error", "message": "LSP error: unsupported language 'cobol'"}

{"status": "error", "message": "No symbols found (language server returned an empty result)."}

{"status": "error", "message": "Agent error: <provider error details>"}
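
The result shapes described in this section (a raw string for inline YAML or markdown, and a dict with status ok, written, or error otherwise) can be handled with one dispatch function. This is a sketch for callers; `describe_result` is a hypothetical helper, and it assumes the result arrives as a Python str or dict.

```python
from typing import Union


def describe_result(result: Union[dict, str]) -> str:
    """Summarise a tool result in one line, whatever its shape."""
    if isinstance(result, str):
        # Inline yaml/markdown output is a raw string with no status wrapper.
        return f"rendered text ({len(result)} characters)"
    status = result.get("status")
    if status == "ok":
        count = len(result["report"]["symbols"])
        return f"{count} top-level symbol(s) analysed"
    if status == "written":
        return f"report written to {result['output_file']}"
    if status == "error":
        return f"failed: {result['message']}"
    return f"unexpected status: {status!r}"


summaries = [
    describe_result({"status": "ok", "report": {"symbols": [{"fqn": "AuthService"}]}}),
    describe_result({"status": "written", "output_file": "/tmp/report.json", "report": {}}),
    describe_result({"status": "error", "message": "File not found: src/missing.py"}),
]
```

Checking for a string first matters: the inline YAML and markdown results have no status key to inspect.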

How it works

After validating the file and resolving the workspace path, the tool selects one of two strategies:

LSP path (default)

1. Symbol extraction: an LSP server for the target language returns the file's symbol tree.
2. Container name inference: missing parent names are recovered using source range containment.
3. Local symbol filtering: when local_symbols_only is true, symbols defined outside the target file (e.g. imported names) are removed.
4. Kind filtering: when excluded_symbol_kinds is non-empty, symbols of matching kinds are dropped. No LLM call is made for excluded symbols.
5. Bottom-up sort: symbols are ordered by line span (smallest/leaves first, largest/roots last).
6. Iterative LLM analysis: each symbol is analysed in turn; the accumulated analyses of all prior symbols are supplied as read-only context, letting container symbols be synthesized from their children's descriptions.
7. Post-processing: flat records are re-nested into a tree by FQN hierarchy, issues are deduplicated, and the report is rendered as JSON, YAML, or markdown.
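
The sort, analysis loop, and post-processing steps above can be sketched with plain dictionaries. Everything here is illustrative: the function names are hypothetical, the LLM is replaced by a stub, and the re-nesting assumes FQNs are dot-separated as in the AuthService.login example. The tool's real implementation may differ.

```python
from typing import Callable


def bottom_up(symbols: list[dict]) -> list[dict]:
    """Order symbols by line span, smallest (leaves) first."""
    return sorted(symbols, key=lambda s: s["end_line"] - s["start_line"])


def analyse_all(symbols: list[dict], analyse: Callable) -> list[dict]:
    """Analyse each symbol in turn, passing earlier analyses as context."""
    records: list[dict] = []
    for symbol in bottom_up(symbols):
        context = [(r["fqn"], r["analysis"]) for r in records]
        analysis, issues = analyse(symbol, context)
        records.append({**symbol, "analysis": analysis, "issues": issues, "children": []})
    return records


def nest_by_fqn(records: list[dict]) -> list[dict]:
    """Rebuild the tree: a record's parent is the FQN before the last dot."""
    by_fqn = {r["fqn"]: r for r in records}
    roots = []
    for record in sorted(records, key=lambda r: r["start_line"]):
        parent = by_fqn.get(record["fqn"].rpartition(".")[0])
        if parent is not None:
            parent["children"].append(record)
        else:
            roots.append(record)
    return roots


def dedupe(issues: list[str]) -> list[str]:
    """Drop repeated issues while keeping first-seen order."""
    return list(dict.fromkeys(issues))


def stub_analyse(symbol: dict, context: list) -> tuple[str, list[str]]:
    # Stand-in for the LLM call; it just reports which symbols it saw first.
    seen = ", ".join(fqn for fqn, _ in context) or "nothing"
    return f"{symbol['kind']} analysed after: {seen}", ["Tokens are stored in plaintext."]


flat = [
    {"fqn": "AuthService", "kind": "Class", "start_line": 12, "end_line": 80},
    {"fqn": "AuthService.login", "kind": "Function", "start_line": 20, "end_line": 45},
]
records = analyse_all(flat, stub_analyse)
tree = nest_by_fqn(records)
issues = dedupe([issue for r in records for issue in r["issues"]])
```

Sorting by span means AuthService.login is analysed before AuthService, so the class analysis can draw on the method's description.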

Fast path (`fast_path_max_bytes > 0` and file size below threshold)

1. File read: the full file content is read into memory.
2. Single LLM call: the file is forwarded to the LLM with the analysis prompt. The LLM identifies symbols, builds the FQN hierarchy, estimates line ranges, and produces analyses and issues in one response.
3. Kind filtering: when excluded_symbol_kinds is non-empty, matching symbols and their entire subtrees are removed recursively from the LLM's output before any further processing.
4. Post-processing: the filtered symbol tree is converted to SymbolRecord objects (with start_col/end_col set to 0), issues are deduplicated recursively across the tree, and the report is rendered identically to the LSP path.
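
The post-processing step can be sketched as follows. The `SymbolRecord` dataclass here is a stand-in whose fields are guessed from the report examples on this page; the tool's real class may look different. The second issue string in the sample data is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRecord:
    """Stand-in for the tool's SymbolRecord; the real fields may differ."""
    fqn: str
    kind: str
    start_line: int
    end_line: int
    analysis: str
    issues: list[str] = field(default_factory=list)
    children: list["SymbolRecord"] = field(default_factory=list)
    start_col: int = 0  # always 0 on the fast path
    end_col: int = 0    # always 0 on the fast path


def to_record(node: dict) -> SymbolRecord:
    """Convert one node of the LLM's symbol tree, recursing into children."""
    return SymbolRecord(
        fqn=node["fqn"],
        kind=node["kind"],
        start_line=node["start_line"],
        end_line=node["end_line"],
        analysis=node["analysis"],
        issues=list(node.get("issues", [])),
        children=[to_record(child) for child in node.get("children", [])],
    )


def collect_issues(records: list[SymbolRecord]) -> list[str]:
    """Gather issues across the whole tree, keeping first-seen order, no repeats."""
    seen: dict[str, None] = {}
    stack = list(reversed(records))
    while stack:
        record = stack.pop()
        for issue in record.issues:
            seen.setdefault(issue, None)
        stack.extend(reversed(record.children))
    return list(seen)


llm_tree = [{
    "fqn": "AuthService", "kind": "Class", "start_line": 12, "end_line": 80,
    "analysis": "Manages user authentication ...",
    "issues": ["Tokens are stored in plaintext."],
    "children": [{
        "fqn": "AuthService.login", "kind": "Function", "start_line": 20, "end_line": 45,
        "analysis": "Validates credentials and issues a token.",
        # Second issue is invented sample data.
        "issues": ["Tokens are stored in plaintext.", "No rate limiting."],
        "children": [],
    }],
}]
records = [to_record(node) for node in llm_tree]
all_issues = collect_issues(records)
```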

Filtering by symbol kind

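Pass excluded_symbol_kinds to skip entire categories of symbols. Matching is case-insensitive, and both the LSP and fast paths honour this parameter. On the LSP path, excluded symbols are dropped before any LLM call is made; on the fast path, they are removed from the LLM's output after the single call.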

# Skip properties and module-level variables (LSP path)
aifred-tk semantic_code_analysis --file-path src/models.py --language python \
  --prompt "describe what each symbol does" \
  --excluded-symbol-kinds Property --excluded-symbol-kinds Variable

# Via MCP
excluded_symbol_kinds:
  - Property
  - Variable

When to use this:

- Focus analysis on high-signal symbols (functions, methods, classes) and ignore low-signal ones (properties, constants, type aliases).
- Reduce token usage and LLM cost on files with many trivial fields or constants. This saving applies to the LSP path; on the fast path the filter runs after the LLM call.
- Avoid noise in the issues list from symbols that are intentionally simple.

Valid kind names (from the LSP SymbolKind spec; fast-path LLM also recognises all of them):

Array, Boolean, Class, Constant, Constructor, Enum, EnumMember, Event, Field, Function, Interface, Key, Method, Module, Namespace, Null, Number, Object, Operator, Package, Property, String, Struct, TypeParameter, Variable

On the LSP path, the kind names come directly from the language server's SymbolKind enum and must match one of the values above. On the fast path the LLM infers kinds itself, so the filter only removes symbols that the LLM labels with one of the names you pass.

Effect on parent/child relationships:

- LSP path: excluded symbols are removed from the flat list before the bottom-up sort. Parent symbols whose children are all excluded will still appear in the report (they are only skipped if their own kind is also excluded).
- Fast path: exclusion is recursive; excluding a parent kind drops the entire subtree.
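
The two behaviours can be compared side by side. This is a minimal sketch with hypothetical function names and sample symbols; it shows case-insensitive matching, a flat filter like the one described for the LSP path, and a recursive filter like the one described for the fast path.

```python
def filter_flat(symbols: list[dict], excluded_kinds: list[str]) -> list[dict]:
    """LSP-style filter: drop matching entries from a flat symbol list."""
    excluded = {kind.lower() for kind in excluded_kinds}
    return [s for s in symbols if s["kind"].lower() not in excluded]


def filter_tree(nodes: list[dict], excluded_kinds: list[str]) -> list[dict]:
    """Fast-path-style filter: a matching node is dropped with its whole subtree."""
    excluded = {kind.lower() for kind in excluded_kinds}
    kept = []
    for node in nodes:
        if node["kind"].lower() in excluded:
            continue  # children are never visited, so the subtree disappears
        children = filter_tree(node.get("children", []), excluded_kinds)
        kept.append({**node, "children": children})
    return kept


flat = [
    {"fqn": "Config", "kind": "Class"},
    {"fqn": "Config.name", "kind": "Property"},
    {"fqn": "Config.load", "kind": "Method"},
]
tree = [{
    "fqn": "Config", "kind": "Class",
    "children": [
        {"fqn": "Config.name", "kind": "Property", "children": []},
        {"fqn": "Config.load", "kind": "Method", "children": []},
    ],
}]

# Flat list: the parent stays even though all of its children are excluded.
parent_only = filter_flat(flat, ["property", "METHOD"])
# Tree: excluding the parent's kind removes everything beneath it.
nothing_left = filter_tree(tree, ["Class"])
# Tree: excluding a leaf kind keeps the parent and the other children.
without_properties = filter_tree(tree, ["property"])
```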

Supported languages

python, typescript, javascript, java, rust, go, kotlin, csharp, ruby, dart

Limits and security

- .aiignore: before reading any file, the tool checks .aiignore files from the target path up to the filesystem root. Matching files are refused immediately.
- Size limit: files larger than 1 MB (1,048,576 bytes) are rejected before any LLM call.
- Prompt injection isolation: on the LSP path, the analysis prompt, symbol source, and prior analyses are each wrapped in XML isolation markers (<analysis_request>, <symbol_source>, <previously_analyzed>). On the fast path, the prompt and full file source are wrapped in <analysis_request> and <source> respectively. In both cases, embedded instructions inside the file cannot act as agent directives.
- Retry behaviour: on the LSP path, the tool makes up to max_retries attempts for each symbol's LLM call. If all attempts fail, that symbol is recorded with an error field and analysis continues for the remaining symbols. Failed symbols appear in report.errors. The fast path makes a single LLM call; pydantic-ai handles validation retries internally.
- local_symbols_only: defaults to true. Setting it to false includes imported and re-exported symbols returned by the language server, which may increase token usage significantly. Has no effect on the fast path.
- excluded_symbol_kinds: kind matching is case-insensitive. On the LSP path the filter is applied before any LLM call, and if all symbols are excluded the tool returns status: error with a "No symbols found" message. On the fast path the filter is applied after the single LLM call, so excluded kinds still consume tokens during extraction; use the LSP path if pre-filtering matters.
- Column positions: start_col and end_col are always 0 on the fast path because column positions are not available without LSP. Line numbers are LLM estimates and may be approximate.
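
The prompt isolation and retry behaviour for a single symbol on the LSP path can be sketched as below. The tag names come from this page; the function names, the exact prompt layout, and the stub LLM are assumptions, and max_retries is treated as the total number of attempts, as in the CLI option table. Any escaping the tool applies to the wrapped content is not shown.

```python
from typing import Callable


def build_symbol_prompt(request: str, symbol_source: str, prior: list[str]) -> str:
    """Wrap each untrusted input in its own isolation marker."""
    previously = "\n".join(prior)
    return (
        f"<analysis_request>\n{request}\n</analysis_request>\n"
        f"<symbol_source>\n{symbol_source}\n</symbol_source>\n"
        f"<previously_analyzed>\n{previously}\n</previously_analyzed>"
    )


def analyse_with_retries(prompt: str, call_llm: Callable[[str], str], max_retries: int = 3) -> dict:
    """Try the call up to max_retries times; record the last error if all fail."""
    last_error = None
    for _attempt in range(max_retries):
        try:
            return {"analysis": call_llm(prompt), "error": None}
        except Exception as exc:  # a real client would catch provider-specific errors
            last_error = str(exc)
    return {"analysis": None, "error": last_error}


calls = {"count": 0}


def flaky_llm(prompt: str) -> str:
    # Stub LLM: fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary provider error")
    return "Validates credentials and issues a token."


def broken_llm(prompt: str) -> str:
    # Stub LLM that always fails.
    raise RuntimeError("provider unavailable")


prompt = build_symbol_prompt("identify security issues", "def login(user, password): ...", [])
succeeded = analyse_with_retries(prompt, flaky_llm, max_retries=3)
failed = analyse_with_retries(prompt, broken_llm, max_retries=2)
```

A failed symbol keeps its error message instead of raising, which is what lets analysis continue with the remaining symbols.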