# Semantic Code Analysis Plugin
The file_analysis plugin's semantic_code_analysis tool performs a semantic analysis of a
source file and returns a structured report with per-symbol analyses, a deduplicated issue list,
and a UTC timestamp. It supports two analysis strategies:
- LSP path (default): queries a language server for the file's symbol tree, then analyses each symbol bottom-up (leaves first, roots last) with an LLM, passing prior analyses as context.
- Fast path: when `fast_path_max_bytes` is set and the file is smaller than the threshold, the LSP is skipped and the full file is passed to the LLM in a single call; the LLM identifies and analyses the symbols itself. This suits small utility files, where LSP startup overhead is not worth the cost.
Both paths produce the same report structure, so callers can consume either result without changes (see Fast path for the fields that are less precise on that path).
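The selection rule can be summarised as a small function. This is a minimal sketch, not the tool's code: `choose_strategy` and `MAX_FILE_BYTES` are hypothetical names, and the 1 MB rejection is taken from Limits and security.

```python
MAX_FILE_BYTES = 1_048_576  # files larger than 1 MB are rejected before any LLM call


def choose_strategy(file_size: int, fast_path_max_bytes: int = 0) -> str:
    """Return "fast" or "lsp" for a file of `file_size` bytes (hypothetical helper)."""
    if file_size > MAX_FILE_BYTES:
        raise ValueError(f"file is {file_size} bytes; the limit is {MAX_FILE_BYTES}")
    # 0 (the default) disables the fast path; otherwise the file must be strictly smaller.
    if fast_path_max_bytes > 0 and file_size < fast_path_max_bytes:
        return "fast"
    return "lsp"


print(choose_strategy(4_000))                             # lsp (fast path disabled)
print(choose_strategy(4_000, fast_path_max_bytes=16384))  # fast
print(choose_strategy(16384, fast_path_max_bytes=16384))  # lsp (not strictly smaller)
```

Note the strict comparison: a file exactly at the threshold still takes the LSP path.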
## Quick start
Ensure your settings declare an LLM and point the tool at it:
```yaml
# ~/.config/aifred-tk/settings.yml
llms:
  my-llm:
    provider: openai
    model: gpt-4o-mini
tools:
  semantic_code_analysis:
    lsp_llm:
      type: ref
      ref: my-llm
    fast_llm:
      type: ref
      ref: my-llm
```
Then invoke the tool:
```shell
aifred-tk semantic_code_analysis --file-path src/auth.py --language python \
  --prompt "identify security issues"
```
## Configuration
### LLM settings
The tools.semantic_code_analysis.lsp_llm and tools.semantic_code_analysis.fast_llm
keys are required. They accept a reference to a named LLM or an inline definition.
See LLMs for the full format.
```yaml
# Reference a named LLM
tools:
  semantic_code_analysis:
    lsp_llm:
      type: ref
      ref: my-llm
    fast_llm:
      type: ref
      ref: my-llm
```

```yaml
# Or define inline
tools:
  semantic_code_analysis:
    lsp_llm:
      type: custom
      provider: openai
      model: gpt-4o-mini
    fast_llm:
      type: custom
      provider: ollama
      model: gemma4:e4b
      host: 127.0.0.1
      port: 11434
```
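To make the two forms concrete, here is a minimal sketch of how a `ref` or `custom` entry could be resolved to a concrete LLM definition. It assumes the settings file has already been parsed into a plain dict (for example by a YAML loader); `resolve_llm` is a hypothetical helper, not the tool's loader, and the sample mixes one entry of each form.

```python
from typing import Any


def resolve_llm(settings: dict[str, Any], key: str) -> dict[str, Any]:
    """Resolve tools.semantic_code_analysis.<key> to an LLM definition (hypothetical)."""
    entry = settings["tools"]["semantic_code_analysis"][key]
    if entry["type"] == "ref":
        # Named reference: look the definition up under the top-level `llms` key.
        return dict(settings["llms"][entry["ref"]])
    if entry["type"] == "custom":
        # Inline definition: everything except the `type` discriminator.
        return {k: v for k, v in entry.items() if k != "type"}
    raise ValueError(f"unknown LLM entry type: {entry['type']!r}")


settings = {
    "llms": {"my-llm": {"provider": "openai", "model": "gpt-4o-mini"}},
    "tools": {"semantic_code_analysis": {
        "lsp_llm": {"type": "ref", "ref": "my-llm"},
        "fast_llm": {"type": "custom", "provider": "ollama", "model": "gemma4:e4b",
                     "host": "127.0.0.1", "port": 11434},
    }},
}
print(resolve_llm(settings, "lsp_llm"))   # {'provider': 'openai', 'model': 'gpt-4o-mini'}
print(resolve_llm(settings, "fast_llm"))  # {'provider': 'ollama', 'model': 'gemma4:e4b', 'host': '127.0.0.1', 'port': 11434}
```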
### Enabling and disabling
The tool respects the standard `enabled` flag.
## Usage
### CLI
```text
aifred-tk semantic_code_analysis --file-path PATH --language LANG --prompt TEXT \
  [--output-file PATH] [--output-format {json,yaml,markdown}] \
  [--max-retries INT] [--local-symbols-only BOOL] [--fast-path-max-bytes INT] \
  [--excluded-symbol-kinds KIND ...]
```
| Option | Description |
|---|---|
| `--file-path PATH` | Path to the source file to analyse. Required. |
| `--language LANG` | Programming language identifier (see Supported languages). Required. |
| `--prompt TEXT` | Analysis request applied to every symbol (max 500 characters). Required. |
| `--output-file PATH` | Optional path to write the report. When omitted, the result is returned inline. |
| `--output-format {json,yaml,markdown}` | Output format. `json` (default) returns the full structured report; `yaml` returns the same data in YAML block format; `markdown` returns a human-readable summary. |
| `--max-retries INT` | Maximum LLM call attempts per symbol before skipping it. Default 3, range 1–10. LSP path only. |
| `--local-symbols-only BOOL` | When true (default), restricts analysis to symbols defined in the target file. LSP path only. |
| `--fast-path-max-bytes INT` | When greater than 0, files strictly smaller than this byte count bypass the LSP. Default 0 (disabled). See Fast path. |
| `--excluded-symbol-kinds KIND ...` | Optional symbol kind names to skip (case-insensitive). To specify multiple kinds, repeat the flag (e.g., `--excluded-symbol-kinds Function --excluded-symbol-kinds Class`). Applies to both paths. See Filtering by symbol kind. |
### MCP
The tool is registered automatically when the MCP server starts. Invoke it as
aifred_semantic_code_analysis with the required file_path, language, and prompt arguments,
plus optional output_file, output_format, max_retries, local_symbols_only,
fast_path_max_bytes, and excluded_symbol_kinds.
## Fast path
By default the tool always uses the LSP, regardless of file size. Set fast_path_max_bytes to a
positive integer to enable the fast path for files below that threshold:
```shell
# Files smaller than 16 KB skip the LSP
aifred-tk semantic_code_analysis --file-path utils.py --language python \
  --prompt "describe each function" --fast-path-max-bytes 16384
```
When the fast path is taken:
- The entire file is read and forwarded to the LLM in a single call.
- The LLM identifies symbols, estimates their line ranges, and produces the analysis in one shot.
- `start_col` and `end_col` are always `0` in the returned records (column positions require the LSP).
- `--max-retries` and `--local-symbols-only` have no effect, since only one LLM call is made.
- The report structure (`file_path`, `language`, `analyzed_at`, `symbols`, `issues`, `errors`) is identical to the LSP path, so callers do not need to distinguish between the two.
Choosing a threshold — a value between 8 KB and 32 KB works well for most projects. Files in that range are typically small utility modules or single-class files where LSP startup cost (often 2–5 s) outweighs the analysis time. Larger files benefit from the LSP path because the bottom-up context passing improves analysis quality on complex symbol hierarchies.
## Output
### Inline JSON result (`status: ok`)
Returned when output_file is not set and output_format is json:
```json
{
  "status": "ok",
  "report": {
    "file_path": "src/auth.py",
    "language": "python",
    "analyzed_at": "2025-05-03T10:22:00.123456+00:00",
    "symbols": [
      {
        "fqn": "AuthService",
        "kind": "Class",
        "start_line": 12,
        "end_line": 80,
        "analysis": "Manages user authentication ...",
        "issues": ["Tokens are stored in plaintext."],
        "error": null,
        "children": [
          {
            "fqn": "AuthService.login",
            "kind": "Function",
            "start_line": 20,
            "end_line": 45,
            "analysis": "Validates credentials and issues a token.",
            "issues": ["Tokens are stored in plaintext."],
            "error": null,
            "children": []
          }
        ]
      }
    ],
    "issues": ["Tokens are stored in plaintext."],
    "errors": []
  }
}
```
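A caller consuming the inline JSON result typically needs to walk the nested `children` lists. The sketch below uses only the standard library and an abbreviated copy of the example result; `iter_symbols` is a hypothetical helper, and the only status it checks for is `ok`.

```python
import json

# Abbreviated copy of the inline JSON result (only the fields this sketch reads).
RAW = """
{"status": "ok",
 "report": {
   "symbols": [
     {"fqn": "AuthService", "kind": "Class", "start_line": 12, "end_line": 80,
      "issues": ["Tokens are stored in plaintext."],
      "children": [
        {"fqn": "AuthService.login", "kind": "Function", "start_line": 20, "end_line": 45,
         "issues": ["Tokens are stored in plaintext."], "children": []}
      ]}
   ],
   "issues": ["Tokens are stored in plaintext."],
   "errors": []}}
"""


def iter_symbols(symbols, depth=0):
    """Yield (depth, symbol) pairs, parents before children, across nested `children`."""
    for sym in symbols:
        yield depth, sym
        yield from iter_symbols(sym.get("children", []), depth + 1)


result = json.loads(RAW)
if result["status"] != "ok":
    raise SystemExit(f"analysis did not succeed: {result['status']}")
report = result["report"]
for depth, sym in iter_symbols(report["symbols"]):
    indent = "  " * depth
    print(f"{indent}{sym['fqn']} ({sym['kind']}, lines {sym['start_line']}-{sym['end_line']}): "
          f"{len(sym['issues'])} issue(s)")
print("Deduplicated issues:", report["issues"])
```

Because the report-level `issues` list is already deduplicated, callers that only need a summary can read it directly instead of walking the tree.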
### Inline YAML result
When output_file is not set and output_format is yaml, the tool returns a raw YAML string
(no status wrapper). The YAML structure mirrors the JSON report exactly:
```yaml
file_path: src/auth.py
language: python
analyzed_at: '2025-05-03T10:22:00.123456+00:00'
symbols:
- fqn: AuthService
  kind: Class
  start_line: 12
  end_line: 80
  analysis: Manages user authentication ...
  issues:
  - Tokens are stored in plaintext.
  error: null
  children:
  - fqn: AuthService.login
    kind: Function
    start_line: 20
    end_line: 45
    analysis: Validates credentials and issues a token.
    issues:
    - Tokens are stored in plaintext.
    error: null
    children: []
issues:
- Tokens are stored in plaintext.
errors: []
```
### Inline markdown result
When output_file is not set and output_format is markdown, the tool returns a raw
markdown string directly (no status wrapper).
### Written to file (`status: written`)
Returned when `output_file` is set and the write succeeds.
When `output_format` is `yaml` or `markdown` and `output_file` is set, the rendered content is
written to the file, and the returned dict still contains the report as a structured dict rather than the rendered text.
### Error (`status: error`)
## How it works
After validating the file and resolving the workspace path, the tool selects one of two strategies:
### LSP path (default)
- Symbol extraction: an LSP server for the target language returns the file's symbol tree.
- Container name inference: missing parent names are recovered using source range containment.
- Local symbol filtering: when `local_symbols_only` is `true`, symbols defined outside the target file (e.g. imported names) are removed.
- Kind filtering: when `excluded_symbol_kinds` is non-empty, symbols of matching kinds are dropped. No LLM call is made for excluded symbols.
- Bottom-up sort: symbols are ordered by line span (smallest/leaves first, largest/roots last).
- Iterative LLM analysis: each symbol is analysed in turn; the accumulated analyses of all prior symbols are supplied as read-only context, letting container symbols be synthesized from their children's descriptions.
- Post-processing: flat records are re-nested into a tree by FQN hierarchy, issues are deduplicated, and the report is rendered as JSON, YAML, or markdown.
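The core of the LSP path (container inference, bottom-up ordering, context accumulation, and re-nesting) can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's implementation: `Sym`, `infer_containers`, `analyse_bottom_up`, `nest`, and `fake_llm` are hypothetical names, the FQN separator is assumed to be `.`, and `fake_llm` stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sym:
    """Hypothetical, minimal stand-in for an LSP document symbol."""
    name: str
    kind: str
    start_line: int
    end_line: int
    container: Optional[str] = None  # parent FQN; some servers omit it

    @property
    def fqn(self) -> str:
        return f"{self.container}.{self.name}" if self.container else self.name

    @property
    def span(self) -> int:
        return self.end_line - self.start_line


def infer_containers(symbols: list[Sym]) -> None:
    """Fill missing parents with the tightest enclosing symbol (range containment)."""
    # Largest spans first, so a parent's own FQN is settled before children use it.
    for sym in sorted(symbols, key=lambda s: s.span, reverse=True):
        if sym.container:
            continue
        enclosing = [o for o in symbols
                     if o is not sym and o.span > sym.span
                     and o.start_line <= sym.start_line and sym.end_line <= o.end_line]
        if enclosing:
            sym.container = min(enclosing, key=lambda s: s.span).fqn


def analyse_bottom_up(symbols: list[Sym],
                      llm: Callable[[Sym, dict[str, str]], str]) -> dict[str, str]:
    """Analyse leaves first; each call sees a read-only copy of all prior analyses."""
    analyses: dict[str, str] = {}
    for sym in sorted(symbols, key=lambda s: s.span):
        analyses[sym.fqn] = llm(sym, dict(analyses))
    return analyses


def nest(fqns: list[str]) -> list[dict]:
    """Re-nest flat FQNs into a tree: 'A.b' becomes a child of 'A' when 'A' is present."""
    nodes: dict[str, dict] = {}
    roots: list[dict] = []
    for fqn in sorted(fqns, key=lambda f: f.count(".")):  # parents before children
        node = {"fqn": fqn, "children": []}
        nodes[fqn] = node
        parent = fqn.rpartition(".")[0]
        (nodes[parent]["children"] if parent in nodes else roots).append(node)
    return roots


def fake_llm(sym: Sym, prior: dict[str, str]) -> str:
    """Stub for a model call: reports how many child analyses it could build on."""
    reused = [fqn for fqn in prior if fqn.startswith(sym.fqn + ".")]
    return f"{sym.kind} {sym.fqn}, built from {len(reused)} child analyses"


symbols = [Sym("AuthService", "Class", 12, 80),
           Sym("login", "Function", 20, 45),
           Sym("logout", "Function", 50, 60)]
infer_containers(symbols)
analyses = analyse_bottom_up(symbols, fake_llm)
print(list(analyses))           # ['AuthService.logout', 'AuthService.login', 'AuthService']
print(analyses["AuthService"])  # Class AuthService, built from 2 child analyses
print(nest(list(analyses)))
```

Passing a copy of the accumulated analyses keeps the context read-only from the model call's point of view, which matches the "read-only context" described above.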
### Fast path (`fast_path_max_bytes > 0` and file size below threshold)
- File read: the full file content is read into memory.
- Single LLM call: the file is forwarded to the LLM with the analysis prompt. The LLM identifies symbols, builds the FQN hierarchy, estimates line ranges, and produces analyses and issues in one response.
- Kind filtering: when `excluded_symbol_kinds` is non-empty, matching symbols and their entire subtrees are removed recursively from the LLM's output before any further processing.
- Post-processing: the filtered symbol tree is converted to `SymbolRecord` objects (with `start_col`/`end_col` set to `0`), issues are deduplicated recursively across the tree, and the report is rendered identically to the LSP path.
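The fast-path post-processing can be sketched as follows. The `SymbolRecord` dataclass here is a hypothetical stand-in whose fields follow the JSON report (the tool's own class may differ), and `llm_output` is invented sample data shaped like the report's symbol tree.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRecord:
    """Hypothetical stand-in for the tool's SymbolRecord; fields follow the JSON report."""
    fqn: str
    kind: str
    start_line: int
    end_line: int
    analysis: str
    issues: list[str]
    start_col: int = 0  # always 0 on the fast path: no LSP, so no column data
    end_col: int = 0
    children: list["SymbolRecord"] = field(default_factory=list)


def drop_kinds(nodes: list[dict], excluded: set[str]) -> list[dict]:
    """Remove nodes of excluded kinds (casefolded) together with their whole subtrees."""
    return [
        {**n, "children": drop_kinds(n.get("children", []), excluded)}
        for n in nodes
        if n["kind"].casefold() not in excluded
    ]


def to_records(nodes: list[dict]) -> list[SymbolRecord]:
    return [
        SymbolRecord(n["fqn"], n["kind"], n["start_line"], n["end_line"],
                     n.get("analysis", ""), list(n.get("issues", [])),
                     children=to_records(n.get("children", [])))
        for n in nodes
    ]


def all_issues(records: list[SymbolRecord]) -> list[str]:
    """Collect issues from the whole tree, deduplicated in first-seen order."""
    found: list[str] = []
    for r in records:
        found += r.issues + all_issues(r.children)
    return list(dict.fromkeys(found))


# Hypothetical LLM output for a small file.
llm_output = [
    {"fqn": "Config", "kind": "Class", "start_line": 1, "end_line": 20,
     "analysis": "Loads settings.", "issues": ["No input validation."],
     "children": [
         {"fqn": "Config.path", "kind": "Property", "start_line": 3, "end_line": 5,
          "analysis": "Settings path.", "issues": ["No input validation."], "children": []},
         {"fqn": "Config.load", "kind": "Method", "start_line": 7, "end_line": 20,
          "analysis": "Reads the file.", "issues": ["Reads files without a size check."],
          "children": []},
     ]},
    {"fqn": "DEFAULTS", "kind": "Variable", "start_line": 22, "end_line": 24,
     "analysis": "Default values.", "issues": [], "children": []},
]

excluded = {k.casefold() for k in ["property", "VARIABLE"]}
records = to_records(drop_kinds(llm_output, excluded))
print([r.fqn for r in records])              # ['Config']
print([c.fqn for c in records[0].children])  # ['Config.load']
print(all_issues(records))  # ['No input validation.', 'Reads files without a size check.']
```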
## Filtering by symbol kind
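Pass `excluded_symbol_kinds` to skip entire categories of symbols. Matching is case-insensitive,
and both the LSP and fast paths honour this parameter. On the LSP path, excluded symbols are dropped
before any LLM call is made; on the fast path, they are removed from the LLM's output after its single call.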
```shell
# Skip properties and module-level variables (LSP path)
aifred-tk semantic_code_analysis --file-path src/models.py --language python \
  --prompt "describe what each symbol does" \
  --excluded-symbol-kinds Property --excluded-symbol-kinds Variable
```
When to use this:
- Focus analysis on high-signal symbols (functions, methods, classes) and ignore low-signal ones (properties, constants, type aliases).
- Reduce token usage and LLM cost on files with many trivial fields or constants (LSP path only; the fast path filters after its single LLM call).
- Avoid noise in the issues list from symbols that are intentionally simple.
Valid kind names (from the LSP SymbolKind spec; fast-path LLM also recognises all of them):
Array, Boolean, Class, Constant, Constructor, Enum, EnumMember, Event,
Field, Function, Interface, Key, Method, Module, Namespace, Null, Number,
Object, Operator, Package, Property, String, Struct, TypeParameter, Variable
On the LSP path, the kind names come directly from the language server's `SymbolKind` enum and
must match one of the values above. On the fast path the LLM infers kinds; pass the same names,
and the filter will match any symbol the LLM labels with one of them.
Effect on parent/child relationships:
- LSP path — excluded symbols are removed from the flat list before the bottom-up sort. Parent symbols whose children are all excluded will still appear in the report (they are only skipped if their own kind is also excluded).
- Fast path — exclusion is recursive: excluding a parent kind drops the entire subtree.
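The LSP-path behaviour can be illustrated on a flat list of `(fqn, kind)` pairs. This is a hedged sketch: `exclude_flat` is a hypothetical helper, and raising `ValueError` stands in for the tool's `status: error` response. The recursive fast-path variant is the subtree-dropping filter described under How it works. How the tool later nests members whose parent was excluded is not described here, so the sketch stops at the flat list.

```python
def exclude_flat(symbols: list[tuple[str, str]],
                 excluded_kinds: list[str]) -> list[tuple[str, str]]:
    """LSP-path style: drop matching (fqn, kind) pairs from a flat list, one at a time."""
    excluded = {k.casefold() for k in excluded_kinds}
    kept = [(fqn, kind) for fqn, kind in symbols if kind.casefold() not in excluded]
    if not kept:
        # The tool reports this case as status: error; raising is this sketch's stand-in.
        raise ValueError("No symbols found")
    return kept


flat = [("AuthService", "Class"),
        ("AuthService.login", "Method"),
        ("AuthService.token", "Property")]

# Excluding every child kind still leaves the parent class in the list.
print(exclude_flat(flat, ["method", "PROPERTY"]))  # [('AuthService', 'Class')]
# Excluding the parent kind does not remove its members from a flat list.
print(exclude_flat(flat, ["Class"]))  # [('AuthService.login', 'Method'), ('AuthService.token', 'Property')]
```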
## Supported languages
python, typescript, javascript, java, rust, go, kotlin, csharp, ruby, dart
## Limits and security
- `.aiignore`: before reading any file, the tool checks `.aiignore` files from the target path up to the filesystem root. Matching files are refused immediately.
- Size limit: files larger than 1 MB (1,048,576 bytes) are rejected before any LLM call.
- Prompt injection isolation: on the LSP path, the analysis prompt, symbol source, and prior analyses are each wrapped in XML isolation markers (`<analysis_request>`, `<symbol_source>`, `<previously_analyzed>`). On the fast path, the prompt and full file source are wrapped in `<analysis_request>` and `<source>` respectively. In both cases, instructions embedded in the file cannot act as agent directives.
- Retry behaviour: on the LSP path, each symbol gets up to `max_retries` LLM call attempts. If all attempts fail, that symbol is recorded with an `error` field and analysis continues with the remaining symbols. Failed symbols appear in `report.errors`. The fast path makes a single LLM call; pydantic-ai handles validation retries internally.
- `local_symbols_only`: defaults to `true`. Setting it to `false` includes imported and re-exported symbols returned by the language server, which may increase token usage significantly. Has no effect on the fast path.
- `excluded_symbol_kinds`: kind matching is case-insensitive. On the LSP path the filter is applied before any LLM call, and if all symbols are excluded the tool returns `status: error` with a "No symbols found" message. On the fast path the filter is applied after the single LLM call, so excluded kinds still consume tokens during extraction; use the LSP path if pre-filtering matters.
- Column positions: `start_col` and `end_col` are always `0` on the fast path because column positions are not available without the LSP. Line numbers are LLM estimates and may be approximate.
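The LSP-path prompt wrapping and per-symbol retry policy can be sketched together. Both functions are hypothetical: `build_symbol_prompt` only mirrors the marker names listed above (it does not escape marker strings that appear inside the wrapped text), and `analyse_with_retries` treats `max_retries` as the total number of attempts, matching the `--max-retries` description in the CLI table.

```python
from typing import Callable, Optional


def build_symbol_prompt(request: str, source: str, prior: dict[str, str]) -> str:
    """Wrap each input in its own marker, mirroring the LSP-path tag names."""
    previous = "\n".join(f"{fqn}: {text}" for fqn, text in prior.items())
    return (
        f"<analysis_request>\n{request}\n</analysis_request>\n"
        f"<symbol_source>\n{source}\n</symbol_source>\n"
        f"<previously_analyzed>\n{previous}\n</previously_analyzed>"
    )


def analyse_with_retries(call: Callable[[str], str], prompt: str,
                         max_retries: int = 3) -> tuple[Optional[str], Optional[str]]:
    """Make up to max_retries attempts; return (analysis, None) or (None, last error)."""
    if not 1 <= max_retries <= 10:
        raise ValueError("max_retries must be between 1 and 10")
    last_error: Optional[str] = None
    for _ in range(max_retries):
        try:
            return call(prompt), None
        except Exception as exc:  # a real implementation would catch narrower errors
            last_error = f"{type(exc).__name__}: {exc}"
    # All attempts failed: the symbol is recorded with an error and analysis moves on.
    return None, last_error


attempts = []


def flaky(prompt: str) -> str:
    """Hypothetical model call that times out twice before succeeding."""
    attempts.append(prompt)
    if len(attempts) < 3:
        raise TimeoutError("LLM timed out")
    return "Validates credentials."


prompt = build_symbol_prompt("identify security issues", "def login(): ...", {})
print(analyse_with_retries(flaky, prompt))  # ('Validates credentials.', None)
print(analyse_with_retries(lambda p: 1 / 0, prompt, max_retries=2))  # (None, 'ZeroDivisionError: division by zero')
```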