Claude Code MCP Integration

LLM Checker includes a built-in Model Context Protocol (MCP) server. Once connected, Claude Code can detect your hardware, rank models, manage Ollama, run benchmarks, and execute any allowlisted LLM Checker command, all without leaving the Claude interface.

What the MCP Server Provides

The MCP server exposes the full LLM Checker surface as structured tools that Claude can call autonomously:

Hardware detection and tier analysis
Model compatibility scoring and ranked recommendations
Ollama model management (list, pull, run, remove)
Benchmarking, comparison, and optimization tools
Policy validation and audit export
Calibration artifact generation
Direct CLI execution for any allowlisted command
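
For context, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a request for the hw_detect tool. The request id and the empty arguments object are assumptions, since the tool's input schema is not shown on this page; Claude Code constructs these messages for you, so this is only meant to illustrate what "structured tools" means on the wire.

```python
import json
from typing import Optional


def build_tool_call(request_id: int, tool_name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Assumption: hw_detect needs no arguments, so an empty object is sent.
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_tool_call(1, "hw_detect"))
```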

Setup

Global install (recommended)

1. Install LLM Checker globally

npm install -g llm-checker

2. Add to Claude Code

claude mcp add llm-checker -- llm-checker-mcp

3. Restart Claude Code

Restart Claude Code to load the new MCP server. You are ready to go.

npx (no global install)

1. Add to Claude Code with npx

No global install required. npx downloads and runs the MCP server on demand:

claude mcp add llm-checker -- npx llm-checker-mcp

2. Restart Claude Code

Restart Claude Code to load the new MCP server.

Generate the Setup Command

If you want the exact command for your environment printed to stdout (for scripting or manual config file editing), run:

llm-checker mcp-setup

This prints the claude mcp add command and the corresponding JSON config snippet. Useful flags:

Flag	Effect
`--apply`	Run the setup command automatically
`--json`	Output config as JSON only
`--npx`	Use npx transport instead of global binary
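
If you edit a config file by hand, the entry for an MCP server in Claude Code generally takes the form of a named object under mcpServers with a command and an args list. The sketch below builds that shape in Python for both transports. The exact snippet that llm-checker mcp-setup prints is not reproduced on this page, so treat the helper name mcp_config and the precise layout as assumptions and prefer the tool's own output.

```python
import json


def mcp_config(use_npx: bool = False) -> dict:
    """Build a Claude Code style MCP server entry for LLM Checker (illustrative)."""
    if use_npx:
        # Mirrors: claude mcp add llm-checker -- npx llm-checker-mcp
        server = {"command": "npx", "args": ["llm-checker-mcp"]}
    else:
        # Mirrors: claude mcp add llm-checker -- llm-checker-mcp
        server = {"command": "llm-checker-mcp", "args": []}
    return {"mcpServers": {"llm-checker": server}}


if __name__ == "__main__":
    print(json.dumps(mcp_config(use_npx=True), indent=2))
```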

Available MCP Tools

Core Analysis

hw_detect

Detect your hardware — CPU, GPU, RAM, and acceleration backend (Metal, CUDA, ROCm, CPU).

check

Full compatibility analysis with all models ranked by score.

recommend

Top model picks by category: coding, reasoning, multimodal, and more.

installed

Rank your already-downloaded Ollama models by compatibility score.

search

Search the Ollama model catalog with filters for family, quantization, size, and use-case.

smart_recommend

Advanced recommendations using the full 4D scoring engine.

ollama_plan

Build a capacity plan for local models with recommended NUM_CTX, NUM_PARALLEL, and memory settings.

ollama_plan_env

Return ready-to-paste export ... env vars from the recommended or fallback plan profile.
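
To give a sense of what "ready-to-paste" export lines look like, here is a minimal sketch that renders a settings mapping as shell export statements. The variable names OLLAMA_NUM_PARALLEL and OLLAMA_CONTEXT_LENGTH and their values are placeholders chosen for illustration; the names and values that ollama_plan_env actually returns are not shown on this page and may differ.

```python
import shlex


def render_exports(settings: dict) -> str:
    """Render a mapping of env var names to values as shell export lines."""
    lines = [
        # shlex.quote keeps values safe to paste into a POSIX shell.
        f"export {name}={shlex.quote(str(value))}"
        for name, value in sorted(settings.items())
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical plan values, not real tool output.
    print(render_exports({"OLLAMA_NUM_PARALLEL": 2, "OLLAMA_CONTEXT_LENGTH": 8192}))
```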

policy_validate

Validate a policy file against the v1 schema and return structured validation output.

audit_export

Run policy compliance export (json/csv/sarif/all) for check or recommend flows.

calibrate

Generate calibration artifacts from a prompt suite with typed MCP inputs.

Ollama Management

ollama_list

List all downloaded models with params, quantization, family, and size.

ollama_pull

Download a model from the Ollama registry.

ollama_run

Run a prompt against a local model and receive tok/s metrics alongside the response.
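
For reference, Ollama's generate response reports eval_count (generated tokens) and eval_duration (in nanoseconds), from which tok/s can be derived. The sketch below shows that arithmetic; whether ollama_run computes its metric in exactly this way is an assumption.

```python
def tokens_per_second(response: dict) -> float:
    """Derive generation speed from an Ollama generate response."""
    eval_count = response["eval_count"]          # number of generated tokens
    eval_duration_ns = response["eval_duration"]  # generation time in nanoseconds
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


if __name__ == "__main__":
    # 100 tokens generated in 2 seconds.
    print(tokens_per_second({"eval_count": 100, "eval_duration": 2_000_000_000}))
```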

ollama_remove

Delete a model to free disk space.

Advanced (MCP-exclusive)

These tools are only available through the MCP server and have no direct CLI equivalent.

ollama_optimize

Generate optimal Ollama env vars for your hardware — NUM_GPU, NUM_PARALLEL, FLASH_ATTENTION, and more.

benchmark

Benchmark a model with 3 standardized prompts, measuring tok/s, load time, and prompt eval.
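
The three metrics named above can be summarized across several runs roughly as follows. The field names (eval_count, eval_duration, load_duration, prompt_eval_duration, all durations in nanoseconds) come from Ollama's response format; the aggregation itself (mean speed, worst-case load time, mean prompt eval time) and the function name summarize_runs are assumptions, not the benchmark tool's documented method.

```python
from statistics import mean

NS_PER_SECOND = 1_000_000_000


def summarize_runs(runs: list) -> dict:
    """Aggregate per-prompt Ollama timing fields into benchmark-style metrics."""
    tok_s = [r["eval_count"] / (r["eval_duration"] / NS_PER_SECOND) for r in runs]
    return {
        "mean_tok_s": mean(tok_s),
        # The first run usually pays the model load cost, so report the maximum.
        "max_load_s": max(r["load_duration"] for r in runs) / NS_PER_SECOND,
        "mean_prompt_eval_s": mean(r["prompt_eval_duration"] for r in runs) / NS_PER_SECOND,
    }
```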

compare_models

Head-to-head comparison of two models on the same prompt with speed and response side-by-side.

cleanup_models

Analyze installed models — find redundancies, cloud-only models, oversized models, and upgrade candidates.

project_recommend

Scan a project directory (languages, frameworks, size) and recommend the best model for that codebase.
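
As a rough illustration of the language-detection part of such a scan, the sketch below counts source files by extension. The extension table and the function name language_counts are hypothetical; the signals project_recommend actually uses (including frameworks and project size) are not detailed on this page.

```python
from collections import Counter
from pathlib import Path

# Hypothetical, deliberately small extension-to-language table.
EXTENSION_LANGUAGES = {".rs": "Rust", ".py": "Python", ".ts": "TypeScript", ".go": "Go"}


def language_counts(project_dir: str) -> Counter:
    """Count files per language under project_dir, based on file extension."""
    counts = Counter()
    for path in Path(project_dir).rglob("*"):
        if path.is_file():
            language = EXTENSION_LANGUAGES.get(path.suffix.lower())
            if language:
                counts[language] += 1
    return counts
```

Calling language_counts(".").most_common(1) would give the dominant language of the current directory, which a recommender could then map to a coding model.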

ollama_monitor

Real-time system status: RAM usage, loaded models, and memory headroom analysis.

cli_help

List all allowlisted CLI commands exposed through MCP.

cli_exec

Execute any allowlisted llm-checker CLI command with custom args (policy, audit, calibrate, sync, ai-run, etc.).
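
The allowlist idea can be sketched as a check that runs before any process is spawned. The set below simply reuses the command names mentioned above and is an assumption, not the server's real allowlist (cli_help reports the actual one); build_argv is a hypothetical helper.

```python
# Assumed allowlist, taken from the examples in the text; not authoritative.
ALLOWED_COMMANDS = {"policy", "audit", "calibrate", "sync", "ai-run"}


def build_argv(command: str, args: list) -> list:
    """Return an argv list for llm-checker, rejecting non-allowlisted commands."""
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowlisted: {command}")
    # Passing argv as a list (no shell string) avoids shell injection via args.
    return ["llm-checker", command, *args]


if __name__ == "__main__":
    print(build_argv("sync", []))
```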

Example Claude Prompts

After setup, ask Claude things like:

Hardware and model selection

“What’s the best coding model for my hardware?”
“What model should I use for this Rust project?”
“Do you see both my iGPU and dGPU?”

Benchmarking and comparison

“Benchmark qwen2.5-coder and show me the tok/s”
“Compare llama3.2 vs codellama for coding tasks”

Ollama management

“Clean up my Ollama — what should I remove?”
“Optimize my Ollama config for maximum performance”
“How much RAM is Ollama using right now?”

Claude selects the matching MCP tools for each request (for example, benchmark or cleanup_models) and returns the results in the conversation.
