How I replaced Copilot with a local FIM model running in LMStudio, wired up through minuet-ai.nvim and blink.cmp, with zero cloud dependency
I had Copilot set up in Neovim for a while. It worked fine but I wanted something that didn't phone home, didn't require a subscription and would still work without internet. LMStudio runs models locally and exposes an OpenAI-compatible API, which makes it straightforward to wire up to Neovim's completion pipeline.
Here's the setup I landed on.
The model I run is qwen2.5-coder, which has good FIM support and runs comfortably on a MacBook. FIM (fill-in-the-middle) matters here: chat models generate text from the end of a prompt, while FIM models are trained to complete code given both the prefix and the suffix around the cursor, which is what inline completion actually needs. Not every model LMStudio supports is FIM-capable, so check before picking one.
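To make that concrete, here's roughly what a FIM request looks like at the HTTP level. This is a hand-rolled sketch for illustration only (minuet sends something like this on your behalf), and it assumes LMStudio's /v1/completions endpoint honors the OpenAI-style prompt/suffix pair, which is the shape minuet's openai_fim_compatible provider targets:

-- Illustration only: ask the model to fill the gap between prefix and suffix.
local body = vim.json.encode({
  model = 'qwen2.5-coder-1.5b',            -- whatever ID LMStudio reports for your model
  prompt = 'def add(a, b):\n    return ',  -- code before the cursor
  suffix = '\n\nprint(add(1, 2))',         -- code after the cursor
  max_tokens = 56,
})
local out = vim.fn.system({
  'curl', '-s', 'http://localhost:1234/v1/completions',
  '-H', 'Content-Type: application/json',
  '-d', body,
})
local resp = vim.json.decode(out)
print(resp.choices[1].text)  -- the model's fill, e.g. 'a + b'

A chat model given only the prompt half would complete it happily, but it can't see the suffix, so it tends to regenerate code that already exists below the cursor.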
The first thing the config does is check whether LMStudio is actually running. If it's not, minuet never loads and the completion pipeline falls back to LSP and buffer sources as normal.
-- Probe LMStudio; lmstudio_model stays nil if anything goes wrong.
local lmstudio_model = nil
pcall(function()
  local out = vim.fn.system('curl -s --connect-timeout 1 http://localhost:1234/v1/models 2>/dev/null')
  if vim.v.shell_error ~= 0 then return end
  local data = vim.json.decode(out)
  if data and data.data and data.data[1] then lmstudio_model = data.data[1].id end
end)
local lmstudio_running = lmstudio_model ~= nil

The pcall means a failed curl (LMStudio not running, no network, anything) doesn't crash Neovim startup. The --connect-timeout 1 keeps it from stalling for several seconds if nothing is listening. If LMStudio is up, lmstudio_model gets set to the first loaded model's ID, which gets passed directly to minuet later.
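If the blocking probe bothers you, the same check can be done asynchronously with vim.system (Neovim 0.10+). This is a sketch rather than what my config does, because the callback fires after vim.pack.add has already run, so you'd have to register minuet unconditionally and defer only its setup:

-- Async variant of the probe (sketch): never blocks startup.
vim.system(
  { 'curl', '-s', '--connect-timeout', '1', 'http://localhost:1234/v1/models' },
  { text = true },
  vim.schedule_wrap(function(res)
    if res.code ~= 0 then return end
    local ok, data = pcall(vim.json.decode, res.stdout)
    if ok and data and data.data and data.data[1] then
      lmstudio_model = data.data[1].id
      -- minuet setup would have to happen here, after startup has finished
    end
  end)
)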
With vim.pack, plugins are just conditionally added to the spec list:
local pack_specs = {
  { src = 'https://github.com/saghen/blink.cmp', version = vim.version.range('*') },
  -- other lsp/completion plugins
}
if lmstudio_running then
  table.insert(pack_specs, { src = 'https://github.com/milanglacier/minuet-ai.nvim' })
end
vim.pack.add(pack_specs)

minuet only gets installed or loaded if LMStudio was detected. On machines where I don't run LMStudio, nothing changes.
blink.cmp sources are configured as a list. When LMStudio is running, minuet gets inserted at position 1 so it appears first in completions:

local blink_default_sources = { 'lazydev', 'lsp', 'dadbod', 'path', 'snippets', 'buffer' }
if lmstudio_running then table.insert(blink_default_sources, 1, 'minuet') end

In the blink.cmp sources config:
sources = {
  default = blink_default_sources,
  providers = {
    minuet = {
      name = 'minuet',
      module = 'minuet.blink',
      async = true,
      timeout_ms = 3000,
      score_offset = 50,
    },
  },
},

score_offset = 50 pushes minuet completions above LSP suggestions (which sit at 0). async = true means the completion menu doesn't wait for minuet before showing LSP results. timeout_ms = 3000 is the cutoff: if the model hasn't responded in 3 seconds, the suggestion is dropped. The minuet setup itself sits behind the same lmstudio_running check, since require('minuet') would fail on machines where the plugin was never added:
require('minuet').setup({
  provider = 'openai_fim_compatible',
  n_completions = 5,
  context_window = 1024,
  provider_options = {
    openai_fim_compatible = {
      api_key = 'TERM',
      name = 'LMStudio',
      end_point = 'http://localhost:1234/v1/completions',
      model = lmstudio_model,
      optional = {
        max_tokens = 56,
        top_p = 0.9,
      },
    },
  },
})

A few things worth noting:
- api_key = 'TERM': LMStudio doesn't require authentication, but minuet treats this field as the name of an environment variable to read a key from, and it has to resolve to something non-empty. TERM works because it's set in every terminal session.
- context_window = 1024 and max_tokens = 56 are intentionally small. Larger values make completions slower. For inline suggestions you don't need 4096 tokens of context or long outputs: you're completing a line or a block, not generating a function from scratch.
- model = lmstudio_model uses the ID we detected at startup rather than hardcoding a model name. If you swap models in LMStudio, the config picks it up automatically on the next Neovim start (there's a sketch for checking at runtime below).
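One small quality-of-life addition, sketched here with a command name I made up: a user command that re-queries /v1/models so you can see what LMStudio is serving without restarting Neovim. It only reports; minuet keeps the model it was configured with until setup runs again:

-- Hypothetical helper: show which model LMStudio is currently serving.
vim.api.nvim_create_user_command('LMStudioModel', function()
  local out = vim.fn.system('curl -s --connect-timeout 1 http://localhost:1234/v1/models')
  if vim.v.shell_error ~= 0 then
    vim.notify('LMStudio is not running', vim.log.levels.WARN)
    return
  end
  local ok, data = pcall(vim.json.decode, out)
  if ok and data and data.data and data.data[1] then
    vim.notify('LMStudio model: ' .. data.data[1].id)
  end
end, { desc = 'Show the model LMStudio is serving' })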
Completions appear as a blink.cmp entry alongside LSP suggestions. Because it's async, the menu opens immediately with LSP results and minuet suggestions drop in once the model responds. Latency depends on the model and your hardware. On an M-series Mac with qwen2.5-coder-1.5b it's fast enough to not notice. Larger models are better but slower.
The main difference from Copilot is that it doesn't ghost-text complete entire lines automatically. It sits in the completion menu and you accept like any other suggestion. If you want the ghost-text experience you can configure that in minuet, but I prefer the menu approach since it stays consistent with how LSP completions work. The one thing I do miss from VS Code is Copilot's Next Edit Suggestions, which predicts your next edit across the file rather than just completing at the cursor. I haven't found a local equivalent for that yet.
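For reference, turning on ghost text in minuet looks roughly like this. The option names are from the minuet README as I remember them, so treat this as a sketch and check the docs for your version:

-- Sketch: minuet's virtual-text (ghost text) mode, per the minuet README.
require('minuet').setup({
  virtualtext = {
    auto_trigger_ft = { 'lua', 'python' },  -- filetypes where ghost text triggers automatically
    keymap = {
      accept = '<A-A>',
      accept_line = '<A-a>',
      prev = '<A-[>',
      next = '<A-]>',
      dismiss = '<A-e>',
    },
  },
})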
Copilot's ghost text is more fluid. The suggestions are better, especially for boilerplate. But it requires a subscription, sends your code to GitHub's servers and stops working if you're offline or the service is down.
The local setup trades suggestion quality for privacy and reliability. For most of my day-to-day editing the quality gap doesn't matter. For greenfield boilerplate I still reach for Claude Code rather than inline completion anyway.