DeepSeek V3.2 on CPU: Fixing the “Missing Opening Tag” Glitch

I hit a weird bug running DeepSeek V3.2 under llama.cpp: the model’s “thinking” process starts correctly, but llama.cpp seems to “eat” the opening <think> tag, so only the closing </think> shows up at the end of the reasoning. Because of this, Open WebUI fails to collapse the reasoning block, and the UI looks messy.

The Solution

After digging through related discussion threads, I found that the model’s built-in chat template wasn’t being picked up correctly. The fix is to point llama-server explicitly at the Jinja template file.

The Fix: Add the --chat-template-file flag to your startup command.

Here is the working command I ended up using (built from the latest llama.cpp master branch):

numactl --interleave=all llama-server \
  -m <your_model_path> \
  -t 32 \
  --flash-attn on \
  --no-mmap \
  --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja \
  --host 0.0.0.0 \
  --port 8000

Now the opening tag is preserved, and Open WebUI correctly collapses the “thought” block.
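
If you want to confirm the behavior outside Open WebUI, here is a minimal sketch of a shell helper that classifies a piece of model output by which reasoning tags it contains. The function name check_think_tags and its output labels are my own invention, not part of llama.cpp, and it assumes the reasoning arrives inline in the response text rather than in a separate field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of llama.cpp): report which reasoning tags
# appear in a chunk of model output passed as the first argument.
# Prints one of: ok, missing-open, missing-close, no-tags
check_think_tags() {
  local text="$1"
  local has_open=0 has_close=0

  # Plain substring matches; "</think>" does not contain "<think>",
  # so the two checks are independent.
  case "$text" in *"<think>"*) has_open=1 ;; esac
  case "$text" in *"</think>"*) has_close=1 ;; esac

  if [ "$has_open" -eq 1 ] && [ "$has_close" -eq 1 ]; then
    echo "ok"
  elif [ "$has_close" -eq 1 ]; then
    echo "missing-open"   # the glitch described in this post
  elif [ "$has_open" -eq 1 ]; then
    echo "missing-close"
  else
    echo "no-tags"
  fi
}
```

One way to use it, assuming curl and jq are installed and the server is listening on port 8000 as in the command above: fetch a reply from the OpenAI-compatible /v1/chat/completions endpoint, extract .choices[0].message.content with jq, and pass that string to check_think_tags. A result of missing-open matches the bug; ok means both tags came through. If your server is set up to return reasoning in a separate field, no-tags is expected and this check does not apply.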

Hope this saves you some debugging time if you run into the same issue!
