I hit a weird bug: the model starts “thinking” as expected, but llama.cpp seems to “eat” the opening <think> tag, so only the closing </think> shows up, at the very end. Without the opening tag, Open WebUI can't collapse the reasoning block, and the UI looks messy.

The Solution
After digging through related threads, I found that the model's internal chat template wasn’t being picked up correctly. The fix is to point llama-server explicitly at the Jinja template file.
The Fix: Add the --chat-template-file flag to your startup command.
Here is the working command I ended up using (built from the latest llama.cpp master branch):
numactl --interleave=all llama-server \
    -m <your_model_path> \
    -t 32 \
    --flash-attention on \
    --no-mmap \
    --chat-template-file models/templates/deepseek-ai-DeepSeek-V3.2.jinja \
    --host 0.0.0.0 \
    --port 8000
Now the opening tag is preserved, and Open WebUI correctly collapses the “thought” block.
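If you want to double-check without eyeballing the UI, you can paste a raw response into a small shell helper like the one below. This is only a rough sketch: check_think_tags is a name I made up for illustration (it is not part of llama.cpp or Open WebUI), and it assumes you already have the raw model output as plain text.

```shell
# Hypothetical helper: classify a piece of raw model output by its <think> tags.
# Prints "ok" if an opening <think> appears before a closing </think>,
# "missing-open" if only the closing tag is present (the bug described above),
# and "no-reasoning" if there is no closing tag at all.
check_think_tags() {
  text=$1
  case $text in
    *"<think>"*"</think>"*) echo "ok" ;;
    *"</think>"*)           echo "missing-open" ;;
    *)                      echo "no-reasoning" ;;
  esac
}

# Example: output as it looked before the fix (opening tag swallowed).
check_think_tags 'Let me work through this.</think>The answer is 4.'
```

With the template flag in place, the same check on a fresh response should report "ok" instead of "missing-open".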
Hope this saves someone some debugging time if you run into the same issue!