I mentioned the 9B before; it's a really popular model, not as thin as the 4B but still fairly fast. Great model. Since the numbers people report vary wildly, I ran metrics on the default (thinking) model vs the Instruct model.
To convert a thinking model to instruct, insert this line at the top of chat_template.jinja:
{%- set enable_thinking = false %}
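If you want to apply that edit without opening the file by hand, a small script can prepend the line safely. This is my own sketch, not part of the original workflow; the helper name and the idempotency check are mine:

```python
from pathlib import Path

THINK_OFF = "{%- set enable_thinking = false %}\n"

def disable_thinking(template_path):
    """Prepend the enable_thinking override to a chat_template.jinja file.

    Returns True if the file was patched, False if it already contains
    an enable_thinking = false override (so running twice is harmless).
    """
    path = Path(template_path)
    text = path.read_text()
    if "enable_thinking = false" in text:  # already patched, do nothing
        return False
    path.write_text(THINK_OFF + text)
    return True
```

Point it at the template inside your local model directory, e.g. `disable_thinking("Qwen3.5-9B/chat_template.jinja")`.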
With that change, you get these metrics:
arc, arc/e, boolq, hswag, obkqa, piqa, wino
Qwen3.5-9B (default, thinking)
mxfp8 0.417,0.458,0.623,0.634,0.338,0.737,0.639
Qwen3.5-9B-Instruct
mxfp8 0.571,0.719,0.895,0.683,0.426,0.770,0.671
Models based on Qwen3.5-9B
nightmedia/Qwen3.5-9B-Text
mxfp8 0.419,0.460,0.623,0.634,0.338,0.738,0.639
DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT
mxfp8 0.574,0.729,0.882,0.711,0.422,0.775,0.691
DavidAU/Qwen3.5-9B-Claude-4.6-HighIQ-INSTRUCT-HERETIC-UNCENSORED
mxfp8 0.574,0.755,0.869,0.714,0.410,0.780,0.691
DavidAU/Qwen3.5-9B-Claude-Pro-Auto-Variable-INSTRUCT
mxfp8 0.610,0.816,0.885,0.665,0.456,0.768,0.676
Qwen3.5-9B-Claude-Opus-Sonnet-Pro-Auto-Variable-HERETIC-UNCENSORED-INSTRUCT
mxfp8 0.624,0.820,0.886,0.663,0.442,0.763,0.681
DavidAU/Qwen3.5-9B-Polaris-HighIQ-INSTRUCT
mxfp8 0.624,0.828,0.891,0.656,0.442,0.768,0.680
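To make the thinking-vs-instruct gap easy to eyeball, here's a quick average over the seven benchmarks for a few of the rows above. The scores are copied from the table; the averaging script is mine, not part of the original runs:

```python
# Per-model mean over the seven benchmark columns
# (arc, arc/e, boolq, hswag, obkqa, piqa, wino).
rows = {
    "Qwen3.5-9B (default)":    [0.417, 0.458, 0.623, 0.634, 0.338, 0.737, 0.639],
    "Qwen3.5-9B-Instruct":     [0.571, 0.719, 0.895, 0.683, 0.426, 0.770, 0.671],
    "Polaris-HighIQ-INSTRUCT": [0.624, 0.828, 0.891, 0.656, 0.442, 0.768, 0.680],
}
for name, scores in rows.items():
    print(f"{name}: {sum(scores) / len(scores):.3f}")
```

The default (thinking) row averages roughly 0.55 while every Instruct row lands around 0.68-0.70, which is the gap discussed below.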
You will see me changing the chat templates on these two models until I can figure out why a MoE sometimes increases in performance with a bad chat template (yes, this happens).
It becomes obvious that a thinking model underperforms the Instruct variant. I am sure someone has a good explanation for this; all the new Qwens do inside the thinking tag is loop. With thinking disabled, they don't.
These are both Instruct models:
https://huggingface.co/nightmedia/Qwen3.5-35B-A3B-Engineer-qx64-hi-mlx
https://huggingface.co/nightmedia/Qwen3.5-35B-A3B-Holodeck-qx86-hi-mlx
