In a reasoning test using Arena-Hard, Qwen 2.5-Max achieved 89.4% accuracy, and the result was higher than DeepSeek R1 and when tested on other benchmarks of coding and scientific reasoning, Qwen 2.5 ...
Some results have been hidden because they may be inaccessible to you