What looks like intelligence in AI models may just be memorization. A closer look at benchmarks ...
Adding one irrelevant sentence to math problems can make AI systems over 300 percent more likely to produce confidently wrong answers.
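A minimal sketch of what such a distractor test looks like in practice, assuming a generic `ask_model` callable standing in for whatever LLM call is used; the problem text, the distractor sentence, and the toy stand-in model are illustrative assumptions, not the setup from the study described above.

```python
# Sketch of a distractor-sentence robustness check: ask the same math question
# with and without an irrelevant sentence appended and see whether the answer flips.
from typing import Callable


def perturbation_check(ask_model: Callable[[str], str],
                       problem: str,
                       distractor: str,
                       gold_answer: str) -> dict:
    """Compare the model's answer on the original and perturbed problem."""
    baseline = ask_model(problem)
    perturbed = ask_model(f"{problem} {distractor}")
    return {
        "baseline_correct": gold_answer in baseline,
        "perturbed_correct": gold_answer in perturbed,
        "answer_flipped": (gold_answer in baseline) and (gold_answer not in perturbed),
    }


if __name__ == "__main__":
    # Toy stand-in "model" so the sketch runs end to end; replace with a real LLM call.
    def ask_model(prompt: str) -> str:
        return "The answer is 20." if "cats" in prompt else "The answer is 14."

    problem = "Tom has 8 apples and buys 6 more. How many apples does he have?"
    distractor = "Interestingly, cats sleep for most of the day."
    print(perturbation_check(ask_model, problem, distractor, gold_answer="14"))
```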
“I was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they ...
Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...
Phi-4 will compete with other small models such as GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku.
Overview: Large language models predict text; they do not truly calculate or verify math. High scores on known datasets do not ...
OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...
This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
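To make "systematically evaluate mathematical reasoning" concrete, here is a minimal scoring loop of the kind a benchmarking framework like MathEval automates. The `query_llm` function, the answer-extraction heuristic, and the dataset format are assumptions for illustration only, not MathEval's actual interface.

```python
# Sketch of a math-benchmark evaluation harness: prompt the model on each problem,
# pull out its final numeric answer, and report exact-match accuracy.
import re
from typing import Callable, Iterable, Optional, Tuple


def extract_final_number(text: str) -> Optional[str]:
    """Take the last number in the model's response as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def score_benchmark(query_llm: Callable[[str], str],
                    problems: Iterable[Tuple[str, str]]) -> float:
    """Return exact-match accuracy over (question, gold_answer) pairs."""
    total, correct = 0, 0
    for question, gold in problems:
        response = query_llm(question + "\nGive only the final number.")
        if extract_final_number(response) == gold:
            correct += 1
        total += 1
    return correct / total if total else 0.0
```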