frontiermath news

arstechnica.com
https://arstechnica.com/ai/2024/11/new-secret-math-benchmark...
New secret math benchmark stumps AI models and PhDs alike
2 days ago · Epoch AI says it developed FrontierMath through collaboration with over 60 mathematicians from leading institutions. The problems underwent peer review to verify correctness and check for ambiguities.
phys.org
https://phys.org/news/2024-11-ai-hard-math-problems-poorly.html
Testing AI systems on hard math problems shows they still …
2 days ago · Testing thus far has demonstrated the difficulty found in FrontierMath. AIs that have scored well on traditional benchmarks have not been able to score any higher than 2%.
venturebeat.com
https://venturebeat.com/ai/ais-math-problem...
AI’s math problem: FrontierMath benchmark shows how far …
3 days ago · FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.
epochai.org
https://epochai.org/frontiermath
FrontierMath | Epoch AI
Created in collaboration with over 60 mathematicians, FrontierMath spans the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory.
arxiv.org
https://arxiv.org/abs/2411.04872
[2411.04872] FrontierMath: A Benchmark for Evaluating Advanced ...
6 days ago · FrontierMath uses new, unpublished problems and automated verification to reliably evaluate models while minimizing risk of data contamination. Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.
gadgets360.com
https://www.gadgets360.com/ai/news/epoch-ai...
Epoch AI Launches FrontierMath AI Benchmark to Test …
2 days ago · FrontierMath solves the problem by including new problems that are unique and have not been published anywhere, mitigating the risks associated with data contamination. Further, the benchmark includes a wide range of questions including computationally intensive problems in number theory, real analysis, and algebraic geometry, as well as topics ...
epochai.org
https://epochai.org/frontiermath/the-benchmark
FrontierMath: Evaluating Advanced Mathematical Reasoning in AI …
6 days ago · FrontierMath: a new benchmark of expert-level math problems designed to measure AI’s mathematical abilities. See how leading AI models perform against the collective mathematics community.
arxiv.org
https://arxiv.org/html/2411.04872
FrontierMath: A Benchmark for Evaluating Advanced …
FrontierMath uses new, unpublished problems and automated verification to reliably evaluate models while minimizing risk of data contamination. Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.
arxiv.org
https://arxiv.org/pdf/2411.04872
[PDF]
FRONTIERMATH: A BENCHMARK FOR EVALUATING ADVANCED …
6 days ago · We introduce FrontierMath, a benchmark of hundreds of original, exceptionally challenging mathematics problems crafted and vetted by expert mathematicians. The questions cover most major branches of modern mathematics—from computationally intensive problems in number theory and
ycombinator.com
https://news.ycombinator.com/item?id=42094546
FrontierMath: A benchmark for evaluating advanced ... - Hacker News
5 days ago · FrontierMath uses new, unpublished problems and automated verification to reliably evaluate models while minimizing risk of data contamination. Current state-of-the-art AI models solve under 2% of problems, revealing a vast gap between AI capabilities and the prowess of the mathematical community.