Results for "AI benchmark"

115 results found

AI Benchmark Prompt for GeoGuessr Fails After Model Update

A well-known prompt used to test AI geography skills no longer works on the O3 model, sparking debate about benchmark reliability and model drift.

May 21, 2026 · 2 min read

AI / Machine Learning

DeepMind Veteran Warns AI Benchmarks Are Not Enough

A former DeepMind researcher warns that current benchmarks fail to ensure AI safety. The call for new evaluation methods comes as AI systems grow more powerful.

May 22, 2026 · 3 min read

AI / Machine Learning

AI Coding Benchmarks Overlook Long-Term Code Health Risks

Current AI coding benchmarks measure one-shot performance but ignore quality erosion from repeated edits. This oversight could lead to unmaintainable codebases at scale.

May 21, 2026 · 3 min read

AI / Machine Learning

AI therapy startup claims 95% safety score in mental health benchmark

The Path claims its AI model scored 95 on the Vera-MH safety benchmark, far above rivals like ChatGPT. The startup was co-founded by Tony Robbins and Calm veterans.

May 21, 2026 · 3 min read

AI / Machine Learning

Antigravity 2.0 Dominates First OpenSCAD 3D LLM Benchmark

Antigravity 2.0 tops the OpenSCAD Architectural 3D LLM Benchmark, demonstrating superior ability to generate valid 3D models from natural language prompts.

May 22, 2026 · 3 min read

AI / Machine Learning

Google's Gemini 3.5 Flash Reshapes Enterprise AI Cost Equation

Google claims its new Gemini 3.5 Flash model can save enterprises over $1 billion annually by delivering near-frontier performance at triple the speed and half the cost.

May 20, 2026 · 2 min read

AI / Machine Learning

AI IQ site ignites debate by scoring large language models on the bell curve

A startup called AI IQ is assigning IQ scores to over 50 AI models. The project draws praise for clarity and criticism for oversimplifying machine intelligence.

May 20, 2026 · 2 min read

AI / Machine Learning

AI Bots Fool Nearly Half of Participants in New Online Test

Surfshark's experiment reveals 47% of people can't tell AI bots from humans online. The test challenges users to identify bots in simulated social interactions.

May 25, 2026 · 2 min read

AI / Machine Learning

AI coding boom creates production chaos, Resolve AI launches multi-agent fix

Resolve AI expands its platform with multi-agent investigation to tackle production failures caused by rapid AI code generation. The system uses coordinated agents that verify each other's findings.

May 21, 2026 · 3 min read

AI / Machine Learning

SpaceX Acquires xAI, Declares AI Its Core Business Ahead of IPO

SpaceX's IPO filing reveals AI as its primary market, projecting a $26.5 trillion opportunity. The company positioned Grok against OpenAI and Anthropic.

May 21, 2026 · 2 min read

AI / Machine Learning

Enterprises stuck in AI's 'chat phase' as gap between insight and action widens

Many enterprises use AI only for chat and queries, failing to translate insights into business outcomes. A shift toward integrated execution is critical.

May 27, 2026 · 3 min read

AI / Machine Learning

Cerebras wafer-scale chip runs trillion-parameter model 7x faster than GPU clouds

Cerebras claims its wafer-scale chip runs a trillion-parameter AI model nearly seven times faster than GPU-based clouds, challenging Nvidia's dominance in inference.

May 20, 2026 · 3 min read

AI / Machine Learning

Open-source coding model NousCoder-14B matches big rivals in just 4 days

An open-source AI coding model trained in four days matches proprietary systems, highlighting the rapid progress of open-source alternatives in AI-assisted software development.

May 19, 2026 · 2 min read

AI / Machine Learning

OpenClaw AI Agent Steps Into the Physical World With a Robot Body

An AI coding agent named OpenClaw has been given a physical robot body, demonstrating how AI models can simplify robot building and deployment.

May 20, 2026 · 2 min read

AI / Machine Learning

Anthropic Surpasses OpenAI in Corporate AI Adoption for First Time

Anthropic's Claude overtakes OpenAI's ChatGPT in business AI adoption. But escalating costs and competition threaten its lead.

May 20, 2026 · 2 min read

AI / Machine Learning

Why Autonomous AI Fails Without a Body-Like Feedback System

AI systems that rely on pure autonomy often fail. A new framework compares AI to the human body, arguing that feedback loops build trust.

May 19, 2026 · 2 min read

AI / Machine Learning

Salesforce Turns Slackbot Into a Full AI Agent for the Enterprise

Salesforce rebuilt Slackbot from a simple notification tool into an AI agent that searches data, drafts documents and takes actions, intensifying workplace AI competition.

May 19, 2026 · 2 min read

Big Tech

AI demand forces a fundamental shift in enterprise data center strategy

Rising AI workloads are pushing companies to rethink infrastructure, moving from general-purpose servers to specialized GPU clusters and liquid-cooled data centers.

May 21, 2026 · 3 min read

AI / Machine Learning

AI Outpaces Human Patching, Making Vulnerability Windows Obsolete

AI-powered bug detection finds vulnerabilities faster than humans can patch. The industry shifts from reactive patching to building resilient software from the start.

May 21, 2026 · 3 min read

AI / Machine Learning

Anthropic Nears First Profit as AI Race Intensifies

Anthropic is set to report its first profitable quarter since its founding in 2021, marking a milestone in the competitive AI landscape.

May 21, 2026 · 2 min read