Results for "AI coding benchmarks"
104 results found

AI Coding Benchmarks Overlook Long-Term Code Health Risks
Current AI coding benchmarks measure one-shot performance but ignore quality erosion from repeated edits. This oversight could lead to unmaintainable codebases at scale.

AI coding boom creates production chaos, Resolve AI launches multi-agent fix
Resolve AI expands its platform with multi-agent investigation to tackle production failures caused by rapid AI code generation. The system uses coordinated agents that verify each other's findings.

Cerebras wafer-scale chip runs trillion-parameter model 7x faster than GPU clouds
Cerebras claims its wafer-scale chip runs a trillion-parameter AI model nearly seven times faster than GPU-based clouds, challenging Nvidia's dominance in inference.

Antigravity 2.0 Dominates First OpenSCAD 3D LLM Benchmark
Antigravity 2.0 tops the OpenSCAD Architectural 3D LLM Benchmark, demonstrating superior ability to generate valid 3D models from natural language prompts.

Open-source coding model NousCoder-14B matches big rivals in just 4 days
An open-source AI coding model trained in four days matches proprietary systems, highlighting the rapid progress of open-source alternatives in AI-assisted software development.

AI therapy startup claims 95% safety score in mental health benchmark
The Path claims its AI model scored 95 on the Vera-MH safety benchmark, far above rivals like ChatGPT. The startup was co-founded by Tony Robbins and Calm veterans.

CC-Wiki turns AI coding sessions into searchable team knowledge bases
A new open-source tool, CC-Wiki, lets developers save and share Claude Code sessions as a wiki. It aims to solve the problem of lost context in AI-assisted coding workflows.

Software Engineering Faces a Defining Moment as AI Reshapes the Field
The software engineering profession is at a crossroads. AI coding assistants and market pressures are redefining roles, creating both opportunities and existential questions for developers.

OpenClaw AI Agent Steps Into the Physical World With a Robot Body
An AI coding agent named OpenClaw has been given a physical robot body, demonstrating how AI models can simplify robot building and deployment.

AI-Powered Web App Builders Create Security Risks for Development Teams
AI-powered web app builders speed up development but introduce serious security risks. Many teams skip proper review, leaving vulnerable code in production.

AI Tools Boost Skilled Workers More Than Novices, Studies Show
AI amplifies the productivity of experienced workers, widening the skill gap. Research indicates that technical expertise determines who benefits most from AI assistants.

Anthropic Surpasses OpenAI in Corporate AI Adoption for First Time
Anthropic's Claude overtakes OpenAI's ChatGPT in business AI adoption. But escalating costs and competition threaten its lead.

New AI Architecture Separates Prompts and Reasoning Into Parallel Streams
Researchers propose Multi-Stream LLMs, splitting prompts, thinking and I/O into parallel processes to boost efficiency and reduce latency.

DeepSeek locks in lower pricing for V4 Pro as AI competition heats up
DeepSeek makes its V4 Pro price cut permanent, strategically reducing costs to lure developers and challenge rivals in the fast-moving AI market.

Microsoft Agent 365 arrives as enterprises face shadow AI security threat
Microsoft's new agent management platform goes live amid rising risks from ungoverned AI agents in enterprises.

AI Pricing Models Face a Hard Reset
The era of cheap AI access is ending. Providers are shifting from subsidized pricing to sustainable models, forcing developers and businesses to adapt.

Google Brings Gemini AI to Smartwatches With Wear OS 7
Google's Wear OS 7 integrates Gemini AI and a new widget system for smarter, faster smartwatch interactions. The update targets improved notifications and app fluidity.

Google Adds AI Detection to Circle to Search Feature
Google is rolling out AI detection to its Circle to Search tool, letting users see if an image was generated or edited by AI. The feature uses metadata and content credentials.

Google Search's AI Overhaul Raises Alarms About Internet Quality
Google's shift to AI-powered search threatens web traffic and content quality.

Google's Gemini 3.5 Flash Reshapes Enterprise AI Cost Equation
Google claims its new Gemini 3.5 Flash model can save enterprises over $1 billion annually by delivering near-frontier performance at triple the speed and half the cost.