Results for "AI reliability"

Why Companies Are Quietly Bringing Back Workers After AI Replacements
After replacing staff with AI, many firms are now rehiring humans to fix errors and ensure safe, reliable operations. Human oversight is proving essential.

AI Benchmark Prompt for GeoGuessr Fails After Model Update
A well-known prompt used to test AI geography skills no longer works on OpenAI's o3 model, sparking debate about benchmark reliability and model drift.

Google AI search now pulls expert advice from Reddit
Google's AI-powered search results will now include Reddit posts as expert sources. The change aims to improve answer quality but raises questions about content reliability.

AI coding boom creates production chaos; Resolve AI launches multi-agent fix
Resolve AI expands its platform with multi-agent investigation to tackle production failures caused by rapid AI code generation. The system uses coordinated agents that verify each other's findings.

Starbucks Drops Faulty AI Inventory System That Failed to Count
Starbucks scrapped an AI inventory tool after it repeatedly miscounted stock. The system’s failure highlights challenges in retail automation.

Google's AI Still Struggles to Spell Its Own Name
Google's latest AI models continue to fail at basic spelling, even for the company's own name. The issue highlights deeper limitations in how large language models process text.

Amazon Claims Major Advance in Data Center Speed for AI Workloads
Amazon says its new networking technology dramatically accelerates data flow in its cloud data centers, solving a key bottleneck for AI training and other intensive workloads.

Orbital AI Data Centers Face Months-Long Outage Risks, Experts Warn
Hyperscalers eye space-based AI compute, but experts flag severe operational risks, including months-long outages caused by limited physical access and radiation.

Google's Gemini Leaks Its Own System Prompt in User Chat
A user discovered that Google's Gemini AI revealed its internal system prompt during a conversation, raising questions about AI transparency and safety.

Grok's Government Adoption Lags, Undermining xAI's Growth Story
According to Reuters, Grok appears in only 3 of more than 400 government AI use cases. The low adoption undercuts xAI's growth story, which is tied to a potentially massive SpaceX IPO.

Antigravity 2.0 Dominates First OpenSCAD 3D LLM Benchmark
Antigravity 2.0 tops the OpenSCAD Architectural 3D LLM Benchmark, demonstrating superior ability to generate valid 3D models from natural language prompts.

Tech Lobbying Weakens Climate Rules for Data Centers
Tech companies lobbied to kill stricter clean energy rules for gas-powered data centers, weakening climate pledges.

AI Coding Benchmarks Overlook Long-Term Code Health Risks
Current AI coding benchmarks measure one-shot performance but ignore quality erosion from repeated edits. This oversight could lead to unmaintainable codebases at scale.

OpenClaw AI Agent Steps Into the Physical World With a Robot Body
An AI coding agent named OpenClaw has been given a physical robot body, demonstrating how AI models can simplify robot building and deployment.

Google's Gemini 3.5 Flash Reshapes Enterprise AI Cost Equation
Google claims its new Gemini 3.5 Flash model can save enterprises over $1 billion annually by delivering near-frontier performance at triple the speed and half the cost.

AI IQ site ignites debate by scoring large language models on the bell curve
A startup called AI IQ is assigning IQ scores to over 50 AI models. The project draws praise for clarity and criticism for oversimplifying machine intelligence.

Anthropic Surpasses OpenAI in Corporate AI Adoption for First Time
Anthropic's Claude overtakes OpenAI's ChatGPT in business AI adoption. But escalating costs and competition threaten its lead.

Why Autonomous AI Fails Without a Body-Like Feedback System
AI systems that rely on pure autonomy often fail. A new framework compares AI to the human body, arguing that feedback loops build trust.

Salesforce Turns Slackbot Into a Full AI Agent for the Enterprise
Salesforce rebuilt Slackbot from a simple notification tool into an AI agent that searches data, drafts documents and takes actions, intensifying workplace AI competition.

AI demand forces a fundamental shift in enterprise data center strategy
Rising AI workloads are pushing companies to rethink infrastructure, moving from general-purpose servers to specialized GPU clusters and liquid-cooled data centers.