Results for "evaluation methods"

19 results found

DeepMind Veteran Warns AI Benchmarks Are Not Enough

A former DeepMind researcher warns that current benchmarks fail to ensure AI safety. The call for new evaluation methods comes as AI systems grow more powerful.

May 22, 2026 · 3 min read

AI / Machine Learning

AI therapy startup claims 95% safety score in mental health benchmark

The Path claims its AI model scored 95 on the Vera-MH safety benchmark, far above rivals like ChatGPT. The startup was co-founded by Tony Robbins and Calm veterans.

May 21, 2026 · 3 min read

Big Tech

How Pull Requests Are Replacing Whiteboards in Tech Hiring

A growing number of tech companies are replacing traditional whiteboard interviews with real-world coding tasks using pull requests. This shift aims to evaluate candidates more fairly and accurately.

May 21, 2026 · 3 min read

AI / Machine Learning

Open-source coding model NousCoder-14B matches big rivals in just 4 days

An open-source AI coding model trained in four days matches proprietary systems, highlighting the rapid progress of open-source alternatives in AI-assisted software development.

May 19, 2026 · 2 min read

AI / Machine Learning

AI Coding Benchmarks Overlook Long-Term Code Health Risks

Current AI coding benchmarks measure one-shot performance but ignore quality erosion from repeated edits. This oversight could lead to unmaintainable codebases at scale.

May 21, 2026 · 3 min read

AI / Machine Learning

Antigravity 2.0 Dominates First OpenSCAD 3D LLM Benchmark

Antigravity 2.0 tops the OpenSCAD Architectural 3D LLM Benchmark, demonstrating superior ability to generate valid 3D models from natural language prompts.

May 22, 2026 · 3 min read

AI / Machine Learning

AI Benchmark Prompt for GeoGuessr Fails After Model Update

A well-known prompt used to test AI geography skills no longer works on the o3 model, sparking debate about benchmark reliability and model drift.

May 21, 2026 · 2 min read

Tech Policy & Regulation

France Leads EU's Charge Away From US Tech Giants

France is replacing Zoom and Microsoft Teams with homegrown tools, and other EU countries are following. The Trump-era push for digital sovereignty is reshaping Europe's tech landscape.

May 21, 2026 · 3 min read

AI / Machine Learning

AI coding boom creates production chaos, Resolve AI launches multi-agent fix

Resolve AI expands its platform with multi-agent investigation to tackle production failures caused by rapid AI code generation. The system uses coordinated agents that verify each other's findings.

May 21, 2026 · 3 min read

Startups / Funding

Mercury Hits $5.2B Valuation as Fintech Startup Pursues Own Banking License

Mercury raised $200M at a $5.2B valuation and secured regulatory approval to establish its own bank. The digital banking startup serves over 300,000 companies and reported $650M in annualized revenue.

May 20, 2026 · 3 min read

Startups / Funding

Fresha Reaches $1 Billion Valuation With KKR Investment

Beauty booking platform Fresha secured $80M from KKR, pushing its valuation to $1 billion. The funding underscores growth in service marketplace tech.

May 22, 2026 · 3 min read

Startups / Funding

Secretive AI Startup Hark Raises $700M at $6 Billion Valuation

Hark, Brett Adcock's stealth AI startup, raised a massive $700M Series A, valuing the 'universal' interface company at $6 billion.

May 21, 2026 · 2 min read

Startups / Funding

General Catalyst bets $63M on India's travel payments startup Scapia

General Catalyst is leading a $63 million funding round in Scapia, an Indian travel booking and payments startup. The investment doubles the company's valuation.

May 21, 2026 · 2 min read

Startups / Funding

Typewise Hires AI Growth Engineer as Startup Expands Reach

Typewise, the YC-backed keyboard startup, is hiring an AI Growth Engineer for Zurich or remote. The move signals a push to integrate AI into growth and product development.

May 21, 2026 · 2 min read

Startups / Funding

Quantum Physics, AI Join Forces to Supercharge Enzyme Engineering

Imperagen raises £5 million to blend quantum physics simulations with AI for faster, more precise enzyme design, aiming to make industrial processes greener.

May 21, 2026 · 2 min read

Startups / Funding

China Robotics Investment Hits Record as Embodied AI Startups Attract Billions

China-based robotics startups raised $5.6 billion through mid-2026, matching the 2021 peak. Embodied AI companies drive the surge, with several startups reaching billion-dollar valuations.

May 20, 2026 · 3 min read

Startups / Funding

SoftBank CEO's Recent Bets Raise Alarm Among Executives

Insiders at SoftBank worry that Masayoshi Son's recent investment decisions signal a losing streak. The CEO known for bold bets may be overpaying for deals.

May 20, 2026 · 3 min read

AI / Machine Learning

Why Autonomous AI Fails Without a Body-Like Feedback System

AI systems that rely on pure autonomy often fail. A new framework compares AI to the human body, arguing that feedback loops build trust.

May 19, 2026 · 2 min read

AI / Machine Learning

Google’s Gemini Voice Push Redefines How We Talk to AI

Google is leaning into voice interaction with Gemini, encouraging users to speak naturally. The shift capitalizes on voice dictation’s popularity and aims to make AI conversations feel human.

May 21, 2026 · 3 min read