Results for "GPT-4"
7 results found

A New Open Source Dataset Aims to Solve AI's Math Reasoning Gap
Researchers at MIT and Columbia University released ATLAS, a dataset of 320,000 autoformalized mathematical statements for training AI reasoning systems.

Study Finds Politeness in AI Prompts Can Impact Model Accuracy
Research reveals that prompt tone significantly influences LLM accuracy. Polite prompts may boost performance while impolite ones degrade it.

AI Benchmark Prompt for GeoGuessr Fails After Model Update
A well-known prompt used to test AI geography skills no longer works on the o3 model, sparking debate about benchmark reliability and model drift.

Antigravity 2.0 Dominates First OpenSCAD 3D LLM Benchmark
Antigravity 2.0 tops the OpenSCAD Architectural 3D LLM Benchmark, demonstrating superior ability to generate valid 3D models from natural language prompts.

AI Bots Fool Nearly Half of Participants in New Online Test
Surfshark's experiment reveals 47% of people can't tell AI bots from humans online. The test challenges users to identify bots in simulated social interactions.

AI IQ Site Ignites Debate by Scoring Large Language Models on the Bell Curve
A startup called AI IQ is assigning IQ scores to over 50 AI models. The project draws praise for clarity and criticism for oversimplifying machine intelligence.

SpaceX Acquires xAI, Declares AI Its Core Business Ahead of IPO
SpaceX's IPO filing reveals AI as its primary market, projecting a $26.5 trillion opportunity. The company positioned Grok against OpenAI and Anthropic.