Results for "inference optimization"
39 results found

New Technique Losslessly Compresses KV Cache Up to 4x for Faster AI Inference
Speculative KV coding compresses the key-value cache by up to 4x without loss, potentially cutting memory costs and letting larger models run on existing hardware.

Cerebras wafer-scale chip runs trillion-parameter model 7x faster than GPU clouds
Cerebras claims its wafer-scale chip runs a trillion-parameter AI model nearly seven times faster than GPU-based clouds, challenging Nvidia's dominance in inference.

AI Pricing Models Face a Hard Reset
The era of cheap AI access is ending. Providers are shifting from subsidized pricing to sustainable models, forcing developers and businesses to adapt.

Tampering Threats Emerge for Encrypted AI Reasoning Systems
Privacy-preserving AI models that process encrypted data may be vulnerable to undetectable manipulation, researchers warn. The finding challenges assumptions about security in confidential computing.

Intel Unveils Massive Memory AI Chip for Data Centers
At Computex, Intel revealed its next-generation data center GPU with up to 480GB of LPDDR5X memory.

Microsoft Unveils Desktop AI Dev Box That Runs 120B-Parameter Models Locally
Microsoft's Surface RTX Spark Dev Box lets developers run large AI models on local hardware with 128GB of unified memory, avoiding cloud costs. The device challenges the per-token pricing model that has dominated AI economics since ChatGPT's launch.

AI demand forces a fundamental shift in enterprise data center strategy
Rising AI workloads are pushing companies to rethink infrastructure, moving from general-purpose servers to specialized GPU clusters and liquid-cooled data centers.

Outdated Networks Threaten AI Progress for Many Organizations
AI's potential is limited by weak networking infrastructure. Many organizations lack the connectivity needed to support advanced AI workloads.

Apple’s AI Strategy Takes Shape: Gradual Rollout and New Leadership
Since WWDC 2024, Apple has quietly built its AI ecosystem with Apple Intelligence, Siri upgrades and a reshuffled team. The slow but deliberate pace signals a long-term bet on privacy and integration.

LLMs Do Math Without Numbers: New Research Reveals Hidden Process
New analysis shows large language models solve arithmetic using pattern matching and embeddings, not explicit numbers. The findings challenge assumptions about AI reasoning.

AI data centers spark memory chip shortage that could raise car and medical device prices
A coalition of nine U.S. trade groups warns the Trump administration that AI-driven demand for DRAM chips is squeezing supply, threatening price hikes across automotive, medical and telecom sectors through 2027.

Healthcare AI's Real Challenge Isn't Better Algorithms, It's Broken Systems
Healthcare AI fails in practice due to fragmented data and legacy systems, not weak algorithms. Real progress requires infrastructure modernization, not better models.

OpenRouter's $113M Series B Signals AI Middleware Boom
OpenRouter raised $113 million to connect developers to multiple AI models. The Series B round underscores growing investor confidence in AI infrastructure companies.

Nvidia Charts a Single Computing Path for Autonomous Devices
Nvidia CEO Jensen Huang says every edge device will become autonomous, and the company is promoting a single computing architecture that spans from the cloud to robotics.

HP's New Workstation Packs 784GB Memory for Trillion-Parameter AI Models
HP announced the ZGX Fury GB300, a workstation with 784GB of unified memory and an Nvidia GB300 GPU, built to handle trillion-parameter AI models. It targets enterprise workloads and comes at a high price.

Samsung Reveals HBM5 Memory Prototype With In-Package Cooling
Samsung showed its first HBM5 memory prototype at Computex, pairing the next-gen AI memory with a new in-package cooling system called Heat Path Block to tackle thermal challenges.

PyTorch Custom Operations Give Developers Deeper Control Over Model Performance
PyTorch's custom operation support lets developers write optimized CUDA kernels, balancing research flexibility with production efficiency.

New Programming Language CPPL Bridges Prompts and Circuits
A novel language called CPPL lets developers program circuits using AI-style prompts. It could reshape how hardware is designed for machine learning workloads.

Anthropic and OpenAI Take Rivalry to Midterm Elections
The AI companies are extending their rivalry into political spending ahead of the midterms, signaling a new era of tech influence in elections.

US Government Takes $2B Equity Stakes in IBM and Quantum Computing Firms
The US government is taking $2 billion in equity stakes in quantum computing companies, including IBM, marking a new era of public-private investment in critical technology.