Results for "inference optimization"

39 results found

New Technique Losslessly Compresses KV Cache Up to 4x for Faster AI Inference

Speculative KV coding compresses key-value cache up to 4x without loss, potentially cutting memory costs and enabling larger models on existing hardware.

Jun 7, 2026 · 3 min read

AI / Machine Learning

Cerebras wafer-scale chip runs trillion-parameter model 7x faster than GPU clouds

Cerebras claims its wafer-scale chip runs a trillion-parameter AI model nearly seven times faster than GPU-based clouds, challenging Nvidia's dominance in inference.

May 20, 2026 · 3 min read

AI / Machine Learning

AI Pricing Models Face a Hard Reset

The era of cheap AI access is ending. Providers are shifting from subsidized pricing to sustainable models, forcing developers and businesses to adapt.

May 22, 2026 · 2 min read

CyberSecurity

Tampering Threats Emerge for Encrypted AI Reasoning Systems

Privacy-preserving AI models that process encrypted data may be vulnerable to undetectable manipulation, researchers warn. The finding challenges assumptions about security in confidential computing.

Jun 2, 2026 · 2 min read

AI / Machine Learning

Intel Unveils Massive Memory AI Chip for Data Centers

At Computex, Intel reveals its next-gen data center GPU with up to 480GB of LPDDR5X memory.

Jun 2, 2026 · 2 min read

Gadgets / Consumer Tech

Microsoft Unveils Desktop AI Dev Box That Runs 120B-Parameter Models Locally

Microsoft's Surface RTX Spark Dev Box lets developers run large AI models on local hardware with 128GB unified memory, bypassing cloud costs. The device challenges the per-token pricing model that has dominated AI economics since ChatGPT's launch.

Jun 6, 2026 · 4 min read

Big Tech

AI demand forces a fundamental shift in enterprise data center strategy

Rising AI workloads are pushing companies to rethink infrastructure, moving from general-purpose servers to specialized GPU clusters and liquid-cooled data centers.

May 21, 2026 · 3 min read

AI / Machine Learning

Outdated Networks Threaten AI Progress for Many Organizations

AI's potential is limited by weak networking infrastructure. Many organizations lack the connectivity needed to support advanced AI workloads.

May 21, 2026 · 2 min read

Big Tech

Apple’s AI Strategy Takes Shape: Gradual Rollout and New Leadership

Since WWDC 2024, Apple has quietly built its AI ecosystem with Apple Intelligence, Siri upgrades and a reshuffled team. The slow but deliberate pace signals a long-term bet on privacy and integration.

Jun 5, 2026 · 2 min read

AI / Machine Learning

LLMs Do Math Without Numbers: New Research Reveals Hidden Process

New analysis shows large language models solve arithmetic using pattern matching and embeddings, not explicit numbers. The findings challenge assumptions about AI reasoning.

Jun 7, 2026 · 2 min read

Tech Policy & Regulation

AI data centers spark memory chip shortage that could raise car and medical device prices

A coalition of nine U.S. trade groups warns the Trump administration that AI-driven demand for DRAM chips is squeezing supply, threatening price hikes across automotive, medical and telecom sectors through 2027.

Jun 7, 2026 · 3 min read

AI / Machine Learning

Healthcare AI's Real Challenge Isn't Better Algorithms, It's Broken Systems

Healthcare AI fails in practice due to fragmented data and legacy systems, not weak algorithms. Real progress requires infrastructure modernization, not better models.

May 26, 2026 · 3 min read

Startups / Funding

OpenRouter's $113M Series B Signals AI Middleware Boom

OpenRouter raised $113 million to connect developers to multiple AI models. The Series B round underscores growing investor confidence in AI infrastructure companies.

May 31, 2026 · 3 min read

Big Tech

Nvidia Charts a Single Computing Path for Autonomous Devices

Jensen Huang says every edge device will become autonomous. Nvidia promotes one computing pattern from cloud to robotics.

Jun 5, 2026 · 2 min read

Gadgets / Consumer Tech

HP's New Workstation Packs 784GB Memory for Trillion-Parameter AI Models

HP announced the ZGX Fury GB300, a workstation with 784GB of unified memory and an Nvidia GB300 GPU that can handle trillion-parameter AI models. It targets enterprise workloads, but at a high price.

Jun 6, 2026 · 3 min read

AI / Machine Learning

Samsung Reveals HBM5 Memory Prototype With In-Package Cooling

Samsung showed its first HBM5 memory prototype at Computex, pairing the next-gen AI memory with a new in-package cooling system called Heat Path Block to tackle thermal challenges.

Jun 6, 2026 · 2 min read

Software Development

PyTorch Custom Operations Give Developers Deeper Control Over Model Performance

PyTorch's custom operation support lets developers write optimized CUDA kernels, balancing research flexibility with production efficiency.

Jun 7, 2026 · 3 min read

AI / Machine Learning

New Programming Language CPPL Bridges Prompts and Circuits

A novel language called CPPL lets developers program circuits using AI-style prompts. It could reshape how hardware is designed for machine learning workloads.

May 25, 2026 · 3 min read

Tech Policy & Regulation

Anthropic and OpenAI Take Rivalry to Midterm Elections

The AI companies are escalating their feud into political spending for the midterms, signaling a new era of tech influence in elections.

May 20, 2026 · 2 min read

Tech Policy & Regulation

US Government Takes $2B Equity Stakes in IBM and Quantum Computing Firms

The US government acquires $2 billion in equity stakes in quantum computing companies, including IBM, marking a new era of public-private investment in critical technology.

May 21, 2026 · 2 min read