AI Infrastructure

189 articles in this category (Page 2 of 8)

AI News, AI Infrastructure, Machine Learning

Tilde Research Aurora: Solving the Neuron Death Crisis in Muon Optimizers

Tilde Research introduces Aurora, a leverage-aware optimizer that fixes Muon's neuron death flaw, achieving 100x data efficiency and a new SoTA on modded-nanoGPT.

AI News, AI Infrastructure, Machine Learning

Meta and Stanford Propose Fast Byte Latent Transformer to Slash Inference Bandwidth by Over 50%

Meta and Stanford researchers introduced BLT-D, reducing byte-level inference memory bandwidth by over 50% without tokenization.

AI News, AI Infrastructure, Large Language Model

Sakana AI and NVIDIA Introduce TwELL: 20.5% Faster LLM Inference via Unstructured Sparsity

Sakana AI and NVIDIA introduced TwELL and custom CUDA kernels, achieving 20.5% inference and 21.9% training speedups in LLMs by exploiting activation sparsity.

Earnings, Contracts, AI Infrastructure

Babcock & Wilcox (BW) Surges on Q1 Earnings Beat and $2.4B AI Contract: 5-Day Increase Expected

BW is poised for a short-term breakout following a massive Q1 earnings beat, a 1,971% surge in bookings, and a $2.4B AI data center contract.

BW
AI Infrastructure, Earnings Analysis, M&A

IREN Limited (IREN): 21-Day Bullish Outlook Driven by $3.4B NVIDIA AI Cloud Contract Despite Earnings Miss

IREN's landmark $3.4B NVIDIA contract and $70 share purchase warrants signal strong medium-term upside, counterbalancing recent earnings misses and heavy capital expenditures.

IREN
AI News, AI Infrastructure, Software Engineering

NVIDIA Releases cuda-oxide: A Native Rust-to-PTX Compiler for SIMT GPU Kernels

NVIDIA AI researchers released cuda-oxide, an experimental Rust-to-CUDA compiler backend that compiles SIMT GPU kernels directly to PTX, achieving 868 TFLOPS on B200 GPUs.

AI News, Machine Learning, AI Infrastructure

Adaptive Parallel Reasoning: Scaling Inference with Dynamic Control

Adaptive Parallel Reasoning (APR) allows LLMs to dynamically spawn concurrent threads, reducing latency compared to sequential reasoning, which can take hours.

AI News, AI Infrastructure, Open Source

LightSeek Foundation Releases TokenSpeed: An Open-Source Inference Engine for Agentic AI

LightSeek Foundation's TokenSpeed is an open-source LLM inference engine that outperforms TensorRT-LLM by 11% in throughput on NVIDIA B200 GPUs for agentic coding workloads.

AI News, AI Infrastructure, Software Engineering

OpenAI Releases MRC Protocol: Scaling AI Supercomputing to 131,000 GPUs

OpenAI's new MRC protocol enables 131,000 GPU clusters with 33% fewer optics and microsecond failure recovery for frontier AI model training.

Semiconductors, Earnings, AI Infrastructure

NVIDIA (NVDA) 21-Day Outlook: Earnings Catalyst and Blackwell Ramp Drive Bullish Momentum

NVIDIA's upcoming May 20 earnings report, backed by massive free cash flow generation and 100% bullish news sentiment, signals a strong upward trajectory.

NVDA
AI News, Agentic AI, AI Infrastructure

Building a Groq-Powered Agentic Research Assistant with LangGraph and Sub-Agents

Build a high-performance research assistant using Groq's inference endpoint, LangGraph, and Llama-3.3-70b to automate multi-step workflows with agentic memory.

AI News, Agentic AI, AI Infrastructure

CopilotKit Introduces Enterprise Intelligence Platform for Persistent Agentic Memory

CopilotKit launches the Enterprise Intelligence Platform to provide agentic applications with persistent memory and state across sessions and devices.

AI News, AI Infrastructure, Large Language Model

Google AI Releases MTP Drafters for Gemma 4: Accelerating Inference by 3x

Google AI releases MTP drafters for Gemma 4, using speculative decoding to deliver up to 3x faster inference without quality loss.

AI News, AI Infrastructure, Language Model

Zyphra ZAYA1-8B: A 760M Parameter MoE Model Outperforming Claude 4.5 on Math

Zyphra's ZAYA1-8B uses 760M active parameters to outperform Claude 4.5 Sonnet on math benchmarks using novel Markovian RSA test-time compute.

Earnings, Technical Analysis, AI Infrastructure

DOCN 5-Day Outlook: Blowout Q1 Earnings Clash with Extreme Overbought Technicals and Dilution Risks

DigitalOcean's blowout Q1 earnings and raised guidance face immediate headwinds from an extreme 90.80 RSI and a massive secondary share offering.

DOCN
AI News, Cloud Engineering, AI Infrastructure

Architectural Strategies for Cross-Cloud Multi-Agent Systems Deployment

Deploying cross-cloud Multi-Agent Systems requires replacing synchronous HTTP with asynchronous brokers to prevent 40-second timeout failures.

AI News, AI Infrastructure, Machine Learning

Zyphra's TSP Strategy Achieves 2.6x Throughput for Large-Scale AI Training

Zyphra introduces Tensor and Sequence Parallelism (TSP), a hardware-aware strategy delivering 2.6x throughput over TP+SP baselines using 1,024 AMD MI300X GPUs.

AI News, AI Infrastructure, Software Engineering

Mitigating Tokenization Drift: How Spacing and Formatting Impact LLM Performance

Tokenization drift causes model degradation through minor formatting changes, with rewording instructions potentially cutting token overlap to 50%.

AI News, AI Infrastructure, Language Model

Mastering LLM Post-Training: A Practical Guide to SFT, DPO, and GRPO with TRL

Learn to align LLMs using the TRL library, covering SFT, Reward Modeling, DPO, and GRPO for reasoning tasks, optimized for limited hardware like NVIDIA T4 GPUs.

AI News, AI Infrastructure, Machine Learning

Qwen-Scope: Open-Source Sparse AutoEncoders for LLM Interpretability and Steering

Qwen AI releases Qwen-Scope, an open-source suite of 14 Sparse AutoEncoders (SAEs) for Qwen3/3.5 models, enabling inference-time steering and benchmark analysis without model runs.

AI News, AI Infrastructure, Large Language Model

NVIDIA NeMo RL Accelerates LLM Post-Training with Lossless Speculative Decoding

NVIDIA Research integrates speculative decoding into NeMo RL v0.6.0, achieving a 1.8x rollout generation speedup at 8B scale and projecting a 2.5x end-to-end training speedup for 235B models.

AI News, AI Infrastructure, Large Language Models

Moonshot AI Releases FlashKDA: 2.22x Faster Prefill for Kimi Delta Attention

Moonshot AI open-sources FlashKDA, a CUTLASS-based kernel delivering up to 2.22x prefill speedups for Kimi Delta Attention on NVIDIA H20 GPUs.

AI News, AI Infrastructure, Machine Learning

FlashQLA: High-Performance Linear Attention Library for NVIDIA Hopper GPUs

The Qwen Team has released FlashQLA, a linear attention kernel library achieving up to 3x speedup on NVIDIA Hopper GPUs for Gated Delta Network architectures.

AI News, AI Infrastructure, Large Language Model

Top 10 KV Cache Compression Techniques for LLM Inference

KV cache compression reduces memory overhead by up to 93.3%, enabling larger batch sizes and higher throughput for long-context LLM inference.
