AI Infrastructure


Earnings, Contracts, AI Infrastructure

Babcock & Wilcox (BW) Surges on Q1 Earnings Beat and $2.4B AI Contract: 5-Day Increase Expected

BW is poised for a short-term breakout following a massive Q1 earnings beat, a 1,971% surge in bookings, and a $2.4B AI data center contract.

May 11, 2026 (BW)

AI Infrastructure, Earnings Analysis, M&A

IREN Limited (IREN): 21-Day Bullish Outlook Driven by $3.4B NVIDIA AI Cloud Contract Despite Earnings Miss

IREN's landmark $3.4B NVIDIA contract and $70 share purchase warrants signal strong medium-term upside, counterbalancing recent earnings misses and heavy capital expenditures.

May 11, 2026 (IREN)

AI News, AI Infrastructure, Software Engineering

NVIDIA Releases cuda-oxide: A Native Rust-to-PTX Compiler for SIMT GPU Kernels

NVIDIA AI researchers released cuda-oxide, an experimental Rust-to-CUDA compiler backend that compiles SIMT GPU kernels directly to PTX, achieving 868 TFLOPS on B200 GPUs.

May 9, 2026

AI News, Machine Learning, AI Infrastructure

Adaptive Parallel Reasoning: Scaling Inference with Dynamic Control

Adaptive Parallel Reasoning (APR) allows LLMs to dynamically spawn concurrent threads, reducing latency compared with sequential reasoning, which can take hours.

May 8, 2026

AI News, AI Infrastructure, Open Source

LightSeek Foundation Releases TokenSpeed: An Open-Source Inference Engine for Agentic AI

LightSeek Foundation's TokenSpeed is an open-source LLM inference engine that outperforms TensorRT-LLM by 11% in throughput on NVIDIA B200 GPUs for agentic coding workloads.

May 7, 2026

AI News, AI Infrastructure, Software Engineering

OpenAI Releases MRC Protocol: Scaling AI Supercomputing to 131,000 GPUs

OpenAI's new MRC protocol enables 131,000 GPU clusters with 33% fewer optics and microsecond failure recovery for frontier AI model training.

May 7, 2026

Semiconductors, Earnings, AI Infrastructure

NVIDIA (NVDA) 21-Day Outlook: Earnings Catalyst and Blackwell Ramp Drive Bullish Momentum

NVIDIA's upcoming May 20 earnings report, backed by massive free cash flow generation and 100% bullish news sentiment, signals a strong upward trajectory.

May 7, 2026 (NVDA)

AI News, Agentic AI, AI Infrastructure

Building a Groq-Powered Agentic Research Assistant with LangGraph and Sub-Agents

Build a high-performance research assistant using Groq's inference endpoint, LangGraph, and Llama-3.3-70b to automate multi-step workflows with agentic memory.

May 6, 2026

AI News, AI Infrastructure, Large Language Model

Google AI Releases MTP Drafters for Gemma 4: Accelerating Inference by 3x

Google AI releases MTP drafters for Gemma 4, using speculative decoding to deliver up to 3x faster inference without quality loss.

May 6, 2026

AI News, Agentic AI, AI Infrastructure

CopilotKit Introduces Enterprise Intelligence Platform for Persistent Agentic Memory

CopilotKit launches the Enterprise Intelligence Platform to provide agentic applications with persistent memory and state across sessions and devices.

May 6, 2026

AI News, AI Infrastructure, Language Model

Zyphra ZAYA1-8B: A 760M Parameter MoE Model Outperforming Claude 4.5 on Math

Zyphra's ZAYA1-8B uses 760M active parameters to outperform Claude 4.5 Sonnet on math benchmarks using novel Markovian RSA test-time compute.

May 6, 2026

Earnings, Technical Analysis, AI Infrastructure

DOCN 5-Day Outlook: Blowout Q1 Earnings Clash with Extreme Overbought Technicals and Dilution Risks

DigitalOcean's blowout Q1 earnings and raised guidance face immediate headwinds from an extreme 90.80 RSI and a massive secondary share offering.

May 5, 2026 (DOCN)

AI News, Cloud Engineering, AI Infrastructure

Architectural Strategies for Cross-Cloud Multi-Agent Systems Deployment

Deploying cross-cloud Multi-Agent Systems requires replacing synchronous HTTP with asynchronous brokers to prevent 40-second timeout failures.

May 4, 2026

AI News, AI Infrastructure, Machine Learning

Zyphra's TSP Strategy Achieves 2.6x Throughput for Large-Scale AI Training

Zyphra introduces Tensor and Sequence Parallelism (TSP), a hardware-aware strategy delivering 2.6x throughput over TP+SP baselines using 1,024 AMD MI300X GPUs.

May 4, 2026

AI News, AI Infrastructure, Software Engineering

Mitigating Tokenization Drift: How Spacing and Formatting Impact LLM Performance

Tokenization drift degrades model performance through minor formatting changes; rewording an instruction can cut token overlap to 50%.

May 3, 2026

AI News, AI Infrastructure, Language Model

Mastering LLM Post-Training: A Practical Guide to SFT, DPO, and GRPO with TRL

Learn to align LLMs using the TRL library, covering SFT, Reward Modeling, DPO, and GRPO for reasoning tasks, optimized for limited hardware like NVIDIA T4 GPUs.

May 1, 2026

AI News, AI Infrastructure, Machine Learning

Qwen-Scope: Open-Source Sparse AutoEncoders for LLM Interpretability and Steering

Qwen AI releases Qwen-Scope, an open-source suite of 14 Sparse AutoEncoders (SAEs) for Qwen3/3.5 models, enabling inference-time steering and benchmark analysis without model runs.

May 1, 2026

AI News, AI Infrastructure, Large Language Model

NVIDIA NeMo RL Accelerates LLM Post-Training with Lossless Speculative Decoding

NVIDIA Research integrates speculative decoding into NeMo RL v0.6.0, achieving a 1.8x rollout generation speedup at 8B scale and projecting a 2.5x end-to-end training speedup for 235B models.

May 1, 2026

AI News, AI Infrastructure, Large Language Models

Moonshot AI Releases FlashKDA: 2.22x Faster Prefill for Kimi Delta Attention

Moonshot AI open-sources FlashKDA, a CUTLASS-based kernel delivering up to 2.22x prefill speedups for Kimi Delta Attention on NVIDIA H20 GPUs.

Apr 30, 2026

AI News, AI Infrastructure, Machine Learning

FlashQLA: High-Performance Linear Attention Library for NVIDIA Hopper GPUs

The Qwen Team has released FlashQLA, a linear attention kernel library achieving up to 3x speedup on NVIDIA Hopper GPUs for Gated Delta Network architectures.

Apr 29, 2026

AI News, AI Infrastructure, Large Language Model

Top 10 KV Cache Compression Techniques for LLM Inference

KV cache compression reduces memory overhead by up to 93.3%, enabling larger batch sizes and higher throughput for long-context LLM inference.

Apr 29, 2026

AI News, Large Language Model, AI Infrastructure

DeepSeek-V4: 1M-Token Contexts via Compressed Sparse Attention and Hybrid Architecture

DeepSeek-AI releases DeepSeek-V4, featuring hybrid CSA/HCA attention that reduces KV cache size to 10% of previous models while supporting one-million-token contexts.

Apr 24, 2026

AI News, Agentic AI, AI Infrastructure

Google Cloud AI Research Unveils ReasoningBank: A Strategy-Distillation Framework for Agents

Google Cloud AI's ReasoningBank boosts agent success rates by 8.3% on WebArena by distilling reusable strategies from both successes and failures.

Apr 23, 2026

AI News, AI Infrastructure, Machine Learning

Google DeepMind’s Decoupled DiLoCo: Scaling AI Training with 88% Goodput and Asynchronous Fault Tolerance

Google DeepMind's Decoupled DiLoCo achieves 88% goodput under high hardware failure rates and reduces inter-datacenter bandwidth from 198 Gbps to 0.84 Gbps.

Apr 23, 2026