AI Infrastructure
189 articles in this category (Page 2 of 8)
Sakana AI and NVIDIA Introduce TwELL: 20.5% Faster LLM Inference via Unstructured Sparsity
Sakana AI and NVIDIA introduced TwELL and custom CUDA kernels that exploit activation sparsity to speed up LLM inference by 20.5% and training by 21.9%.
IREN Limited (IREN): 21-Day Bullish Outlook Driven by $3.4B NVIDIA AI Cloud Contract Despite Earnings Miss
IREN's landmark $3.4B NVIDIA contract and $70 share purchase warrants signal strong medium-term upside, counterbalancing recent earnings misses and heavy capital expenditures.
NVIDIA Releases cuda-oxide: A Native Rust-to-PTX Compiler for SIMT GPU Kernels
NVIDIA AI researchers released cuda-oxide, an experimental Rust-to-CUDA compiler backend that compiles SIMT GPU kernels directly to PTX, achieving 868 TFLOPS on B200 GPUs.
LightSeek Foundation Releases TokenSpeed: An Open-Source Inference Engine for Agentic AI
LightSeek Foundation's TokenSpeed is an open-source LLM inference engine that outperforms TensorRT-LLM by 11% in throughput on NVIDIA B200 GPUs for agentic coding workloads.
DOCN 5-Day Outlook: Blowout Q1 Earnings Clash with Extreme Overbought Technicals and Dilution Risks
DigitalOcean's blowout Q1 earnings and raised guidance face immediate headwinds from an extreme 90.80 RSI and a massive secondary share offering.
Qwen-Scope: Open-Source Sparse AutoEncoders for LLM Interpretability and Steering
Qwen AI releases Qwen-Scope, an open-source suite of 14 Sparse AutoEncoders (SAEs) for Qwen3/3.5 models that enables inference-time steering, as well as benchmark analysis without model runs.
NVIDIA NeMo RL Accelerates LLM Post-Training with Lossless Speculative Decoding
NVIDIA Research integrates speculative decoding into NeMo RL v0.6.0, achieving a 1.8x rollout generation speedup at 8B scale and projecting a 2.5x end-to-end training speedup for 235B models.