vLLM Blog

19 items · Foundation Models & Frontier AI Labs

Announcing Day-0 Support for NVIDIA Nemotron 3 Ultra on vLLM · yesterday
Fast & Efficient LLM Inference with vLLM: A New Course with DeepLearning.AI · Jun 3
Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon LLM Agents · Jun 2
Accelerating vLLM-Omni Inference with AutoRound Quantization · Jun 2
vLLM on the DGX Spark: Architecture, Configuration, and Local Evaluation · Jun 1
Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor · May 28
Native RL APIs in vLLM · May 28
Speculators v0.5.0: DFlash Support and Online Training · May 28
From Text to Multimodal Routing: Hardening Vision Signals in vLLM Semantic Router · May 28
EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec · May 26
vLLM x Novita AI: PegaFlow for Production-Grade External KV Cache · May 18
Elastic Expert Parallelism in vLLM · May 14
Announcing VeRL-Omni: Easy, Fast, and Stable RL Training for Diffusion and Omni-Modality Models · May 14
A First Comprehensive Study of TurboQuant: Accuracy and Performance · May 11
vLLM Tops the Artificial Analysis Leaderboard · May 11
Serving Agentic Workloads at Scale with vLLM x Mooncake · May 6
Run Highly Efficient Multimodal Agentic AI with NVIDIA Nemotron 3 Nano Omni Using vLLM · Apr 28
DeepSeek V4 in vLLM: Efficient Long-context Attention · Apr 24
The State of FP8 KV-Cache and Attention Quantization in vLLM · Apr 22