vLLM Blog

19 items · Foundation Models & Frontier AI Labs

Announcing Day-0 Support for NVIDIA Nemotron 3 Ultra on vLLM

vLLM Blog yesterday

Fast & Efficient LLM Inference with vLLM: A New Course with DeepLearning.AI

vLLM Blog Jun 3

Session-Aware Agentic Routing: Continuity-Aware Model Selection for Long-Horizon LLM Agents

vLLM Blog Jun 2

Accelerating vLLM-Omni Inference with AutoRound Quantization

vLLM Blog Jun 2

vLLM on the DGX Spark: Architecture, Configuration, and Local Evaluation

vLLM Blog Jun 1

Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor

vLLM Blog May 28

Native RL APIs in vLLM

vLLM Blog May 28

Speculators v0.5.0: DFlash Support and Online Training

vLLM Blog May 28

From Text to Multimodal Routing: Hardening Vision Signals in vLLM Semantic Router

vLLM Blog May 28

EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec

vLLM Blog May 26

vLLM x Novita AI: PegaFlow for Production-Grade External KV Cache

vLLM Blog May 18

Elastic Expert Parallelism in vLLM

vLLM Blog May 14

Announcing VeRL-Omni: Easy, Fast, and Stable RL Training for Diffusion and Omni-Modality Models

vLLM Blog May 14

A First Comprehensive Study of TurboQuant: Accuracy and Performance

vLLM Blog May 11

vLLM Tops the Artificial Analysis Leaderboard

vLLM Blog May 11

Serving Agentic Workloads at Scale with vLLM x Mooncake

vLLM Blog May 6

Run Highly Efficient Multimodal Agentic AI with NVIDIA Nemotron 3 Nano Omni Using vLLM

vLLM Blog Apr 28

DeepSeek V4 in vLLM: Efficient Long-context Attention

vLLM Blog Apr 24

The State of FP8 KV-Cache and Attention Quantization in vLLM

vLLM Blog Apr 22