r/LocalLLaMA

96 items · Foundation Models & Frontier AI Labs

[NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning r/LocalLLaMA 1h
RTX Pro 4500 Blackwell Performance Numbers r/LocalLLaMA 2h
Gemma 4 12B is my new main squeeze r/LocalLLaMA 5h
hello there! i made a tool to explore kokoro. r/LocalLLaMA 7h
Here is my llama.cpp NVFP4/MXFP6 GGUF quantizer tool r/LocalLLaMA 7h
Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM r/LocalLLaMA 8h
How LLM-driven NPCs work in Ultima Online (ServUO) r/LocalLLaMA 9h
RTX Spark Ads: DJT Edition r/LocalLLaMA 11h
finally r/LocalLLaMA 11h
Higgs Audio v3 TTS 4B. Built for voice chat. Support 100 languages and inline control. r/LocalLLaMA 13h
You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. r/LocalLLaMA 16h
Nvidia's been paying shills on LinkedIn r/LocalLLaMA 20h
Today made me realize just how bad things have gotten without Meta r/LocalLLaMA 20h
VibeOS - Fully Hallucinated Operating System r/LocalLLaMA 21h
KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag) r/LocalLLaMA 21h
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 · Hugging Face r/LocalLLaMA yest
nex-agi/Nex-N2-mini • Huggingface r/LocalLLaMA yest
Gemma 4 QAT confirmed to release soon! r/LocalLLaMA yest
Gemma 4 12b 8Q Heretic Oneshot Coding r/LocalLLaMA yest
The first Gemma 4 12B finetunes are ready r/LocalLLaMA yest
Me visiting this sub r/LocalLLaMA yest
Trump signs narrower executive order on AI oversight after industry objections r/LocalLLaMA yest
How can the numbers be this massive within a month ?? r/LocalLLaMA yest
New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both! r/LocalLLaMA yest
Gemma 4 12B first coding agent test on a 4080 Super r/LocalLLaMA yest
gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint r/LocalLLaMA yest
More Gemma 4 models incoming r/LocalLLaMA yest
Introducing Gemma 4 12B: a unified, encoder-free multimodal model r/LocalLLaMA yest
Let us let Google know that we want the Gemma 4 124b r/LocalLLaMA yest
google/gemma-4-12B · Hugging Face r/LocalLLaMA yest
ui: Mermaid Diagrams in chat + interactive preview by allozaur · Pull Request #24032 · ggml-org/llama.cpp r/LocalLLaMA yest
Take Three: What’s the rub on memory sessions? r/LocalLLaMA yest
Qwen 3.7 Plus just briefly appeared and then disappeared on OpenRouter. r/LocalLLaMA yest
How does the new abliteration tool Apostate compare with others? - Abliterlitics r/LocalLLaMA yest
Tensor split mode: CUDA error on latest llama.cpp with Qwen-3.6-27b r/LocalLLaMA yest
How much VRAM needed for Qwen 3.6 27B Q8 with 262K context? r/LocalLLaMA Jun 3
Calling it now Microsoft is buying Unsloth. r/LocalLLaMA Jun 3
Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes) r/LocalLLaMA Jun 3
Another shout out to llama.cpp build b9455 2x3090 r/LocalLLaMA Jun 3
Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models! r/LocalLLaMA Jun 3
Nous Research — Hermes Desktop r/LocalLLaMA Jun 3
Why do we benchmark quants on perplexity and prose but never on tool call validity? r/LocalLLaMA Jun 3
I Put a Datacenter GPU in My Gaming PC for £200 r/LocalLLaMA Jun 2
Minimax M3 appears to have no political censorship r/LocalLLaMA Jun 2
I have become George Jetson: my job is now Yes/No supervision for a machine I don’t fully understand. r/LocalLLaMA Jun 2
1-bit Bonsai Image 4B and Ternary Bonsai Image 4B Image Generation for Local Devices with just 0.93 GB and 1.21 GB respectively of Diffusion Transformer Footprint. So tiny! r/LocalLLaMA Jun 2
ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI by allozaur · Pull Request #23434 · ggml-org/llama.cpp r/LocalLLaMA Jun 2
Tiny LLM Benchmark: Jetson Orin Nano Super 8GB - Four Power Modes × Eight Models r/LocalLLaMA Jun 2
Building a free, offline LLM “tutor” grounded in one university textbook — RAG, LoRA, or both? Sanity check wanted r/LocalLLaMA Jun 2
Ignoring benchmarks, how do the newest local models (gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to? r/LocalLLaMA Jun 2
Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks r/LocalLLaMA Jun 2
Dual rtx 3090 build r/LocalLLaMA Jun 2
Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro r/LocalLLaMA Jun 2
Intel Arc Pro B70 llama.cpp benchmarks posted r/LocalLLaMA Jun 2
NVIDIA releases Cosmos 3 Omnimodal world models on HF r/LocalLLaMA Jun 2
Moss tts 1.5 8b Examples. It is the currently best voice cloning model for English as of June 2026 r/LocalLLaMA Jun 2
Stop asking what model to run. There are literally only two. r/LocalLLaMA Jun 1
RTX Spark does not have 600GB/s Bandwidth r/LocalLLaMA Jun 1
I trusted random person on this subreddit and bought 3080 20gb made of chinesium r/LocalLLaMA Jun 1
llama: limit max outputs of `llama_context` by am17an · Pull Request #23861 · ggml-org/llama.cpp r/LocalLLaMA Jun 1
So qwen3.7-4b when? r/LocalLLaMA Jun 1
i dedicate this meme to you r/LocalLLaMA r/LocalLLaMA Jun 1
For Ling-2.6-1T, what would make the size feel justified first: quality per token, local serving reality, or long context stability? r/LocalLLaMA Jun 1
Mellum2 Goes Open Source: A Fast Model for AI Workflows | The JetBrains AI Blog r/LocalLLaMA Jun 1
Mellum 2 12B A2.5B r/LocalLLaMA Jun 1
Cheap V100 32gb r/LocalLLaMA Jun 1
Entire world: We need more GPUs. Meanwhile, Jensen Huang: r/LocalLLaMA Jun 1
A 1B humanizer that matches human writing on an AI detector r/LocalLLaMA Jun 1
Just found a 1-click RCE in pewdiepie's Odysseus Chat r/LocalLLaMA Jun 1
Open Models - May 2026 r/LocalLLaMA Jun 1
next MiniMax will be released in ~10 Days r/LocalLLaMA Jun 1
NVIDIA announces Nemotron 3 Ultra r/LocalLLaMA Jun 1
when you spend 5 days fine-tuning a model and it still confidently makes things up r/LocalLLaMA Jun 1
MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal r/LocalLLaMA Jun 1
Minimax M3 seems to be rolling out on the API r/LocalLLaMA Jun 1
Get you some GPUs, it's not worth the hacks around lack of RAM r/LocalLLaMA Jun 1
Semantic Step Prediction: Multi-Step Latent Forecasting in LLM Reasoning Trajectories via Step Sampling r/LocalLLaMA May 31
GPU Prices. Buy now, or buy later? r/LocalLLaMA May 31
G7 agrees on shared language around open-source AI and open weights AI r/LocalLLaMA May 31
God dammit Qwen r/LocalLLaMA May 31
I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python r/LocalLLaMA May 31
What's this sub's general opinion on quantizing the KV cache r/LocalLLaMA May 31
What's actually happening when a model spills out of VRAM into system memory? r/LocalLLaMA May 31
Llama Studio v0.2.0 r/LocalLLaMA May 31
Qwen3.6-35B vs Gemma4-26B on 7900 XTX r/LocalLLaMA May 31
(YT) PewDiePie released his harness/webui r/LocalLLaMA May 31
We might have a winner with the upcoming N1X r/LocalLLaMA May 31
Added an old 2070 Super to my rig and I can't go back...worse, now I need more r/LocalLLaMA May 31
13 abliterated Gemma 4 E2B variants, 44 GPU hours, Benchmark and Comparison - Abliterlitics r/LocalLLaMA May 31
Stepfun 3.7 Flash is very good r/LocalLLaMA May 31
Flash Attention for llama.cpp on RDNA3: 47% less KV VRAM than Vulkan f16 K, KLD almost lossless on F16 K / q4_0 V. Part 1. r/LocalLLaMA May 31
<Think> toggle button for llama.cpp web chat for QWEN3.6 r/LocalLLaMA May 31
It's funny how everything changes, yet somehow stays the same. r/LocalLLaMA May 31
Dell confirms XPS laptop with NVIDIA N1X at Computex (basically a DGX Spark GB10 for consumers with Windows) r/LocalLLaMA May 31
My home data center r/LocalLLaMA May 31
Someone out there likely needs this r/LocalLLaMA May 30
