r/LocalLLaMA

96 items · Foundation Models & Frontier AI Labs

[NEW MODEL] SupraLabs just released a new model! - Supra-50M-Reasoning

r/LocalLLaMA 1h

RTX Pro 4500 Blackwell Performance Numbers

r/LocalLLaMA 2h

Gemma 4 12B is my new main squeeze

r/LocalLLaMA 5h

hello there! i made a tool to explore kokoro.

r/LocalLLaMA 7h

Here is my llama.cpp NVFP4/MXFP6 GGUF quantizer tool

r/LocalLLaMA 7h

Finally finished my LLM server: EPYC 9575F, 4× RTX 3090 (96GB VRAM), 768GB ECC RAM

r/LocalLLaMA 8h

How LLM-driven NPCs work in Ultima Online (ServUO)

r/LocalLLaMA 9h

RTX Spark Ads: DJT Edition

r/LocalLLaMA 11h

finally

r/LocalLLaMA 11h

Higgs Audio v3 TTS 4B. Built for voice chat. Support 100 languages and inline control.

r/LocalLLaMA 13h

You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter.

r/LocalLLaMA 16h

Nvidia's been paying shills on LinkedIn

r/LocalLLaMA 20h

Today made me realize just how bad things have gotten without Meta

r/LocalLLaMA 20h

VibeOS - Fully Hallucinated Operating System

r/LocalLLaMA 21h

KVarN: new KV-cache quant from Huawei. 3–5× KV cache compression with actual speed-up instead of slow-down, and unlike TurboQuant it holds up on reasoning (Apache 2.0, vLLM single flag)

r/LocalLLaMA 21h

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 · Hugging Face

r/LocalLLaMA yest

nex-agi/Nex-N2-mini • Huggingface

r/LocalLLaMA yest

Gemma 4 QAT confirmed to release soon!

r/LocalLLaMA yest

Gemma 4 12b 8Q Heretic Oneshot Coding

r/LocalLLaMA yest

The first Gemma 4 12B finetunes are ready

r/LocalLLaMA yest

Me visiting this sub

r/LocalLLaMA yest

Trump signs narrower executive order on AI oversight after industry objections

r/LocalLLaMA yest

How can the numbers be this massive within a month ??

r/LocalLLaMA yest

New Google Gemma 4 12B Claims Near-26B Performance - We Tested Both!

r/LocalLLaMA yest

Gemma 4 12B first coding agent test on a 4080 Super

r/LocalLLaMA yest

gemma-4-12b-it vs Qwen3.5-9B on shared benchmarks: Qwen is overall winner beating gemma in 5/8 benchmarks despite a smaller footprint

r/LocalLLaMA yest

More Gemma 4 models incoming

r/LocalLLaMA yest

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

r/LocalLLaMA yest

Let us let Google know that we want the Gemma 4 124b

r/LocalLLaMA yest

google/gemma-4-12B · Hugging Face

r/LocalLLaMA yest

ui: Mermaid Diagrams in chat + interactive preview by allozaur · Pull Request #24032 · ggml-org/llama.cpp

r/LocalLLaMA yest

Take Three: What’s the rub on memory sessions?

r/LocalLLaMA yest

Qwen 3.7 Plus just briefly appeared and then disappeared on OpenRouter.

r/LocalLLaMA yest

How does the new abliteration tool Apostate compare with others? - Abliterlitics

r/LocalLLaMA yest

Tensor split mode: CUDA error on latest llama.cpp with Qwen-3.6-27b

r/LocalLLaMA yest

How much VRAM needed for Qwen 3.6 27B Q8 with 262K context?

r/LocalLLaMA Jun 3

Calling it now Microsoft is buying Unsloth.

r/LocalLLaMA Jun 3

Holo3.1 35B/9B/4B/0.8B (Qwen 3.5 finetunes)

r/LocalLLaMA Jun 3

Another shout out to llama.cpp build b9455 2x3090

r/LocalLLaMA Jun 3

Microsoft Aion 1.0 Instruct and Aion 1.0 Plan models!

r/LocalLLaMA Jun 3

Nous Research — Hermes Desktop

r/LocalLLaMA Jun 3

Why do we benchmark quants on perplexity and prose but never on tool call validity?

r/LocalLLaMA Jun 3

I Put a Datacenter GPU in My Gaming PC for £200

r/LocalLLaMA Jun 2

Minimax M3 appears to have no political censorship

r/LocalLLaMA Jun 2

I have become George Jetson: my job is now Yes/No supervision for a machine I don’t fully understand.

r/LocalLLaMA Jun 2

1-bit Bonsai Image 4B and Ternary Bonsai Image 4B Image Generation for Local Devices with just 0.93 GB and 1.21 GB respectively of Diffusion Transformer Footprint. So tiny!

r/LocalLLaMA Jun 2

ui: Add Thinking mode toggle with reasoning effort levels + improvements for Chat Form Add Action UI by allozaur · Pull Request #23434 · ggml-org/llama.cpp

r/LocalLLaMA Jun 2

Tiny LLM Benchmark: Jetson Orin Nano Super 8GB - Four Power Modes × Eight Models

r/LocalLLaMA Jun 2

Building a free, offline LLM “tutor” grounded in one university textbook — RAG, LoRA, or both? Sanity check wanted

r/LocalLLaMA Jun 2

Ignoring benchmarks, how do the newest local models (gemma 4 31B, 26BA4B, Qwen 3.6) “feel” to you? What do you think they compare to?

r/LocalLLaMA Jun 2

Replaced Claude with local Qwen3.6-27B in my multi-agent orchestrator for 2 weeks

r/LocalLLaMA Jun 2

Dual rtx 3090 build

r/LocalLLaMA Jun 2

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

r/LocalLLaMA Jun 2

Intel Arc Pro B70 llama.cpp benchmarks posted

r/LocalLLaMA Jun 2

NVIDIA releases Cosmos 3 Omnimodal world models on HF

r/LocalLLaMA Jun 2

Moss tts 1.5 8b Examples. It is currently the best voice cloning model for English as of June 2026

r/LocalLLaMA Jun 2

Stop asking what model to run. There are literally only two.

r/LocalLLaMA Jun 1

RTX Spark does not have 600GB/s Bandwidth

r/LocalLLaMA Jun 1

I trusted random person on this subreddit and bought 3080 20gb made of chinesium

r/LocalLLaMA Jun 1

llama: limit max outputs of `llama_context` by am17an · Pull Request #23861 · ggml-org/llama.cpp

r/LocalLLaMA Jun 1

So qwen3.7-4b when?

r/LocalLLaMA Jun 1

i dedicate this meme to you r/LocalLLaMA

r/LocalLLaMA Jun 1

For Ling-2.6-1T, what would make the size feel justified first: quality per token, local serving reality, or long context stability?

r/LocalLLaMA Jun 1

Mellum2 Goes Open Source: A Fast Model for AI Workflows | The JetBrains AI Blog

r/LocalLLaMA Jun 1

Mellum 2 12B A2.5B

r/LocalLLaMA Jun 1

Cheap V100 32gb

r/LocalLLaMA Jun 1

Entire world: We need more GPUs. Meanwhile, Jensen Huang:

r/LocalLLaMA Jun 1

A 1B humanizer that matches human writing on an AI detector

r/LocalLLaMA Jun 1

Just found a 1-click RCE in pewdiepie's Odysseus Chat

r/LocalLLaMA Jun 1

Open Models - May 2026

r/LocalLLaMA Jun 1

next MiniMax will be released in ~10 Days

r/LocalLLaMA Jun 1

NVIDIA announces Nemotron 3 Ultra

r/LocalLLaMA Jun 1

when you spend 5 days fine-tuning a model and it still confidently makes things up

r/LocalLLaMA Jun 1

MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal

r/LocalLLaMA Jun 1

Minimax M3 seems to be rolling out on the API

r/LocalLLaMA Jun 1

Get you some GPUs, it's not worth the hacks around lack of RAM

r/LocalLLaMA Jun 1

Semantic Step Prediction: Multi-Step Latent Forecasting in LLM Reasoning Trajectories via Step Sampling

r/LocalLLaMA May 31

GPU Prices. Buy now, or buy later?

r/LocalLLaMA May 31

G7 agrees on shared language around open-source AI and open weights AI

r/LocalLLaMA May 31

God dammit Qwen

r/LocalLLaMA May 31

I ported NVIDIA Parakeet (speech-to-text) to ggml: same output as NeMo, faster, GGUF-quantized, no Python

r/LocalLLaMA May 31

What's this sub's general opinion on quantizing the KV cache?

r/LocalLLaMA May 31

What's actually happening when a model spills out of VRAM into system memory?

r/LocalLLaMA May 31

Llama Studio v0.2.0

r/LocalLLaMA May 31

Qwen3.6-35B vs Gemma4-26B on 7900 XTX

r/LocalLLaMA May 31

(YT) PewDiePie released his harness/webui

r/LocalLLaMA May 31

We might have a winner with the upcoming N1X

r/LocalLLaMA May 31

Added an old 2070 Super to my rig and I can't go back...worse, now I need more

r/LocalLLaMA May 31

13 abliterated Gemma 4 E2B variants, 44 GPU hours, Benchmark and Comparison - Abliterlitics

r/LocalLLaMA May 31

Stepfun 3.7 Flash is very good

r/LocalLLaMA May 31

Flash Attention for llama.cpp on RDNA3: 47% less KV VRAM than Vulkan f16 K, KLD almost lossless on F16 K / q4_0 V. Part 1.

r/LocalLLaMA May 31

<Think> toggle button for llama.cpp web chat for QWEN3.6

r/LocalLLaMA May 31

It's funny how everything changes, yet somehow stays the same.

r/LocalLLaMA May 31

Dell confirms XPS laptop with NVIDIA N1X at Computex ( basically a DGX Spark GB10 for consumers with Windows )

r/LocalLLaMA May 31

My home data center

r/LocalLLaMA May 31

Someone out there likely needs this

r/LocalLLaMA May 30
