DeepSeek v3.1 is not having a moment

(thezvi.substack.com)

5 points | by paulpauper 5 hours ago

1 comment

karmakaze 51 minutes ago
What I find impressive about V3.1 is what's different, especially the efficiency gains:
Significant improvements in training efficiency through innovations like FP8 mixed-precision training, which reduces memory use by up to 75% and speeds up training.
Faster inference from a multi-token prediction architecture that generates multiple tokens per step, resulting in 2-3x faster output.
A new hybrid thinking mode that allows switching between a fast non-thinking mode and slower, more deliberate reasoning without quality loss.
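
As a rough back-of-envelope check on the 75% and 2-3x figures, here is a minimal Python sketch. It rests on assumptions that are not stated in the comment: that "up to 75%" compares FP8 (1 byte per value) against an FP32 baseline (4 bytes per value), and that each extra predicted token is accepted independently with a fixed probability. The function names and the acceptance model are hypothetical illustrations, not DeepSeek's implementation, and real speedups would be lower once verification overhead is counted.

```python
# Bytes per stored value for common floating-point formats.
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2, "fp8": 1}


def memory_reduction(baseline: str, new: str) -> float:
    """Fraction of memory saved when values move from `baseline` to `new`."""
    return 1.0 - BYTES_PER_VALUE[new] / BYTES_PER_VALUE[baseline]


def expected_tokens_per_step(accept_prob: float, extra_tokens: int) -> float:
    """Idealized expected tokens emitted per decoding step.

    Assumes one token is always produced, and that extra token i is kept
    only if all earlier extra tokens were accepted, each with independent
    probability `accept_prob`. This gives 1 + p + p^2 + ... + p^k.
    """
    return sum(accept_prob ** i for i in range(extra_tokens + 1))


if __name__ == "__main__":
    # FP8 versus an FP32 baseline; versus BF16 the saving is smaller.
    print("fp32 -> fp8:", memory_reduction("fp32", "fp8"))
    print("bf16 -> fp8:", memory_reduction("bf16", "fp8"))
    # Tokens per step for a few hypothetical acceptance rates and depths.
    for k in (1, 2, 3):
        print("extra tokens:", k, expected_tokens_per_step(0.85, k))
```

Under these assumptions the 75% figure only holds for tensors that actually move from FP32 to FP8; against BF16 the saving is half that, and mixed-precision training keeps some values in higher precision. Likewise, a 2-3x output speedup would need more than one extra token per step and a high acceptance rate.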