DeepSeek v3.1 is not having a moment

(thezvi.substack.com)

5 points | by paulpauper 5 hours ago

1 comment

  • karmakaze 51 minutes ago
    What I find impressive about V3.1 are the things that are different, especially efficiency:

    Significant gains in training efficiency from innovations like FP8 mixed-precision training, which reduces memory use by up to 75% and speeds up training.

    Faster inference from a multi-token prediction (MTP) architecture that generates multiple tokens per step, for 2-3x faster output.

    A new hybrid thinking mode that lets you switch between a fast non-thinking mode and slower, more deliberate reasoning without losing quality.
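    A rough back-of-the-envelope sketch of the arithmetic behind the memory and speed claims above. It assumes an FP32 or BF16 baseline for the memory figure and a simplified draft-and-accept model for multi-token prediction, where each extra drafted token is kept with a fixed probability until the first rejection. The names (BYTES_PER_VALUE, memory_reduction, expected_tokens_per_step) and the acceptance probabilities are hypothetical and illustrative, not DeepSeek's published code or numbers.

```python
# Back-of-the-envelope arithmetic for the FP8 and multi-token prediction claims.
# All inputs are illustrative assumptions, not DeepSeek's published numbers.

# Storage cost per value for a few common numeric formats.
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2, "fp8": 1}


def memory_reduction(from_dtype: str, to_dtype: str) -> float:
    """Fraction of memory saved by storing values in to_dtype instead of from_dtype."""
    return 1 - BYTES_PER_VALUE[to_dtype] / BYTES_PER_VALUE[from_dtype]


def expected_tokens_per_step(draft_tokens: int, accept_prob: float) -> float:
    """Expected tokens emitted per forward pass under a simplified model:
    the model drafts `draft_tokens` extra tokens, each accepted with
    probability `accept_prob`, and drafting stops at the first rejection.
    Drafting and verification overhead are ignored."""
    # The first token is always kept; draft i survives only if drafts 1..i all do.
    return sum(accept_prob ** i for i in range(draft_tokens + 1))


if __name__ == "__main__":
    print(f"fp32 -> fp8: {memory_reduction('fp32', 'fp8'):.0%} less memory")
    print(f"bf16 -> fp8: {memory_reduction('bf16', 'fp8'):.0%} less memory")
    for k, p in [(1, 0.85), (2, 0.85), (3, 0.9)]:
        tokens = expected_tokens_per_step(k, p)
        print(f"{k} draft token(s), p={p}: ~{tokens:.2f} tokens per step")
```

    Two caveats fall out of the sketch. The 75% figure only holds against an FP32 baseline (against BF16 it is 50%), and "mixed precision" means some tensors stay in higher precision, so real savings are lower. Under this simplified model, one extra draft token caps the gain at 2 tokens per step, so getting to 2-3x requires drafting more than one token with high acceptance.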