What impresses me most about V3.1 is what has changed, especially in efficiency:
More efficient training through innovations like FP8 mixed-precision training, which cuts memory use by up to 75% (the best case, measured against FP32 storage) and speeds up training.
Faster inference through a multi-token prediction architecture that generates multiple tokens per step, resulting in 2-3x faster output.
A new hybrid thinking mode that switches between a fast non-thinking mode and slower, more deliberate reasoning without quality loss.
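To put rough numbers on the first two points, here is a minimal back-of-the-envelope sketch in Python. It is not taken from any V3.1 code. The byte widths are the standard sizes of each format, while the function names, the choice of two extra predicted tokens, and the 0.9 acceptance rate are assumptions made purely for illustration.

```python
# Bytes needed to store one value in each numeric format (standard widths).
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2, "fp8": 1}


def memory_saving(baseline: str, target: str = "fp8") -> float:
    """Fraction of storage saved when values move from `baseline` to `target`."""
    return 1 - BYTES_PER_VALUE[target] / BYTES_PER_VALUE[baseline]


def expected_tokens_per_step(extra_tokens: int, accept_prob: float) -> float:
    """Toy model of multi-token prediction throughput (an assumption, not V3.1's design).

    One token per step is always produced. Each of the `extra_tokens`
    additional predictions is kept with probability `accept_prob`, and
    acceptance stops at the first rejected token.
    """
    return sum(accept_prob ** i for i in range(extra_tokens + 1))


if __name__ == "__main__":
    for baseline in ("fp32", "bf16"):
        saving = memory_saving(baseline)
        print(f"{baseline} -> fp8: {saving:.0%} less storage per value")
    # Hypothetical acceptance rate, chosen only for illustration.
    print(f"tokens per step: {expected_tokens_per_step(2, 0.9):.2f}")
```

Under these assumptions, the 75% figure only appears when the baseline is FP32. Against BF16 the per-value saving is 50%, and because mixed-precision training keeps some tensors in higher precision, real-world savings are likely to be smaller. Likewise, a 2-3x speedup in this toy model needs at least two extra tokens per step and a high acceptance rate.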