3 comments

  • randomtoast 1 hour ago
    I have been following recent progress in the formalization of mathematical proofs in Lean, particularly in the context of large language models. One prominent advocate of this approach is Terence Tao, who regularly writes about developments in this area.

    From a programmer's perspective, this puts up an interesting parallel. Models such as Sonnet or Opus 4.5 can generate thousands of lines of code per hour. I can review the output, ask the model to write tests, iterate on the result, and after several cycles become confident that the software is sufficiently correct.

    For centuries, mathematicians developed proofs by hand, using pen and paper, and were able to check the proofs of their peers. In the context of LLMs, however, a new problem may arise. Consider an LLM that constructs a proof in Lean 4 iteratively over several weeks, resulting in more than 1,000,000 lines of Lean 4 code and concluding with a QED. At what point is an mathematician no longer able to confirm with confidence that the proof is correct?

    Such a mathematician might rely on another LLM to review the proof, and that system might also report that it is correct. We may reach a stage where humans can no longer feasibly verify every proof produced by LLMs due to their length and complexity. Instead, we rely on the Lean compiler, which confirm formal correctness, and we are effectively required to trust the toolchain rather than our own direct understanding.

    • lgas 44 minutes ago
      I'm not sure I understand what you're getting at -- your last paragraph suggestions that you understand the point of formal specification languages and theorem provers (ie. for the automated prover to verify the proof such that you just have to trust the toolchain) but in your next to last paragraph you speak as if you think that human mathematicians need to verify the lean 4 code of the proof? It doesn't matter how many lines the proof is, a proof can only be constructed in lean if it's correct. (Well, assuming it's free of escape hatches like `sorry`).
      • practal 4 minutes ago
        > Well, assuming it's free of escape hatches like `sorry`

        There are bugs in theorem provers, which means there might be "sorries", maybe even malicious ones (depending on what is at stake), that are not that easy to detect. Personally, I don't think that is much of a problem, as you should be able to come up with a "superlean" version of your theorem prover where correctness is easier to see, and then let the original prover export a proof that the superlean prover can check.

        I think more of a concern is that mathematicians might not "understand" the proof anymore that the machine generated. This concern is not about the fact that the proof might be wrong although checked, but that the proof is correct, but cannot be "understood" by humans. I don't think that is too much of a concern either, as we can surely design the machine in a way that the generated proofs are modular, building up beautiful theories on their own.

        A final concern might be that what gets lost is that humans understand what "understanding" means. I think that is the biggest concern, and I see it all the time when formalisation is discussed here on HN. Many here think that understanding is simply being able to follow the rules, and that rules are an arbitrary game, and that is simply not true. Obviously not, because think about it, what does it mean to "correctly follow the rules"?

        I think the way to address this final concern is to put beauty at the heart of our theorem provers. We need beautiful proofs, written in a beautiful language, checked and created by a beautiful machine.

  • oersted 2 hours ago
    The motivation itself is quite fresh and compelling even as a standalone article.

    https://sdiehl.github.io/zero-to-qed/02_why.html

  • IshKebab 53 minutes ago
    This looks really good and complete. And I like that it starts from normal programming stuff rather than immediately jumping into type universes and proof expressions.

    Edit: some feedback on the monads page:

    1. "The bind operation (written >>=) is the heart of the monad." .. but then it doesn't use >>= at all in the code example.

    2. "The left arrow ← desugars to bind; the semicolon sequences operations." .. again, no semicolons in the code example.

    3. I don't really understand the List Monad section. The example doesn't seem to use any monad interfaces.

    4. "The same laws look cleaner in the Kleisli category, where we compose monadic functions directly. If f:A→MB and g:B→MC, their Kleisli composition is g∘f:A→MC" - it was going so well!