Erdős Problem #1026

(terrytao.wordpress.com)

109 points | by tzury 8 hours ago

2 comments

  • baq 6 hours ago
    I have no comments about the result itself, but the process and the AI policy which facilitated it is inspiring and easily transferable to any moderately complicated software engineering problem. Much to learn regardless of the maths.
    • robrenaud 5 hours ago
      I think you underestimate how powerful lean is, and close it is to the tedious part of formal math. A theorem prover needs consult no outside resource. A formal math LLM-like generator need only consult the theorem prover to get rid of hallucinations. This is why it's actually much easier than SWE to optimize/hill climb on.

      Low level, automated theorem providing is going to fall way quicker than most expected, like AlphaGo, precisely because an MCTS++ search over lean proofs is scalable/amendable to self play/relevant to a significant chunk of professional math.

      Legit, I almost wish the US and China would sign a Formal Mathematics Profileration Treaty, as a sign of good will between very powerful parties who have much to gain from each other. When your theorem prover is sufficiently better than most Fields medalists alive, you share your arch/algorithms/process with the world. So Mathematics stays in the shared realm of human culture, and it doesn't just happen to belong to DeepMind, OpenAI, or Deepseek.

      • baq 3 hours ago
        On the contrary I think we're low key on the verge of model checkers being widely deployed in the industry. I've been experimenting with Opus 4.5 + Alloy and the preliminary results I'm getting are crossing usability thresholds in a step-function pattern (not surprising IMHO), I just haven't seen anyone pick up on it publicly yet.

        The workflow I'm envisioning here is the plan document we're all making nowadays isn't being translated directly into code, but into a TLA+/Alloy/... model as executable docs and only then lowered into the code space while conformance is continuously monitored (which is where the toil makes it not worth it most of the time without LLMs). The AI literature search for similar problems and solutions is also obviously helpful during all phases of the sweng process.

        • robot-wrangler 38 minutes ago
          > The workflow I'm envisioning here is the plan document we're all making nowadays isn't being translated directly into code, but into a TLA+/Alloy/... model as executable docs and only then lowered into the code space while conformance is continuously monitored

          I'm sure we've agreed on this before, but I agree again ;) There are dozens of us at least, dozens! There's also a recent uptick in posts with related ideas, for example this hit the front-page briefly ( https://news.ycombinator.com/item?id=46251667 ).

          I was tempted to start with alloy/tla for my own experiments along these lines due to their popularity, but since the available training data is so minimal for everything in the space.. I went with something more obscure (MCMAS) just for access to "agents" as primitives in the model-checker.

      • gaigalas 2 hours ago
        Is it trivial for any mathematician to understand lean code?

        I'm curious if there is a scenario in which a large automated proof is achieved but there would be no practical means of getting any understanding of what it means.

        I'm an engineer. Think like this: a large complex program that compiles but you don't understand what it does or how to use it. Is such a thing possible?

        • gus_massa 52 minutes ago
          With very difficult human generated proof, it's common that it take like 10 or 20 to make it understandable for mortals. The idea is to split the proof, create new notation, add intermedite steps that are nice, find a simpler path. It's like refactoring.

          Sometimes the original proof is compleyely replaced, bit by bit, until there is an easy to understand version.

        • JulianWasTaken 1 hour ago
          It's not trivial for a mathematician to understand Lean code, but it's something that's possible to learn to read and interpret in a day (without then necessarily being proficient in how to write it).

          That's true though of Lean code written by a human mathematician.

          AI systems are capable (and generally even predisposed to) producing long and roundabout proofs which are a slog to decipher. So yes the feeling is somewhat similar at times to an LLM giving you a large and sometimes even redundant-in-parts program.

        • QuesnayJr 23 minutes ago
          There's a couple of problems that were solved that way a while ago, and they have been formalized, but not in Lean:

          https://en.wikipedia.org/wiki/Four_color_theorem

          https://en.wikipedia.org/wiki/Kepler_conjecture

    • nsoonhui 5 hours ago
      But software engineering problems are more fuzzy and less amendable to mathematical analysis, so exactly how can those AI policies developed for math be applied to software engineering problems?
      • boerseth 5 hours ago
        Not sure which way the difference puts the pressure. Does the fuzziness require more prudent policies, or allow us to get away with less?
  • tzury 6 hours ago
    This case study reveals the future of AI-assisted[1] work, far beyond mathematics.

    It relies on a combination of Humans, LLMs ('General Tools'), Domain-Specific Tools, and Deep Research.

    It is apparent that the static data encoded within an LLM is not enough; one must re-fetch sources and digest them fresh for the context of the conversation.

    In this workflow, AlphaEvolve, Aristotle, and LEAN are the 'PhDs' on the team, while the LLM is the Full Stack Developer that glues them all together.

    [1] If one likes pompous terms, this is what 'AGI' will actually look like.

    • philipwhiuk 1 hour ago
      Aristotle is already an LLM and LEAN combined.

      [from the Aristotle paper]

      > Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver.

      [from elsewhere on how part 2 works]

      > To address IMO-level complexity, Aristotle employs a natural language module that decomposes hard problems into lists of informally reasoned lemmas. This module elicits high-level proof sketches and supporting claims, then autoformalizes them into Lean for formal proving. The pipeline features iterative error feedback: Lean verification errors are parsed and fed back to revise both informal and formal statements, iteratively improving the formalization and capturing creative auxiliary definitions often characteristic of IMO solutions.

    • 9u127v89yik2j3 3 hours ago
      The author is the PhD on the team.

      Literally not AGI.