Are We PEP740 Yet?

(trailofbits.github.io)

55 points | by djoldman 5 hours ago

4 comments

  • simonw 3 hours ago
    I suggest reading this detailed article to understand why they built this: https://blog.trailofbits.com/2024/11/14/attestations-a-new-g...

    The implementation is interesting - it's a static page built using GitHub Actions, and the key part of the implementation is this Python function here: https://github.com/trailofbits/are-we-pep740-yet/blob/a87a88...

    If you read the code you can see that it's hitting pages like https://pypi.org/simple/pydantic/ - which return HTML by default - but sending this header to request JSON instead:

        Accept: application/vnd.pypi.simple.v1+json
    
    Then scanning through the resulting JSON looking for files that have a provenance that isn't set to null.

    Here's an equivalent curl + jq incantation:

        curl -s \
          -H 'Accept: application/vnd.pypi.simple.v1+json' \
          https://pypi.org/simple/pydantic/ \
        | jq '.files | map(select(.provenance != null)) | length'
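
    For readers who prefer Python, the same check can be sketched with only the standard library. This is a minimal sketch, not the project's actual function: the names fetch_simple_index and count_attested_files are hypothetical, and the only assumption about the response is the one described above (a top-level "files" list whose entries may carry a non-null "provenance" value).

```python
import json
import urllib.request

# Media type from the comment above; asks PyPI for JSON instead of HTML.
ACCEPT_JSON = "application/vnd.pypi.simple.v1+json"


def fetch_simple_index(project: str) -> dict:
    """Fetch the JSON form of a project's simple index page (hypothetical helper)."""
    request = urllib.request.Request(
        f"https://pypi.org/simple/{project}/",
        headers={"Accept": ACCEPT_JSON},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def count_attested_files(index: dict) -> int:
    """Count files whose "provenance" value is present and not null.

    Mirrors the jq filter: a missing key is treated the same as null.
    """
    files = index.get("files", [])
    return sum(1 for entry in files if entry.get("provenance") is not None)
```

    Calling count_attested_files(fetch_simple_index("pydantic")) performs the same request as the curl command and prints nothing by itself; wrap it in print() to see the count.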
  • cyrnel 19 minutes ago
    Why invest so much time and money in a feature that prevents such a small percentage of data breaches that it's not even categorized on the 2024 Verizon Data Breach Investigations Report?

    The vast majority of breaches are caused by credential theft, phishing, and exploiting vulnerabilities.

    It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.

  • marky1991 3 hours ago
    Could someone explain why this is important? My uninformed feeling towards PEP 740 is 'who cares?'.
    • hadlock 3 hours ago
      I believe this is a system where a human or build system builds a package, cryptographically signs it, and uploads it, verifying end to end that the code uploaded to GitHub for widget-package 3.2.1 is the code you're downloading to your laptop for widget-package 3.2.1, with no chance that it was modified or signed by an adversarial third party.
      • marky1991 3 hours ago
        That's my understanding also, but I still feel like 'who cares' about that attack scenario. Am I just insufficiently paranoid? Is this kind of attack really likely? (How is it done, other than evil people at pypi?)
        • OutOfHere 2 hours ago
          Yes, it is likely. It is done by evil intermediaries on hosts that are used to create and upload the package. It is possible, for example, if the package is created and uploaded from a developer laptop that is compromised.

          ---

          From the docs:

          > PyPI's support for digital attestations defines a strong and verifiable association between a file on PyPI and the source repository, workflow, and even the commit hash that produced and uploaded the file.

          • cyrnel 5 minutes ago
            I'd encourage you to read the Verizon DBIR before making statements about whether a given attack is likely or not. Hijacking build systems is not likely: https://www.verizon.com/business/resources/reports/dbir/
          • abotsis 2 hours ago
            It still doesn’t protect against rogue commits to packages by bad actors. Which, IMO, is the larger threat (and one that’s been actively exploited). So while a step in the right direction, it certainly doesn’t completely solve the supply chain risk.
          • marky1991 1 hour ago
            Could you explain why you think it is a likely risk? Has this attack happened frequently?
            • OutOfHere 14 minutes ago
              It is likely in the same way that aliens are statistically likely. There is no evidence so far afaik, but you will likely never find out when it happens, and it can compromise the whole world if it were to happen to a widely used package. It is not worth the risk to not have the feature. I even think it should ideally eventually become mandatory.
          • mikepurvis 2 hours ago
            It’s honestly a bit nuts that in 2024 a system as foundational as PyPI just accepts totally arbitrary, locally built archives for its “packages”.

            I appreciate that it’s a community effort and compute isn’t free, but Launchpad did this correctly from the very beginning — dput your signed dsc and it will build and sign binary debs for you.

        • otabdeveloper4 51 minutes ago
          You are correct. Start distributing and requiring hashes with your Python dependencies instead.

          This thing is a non-solution to a non-problem. (The adversaries won't be MiTM'ing GitHub or PyPI.)

          The actual problem is developers installing "foopackage" by referencing the latest version instead of vetting their software supply chain. Fixing this implies a culture change where people stop using version numbers for software at all.
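
          As an illustration of the hash pinning mentioned at the top of this comment: pip already enforces pinned hashes when run with --require-hashes, and the underlying check is just a digest comparison. The sketch below shows only that comparison; sha256_of and verify_sha256 are hypothetical names, and the expected digest is assumed to come from a reviewed requirements or lock file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Check a downloaded archive against a pinned digest (hypothetical helper)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

          A pinned hash only proves the file is the one that was vetted earlier; it says nothing about where that file was built, which is the part attestations address.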

      • TZubiri 1 hour ago
        1- Why not compile it? 2- does pip install x not guarantee that?
      • otabdeveloper4 53 minutes ago
        Yeah, because the problem with Python packaging is a lack of cryptographic signatures.

        /"Rolls eyes" would be an understatement/

    • rty32 3 hours ago
      • marky1991 3 hours ago
        But that involved one of the developers of said package committing malicious code and it being accepted and then deployed. How would this prevent that from happening?

        I thought this was about ensuring the code that developers pushed is what you end up downloading.

        • rty32 3 hours ago
          No, part of the malicious code is in a test data file, and the modified m4 file is not in the git repo. The package signed and published by Jia Tan is not reproducible from the source, and that was intentional.

          You might want to revisit the script of the xz backdoor.

          • epcoa 2 hours ago
            An absolutely irrelevant detail here. While there was an additional flourish of obfuscation of questionable prudence, the attack was not at all dependent on that. It’s a library that justifies all kinds of seemingly innocuous test data. There were plenty of creative ways to smuggle in selective backdoors to the build without resorting to a compromised tar file. The main backdoor mechanism resided in test data in the git repo, the entire compromise could have.
  • zahlman 4 hours ago
    >Using a Trusted Publisher is the easiest way to enable attestations, since they come baked in! See the PyPI user docs and official PyPA publishing action to get started.

    For many smaller packages in this top 360 list I could imagine this representing quite a bit of a learning curve.

    • amiga386 3 hours ago
      Or it could see Microsoft tightening its proprietary grip over free software by not only generously offering gratis hosting, but now also it's a Trusted Publisher and you're not - why read those tricky docs? Move all your hosting to Microsoft today, make yourself completely dependent on it, and you'll be rewarded with a green tick!
      • simonw 3 hours ago
        I think it's a little rude to imply that the people who worked on this are serving an ulterior motive.
        • akira2501 3 hours ago
          It's possible they're just naive.
        • Spivak 50 minutes ago
          Microsoft for sure has an ulterior motive here, and the PyPI devs are serving it. It's not a bad thing, it's a win-win for both parties. That kind of carrot is how you get buy-in from huge companies and in return they do free labor for you that secures your software supply chain.
      • zahlman 3 hours ago
        Thankfully, the PyPI side of the hosting is done by a smaller, unrelated company (Fastly).
    • simonw 3 hours ago
      I think it's pretty hard to get a Python package into the top 360 list while not picking up any maintainers who could climb that learning curve pretty quickly. I wrote my own notes on how to use Trusted Publishers here: https://til.simonwillison.net/pypi/pypi-releases-from-github

      The bigger problem is for projects that aren't hosting on GitHub and using GitHub Actions - I'm sure there are quite a few of those in the top 360.

      I expect that implementing attestations without using the PyPA GitHub Actions script has a much steeper learning curve, at least for the moment.

    • woodruffw 2 hours ago
      I suspect that most of the packages in the top 360 list are already hosted on GitHub, so this shouldn’t be a leap for many of them. This is one of the reasons we saw Trusted Publishing adopted relatively quickly: it required less work and was trivial to adopt within existing CI workflows.