Why invest so much time and money in a feature that prevents such a small percentage of data breaches that it's not even categorized in the 2024 Verizon Data Breach Investigations Report?
The vast majority of breaches are caused by credential theft, phishing, and exploiting vulnerabilities.
It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.
I believe this is a system where a human/system builds a package, uploads it, and cryptographically signs it, verifying end to end that the code uploaded to GitHub for widget-package 3.2.1 is the code you're downloading to your laptop for widget-package 3.2.1, with no chance it was modified or signed by an adversarial third party.
That's my understanding also, but I still feel like 'who cares' about that attack scenario. Am I just insufficiently paranoid? Is this kind of attack really likely? (How is it done, other than evil people at pypi?)
Yes, it is likely. It is done by evil intermediaries on the hosts that are used to create and upload the package. It is possible, for example, if the package is built and uploaded from a compromised developer laptop.
---
From the docs:
> PyPI's support for digital attestations defines a strong and verifiable association between a file on PyPI and the source repository, workflow, and even the commit hash that produced and uploaded the file.
It still doesn’t protect against rogue commits to packages by bad actors. Which, IMO, is the larger threat (and one that’s been actively exploited).
So while a step in the right direction, it certainly doesn’t completely solve the supply chain risk.
It is likely in the same way that aliens are statistically likely. There is no evidence so far afaik, but you will likely never find out when it happens, and it can compromise the whole world if it were to happen to a widely used package. It is not worth the risk to not have the feature. I even think it should ideally eventually become mandatory.
It’s honestly a bit nuts that in 2024 a system as foundational as PyPI just accepts totally arbitrary, locally built archives for its “packages”.
I appreciate that it’s a community effort and compute isn’t free, but Launchpad did this correctly from the very beginning — dput your signed dsc and it will build and sign binary debs for you.
You are correct. Start distributing and requiring hashes with your Python dependencies instead.
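For anyone who hasn't used it, pip's hash-checking mode does roughly this today; a minimal sketch (the package name and hash below are placeholders, and something like pip-compile --generate-hashes can produce real ones):

    # requirements.txt -- every dependency pinned to an exact version and hash
    # (the hash here is an illustrative placeholder)
    widget-package==3.2.1 \
        --hash=sha256:0000000000000000000000000000000000000000000000000000000000000000

Installing with `pip install --require-hashes -r requirements.txt` then refuses any archive whose hash doesn't match the one you vetted.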
This thing is a non-solution to a non-problem. (The adversaries won't be MiTM'ing Github or Pypi.)
The actual problem is developers installing "foopackage" by referencing the latest version instead of vetting their software supply chain. Fixing this implies a culture change where people stop using version numbers for software at all.
But that involved one of the developers of said package committing malicious code and it being accepted and then deployed. How would this prevent that from happening?
I thought this was about ensuring the code that developers pushed is what you end up downloading.
No, part of the malicious code is in a test data file, and the modified m4 file is not in the git repo. The package signed and published by Jia Tan is not reproducible from the source, and it was intentionally done that way.
You might want to revisit the script of the xz backdoor.
An absolutely irrelevant detail here. While there was an additional flourish of obfuscation of questionable prudence, the attack was not at all dependent on that. It's a library that justifies all kinds of seemingly innocuous test data, and there were plenty of creative ways to smuggle in selective backdoors to the build without resorting to a compromised tarball. The main backdoor mechanism resided in test data in the git repo; the entire compromise could have as well.
> Using a Trusted Publisher is the easiest way to enable attestations, since they come baked in! See the PyPI user docs and official PyPA publishing action to get started.
For many smaller packages in this top 360 list I could imagine this representing quite a bit of a learning curve.
Or it could see Microsoft tightening its proprietary grip over free software by not only generously offering gratis hosting, but now also it's a Trusted Publisher and you're not - why read those tricky docs? Move all your hosting to Microsoft today, make yourself completely dependent on it, and you'll be rewarded with a green tick!
Microsoft for sure has an ulterior motive here, and the PyPI devs are serving it. It's not a bad thing, it's a win-win for both parties. That kind of carrot is how you get buy-in from huge companies and in return they do free labor for you that secures your software supply chain.
I think it's pretty hard to get a Python package into the top 360 list while not picking up any maintainers who could climb that learning curve pretty quickly. I wrote my own notes on how to use Trusted Publishers here: https://til.simonwillison.net/pypi/pypi-releases-from-github
The bigger problem is for projects that aren't hosting on GitHub and using GitHub Actions - I'm sure there are quite a few of those in the top 360.
I expect that implementing attestations without using the PyPA GitHub Actions script has a much steeper learning curve, at least for the moment.
I suspect that most of the packages in the top 360 list are already hosted on GitHub, so this shouldn’t be a leap for many of them. This is one of the reasons we saw Trusted Publishing adopted relatively quickly: it required less work and was trivial to adopt within existing CI workflows.
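For reference, the standard GitHub Actions route is fairly small; a sketch along these lines (workflow and environment names are illustrative), using the official PyPA action with a Trusted Publisher configured on PyPI, where attestations come baked in as the docs say:

    name: Publish to PyPI
    on:
      release:
        types: [published]
    jobs:
      pypi-publish:
        runs-on: ubuntu-latest
        environment: pypi        # should match the Trusted Publisher config on PyPI
        permissions:
          id-token: write        # required for Trusted Publishing (OIDC)
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.x"
          - name: Build distributions
            run: |
              python -m pip install build
              python -m build
          - name: Publish
            uses: pypa/gh-action-pypi-publish@release/v1
            # with Trusted Publishing, the action also generates and uploads attestations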
The implementation is interesting - it's a static page built using GitHub Actions, and the key part of the implementation is this Python function here: https://github.com/trailofbits/are-we-pep740-yet/blob/a87a88...
If you read the code you can see that it's hitting pages like https://pypi.org/simple/pydantic/ - which return HTML - but sending this header instead:
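(That header is presumably the JSON content-negotiation one from PEP 691, i.e. something like:)

    Accept: application/vnd.pypi.simple.v1+json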
Then scanning through the resulting JSON looking for files that have a provenance that isn't set to null. Here's an equivalent curl + jq incantation:
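Something like this, assuming the PEP 691 JSON layout where each entry under files carries a provenance key:

    curl -s -H 'Accept: application/vnd.pypi.simple.v1+json' \
        https://pypi.org/simple/pydantic/ \
      | jq '[.files[] | select(.provenance != null) | .filename]'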
/"Rolls eyes" would be an understatement/