Dependencies and Provenance

What is a software dependency?

A software dependency is a relationship between software components in which one component relies on another to work. Dependencies include frameworks, libraries, plugins, and more. They can be first-party or third-party, and proprietary or open source.

What is software dependency management?

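Software dependency management is the practice of tracking which components your software relies on and keeping them secure and up to date. It matters because a compromise or vulnerability in one part of the supply chain can introduce a compromise or vulnerability into every component further down the chain.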

This means that a vulnerable library can compromise any downstream software that uses it. It also means that any compromised system involved in the software development life cycle (SDLC) can have cascading effects on anything downstream from that system. Consider the Log4Shell vulnerability disclosed in 2021. Not everything that used a vulnerable version of Log4j could easily be compromised or attacked, but the vulnerability still led to massive supply chain audits to discover what might be vulnerable and where it lived. Similarly, in the SolarWinds SUNBURST attack discovered in 2020, a compromise of the build system meant that potentially all software built by that system was compromised.
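
The cascading effect described above can be sketched as a graph traversal. The example below is a minimal illustration, not a real audit tool: the component names and the `DEPENDS_ON` graph are hypothetical, and a real inventory would normally come from an SBOM or a package manager's lock file.

```python
from collections import deque

# Hypothetical dependency graph: each key depends on the components listed.
DEPENDS_ON = {
    "web-app": ["auth-service", "logging-lib"],
    "auth-service": ["logging-lib", "crypto-lib"],
    "batch-job": ["crypto-lib"],
    "logging-lib": [],
    "crypto-lib": [],
}


def downstream_of(component, depends_on):
    """Return every component that directly or transitively depends on `component`."""
    # Invert the graph so we can walk from a dependency to its users.
    used_by = {}
    for name, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(name)

    # Breadth-first search over the inverted graph.
    affected = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


# If "logging-lib" turns out to be vulnerable, list what needs auditing.
print(sorted(downstream_of("logging-lib", DEPENDS_ON)))
# ['auth-service', 'web-app']
```

Note that appearing in this set only means a component needs to be audited. As with Log4j, not every downstream user of a vulnerable library is necessarily exploitable.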

What is the bottom turtle and why is it a problem?

In software supply chain security, the term "bottom turtle" is a metaphor for the foundational components or dependencies upon which a software product relies. Just like the proverbial stack of turtles holding up the Earth in some myths, software often relies on layers of dependencies, with each layer built upon the ones below it.

In the context of supply chain security, securing the "bottom turtles" involves ensuring the integrity and security of the foundational components and dependencies upon which a software product relies. This can include libraries, frameworks, operating systems, and other fundamental software components. When foundational components are compromised or vulnerable, it can have cascading effects on the security of the entire software supply chain.

You need to apply supply chain security practices to the systems you own and operate, and verify that your dependencies do the same (see diagram below). Because each dependency has dependencies of its own, this verification can continue ad infinitum, which leads to the “bottom turtle problem.” Ensuring the security of all dependencies is nonetheless crucial for mitigating risks in the software supply chain and maintaining the overall security of software products.

What is software provenance?

Provenance is a record of the history or origin of something. In the case of software and other IT systems, provenance includes Git commit records that show who wrote the code and when, build logs that show how source code was transformed into runnable software, and the Software Bill of Materials (SBOM) that shows which upstream dependencies you include in your systems.

Provenance has two key outcomes:

Establishing a chain of custody for each step of your SDLC, from the developer writing the code to the build, publication, and eventual deployment of your software and systems.
Linking your supply chain to that of your upstream dependencies and, where you distribute software to users, making it easy for those users to link your supply chain to theirs.

Thus, provenance helps you better understand your supply chain while enabling any downstream users to better understand theirs.

How can I determine my software’s provenance?

Provenance can come in many forms, including the following:

Log files, including build, deployment, monitoring, and runtime logs
Change management records
SBOMs
Cryptographic signatures and hashes
Build or continuous integration metadata
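
As one illustration of how hashes and build metadata from the list above fit together, the sketch below records a SHA-256 digest of a build artifact alongside some metadata, then checks a later copy of the artifact against that record. The field names (`artifact_sha256`, `source_commit`, `builder`) and the sample values are assumptions made for this example, not a standard provenance format.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_record(artifact: bytes, commit: str, builder: str) -> dict:
    """Build a minimal, hypothetical provenance record for an artifact."""
    return {
        "artifact_sha256": sha256_hex(artifact),
        "source_commit": commit,
        "builder": builder,
    }


def matches_record(artifact: bytes, record: dict) -> bool:
    """Check that an artifact's digest matches the one in the record."""
    return sha256_hex(artifact) == record["artifact_sha256"]


artifact = b"example build output"  # stand-in for a real binary or package
record = make_record(artifact, commit="abc123", builder="ci-runner-1")

print(json.dumps(record, indent=2))
print(matches_record(artifact, record))          # True
print(matches_record(artifact + b"!", record))   # False: the artifact changed
```

A hash only shows that the artifact matches the record. It says nothing about whether the record itself can be trusted.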

Provenance is very useful, but can it be falsified?

Falsifying provenance is the act of manipulating provenance so that it no longer represents the truth. This can be done at the time of provenance generation. For example, if a build is compromised, it could misreport its own build logs. In the case of change management, someone may falsely assert that a review happened when it didn’t.

In other situations, someone or something could manipulate the provenance data after it has been generated, either while in transit or at rest in a datastore. One of the many mechanisms for keeping provenance accurate and trustworthy is to ensure that it is generated by trusted identities. If you can prove that provenance came from a system you own and have secured, or from a known good actor, it’s easier to trust it.
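
Tampering after generation can be detected by authenticating the provenance record when it is created. The sketch below uses an HMAC from Python's standard library purely for illustration. It assumes the generator and the verifier share a secret key, and the key and record fields are hypothetical. Real supply chain tooling more commonly uses asymmetric signatures, so that verifiers never hold the signing key.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, for demonstration only.
KEY = b"demo-key-not-for-production"


def canonical(record: dict) -> bytes:
    """Serialize the record deterministically so the same data gives the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(record: dict, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the canonical form of the record."""
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify(record: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(record, key), signature)


record = {"builder": "ci-runner-1", "source_commit": "abc123", "reviewed": False}
signature = sign(record, KEY)

# Someone edits the stored record to claim a review happened.
tampered = dict(record, reviewed=True)

print(verify(record, signature, KEY))    # True
print(verify(tampered, signature, KEY))  # False
```

This only detects changes made after signing. If the system that generates the provenance is itself compromised, it can sign false records, which is why the identity doing the signing has to be one you trust.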