To Fork or Not to Fork

How you handle your dependencies will change how you secure your software supply chain

Ben Cotton

June 27, 2024

It’s a common question in development: how do you handle your dependencies? You can choose to pull them in from an external repository at build time, cache them to an internal repository, or create a static copy (commonly called “vendoring” or, if you’re maintaining the copy, “forking”). Each of these approaches comes with advantages and disadvantages. So what’s the security-conscious developer to do? 

HashiCorp founder Mitchell Hashimoto recently offered his take:

Unpopular opinion: you should copy/fork/DIY your dependencies for everything but the most complicated or sensitive functionality (GUI, crypto, networking, etc.). Blindly depending on trivial functionality or having a deep dependency tree causes more problems than it solves.

Reasons to fork

It’s true that dependency management is one of the most challenging parts of maintaining supply chain security. Creating a static copy of your upstreams, either directly in your application’s source code or as an external repository, gives you a predictable experience. Forking or vendoring your dependencies can eliminate some supply chain hazards entirely.

Most notably, it means you don’t have to worry about the availability or security of the repository after you’ve made the initial copy. Outages in a language-specific repository (for example, PyPI) or a code forge (for example, GitHub) don’t affect you because you already have the code you need. Plus, you’re protected against maintainers who decide to vandalize or remove their project (like in the left-pad incident of 2016) and against compromises in the repository (like in 2023, when malicious actors impersonated GitHub’s Dependabot to inject malware into projects).

Beyond the availability and integrity considerations, a vendored dependency simplifies the developer experience in a couple of ways. For one, it means that the behavior of the dependency doesn’t change unexpectedly. If you’re pulling in the latest version of a library or application and one day there’s a new version with new behavior, your project may fail to build — or worse: build with subtle changes in behavior.

Vendoring dependencies also avoids a particular flavor of “dependency hell.” Although we often talk about a supply chain or imagine our dependencies as a tree, the relationships really form a directed acyclic graph (DAG): the same library can be reached through more than one path. Looking at only the dependency name doesn’t tell you enough; your upstreams may depend on mutually exclusive versions of the same upstream library. Vendoring those indirect dependencies into your application avoids this difficult-to-solve problem.
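To make the version-conflict problem concrete, here is a minimal Python sketch that walks a dependency graph and reports any package requested at more than one version. The package names, versions, and the `find_conflicts` helper are all hypothetical, and real resolvers work with version ranges rather than the exact versions assumed here.

```python
from collections import defaultdict

# Hypothetical dependency graph: each package maps to the exact versions
# of the packages it requires. All names and versions are invented.
GRAPH = {
    "app": {"web-framework": "2.1.0", "report-builder": "1.4.0"},
    "web-framework": {"json-utils": "1.9.0"},
    "report-builder": {"json-utils": "2.3.0"},
    "json-utils": {},
}


def find_conflicts(graph, root):
    """Walk the graph from root and return packages requested at more than one version."""
    requested = defaultdict(dict)  # package -> {version: [packages that asked for it]}
    seen = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, version in graph.get(pkg, {}).items():
            requested[dep].setdefault(version, []).append(pkg)
            stack.append(dep)
    return {dep: versions for dep, versions in requested.items() if len(versions) > 1}


conflicts = find_conflicts(GRAPH, "app")
for dep, versions in conflicts.items():
    for version, requesters in sorted(versions.items()):
        print(f"{dep} {version} required by {', '.join(requesters)}")
```

Neither direct dependency names `json-utils` in the application’s own manifest, which is why looking only at top-level dependency names misses the conflict.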

Reasons not to fork

Vendoring indirect dependencies points to a challenge with forking your direct dependencies: you may have to fork your indirect dependencies, too. With the full dependency chain encompassing dozens or hundreds of independent dependencies, that’s a lot of work to avoid what could be a minor inconvenience.

Of course, the big issue with forking your dependencies is that you don’t get any bug fixes that happen after you’ve made your copy. Sure, you might avoid new vulnerabilities that arise, but you have to keep track of the fixes that happen and apply them yourself. That’s a lot of work, especially when you consider that the vast majority of vulnerabilities come from indirect dependencies. So you have to track the vulnerabilities for dozens or hundreds of packages and then apply the fixes yourself — a task that becomes more difficult the more the upstream project has diverged from the point where you copied it.
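As a rough illustration of that tracking burden, the Python sketch below compares vendored versions against a list of advisories and reports which fixes still need to be applied by hand. The package names, advisory IDs, and data layout are invented for the example, and it makes the simplifying assumption that an advisory affects every version older than its `fixed_in` release.

```python
# Hypothetical data: the versions we vendored, and the first release that
# fixes each known advisory. Names, versions, and IDs are invented.
VENDORED = {"json-utils": "1.9.0", "tiny-logger": "0.4.2", "date-parse": "3.1.0"}
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "json-utils", "fixed_in": "1.9.3"},
    {"id": "EXAMPLE-0002", "package": "date-parse", "fixed_in": "3.0.5"},
    {"id": "EXAMPLE-0003", "package": "not-vendored", "fixed_in": "2.0.0"},
]


def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so comparisons are numeric."""
    return tuple(int(part) for part in version.split("."))


def needs_backport(vendored, advisories):
    """Return (id, package, vendored version, fixed version) for fixes we lack.

    Assumption: an advisory affects every version older than 'fixed_in'.
    """
    pending = []
    for adv in advisories:
        have = vendored.get(adv["package"])
        if have is not None and parse(have) < parse(adv["fixed_in"]):
            pending.append((adv["id"], adv["package"], have, adv["fixed_in"]))
    return pending


for adv_id, package, have, fixed in needs_backport(VENDORED, ADVISORIES):
    print(f"{adv_id}: {package} {have} needs the fix released in {fixed}")
```

Even this toy version has to run against every vendored package, direct or indirect, each time a new advisory appears.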

And what happens when a new upstream release contains features that you want to start using? If you’ve made a simple copy, then you can update your copy and debug any issues that crop up, just like you would with an external copy. If you’ve made a “true” fork where you’ve made local changes to the copy, then you have to bring those changes forward into the new code base. Again, depending on how much the code bases have diverged, this may be a difficult task.

What to do?

This is a complicated problem; anyone who claims to have a simple answer is trying to sell you something. The real answer is: it depends.

For trivial dependencies, vendoring the code is a reasonable solution; left-pad is a great example. It’s a few lines of code, with no real functionality to add (in fact, JavaScript’s built-in String.prototype.padStart has since made it unnecessary) and little likelihood of new bugs. In general, the smaller and more “done” a project is, the better a candidate it is for vendoring.
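To show how small such a dependency can be, here is a sketch of a left-pad-style helper written in Python. It is an illustration rather than a port of the original npm package, and the `left_pad` name and signature are assumptions made for the example.

```python
def left_pad(text, length, fill=" "):
    """Pad text on the left with fill until it is at least length characters long."""
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    text = str(text)
    return fill * max(0, length - len(text)) + text


# Python's built-in str.rjust already does the same job.
assert left_pad("42", 5, "0") == "42".rjust(5, "0")
assert left_pad("hello", 3) == "hello"
```

A function this size can be read in full before copying it, and it is unlikely to need an upstream fix later.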

Another approach that can help is pinning dependencies to a particular version or range of versions. Pinning to a specific version doesn’t make much day-to-day difference from vendoring or forking, but it does make the process of updating simpler. In particular, it makes it easy to test the new version and quickly roll back if it doesn’t work. If you use a range of versions, as many language ecosystems support, then you can automatically upgrade to point releases that contain bug fixes, but not automatically go to a new major version that may include behavior changes. This middle ground works in a lot of cases.
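The difference between an exact pin and a version range can be sketched in a few lines of Python. The rule below (same major version, at or above a minimum) is a simplified stand-in for the range operators that package managers provide, not the exact semantics of any one of them, and the function names and version list are hypothetical.

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so comparisons are numeric."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(candidate, pinned):
    """An exact pin accepts one version only."""
    return parse(candidate) == parse(pinned)


def satisfies_compatible_range(candidate, minimum):
    """Accept any version at or above minimum that keeps the same major version."""
    cand, low = parse(candidate), parse(minimum)
    return cand[0] == low[0] and cand >= low


def best_match(available, minimum):
    """Pick the newest available version allowed by the compatible range."""
    matches = [v for v in available if satisfies_compatible_range(v, minimum)]
    return max(matches, key=parse) if matches else None


# Hypothetical list of published versions for one dependency.
AVAILABLE = ["1.4.0", "1.4.1", "1.5.0", "1.10.2", "2.0.0"]

print(best_match(AVAILABLE, "1.4.0"))  # 1.10.2
```

With an exact pin on `1.4.0`, nothing changes until someone edits the pin. With the range, a fresh build picks up `1.10.2` but never `2.0.0`. Comparing versions as integer tuples rather than strings matters here: as strings, "1.5.0" would sort above "1.10.2".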

If you have the resources, doing regular scratch builds with the latest versions of the dependencies can help you find issues early. When you keep pace with your dependencies, an urgent update to address a vulnerability becomes a small step instead of a giant leap.

Simplify, simplify

Simplifying the software supply chain is an important part of reducing security risk. Every line of code is another potential bug or attack vector. When you use only what you need, you avoid unnecessary risks. When you can’t entirely remove a dependency, forking it can be a way of simplifying by turning it into something under your control. The tradeoff is that you are now directly responsible for the security of that code.

The first step to making a simpler supply chain is understanding what your supply chain looks like in the first place. Using a tool like Graph for Understanding Artifact Composition (GUAC) to get a view of the supply chain across all of your applications will help you understand the current state. From there, you can examine each of your dependencies to determine the best approach for keeping them secure in the future.
