Conceptualizing Dependencies (2019): rpdillon.net

2019-11-20

Most programs rely on various libraries and tools. What makes a dependency a good thing to add? Are there costs to choosing dependencies poorly?

Choosing Dependencies

Modern languages have their own package managers, designed to make it easy to bring in dependencies. Very few of these tools give developers the context to evaluate the true cost of a dependency, though. The result is a precarious situation: the benefits of a dependency are easy to see, while its costs, particularly in terms of stability and quality, are much harder to evaluate. Dependencies therefore tend to be selected primarily for their benefits, with little awareness of their costs, which can fundamentally compromise the software's architectural integrity. We'll discuss this more later.

The Five Dimensions

A dependency can be evaluated in terms of its intrinsic qualities, like whether it's well tested and well documented or not. It also can be evaluated by how and when it couples to your software, like whether it's a compile-time or runtime dependency, how many parts of your program depend on it, and whether you're importing all of it or only a small part. I've broken down these considerations into five dimensions: Time, Space, Size, Stability, and Quality.

Time

One way to think about dependencies is when they happen. In an interpreted language like elisp, Python, Ruby, or Lua, you can write the program with nothing more than a text editor, but to run the program, you'll need the interpreter. These kinds of dependencies are therefore called "runtime dependencies". Some languages are compiled, like Haskell, Go, and C++, so producing a runnable program requires a compiler. Once compiled, however, the compiler is no longer needed, and the program can be run freely without it. These kinds of dependencies are aptly named "compile-time dependencies". In general, runtime dependencies place greater costs on your software than compile-time dependencies.
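
To make the runtime case concrete, here is a minimal Python sketch that checks, when a program starts, whether the modules it needs at runtime can actually be found. The function name missing_runtime_dependencies and the placeholder module name are hypothetical, chosen only for illustration.

```python
import importlib.util


def missing_runtime_dependencies(module_names):
    """Return the top-level module names that cannot be found at runtime."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "json" ships with Python; the second name is a made-up placeholder.
    print(missing_runtime_dependencies(["json", "hypothetical_missing_module"]))
```

A compiled program would have surfaced the equivalent problem when it was built, which is one reason a runtime dependency tends to cost more: the failure shows up wherever the program happens to be run.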

Space

Dependencies also occur in space: where will this dependency be used? Maybe it's used in a production environment, or maybe it's only used during debugging on a developer workstation. Some dependencies may only be needed in special environments, like continuous integration that compiles code and runs tests every time a new change is committed to source control. Keeping track of why a dependency was introduced improves maintainability of the software. Production dependencies tend to be more costly than developer or CI dependencies, but this isn't always true. If your tests rely on a particular mocking framework, for example, it can be very costly to migrate to a new one if that framework is no longer maintained.

Size

Dependencies have size. Some languages and libraries are large and complex, and perhaps bring along dependencies of their own. Large dependencies carry greater cost: they use more disk space, and possibly more memory. Powerful machines don't make the size of a dependency irrelevant, either: large dependencies also increase compilation and linking time (if applicable), and take longer to download and update than small ones. It's not uncommon for developers to write desktop applications using a web stack by bringing in Electron as a dependency, for example, and there's no end to the criticism that Electron apps tend to be bloated. The lesson here is that many end users notice when dependencies are managed poorly!

Stability

Stability (in this context) is a function of how often a dependency changes, and how often those changes alter how a program interacts with the dependency. Put another way, when I upgrade the dependency, do I have to do extra work to change my program to make sure it still works correctly? If a dependency has many dependencies of its own, its rate of change is in turn a function of the stability of those dependencies. Understanding the stability of a program's dependencies is critical to estimating how much it will cost to keep the program working well over time. Dependencies with low stability are more costly than those with high stability.

Quality

Finally, like all tools, dependencies have varying quality. When the unexpected happens, a low quality dependency might crash, fail silently, or raise a generic exception. High quality dependencies raise a specific exception and print useful messages at a predictable logging level. They are also designed for configuration and extensibility, so they can adapt to various use cases. Low quality dependencies are monolithic: you either get all of the library, or none. High quality libraries are modular, allowing programs to import specific pieces of functionality as needed. Further, they are adaptable to many environments, and make very few assumptions about where they are being run. Low quality dependencies make (often silent) assumptions about where they run. Finally, high quality dependencies document their assumptions, their functionality, and the intended use of that functionality. In short, high quality dependencies are just less expensive over the long term.
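
The difference between a generic and a specific exception is easy to show in a few lines. The following is a sketch of a hypothetical configuration library, not any real package: ConfigError, MissingSettingError, and read_setting are names invented for this example.

```python
class ConfigError(Exception):
    """Base class for errors raised by this hypothetical library."""


class MissingSettingError(ConfigError):
    """Raised when a required setting is absent."""

    def __init__(self, key):
        super().__init__(f"required setting {key!r} is missing")
        self.key = key


def read_setting(settings, key):
    """Return settings[key], failing loudly and specifically if it is absent."""
    try:
        return settings[key]
    except KeyError:
        # Translate the generic KeyError into an error callers can target.
        raise MissingSettingError(key) from None


if __name__ == "__main__":
    try:
        read_setting({"host": "localhost"}, "port")
    except MissingSettingError as err:
        print(f"configuration problem: {err}")
```

Because the library has its own exception hierarchy, a caller can catch exactly the failure it knows how to handle, and the message says which setting was the problem.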

Architectural Concerns & Pitfalls

At the end of the day, dependencies are just software, much like whatever software you're developing. This means all the same architectural concerns that apply to the software you're writing also apply to your dependencies.

Use Stable Core Modules

The more dependencies a software module has, the less stable it is, since changes in any of the dependencies can necessitate a change in the module as well. It's desirable for a module to be quite stable if many other modules depend upon it. A module that has many dependencies while also having many other modules depending on it is costly: changes ripple through the software and increase maintenance costs.

Many consider frameworks to be a good way to develop software, but even they suffer from this. I worked on a collection of Rails codebases for 7 years. Rails is a large, complex dependency that itself brings in many other dependencies. As expected, every time we upgraded one of our larger applications from one version of Rails to another, it was a project that we carved out time and engineers to work on, often taking anywhere from a few hours to a couple of weeks. This was worth it because Rails brings an awful lot to the table in terms of engineering efficiency, but there are many dependencies for which it simply wouldn't be worth it.
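
One rough way to spot modules at risk of this ripple effect is to count incoming and outgoing dependencies for each module. The sketch below uses Robert C. Martin's instability metric (outgoing dependencies divided by the sum of incoming and outgoing), which is a technique I'm bringing in for illustration rather than something this post defines. The module names, the graph, and the threshold of two are all hypothetical.

```python
def coupling_report(graph):
    """Map each module to (fan_in, fan_out, instability).

    graph maps a module name to the set of modules it depends on.
    """
    modules = set(graph)
    for deps in graph.values():
        modules |= set(deps)

    # fan_in: how many modules depend on this one.
    fan_in = {m: 0 for m in modules}
    for deps in graph.values():
        for dep in deps:
            fan_in[dep] += 1

    report = {}
    for m in sorted(modules):
        # fan_out: how many modules this one depends on.
        fan_out = len(graph.get(m, ()))
        total = fan_in[m] + fan_out
        instability = fan_out / total if total else 0.0
        report[m] = (fan_in[m], fan_out, instability)
    return report


if __name__ == "__main__":
    # Hypothetical application: three features share one core module.
    graph = {
        "billing": {"core"},
        "reports": {"core"},
        "search": {"core"},
        "core": {"http_client", "orm"},
    }
    for name, (fi, fo, inst) in coupling_report(graph).items():
        flag = "  <- ripple risk" if fi >= 2 and fo >= 2 else ""
        print(f"{name:12} in={fi} out={fo} instability={inst:.2f}{flag}")
```

In this made-up graph, core is the module to worry about: three modules depend on it, and it depends on two others, so a change in either of those can travel through core to everything above it.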

Import the Smallest Scope Possible

Even when working with stable dependencies, it is important to specify which version of the dependency you expect, since all maintained software changes over time. Failure to pin a version (or version range) leads to unexpected versions being pulled in (either at compile time or at runtime) that can break the expected contract and generate errors. Unstable dependencies will require very strict version ranges, because they break compatibility often. Stable modules are more flexible in this regard.
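
As an illustration of what a version range expresses, here is a small Python sketch that checks whether a version falls inside a half-open range. It assumes plain MAJOR.MINOR.PATCH version strings, and the helper names parse_version and in_range are made up for this example. Real package managers have richer rules (pre-release tags, for instance), so treat this as a sketch of the idea rather than a replacement for them.

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints, e.g. '5.2.3' -> (5, 2, 3)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, minimum, below):
    """True when minimum <= version < below, comparing component by component."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


if __name__ == "__main__":
    # Hypothetical policy: accept 5.2.0 up to, but not including, 6.0.0.
    for candidate in ["5.1.9", "5.2.8", "6.0.0"]:
        print(candidate, in_range(candidate, "5.2.0", "6.0.0"))
```

An unstable dependency would call for a much narrower window, such as a single minor release, while a stable one can safely be given a wide range like the one above.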

While a version minimizes the scope across time, there's also a question of how much code is to be imported. Large modules that have good design allow you to import only a part of them, minimizing the risk associated with large, complex dependencies. Being thoughtful about streamlining imports reduces overall dependency surface area, making the software more stable and adaptable. Reflecting on my time with Rails again, there were many cases where I would argue for only relying on ActiveSupport or ActiveRecord, rather than all of Rails. For apps that didn't have a significant front-end component, I'd push to use the stripped down profile provided by Rails API rather than the larger base installation so that we minimized our dependency surface area.

Record End-of-Life on a Calendar

Finally, note the end-of-life date for software you depend on, and put a reminder on your calendar to reevaluate the landscape a few weeks or months prior to the end-of-life. This could mean choosing an LTS release of Ubuntu, or making sure you're not writing any new Python 2 code. This technique can also be useful to track events like domain registration, SSL certificates, software licenses, and the like.
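
Working out the reminder date is simple date arithmetic. A minimal sketch, assuming an eight-week lead time (my choice of default, not a recommendation from any vendor) and a hypothetical helper named review_date:

```python
from datetime import date, timedelta


def review_date(end_of_life, lead_weeks=8):
    """Return the day to start reevaluating, lead_weeks before end_of_life."""
    return end_of_life - timedelta(weeks=lead_weeks)


if __name__ == "__main__":
    # Python 2 reached end-of-life on 2020-01-01; the 8-week lead is an assumption.
    python2_eol = date(2020, 1, 1)
    print(review_date(python2_eol))  # 2019-11-06
```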

Closing Thoughts

Manage dependencies carefully! I've always sort of adored Alan Perlis' Epigrams in Programming, and Epigram 55 is particularly salient:

A LISP programmer knows the value of everything, but the cost of nothing.

This has nothing to do with LISP, and everything to do with only looking at the 'return' when evaluating 'return-on-investment'. When it comes to architectural decisions, it's vital to the long-term health of the software to also consider the investment. Making your software dependent on other software is a risk, and weighing the cost and benefit wisely will improve the quality of your time working with that software, as well as improving the quality of your users' experience. Being thoughtful pays dividends!