At the end of 2021, a critical vulnerability was discovered in Apache Log4j, a widely used open-source logging library. Log4j and its vulnerability became infamous because the library was used by millions of software packages, often in organizations that had no idea it existed within their software supply chain. Even organizations that develop their own software often rely on third-party commercial and open-source software to support their commercial services.

Software supply chain risk has become a major concern for private-sector companies and government agencies of all sizes. There is even a legislative effort in the Senate Homeland Security and Governmental Affairs Committee to help secure open-source software. Unpacking this supply chain and finding ways to estimate and reduce its risk is a hard problem for several reasons.

First, the number of open-source packages and libraries is enormous. GitHub, an online platform that hosts software projects for others, holds over 200 million software repositories. And each programming language uses its own system to track the software in its ecosystem. JavaScript and Python, two very popular programming languages, support over a million packages combined.

Second, very little is known about the extent to which organizations use these packages. There is no authoritative directory describing which companies use which software components. In fact, companies themselves may not even know the full extent of the software they rely on for critical business operations. A research collaboration between Harvard University and the Open Source Security Foundation has begun surveying companies to estimate how prevalent different software is, but so far it offers only a partial picture of the software actually used by companies in the United States.

Third, the tools for analyzing this risk have yet to be built. Software Bills of Materials (SBOMs) serve as an ingredient list for software applications. SBOMs are increasingly popular and have even been mandated by a presidential executive order. The intent is for an SBOM to list all of the software components needed to run a given package, helping users identify and manage their software risks. However, the practice of creating and disclosing them is still evolving. For example, it is not clear how many layers deep an SBOM should expose a software supply chain. Some packages (like Log4j) may have thousands of dependencies, and it is not clear whether that much detail is useful or even necessary.
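To make the question of depth concrete, the short Python sketch below walks a dependency graph breadth-first and groups each component by the shallowest layer at which it appears. This lets an SBOM truncated at depth 1 (direct dependencies only) be compared with the full transitive list. The graph, the package names, and the helpers `dependencies_by_depth` and `sbom_components` are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
# Names are invented for illustration and are not real dependency data.
DEPS = {
    "my-app": ["web-framework", "logging-lib"],
    "web-framework": ["http-parser", "logging-lib"],
    "logging-lib": ["string-utils"],
    "http-parser": ["string-utils"],
    "string-utils": [],
}

def dependencies_by_depth(graph, root):
    """Breadth-first walk: map each depth to the packages first reached at that depth."""
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in depth_of:  # BFS guarantees this is the shallowest depth
                depth_of[dep] = depth_of[pkg] + 1
                queue.append(dep)
    layers = {}
    for pkg, depth in depth_of.items():
        if depth > 0:  # skip the root application itself
            layers.setdefault(depth, []).append(pkg)
    return {depth: sorted(pkgs) for depth, pkgs in sorted(layers.items())}

def sbom_components(graph, root, max_depth=None):
    """Components an SBOM would list if cut off at max_depth (None = full transitive closure)."""
    layers = dependencies_by_depth(graph, root)
    return sorted(
        pkg
        for depth, pkgs in layers.items()
        if max_depth is None or depth <= max_depth
        for pkg in pkgs
    )

if __name__ == "__main__":
    print(dependencies_by_depth(DEPS, "my-app"))
    print(sbom_components(DEPS, "my-app", max_depth=1))
    print(sbom_components(DEPS, "my-app"))
```

In this toy graph, stopping at depth 1 omits `http-parser` and `string-utils`, even though two separate paths lead to `string-utils`. Real dependency trees are far larger, which is why the right cutoff is still debated.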

But there may be hope for a better understanding of this risk.

First, the data to document and map this vast network exist. They are incomplete and difficult to find, but they exist. Libraries.io and deps.dev are two community efforts that offer dependency data for several programming languages, from which network maps can be built and analyzed. Similarly, the package managers of some programming languages provide information that could also be used to map their software ecosystems. Together, these data could fill a huge gap in our understanding of software dependencies. And using standard network analysis techniques, we could begin to identify the software components most critical to each ecosystem.
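As one illustration of a standard network analysis technique, the Python sketch below reverses a dependency graph and counts, for each package, how many other packages depend on it directly or transitively. Packages with the largest reach are candidates for being the most critical. The ecosystem records and package names are invented stand-ins for the kind of data Libraries.io or deps.dev provide, and the function names are hypothetical. A fuller analysis might use other centrality measures, but transitive reach is easy to reason about.

```python
from collections import defaultdict, deque

# Hypothetical ecosystem: package -> packages it depends on directly.
# These records are invented; real data might come from sources such as
# Libraries.io or deps.dev.
ECOSYSTEM = {
    "app-a": ["json-lib", "log-lib"],
    "app-b": ["log-lib"],
    "app-c": ["http-lib"],
    "http-lib": ["log-lib"],
    "json-lib": [],
    "log-lib": [],
}

def reverse_edges(graph):
    """Map each package to the set of packages that depend on it directly."""
    rev = defaultdict(set)
    for pkg, deps in graph.items():
        for dep in deps:
            rev[dep].add(pkg)
    return rev

def transitive_dependents(rev, target):
    """Return every package that depends on target, directly or indirectly."""
    seen, queue = set(), deque([target])
    while queue:
        for parent in rev.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def rank_by_reach(graph):
    """Rank packages by how many others would be exposed if each were compromised."""
    rev = reverse_edges(graph)
    reach = {pkg: len(transitive_dependents(rev, pkg)) for pkg in graph}
    # Highest reach first; ties broken alphabetically for stable output.
    return sorted(reach.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    for pkg, count in rank_by_reach(ECOSYSTEM):
        print(f"{pkg}: {count} dependent packages")
```

Here `log-lib` ranks first with four dependents, one of them reached only through `http-lib`. This mirrors how a single logging library can sit beneath much of an ecosystem without appearing in anyone's direct dependency list.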

Second, as the practice of creating and using SBOMs matures, users may become better at ingesting this information, comparing SBOMs across applications, and identifying riskier components. For example, one approach to using SBOMs to visualize risk might be to go through all the software packages listed in a given SBOM and collect the known vulnerabilities of each, information readily available from the National Vulnerability Database maintained by the National Institute of Standards and Technology. Each vulnerability could then be plotted on a graph against its impact, measured with the standard Common Vulnerability Scoring System (CVSS), and its exploitability, measured with the standard Exploit Prediction Scoring System (EPSS), making the risk easier to see.

From there, organizations could visually inspect, compare, and develop strategies to mitigate the risk of one or more software applications.
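The SBOM-based approach described above could be sketched in Python roughly as follows. The sketch parses the components listed in an SBOM, looks up known vulnerabilities for each, and places every vulnerability on an impact (CVSS) versus exploitability (EPSS) grid. The SBOM follows the general shape of CycloneDX JSON, but the components, vulnerability IDs, and scores are invented. The `KNOWN_VULNS` table is a hypothetical stand-in for real queries to NIST's National Vulnerability Database and FIRST's EPSS data, and the EPSS cutoff is an assumption.

```python
import json

# A minimal SBOM fragment shaped like CycloneDX JSON ("components" with
# "name" and "version"). The packages listed here are hypothetical.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log-lib", "version": "2.14.0"},
    {"type": "library", "name": "json-lib", "version": "1.3.2"},
    {"type": "library", "name": "http-lib", "version": "0.9.1"}
  ]
}
"""

# Hypothetical lookup table: (name, version) -> [(vuln_id, CVSS base 0-10, EPSS 0-1)].
# IDs and scores are invented; a real tool would query NVD and EPSS instead.
KNOWN_VULNS = {
    ("log-lib", "2.14.0"): [("EXAMPLE-VULN-1", 10.0, 0.95)],
    ("json-lib", "1.3.2"): [("EXAMPLE-VULN-2", 7.5, 0.02), ("EXAMPLE-VULN-3", 4.3, 0.30)],
}

CVSS_HIGH = 7.0   # CVSS v3.x rates 7.0 and above as High or Critical
EPSS_HIGH = 0.10  # assumed cutoff; each organization would choose its own

def vulnerabilities_for(sbom_text):
    """Yield (component, vuln_id, cvss, epss) for each listed component with a known vulnerability."""
    sbom = json.loads(sbom_text)
    for comp in sbom.get("components", []):
        key = (comp["name"], comp["version"])
        for vuln_id, cvss, epss in KNOWN_VULNS.get(key, []):
            yield comp["name"], vuln_id, cvss, epss

def quadrant(cvss, epss):
    """Place a vulnerability on the impact (CVSS) vs. exploitability (EPSS) grid."""
    impact = "high-impact" if cvss >= CVSS_HIGH else "low-impact"
    likelihood = "likely-exploited" if epss >= EPSS_HIGH else "unlikely-exploited"
    return f"{impact}/{likelihood}"

def risk_table(sbom_text):
    """Rows ordered by exploitation likelihood, then impact (one possible ordering)."""
    rows = [(name, vid, cvss, epss, quadrant(cvss, epss))
            for name, vid, cvss, epss in vulnerabilities_for(sbom_text)]
    return sorted(rows, key=lambda r: (-r[3], -r[2]))

if __name__ == "__main__":
    for name, vid, cvss, epss, quad in risk_table(SBOM_JSON):
        print(f"{name:10} {vid:16} CVSS={cvss:4.1f} EPSS={epss:.2f} {quad}")
```

A plotting library could draw the same rows as a scatter plot with CVSS on one axis and EPSS on the other. The quadrant labels simply make the high-impact, likely-exploited corner easy to pull out first. Components with no known vulnerabilities, like `http-lib` here, drop out of the table.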

The software supply chain has become a major source of risk because modern software development is massively fragmented and decentralized. Unlike many other cybersecurity issues, however, this one is bounded, and the data exist. The information required to map software dependencies is knowable because the number of packages and the dependencies between them are finite. And so, while we still have a lot to learn as a community about this risk, there are concrete steps we can take to better understand and mitigate it.

Sasha Romanosky is a senior policy researcher at the nonprofit, nonpartisan RAND Corporation, an appointed member of the Department of Homeland Security’s Data Privacy and Integrity Advisory Committee, and a former cyber policy adviser to the Pentagon in the Office of the Secretary of Defense for Policy.
