CVE-2023-38408, discovered by the Qualys Threat Research Unit (TRU), describes an RCE (remote code execution) vulnerability made possible by an unwanted interaction between OpenSSH’s ssh-agent executable, the dlopen() and dlclose() functions used by a process to load shared libraries, and various other deficiencies in libraries present (or installable) in many Linux distributions.
What is the ssh-agent credential forwarding CVE?
That is a long list of contributing components, and indeed the Qualys team assembled the RCE from a number of separate problems in different software packages. Fundamentally, the RCE lets an attacker execute code on a user’s machine when that user is running ssh-agent with forwarding enabled and has logged into a machine the attacker controls. Agent forwarding lets the machine you log into use the credentials held by your local ssh-agent, so you can reuse them if you ssh again from that first machine to another. This capability makes “ssh hopping” easier, but the OpenSSH team has cautioned against it for a number of years (there are other, safer ways to accomplish something similar, such as ProxyJump). Nonetheless, ssh-agent retains this forwarding ability.
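Since the exposure starts with forwarding being switched on, a first step is simply finding out where that happens. The following is a minimal sketch, not an official OpenSSH tool: it scans the text of an ssh_config-style file for ForwardAgent settings that are not "no". The function name is hypothetical, and the sketch ignores Include and Match directives as well as forwarding requested on the command line with ssh -A.

```python
from typing import List, Tuple


def find_agent_forwarding(config_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, value) for each ForwardAgent setting other than 'no'."""
    findings = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # ssh_config accepts "Keyword value" and "Keyword=value";
        # keywords are case-insensitive.
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) != 2:
            continue
        keyword, value = parts[0].lower(), parts[1].strip().strip('"')
        if keyword == "forwardagent" and value.lower() != "no":
            findings.append((lineno, value))
    return findings


SAMPLE_CONFIG = """\
Host bastion
    ForwardAgent yes
Host *
    ForwardAgent no
"""

for lineno, value in find_agent_forwarding(SAMPLE_CONFIG):
    print(f"line {lineno}: ForwardAgent {value}")
```

Anything other than "no" is reported because ForwardAgent also accepts a socket path or an environment variable name, and both enable forwarding.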
The credential forwarding mechanism can be combined with ssh-agent’s support for PKCS#11 (a standard used to interface with cryptographic tokens or smart cards). In this mode, ssh-agent can be told to load a specific PKCS#11 provider library to interface with a specific brand of token or card reader. There are potentially many such libraries available; indeed, each token vendor might have their own library that they require to be installed on client machines. The fact that each vendor can provide their own implementation library means that ssh-agent must provide a way to specify which provider library to use. In the past, ssh-agent would allow loading these libraries from anywhere on the filesystem, but this was pointed out to the OpenSSH team in 2016 and fixed; subsequently, PKCS#11 libraries must reside in “system” library paths such as /usr/lib.
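To make the limits of that 2016 fix concrete, here is a rough sketch of a path-based allowlist. It is not ssh-agent’s code: the function name is hypothetical, the pattern list is an illustrative assumption, and Python’s fnmatch stands in for whatever matcher the real implementation uses.

```python
import fnmatch
import os

# Assumption: an illustrative allowlist of "system" library locations.
ALLOWED_PROVIDER_PATTERNS = ["/usr/lib*/*", "/usr/local/lib*/*"]


def provider_path_allowed(path, patterns=ALLOWED_PROVIDER_PATTERNS):
    """Return True if the resolved path matches one of the allowlist patterns."""
    # Resolve symlinks and ".." first so a path cannot pretend to live in /usr/lib.
    real = os.path.realpath(path)
    return any(fnmatch.fnmatchcase(real, pattern) for pattern in patterns)


for candidate in ("/usr/lib/libexample.so",
                  "/tmp/evil.so",
                  "/usr/lib/../../tmp/evil.so"):
    print(candidate, "->", provider_path_allowed(candidate))
```

A check like this only answers “where does the file live?” It says nothing about whether the file is actually a PKCS#11 provider, and that is the gap the rest of this post is about.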
How does this vulnerability work?
The researchers at Qualys noticed that although the set of candidate PKCS#11 libraries was restricted to system paths, ssh-agent would still attempt to load any library named in a request, as long as it resided in a system library path. This means any library, even one that isn’t a PKCS#11 library, could be specified. Such a load would typically just fail: a library from, say, Chrome or Firefox obviously isn’t what ssh-agent is looking for, and it lacks the initialization function expected to be present in PKCS#11 libraries. ssh-agent would open the specified library, discover “this is not the library I’m looking for,” and promptly unload it.
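The “load it, look for the entry point, give up” sequence can be approximated with Python’s ctypes module. This is a sketch under assumptions rather than ssh-agent’s code: the function name is hypothetical, and the only PKCS#11 detail it relies on is that providers export the standard C_GetFunctionList entry point. Be aware that loading a library runs its initialization code, so only try this on libraries you trust.

```python
import ctypes
import ctypes.util


def looks_like_pkcs11(library_path):
    """Load a shared library and report whether it exports C_GetFunctionList."""
    try:
        # The library's initialization code has already run by the time
        # this call returns, whether or not the check below succeeds.
        lib = ctypes.CDLL(library_path)
    except OSError:
        return False  # could not be loaded at all
    # ctypes resolves attribute names as exported symbols.
    return hasattr(lib, "C_GetFunctionList")


# The C library is a system library, but it is not a PKCS#11 provider.
libc_path = ctypes.util.find_library("c") or "libc.so.6"
print(libc_path, "looks like PKCS#11:", looks_like_pkcs11(libc_path))
```

One difference from the behavior described above: ctypes keeps the library loaded, whereas ssh-agent unloads a library that fails the check.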
However, the act of loading and unloading a library has side effects. Shared libraries can have initialization and deinitialization functions that are called automatically right after the library is loaded and immediately before it is unloaded. So if /usr/lib contained a set of libraries whose load/unload behaviors could be chained together, ssh-agent could be coaxed into loading and unloading them in sequence to build an exploit.
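Why does an unloaded library still matter? The toy model below is pure Python and touches no real libraries; every class, function, and library name in it is hypothetical. It only illustrates the bookkeeping problem: a load hook changes process state, the library is rejected and unmapped, and the change stays behind.

```python
class ToyProcess:
    """Stand-in for the parts of process state a library can touch."""

    def __init__(self):
        self.signal_handlers = {}  # signal name -> library owning the handler
        self.mapped = set()        # names of libraries currently mapped


class ToyLibrary:
    def __init__(self, name, on_load=None, on_unload=None, is_pkcs11=False):
        self.name = name
        self.on_load = on_load
        self.on_unload = on_unload
        self.is_pkcs11 = is_pkcs11


def load_check_unload(process, lib):
    """Load a library, keep it only if it is a PKCS#11 provider, else unload it."""
    process.mapped.add(lib.name)
    if lib.on_load:
        lib.on_load(process)       # runs before any suitability check
    if lib.is_pkcs11:
        return True
    if lib.on_unload:
        lib.on_unload(process)
    process.mapped.discard(lib.name)
    return False


def dangling_handlers(process):
    """Handlers whose owning library is no longer mapped."""
    return {sig: owner for sig, owner in process.signal_handlers.items()
            if owner not in process.mapped}


def install_handler(process):
    process.signal_handlers["SIGSEGV"] = "libsloppy"


# A library that registers a handler on load and never removes it.
sloppy = ToyLibrary("libsloppy", on_load=install_handler)
proc = ToyProcess()
accepted = load_check_unload(proc, sloppy)
print("accepted:", accepted, "dangling:", dangling_handlers(proc))
```

In a real process, a dangling entry like this would point at memory that may later be occupied by a different library.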
How was this vulnerability discovered?
The Qualys research report is top-notch; it’s an amazing piece of work. Consider: the Qualys team installed and checked every library available in every package present in the public Ubuntu repository, cataloging the behavior seen when each library was loaded and unloaded. This set of libraries numbered in the thousands. Certain libraries were seen to do bizarre things in their initialization code, like creating network sockets, registering (and not unregistering on unload) signal handlers for various conditions, and so on. With this information, they constructed a chain of library loads with the behavior sequence they wanted, and that ultimately led to poisoning ssh-agent’s stack. The team then placed shellcode on the stack that launched a reverse shell from ssh-agent, and this allowed them remote network access into the machine as the user running that ssh-agent. Of course, from there, you own that user. (N.B.: yes, some Linux libraries still require executable stacks in 2023.)
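If you are curious whether a given library asks for an executable stack, the request is recorded in the file’s PT_GNU_STACK program header. The sketch below is a simplified reader, assuming 64-bit little-endian ELF only; the function name is hypothetical, and tools such as readelf are the more complete choice for real use.

```python
import struct
from typing import Optional

PT_GNU_STACK = 0x6474E551  # program header type that carries stack permissions
PF_X = 0x1                 # "executable" permission bit in p_flags


def stack_marked_executable(elf: bytes) -> Optional[bool]:
    """Return True/False from the PT_GNU_STACK flags, or None if unknown.

    None means the data is not a 64-bit little-endian ELF file or has no
    PT_GNU_STACK header; how the loader treats that case depends on the platform.
    """
    if len(elf) < 64 or elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        return None
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)            # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)  # entry size and count
    for index in range(e_phnum):
        offset = e_phoff + index * e_phentsize
        if offset + 8 > len(elf):
            break  # truncated file
        p_type, p_flags = struct.unpack_from("<II", elf, offset)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None
```

To check a file, pass in its contents, for example stack_marked_executable(open(path, "rb").read()), where path is any shared library on your system.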
What’s the cause; how can you avoid being affected in the future?
Who’s to blame for this RCE? Is it ssh-agent? Is it Linux’s implementation of dlopen/dlclose? Is it the libraries from things like LibreOffice and GStreamer, whose authors probably never thought their libraries would be loaded into something like ssh-agent? Is it libraries that require executable stacks, a bad practice that we’ve known about for 30 years? I don’t think you can point the finger at one entity here. Rather, it’s an exploit that’s built up from multiple layers, with many dependencies involved. Remove or change one of the pieces, and the exploit falls apart.
So how does this relate to Deepfactor? After all, this is an exploit that requires someone to have invoked ssh-agent in forwarding mode from their workstation or local machine, then logged into some machine capable of being manipulated by an attacker. This doesn’t seem to be an exploit that Deepfactor, which is designed to detect and prevent mistakes made by application developers, would be interested in.
To answer this question, I’d like to point out that even though this is an interesting bug, it’s not likely to affect cloud native applications, based on the scenario outlined above. But this bug does point out something very important: the need to know what’s within arm’s reach of an attacker (perhaps one who has compromised the application in some other way and is attempting to exploit this or a similar vulnerability). The fact that this CVE uses ssh-agent, or that it uses dlopen, or that it depends on certain libraries being present is not important. What is important is understanding that if any of those things weren’t available, this CVE would not be exploitable.
How Deepfactor can help
Now it makes more sense to discuss how Deepfactor can help in situations like these. I like to say “If you aren’t using something in your application, or in the environment where your application is being run, get rid of it. Reduce the potential blast radius if something goes wrong.” There are certain things you typically don’t need available within arm’s reach of your application once it’s deployed; ssh tools are one of these things. In fact, over three years ago, we added an alert to Deepfactor’s runtime monitor that informs you if your application executes (or tries to execute) things like ssh-agent, ssh-add, etc. So again, while it’s exceedingly unlikely that an application would itself be vulnerable to this particular CVE, having ssh tools lying around in container images or VMs where applications are running is just asking for trouble. And if, for some reason, your application is compromised through a different vulnerability and you happen to have an active ssh-agent session forwarding credentials to the same machine, Deepfactor would be able to detect any malicious ssh-add requests made in that environment. I hope you aren’t doing this, though.
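As a rough illustration of the “get rid of it” advice, the sketch below walks an unpacked image or root filesystem and lists ssh-related binaries by file name. To be clear, this is not how Deepfactor’s runtime alert works (that alert watches what the application executes); it is a hypothetical static check, and both the function name and the list of tool names are assumptions you would adjust for your environment.

```python
import os
from typing import List

# Assumption: file names treated as "ssh tools" for this illustration.
SSH_TOOL_NAMES = frozenset(
    {"ssh", "ssh-agent", "ssh-add", "ssh-keygen", "scp", "sftp", "sshd"}
)


def find_ssh_tools(root: str, names=SSH_TOOL_NAMES) -> List[str]:
    """Return sorted paths under root whose file name matches a known ssh tool."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in names:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

Pointing find_ssh_tools at the directory where an image has been unpacked gives a quick list of candidates to remove. A name match is only a hint, so confirm that nothing in the application depends on a tool before pruning it.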
Why did Deepfactor even add this alert in the first place? If best practices dictate “remove things you don’t need,” why would this even be a problem to begin with? Maybe your application environment was created long ago, and you’ve inherited a “fat” container or VM image with lots of unnecessary cruft. Maybe nobody at your organization knows what can be removed and what cannot without breaking the application. Maybe you lack information about what your application is actually using in order to even make that decision. Or maybe your operations team simply didn’t have a security background and just chose a fat container or VM image because it was convenient. At Deepfactor, we’ve seen a spectrum of customer deployments; some environments have been pruned meticulously, and others are dumpster fires carried forward from years ago.
Speaking of container images, slim container images that have removed landmines (LOLbins) are a great idea; there are several you can choose from. Whether or not your image is slim, Deepfactor’s software composition analysis (SCA) and SBOM capabilities can help you identify the bill of materials in your environment and, more importantly, tell you what you’re using and what you aren’t, so you can make intelligent decisions about what you can prune. Even if your organization insists on a slim environment, you need to verify that the SBOM you think you have is actually what you have, since developers can import things from unexpected places, and environments slowly rot over time if not actively maintained.
These whitepapers, SCA 2.0: A Framework to Prioritize Risk, Reduce False Positives, and Eliminate SCA Alert Fatigue and SBOM Security: Top 5 Reasons to Build SBOMs Into Your Pipeline, explain how you can gather the information you need to make these decisions and be prepared to fend off the next vulnerability.