March 30, 2024

xz backdoor Part 1: An Accidental Discovery of a Backdoor Likely Prevented Thousands of Infections

Mike Larkin, Founder & CTO, Deepfactor

(When you’re done reading this, make sure to check out Part 2 of the series here.)

Yesterday’s discovery of the xz backdoor was an accident. But what a fortunate accident it was. The actor (or actors, we don’t yet know) had been diligent in their efforts for a long time, and only very recently started putting all the pieces together into what was discovered yesterday. The backdoor is being called an “ssh backdoor”, which is a bit misleading. OpenSSH does not use xz itself, but some Linux distribution maintainers patch sshd to link against libsystemd (ostensibly for easier integration with systemd), and libsystemd in turn pulls in liblzma, the xz library. As a matter of fact, xz is linked into so many packages that it may never be possible to fully ascertain the scope of what the backdoor might have done.

“I am *not* a security researcher, nor a reverse engineer.”

Andres Freund posted to the oss-security mailing list that he discovered the backdoor while investigating some odd ssh performance issues and valgrind errors. It’s important to note that Andres is not a security researcher (the title of this section is what Andres wrote in the disclosure); in other words, this was not something anyone was actively looking for, but rather something that was stumbled upon.

This was an incredibly fortunate discovery; as mentioned previously, xz is used everywhere. While ssh was the probable target, we may never know for certain. The backdoor injected code into liblzma via obfuscated and obscure changes introduced into xz’s configure script. The stated intention of that change was to improve testing, by using some pre-built .xz files as test binaries. In reality, those test binaries contained the code that was injected into sshd.

Now, let’s be honest. How many developers inspect every configure script for all the dependencies they build themselves? Libtool, autoconf, automake, and friends were created specifically so that you don’t need to inspect these configuration scripts by hand (and in this case, nobody did). Perhaps worse, if you are relying on dependencies sourced from third parties, do you really trust that they are inspecting them diligently?

These changes were slowly introduced by “Jia Tan” (likely not a real name, and possibly a group of individuals or a nation state), with other fake accounts showing up over the last few years to encourage the changes. This was certainly the act of an individual or group playing the long game.

It has been reported that in late February, “Jia Tan” approached other Linux distribution maintainers, exerting pressure to include the backdoored versions of xz in their distributions, under the guise of “great new features”. Why xz, a library that has been essentially feature complete for years, would need “great new features” is beyond me, but here we are. Regardless, the backdoored library started making its way into rolling release distributions and preview versions of others.

And then, we all got really lucky with the discovery of the backdoor. Had the backdoor not introduced valgrind errors or ssh performance problems, six months from now we might have been facing thousands of machines being compromised as they gradually upgraded to new distribution versions. As it turns out, a few distributions had included the affected library, but for the most part, the ecosystem was spared from a larger disaster.

Linking All the Things Into sshd

Base OpenSSH, as delivered from the OpenSSH project, doesn’t require any third-party libraries for default functionality. Probably due to some unknown business motivations, sshd in some distributions has been linked against a universe of libraries under the guise of “increasing functionality”. Every time a dependency is linked into an application like this, the application inherits all the bugs and issues of that dependency. The presumed reason for linking xz, in this case, was to have sshd become more easily controllable by systemd. This decision is what exposed these distributions to the backdoor. As systemd slowly consumes the Linux universe, we’ll see more and more of this.

My Own Experiment

As an experiment in preparation for writing this blog, I spun up a few VMs and examined how many external libraries were linked into sshd for each. The results surprised me, but I suppose I shouldn’t have been as surprised as I was.

| Distribution | Number of library dependencies for sshd |
| --- | --- |
| OpenBSD 7.5 (baseline) | 4 |
| Alpine Linux 3.19 | 4 |
| Gentoo | 5 (*) |
| Oracle Linux 9.1 | 26 |
| Rocky Linux 9.2 | 26 |
| Ubuntu 22.04 | 26 |
| CentOS 8 | 30 |
| CentOS 7 | 47 |

(*) For Gentoo, I emerged the openssh ebuild using default USE settings (i.e., whatever the handbook said to use). Depending on USE settings, this count might be larger in some instances. My Gentoo install was using OpenRC, not systemd.

The values in the preceding table were gathered by running ldd against the openssh sshd binary and subtracting entries that are not real library dependencies, such as vdso entries and the binary itself.

Note: In general, you should never run ldd against untrusted binaries. My tests were performed in disposable, isolated VMs for this very reason.
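The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the script used for the table: the function names, the `skip` list, and the sample text are assumptions, and the sample ldd output is made up in the Linux (glibc) format rather than captured from any of the VMs. The sketch only parses text that ldd has already produced; it does not run ldd itself. Exactly which entries were excluded beyond the vdso and the binary is not specified, so counts from this sketch (which keeps the dynamic loader line) may differ slightly from the table.

```python
# Made-up ldd output in Linux (glibc) format, for illustration only.
SAMPLE_LDD_OUTPUT = """\
    linux-vdso.so.1 (0x00007ffc0a1f2000)
    libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f1c2a000000)
    liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f1c29fd0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c29c00000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c2a4b0000)
"""


def list_dependencies(ldd_output, binary_path=None, skip=("linux-vdso", "linux-gate")):
    """Return the first field of each ldd line, minus vdso entries and the binary itself."""
    deps = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        name = fields[0]
        # Drop kernel-provided vdso entries and, if listed, the binary being inspected.
        if name == binary_path or any(marker in name for marker in skip):
            continue
        deps.append(name)
    return deps


def links_against(ldd_output, library):
    """True if a dependency's file name starts with e.g. 'liblzma.'."""
    return any(
        name.rsplit("/", 1)[-1].startswith(library + ".")
        for name in list_dependencies(ldd_output)
    )


if __name__ == "__main__":
    deps = list_dependencies(SAMPLE_LDD_OUTPUT)
    print(len(deps), "dependencies")
    print("links liblzma:", links_against(SAMPLE_LDD_OUTPUT, "liblzma"))
```

On a disposable VM, the text passed to `list_dependencies` would be the saved output of ldd run against that system's sshd binary.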

As you can see, some distributions ship quite thin sshd binaries, only linking in things like libc and pthreads, while others simply link the entire universe into the binary. Granted, some of the distributions listed (e.g., CentOS 7) are very old and approaching or past EOL, but you can see that there is a huge jump between distributions that care about thinness and a smaller security footprint and those that do not. Over time the count appears to be shrinking, but even the thinnest sshd in these “bloated” distributions links in roughly 6.5 times as many dependencies as the thinnest on the list (26 versus 4).
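To make the size of that jump concrete, the counts from the table can be compared against the smallest entry. This is a small arithmetic sketch; the dictionary simply restates the table, and the function name is made up for illustration.

```python
# Dependency counts for sshd, copied from the table above.
SSHD_DEPENDENCY_COUNTS = {
    "OpenBSD 7.5": 4,
    "Alpine Linux 3.19": 4,
    "Gentoo": 5,
    "Oracle Linux 9.1": 26,
    "Rocky Linux 9.2": 26,
    "Ubuntu 22.04": 26,
    "CentOS 8": 30,
    "CentOS 7": 47,
}


def ratios_to_baseline(counts):
    """Divide every count by the smallest count in the table."""
    baseline = min(counts.values())
    return {name: count / baseline for name, count in counts.items()}


if __name__ == "__main__":
    for name, ratio in ratios_to_baseline(SSHD_DEPENDENCY_COUNTS).items():
        print(f"{name:<18} {ratio:.2f}x")
```

With a baseline of 4, the 26-dependency distributions come out at 6.5x and CentOS 7 at 11.75x.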

“You can’t have bugs in code you don’t have”

One of the issues with linking in a huge list of dependencies is that you get saddled with the vulnerabilities associated with each, even if you didn’t want or need the functionality from that dependency. This makes tracking and patching vulnerabilities in your application more difficult than it has to be. For example, some of the distributions listed above linked libraries into sshd to support Kerberos and smart card logins. How many installations require that? 5%? 10%? Nonetheless, in order to cover all possible business use cases, these distributions have decided that everyone is going to get that support, just in case. This now exposes the entire community to any vulnerabilities in those libraries.

So what can you do? Few people are going to build their own sshd independent from the distribution maintainer since going down that road leads to a bunch of other headaches. So we’re basically stuck with possibly vulnerable packages, through no fault of our own.


One of the things we do at Deepfactor is help customers understand their vulnerability posture and remediation priority based on runtime usage analysis. We can tell you specifically which vulnerable dependency has been loaded into memory and executed, so when things like this happen, you can quickly bubble the most important remediation to the top of the list. The converse of this is we can also tell you what is not used, so you can remove such landmines from your environment. Although legacy SCA tools (we call these SCA 1.0) can list what dependencies you have in your environment, they fall short of being able to tell you what is actually used, which is important in this case.

A Final Thought

It is my belief that this type of supply chain attack will continue as long as we have universally used repositories whose maintainers have either lost interest in maintenance or have abandoned the code entirely. There are many other open source libraries besides xz that fall into this category, and even if a universally used library is maintained today, there is no guarantee that this will continue. If a maintainer loses interest in maintaining their code, and a random person comes along and expresses interest in taking over maintenance, why would the original maintainer not say “sure, have at it”? Until this problem is solved (and I’m not sure it ever will be), tools like Deepfactor will prove valuable to protect you by telling you how exposed you are and the order in which you should remediate issues.

Additional Resources

I found these links helpful in writing this piece:

If you’d like to learn more about how Deepfactor can help in situations like this, contact us to connect with our team.

(And make sure to check out Part 2 of the series here.)

About the Author

Mike Larkin, Founder & CTO, Deepfactor

Author of OpenBSD Hypervisor VMM. Guest Faculty at San Jose State University for 18 years. Serial Entrepreneur Founder/CTO at RingCube (acquired by Citrix). Holds numerous patents. Avid peak bagger, climbed over 1000 summits.
