Skip to content
Naked Security Naked Security

Apache’s other product: Critical bugs in ‘httpd’ web server, patch now!

The Apache web server just got an update - this one is nothing to do with Log4j!

Pick a random person, right now in late December 2021, and ask them these two questions:

Q1. Have you heard of Apache?
Q2. If so, can you name an Apache product?

We’re willing to wager that you will get one of two replies:

A1. No. A2. (Not applicable.)
A1. Yes. A2. Log4j.

Two weeks ago, however, we’d suggest that very few people had heard of Log4j, and even amongst those cognoscenti, few would have been particularly interested in it.

Until a cluster of potentially catastrophic bugs were revealed under the bug-brand Log4Shell, the Log4j programming library was merely one of many components that get sucked into and used by thousands, perhaps even hundreds of thousands, of Java applications and utilities.

Log4j was just “part of the supply chain”, and it had been bundled into more back-end servers and cloud-based services than anyone actually realised until now.

Many sysdamins, IT staff and cybersecurity teams have spent the past two weeks eradicating this programmatic plague from their demesnes. (Yes, that’s a real word. It’s pronounced domains, but the archaic spelling avoids implying a Windows network.)

Don’t forget “the other Apache”

Rewind to the oh-so-recent pre-Log4j era and we suggest that you’d get a different pair of answers, namely:

A1. Yes. A2. Apache’s a web server, isn’t it? (Actually, it’s a software foundation that makes a web server, amongst much else.)
A1. Yes. A2. Apache makes httpd, probably still the world’s most prevalent web server.

With more than 3000 files totalling close to a million lines of source code, Apache httpd is a large and capable server, with myriad combinations of modules and options making it both powerful and dangerous at the same time.

Fortunately, the open source httpd product receives constant attention from its developers, getting regular updates that bring new features along with critical security patches.

So, in all the excitement about Apache Log4j, don’t forget that:

  • You almost certainly have Apache httpd in your network somewhere. Just like Log4j, httpd has a habit of getting itself quietly included into software projects, for example as part of an internal service that works so well that it rarely draws attention to itself, or as a component built unobtrusively into a product or service you sell that isn’t predominantly thought of as “containing a web server”.
  • Apache just published an httpd update that fixes two CVE-numbered security bugs. These bugs might not be exposed in your configuration, because they are part of optional run-time modules that you might not actually be using. But if you are using these modules, whether you realise it or not, you could be at risk of server crashes, data leakage, or even remote code execution.

What got fixed?

The two CVE-numbered flaws are listed in Apache’s own changelog as follows:

  • CVE-2021-44790: Possible buffer overflow when parsing multipart content in mod_lua of Apache HTTP Server 2.4.51
  • CVE-2021-44224: Possible NULL dereference or SSRF in forward proxy configurations in Apache HTTP Server 2.4.51 and earlier.

The good news about the first bug is that Apache itself warns that the mod_lua server extension (which allows you to adapt the behaviour of httpd using Lua scripts instead of having to write modules in C):

…holds a great deal of power over httpd, which is both a strength and a potential security risk. It is not recommended that you use this module on a server that is shared with users you do not trust, as it can be abused to change the internal workings of httpd.

However, as Log4j has taught us, potentially exploitable bugs even on non-public servers can be troublesome if those bugs can be triggered by untrusted user data passed along by other internet-facing servers at your network edge.

And CVE-2021-44790 doesn’t involve sneaking any untrusted add-on Lua scripts into the configuration.

Instead, it involves simply tricking the “preprocessor” that prepares untrusted user content to be passed to trusted Lua scripts, so the attack does not depend on bugs or flaws in any of the add-on scripts you may have written yourself.

Multipart message splitting

Simply put, the CVE-2021-44790 bug exists in the code that deconstructs multipart messages, common in web form uploads, that typically look something like this:

Content-Type: multipart/form-data; boundary=VILC2R2IHFHLZZ

--VILC2R2IHFHLZZ
Content-Disposition: form-data; name="name"
                                 <--blank line denotes start of first data item
Paul Ducklin
--VILC2R2IHFHLZZ                 <--double-dash-plus-boundary denotes end
Content-Disposition: form-data; name="phone"
                                 <--blank line denotes start of second data item        
555-555-5555   
--VILC2R2IHFHLZZ--               <--double-dash-plus-boundary denotes end

Technically, each multipart component consists of the data after the end of each fully blank line (see above), and before each boundary line, which consists of two dashes (hyphens) followed by the unique boundary marker text.

(In case you are wondering, the extra double-dash at the end of the very last line above signals the final item in the list.)

A blank line in the raw data appears as two consecutive CRLF (carriage return plus line feed) pairs, or the ASCII codes (13,10,13,10), denoted in C by the text string "\r\n\r\n".

This parsing is handled very crudely by code that we’ve simplified like this:

for (start = findnext(start,boundarytext); start != NULL; start = end) {
   crlf = findnext(start,"\r\n\r\n");
   if (!crlf) break;
   end = findnext(crlf,boundarytext);
   len = end - crlf - 8;
   buff = memalloc(len+1);
   memcpy(buff,crlf+4,len);
   [. . .]
}

Don’t worry if you don’t know C – this code is impenetrable and poorly-documented even if you do. (The original is yet more complex and harder to follow; we have stripped it to its basics here.)

Loosely speaking, it looks for a double-CRLF string, denoting the next blank line.

From there, it finds the next occurrence of the boundary marker text (VILC2R2IHFHLZZ in our example above).

It then assumes that the data it needs to extract consists of everything between those two marker points, denoted by the memory addresses (pointers in C jargon) crlf and end, minus 8 bytes.

The code makes no effort to explain the meaning of that “minus 8” in the code, nor yet the “plus 4” two lines later, though it’s a good immediate guess that crlf+4 is there to skip past the 4 bytes that make up the data in the CRLFCRLF string itself. (The blank line is a separator, and isn’t part of the data to be used.)

Here’s where the “8” comes from:

  • 4 bytes taken up by the CRLFCRLF characters at the start, which are not part of the data itself.
  • 2 bytes of the CRLF at the end of the last line of data, not included.
  • 2 bytes used by the dashes (--) that denote the start of the boundary line, not included.

As you can see, the code allocates enough memory for the data between the exact start of the line after the CRLFCRLF separator and the exact end of the line before the boundary marker…

…plus an extra 1 byte (len+1) to ensure a NUL character (a zero byte) at the end of the buffer to act as the terminator that text strings require in C.

The code then uses memcpy() to copy the relevant data out of the incoming message into that new memory buffer, in which it will be presented to the Lua script that is about to run.

What if there aren’t 8 bytes?

You’ve probably figured out the problem: What if there aren’t 8 extra bytes to remove? What if the CRLF at the end of the last line of data, or the -- at the start of the next line, aren’t there at all?

What if there aren’t 8 bytes altogether between the CRLFCRLF and the boundary text?

This bug would have been much more obvious if the code were more clearly constructed or commented, and would almost certainly have been avoided if the CRLF-- separator between the blank line and the boundary text had been explicitly verified by the programmer. and tested for explicitly.

This bug was patched by adding a check to make sure that the final buffer size calculation doesn’t come out too small, by adding one line before the memory allocation attempt:

 if (end - crlf <= 8) break;

This verifies that the buffer length can’t end up negative, though we still think that an explicit check for a correct data terminator, in the same way that there’s an explicit check for CRLFCRLF, would make for clearer code.

We’d also insert a helful comment referring the reader to one of the internet RFCs about multipart messages, e.g. RFC 2045.

Proxy problems

Dealing with CVE-2021-44224 involved numerous code changes, the most obvious being a correction in a file full of utility code used by the httpd proxy module.

The fact that there are more than 5000 lines of C in proxy_util.c alone, which is support code for just one of many httpd modules, is testament to the overall size and complexity of the Apache HTTP Server.

The code we’re referring to above was changed from this…

url = ap_proxy_de_socketfy(p, url);

…to code that verifies that the function called actually did find a URL string to work with:

url = ap_proxy_de_socketfy(p, url);
if (!url) {
   return NULL;
}

Before the “if no url” error-check forced the code to give up early, the program would plough on even if url were NULL, and try to access the memory via the NULL address.

But reading from or writing to a NULL pointer is “undefined” according to the C standard, which means you must take care never do either.

Indeed, on almost all modern operating systems, the value used for NULL, usually zero, is chosen so that any attempt to access the NULL address, whether by reading or writing, will not only fail but be trapped by the operating system, which will then typically kill off the offending process to prevent dangerous or unintended side effects.

What to do?

  • If you are using Apache httpd anywhere, update to 2.4.52 as soon as you can.
  • If you can’t patch, check whether your configuration is at risk. There are many bug fixes beyond these two CVEs, so you should patch sooner rather than later. But you may decided to defer patching until a more convenient time if you aren’t loading either the Lua scripting or the proxy module.
  • If you are a coder, don’t forget to check for errors robustly in yor programs. If there’s a chance to spot mistakes before you make them worse, for example by verifying you have really have enough memory to play with, or checking that the string you are looking for really is there, take it!
  • If you are coder, assume that someone else will need to understand your code in the future. Write helpful and useful comments, on the grounds that those who cannot remember the past are condemned to repeat it.

11 Comments

Wherever you see magic numbers in code like ‘8’ you know there is something unexpected likely to happen!
It is surprising that noone has reviewed this code before or even put it through a static-analysis program like ‘lint’ but better (there are many commercial product that do this – I can think of two).

The thing about that 8 is “it’s really obvious once you know the answer”, which doesn’t help anyone who comes to your code later on to try to learn said answer!

Even a comment saying /* 8 is for the length of the ‘CRLFCRLF’ at the start and the trailing ‘CRLF–‘ */ would be a start. (And it would emphasise that the coder knew about those characters but neglected to check for them. This might have provoked the coder themselves to do a “self review” and improve the implementation.)

>Actually, it’s a software foundation that makes a web server, amongst much else.

Actually, the Apache web server came well before the foundation existed, and the foundation was named after the web server.

Both those statements are correct. The “Apache Group” indeed got its name because it set about a-patching the NCSA’s httpd server. That was in 1993. IIRC, the informal “Group” became the formal “Foundation” in 1999, and its website now boasts more than 300 “top-level projects”, and more than 250 million lines of code all-in, an average project size that suggests httpd (with just under 1M LOC on its own) is no longer unique amongst Apache’s projects in size and complexity.

So I am happy with my description of Apache as “a software foundation that makes a web server amongst much else” because [a] that it exactly what it is, and [b] it’s been a foundation for over 20 years, about 4/5ths of the time that “Apache” has been a well-known name in computer software.

Interesting. You took the time to indicate that “demesnes” is a real word
but not the more elusive:
cognoscenti (plural)
cognoscente (singular)
“persons who have superior knowledge and understanding of a particular field,
especially in the fine arts, literature, and world of fashion.”

There is an important difference between the two words.

Cognoscenti, though intended with mild, light-hearted irony here (those who are inclined to use the word cognoscenti could, if you were inclined to be judgmental, be accused of being pretentious linguistic cognoscenti), has been in continuous use, with that very spelling, ever since the hyperactive Plunderous Latinoidal Neologistic Wing (PLNW) of the Anglophone Literary Faction (ALFA) introduced it into our gloriously absorbent language in the 1700s.

The word cognoscenti has stuck in our language from that point onwards in exactly that form. (FWIW, my trusty Oxf Dict of Eng, and its invaluable companion, the New Oxf Amer Dict, insist that as an English word, it has no singular. It’s a collective plural noun used only to describe a group: there is no cognoscentus -a -um, no cognoscente, no cognoscent.)

In contrast, demesne is an archaic spelling. It is not really a proper choice in standard Modern English: it would usually be considered incorrect (and would be unilaterally changed by a copy editor) simply on orthographic grounds.

In short, in regular usage, demesnes can be objected to for purely objective reasons (being a spelling mistake), whereas cognoscenti is objectionable only subjectively, for reasons of stylistic offence (being hoity-toity twaddle).

However, I am keen to see a reintroduction of demesne in techical writing, if only to have an easy way to clarify its particular meaning (not a “Windows domain”!), in the same way that in British English we have happily adopted the dual spelling program/programme, which turns out to be very handy indeed when you need to differentiate between, say, a TV program (which is software that lets you watch TV shows, possibly after forcing you to identify yourself quite precisely for tracking purposes) and a TV programme (which is one of the shows that you might watch with the aforementioned program).

Same thing with dialog/dialogue. One is an interactive and possibly even an intellectual discussion that you have with a real person, while the other is a box that pops up on your computer screen and unilaterally tells you what is about to happen under the guise of inviting you into a dialogue that you never actually get to have.

In other news: cybersecurity stuff! (And Happy Holidays, if you are taking a vacation over the festive season.)

Am I lost here? Isn’t the log4j issue really a JNDI issue, making it an Oracle problem, that they need to fix?

Not really. After all, Java includes a whole slew of networking libraries, so if JNDI didn’t exist it you could still write the networking code to achieve the same goals inside a Java app. The real culprit is exposing the JNDI library to command strings embedded in data to be logged. (Imagine replacing ${jdni:…} with ${writetofileofyourchoice:…} – would you call that a flaw in the JVM itself and demand file IO libraries to be removed, or a flaw in the breadth of Java API that was opened up via the logger?

Having said that, many recent versions of the JVM configure the JNDI library to operate by default with the option “com.sun.jndi.ldap.object.trustURLCodebase=false”, which prevents the remote implantation of Java .class files via web download. This reduces the risk in this case, but it doesn’t eliminate it. Indeed, Apache suggested this was a complete mitigation for a short while, until is was obvious that it still allowed data exfiltration because the ${jndi:..} system still allows for remote code exection even of it prevents remote code execution.

This article contains a video that makes the attack chain clear:

https://nakedsecurity.sophos.com/log4shell-the-movie

And more detail on understanding the mitigations is here:

https://nakedsecurity.sophos.com/log4shell-explained

In the context of that video and article, the mitigation log4j2.formatMsgNoLookups=false stops the Log4j code from doing any outside lookups; the mitigation com.sun.jndi.ldap.object.trustURLCodebase=false stops the JVM’s JNDI library from ingesting remotely-supplied code; and the 2.17.0 patch, it seems, basically abandons the whole ${jndi:…} feature in Log4j as a bad idea. (2.17.0 also seems to stop a whole heap of ${…} operators in Log4j, as though Apache belatedly realised it was all a bad idea from the start.

I cannot thank you enough for this article, being an end user rather than a program designer or coder.
I have now sufficient up to date knowledge about Apache, httpd, to keep me occupied with my old hardware over Xmess. Into the bargain I discovered that I do actually have an understanding of the code involved (after your editing it) even though I only ever do HTML and CSS projects. The structured and object oriented stuff I was introduced to in college, apparently went in and is still resident. Thank you for bringing it back to my attention with your article, especially as my patchy knowledge of and interest in English language forms is just as robust as my home network and the family’s devices.
I hope Santa brings you what we all require ……… festive cheer and a peaceful New Year.

Comments are closed.

Subscribe to get the latest updates in your inbox.
Which categories are you interested in?