Solaris m4 broken on latest 11.3 SRU

Today I had to troubleshoot a sendmail issue where running make in the config directory produced an error message like this:

Looking further at that m4 command, I can see it is causing the issue:

This was really odd and seems to be related to the current 11.3 SRU I am running:

On an older host, the same command runs just fine with identical sendmail configs:

Where to start looking? Well, first I trussed the m4 call:

Looking at the end of the truss output, close to where it echoes the “Can’t open file” message, here is what we can see:

What is interesting in that output are the following lines:

It looks like m4 is trying to open a file with an empty filename and, when that fails and the program exits, it tries to unlink it.
To me, this sounds pretty much like what should’ve been a temporary file… Let’s apptrace the program now…

Here it is! Now we can see that the char * buffer passed to fopen is 0x8068298. Let’s search for that
address through the whole apptrace output so we can see where it comes from:

Uh-oh! Look at that… exactly what we thought: this should actually be a /tmp/m4aXXXXX kind of temporary file, but it seems the mktemp() call is returning an empty string.

The mktemp() manpage says:

So we now know that m4 does not check the value returned by mktemp() and hence cannot detect that the call has failed.
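For illustration, here is a minimal sketch of what a checked call could look like. The helper name `checked_mktemp` is hypothetical and not taken from the m4 source; it only relies on the behaviour described above, namely that a failing mktemp() hands back an empty string.

```c
#define _DEFAULT_SOURCE /* glibc only (ignored elsewhere): keeps mktemp() declared under strict -std= modes */
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: fill in the template with mktemp() and report failure.
 * Returns 0 on success, or -1 when the template comes back empty, in which
 * case errno holds the reason.
 */
int checked_mktemp(char *tmpl)
{
    errno = 0;
    char *name = mktemp(tmpl);

    /* mktemp() signals failure by turning the template into "" */
    if (name == NULL || name[0] == '\0') {
        if (errno == 0)
            errno = EINVAL; /* assumption: fall back to EINVAL if libc left errno unset */
        return -1;
    }
    return 0;
}
```

A caller that gets -1 back could print strerror(errno) and bail out, instead of carrying on with an empty filename and failing later in fopen().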

What is also interesting is that errno should give us a bit more detail about what’s happening. Let’s see how to get that with gdb:

Here we go: it seems EINVAL is raised from mktemp in /lib/, which itself calls libc_mktemps.
Solaris is closed source, but the old OpenSolaris code may give us a pretty good idea of what could go wrong. Let’s have a look at the mktemp.c file from the last publicly available OpenSolaris revision.

The interesting part starts at line 98; however, we cannot find any reference to errno = 22 (EINVAL) in that code. Has the libc changed? Let’s quickly compare checksums on the two systems we have:

On the NOT working system:

On the working system:

Ouch. While this alone does not mean the mktemp function itself has changed, it does show that the libc on the two systems is indeed different.

Let’s call mktemp() from a small test program to see if we can reproduce the issue:

Let’s compile and run:

Mmh. This is actually working… what the hell is wrong with m4 then?

Let’s look at the IDA code dump for the libc_mktemps() function on the *OLD* system:

This code looks pretty much like what we can find in OpenSolaris’ mktemp.c. Let’s now check the libc code on the *NOT* working system:

NOTE: I have removed the irrelevant parts of the code to make it more readable, but the complete files can be found here:

Man! This looks completely different! Indeed, the old libc version doesn’t even have a reference to setting errno to 22, while the new code has a lot of them. Let’s check what can trigger errno=22 in that new code:

This one triggers if the provided pointer is NULL or if the string it points to starts with ‘\0’ (i.e. is empty). Not our case here.

Next, the code compares the length of as against slen (which is 0 in every call of libc_mktemps) and checks whether slen is less than 0. Neither applies to our case.

This one is trickier. Here is what the code does (if I got it right!):

  • v3 = strlen(as) - slen; // slen is 0, so this is basically always strlen(as)
  • v4 = the last valid character of as
  • if the last character of as is ‘X’, then it counts backwards the number of trailing ‘X’ present in as.
  • if the number of trailing ‘X’ is not 6, and _xpg6 is set, then errno=22!
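To make the three checks above easier to follow, here is a small sketch of the logic as I understand it. This is an illustrative reconstruction, not the Solaris libc source: the function name `template_is_valid` is made up, the `xpg6_mode` parameter stands in for the internal _xpg6 flag, and the direction of the slen comparison (suffix longer than the string) is my assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative reconstruction of the EINVAL checks (NOT the real libc code).
 * Returns 0 if the template would be accepted, -1 with errno = EINVAL otherwise.
 */
int template_is_valid(const char *as, int slen, int xpg6_mode)
{
    /* Check 1: NULL pointer or empty string */
    if (as == NULL || as[0] == '\0') {
        errno = EINVAL;
        return -1;
    }

    size_t len = strlen(as);

    /* Check 2 (assumed direction): negative suffix length, or suffix longer than the string */
    if (slen < 0 || (size_t)slen > len) {
        errno = EINVAL;
        return -1;
    }

    /* Check 3: count the 'X' characters sitting right before the suffix */
    size_t end = len - (size_t)slen;
    size_t xcount = 0;
    while (xcount < end && as[end - 1 - xcount] == 'X')
        xcount++;

    /* Only the XPG6 mode insists on exactly six of them */
    if (xcount != 6 && xpg6_mode) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

With the five-‘X’ template seen in the apptrace output, `template_is_valid("/tmp/m4aXXXXX", 0, 0)` accepts the string while `template_is_valid("/tmp/m4aXXXXX", 0, 1)` rejects it, which matches the difference between the test program and m4.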

Wait a minute, WTF is that _xpg6 in the first place? We do have fewer than 6 trailing ‘X’ in our string, yet we don’t hit that code with our mktemp.c test program.

A Google search later, I found this document, which describes the different standards implemented in Solaris. One of them, XPG6 (also referred to as SUSv3), is apparently what this flag means. Let’s then try to compile our test program the way Oracle suggests for SUSv3:

And then run it on both our working and not working systems:

Hurray! So we found the root cause of the issue. It seems that after the libc changes made by Oracle, the m4 program was not adapted to the standard it is supposed to be compiled against: its template has fewer than six trailing ‘X’, which the XPG6 code path now rejects. Let’s open a case and get Oracle working on this. Note that the “simple” fix would probably be to just add a trailing ‘X’ to the template in the m4.c source code.
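As a sketch of what such a fix could look like (this is not m4’s actual code; the helper name `open_m4_temp` and the buffer handling are made up for illustration), a six-‘X’ template should pass the check in both modes. Swapping mktemp() for mkstemp() is my own suggestion on top of that: it creates the file right away and reports failure through its return value.

```c
#define _DEFAULT_SOURCE /* glibc only (ignored elsewhere): keeps mkstemp() declared under strict -std= modes */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: create a temporary file from a six-'X' template.
 * On success, returns an open file descriptor and leaves the generated
 * name in `name`. Returns -1 if the buffer is too small for the template
 * or if mkstemp() fails (errno is then set by mkstemp()).
 */
int open_m4_temp(char *name, size_t size)
{
    static const char tmpl[] = "/tmp/m4aXXXXXX"; /* six trailing 'X', not five */

    if (size < sizeof tmpl)
        return -1;
    memcpy(name, tmpl, sizeof tmpl); /* copies the terminating '\0' as well */
    return mkstemp(name);
}
```

The caller remains responsible for closing the descriptor and unlinking the file once it is done with it.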

Thanks a bunch to ar1s for getting to the bottom of this with me and providing the IDA code dumps 😉
