Solaris m4 broken on latest 11.3 SRU

Today I had to troubleshoot an issue with sendmail where running `make` in the config directory failed with an error message like this:

# make              
test ! -f sendmail.cf || /usr/bin/mv sendmail.cf sendmail.cf.prev
/usr/bin/m4 ../m4/cf.m4 sendmail.mc > sendmail.cf

/usr/bin/m4:sendmail.mc:../domain/solaris-generic.m4:../feature/redirect.m4:20 can't open file
divert(3)
*** Error code 1
make: Fatal error: Command failed for target `sendmail.cf'

Looking further at that m4 command, I can see it is causing the issue:

# m4 ../m4/cf.m4 sendmail.mc > /dev/null

m4:sendmail.mc:../domain/solaris-generic.m4:../feature/redirect.m4:20 can't open file
divert(3)

This was really odd and seems to be related to the current 11.3 SRU I am running:

# pkg info entire|grep Branch
           Branch: 0.175.3.9.0.4.0

On an older host, this runs just fine with identical sendmail configs:

# make clean && make
/usr/bin/rm -f sendmail.cf submit.cf core
test ! -f sendmail.cf || /usr/bin/mv sendmail.cf sendmail.cf.prev
/usr/bin/m4 ../m4/cf.m4 sendmail.mc > sendmail.cf
test ! -f submit.cf || /usr/bin/mv submit.cf submit.cf.prev
/usr/bin/m4 ../m4/cf.m4 submit.mc > submit.cf
# pkg info entire|grep Branch
        Branch: 0.175.3.5.0.6.0

Where to look next? Well, first I trussed the `m4` call:

# truss -ff m4 ../m4/cf.m4 sendmail.mc > /var/tmp/m4.truss 2>&1

Looking at the end of the truss output, close to where it prints the “can’t open file” message, here is what we can see:

4850:   open("../feature/masquerade_entire_domain.m4", O_RDONLY|O_XPG4OPEN) = 4
4850:   fstat64(4, 0xFDB5C950)                          = 0
4850:   fstat64(4, 0xFDB5C860)                          = 0
4850:   ioctl(4, TCGETA, 0xFDB5C900)                    Err#25 ENOTTY
4850:   read(4, " d i v e r t ( - 1 )\n #".., 1024)     = 565
4850:   read(4, 0x088C123C, 1024)                       = 0
4850:   llseek(4, 0, SEEK_CUR)                          = 565
4850:   close(4)                                        = 0
4850:   open("", O_WRONLY|O_CREAT|O_TRUNC|O_XPG4OPEN, 0666) Err#2 ENOENT
4850:   open("/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSCMD.mo", O_RDONLY|O_XPG4OPEN) Err#2 ENOENT
4850:   fstat64(2, 0xFDB5BA50)                          = 0

m4:4850:        write(2, "\n m 4 :", 4)                         = 4
sendmail.mc4850:        write(2, " s e n d m a i l . m c", 11)          = 11
:22 4850:       write(2, " : 2 2  ", 4)                         = 4
can't open file4850:    write(2, " c a n ' t   o p e n   f".., 15)      = 15

4850:   write(2, "\n", 1)                               = 1
divert4850:     write(2, " d i v e r t", 6)                     = 6
(4850:  write(2, " (", 1)                               = 1
54850:  write(2, " 5", 1)                               = 1
)4850:  write(2, " )", 1)                               = 1

4850:   write(2, "\n", 1)                               = 1
4850:   unlink("")                                      Err#2 ENOENT

What is interesting in that output are the following lines:

4850:   open("", O_WRONLY|O_CREAT|O_TRUNC|O_XPG4OPEN, 0666) Err#2 ENOENT
4850:   unlink("")                                      Err#2 ENOENT

It looks like `m4` is trying to open a file with an empty filename and, when that fails and the program exits, it tries to `unlink` it.
To me, this sounds pretty much like what should’ve been a temporary file… Let’s try to apptrace that program now…
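
To double-check that an empty filename really produces `ENOENT` (the `Err#2` in the truss output), here is a minimal C sketch. The helper name `errno_for_empty_fopen` is made up for illustration and this is not code from `m4`; it only relies on POSIX specifying `ENOENT` for an empty pathname.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: try to open "" for writing and report the errno.
 * Returns 0 if the open unexpectedly succeeded. */
int errno_for_empty_fopen(void)
{
    errno = 0;
    FILE *fp = fopen("", "w");
    if (fp != NULL) {
        fclose(fp);
        return 0;
    }
    return errno;   /* expected to be ENOENT on a POSIX system */
}
```

Printing the return value from a small `main()` should show the same error number that truss reported for the `open("")` call.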

# apptrace m4 ../m4/cf.m4 sendmail.mc > /var/tmp/apptrace.out 2>&1


Again, at the end of that output, let’s try to catch the `fopen` call with the empty filename:

-> m4       -> libc.so.1:FILE * fopen(const char * = 0x8068298 "", const char * = 0x8053464 "w")
<- m4       -> libc.so.1:fopen()
-> m4       -> libc.so.1:char * gettext(const char * = 0x8052e74 "can't open file")

Here it is! Now we can see that the `char *` buffer passed to `fopen` is `0x8068298`. Let’s search for that
address through the whole apptrace output so we can see where it comes from:

# grep 0x8068298 /var/tmp/apptrace.out 
-> m4       -> libc.so.1:char * mktemp(char * = 0x8068298 "/tmp/m4aXXXXX")
<- m4       -> libc.so.1:mktemp() = 0x8068298
-> m4       -> libc.so.1:int creat(const char * = 0x8068298 "", mode_t = 0x0)
-> m4       -> libc.so.1:FILE * fopen(const char * = 0x8068298 "", const char * = 0x8053464 "w")
-> m4       -> libc.so.1:int unlink(const char * = 0x8068298 "")

Uh-oh! Look at that… exactly what we thought: this should actually be a `/tmp/m4aXXXXX` kind of temporary file, but it seems the `mktemp()` call is returning an empty string.

The `mktemp()` manpage says:

RETURN VALUES
     The mktemp() function returns the  pointer  template.  If  a
     unique  name  cannot  be  created, template points to a null
     string.

     Upon successful completion, mkdtemp()  returns  the  pointer
     template.  If   a  unique   directory   cannot  be  created,
     mkdtemp() returns a null pointer.

ERRORS
     The mkdtemp() function can set errno to the same  values  as
     lstat(2) and mkdir(2).

So we now know that `m4` does not check the result of `mktemp()` (an empty string on failure, not a null pointer) and hence cannot detect that the call failed.
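
For comparison, here is a sketch of a more defensive pattern. It is not the `m4` source (which we can’t see); it swaps `mktemp()` for `mkstemp()`, which creates and opens the file in one step and reports failure with a plain `-1` plus `errno`, so a failure is much harder to miss. The helper names `open_temp_file` and `discard_temp_file` are my own assumptions.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file from `tmpl`, a writable
 * buffer ending in "XXXXXX". Returns an open fd, or -1 on failure after
 * printing the reason. On success `tmpl` holds the generated name. */
int open_temp_file(char *tmpl)
{
    int fd = mkstemp(tmpl);
    if (fd == -1) {
        fprintf(stderr, "mkstemp failed: %s\n", strerror(errno));
        return -1;
    }
    return fd;
}

/* Hypothetical helper: close the fd and remove the file again. */
void discard_temp_file(int fd, const char *name)
{
    close(fd);
    unlink(name);
}
```

A caller would declare something like `char tmpl[] = "/tmp/m4aXXXXXX";` (six `X`, in a writable array rather than a string literal) and bail out as soon as `open_temp_file(tmpl)` returns `-1`.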

What is also interesting is that `errno` should give us a bit more detail about what’s happening. Let’s see how to get that with gdb:

# /usr/bin/gdb m4 ../m4/cf.m4 sendmail.mc 
Excess command line arguments ignored. (sendmail.mc)
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.11".
For bug reporting instructions, please see:
...
Reading symbols from /usr/bin/m4...(no debugging symbols found)...done.
/usr/include/sys/../m4/cf.m4: No such file or directory.
(gdb) break mktemp
Breakpoint 1 at 0x805357c
(gdb) run
Starting program: /usr/bin/m4 
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Breakpoint 1, 0x07f90393 in mktemp () from /lib/libc.so.1
(gdb) finish
Run till exit from #0  0x07f90393 in mktemp () from /lib/libc.so.1
0x080539a3 in main ()
(gdb) p errno
$1 = 22
(gdb) bt
#0  0x07f901c0 in libc_mktemps () from /lib/libc.so.1
#1  0x07f903b2 in mktemp () from /lib/libc.so.1
#2  0x080539a3 in main ()
(gdb) ^C
ROOT # grep 22 /usr/include/sys/errno.h
#define EINVAL  22      /* Invalid argument                     */

Here we go: it seems `EINVAL` is set by `mktemp` in `/lib/libc.so.1`, which itself calls `libc_mktemps`.
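
As an aside, the same number-to-name lookup can be sketched in C instead of grepping the header. The helper name `errno_name` is hypothetical, and it only knows the two values seen so far in this investigation, falling back to `strerror()` for anything else.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: symbolic name for the errno values seen so far. */
const char *errno_name(int err)
{
    switch (err) {
    case ENOENT: return "ENOENT";       /* the Err#2 from the truss output */
    case EINVAL: return "EINVAL";       /* the 22 printed by gdb */
    default:     return strerror(err);  /* message text for anything else */
    }
}
```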
Solaris is closed source, but the old OpenSolaris code might give us a pretty good idea of what could go wrong. Let’s have a look at the `mktemp.c` file from the last public OpenSolaris revision available.

The interesting part starts at line 98; however, we cannot find any reference to `errno = 22` in that code. Has libc changed? Let’s quickly compare the `libc.so.1` checksums on the two systems we have:

On the NOT working system:

3df7fcc46656bf516f792403f638e4c1  /lib/libc.so.1

On the working system:

1c24a7fbeb0badf6a33d7f9d62e8280d  /lib/libc.so.1

Ouch. While this alone does not mean the mktemp function itself has changed, it does show that libc differs between the two systems.

Let’s try to reproduce the mktemp() call in a small test program to see whether we can trigger the issue:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
        char template[]="/tmp/m4aXXXXX";
        char *b = mktemp(template);
        printf("mktemp ret: %s\n", b);
}

Let’s compile and run:

# gcc -o mktemp-gcc mktemp.c
# ./mktemp-gcc
mktemp ret: /tmp/m4a6Ltbn

Hmm. This is actually working… so what the hell is wrong with `m4` then?

Let’s look at the IDA code dump for the `libc_mktemps()` function on the *OLD* system:

char *__cdecl libc_mktemps(char *as, int slen)
{
  if ( as && *as )
  {
    lmutex_lock(&unk_167070);
    v29 = getpid();
    if ( v29 != dword_167064 )
    {
      gettimeofday(&tv, 0);
      if ( ((tv.tv_usec + 0x4000) | 0x43E8u) >> 15 )
        v2 = tv.tv_usec / 1000;
      else
        LOWORD(v2) = SLOWORD(tv.tv_usec) / 1000;
      dword_167068 = (1000 * LOWORD(tv.tv_sec) + (_WORD)v2) & 0xFFF;
      dword_167064 = v29;
      Ddata_data_11 = 6;
    }
    if ( !dword_16706C )
    {
      v3 = sysconf(514);
      dword_16706C = fls(v3);
    }
    v4 = strlen(as);
    if ( slen < v4 && slen >= 0 )
    {
      v5 = v4 - slen;
      v6 = &as[v5 - 1];
      v24 = 0;
      if ( v5 && as[v5 - 1] == 88 )
      {
        do
        {
          v7 = v24++ + 1;
          --v5;
          --v6;
        }
        while ( v5 && v7 < 6 && *v6 == 'X' );
      }
      v23 = (int)(v6 + 1);
      v8 = 6 * v24 - dword_16706C;
      if ( v8 < 0 )
      {
        v15 = &as[v5];
        v28 = v15;
        v26 = Ddata_data_11;
        v16 = 0;
        for ( i = 0; v26 && v28 > as; --v26 )
        {
          --v28;
          v18 = 0;
          v19 = chars;
          while ( *v28 != *v19 )
          {
            ++v19;
            if ( ++v18 >= 64 )
            {
              if ( v18 == 64 )
                goto LABEL_47;
              break;
            }
          }
          i = (v18 + __PAIR__((i << 6) + ((unsigned __int64)(unsigned int)v16 >> 26), v16 << 6)) >> 32;
          v16 = v18 + (v16 << 6);
        }
        v20 = 6 * Ddata_data_11 - dword_16706C;
        if ( v20 >= 32 )
        {
          LOBYTE(v20) = v20 - 32;
          v16 = i;
          i = 0;
        }
        if ( v29 == __PAIR__(
                      (((1 << dword_16706C) - 1) >> 31) & (i >> v20),
                      (unsigned int)(__PAIR__(i, v16) >> v20) & ((1 << dword_16706C) - 1))
          && lstat64(as, &v32) == -1
          && *(_DWORD *)__errno() == 2 )
        {
          lmutex_unlock(&unk_167070);
          return as;
        }
      }
      else
      {
        v25 = dword_167068;
        v22 = 1 << v8;
        if ( dword_167068 >= 1 << v8 )
        {
          dword_167068 = 0;
          v25 = 0;
        }
        v9 = v29;
        if ( v8 >= 32 )
        {
          LOBYTE(v8) = v8 - 32;
          HIDWORD(v9) = v29;
          v29 = 0;
        }
        v27 = __PAIR__(HIDWORD(v9), v29) << v8 >> 32;
        v30 = v29 << v8;
        while ( 1 )
        {
          v10 = (v25 + __PAIR__(v27, v30)) >> 32;
          v11 = v25 + v30;
          v12 = (_BYTE *)v23;
          if ( v24 )
          {
            v13 = 0;
            do
            {
              *v12++ = chars[v11 & 0x3F];
              v11 = __PAIR__(v10, v11) >> 6;
              v10 >>= 6;
              ++v13;
            }
            while ( v13 < v24 );
          }
          if ( lstat64(as, &v32) == -1 )
            break;
          v14 = v25 + 1;
          if ( v25 + 1 == v22 )
            v14 = 0;
          v25 = v14;
          if ( v14 == dword_167068 )
            goto LABEL_47;
        }
        if ( *(_DWORD *)__errno() == 2 )
        {
          dword_167068 = v25 + 1;
          Ddata_data_11 = v24;
          lmutex_unlock(&unk_167070);
          return as;
        }
      }
    }
LABEL_47:
    lmutex_unlock(&unk_167070);
    *as = 0;
  }
  return as;
}

This code looks pretty much like what we can find in OpenSolaris’ mktemp.c. Let’s now check the libc code on the *NOT* working system:

char *__cdecl libc_mktemps(char *as, int slen)
{
  if ( !as || !*as )
  {
    *(_DWORD *)__errno() = 22;
    return as;
  }
  v2 = strlen(as);
  if ( slen >= v2 || slen < 0 )
  {
    *(_DWORD *)__errno() = 22;
    *as = 0;
    return as;
  }
  v3 = v2 - slen;
  v4 = &as[v3 - 1];
  v17 = 0;
  if ( v3 && as[v3 - 1] == 'X' )
  {
    do
    {
      v5 = v17++ + 1;
      --v3;
      --v4;
    }
    while ( v3 && v5 < 6 && *v4 == 'X' );
  }
  v14 = (int)(v4 + 1);
  if ( v17 != 6 && _xpg6 )
  {
    *(_DWORD *)__errno() = 22;
    *as = 0;
    return as;
  }
  v15 = (1 << 6 * v17) - 1;
  v16 = 0LL;
  if ( 1 << 6 * v17 == 1 )
    goto LABEL_23;
  while ( 1 )
  {
    pthread_setcancelstate(1, &v18);
    if ( v17 < 6 )
    {
      v7 = arc4random_uniform(v15);
      pthread_setcancelstate(v18, 0);
      v8 = 0;
      v9 = (_BYTE *)v14;
      if ( !v17 )
        goto LABEL_19;
    }
    else
    {
      v6 = arc4random_uniform(1 << 6 * (v17 - 1));
      v7 = arc4random_uniform(64) + (v6 << 6);
      pthread_setcancelstate(v18, 0);
      v8 = 0;
      v9 = (_BYTE *)v14;
    }
    v10 = 0;
    do
    {
      *v9++ = chars[v7 & 0x3F];
      v7 = __PAIR__(v10, v7) >> 6;
      v10 >>= 6;
      ++v8;
    }
    while ( v8 < v17 );
LABEL_19:
    if ( lstat64(as, &v19) == -1 )
      break;
    v11 = v16;
    LODWORD(v16) = v16 + 1;
    v12 = __PAIR__(HIDWORD(v16), v11) + 1;
    HIDWORD(v16) = (__PAIR__(HIDWORD(v16), v11) + 1) >> 32;
    if ( HIDWORD(v12) >= (unsigned int)((unsigned int)v12 < (unsigned int)v15) + HIDWORD(v15) )
      goto LABEL_23;
  }
  if ( *(_DWORD *)__errno() != 2 )
  {
LABEL_23:
    if ( v15 == v16 )
      *(_DWORD *)__errno() = 28;
    *as = 0;
  }
  return as;
}

NOTE: I have tried to remove the useless parts of the code to make it more readable, but the complete files can be found there:

Man! This looks completely different! Indeed, the old libc version doesn’t set errno to 22 anywhere, while the new code does so in several places. Let’s check what can trigger an errno=22 in that new code:

  if ( !as || !*as )
  {
    *(_DWORD *)__errno() = 22;
    return as;
  }

This one triggers if the provided pointer is NULL or if the string it points to is empty (starts with ‘\0’). Not our case here.

  v2 = strlen(as);
  if ( slen >= v2 || slen < 0 )
  {
    *(_DWORD *)__errno() = 22;
    *as = 0;
    return as;
  }

Next, the code rejects the call if `slen` (which is 0 in every call of libc_mktemps) is greater than or equal to the length of `as`, or if `slen` is negative. Neither is our case here.

  v3 = v2 - slen;
  v4 = &as[v3 - 1];
  v17 = 0;
  if ( v3 && as[v3 - 1] == 'X' )
  {
    do
    {
      v5 = v17++ + 1;
      --v3;
      --v4;
    }
    while ( v3 && v5 < 6 && *v4 == 'X' );
  }
  v14 = (int)(v4 + 1);
  if ( v17 != 6 && _xpg6 )
  {
    *(_DWORD *)__errno() = 22;
    *as = 0;
    return as;
  }

This one is trickier. Here is what the code does (if I got it right!):

  • v3 = strlen(as) - slen(0); // so basically always == strlen(as)
  • v4 points to the last valid character of `as`
  • if the last character of `as` is ‘X’, it counts the trailing ‘X’ characters in `as` backwards, stopping at 6.
  • if the number of trailing ‘X’ is not 6, and _xpg6 is set, then errno=22!
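
The steps above can be sketched in plain C. This is my own reconstruction from the decompiled output, not Oracle’s source: the function names are hypothetical and the `strict` parameter stands in for libc’s `_xpg6` flag.

```c
#include <errno.h>
#include <string.h>

/* Count trailing 'X' characters in the template, ignoring the last `slen`
 * bytes (the suffix) and stopping at 6, like the decompiled loop.
 * Returns -1 when slen is negative or not smaller than the length. */
int count_trailing_x(const char *tmpl, int slen)
{
    size_t len = strlen(tmpl);
    if (slen < 0 || (size_t)slen >= len)
        return -1;                      /* the earlier EINVAL check */
    size_t pos = len - (size_t)slen;
    int count = 0;
    while (pos > 0 && count < 6 && tmpl[pos - 1] == 'X') {
        --pos;
        ++count;
    }
    return count;
}

/* Returns 0 if the template would be accepted, EINVAL if it is rejected.
 * `strict` is an assumption standing in for the _xpg6 flag. */
int check_template(const char *tmpl, int slen, int strict)
{
    int count = count_trailing_x(tmpl, slen);
    if (count < 0)
        return EINVAL;
    if (count != 6 && strict)
        return EINVAL;                  /* fewer than six 'X' in strict mode */
    return 0;
}
```

With the template used by `m4`, `count_trailing_x("/tmp/m4aXXXXX", 0)` gives 5, so `check_template()` only rejects it when `strict` is non-zero.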

Wait a minute, WTF is that `_xpg6` in the first place? We do have fewer than 6 trailing ‘X’ in our string, yet we don’t hit that code with our mktemp.c test program.

A Google search later, I found this document, which describes the different standards implemented in the Solaris code. One of them, XPG6 (also referred to as `SUSv3`), is apparently what this flag means. Let’s try to compile our test program the way Oracle suggests for `SUSv3`:

$ /opt/solarisstudio12.4/bin/c99  $(getconf POSIX_V6_LP64_OFF64_CFLAGS) \
             -D_XOPEN_SOURCE=600  $(getconf POSIX_V6_LP64_OFF64_LDFLAGS) mktemp.c -o mktemp  \
                                  $(getconf POSIX_V6_LP64_OFF64_LIBS) 

And then run it on both systems, first the non-working one, then the working one:

# ./mktemp
mktemp ret: 
# ./mktemp
mktemp ret: /tmp/m4aAEiQ0

Hurray! So we found the root cause of the issue. It seems that after the libc changes made by Oracle, the `m4` program was not adapted to the standard it is compiled against. Let’s open a case and get Oracle working on this. Note that the “simple” fix for this would probably be to just add a trailing ‘X’ to the template in the m4.c source code.

Thanks a bunch to ar1s for getting to the bottom of this with me and providing the IDA code dumps 😉
