Solaris: Too many files open

Recently, I had to diagnose an issue whose only symptom was that the ‘mailx’ command failed from time to time, which caused the following output to be mailed instead of the expected report from a cron job:

Your “cron” job on <Hostname> /path/to/script | mailx my@email.domain
produced the following output:
/tmp/Rs10293: File exists

At this point, it was unclear to me where the “/tmp/Rs10293” file was coming from, but /tmp contained a lot of these RsXXXXX files.

It quickly became obvious that the command producing those RsXXXXX files was ‘mailx’:

# strings /usr/bin/mailx|grep /tmp/Rs
/tmp/Rs%-ld

Oddly enough, I now knew that the file was created by mailx and that, after a while, another mailx run tried to create the same file and failed. But how come this file was left in /tmp?

I decided to put a wrapper around the mailx command that would dump the ‘truss’ output to /var/tmp, and then see what happened:

# mv /usr/bin/mailx /usr/bin/mailx.orig
# cat > /usr/bin/mailx
#!/bin/bash
truss -eafld -vall -wall -rall -xall -u a.out -o /var/tmp/mail.$(date +%Y%m%d_%H%M%S).truss /usr/bin/mailx.orig "$@"
^D
# chmod 755 /usr/bin/mailx
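
One detail worth care in a wrapper like this is how the arguments are forwarded. The sketch below is only an illustration (the `count_args`, `forward_unquoted` and `forward_quoted` helpers are hypothetical names, not part of mailx or truss): it shows that an unquoted `$@` re-splits arguments containing spaces, such as a subject passed with `-s`, whereas `"$@"` hands them on unchanged.

```shell
#!/bin/bash
# count_args: print how many arguments it received (illustrative helper).
count_args() {
    echo "$#"
}

# forward_unquoted: forwards its arguments the way a bare $@ does,
# so any argument containing spaces is split into several words.
forward_unquoted() {
    count_args $@
}

# forward_quoted: forwards its arguments with "$@", preserving each one.
forward_quoted() {
    count_args "$@"
}

forward_unquoted -s "weekly report" my@email.domain
forward_quoted   -s "weekly report" my@email.domain
```

The first call prints 4 and the second prints 3, so a wrapper using a bare `$@` would hand mailx a truncated subject and an extra recipient.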

After a while, I noticed some very short truss output files, and gotcha! Those mailx processes were exiting right after having created the corresponding ‘/tmp/RsXXXXX’ file:

14581/1: 0.3988 openat(0xFFD19553, 0x080865C2, 02402, 0600) = 263
14581/1: 0xFFD19553: AT_FDCWD
14581/1: 0x080865C2: “/tmp/Rs14581”
14581/1: 0.3990 fcntl(263, 1, 0x00000000) = 0
14581/1: 0.3991 openat(0xFFD19553, 0xF0FC7F60, 0, 024) Err#2 ENOENT
14581/1: 0xFFD19553: AT_FDCWD
14581/1: 0xF0FC7F60: “/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo”
14581/1: 0.3992 write(2, 0x080865C2, 12) = 12
14581/1: 0x080865C2: ” / t m p / R s 1 4 5 8 1″
14581/1: 0.3993 write(2, 0xEF558B90, 2) = 2
14581/1: 0xEF558B90: ” : ”
14581/1: 0.3995 write(2, 0xEF50393A, 19) = 19
14581/1: T o o m a n y o p e n f i l e s
14581/1: 0.3995 write(2, 0xEF558B8C, 1) = 1
14581/1: 0xEF558B8C: “\n”

I also discovered that the utility launching these faulty ‘mailx’ commands was the ‘smartmon-ux’ disk monitoring software.

I confirmed this by reloading smartmon-ux and watching the console:

/dev/rdsk/c0t5000C50055C78D1Bd0s0 polled at Mon Feb 11 06:18:07 2013 Status:Passed (Temperature = 34C 93F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C78D43d0s0 polled at Mon Feb 11 06:18:08 2013 Status:Passed (Temperature = 28C 82F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C791F7d0s0 polled at Mon Feb 11 06:18:08 2013 Status:Passed (Temperature = 28C 82F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C792EFd0s0 polled at Mon Feb 11 06:18:08 2013 Status:Passed (Temperature = 31C 87F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C7982Bd0s0 polled at Mon Feb 11 06:18:09 2013 Status:Passed (Temperature = 36C 96F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C7999Fd0s0 polled at Mon Feb 11 06:18:09 2013 Status:Passed (Temperature = 32C 89F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C5005A8B8693d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 27C 80F) (Speed: Port0=6.0 G; Port1=<unattached>)
/tmp/Rs16850: Too many open files
Device on /dev/rdsk/c2t0d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c2t0d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 255C 491F)
/tmp/Rs16853: Too many open files
Device on /dev/rdsk/c3t1d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c3t1d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 255C 491F)
/tmp/Rs16856: Too many open files
Device on /dev/rdsk/c6t6d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c6t6d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 255C 491F)
/tmp/Rs16859: Too many open files
Device on /dev/rdsk/c7t7d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c7t7d0s0 polled at Mon Feb 11 06:18:11 2013 Status:Passed (Temperature = 255C 491F)

Right, now we know the “Who”, but why?

mailx reported a ‘Too many open files’ error, which should have been impossible given our /etc/system settings:

set rlim_fd_max=900000
set rlim_fd_cur=32768

And checking the current per-process limit:

# plimit $$|grep nofiles
nofiles(descriptors) 32768 900000

So we can open 32k files! How come smartmon-ux/mailx is complaining?

# pgrep smartmon-ux
16692
# ls -l /proc/16692/fd/*|wc -l
261
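
The same check can be wrapped in a small helper so it is easy to repeat while the daemon runs. This is a minimal sketch, assuming /proc/<pid>/fd is readable by the caller; `fd_count` and `fd_over` are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# fd_count PID: print the number of open file descriptors of a process.
# Prints 0 if /proc/PID/fd cannot be read.
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# fd_over PID LIMIT: succeed if the process has more than LIMIT descriptors open.
fd_over() {
    [ "$(fd_count "$1")" -gt "$2" ]
}

# Example: report the descriptor count of the current shell.
echo "shell $$ has $(fd_count $$) descriptors open"
```

For instance, `fd_over 16692 250 && echo "close to 255"` would flag the smartmon-ux process seen above.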

So we’re clearly not reaching the 32k file descriptor limit! I then found this blog post from 2006.

Let’s confirm that by compiling this little C file:

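#include <stdio.h>
#include <stdlib.h>     /* exit() */

#define MAXF 65535

int main(void)
{
        char filename[32];
        FILE *fds[MAXF];
        int i;

        /* Open files in ./out until fopen() refuses, then report how far we got. */
        for (i = 0; i < MAXF; ++i)
        {
                snprintf(filename, sizeof(filename), "./out/%d.log", i);
                fds[i] = fopen(filename, "w");

                if (fds[i] == NULL)
                {
                        printf("\n** Number of open files = %d. fopen() failed with error:  ", i);
                        fflush(stdout);         /* keep stdout ahead of perror()'s stderr output */
                        perror("");
                        exit(1);
                }
                else
                {
                        fprintf(fds[i], "some string");
                }
        }
        return (0);
}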

# gcc -o files files.c
# mkdir out
# ./files
** Number of open files = 253. fopen() failed with error: Too many open files
# ls out|wc -l
254

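ARGHL… ?

Note that ‘out’ contains 254 files although only 253 streams could be opened: the underlying open() creates the file before stdio gives up on it, which is presumably also how mailx leaves its /tmp/RsXXXXX file behind.

Then comes the funny part: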

# gcc -o files files.c -m64
# ./files
** Number of open files = 32765. fopen() failed with error: Too many open files

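Okay, this is a 32-bit vs 64-bit problem: in 32-bit Solaris processes, stdio stores the file descriptor in an unsigned char, so fopen() cannot use a descriptor above 255. Googling again leads us to the /usr/lib/extendedFILE.so.1 library, which works around this limitation: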

# gcc -o files files.c
# file files
files: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
# LD_PRELOAD=/usr/lib/extendedFILE.so.1 ./files
** Number of open files = 32353. fopen() failed with error: Not enough space

And so our problem is fixed! Now let’s slightly change the smartmon-ux start script to work around our problem.
Edit /etc/init.d/smartmon-ux.d and replace the line

/etc/smartmon-ux -E -G 45 -link -F 1200 -P -sq -M smartmon-alert@domain.tld

with:

LD_PRELOAD=/usr/lib/secure/extendedFILE.so.1 /etc/smartmon-ux -E -G 45 -link -F 1200 -P -sq -M smartmon-alert@domain.tld

Also, symlink extendedFILE.so.1 into the /usr/lib/secure directory to stop the system from complaining (mailx is setgid!):

# cd /usr/lib/secure ; ln -s /usr/lib/extendedFILE.so.1

Restart smartmon-ux and check that it is using the library:

# /etc/init.d/smartmon-ux.d restart
# pgrep smartmon-ux
18319
# ls -al /proc/18319/path/|grep extendedFILE
lrwxrwxrwx 1 root root 0 2013-02-11 06:43 zfs.124.65538.134236 -> /usr/lib/extendedFILE.so.1

Then I began thinking… what would be the proper and definitive fix for this problem? Would a 64-bit build of smartmon-ux fix it?

Let’s try with a slightly modified version of our little C file:

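#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <unistd.h>     /* execl() */

#define MAXF 252

int main(void)
{
        char filename[32];
        FILE *fds[MAXF];
        int i;

        /* Open MAXF files and keep them open, as a leaking parent would. */
        for (i = 0; i < MAXF; ++i)
        {
                snprintf(filename, sizeof(filename), "./out/%d.log", i);
                fds[i] = fopen(filename, "w");

                if (fds[i] == NULL)
                {
                        printf("\n** Number of open files = %d. fopen() failed with error:  ", i);
                        fflush(stdout);         /* keep stdout ahead of perror()'s stderr output */
                        perror("");
                        exit(1);
                }
                else
                {
                        fprintf(fds[i], "some string");
                }
        }
        /* we have MAXF opened FD, now try to send a mail... */
        /* The first argument after the path is argv[0], so it must be the program name. */
        execl("/usr/bin/mailx", "mailx", "-s", "test files2.c", "-f", "/root/test/mymail", "my@mail.tld", (char *)NULL);
        perror("execl");        /* only reached if execl() itself failed */
        return (1);
}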

# gcc -o files2 files2.c
# ./files2
/tmp/Rs20586: Too many open files
# gcc -o files2_64 files2.c -m64
# ./files2_64
/tmp/Rs20637: Too many open files

Conclusions:

  • Smartmon-UX leaks file descriptors to the mailx processes it spawns
  • Compiling Smartmon-UX as a 64-bit binary wouldn’t help
  • mailx should be fixed to clean up its /tmp/RsXXXXX file before exiting on error
  • Preloading extendedFILE.so.1 can be used as a workaround
  • The final fix should be to fix the file descriptor leak in smartmon-ux
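
Until mailx cleans up after itself, the leftover files can at least be spotted. The sketch below is a dry run under stated assumptions: the numeric suffix of each Rs file is taken to be the PID of the mailx process that created it (as the truss output suggests), /proc/<pid> is assumed to exist for every live process, and `stale_rs_files` is a hypothetical helper name.

```shell
#!/bin/bash
# stale_rs_files DIR: list DIR/Rs<pid> files whose <pid> is no longer running.
# Assumption: the suffix is the PID of the process that created the file.
stale_rs_files() {
    local dir=$1 f pid
    for f in "$dir"/Rs[0-9]*; do
        [ -f "$f" ] || continue          # the glob matched nothing
        pid=${f##*/Rs}                   # strip the directory and the "Rs" prefix
        case $pid in
            *[!0-9]*) continue ;;        # suffix is not purely numeric
        esac
        # No /proc entry means no live process owns this file any more.
        [ -d "/proc/$pid" ] || printf '%s\n' "$f"
    done
}

# Dry run: only print the candidates; review them before removing anything.
stale_rs_files /tmp
```

The function only prints file names, so nothing is deleted unless its output is explicitly passed on to rm.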

NOTE: I have filed a support case about this; I will let you know the outcome…

Posted in Solaris | 3 Comments

Migration to WordPress

I needed to refurbish this blog a little bit and it finally happened.

I’ve migrated the software behind the blog to WordPress and imported the posts and categories.

Some posts still have discrepancies and aren’t displaying properly; this will be fixed soon.

Posted in wildness' Life | Leave a comment

SNCB data leak: let’s geolocate the entries



Posted in Leaks | Leave a comment

SNCB Data Leak: Geo-localization of the entries



Posted in Leaks | Leave a comment

SUNWjet: Add a slice 7 to a zpool’s disk

According to Oracle’s documentation, if you want to use SUNWjet to jumpstart a server with a ZFS root pool and a slice 7 to put the metadb on, you must first partition your drives and then launch the jumpstart.

This is particularly annoying as two operations are required.

I found this post while searching on Google to see whether a slice 7 could be created automatically during the jumpstart. Unfortunately, it didn’t work because:

  • Disks were hardcoded
  • Line 321 seemed to be architecture dependent (i86pc)

As I wanted to add this permanently, and as any server I jumpstart may one day use either UFS/DiskSuite or metaset, I wanted a slice 7 on every server for future use.

I then wrote this little patch to adapt the “populate_client_dir” script so that it adds a 100 MB slice 7 on every disk specified for use in the root pool:

So, to apply the patch, simply run:

 cd /opt/SUNWjet/Utils/solaris
 patch -p0 < /tmp/solaris-populate_client_dir.patch

Then run the make_client script against your template, in which you have specified the zpool spec:

base_config_profile_zfs_disk="c0t0d0s0 c1t0d0s0"

Posted in Solaris | Leave a comment

Solaris 11 ISC DHCP: Cannot specify multiple interfaces

While trying to configure the ISC DHCP server on Solaris 11 to serve my local VLANs, I wanted to restrict it to only three interfaces, so I issued the following setprop command:

# svccfg -s svc:/network/dhcp/server:ipv4 setprop config/listen_ifnames = astring: "vlan100 vlan201 vlan202"
# svcadm refresh -s svc:/network/dhcp/server:ipv4
# svcadm enable svc:/network/dhcp/server:ipv4

But the service failed to start… After adding some debug echoes to the /lib/svc/method/isc-dhcp file, I saw that the whitespace in this property gets escaped when it is retrieved by the method script:

# tail -3 /var/svc/log/network-dhcp-server:ipv4.log
[ Jul 10 13:44:02 Executing start method ("/lib/svc/method/isc-dhcp"). ]
IFACES: vlan100\ vlan201\ vlan202
[ Jul 10 13:44:02 Method "start" exited with status 95. ]
# ggrep -B 4 IFACES /lib/svc/method/isc-dhcp
get_dhcpd_options() {
# get listen_ifname property value.
LISTENIFNAMES="`get_prop listen_ifnames`"
echo "IFACES: ${LISTENIFNAMES}";
# /usr/lib/inet/dhcpd -f -d -4 --no-pid  -cf /etc/inet/dhcpd4.conf -lf /var/db/isc-dhcp/dhcpd4.leases vlan100\ vlan201\ vlan202

vlan100 vlan201 vlan202: interface name too long (is 23)
# /usr/lib/inet/dhcpd -f -d -4 --no-pid  -cf /etc/inet/dhcpd4.conf -lf /var/db/isc-dhcp/dhcpd4.leases vlan100 vlan201 vlan202
Internet Systems Consortium DHCP Server 4.1-ESV-R4

So, to fix this behaviour, edit /lib/svc/method/isc-dhcp and change line 66 from:

LISTENIFNAMES="`get_prop listen_ifnames`"

to:

LISTENIFNAMES="`get_prop listen_ifnames|sed -e 's/,/ /g'`"
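
The effect of that substitution can be checked in isolation. This is a minimal sketch: `normalize_ifnames` is a hypothetical helper that applies the same sed expression to a sample value, without touching SMF or the method script.

```shell
#!/bin/bash
# normalize_ifnames VALUE: turn a comma-separated property value into the
# space-separated list that dhcpd expects on its command line.
normalize_ifnames() {
    printf '%s\n' "$1" | sed -e 's/,/ /g'
}

# Unquoted expansion then splits the result into one argument per interface.
set -- $(normalize_ifnames "vlan100,vlan201,vlan202")
echo "$# interfaces: $*"
```

This prints `3 interfaces: vlan100 vlan201 vlan202`: since the stored value no longer contains whitespace, there is nothing left to be escaped, and dhcpd receives three separate interface names.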

Then you can set the listen_ifnames property to multiple interfaces separated by commas:

# svccfg -s svc:/network/dhcp/server:ipv4
 svc:/network/dhcp/server:ipv4> setprop config/listen_ifnames = astring: "vlan100,vlan201,vlan202"
 svc:/network/dhcp/server:ipv4> exit
# svcadm refresh svc:/network/dhcp/server:ipv4
# svcadm disable svc:/network/dhcp/server:ipv4
# svcadm enable svc:/network/dhcp/server:ipv4

Posted in Solaris | Leave a comment

Solaris 11 Automated Installer

If you wish to install Solaris on multiple machines, consider setting up an Automated Installer!

I just wrote some documentation that explains everything from start to finish 😉

Comments are welcome!

Solaris 11 Automated Installer

UPDATE: The link now points to Espix’s wiki instead of the WeSunSolve one.

Posted in Solaris | Leave a comment

WeSunSolve: One year later

More than one year ago, the WeSunSolve website was launched publicly to address the lack of information available for the Solaris operating system.

Given its success, a lot of improvements and features have been added… and visitors keep liking it!

Here are some statistics about this past year on the website:

Visitors

  • 307191 Unique Visits
  • 657270 Page views
  • 5500 Downloads
  • 64% Bounce rate
  • 670 Registered Users

Countries

  • 1. United States
  • 2. United Kingdom
  • 3. Germany
  • 4. Japan
  • 5. France

Website

  • Number of patches registered: 75928
  • Number of README versions gathered: 63579
  • Number of checksums registered: 59471
  • Number of BugIDs registered: 365191
  • Total size of the patches repository: 623.57 GBytes
  • Number of Files detected: 1323021
  • Number of Packages: 8639
  • Number of CVE: 438

I would like to thank all the people who have sent bug reports, feature requests and comments, as well as those who have simply put their thumbs up!

If you like WeSunSolve, please spread the word! Talk ’bout it with your colleagues and share your experiences! If you’re performing a recurring task using WeSunSolve, why not write a little howto?

Are you part of a team of Solaris sysadmins? Did you know that you can now work in collaboration with your colleagues on WeSunSolve? Check the documentation for more information 😉

Last but not least, do not hesitate to send me your thoughts on the website! It’s always good to hear from people who are using your work 🙂 Especially when it’s free 😉

Posted in WeSunSolve | Leave a comment

Sending PGP HTML Encrypted e-mail with PHP

While adding the PGP HTML report feature to WeSunSolve, I first managed to encrypt the content of the HTML report sent to the user with their PGP key. I thought this was going to be the hardest part, but that was without counting on the MIME and HTML handling of PGP-encrypted mails.

Here is how I finally ended up creating PGP-encrypted HTML mails in PHP that can be opened with (at least) claws-mail and Thunderbird, with proper rendering of the HTML report:

Content of the clear message

 To: test@test.com 
 Subject: My HTML crypted report
 X-PHP-Originating-Script: 1000:mlist.obj.php
 From: "We Sun Solve" <admin@wesunsolve.net>
 Reply-to: admin@wesunsolve.net
 X-Sender: WeSunSolve v2.0
 Message-ID: <1335717276@wesunsolve.net>
 Date: Sun, 29 Apr 2012 18:34:36 +0200
 MIME-Version: 1.0
 Content-Type: multipart/encrypted; 
 protocol="application/pgp-encrypted"; 
 boundary="------------enig029BFFF948226050D5D90E10F" 

 This is an OpenPGP/MIME encrypted message (RFC 2440 and 3156)
 --------------enig029BFFF948226050D5D90E10F
 Content-Type: application/pgp-encrypted
 Content-Description: PGP/MIME version identification

 Version: 1

 --------------enig029BFFF948226050D5D90E10F
 Content-Type: application/octet-stream; name="encrypted.asc"
 Content-Description: OpenPGP encrypted message
 Content-Disposition: inline; filename="encrypted.asc"

 -----BEGIN PGP MESSAGE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 ****SNIPPED PGP CRYPTED BASE64 MESSAGE ****
 -----END PGP MESSAGE-----
 --------------enig029BFFF948226050D5D90E10F--

Content of the PGP encrypted part

 Content-Type: multipart/alternative;
  boundary="------------F983FADF500537B8AFDC5E483"

 This is a multi-part message in MIME format.
 --------------F983FADF500537B8AFDC5E483
 Content-Type: text/plain; charset=utf-8
 Content-Transfer-Encoding: quoted-printable

 You need to have a MUA capable of rendering HTML to read the WeSunSolve emails.
 You can consult the website http://wesunsolve.net if you are not able to read this email, the information sent to you should also be on the website...

 --------------F983FADF500537B8AFDC5E483
 Content-Type: text/html; charset="utf-8"
 Content-Transfer-Encoding: quoted-printable

 <html><p>This is the report in cleartext!</p></html>
 --------------F983FADF500537B8AFDC5E483--


Code Used

/* $from, $pgpkey and $config are expected to be defined by the calling code;
   GPG::cryptTxt() is the site's own encryption helper. */
$pgpmime = '';
$mime = '';
$headers = '';
$dest = 'test@test.com';
$subject = 'My HTML crypted report';
$clearContent = '<html><p>This is the report in cleartext!</p></html>';
$clearText = 'This is the text version of the report';
/* Prepare the encrypted part of the message */
$bound = '------------'.substr(strtoupper(md5(uniqid(rand()))), 0, 25);
$pgpmime .= "Content-Type: multipart/alternative;\r\n boundary=\"$bound\"\r\n\r\n";
$pgpmime .= "This is a multi-part message in MIME format.\r\n";
$pgpmime .= "--$bound\r\n";
$pgpmime .= "Content-Type: text/plain; charset=utf-8\r\n";
$pgpmime .= "Content-Transfer-Encoding: quoted-printable\r\n\r\n";
/* The parts are declared quoted-printable, so encode them accordingly */
$pgpmime .= quoted_printable_encode($clearText)."\r\n\r\n";
$pgpmime .= "--$bound\r\n";
$pgpmime .= "Content-Type: text/html; charset=\"utf-8\"\r\n";
$pgpmime .= "Content-Transfer-Encoding: quoted-printable\r\n\r\n";
$pgpmime .= quoted_printable_encode($clearContent)."\r\n";
$pgpmime .= "--$bound--\r\n";
$content = GPG::cryptTxt($pgpkey, $pgpmime);
/* Make the email's headers */
$headers = "From: $from\r\n";
$headers .= "Reply-to: ".$config['mailFrom']."\r\n";
$headers .= "X-Sender: WeSunSolve v2.0\r\n";
$headers .= "Message-ID: <".time()."@".$_SERVER['SERVER_NAME'].">\r\n";
$headers .= "Date: " . date("r") . "\r\n";
/* A second boundary separates the two parts of the outer PGP/MIME envelope */
$bound = '------------enig'.substr(strtoupper(md5(uniqid(rand()))), 0, 25);
$headers .= "MIME-Version: 1.0\r\n";
$headers .= "Content-Type: multipart/encrypted;\r\n";
$headers .= " protocol=\"application/pgp-encrypted\";\r\n";
$headers .= " boundary=\"$bound\"\r\n\r\n";
/* And the cleartext body which encapsulates the PGP message */
$mime = '';
$mime .= "This is an OpenPGP/MIME encrypted message (RFC 2440 and 3156)\r\n";
$mime .= "--$bound\r\n";
$mime .= "Content-Type: application/pgp-encrypted\r\n";
$mime .= "Content-Description: PGP/MIME version identification\r\n\r\n";
$mime .= "Version: 1\r\n\r\n";
$mime .= "--$bound\r\n";
$mime .= "Content-Type: application/octet-stream; name=\"encrypted.asc\"\r\n";
$mime .= "Content-Description: OpenPGP encrypted message\r\n";
$mime .= "Content-Disposition: inline; filename=\"encrypted.asc\"\r\n\r\n";
$mime .= $content."\r\n";
$mime .= "--$bound--";
mail($dest, $subject, $mime, $headers);

Posted in Programming | 1 Comment

WeSunSolve: Site News April

New Features

  • Added wiki to hold the documentation;
  • Added the monitoring of multiple IPS Repositories;
  • User list now allows the user to load multiple patches at once;
  • Added patch timeline
  • Added CVE list affecting Solaris packages
  • Added patch link to CVE when issue is fixed
  • Users can now download a ZIP containing all the READMEs of their patch list
  • SSL signed certificate added to the wesunsolve.net https domain
  • User login goes over SSL by default
  • Added support for linking SVR4 packages to patches
  • Modified the structure of patch levels to link SVR4 packages
  • Added user setting for API access
  • Added functions to the API to allow server registration and adding a patch level
  • Added patch/security report based on patch level and PCA execution
  • Added mail report for patch/security based on a patch level and patchdiag.xref automatic selection
  • We added a logo to our Wiki! (thanks to Dagobert Michelsen for the logo 😉)

A full listing of the changes made can be found here

Patch level report (Using PCA)

PCA has been integrated into WeSunSolve so you can generate a patch report for any server registered in your account for which at least one patch level is defined.
The report created by WeSunSolve is based on the information you enter when adding a server’s patch level: showrev-p.out and pkginfo-l.out.
These two files are generated when running the Explorer, or can simply be gathered by hand with the two corresponding commands (respectively /usr/bin/showrev -p and /usr/bin/pkginfo -l).

You can see there a full example of such a generated report.
To generate a report like this, you must add a server and an associated patch level, which you can do by following the steps in the documentation.

Please, give us feedback if you feel something is missing inside this report!

Mail reports

You can also have the previous report sent to you by mail regularly; everything can be configured to fit your needs. You can:

  • Choose the server and the patch level on which the report will be generated;
  • Choose the interval between two reports: every day, every week or every month;
  • Decide which patchdiag.xref delay you want, which is useful if you always want a delay between what is released and what you will actually install.

This way, you can get a report of which patches are to be installed on your server, based on an up-to-date baseline, every day…

To create a report, simply follow the steps in our documentation.

API Access

As of now, you can enable API access inside your panel and take advantage of the functions we have recently implemented, such as:

  • Add a server easily;
  • Upload a patch level directly from the command line.

We plan to add more features to the API very soon…

Lesser-known features: Window size

If you browse WeSunSolve regularly, you can greatly improve your experience by fitting the width of the website to your screen resolution.
We’ve implemented three screen sizes:

  • 960px
  • 1200px
  • 1600px

By default, the website renders at 960px, which suits most of our visitors but is certainly not the best choice if you have a 22″ screen 😉
See our documentation to find out how to change your settings.

Like it? Spread it!

Please, if you like WeSunSolve, spread the word among your fellow sysadmins! Write a blog post ’bout it and send it over to get a backlink 🙂

Found a cool way of doing something with WeSunSolve that spared you hours of work? Please tell us how! Don’t hesitate to write a howto on our wiki.

Finally, if you want to thank me personally, you can simply connect through LinkedIn and leave a little recommendation on the WeSunSolve job…

Posted in WeSunSolve | Leave a comment