OpenVPN: Listen on TCP and UDP with TUN

Today I’ll describe how to get OpenVPN to listen on both a UDP and a TCP port, using tun devices and the same client network for both instances, meaning the same client can connect over either TCP or UDP and get the same IP address assigned.

To achieve this, we’re gonna need:

  • OpenVPN
  • Sudo

I’m running this on a Debian Wheezy installation, but any Linux distribution can do the trick.

Let’s create the first OpenVPN instance, listening on UDP/1194. Certificate creation is not covered by this HOWTO, as plenty of resources can already be found online.
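Here is a minimal sketch of that first instance’s configuration (the MYVPN paths, the 10.1.2.0/24 network, the vpn user/group and the log file names are assumptions for illustration, not my exact setup):

port 1194
proto udp
dev tun0
ca /etc/openvpn/MYVPN/ca.crt
cert /etc/openvpn/MYVPN/server.crt
key /etc/openvpn/MYVPN/server.key
dh /etc/openvpn/MYVPN/dh1024.pem
topology subnet
server 10.1.2.0 255.255.255.0
ifconfig-pool-persist /etc/openvpn/MYVPN/ipp-udp.txt
client-config-dir /etc/openvpn/MYVPN/ccd
script-security 2
learn-address /etc/openvpn/MYVPN/learn-address.sh
user vpn
group vpn
persist-key
persist-tun
status /var/log/openvpn-udp-status.log
log-append /var/log/openvpn-udp.log
verb 3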

Note that the only difference from a standard VPN configuration is the following line:


learn-address /etc/openvpn/MYVPN/learn-address.sh

This configuration line makes the ‘learn-address.sh’ script run whenever a client’s address is learned or unlearned. This allows us to modify the kernel’s routing table upon client connection/disconnection and specify which tunnel interface should be used for that particular client.

Let’s now configure the TCP VPN:
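A sketch of the deltas relative to the UDP instance above (same caveats):

proto tcp
dev tun1
ifconfig-pool-persist /etc/openvpn/MYVPN/ipp-tcp.txt
status /var/log/openvpn-tcp-status.log
log-append /var/log/openvpn-tcp.log

Since both instances share the same client network, it’s the per-client /32 routes installed by learn-address.sh (shown further down) that actually steer return traffic to the right tun device.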

Note that we only changed the log file names, the ifconfig-pool-persist (IPP) file, the tun device name and, of course, the protocol.

Now, let’s configure a client’s CCD file like the following:
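For a hypothetical client whose certificate common name is ‘myclient’, /etc/openvpn/MYVPN/ccd/myclient would contain (the address is an assumption matching the log excerpt below):

ifconfig-push 10.1.2.6 255.255.255.0

Since both instances share the same client-config-dir, the client is pushed the same address whether it connects over UDP or TCP.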

Next, allow the vpn user to run /sbin/ip through sudo, so that the learn-address.sh script can make routing table changes. Let’s create a ‘/etc/sudoers.d/openvpn’ file with the following content:


vpn ALL=(ALL:ALL) NOPASSWD: /sbin/ip

And last but not least, let’s put together the learn-address.sh script which will make all the magic:
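Mine boiled down to something like this sketch (assuming the /32-route approach; OpenVPN passes the operation and the client address as arguments and exports the tun interface name in $dev):

#!/bin/sh
# learn-address.sh <add|update|delete> <address> [common-name]
# $dev is exported by OpenVPN and holds this instance's tun interface.
ACTION="$1"
ADDR="$2"

case "$ACTION" in
add|update)
    echo "[-] Adding addr $ADDR -> $dev" >> /tmp/learn.log
    # Route this client through the instance it just connected to
    sudo /sbin/ip route replace "$ADDR/32" dev "$dev"
    ;;
delete)
    echo "[-] Deleting addr $ADDR ->" >> /tmp/learn.log
    sudo /sbin/ip route del "$ADDR/32"
    ;;
esac
# Always exit 0: a non-zero exit code would make OpenVPN reject the client
exit 0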

Now you can make your client connect to the UDP instance, disconnect, and connect again to the TCP one. You can tail -f the /tmp/learn.log file, where you should see the routing changes if everything is working:


[-] Adding addr 10.1.2.6 -> tun0
[-] Deleting addr 10.1.2.6 ->
[-] Adding addr 10.1.2.6 -> tun1
[-] Deleting addr 10.1.2.6 ->

Was this useful to you? Have questions? Thoughts? Don’t hesitate to leave a comment.

Solaris 11: The signature value did not match the expected value.

For quite some time now I’ve been unable to upgrade my test system to Solaris 11.1.
The system was running 11.0 SRU 10.4 and, for some reason, whenever I ran ‘pkg update’, I faced the message from this post’s title: “The signature value did not match the expected value.”

I finally found (1) a way to upgrade anyway and (2) a way to install the system/locale package while discarding this message.

Then, I discovered that pkg signature verification can be bypassed using a pkg image property:
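The property in question is signature-policy, whose default is to verify signatures. A sketch (double-check the accepted values on your release):

# pkg set-property signature-policy ignore
# pkg update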

And that’s it! As I never got an answer to this issue anywhere, I thought I would share the resolution ;-)

VirtualBox IPS Package for Solaris 11

Some time ago I had to write a script to convert the currently downloadable VirtualBox packages into IPS packages for Solaris 11. As I found this really useful, I decided to set up a repository to give everyone easy access to these packages.

You can find the different versions available here:

http://mdma.igh.cnrs.fr/vbox/en/catalog.shtml?show_all_versions=1&action=Refresh

To install one version, follow these steps:

First, add the publisher:

pkg set-publisher -g 'http://mdma.igh.cnrs.fr/vbox' solaris

Then refresh the publisher cache:

pkg refresh --full

And finally, install the virtualbox version you want:

pkg install virtualbox@4.2.12

You can also install VirtualBox’s extension pack:

pkg install virtualbox-extpack@4.2.12

If you have any questions about this, don’t hesitate to contact me.

IPS Repository

I’ve recently been able to build two IPS repositories for Solaris 11, so I’m now sharing access to them here.

The first repository contains some packages that I use on a daily basis, so I’m simply sharing them:

http://mdma.igh.cnrs.fr/espix/en/catalog.shtml

To add it to your Solaris 11 installation, simply type:

pkg set-publisher -P -g http://pkg.espix.net/espix espix

Currently, the packages listed are:

  • openvpn
  • tuntap driver
  • rssh

I’m planning to release much more soon: Puppet and Facter are to come, and JRDS is also in my plans.

I’ll make a separate post for the second repository ;)

Solaris Hotplug: Manage your devices!

Solaris Hotplug can be used in various ways, let’s see what we can get out of it!

Enable the daemon

First of all, we need to enable the hotplug daemon:

# svcadm enable hotplug

Listing devices

You can list devices in different ways. First, you can get the list of devices along with the driver attached to each.
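A sketch (I’m quoting hotplug’s options from memory, so double-check the man page):

# hotplug list -v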

You can also look at the attached devices themselves, which I found really interesting, especially for the network cards.

Practical use: resetting a PCIEX card

When you have a hardware failure, or something that looks like one, it’s sometimes good to be able to reset the device prior to replacing the hardware. The easiest way to do that is simply to power-cycle the server, but that’s really not convenient on a production machine.

You can then give a try to the following procedure:

Find the device to reset

We want to reset the PCIEX device behind net1:
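Something like the following does the mapping (a sketch; the link and instance names are assumptions consistent with the rest of this post). dladm shows the driver instance behind the link, and /etc/path_to_inst maps that instance to a physical device path:

# dladm show-phys net1
LINK   MEDIA      STATE  SPEED  DUPLEX  DEVICE
net1   Ethernet   up     1000   full    e1000g1
# grep '"e1000g"' /etc/path_to_inst
"/pci@0,0/pci8086,1c10@1c/pci108e,125e@0" 0 "e1000g"
"/pci@0,0/pci8086,1c10@1c/pci108e,125e@0,1" 1 "e1000g"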

We can see that the device behind net1 is /pci@0,0/pci8086,1c10@1c/pci108e,125e@0,1. This is a dual-port network card and both ports are in use. As we don’t want to disable the other port, we’ll first try to offline only one port of the card. If that does not suit our needs, we’ll power-cycle the whole card.

Power-Cycle a single port


# hotplug offline /pci@0,0/pci8086,1c10@1c pci.0,1 # second port of the PCIEX card
# hotplug online /pci@0,0/pci8086,1c10@1c pci.0,1

We can find the corresponding messages in the /var/adm/messages file:


Apr 30 21:42:48 [hostname] mac: [ID 736570 kern.info] NOTICE: e1000g1 unregistered
Apr 30 21:42:48 [hostname] genunix: [ID 408114 kern.notice] /pci@0,0/pci8086,1c10@1c/pci108e,125e@0,1 (e1000g1) offline
Apr 30 21:42:33 [hostname] mac: [ID 469746 kern.info] NOTICE: e1000g1 registered
Apr 30 21:42:33 [hostname] genunix: [ID 408114 kern.notice] /pci@0,0/pci8086,1c10@1c/pci108e,125e@0,1 (e1000g1) online

Power-cycle the whole PCIEX card


# hotplug offline /pci@0,0/pci1458,a102 pci.1b,0
# hotplug online /pci@0,0/pci1458,a102 pci.1b,0

Here we are! The card and the two ports have been reset.

Sometimes I’ve seen people unload drivers to get a device reloaded by the OS; with hotplug, you can go even further by reloading only the particular device you want.

If that was useful to you, don’t hesitate to tell me so ;)

Solaris: Tracing your application

Every Solaris system administrator already knows the truss utility, which allows you to trace system calls as your application runs.

I recently discovered another tool which I found really helpful when you want to know what an application is actually doing: apptrace.

Apptrace traces the _function_ calls instead of the _system_ calls: you can then see which library and which functions your program is using!

This comes in very handy in your daily sysadmin life, when you have to port a piece of software or simply want to check which functions of which libraries it uses!

Here is a simple example of this utility against the “hostid” binary:

# apptrace hostid
-> hostid -> libc.so.1:int atexit(int (*)() = 0xeef93ee4)
<- hostid -> libc.so.1:atexit()
-> hostid -> libc.so.1:int atexit(int (*)() = 0x8050bac)
<- hostid -> libc.so.1:atexit()
-> hostid -> libc.so.1:void __fpstart(void)
<- hostid -> libc.so.1:__fpstart() = 0xeebfd484
-> hostid -> libc.so.1:long gethostid(void)
<- hostid -> libc.so.1:gethostid() = 0xcfabcc
-> hostid -> libc.so.1:int printf(const char * = 0x8050bcc "%08lx\n", void * = 0xcfabcc, ...)
00cfabcc
<- hostid -> libc.so.1:printf() = 0x9
-> hostid -> libc.so.1:exit(0x0, 0xeec00088, 0xf10c8c04) ** NR

Have you ever wondered, “How does that application get that system information?” apptrace is there to help answer that.
For the purpose of this post, I was simply wondering how zpool list uses libzfs.so to get the list of existing pools; below is the output of “apptrace zpool list”. I’ve removed useless output for clarity:


-> zpool -> libzfs.so.1:libzfs_handle_t * libzfs_init(void)
<- zpool -> libzfs.so.1:libzfs_init() = 0x851b648
-> zpool -> libzfs.so.1:void libzfs_print_on_error(libzfs_handle_t * = 0x851b648, boolean_t = 0x1)
<- zpool -> libzfs.so.1:libzfs_print_on_error() = 0x851b648
...[SNIPPED]...
-> zpool -> libzfs.so.1:int zpool_iter(libzfs_handle_t * = 0x851b648, zpool_iter_f = 0x805d608, void * = 0x8516ec8)
...[SNIPPED]...
-> zpool -> libzfs.so.1:const char * zpool_get_name(zpool_handle_t * = 0x8536e08)
<- zpool -> libzfs.so.1:zpool_get_name() = 0x8536e10
-> zpool -> libzfs.so.1:const char * zpool_get_name(zpool_handle_t * = 0x8536cc8)
<- zpool -> libzfs.so.1:zpool_get_name() = 0x8536cd0
...[SNIPPED]...

We can build the following .c code based on apptrace’s output:
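What follows is a minimal sketch of zl.c, reconstructed from the trace above (error handling kept to the bare minimum):

#include <stdio.h>
#include <libzfs.h>

/* zpool_iter() callback: print the pool name and release the handle */
static int
print_pool(zpool_handle_t *zhp, void *data)
{
    (void) printf("name: %s\n", zpool_get_name(zhp));
    zpool_close(zhp);
    return (0);
}

int
main(void)
{
    libzfs_handle_t *hdl;

    if ((hdl = libzfs_init()) == NULL)
        return (1);
    /* Same call zpool(1m) makes, as seen in the apptrace output */
    libzfs_print_on_error(hdl, B_TRUE);
    (void) zpool_iter(hdl, print_pool, NULL);
    libzfs_fini(hdl);
    return (0);
}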

Compile and try:


$ gcc zl.c -lzfs -lcryptoutil -m64 -o zl
$ ./zl
name: rpool
name: tank

And voila! If you find other useful uses for apptrace, please leave a comment below ;)

Analyze System Hangs using SCAT

There are already a lot of HOWTOs out there explaining how to use the SCAT tool to analyze Solaris crash dumps;
here, however, I will try to describe how to analyze a system HANG instead of a pure crash.

First of all, when you find a system unresponsive, you must gather a crash dump to allow further analysis of what happened. To collect one, you must of course have a dump device configured beforehand, and when the hang happens, you must force the system to dump. If you still have access or a connected console, you can use one of the following commands to dump:
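From memory (so double-check on your release): savecore -L grabs a dump of the live system without rebooting, reboot -d forces a crash dump while rebooting, and from a kmdb console you can trigger one with the systemdump macro:

# savecore -L
# reboot -d
[0]> $<systemdump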

After the system reboots, the vmdump file gets saved into /var/crash; first, extract that file:

savecore -f /var/crash/vmdump.3 /var/crash/

Then, fire up SCAT on that crash-dump:

# /opt/SUNWscat/bin/scat unix.3 vmcore.3

Oracle Solaris Crash Analysis Tool
Version 5.3 (SV5415, Jan 31 2012) for Oracle Solaris 11 64-bit x64

Copyright © 1989, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle proprietary - DO NOT RE-DISTRIBUTE!

Feedback regarding the tool should be sent to
SolarisCAT-feedback@opensolaris.org. For support, please use the
Oracle Solaris Performance, Hangs, Panics, and Dtrace community on

http://communities.oracle.com/.

opening unix.3 vmcore.3 ...dumphdr...symtab...core...done
loading core data: modules...symbols...CTF...done

core file: vmcore.3
user: Super-User (root:0)
release: 5.11 (64-bit)
version: 11.1
machine: i86pc
node name: .undisclosed.
system type: i86pc
hostid: .undisclosed.
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(64G)
boothowto: 0x2000 (VERBOSE)
dump_uuid: .undisclosed.
time of crash: Wed Mar 20 14:02:30 MDT 2013 (core is 32 days old)
age of system: 14 days 23 hours 3 minutes 39.12 seconds
panic CPU: 18 (24 CPUs, 143G memory)
panic string: forced crash dump initiated at user request

sanity checks: settings...vmem...sysent...clock...misc...lookup failed for symbol freemem_wait: symbol not found

WARNING: 252 severe kstat errors (run "kstat xck")
WARNING: kernelbase 0xffff810000000000, expected 0xfffffd8000000000, resetting
done
CAT(vmcore.3/11X)>

You can indeed see there that the crash dump was forced at user request.

Now, let’s check the output of the analyze command and see what was going on:


CAT(vmcore.3/11X)> analyze

core file: vmcore.3
user: Super-User (root:0)
release: 5.11 (64-bit)
version: 11.1
machine: i86pc
node name: .undisclosed.
system type: i86pc
hostid: .undisclosed.
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(64G)
boothowto: 0x2000 (VERBOSE)
dump_uuid: .undisclosed.
time of crash: Wed Mar 20 14:02:30 MDT 2013 (core is 32 days old)
age of system: 14 days 23 hours 3 minutes 39.12 seconds
panic CPU: 18 (24 CPUs, 143G memory)
panic string: forced crash dump initiated at user request

==== system appears to have been Stop-A'ed ====

==== reporting thread summary ====

reference clock = panic_lbolt: 0x7b46158, panic_hrtime: 0x497cacd510312
3752 threads ran after current tick (1973 user, 1779 kernel)
3752 threads ran since 1 second before current tick (1973 user, 1779 kernel)
3752 threads ran since 1 minute before current tick (1973 user, 1779 kernel)

3 TS_RUN threads (0 user, 3 kernel)
1 TS_STOPPED threads (0 user, 1 kernel)
216 TS_FREE threads (0 user, 216 kernel)
0 !TS_LOAD (swapped) threads

0 threads trying to get a mutex
5* threads trying to get an rwlock (5 user, 0 kernel)
longest sleeping 0 seconds later
3486 threads waiting for a condition variable (1737 user, 1749 kernel)
0 threads sleeping on a semaphore
0 threads sleeping on a user-level sobj
230 threads sleeping on a shuttle (door) (230 user, 0 kernel)

0 threads in biowait()
1* proc with SIGKILL posted (see "tlist killed")
1* threads with procs with SIGKILL posted (1 user, 0 kernel)
4* threads in zio_wait() (4 user, 0 kernel)

3 threads in dispatch queues (0 user, 3 kernel)
1* threads in dispq of CPU running idle thread (0 user, 1 kernel)

3991 total threads in allthreads list (1973 user, 2018 kernel)
9 thread_reapcnt
7 lwp_reapcnt
4007 nthread

==== found thread waiting for rwlock, reporting owner thread ====

thread pri pctcpu idle PID wchan command
0xffffc102271710e0 59 0.000 -518665257d11h26m28.89s 21016 0xffffc1007a432658 /usr/sbin/zfs snapshot -r .undisclosed.
0xffffc1029a5184c0 59 0.000 -518422609d23h19m44.46s 20381 0xffffc1007a432658 zfs list -H -t filesystem -o name,used,available
0xffffc1024ee31c00 59 0.000 -518441533d15h25m10.24s 20524 0xffffc1007a432658 df -h .
0xffffc1029a6d87e0 59 0.000 -518485332d18h58m29.99s 20776 0xffffc1007a432658 df -h .
0xffffc10236741800 59 0.000 -518615477d18h57m37.74s 20871 0xffffc1007a432658 zfs list -H -t filesystem -o name,used,available

5 threads with that sobj found.

==== there are runnable threads, may have a CPU hog ====

==== reporting stopped threads ====

thread pri pctcpu idle PID wchan command
0xfffffffffc038460 96 0.000 -83391d15h40m14.01s 0 0 sched

From the output here, we can see a lot of useful information:

  • 5 threads are waiting on a rwlock. We need to check which commands are actually waiting;
  • The threads waiting on the rwlock are all filesystem-related, which is likely the cause of our hang.

Let’s dig into the rwlock’ed threads and see on what they are waiting:


CAT(vmcore.3/11X)> thread 0xffffc1029a5184c0
==== user (LWP_SYS) thread: 0xffffc1029a5184c0 PID: 20381 ====
cmd: zfs list -H -t filesystem -o name,used,available
fmri: svc:/network/ssh:default
t_wchan: 0xffffc1007a432658 sobj: reader/writer lock owner: 0xfffffffc82695c20
top owner (0xfffffffc82695c20) is waiting for cv 0xfffffffc8269576c

t_procp: 0xffffc101a8bc3118
p_as: 0xffffc10056765028 size: 13086720 RSS: 6676480
hat: 0xffffc100ab5fa948
cpuset:
zone: global
t_stk: 0xfffffffc85af0f10 sp: 0xfffffffc85af07e0 t_stkbase: 0xfffffffc85aec000
t_pri: 59(TS) t_tid: 1 pctcpu: 0.000000
t_lwp: 0xffffc10243526180 lwp_regs: 0xfffffffc85af0f10
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
lookup failed for symbol nsec_scale: symbol not found
ms_state_start: 14 days 23 hours 6 minutes 37.190669074 seconds earlier
lookup failed for symbol nsec_scale: symbol not found
ms_start: 14 days 23 hours 6 minutes 37.190669074 seconds earlier
psrset: 0 last CPU: 8
idle: -4479171350158446 ticks (518422609 days 23 hours 19 minutes 44.46 seconds)
start: Wed Mar 20 13:49:54 2013
age: 756 seconds (12 minutes 36 seconds)
syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0)
tstate: TS_SLEEP - awaiting an event
tflg:tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting

pc: unix:_resume_from_idle+0xf5 resume_return: addq $0x8,%rsp

unix:_resume_from_idle+0xf5 resume_return()
unix:swtch+0x13c()
genunix:turnstile_block+0x6ff()
unix:rw_enter_sleep+0x20d()
zfs:dsl_dir_open_spa+0xdd()
zfs:dsl_dataset_hold+0x44()
zfs:dmu_objset_hold+0x2f()
zfs:zfs_ioc_objset_stats+0x30()
zfs:zfsdev_ioctl+0x1f1()
genunix:cdev_ioctl+0x6e()
specfs:spec_ioctl+0x5d()
genunix:fop_ioctl+0xd6()
genunix:ioctl+0x188()
unix:_sys_sysenter_post_swapgs+0x149()
-- switch to user thread's user stack --

We can see here that this thread (cmd: zfs list -H -t filesystem -o name,used,available) is currently waiting on the rwlock 0xffffc1007a432658, which is owned by thread 0xfffffffc82695c20. That owner thread is itself waiting on the condition variable 0xfffffffc8269576c.

Let’s check what the owner thread is stuck on:


CAT(vmcore.3/11X)> thread 0xfffffffc82695c20
==== kernel thread: 0xfffffffc82695c20 PID: 0 ====
cmd: sched
t_wchan: 0xfffffffc8269576c sobj: condition var (from zfs:dsl_dataset_drain_refs+0xa6)
t_procp: 0xfffffffffc037440(proc_sched)
p_as: 0xfffffffffc039460(kas)
zone: global
t_stk: 0xfffffffc82695c20 sp: 0xfffffffc826956f0 t_stkbase: 0xfffffffc8268e000
t_pri: 60(SYS) pctcpu: 0.002182
t_lwp: 0x0 psrset: 0 last CPU: 0
idle: -4479141744163556 ticks (518419183 days 8 hours 27 minutes 15.56 seconds)
start: Tue Mar 5 14:03:27 2013
age: 1292343 seconds (14 days 22 hours 59 minutes 3 seconds)
tstate: TS_SLEEP - awaiting an event
tflg: T_TALLOCSTK - thread structure allocated from stk
tpflg: none set
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SSYS - system resident process

pc: unix:_resume_from_idle+0xf5 resume_return: addq $0x8,%rsp
startpc: zfs:txg_sync_thread+0x0: pushq %rbp

unix:_resume_from_idle+0xf5 resume_return()
unix:swtch+0x13c()
genunix:cv_wait+0x60()
zfs:dsl_dataset_drain_refs+0xa6()
zfs:dsl_dataset_destroy_sync+0x890()
zfs:dsl_sync_task_group_sync+0xf6()

zfs:dsl_pool_sync+0x20c()
zfs:spa_sync+0x395()
zfs:txg_sync_thread+0x244()
unix:thread_start+0x8()
-- end of kernel thread's stack --

This thread is currently stuck draining references from a dataset destroy that was run earlier.
This is most likely where the system is hanging.
We can now check the list of running processes for an ongoing destroy:


CAT(vmcore.3/11X)> proc|grep destr
0xffffc100947dd050 20372 20273 0 12943360 6717440 18442588409967410656 0 zfs destroy -r .undisclosed.
CAT(vmcore.3/11X)>

We can now open a support case with that basic analysis and upload the crash dump.
We couldn’t match this information to the actual code, but we can point Oracle support as precisely as possible at the actual bug.

Also, the first level of Oracle support sometimes closes cases quickly against an unrelated problem: using SCAT to make a first analysis can help you pinpoint the actual bug you want to get support for!

For the record, this crash dump was analyzed by the support and we dug down to the actual bug, which is:

15810518 ZFS STUCK BEHIND A "ZFS DESTROY" JOB WAITING IN DSL_DATASET_DR

Solaris: Too many files open

Recently, I had to diagnose an issue whose only symptom was an occasional failure of the ‘mailx’ command, which caused the following output to be mailed instead of the expected report from a cron job:

Your “cron” job on <Hostname> /path/to/script | mailx my@email.domain
produced the following output:
/tmp/Rs10293: File exists

At this point, it was unclear to me where the “/tmp/Rs10293” file was coming from, but apparently /tmp contained a lot of these RsXXXXX files.

It quickly became obvious that the command producing those RsXXXXX files was ‘mailx’:

# strings /usr/bin/mailx|grep /tmp/Rs
/tmp/Rs%-ld

Oddly enough, I now knew that the file was created by mailx and that, after a while, mailx would try to create the same file again and keep failing. But how come this file was left behind in /tmp?

I decided to put a wrapper around the mailx command to dump the ‘truss’ output to /var/tmp, and then see what happens:

# mv /usr/bin/mailx /usr/bin/mailx.orig
# cat > /usr/bin/mailx
#!/bin/bash
truss -eafld -vall -wall -rall -xall -u a.out -o /var/tmp/mail.$(date +%Y%m%d_%H%M%S).truss /usr/bin/mailx.orig "$@"
^D
# chmod +x /usr/bin/mailx

After a while, I noticed some very short truss output files and gotcha! They were exiting right after having created the corresponding ‘/tmp/RsXXXXX’ file:

14581/1: 0.3988 openat(0xFFD19553, 0x080865C2, 02402, 0600) = 263
14581/1: 0xFFD19553: AT_FDCWD
14581/1: 0x080865C2: "/tmp/Rs14581"
14581/1: 0.3990 fcntl(263, 1, 0x00000000) = 0
14581/1: 0.3991 openat(0xFFD19553, 0xF0FC7F60, 0, 024) Err#2 ENOENT
14581/1: 0xFFD19553: AT_FDCWD
14581/1: 0xF0FC7F60: "/usr/lib/locale/en_US.UTF-8/LC_MESSAGES/SUNW_OST_OSLIB.mo"
14581/1: 0.3992 write(2, 0x080865C2, 12) = 12
14581/1: 0x080865C2: " / t m p / R s 1 4 5 8 1"
14581/1: 0.3993 write(2, 0xEF558B90, 2) = 2
14581/1: 0xEF558B90: " : "
14581/1: 0.3995 write(2, 0xEF50393A, 19) = 19
14581/1: T o o m a n y o p e n f i l e s
14581/1: 0.3995 write(2, 0xEF558B8C, 1) = 1
14581/1: 0xEF558B8C: "\n"

I also discovered that the utility launching these faulty ‘mailx’ commands was the ‘smartmon-ux’ disk monitoring software.

I confirmed that by reloading smartmon-ux and watching the console:

/dev/rdsk/c0t5000C50055C78D1Bd0s0 polled at Mon Feb 11 06:18:07 2013 Status:Passed (Temperature = 34C 93F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C78D43d0s0 polled at Mon Feb 11 06:18:08 2013 Status:Passed (Temperature = 28C 82F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C791F7d0s0 polled at Mon Feb 11 06:18:08 2013 Status:Passed (Temperature = 28C 82F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C792EFd0s0 polled at Mon Feb 11 06:18:08 2013 Status:Passed (Temperature = 31C 87F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C7982Bd0s0 polled at Mon Feb 11 06:18:09 2013 Status:Passed (Temperature = 36C 96F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C50055C7999Fd0s0 polled at Mon Feb 11 06:18:09 2013 Status:Passed (Temperature = 32C 89F) (Speed: Port0=6.0 G; Port1=6.0 G)
/dev/rdsk/c0t5000C5005A8B8693d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 27C 80F) (Speed: Port0=6.0 G; Port1=<unattached>)
/tmp/Rs16850: Too many open files
Device on /dev/rdsk/c2t0d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c2t0d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 255C 491F)
/tmp/Rs16853: Too many open files
Device on /dev/rdsk/c3t1d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c3t1d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 255C 491F)
/tmp/Rs16856: Too many open files
Device on /dev/rdsk/c6t6d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c6t6d0s0 polled at Mon Feb 11 06:18:10 2013 Status:Passed (Temperature = 255C 491F)
/tmp/Rs16859: Too many open files
Device on /dev/rdsk/c7t7d0s0, Thermal alert. Temperature now at 255C 491F degrees.
/dev/rdsk/c7t7d0s0 polled at Mon Feb 11 06:18:11 2013 Status:Passed (Temperature = 255C 491F)

Right, now we know the “Who”, but why?

The syscall returned a ‘Too many open files’ error, which should have been impossible as per our /etc/system:

set rlim_fd_max=900000
set rlim_fd_cur=32768

And checking the current per-process limit:

# plimit $$|grep nofiles
nofiles(descriptors) 32768 900000

So we can open 32k files! How come smartmon-ux/mailx is complaining?

# pgrep smartmon-ux
16692
# ls -l /proc/16692/fd/*|wc -l
261

So we’re clearly not reaching the 32k file descriptor limit! I then found this blog post from 2006.

Let’s confirm that by compiling this little .c file:
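The listing isn’t reproduced here, so this is a plausible reconstruction: fopen() files in ./out until it fails, then report the count and errno:

#include <stdio.h>
#include <string.h>
#include <errno.h>

int
main(void)
{
    char path[64];
    int i;

    /* Open streams in ./out until stdio gives up */
    for (i = 0; ; i++) {
        (void) snprintf(path, sizeof (path), "out/file%d", i);
        if (fopen(path, "w") == NULL) {
            (void) printf("** Number of open files = %d. fopen() failed with error: %s\n",
                i, strerror(errno));
            return (1);
        }
    }
}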

# gcc -o files files.c
# mkdir out
# ./files
** Number of open files = 253. fopen() failed with error: Too many open files
# ls out|wc -l
254

ARGHL… ?

Then comes the funny part:

# gcc -o files files.c -m64
# ./files
** Number of open files = 32765. fopen() failed with error: Too many open files

Okay, this is a 32-bit vs 64-bit problem: in the 32-bit libc, the stdio FILE structure stores the file descriptor in an 8-bit field, so stdio streams are limited to descriptors below 256, whatever the process limits say. Googling again takes us to the /usr/lib/extendedFILE.so.1 library, which overcomes this limitation:

# gcc -o files files.c
# file files
files: ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
# LD_PRELOAD=/usr/lib/extendedFILE.so.1 ./files
** Number of open files = 32353. fopen() failed with error: Not enough space

And so our problem is fixed! Now let’s slightly modify the smartmon-ux start script to work around the problem:
edit /etc/init.d/smartmon-ux.d and change the line

/etc/smartmon-ux -E -G 45 -link -F 1200 -P -sq -M smartmon-alert@domain.tld

to:

LD_PRELOAD=/usr/lib/secure/extendedFILE.so.1 /etc/smartmon-ux -E -G 45 -link -F 1200 -P -sq -M smartmon-alert@domain.tld

Also, symlink extendedFILE.so.1 into the /usr/lib/secure directory to avoid the system complaining (mailx is setgid!):

# cd /usr/lib/secure ; ln -s /usr/lib/extendedFILE.so.1

Restart smartmon-ux and check that it is using the library:

# /etc/init.d/smartmon-ux.d restart
# pgrep smartmon-ux
18319
# ls -al /proc/18319/path/|grep extendedFILE
lrwxrwxrwx 1 root root 0 2013-02-11 06:43 zfs.124.65538.134236 -> /usr/lib/extendedFILE.so.1

Then I began thinking: what would be the proper and definitive fix for this problem? Would compiling smartmon-ux in 64-bit be the fix?

Let’s try with our little .c file slightly modified:
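Again a reconstruction rather than the original listing; my assumption is that files2.c mimics smartmon-ux: leak a pile of raw descriptors with open(2), then spawn mailx, which inherits them and whose 32-bit stdio then fails:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    char path[64];
    int i;

    /* Leak ~300 descriptors with open(2), as smartmon-ux appears to do */
    for (i = 0; i < 300; i++) {
        (void) snprintf(path, sizeof (path), "out/file%d", i);
        if (open(path, O_WRONLY | O_CREAT, 0600) < 0)
            break;
    }
    /* mailx inherits the descriptors; being 32-bit, its fopen() fails */
    return (system("echo test | mailx root"));
}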

# gcc -o files2 files2.c
# ./files2
/tmp/Rs20586: Too many open files
# gcc -o files2_64 files2.c -m64
# ./files2_64
/tmp/Rs20637: Too many open files

Conclusions:

  • Smartmon-UX is leaking file descriptors to the mailx process
  • Compiling Smartmon-UX in 64-bit wouldn’t help
  • mailx should be fixed to clean up its /tmp/RsXXXXX files before exiting in error
  • The extendedFILE.so.1 preload can be used as a workaround
  • The final fix should be fixing the smartmon-ux file descriptor leak

NOTE: I filed a support case with this, I will let you know the outcome…

Migration to wordpress

I needed to refurbish this blog a little, and it finally happened.

I’ve migrated the software behind the blog to WordPress and imported the posts and categories.

However, some posts still have discrepancies and aren’t displaying properly; this will be fixed soon.

SNCB Data Leak: Let's Geolocate the Entries