Solaris: Tracing your application

Every Solaris system administrator already know the truss utility, which allow you to trace the system calls as your application is running.

I recently discovered another tool which I found to be really helpful when you want to know what an application is really doing: apptrace.

Apptrace is tracing the _function_ calls instead of the _system_ calls: You can then see which library and function your program is using!

This comes very handy in your daily sysadmin life, when you have to port a software or simply check what function of which library that software is using!

Here is a simple example of this utility against the “hostid” binary:

# apptrace hostid
-> hostid -> libc.so.1:int atexit(int (*)() = 0xeef93ee4)
<- hostid -> libc.so.1:atexit()
-> hostid -> libc.so.1:int atexit(int (*)() = 0x8050bac)
<- hostid -> libc.so.1:atexit()
-> hostid -> libc.so.1:void __fpstart(void)
<- hostid -> libc.so.1:__fpstart() = 0xeebfd484
-> hostid -> libc.so.1:long gethostid(void)
<- hostid -> libc.so.1:gethostid() = 0xcfabcc
-> hostid -> libc.so.1:int printf(const char * = 0x8050bcc "%08lx
", void * = 0xcfabcc, ...)
00cfabcc
<- hostid -> libc.so.1:printf() = 0x9
-> hostid -> libc.so.1:exit(0x0, 0xeec00088, 0xf10c8c04) ** NR

Have you ever wondered “How does that application is getting that system information?”, apptrace is there to help answering.
For the purpose of this post, I was simply wondering how zpool list was using libzfs.so to get the list of existing system pools, here is below the output of “apptrace zpool list”. I’ve removed useless output for clarity:


-> zpool -> libzfs.so.1:libzfs_handle_t * libzfs_init(void)
<- zpool -> libzfs.so.1:libzfs_init() = 0x851b648
-> zpool -> libzfs.so.1:void libzfs_print_on_error(libzfs_handle_t * = 0x851b648, boolean_t = 0x1)
<- zpool -> libzfs.so.1:libzfs_print_on_error() = 0x851b648
...[SNIPPED]...
-> zpool -> libzfs.so.1:int zpool_iter(libzfs_handle_t * = 0x851b648, zpool_iter_f = 0x805d608, void * = 0x8516ec8)
...[SNIPPED]...
-> zpool -> libzfs.so.1:const char * zpool_get_name(zpool_handle_t * = 0x8536e08)
<- zpool -> libzfs.so.1:zpool_get_name() = 0x8536e10
-> zpool -> libzfs.so.1:const char * zpool_get_name(zpool_handle_t * = 0x8536cc8)
<- zpool -> libzfs.so.1:zpool_get_name() = 0x8536cd0
...[SNIPPED]...

We can build the following .c code based on apptrace’s output:

Compile and try:


$ gcc zl.c -lzfs -lcryptoutil -m64 -o zl
$ ./zl
name: rpool
name: tank

And voila! If you find other useful usage for apptrace, please leave a comment below 😉

Analyze System Hangs using SCAT

There are already a lot of Howto’s out there explaining how to use the SCAT tool to analyze some Solaris Crash-dumps;
Although, I will try here to describe how to analyze a HANG of a system instead of a pure Crash.

First of all, when you find a system unresponsive, you must gather a crash-dump to allow further analysis of what happened; To allow crash-dump collection, you must of course have a dump device configured in beforehand and when the crash happen, you must force the system to dump. If you still have access or a connected console, you might use one of the following command to dump:

After the reboot of the system, the vmdump file will get saved into /var/crash, first extract that file:

savecore -f /var/crash/vmdump.3 /var/crash/

Then, fire up SCAT on that crash-dump:

# /opt/SUNWscat/bin/scat unix.3 vmcore.3

Oracle Solaris Crash Analysis Tool
Version 5.3 (SV5415, Jan 31 2012) for Oracle Solaris 11 64-bit x64

Copyright � 1989, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle proprietary - DO NOT RE-DISTRIBUTE!

Feedback regarding the tool should be sent to
SolarisCAT-feedback@opensolaris.org. For support, please use the
Oracle Solaris Performance, Hangs, Panics, and Dtrace community on
http://communities.oracle.com/.

opening unix.3 vmcore.3 ...dumphdr...symtab...core...done
loading core data: modules...symbols...CTF...done

core file: vmcore.3
user: Super-User (root:0)
release: 5.11 (64-bit)
version: 11.1
machine: i86pc
node name: .undisclosed.
system type: i86pc
hostid: .undisclosed.
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(64G)
boothowto: 0x2000 (VERBOSE)
dump_uuid: .undisclosed.
time of crash: Wed Mar 20 14:02:30 MDT 2013 (core is 32 days old)
age of system: 14 days 23 hours 3 minutes 39.12 seconds
panic CPU: 18 (24 CPUs, 143G memory)
panic string: forced crash dump initiated at user request

sanity checks: settings...vmem...sysent...clock...misc...lookup failed for symbol freemem_wait: symbol not found

WARNING: 252 severe kstat errors (run "kstat xck")
WARNING: kernelbase 0xffff810000000000, expected 0xfffffd8000000000, resetting
done
CAT(vmcore.3/11X)>

You can indeed see there that the crash-dump has been forced at user request.

Now, let’s check the output of analyze command and check what was going on:


CAT(vmcore.3/11X)> analyze

core file: vmcore.3
user: Super-User (root:0)
release: 5.11 (64-bit)
version: 11.1
machine: i86pc
node name: .undisclosed.
system type: i86pc
hostid: .undisclosed.
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(64G)
boothowto: 0x2000 (VERBOSE)
dump_uuid: .undisclosed.
time of crash: Wed Mar 20 14:02:30 MDT 2013 (core is 32 days old)
age of system: 14 days 23 hours 3 minutes 39.12 seconds
panic CPU: 18 (24 CPUs, 143G memory)
panic string: forced crash dump initiated at user request

==== system appears to have been Stop-A'ed ====

==== reporting thread summary ====

reference clock = panic_lbolt: 0x7b46158, panic_hrtime: 0x497cacd510312
3752 threads ran after current tick (1973 user, 1779 kernel)
3752 threads ran since 1 second before current tick (1973 user, 1779 kernel)
3752 threads ran since 1 minute before current tick (1973 user, 1779 kernel)

3 TS_RUN threads (0 user, 3 kernel)
1 TS_STOPPED threads (0 user, 1 kernel)
216 TS_FREE threads (0 user, 216 kernel)
0 !TS_LOAD (swapped) threads

0 threads trying to get a mutex
5* threads trying to get an rwlock (5 user, 0 kernel)
longest sleeping 0 seconds later
3486 threads waiting for a condition variable (1737 user, 1749 kernel)
0 threads sleeping on a semaphore
0 threads sleeping on a user-level sobj
230 threads sleeping on a shuttle (door) (230 user, 0 kernel)

0 threads in biowait()
1* proc with SIGKILL posted (see "tlist killed")
1* threads with procs with SIGKILL posted (1 user, 0 kernel)
4* threads in zio_wait() (4 user, 0 kernel)

3 threads in dispatch queues (0 user, 3 kernel)
1* threads in dispq of CPU running idle thread (0 user, 1 kernel)

3991 total threads in allthreads list (1973 user, 2018 kernel)
9 thread_reapcnt
7 lwp_reapcnt
4007 nthread

==== found thread waiting for rwlock, reporting owner thread ====

thread pri pctcpu idle PID wchan command
0xffffc102271710e0 59 0.000 -518665257d11h26m28.89s 21016 0xffffc1007a432658 /usr/sbin/zfs snapshot -r .undisclosed.
0xffffc1029a5184c0 59 0.000 -518422609d23h19m44.46s 20381 0xffffc1007a432658 zfs list -H -t filesystem -o name,used,available
0xffffc1024ee31c00 59 0.000 -518441533d15h25m10.24s 20524 0xffffc1007a432658 df -h .
0xffffc1029a6d87e0 59 0.000 -518485332d18h58m29.99s 20776 0xffffc1007a432658 df -h .
0xffffc10236741800 59 0.000 -518615477d18h57m37.74s 20871 0xffffc1007a432658 zfs list -H -t filesystem -o name,used,available

5 threads with that sobj found.

==== there are runnable threads, may have a CPU hog ====

==== reporting stopped threads ====

thread pri pctcpu idle PID wchan command
0xfffffffffc038460 96 0.000 -83391d15h40m14.01s 0 0 sched

From the output here, we can see a lot of useful information:

  • 5 Threads are waiting on a rwlock. We need to check whose command are actually waiting;
  • The thread’s waiting on a rwlock are all filesystem based, which is likely the cause of our hang;

Let’s dig into the rwlock’ed threads and see on what they are waiting:


CAT(vmcore.3/11X)> thread 0xffffc1029a5184c0
==== user (LWP_SYS) thread: 0xffffc1029a5184c0 PID: 20381 ====
cmd: zfs list -H -t filesystem -o name,used,available
fmri: svc:/network/ssh:default
t_wchan: 0xffffc1007a432658 sobj: reader/writer lock owner: 0xfffffffc82695c20
top owner (0xfffffffc82695c20) is waiting for cv 0xfffffffc8269576c

t_procp: 0xffffc101a8bc3118
p_as: 0xffffc10056765028 size: 13086720 RSS: 6676480
hat: 0xffffc100ab5fa948
cpuset:
zone: global
t_stk: 0xfffffffc85af0f10 sp: 0xfffffffc85af07e0 t_stkbase: 0xfffffffc85aec000
t_pri: 59(TS) t_tid: 1 pctcpu: 0.000000
t_lwp: 0xffffc10243526180 lwp_regs: 0xfffffffc85af0f10
mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
lookup failed for symbol nsec_scale: symbol not found
ms_state_start: 14 days 23 hours 6 minutes 37.190669074 seconds earlier
lookup failed for symbol nsec_scale: symbol not found
ms_start: 14 days 23 hours 6 minutes 37.190669074 seconds earlier
psrset: 0 last CPU: 8
idle: -4479171350158446 ticks (518422609 days 23 hours 19 minutes 44.46 seconds)
start: Wed Mar 20 13:49:54 2013
age: 756 seconds (12 minutes 36 seconds)
syscall: #54 ioctl(, 0x0) (sysent: genunix:ioctl+0x0)
tstate: TS_SLEEP - awaiting an event
tflg:tpflg: TP_TWAIT - wait to be freed by lwp_wait
TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SMSACCT - process is keeping micro-state accounting
SMSFORK - child inherits micro-state accounting

pc: unix:_resume_from_idle+0xf5 resume_return: addq $0x8,%rsp

unix:_resume_from_idle+0xf5 resume_return()
unix:swtch+0x13c()
genunix:turnstile_block+0x6ff()
unix:rw_enter_sleep+0x20d()
zfs:dsl_dir_open_spa+0xdd()
zfs:dsl_dataset_hold+0x44()
zfs:dmu_objset_hold+0x2f()
zfs:zfs_ioc_objset_stats+0x30()
zfs:zfsdev_ioctl+0x1f1()
genunix:cdev_ioctl+0x6e()
specfs:spec_ioctl+0x5d()
genunix:fop_ioctl+0xd6()
genunix:ioctl+0x188()
unix:_sys_sysenter_post_swapgs+0x149()
-- switch to user thread's user stack --

We can see here that this thread (cmd: zfs list -H -t filesystem -o name,used,available), is currently waiting over the rwlock 0xffffc1007a432658 which is owned by the thread 0xfffffffc82695c20. This rwlock owner thread is also waiting on a conditional variable 0xfffffffc8269576c.

Let’s check what the owner thread is stuck on:


CAT(vmcore.3/11X)> thread 0xfffffffc82695c20
==== kernel thread: 0xfffffffc82695c20 PID: 0 ====
cmd: sched
t_wchan: 0xfffffffc8269576c sobj: condition var (from zfs:dsl_dataset_drain_refs+0xa6)
t_procp: 0xfffffffffc037440(proc_sched)
p_as: 0xfffffffffc039460(kas)
zone: global
t_stk: 0xfffffffc82695c20 sp: 0xfffffffc826956f0 t_stkbase: 0xfffffffc8268e000
t_pri: 60(SYS) pctcpu: 0.002182
t_lwp: 0x0 psrset: 0 last CPU: 0
idle: -4479141744163556 ticks (518419183 days 8 hours 27 minutes 15.56 seconds)
start: Tue Mar 5 14:03:27 2013
age: 1292343 seconds (14 days 22 hours 59 minutes 3 seconds)
tstate: TS_SLEEP - awaiting an event
tflg: T_TALLOCSTK - thread structure allocated from stk
tpflg: none set
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SSYS - system resident process

pc: unix:_resume_from_idle+0xf5 resume_return: addq $0x8,%rsp
startpc: zfs:txg_sync_thread+0x0: pushq %rbp

unix:_resume_from_idle+0xf5 resume_return()
unix:swtch+0x13c()
genunix:cv_wait+0x60()
zfs:dsl_dataset_drain_refs+0xa6()
zfs:dsl_dataset_destroy_sync+0x890()
zfs:dsl_sync_task_group_sync+0xf6()

zfs:dsl_pool_sync+0x20c()
zfs:spa_sync+0x395()
zfs:txg_sync_thread+0x244()
unix:thread_start+0x8()
-- end of kernel thread's stack --

This thread is currently stuck on draining some references from a dataset destroy previously run.
This is most likely where the system is hanging.
We can now check the list of the running process for an on-going destroy:


CAT(vmcore.3/11X)> proc|grep destr
0xffffc100947dd050 20372 20273 0 12943360 6717440 18442588409967410656 0 zfs destroy -r .undisclosed.
CAT(vmcore.3/11X)>

We can now open a support case with that basic analysis and upload the core-dump.
We couldn’t match the actual code with that information, but we can as much as possible direct the Oracle’s support to the actual bug.

As well, sometimes the first level of Oracle support are quickly closing the cases with an unrelated problem: using SCAT to make a first analysis can help you to pin point the actual bug you want to get support for!

For the record, this crash-dump has been analyzed by the support and we dig to the actual bug which is:

15810518 ZFS STUCK BEHIND A "ZFS DESTROY" JOB WAITING IN DSL_DATASET_DR