SSH connection to Solaris 11 is sometimes slow…

Today at work, we migrated the first box to Solaris 11, and we hit the first bug as soon as we needed to log in to the server.

As these delays are quite common with the default sshd configuration, I quickly added these lines to rule out the usual GSSAPI and DNS issues:

/etc/ssh/sshd_config

LookupClientHostnames no
VerifyReverseMapping no
GSSAPIAuthentication no

However, these settings didn’t fix the problem.

I added some verbosity to both the ssh client and the server, and tracked the delay down to this stage of the connection:

On the client:

$ ssh -v -p 2222 s11box -l adminifm
 OpenSSH_5.8p1-hpn13v10lpk, OpenSSL 1.0.0c 2 Dec 2010
 debug1: Reading configuration data /home/wildcat/.ssh/config
 debug1: Reading configuration data /etc/ssh/ssh_config
 debug1: Connecting to admblockum04 10.2.12.155 port 2222.
 debug1: Connection established.
 debug1: identity file /home/wildcat/.ssh/id_rsa type -1
 debug1: identity file /home/wildcat/.ssh/id_rsa-cert type -1
 debug1: identity file /home/wildcat/.ssh/id_dsa type 2
 debug1: identity file /home/wildcat/.ssh/id_dsa-cert type -1
 debug1: identity file /home/wildcat/.ssh/id_ecdsa type -1
 debug1: identity file /home/wildcat/.ssh/id_ecdsa-cert type -1
 debug1: Remote protocol version 2.0, remote software version Sun_SSH_2.0
 debug1: no match: Sun_SSH_2.0
 debug1: Enabling compatibility mode for protocol 2.0
 debug1: Local version string SSH-2.0-OpenSSH_5.8p1-hpn13v10lpk
 debug1: SSH2_MSG_KEXINIT sent
 *** HANG ***

And on the server:

 debug1: Reloading X.509 host keys to avoid PKCS#11 fork issues.
 monitor debug1: reading the context from the child
 debug1: use_engine is 'no'
 debug1: list_hostkey_types: ssh-rsa,ssh-dss
 debug1: My KEX proposal before adding the GSS KEX algorithm:
 debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
 debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour128,arcfour256,arcfour
 debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour128,arcfour256,arcfour
 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-sha1-96,hmac-md5-96
 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-sha1-96,hmac-md5-96
 debug2: kex_parse_kexinit: none,zlib
 debug2: kex_parse_kexinit: none,zlib
 debug2: kex_parse_kexinit: de-DE,en-US,es-ES,fr-FR,it-IT,ja-JP,ko-KR,pt-BR,zh-CN,zh-TW,i-default
 debug2: kex_parse_kexinit: de-DE,en-US,es-ES,fr-FR,it-IT,ja-JP,ko-KR,pt-BR,zh-CN,zh-TW,i-default
 debug2: kex_parse_kexinit: first_kex_follows 0
 debug2: kex_parse_kexinit: reserved 0
 *** HANG ***

Running truss against the server process helps a lot:

  17477:  so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, 0, SOV_DEFAULT) = 3
  17469:  pollsys(0x080451C0, 1, 0x00000000, 0x00000000) (sleeping...)
  17477:  connect(3, 0x08047030, 16, SOV_DEFAULT) (sleeping...)

The hang happens just after the connect() syscall. We can now check the pfiles output for this process together with netstat to identify which connection is failing to be established:

  root@admblockum04:/local/home/ucc# pfiles 16585
 16585:  /usr/lib/ssh/sshd -f /etc/ssh/sshd_config -p 2222 -ddd -D
   Current rlimit: 256 file descriptors
    0: S_IFCHR mode:0620 dev:532,0 ino:3502445063 uid:60004 gid:7 rdev:221,8
       O_RDWR|O_NOCTTY|O_LARGEFILE
       /dev/pts/8
       offset:43267
    1: S_IFCHR mode:0620 dev:532,0 ino:3502445063 uid:60004 gid:7 rdev:221,8
       O_RDWR|O_NOCTTY|O_LARGEFILE
       /dev/pts/8
       offset:43267
    2: S_IFCHR mode:0620 dev:532,0 ino:3502445063 uid:60004 gid:7 rdev:221,8
       O_RDWR|O_NOCTTY|O_LARGEFILE
       /dev/pts/8
       offset:43267
    3: S_IFSOCK mode:0666 dev:540,0 ino:34566 uid:0 gid:0 size:0
       O_RDWR
         SOCK_STREAM
         SO_SNDBUF(49152),SO_RCVBUF(131072)
         sockname: AF_INET 127.0.0.1  port: 55867
         congestion control: newreno
    4: S_IFSOCK mode:0666 dev:540,0 ino:5714 uid:0 gid:0 size:0
       O_RDWR|O_NONBLOCK
         SOCK_STREAM
         SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(128872)
         sockname: AF_INET6 ::ffff:10.2.12.155  port: 2222
         peername: AF_INET6 ::ffff:10.2.60.1  port: 43575
         congestion control: newreno
    5: S_IFIFO mode:0000 dev:529,0 ino:70783 uid:0 gid:0 size:0
       O_RDWR
    6: S_IFIFO mode:0000 dev:529,0 ino:70783 uid:0 gid:0 size:0
       O_RDWR
    8: S_IFIFO mode:0000 dev:529,0 ino:70784 uid:0 gid:0 size:0
       O_RDWR FD_CLOEXEC
 # netstat -an|grep 55867
 127.0.0.1.55867      127.0.0.1.30003          0      0 131072      0 SYN_SENT

Port 30003 is the default port of the tcsd daemon, which manages hardware cryptography (through /dev/tpm). If there is no hardware crypto device, this daemon is disabled. It seems, though, that cryptoadm links the TPM crypto mechanism by default, which leads ssh to try to reach this daemon.
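To check this kind of symptom without reading truss output, a connect attempt with a timeout is enough. The following is only a minimal Python sketch, not something used during the original investigation: the function name `probe_tcp` and the 3-second timeout are assumptions chosen for illustration. A result of "timeout" corresponds to the SYN_SENT state seen in the netstat output above.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connect attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered with a reset: nothing listens, but nothing hangs.
        return "refused"
    except socket.timeout:
        # Neither an accept nor a reset arrived in time, which is the
        # behaviour that makes a blocking connect() stall.
        return "timeout"


if __name__ == "__main__":
    # 30003 is the tcsd port discussed above.
    print(probe_tcp("127.0.0.1", 30003))
```

A "refused" answer would be harmless for sshd, since the connect() would fail immediately instead of sleeping.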

Workaround found (just to confirm the slowness is caused by tcsd):

Run this command on the server:

 # nc -e 'cat /dev/null' localhost 30003

Then try to ssh to the box; it should be fast.

Permanent workaround:

Simply remove the pkcs11_tpm provider from the crypto framework:

cryptoadm disable provider=/usr/lib/security/\$ISA/pkcs11_tpm.so mechanism=all
cryptoadm uninstall provider=/usr/lib/security/\$ISA/pkcs11_tpm.so


Posted in Solaris

One Liner: Ifstat

Recently I tried ifstat on a freshly updated Linux box with a 3.2 kernel. It reported nothing (0 KB), although the network was heavily used.

A quick one-liner confirmed this and I decided to keep it here for later use:

export IFACE=eth0 ; while(:); do cat /proc/net/dev;sleep 1;done|nawk -W interactive -v iface=${IFACE} 'BEGIN{ cnt = 0 } $1 ~ iface { if (cnt) { print iface ":",(($2 - cnt)/1024),"Kbytes/sec"; } cnt=$2; }'
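For reference, the same calculation can be sketched in Python. This is a hedged equivalent of the one-liner above, not a replacement for ifstat: the helper names `rx_bytes` and `rate_kbytes` are made up for this example, and it assumes the usual /proc/net/dev layout where the first counter after the interface name is the received-bytes total.

```python
import time


def rx_bytes(proc_net_dev: str, iface: str) -> int:
    """Return the received-bytes counter of iface from /proc/net/dev text."""
    for line in proc_net_dev.splitlines():
        name, sep, counters = line.partition(":")
        # Header lines contain no colon, so they are skipped here.
        if sep and name.strip() == iface:
            return int(counters.split()[0])
    raise KeyError(iface)


def rate_kbytes(prev: int, curr: int, interval: float = 1.0) -> float:
    """Convert two counter samples into Kbytes per second."""
    return (curr - prev) / 1024 / interval


if __name__ == "__main__":
    iface = "eth0"
    previous = None
    while True:
        with open("/proc/net/dev") as handle:
            current = rx_bytes(handle.read(), iface)
        if previous is not None:
            print(f"{iface}: {rate_kbytes(previous, current):.1f} Kbytes/sec")
        previous = current
        time.sleep(1)
```

Splitting on the colon rather than on whitespace also copes with layouts where the interface name and the first counter are not separated by a space.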

Posted in Uncategorized

How to upgrade a Solaris server to a particular patching level

Situation

Today at work, I needed to prepare the patching of a two-node cluster. Not only did I have to patch this cluster, I also had to match the patching level of the production server currently in use. Well, well. Instead of diffing the showrev outputs and trying to sort out which patch is installed on this node but not on the other, and so on, I tried to define a new way of doing this kind of thing. I’ll describe my method here; don’t hesitate to comment, suggest or criticize 🙂

The Idea

The idea is to use PCA to achieve everything. As you may (or may not) know, PCA is based upon patchdiag.xref files, which are published by Sun/Oracle once a day.

They contain the list of the latest released patches as well as their dependencies. The idea of my solution is to generate a patchdiag.xref based on the patching level I have to match. I can then use PCA together with this patchdiag.xref on the two nodes I need to patch. Child’s play!

WeSunSolve to the rescue

Again, you may (or may not) know that WeSunSolve allows you to register and use the Panel as a small server dashboard. You can enter server names and link patching levels to them. This is done with a simple “showrev -p” output that you paste on the website to add a patch level to a server.

Once you have two (or more) servers, you can use the newly added Server Upgrade feature. There you choose two patching levels to generate a patchdiag.xref file:

  • Source patching level: the patch level of the server you need to patch.
  • Destination patching level: the patch level of the server you want to mimic.
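To make the source/destination comparison concrete, here is a rough Python sketch of the underlying idea. It is not how WeSunSolve builds its patchdiag.xref, and it ignores obsoleted and required patches entirely. The function names are hypothetical, and the only thing assumed about “showrev -p” output is that each line starts with `Patch: NNNNNN-RR`.

```python
import re

# Matches the patch ID and revision at the start of a "showrev -p" line.
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b", re.MULTILINE)


def patch_levels(showrev_output: str) -> dict:
    """Map each patch ID to the highest revision listed in the output."""
    levels = {}
    for patch_id, rev in PATCH_RE.findall(showrev_output):
        levels[patch_id] = max(levels.get(patch_id, -1), int(rev))
    return levels


def upgrade_plan(source: str, destination: str) -> dict:
    """Return {patch_id: (installed_rev_or_None, target_rev)} for every
    patch where the source server is behind the destination server."""
    src = patch_levels(source)
    dst = patch_levels(destination)
    return {
        patch_id: (src.get(patch_id), target)
        for patch_id, target in dst.items()
        if src.get(patch_id, -1) < target
    }


if __name__ == "__main__":
    # Hypothetical file names holding the two pasted "showrev -p" outputs.
    with open("source.showrev") as s, open("destination.showrev") as d:
        for patch_id, (have, want) in sorted(upgrade_plan(s.read(), d.read()).items()):
            print(patch_id, have, "<", want)
```

Patches that exist only on the source server are deliberately left out, since the goal is to reach the destination level, not to remove anything.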

Use PCA, and voila!

The next step is known to you all! Just use pca together with the patchdiag.xref file and look at the output:

 $ ./pca -X . -f explorer.XXXXXXXXX.YYYYYYYYY-2011.12.17.02.00 -l
Using /home/wildcat/YYYYYYYYYYY/./patchdiag.xref from Feb/03/12
Host: XXXXXXXXXXX (SunOS 5.9/Generic_122300-05/sparc/sun4u)
List: missing (133/210205)

Patch  IR   CR RSB Age Synopsis
------ -- - -- --- --- --------
112951 13 < 14 RS- 999 SunOS 5.9: patchadd and patchrm Patch
111711 16 < 18 R-- 999 SunOS 5.9: 32-bit Shared library patch for C++
111712 16 < 18 R-- 999 SunOS 5.9: 64-Bit Shared library patch for C++
**SNIPPED**

Posted in WeSunSolve

sfpC.pl – The Solaris Fingerprint Companion revived!

A while ago, I found the sfpC.pl script from Glenn Brunette, which was used to check the fingerprint of any Solaris file against the SunSolve Fingerprint Database.

I’ve managed to port this script to work against the WeSunSolve fingerprint database, as the old SunSolve one is no longer available.

You can find the modified version there: sfpC.pl

Usage Example:

 # uname -a
 SunOS xxxxxxx 5.10 Generic_147440-01 sun4u sparc SUNW,SPARC-Enterprise
 # find /usr -type f \( -perm -2000 -o -perm -4000 \) -exec digest -a md5 {} \; | tee /tmp/md5.list
 70888c55597129ae8f7143567c7dd2f1
 b437cf99006a0d22287f8b26938a90db
 378b83c6d55b44d20a1f6902612a1d3d
 d2acd1dd698e218a62aae3567336b4e1
 055bc51ba15da6d5c15c7f7e26d538ea
 12817b8e863451f40020506f08aa1bb8
 70db5d696f6be203e48d23069eb48254
 d84db7394e336deccece59b3db01fdee
 2483960ec8dc844e145da663e73e2e39
 d768cd6855ee5f02b7ca2c330060abfb
 fde68b10c193c721d7d7ad7265db1618
 93dec393cbebd965888b104e2a7c6d95
 d01bc4ef4a9b807b5586ebc3f61634dd
 fa6b38a551be90d69a151fbdc0ac427e
 f28a9328566be6d949ba1c7ef3bab5e3
 fa27c3be194e4a29967a385c2df1d36b
 bf2d9b6392045af566186a0de83f3975
 **SNIPPED**
 ~/sfpC/sfpC-v0.6 $ ./sfpC.pl md5.list
 -> 70888c55597129ae8f7143567c7dd2f1 found in 2 OS Releases and 1 patches.
    path: /usr/bin/at
    md5: 70888c55597129ae8f7143567c7dd2f1
    sha1: ce12021b0694dfe8d58c74f45b7988d87250dfb7
    size: 41344
    associated package: SUNWcsu
    associated solaris releases:
      > Solaris 10 (Update 10) 8/11 (sparc)
      > Solaris 10 (Update 9) 9/10 (sparc)
    associated solaris patches:
      > 142909-17
 -> b437cf99006a0d22287f8b26938a90db found in 2 OS Releases and 1 patches.
    path: /usr/bin/atq
    md5: b437cf99006a0d22287f8b26938a90db
    sha1: 5568caebc18d066edf1d4d6a8888c91b521a19fa
    size: 19064
    associated package: SUNWcsu
    associated solaris releases:
      > Solaris 10 (Update 10) 8/11 (sparc)
      > Solaris 10 (Update 9) 9/10 (sparc)
    associated solaris patches:
      > 142909-17
 -> 378b83c6d55b44d20a1f6902612a1d3d found in 2 OS Releases and 1 patches.
    path: /usr/bin/atrm
    md5: 378b83c6d55b44d20a1f6902612a1d3d
    sha1: 8ff96329c9e4f99a9288115fb8acc7cfa089ce10
    size: 19016
    associated package: SUNWcsu
    associated solaris releases:
      > Solaris 10 (Update 10) 8/11 (sparc)
      > Solaris 10 (Update 9) 9/10 (sparc)
    associated solaris patches:
      > 142909-17
 -> d2acd1dd698e218a62aae3567336b4e1 found in 2 OS Releases and 1 patches.
    path: /usr/bin/crontab
    md5: d2acd1dd698e218a62aae3567336b4e1
    sha1: 45c6aa5803d3f4f02d68076eb99bbcc508511d59
    size: 20336
    associated package: SUNWcsu
    associated solaris releases:
      > Solaris 10 (Update 10) 8/11 (sparc)
      > Solaris 10 (Update 9) 9/10 (sparc)
    associated solaris patches:
      > 142909-17
 -> 055bc51ba15da6d5c15c7f7e26d538ea found in 11 OS Releases and 0 patches.
 **SNIPPED**
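
On a box where digest is not available, the md5 list can be produced with a few lines of Python. This is only a sketch of the find/digest step, not part of sfpC.pl itself; the helper names `is_setid`, `md5_of` and `setid_md5s` are invented for this example.

```python
import hashlib
import os
import stat


def is_setid(mode: int) -> bool:
    """True for regular files carrying the setuid or setgid bit."""
    return stat.S_ISREG(mode) and bool(mode & (stat.S_ISUID | stat.S_ISGID))


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def setid_md5s(root: str):
    """Yield (path, md5) for every setuid/setgid regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if is_setid(os.lstat(path).st_mode):
                    yield path, md5_of(path)
            except OSError:
                continue  # unreadable or vanished file: skip it


if __name__ == "__main__":
    # One hash per line, the same shape as the md5.list shown above.
    for _path, checksum in setid_md5s("/usr"):
        print(checksum)
```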
Posted in WeSunSolve

My Oracle Support: Authenticate with CLI

I had been wondering for some time how I could authenticate on MOS from the CLI. Now I have my answer: by reverse-engineering the whole SSO auth process in a Python script that generates a cookies.txt file usable with wget.

I know that a simpler method exists using wget directly, but as it doesn’t work with every MOS page, I prefer a more generic way of doing things.

Here is a quick how-to:

  1. Edit MOSLogin.py and set up the variables inside.
  2. Install the Python dependencies (Linux/Debian):
     apt-get install python-pip
     pip install BeautifulSoup
     pip install requests
  3. Run the script:
     $ ./login.py
     [-] Initialization...done
     [-] Gathering JSESSIONID..done
     [-] Trying loginSuccess.jsp...done
     [-] Signing in...done
     [-] Trying to submit credentials...done
     [-] Checking that credentials are valid...done
     [-] Logged-in.
  4. Use the cookies.txt with wget:
     wget --load-cookies=/tmp/cookies.txt --save-cookies=/tmp/cookies.txt --no-check-certificate http://MOSURL
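
The same cookies.txt can also be reused from Python instead of wget. The sketch below is an assumption-laden illustration, not part of MOSLogin.py: it supposes the file is in the Netscape cookie format that wget's --load-cookies reads and that it starts with the header line `MozillaCookieJar` expects (otherwise the load raises an error). The function names are made up.

```python
import urllib.request
from http.cookiejar import MozillaCookieJar


def load_cookie_jar(path: str) -> MozillaCookieJar:
    """Load a Netscape-format cookies.txt, keeping session cookies too."""
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


def opener_with_cookies(jar: MozillaCookieJar) -> urllib.request.OpenerDirector:
    """Build a urllib opener that sends the jar's cookies with each request."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


if __name__ == "__main__":
    jar = load_cookie_jar("/tmp/cookies.txt")
    # Only list what was loaded; fetching a page is left to the caller.
    print(sorted(cookie.name for cookie in jar))
```

Calling `opener_with_cookies(jar).open(url)` then behaves much like the wget command above, with the cookies attached to the request.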

With a little bit of time and fun, you can imagine all kinds of tools built on this to ease your sysadmin life. You can even fetch your SR summaries/updates using this cookie…

Get the MOSLogin.py file…

Posted in Solaris

WeSunSolve.net: 6 Months later

6 Months ago, I created the “sunsolve.espix.org” website running at home on my personal VDSL link; 6 Months later, the website is still evolving.

Where we started

Everything started when SunSolve.sun.com died. I remember my frustration at that time: I couldn’t gather the information I needed as fast as before. I couldn’t even download some patches, as the MOS website was still buggy for the newly added Sun content.

I then decided to put everything I could find inside a database. A couple of hours and some lines of PHP later, the MySQL 5.1 database was filling up with patch and bug descriptions. Two or three HTML forms and a quick’n’dirty web design later, I was set with my first, personal SunSolve database.

I couldn’t resist showing this to some colleagues and sysadmin friends, who were all enthusiastic about it. Okay, so could it be the same for people out there? Let’s see…

I then decided to keep the website on my VDSL link and open it to the public. I sent two or three e-mails to some mailing lists and waited.

And so it all started: 100 visits the first day, 150 the next one, and 200 the day after. It was already incredible.

Upgrade !

Step one

After two weeks of running on my VDSL link, it was time to get real bandwidth and more CPU power. I first created a virtual machine inside my available pool to hold both the SQL server and the frontend/processing, which proved to be too much, so I ordered a server at Hetzner.de to hold the MySQL server, which moved to 5.5 in the meantime. All in all, I was now ready to take some load!

Step two

After having upgraded the hardware and the connectivity, it was time to give WeSunSolve a cleaner design. This was done already two months ago, with the help of NESS, my wife 😉

Issues foreseen

In the future, I can see two problems that will probably pop up at any time:

Running out of disk space

From the start, WeSunSolve has been downloading every patch, every bundle and every readme file, and storing everything inside a database. I also keep archives to be sure I can rebuild the database from scratch if needed. This takes some disk space: as of now, the WeSunSolve repository weighs 609 GBytes, on top of the MySQL database, which is 6.2 GBytes, plus 42 GBytes for the log files.

I’m currently suffering from a lack of disk space: I have about 20 GBytes left on the pool and need to make do with that, or find another solution.

Too much time to load pages

Some pages on WeSunSolve take a very long time to load. I need to find a way to load pages dynamically, item by item, with AJAX or similar. This will be the next improvement planned for the web GUI.

How can you help ?

Spread the word!

The thing we need most is to become known to more sysadmins, developers and anyone who might find this website useful. So if you have a blog, blog about it! If you have a website, add a little link to wesunsolve.net! Have colleagues who work with Solaris? Drop them a mail to let them know about it 😉

Hire us 😉

The guys behind WeSunSolve are actually sysadmins, so if you ever need a remote sysadmin or a Belgium-based one, don’t hesitate to contact us! Do you have a migration project? Do you need some advice on patching? Do you need to teach something to your sysadmin team? We can help you with that 😉

Recommend us 😉

If you like WeSunSolve, you can also recommend the owner on LinkedIn, for example. Just send me an invitation and then add a recommendation on the WeSunSolve job. I’m also on Xing under the same name…

Join us and comment

You can still join us on IRC for a chat or drop an e-mail to the owner. It’s always good to receive a quick word saying that what you’ve done is cool. You can also help us improve WeSunSolve by sending comments, suggestions and so on. What would you like to see on it? Which feature is missing for you? How can it improve your daily work?

Now

A quick word on what we’re experiencing now:

WeSunSolve.net is currently seeing 2300 Visitors per day on peak days, which is terrific! We’ve always seen the curve going higher and higher since the start. I hope it will go on this way 😉

Monthly Visits graph

Yearly Visits graph

Thanks to all of you who use the website daily 😉

Posted in WeSunSolve

Solaris 10u10 is now out !

And there it is… The Solaris 10 update 10 ISOs are now available for download.

Get them here

Happy live upgrade !

Posted in Solaris

Solaris 11 Early Adopters release.

Yesterday, Oracle announced the Solaris 11 Early Adopters release. A new batch of ISOs is available for download, only for Gold members of Oracle/Sun.

Anyway, if you’re one of the lucky ones, you can get it here.

Rumors in the wild said that these ISOs are based on build 172 or 173. I’m currently busy downloading the ISO to give it a quick try…

EDIT: Confirmed, the build is rev 173.

EDIT2: Zpool version is now 33, which introduces “1MB Block size” as well as “Improved share support”.

Posted in Solaris

ZFS Fragmentation issue – examining the ZIL

Over the past few days, I’ve been troubleshooting some zpools under heavy database usage. The issue reported by the customer was a huge performance drop after the last reboot of the cluster. The issue was finally tracked down and identified as a ZFS fragmentation problem. I’ll now try to share the lessons learned from this issue.

First of all, the performance problem was investigated with a drill-down method. We used TeamQuest to visualize the differences between now and the week before, and discovered that the I/O on the pool holding the database’s DBF files had literally exploded. We sometimes had more than 50K write IOPS balanced across the different vdevs of the affected pool.

The pool was configured like this:

 # zpool status i-ora-pro06-dat1-pl
   pool: i-ora-pro06-dat1-pl
  state: ONLINE
  scrub: none requested
 config:

         NAME                                       STATE     READ WRITE CKSUM
         i-ora-pro06-dat1-pl                        ONLINE       0     0     0
           mirror-0                                 ONLINE       0     0     0
             c6t60060E800571FC00000071FC000020C3d0  ONLINE       0     0     0
             c6t60060E800570FB00000070FB000020C3d0  ONLINE       0     0     0
           mirror-1                                 ONLINE       0     0     0
             c6t60060E800571FC00000071FC000020C4d0  ONLINE       0     0     0
             c6t60060E800570FB00000070FB000020C4d0  ONLINE       0     0     0
           mirror-2                                 ONLINE       0     0     0
             c6t60060E800571FC00000071FC000020C5d0  ONLINE       0     0     0
             c6t60060E800570FB00000070FB000020C5d0  ONLINE       0     0     0
           mirror-3                                 ONLINE       0     0     0
             c6t60060E800571FC00000071FC000020C6d0  ONLINE       0     0     0
             c6t60060E800570FB00000070FB000020C6d0  ONLINE       0     0     0

 errors: No known data errors

The SAN disks behind the pool were able to handle a lot of I/O, and the SAN was also checked for problems, but the problem was clearly the heavy IOPS load on the LUNs.

Following this, we ran zpool iostat -v i-ora-pro06-dat1-pl 2 for a while to confirm our suspicion. This confirmed the heavy write load on the vdevs.

Still with TeamQuest, we were able to see that the write operations hitting the disks were actually very small writes.
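
Without TeamQuest, a rough idea of the write size can be derived from the zpool iostat columns by dividing the write bandwidth by the write operations. The Python sketch below only illustrates that arithmetic: the function names are invented, it assumes the K/M/G/T suffixes in the output are powers of 1024, and the figures in the example are illustrative, not taken from the affected pool.

```python
# Multipliers for the suffixes used in zpool iostat columns
# (assumed here to be binary units).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> float:
    """Turn a value such as '50K' or '1.5M' into a plain number."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)


def avg_write_size(write_ops: str, write_bandwidth: str) -> float:
    """Average bytes per write for one sample of the write columns."""
    ops = parse_size(write_ops)
    return parse_size(write_bandwidth) / ops if ops else 0.0


if __name__ == "__main__":
    # Illustrative figures only: 50K write operations moving 200M per second.
    print(avg_write_size("50K", "200M"), "bytes per write")
```

An average of a few kilobytes per write on a pool that normally sees much larger ones points in the same direction as the TeamQuest graphs.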

We then opened a case with Oracle/Sun support and uploaded some GUDS traces which exposed the problem we faced. Here is the complete explanation of this problem, as well as the way to detect it and the fix.

Continue reading

Posted in Solaris

sunsolve.espix.org : A new tool for Solaris sysadmins…

When the merger of Oracle and Sun became reality, we lost one of the greatest documentation portals for Solaris: Sun Solve.

As a Solaris sysadmin myself, I needed a tool to manage my daily patching and to ease searching through bugs, patches and dependencies. I also needed something that could track what I was applying to each system. After some thinking, I came up with a solution: We Sun Solve!

Indeed, I’ve decided not to keep my work to myself, but to share it with every Solaris sysadmin who wants to use it. Check it out! Give me feedback, ideas and anything else that comes to mind when you look at such a portal.

You can also come and discuss with us on IRC #sunsolve @ irc.freenode.org

Posted in WeSunSolve