SSH connection to Solaris 11 is sometimes slow…

Today at work, we migrated the first box to Solaris 11 and we experienced the first bug as soon as we needed to log in onto the server.

As theses delays are quite common when the SSHd is configured by default, I quickly added theses lines to remove GSSAPI and DNS common issues:

/etc/ssh/sshd_config

LookupClientHostnames no VerifyReverseMapping no GSSAPIAuthentication no

Although, theses settings didn’t fixed the problem.

I added some verbosity to both ssh client and server and tracked down the delay to happen at this stage of the connection:

On the client:

$ ssh -v -p 2222 s11box -l adminifm OpenSSH_5.8p1-hpn13v10lpk, OpenSSL 1.0.0c 2 Dec 2010
 debug1: Reading configuration data /home/wildcat/.ssh/config
 debug1: Reading configuration data /etc/ssh/ssh_config
 debug1: Connecting to admblockum04 10.2.12.155 port 2222.
 debug1: Connection established. debug1: identity file /home/wildcat/.ssh/id_rsa type -1
 debug1: identity file /home/wildcat/.ssh/id_rsa-cert type -1
 debug1: identity file /home/wildcat/.ssh/id_dsa type 2
 debug1: identity file /home/wildcat/.ssh/id_dsa-cert type -1
 debug1: identity file /home/wildcat/.ssh/id_ecdsa type -1
 debug1: identity file /home/wildcat/.ssh/id_ecdsa-cert type -1
 debug1: Remote protocol version 2.0, remote software version Sun_SSH_2.0
 debug1: no match: Sun_SSH_2.0
 debug1: Enabling compatibility mode for protocol 2.0
 debug1: Local version string SSH-2.0-OpenSSH_5.8p1-hpn13v10lpk
 debug1: SSH2_MSG_KEXINIT sent
 *** HANG ***

And on the server:

 debug1: Reloading X.509 host keys to avoid PKCS#11 fork issues.monitor
 debug1: reading the context from the   child
 debug1: use_engine is 'no'
 debug1: list_hostkey_types: ssh-rsa,ssh-dss
 debug1: My KEX proposal before adding the GSS KEX algorithm:
 debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-  group1-sha1
 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour128,arcfour256,arcfour
 debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour128,arcfour256,arcfour debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-sha1-96,hmac-md5-96
 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-sha1-96,hmac-md5-96
 debug2: kex_parse_kexinit: none,zlib
 debug2: kex_parse_kexinit: none,zlib
 debug2: kex_parse_kexinit: de-DE,en-US,es-ES,fr-FR,it-IT,ja-JP,ko-KR,pt-BR,zh-CN,zh-TW,i-default
 debug2: kex_parse_kexinit: de-DE,en-US,es-ES,fr-FR,it-IT,ja-JP,ko-KR,pt-BR,zh-CN,zh-TW,i-default
 debug2: kex_parse_kexinit: first_kex_follows 0
 debug2: kex_parse_kexinit: reserved 0
 *** HANG ***

Adding truss of the server process is helping us a lot:

  17477:  so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, 0, SOV_DEFAULT) = 3
  17469:  pollsys(0x080451C0, 1, 0x00000000, 0x00000000) (sleeping...)
  17477:  connect(3, 0x08047030, 16, SOV_DEFAULT) (sleeping...)

The lock is happening just after the connect() syscall. We can now check the pfiles of this process together with a netstat to identify which connection is causing trouble to be established:

  root@admblockum04:/local/home/ucc# pfiles 16585
 16585:  /usr/lib/ssh/sshd -f /etc/ssh/sshd_config -p 2222 -ddd -D
 Current rlimit: 256 file descriptor 0: S_IFCHR mode:0620 dev:532,0 ino:3502445063 uid:60004 gid:7 rdev:221,8 O_RDWR|O_NOCTTY|O_LARGEFILE /dev/pts/8 offset:43267 1: S_IFCHR mode:0620 dev:532,0 ino:3502445063 uid:60004 gid:7 rdev:221,8 O_RDWR|O_NOCTTY|O_LARGEFILE /dev/pts/8 offset:43267 2: S_IFCHR mode:0620 dev:532,0 ino:3502445063 uid:60004 gid:7 rdev:221,8 O_RDWR|O_NOCTTY|O_LARGEFILE /dev/pts/8 offset:43267 3: S_IFSOCK mode:0666 dev:540,0 ino:34566 uid:0 gid:0 size:0 O_RDWR SOCK_STREAM SO_SNDBUF(49152),SO_RCVBUF(131072) sockname: AF_INET 127.0.0.1  port: 55867 congestion control: newreno 4: S_IFSOCK mode:0666 dev:540,0 ino:5714 uid:0 gid:0 size:0 O_RDWR|O_NONBLOCK SOCK_STREAM SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(128872) sockname: AF_INET6 ::ffff:10.2.12.155  port: 2222 peername: AF_INET6 ::ffff:10.2.60.1  port: 43575 congestion control: newreno 5: S_IFIFO mode:0000 dev:529,0 ino:70783 uid:0 gid:0 size:0 O_RDWR 6: S_IFIFO mode:0000 dev:529,0 ino:70783 uid:0 gid:0 size:0 O_RDWR 8: S_IFIFO mode:0000 dev:529,0 ino:70784 uid:0 gid:0 size:0 O_RDWR FD_CLOEXEC
 # netstat -an|grep 55867 127.0.0.1.55867      127.0.0.1.30003          0      0 131072      0 SYN_SENT

The port 30003 is the default port of tcsd daemon, which is managing physical cryptography (through /dev/tpm). If there is no hardware crypto devices, this daemon is disabled. It seems though that cryptoadm is linking tpm crypto mechanism by default, enabling ssh to trying to access this daemon.

Workaround found (just to confirm slowliness is caused by tcsd):

Run this command on the server:

 # nc -e 'cat /dev/null' localhost 30003

and try to ssh the box, it should be fast.

Permanent workaround:

Simply remove the pcks11_tpm provider from the crypto framework:

cryptoadm disable provider=/usr/lib/security/\$ISA/pkcs11_tpm.so mechanism=all
cryptoadm uninstall provider=/usr/lib/security/\$ISA/pkcs11_tpm.so

Other references:

This entry was posted in Solaris. Bookmark the permalink.

1 Response to SSH connection to Solaris 11 is sometimes slow…

  1. Hi,

    Thank you so much. I’ve just hit the same issue on an M3000 — 9/10 SSH logins were failing and so were outbound SSL requests (also I suspect it’s happened a few times over the last few months where it seemed tcp traffic stopped working in one of our zones, but that’s just a guess at this point)

    Really appreciate the post and the hard work

Leave a Reply

Your email address will not be published. Required fields are marked *