Author Archives: Thomas Gouverneur
Recently I tried ifstat on a freshly updated Linux box running a 3.2 kernel, and it reported nothing (0 KB) even though the network was heavily used. A quick one-liner confirmed this, and I decided to keep it here for later use: …
Situation: Today at work, I needed to prepare the patching of a two-node cluster. Not only did I have to patch this cluster, I also had to match the patch level of the production server currently in use. Well, well. Instead of …
A while ago, I found Glenn Brunette's sfpC.pl script, which was used to check the fingerprint of any Solaris file against the SunSolve Fingerprint Database. I've managed to port this script so it works against the WeSunSolve fingerprint database, …
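The SunSolve Fingerprint Database matched files by their MD5 checksum. As a rough illustration of the local half of such a check, here is a minimal Python sketch that computes a file's MD5 fingerprint. It is not sfpC.pl or the WeSunSolve port: the lookup against the database is deliberately left out, and the helper name md5_fingerprint is hypothetical.

```python
import hashlib
import sys


def md5_fingerprint(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large binaries never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "<md5> <path>" line per file named on the command line.
    for name in sys.argv[1:]:
        print(md5_fingerprint(name), name)
```

Running it as `python fingerprint.py /usr/bin/ls /usr/sbin/zpool` prints one digest per file, which is the value a fingerprint database would be queried with.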
For some time I had been wondering how I could authenticate to MOS (My Oracle Support) from the command line. Now I have my answer: by reverse-engineering the whole SSO authentication process in a Python script that generates a cookies.txt file usable with wget. I know …
Six months ago, I created the “sunsolve.espix.org” website, running at home on my personal VDSL link; six months later, the website is still evolving. Where we started: everything started when SunSolve.sun.com died. I remember my frustration at the time. …
And there it is… Solaris 10 Update 10 ISOs are now available for download. Get them here. Happy Live Upgrade!
Yesterday, Oracle announced the Solaris 11 Early Adopter release: a new batch of ISOs is available for download, but only for Oracle/Sun Gold members. Anyway, if you're one of the lucky ones, you can get it here. Rumors in the wild said …
Over the past few days, I've been troubleshooting some zpools under heavy database load. The customer reported a huge performance drop after the last reboot of the cluster. The issue was finally identified as a ZFS fragmentation problem, and I'll now try to share the lessons learned from it.
First, we investigated the performance problem with a drill-down approach. We used TeamQuest to compare current activity with the previous week and discovered that the I/O on the pool holding the database's DBF files had literally exploded: at times we saw more than 50K write IOPS, balanced across the different vdevs of the affected pool.
The pool was configured as follows:
# zpool status i-ora-pro06-dat1-pl
  pool: i-ora-pro06-dat1-pl
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        i-ora-pro06-dat1-pl                        ONLINE       0     0     0
          mirror-0                                 ONLINE       0     0     0
            c6t60060E800571FC00000071FC000020C3d0  ONLINE       0     0     0
            c6t60060E800570FB00000070FB000020C3d0  ONLINE       0     0     0
          mirror-1                                 ONLINE       0     0     0
            c6t60060E800571FC00000071FC000020C4d0  ONLINE       0     0     0
            c6t60060E800570FB00000070FB000020C4d0  ONLINE       0     0     0
          mirror-2                                 ONLINE       0     0     0
            c6t60060E800571FC00000071FC000020C5d0  ONLINE       0     0     0
            c6t60060E800570FB00000070FB000020C5d0  ONLINE       0     0     0
          mirror-3                                 ONLINE       0     0     0
            c6t60060E800571FC00000071FC000020C6d0  ONLINE       0     0     0
            c6t60060E800570FB00000070FB000020C6d0  ONLINE       0     0     0

errors: No known data errors
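To double-check a layout like this one (four two-way mirrors) on many pools at once, the config section of zpool status can be read programmatically. Here is a minimal Python sketch, assuming the standard indentation of zpool status output. parse_zpool_layout is a hypothetical helper, not part of any Solaris tool.

```python
def parse_zpool_layout(status_text: str) -> dict[str, list[str]]:
    """Map each top-level vdev in `zpool status` output to its leaf devices.

    Relies on indentation: the first row under the NAME header is the pool,
    the next indentation level holds top-level vdevs (e.g. "mirror-0"), and
    anything indented deeper belongs to the most recent top-level vdev.
    """
    layout: dict[str, list[str]] = {}
    in_config = False
    pool_indent = None
    vdev_indent = None
    current = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME") and "STATE" in stripped:
            in_config = True  # the device table starts after this header row
            continue
        if not in_config or not stripped:
            continue
        if stripped.startswith("errors:"):
            break  # end of the device table
        indent = len(line) - len(line.lstrip())
        name = stripped.split()[0]
        if pool_indent is None:
            pool_indent = indent  # first row is the pool itself
        elif vdev_indent is None or indent <= vdev_indent:
            vdev_indent = indent  # a new top-level vdev
            current = name
            layout[current] = []
        elif current is not None:
            layout[current].append(name)  # leaf device of the current vdev
    return layout
```

On the pool above, the expected result is four entries, mirror-0 to mirror-3, each holding two LUNs. A pool without mirrors would show its disks as top-level vdevs with empty device lists.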
The SAN disks behind the pool were able to sustain a lot of I/O, and the SAN itself was checked for problems; clearly, though, the issue was the heavy IOPS load on the LUNs.
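A quick back-of-the-envelope calculation puts the 50K figure in perspective for this pool. The sketch below assumes the writes were spread evenly across the top-level vdevs, as the balanced load suggested, and that the figure counts writes to those vdevs before mirroring. It relies on the fact that a ZFS mirror issues every write to each of its devices. mirror_write_load is a hypothetical name.

```python
def mirror_write_load(pool_write_iops: float, vdev_count: int, mirror_width: int) -> dict[str, float]:
    """Estimate how an even pool-level write load lands on mirrored LUNs.

    Assumes writes are spread evenly across top-level vdevs and that
    pool_write_iops counts writes before mirroring; every write to a
    mirror vdev is issued to each of its mirror_width devices.
    """
    per_vdev = pool_write_iops / vdev_count
    per_lun = per_vdev  # each side of the mirror receives every write
    total_lun_writes = per_lun * vdev_count * mirror_width
    return {"per_vdev": per_vdev, "per_lun": per_lun, "total_lun_writes": total_lun_writes}


# The pool above: 4 two-way mirrors, with the ~50K write IOPS figure from the text.
print(mirror_write_load(50_000, vdev_count=4, mirror_width=2))
# → {'per_vdev': 12500.0, 'per_lun': 12500.0, 'total_lun_writes': 100000.0}
```

If the 50K figure was instead measured at the LUN level, each of the eight LUNs would see about 50,000 / 8 = 6,250 write IOPS.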
Following this, we ran zpool iostat -v i-ora-pro06-dat1-pl 2 for a while to confirm our suspicion, and it did: the vdevs were under a heavy write load.
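When zpool iostat -v runs for a long time, it can help to pull the write-ops column out of each sample instead of eyeballing it. Here is a minimal Python sketch, assuming the usual seven-column data rows (name, alloc, free, read ops, write ops, read bandwidth, write bandwidth) and 1024-based K/M/G suffixes. The helper names are hypothetical.

```python
# Assumed 1024-based multipliers for the abbreviated numbers zpool iostat prints.
SUFFIXES = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3,
            "T": 1024**4, "P": 1024**5, "E": 1024**6}


def parse_nicenum(field: str) -> float:
    """Convert an abbreviated field such as '12.5K', '0' or '-' to a number."""
    if field == "-":
        return 0.0
    suffix = field[-1].upper() if field[-1].isalpha() else ""
    digits = field[:-1] if suffix else field
    return float(digits) * SUFFIXES[suffix]


def write_ops_per_row(iostat_text: str) -> list[tuple[str, float]]:
    """Extract (name, write ops/s) pairs from one `zpool iostat -v` sample.

    Returns a list rather than a dict because some releases label every
    mirror row plainly "mirror", so names are not guaranteed to be unique.
    """
    rows = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skips the "capacity operations bandwidth" banner
        name = fields[0]
        if name == "pool" or set(name) == {"-"}:
            continue  # column header or separator line
        rows.append((name, parse_nicenum(fields[4])))
    return rows
```

With an interval argument (the 2 in the command above), zpool iostat prints one sample per interval. The first sample reports averages since boot, so it is usually best to skip it.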
Still using TeamQuest, we could see that the write operations hitting the disks were actually very small blocks.
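TeamQuest showed the write sizes directly. Without such a tool, a rough equivalent can be derived from zpool iostat by dividing the write bandwidth by the write operations for the same row and interval. Here is a minimal sketch, assuming both values are already expanded to plain numbers (bytes/s and ops/s). The function name is hypothetical, and the sample figures below are illustrative, not taken from this incident.

```python
def average_write_size(write_bandwidth: float, write_ops: float) -> float:
    """Average bytes per write operation over one sampling interval.

    write_bandwidth is in bytes/s and write_ops in operations/s, e.g. the
    write columns of a `zpool iostat` sample after expanding K/M/G suffixes.
    """
    if write_ops <= 0:
        return 0.0  # no writes in this interval
    return write_bandwidth / write_ops


# Hypothetical sample: 50M of writes per second spread over 12.5K write ops.
print(average_write_size(50 * 1024**2, 12.5 * 1024))
# → 4096.0
```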
We then opened a support case with Oracle/Sun support and uploaded some GUDS traces that exposed the problem we faced. Here is the complete explanation of this problem, along with how to detect it and how to fix it.
When the Oracle/Sun merger became reality, we lost one of the greatest documentation portals for Solaris: SunSolve. As a Solaris sysadmin myself, I needed a tool to manage my daily patching, to ease search with …