Today I wanted to know how much my web server had been used, since it has been online for a bit more than a year now…
But as there are a lot of virtual hosts on it, how could I see the global statistics?
First, let's collect all the log files:
/tmp/ap $ i=0;for f in `find /home/www -name "*.log-*.gz"`; do echo "copy $f to $i.`basename $f`";cp $f $i.`basename $f`; i=`expr $i + 1`; done
/tmp/ap $ ls |wc -l
10135
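The counter prefix in the loop above is there because every virtual host uses the same log file names, so plain copies would overwrite each other. Here is a minimal sketch of the same idea as a function that also survives spaces in paths (but not newlines). The name `collect_logs` and its two arguments are my own, not part of any tool:

```shell
# collect_logs SRC DEST (hypothetical helper)
# Copy every rotated, gzipped log found under SRC into DEST, prefixing each
# copy with a counter so that identical basenames do not collide.
# The body runs in a subshell, so its variables do not leak to the caller.
collect_logs() (
    src=$1
    dest=$2
    i=0
    mkdir -p "$dest"
    # One path per line: safe for spaces, not for newlines in file names.
    find "$src" -type f -name '*.log-*.gz' | while IFS= read -r f; do
        cp -- "$f" "$dest/$i.$(basename -- "$f")"
        i=$((i + 1))
    done
)
```

It would be called as `collect_logs /home/www /tmp/ap`.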
Wow… 10135 files. Let's just gunzip them and see..
/tmp/ap $ for i in *.gz; do gzip -d $i; done
/tmp/ap $ du -hs .
442M .
Okay.. now, we will just concatenate all the logs together, keeping only the distinction between SSL and non-SSL traffic:
/tmp/ap $ cat *clear-access* > clear.log
/tmp/ap $ cat *ssl-access* > ssl.log
/tmp/ap $ ls -alh ssl.log clear.log
-rw-r--r-- 1 wildcat wildcat 327M 2008-07-28 07:15 clear.log
-rw-r--r-- 1 wildcat wildcat 26M 2008-07-28 07:15 ssl.log
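As a variation, the decompress and concatenate steps could probably be folded into one pass, streaming each archive straight into the combined file so the 10135 uncompressed copies never hit the disk. This is only a sketch: `merge_gz` is a made-up name, it assumes the files are still gzipped in the current directory, and it relies on `find -maxdepth`, which GNU and BSD find provide but POSIX does not:

```shell
# merge_gz PATTERN OUT (hypothetical helper)
# Decompress every file in the current directory whose name matches PATTERN
# and write the concatenated result to OUT. The order of the files does not
# matter here because the combined log gets sorted by date afterwards.
merge_gz() {
    find . -maxdepth 1 -type f -name "$1" -exec gzip -dc {} + > "$2"
}
```

For example: `merge_gz '*clear-access*.gz' clear.log` and `merge_gz '*ssl-access*.gz' ssl.log`. Letting `find` build the argument list also avoids any "argument list too long" surprise with this many files.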
So, now that we have our big files with all the hits, we need to sort them by date, otherwise they won't be processed properly by engines like webalizer or awstats…
/tmp/ap $ cat clear.log | perl -ne 'BEGIN { @m{qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/}=("01".."12"); } m!^\S+ \S+ \S+ \[(\S+)/(\S+)/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s+!; print $3.$m{$2}.$1.$4.$5.$6." ".$_' | sort -n | cut -d' ' -f 2- > clear-sorted.log
/tmp/ap $ cat ssl.log | perl -ne 'BEGIN { @m{qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/}=("01".."12"); } m!^\S+ \S+ \S+ \[(\S+)/(\S+)/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s+!; print $3.$m{$2}.$1.$4.$5.$6." ".$_' | sort -n | cut -d' ' -f 2- > ssl-sorted.log
And now we can pass these files through awffull.
bin $ awffull -p -c ${BINDIR}/awffull.conf -o ${STATSDIR}/general/clear $LOGDIR/clear-sorted.log
bin $ awffull -p -c ${BINDIR}/awffull.conf -o ${STATSDIR}/general/ssl $LOGDIR/ssl-sorted.log
That’s all folks!