I’m frequently monitoring webservers, cache servers, database servers, etc. by tailing their log files. See my previous post on making logs easier to monitor by color.
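Sometimes there is also too much data, and you don’t want to look at all of it. The sample script below reads lines on STDIN and prints only a subset: either each line with a fixed probability, or a fixed number of lines per stream or per window.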
sample source:
#!/usr/bin/perl
$|++;    # unbuffered output, so samples appear as soon as they are printed

use strict;
use warnings;
use Getopt::Long;

my $USAGE = join '', <DATA>;

my $T = 0;    # window length in seconds
my $K = 0;    # number of elements to keep per window
my $P = 1;    # probability of keeping each element
my $H = 0;    # show help
my $N = 0;    # window length in elements
my $S = 0;    # shuffle output

my $ok = GetOptions(
    "time|t=i"   => \$T,
    "number|n=i" => \$N,
    "count|k=i"  => \$K,
    "prob|p=f"   => \$P,
    "shuffle|s"  => \$S,
    "help|h"     => \$H,
);

if (   !$ok
    || $H
    || ( $T > 0 && $P != 1 )
    || ( $K > 0 && $P != 1 )
    || ( $K < 0 || $P < 0 || $T < 0 || $N < 0 || $P > 1 )
    || ( $T > 0 && $N > 0 )
    || ( ( $T > 0 || $N > 0 ) && $K == 0 ) )
{
    print $USAGE;
    exit 1;
}

my $position = 0;    # number of elements seen in the current window
my @buf      = ();   # reservoir of [position, element] pairs
my $before   = time();

while ( my $element = <> ) {
    $position++;

    # sample K elements, over the full stream or per window
    if ( $K > 0 ) {
        if ( @buf < $K ) {
            # the first K elements always go into the reservoir
            push @buf, [ $position, $element ];
        }
        elsif ( rand() < $K / $position ) {
            # keep this element with probability K/position,
            # evicting a randomly chosen entry
            my $index = int( rand($K) );
            $buf[$index] = [ $position, $element ];    # save position for sort
        }

        # time-based K-sampling
        if ( $T > 0 && time() > $before + $T ) {
            flush();
        }
        # event-based K-sampling
        elsif ( $N > 0 && $position >= $N ) {
            flush();
        }
    }
    # sample with probability (rand() is always < 1, so -p 1 prints everything)
    elsif ( rand() < $P ) {
        print $element;
    }
}

flush();

sub flush {
    $before = time();

    if ($S) {
        # Knuth shuffle: swap slot $j with a random slot in 0..$j
        for ( my $j = $#buf ; $j >= 0 ; $j-- ) {
            my $swap = int( rand( $j + 1 ) );
            if ( $swap != $j ) {
                ( $buf[$j], $buf[$swap] ) = ( $buf[$swap], $buf[$j] );
            }
            print $buf[$j]->[1];
        }
    }
    else {
        # restore input order using the saved positions
        foreach my $item ( sort { $a->[0] <=> $b->[0] } @buf ) {
            print $item->[1];
        }
    }

    @buf      = ();
    $position = 0;
}

__DATA__
Usage: sample [-h] [-p PROB] [-k COUNT [-t SECONDS | -n ELEMENTS] [-s]]

Sample lines from a stream on STDIN. Write lines to STDOUT.

  -h  show help (this message)
  -k  sample K elements from stream (default 0)
      range: 0..
  -p  sample elements from stream with probability (default 1)
      range: 0 <= p <= 1
  -n  sample over windows of N elements (default 0)
      range: 0..
  -t  sample over windows of T seconds (default 0: the full stream is one window)
      range: 0..
  -s  shuffle outputs (default false)

There are two modes of sampling:
  * sample with probability (-p)
  * sample a fixed number of elements (-k)

With -k, the full stream is sampled and reported at the end, unless a
window is given in seconds (-t) or in elements (-n).
-p can only be used alone.
-n and -t can only be used with -k, and not with each other.

Examples:
  * sample K elements from a stream:
      cat /etc/passwd | sample -k 5
  * sample 1% of elements from a stream:
      tail -f /var/logs/httpd/access_log | sample -p 0.01
  * sample K elements from a stream every 30 seconds:
      tail -f /var/logs/httpd/access_log | sample -k 5 -t 30
  * sample K elements from a stream every 30 seconds, shuffled:
      tail -f /var/logs/httpd/access_log | sample -k 5 -t 30 -s
  * sample K elements from a stream every 100 elements:
      tail -f /var/logs/httpd/access_log | sample -k 5 -n 100

Copyright/License:
  Allen Day <allenday@ucla.edu>, licensed under GPL 2006-2008
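The -k mode keeps a fixed-size buffer and swaps entries in at random, which is the idea behind reservoir sampling. Here is a minimal standalone sketch of that idea, assuming the textbook rule: item number i is kept with probability k/i and evicts a random slot. The name reservoir_sample is a hypothetical helper for illustration, not something the script defines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return up to $k items
# chosen uniformly at random from the list, in their original order.
sub reservoir_sample {
    my ( $k, @stream ) = @_;
    my @reservoir;    # each entry is [ index, item ]
    my $seen = 0;     # number of items examined so far

    for my $item (@stream) {
        $seen++;
        if ( @reservoir < $k ) {
            # Fill phase: the first $k items are always kept.
            push @reservoir, [ $seen, $item ];
        }
        elsif ( rand() < $k / $seen ) {
            # Replace phase: item number $seen is kept with probability
            # $k / $seen and evicts a uniformly chosen slot.
            $reservoir[ int( rand($k) ) ] = [ $seen, $item ];
        }
    }

    # Restore input order, as the script does when -s is not given.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @reservoir;
}

# Usage: draw 3 of 10 items many times and count how often each appears.
my %hits;
my $trials = 20_000;
for ( 1 .. $trials ) {
    $hits{$_}++ for reservoir_sample( 3, 1 .. 10 );
}
printf "%2d: %.3f\n", $_, ( $hits{$_} // 0 ) / $trials for 1 .. 10;
```

If the rule is implemented correctly, every item should show up in roughly 3 out of 10 draws, so each printed frequency should land near 0.3 regardless of where the item sits in the input. A skew toward late or early items is a quick sign that the replacement probability is wrong.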