Perl

How to fix the meetup.com broken exported calendars.

I’m a big fan of meetup.com, but they’re so tragically unhip when it comes to mashups/integration/web 2.0.  One of my biggest gripes until about 6 months ago was that they had no facility (besides API) for exporting a calendar of meetups to my calendar app (I use Google Calendar), or any other calendar app for that matter.

They introduced an export feature recently, but it’s pretty useless.  Here’s why: they offer two calendars

  • [Calendar A] contains all upcoming items in all your meetup groups
  • [Calendar B] contains upcoming items which you have RSVP’d with “yes” or “maybe”.

That’s it.  The calendars exported don’t even contain links that allow you to RSVP from directly inside your calendar — you have click through to the meetup.com site, log in, then RSVP.  Ugh.

 

Come on, product guys.  What’s really called for is 4 separate calendars.

  • [Calendar "yes"] All groups, “yes” events
  • [Calendar "maybe"] All groups, “maybe” events
  • [Calendar "no"] All groups, “no” events
  • [Calendar "none"] All groups, events to which I have not yet submitted an RSVP.

I was finally just pissed off enough about the status quo that I fixed it for myself, and below I share the code.  You can try it out here: http://spicylogic.com/allenday/cgi-bin/mu.cgi?key=<your_api_key>&cal=<calendar> 

where <your_api_key> can be found here and <calendar> is one of “yes”, “no”, “none”, “maybe”.

Okay, here’s the code.  Install it on your own machine if possible, my ISP will appreciate it.  If you find fuckups, let me know and I’ll update the post.

#!/usr/bin/perl
use strict;
use CGI qw(:standard);
use Date::Manip qw(ParseDate ParseDateString ParseDateDelta DateCalc UnixDate);
use Date::Parse;
use HTML::Entities;
use LWP::Simple qw(get);
use XML::DOM;
 
use constant URL_EVENTS =&gt; 'http://api.meetup.com/events?key=%s&amp;member_id=%d&amp;format=xml';
 
print header(q(text/calendar));
 
my $parser = new XML::DOM::Parser ();
 
my $mode = param( 'cal' );
my $key  = param( 'key' );
my $user = param( 'user' );
 
if ( ! $mode || ! $key || ! $user ) {
  die
}
 
my $events_url = sprintf( URL_EVENTS, $key, $user );
#warn $events_url;
my $events_txt = get( $events_url );
#warn $events_txt;
my $events_dom = $parser-&gt;parse( $events_txt );
#warn $events_dom;
 
print qq(BEGIN:VCALENDAR\nPRODID:-//Meetup Inc//RemoteApi//EN\nVERSION:2.0\nMETHOD:PUBLISH\nCALSCALE:GREGORIAN\nX-ORIGINAL-URL:http://www.meetup.com/\nX-WR-CALNAME:mu $mode\n);
 
my $events = $events_dom-&gt;getElementsByTagName( 'item' );
for ( my $i = 0 ; $i &lt; $events-&gt;getLength() ; $i++ ) {
  my $event = $events-&gt;item( $i );
  my $n_id    = $event-&gt;getElementsByTagName( 'id'             )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_rsvp  = $event-&gt;getElementsByTagName( 'myrsvp'         )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr0 = $event-&gt;getElementsByTagName( 'venue_name'     )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr1 = $event-&gt;getElementsByTagName( 'venue_address1' )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr2 = $event-&gt;getElementsByTagName( 'venue_address2' )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr3 = $event-&gt;getElementsByTagName( 'venue_address3' )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr4 = $event-&gt;getElementsByTagName( 'venue_city'     )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr5 = $event-&gt;getElementsByTagName( 'venue_state'    )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_addr6 = $event-&gt;getElementsByTagName( 'venue_zip'      )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_desc  = $event-&gt;getElementsByTagName( 'description'    )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_link  = $event-&gt;getElementsByTagName( 'event_url'      )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_name  = $event-&gt;getElementsByTagName( 'name'           )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_lat   = $event-&gt;getElementsByTagName( 'venue_lat'      )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_lon   = $event-&gt;getElementsByTagName( 'venue_lon'      )-&gt;item( 0 )-&gt;getFirstChild();
  my $n_start_time  = $event-&gt;getElementsByTagName( 'time'           )-&gt;item( 0 )-&gt;getFirstChild();
 
  my $start_time;
  my $end_time;
 
  #my $dummy_time = "20000101T000000Z";
  my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time());
  my $dummy_time = sprintf( q(%04d%02d%02dT%02d%02d%02dZ), $year + 1900, $mon + 1, $mday, $hour, $min, $sec );
 
  if ( $n_start_time ) {
    my ($ss,$mm,$hh,$day,$month,$year,$zone);
 
    ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime( $n_start_time-&gt;toString() );
    $start_time = sprintf( q(%04d%02d%02dT%02d%02d%02dZ), $year + 1900, $month + 1, $day, $hh, $mm, $ss );
 
    my $eday = $day;
    if ( $hh == 23 ) {
      $eday = $day + 1;
    }
    my $ehh = ($hh + 1) % 24;
    $end_time   = sprintf( q(%04d%02d%02dT%02d%02d%02dZ), $year + 1900, $month + 1, $eday, $ehh, $mm, $ss );
  }
  else {
    $start_time = '';
    $end_time = '';
  }
 
  if ( $mode eq $n_rsvp-&gt;toString() ) {
    my $id   = $n_id-&gt;toString();
    my $name = $n_name ? $n_name-&gt;toString() : "";
    my $desc = $n_desc ? $n_desc-&gt;toString() : "";
    my $addr = ( $n_addr0 ? $n_addr0-&gt;toString().', ' : "" )
             . ( $n_addr1 ? $n_addr1-&gt;toString().', ' : "" )
             . ( $n_addr2 ? $n_addr2-&gt;toString().', ' : "" )
             . ( $n_addr3 ? $n_addr3-&gt;toString().', ' : "" )
             . ( $n_addr4 ? $n_addr4-&gt;toString().', ' : "" )
             . ( $n_addr5 ? $n_addr5-&gt;toString().', ' : "" )
             . ( $n_addr6 ? $n_addr6-&gt;toString() : "" );
    #$desc =~ s/(.)/(ord($1) &gt; 127) ? "" : $1/egs;
 
    $name = HTML::Entities::decode_entities( $name );
    $desc = HTML::Entities::decode_entities( $desc );
    $addr = HTML::Entities::decode_entities( $addr );
    $name =~ s/,/\\,/g;
    $desc =~ s/,/\\,/g;
    $addr =~ s/,/\\,/g;
 
    $desc =~ s#
#\\n#gs;
    $desc .= "\\n\\n\\nGoing?\\n\\n";
    foreach my $response ( qw( yes no maybe ) ) {
      $desc .= uc($response).qq(: http://api.meetup.com/rsvp?event_id=$id&amp;key=$key&amp;rsvp=$response\\n);
    }
 
    my $geo = $n_lat &amp;&amp; $n_lon ? "GEO:" . $n_lat-&gt;toString() . ";" . $n_lon-&gt;toString() . "\n" : undef;
 
    #print sprintf( qq(BEGIN:VEVENT\nSUMMARY:%s\nDESCRIPTION:%s\nLAST-MODIFIED:%s\nUID:%s\nCLASS:%s\nCREATED:%s\nDTSTAMP:%s\nDTSTART:%s\nDTEND:%s\nLOCATION:%s\n\nURL:%s\nEND:VEVENT\n),
    print sprintf( qq(BEGIN:VEVENT\nSUMMARY:%s\nDESCRIPTION:%s\nLAST-MODIFIED:%s\nUID:%s\nCLASS:%s\nCREATED:%s\nDTSTAMP:%s\nDTSTART:%s\nDTEND:%s\n%sLOCATION:%s\nURL:%s\nEND:VEVENT\n),
      $name,
      $desc,
      $start_time,
      "event_$id\@meetup.com",
      "PUBLIC",
      $dummy_time,
      $dummy_time,
      $start_time,
      $end_time,
      $geo,
      $addr,
      $n_link ? $n_link-&gt;toString() : "",
    );
  }
}
 
print qq(END:VCALENDAR\n);

Administration
Life
Networking
Perl
Software

Comments (1)

Permalink

Synthetic GFF Dataset for Genome Browser Benchmark

I deployed a Gbrowse/Chado installation last week at Dow Agrosciences.  It got me thinking about how slow and basic the searches are with the Bio::DB::Das::Chado* adaptor, and wouldn’t it be nice to use SOLR here?

I made up a test dataset of gene/mRNA/exon 3-tiered feature groups by permuting some gene model data from the knownGene annotation set of the Hg18 build of the human genome.  You can grab the data set and script used to generate it here.  There are several files mRNA.EN.txt.gz that contain gzipped gene models, where N=3..7 indicates there are 10^N models in the file, uniformly distributed across a 500-megabase reference sequence.

I’m planning to load these data into a couple of different systems and then compare performance on some of the typical Bio::DB::GFF API calls.  I can personally test on:

  • Chado
  • The default Bio::DB::GFF schema (does it have a name?)
  • The SOLR backend I’m about to implement

I know there are other feature DBs out there.  It would be good to include them as well in a later pass or to have someone else contribute the data once I get the benchmarking script written.

Genomics
Informatics
Java
Perl
Scalability
Science

Comments (0)

Permalink

ZIP code demographic data with Perl

I needed some demographics data earlier this week and tried using the SF3 files from census.gov’s “Census 2000″ data set.

What a time sink. Ugh.

The methods used are very well documented, and I learned a lot about the census. What I was not able to learn, however, was how to actually extract the data from the flat files. Look at what Joshua Tauberer went through to get some idea of the pain level.

Finally I got fed up and wrote a screen scraper for ZIPskinny.com in Perl. It’s one-off crappy code. You can get it from CPAN under namespace Geo::Demo::Zipskinny.

Hope it saves you some time. Leave me a comment if you have working code that can deal with SF3 files.

Here’s a little ZIP code to rich-vs-poor plot I made earlier.

Analytics
Perl
Science
Statistics

Comments (0)

Permalink

aggregate – report event counts from a stream

Another shell utility. This one is useful for, e.g. counting 404, 500, 200, 302 HTTP codes from a log file.

#!/usr/bin/perl
$|++;
use strict;
use Getopt::Long;
 
my $mode = 'line';
my $tick = 100;
my $help = undef;
my $keysfile = undef;
my %keys = ();
 
GetOptions(
  'mode|m=s' => \$mode,
  'tick|t=i' => \$tick,
  'help|h'   => \$help,
  'keys|k=f' => \$keysfile,
);
 
if ( $help || ( $mode ne 'line' && $mode ne 'time' ) || $tick <= 0 || ( defined($keysfile) && !-f $keysfile ) ) {
  my $USAGE = join '', <DATA>;
  print STDERR $USAGE and exit(1);
}
 
if ( $keysfile ) {
  open(K, $keysfile) or die "Couldn't open keys file '$keysfile': $!";
  while ( my $line = <K> ) {
    chomp $line;
    $keys{ $line }++;
  }
  close(K);
}
 
my %count = %keys;
my $offset = 0;
my $mark = 0;
my $offset = 0;
 
if ( $mode eq 'time' ) {
  $mark = time();
}
 
while ( my $element = <> ) {
  chomp $element;
  if ( scalar( %keys ) ) {
    $count{ $element }++ if $keys{ $element };
  }
  else {
    $count{ $element }++;
  }
 
  if ( $mode eq 'line' ) {
    $offset++;
    $mark++;
    if ( $mark >= $tick ) {
      $mark = 0;
      flush();
    }
  }
  elsif ( $mode eq 'time' ) {
    if ( $mark + $tick < time() ) {
      $offset = time();
      $mark = time();
      flush();
    }
  }
}
flush();
 
sub flush {
  print "summary/$tick @ $offset\n";
  foreach my $k ( sort keys %count ) {
    print "\t", $count{ $k }, "\t", $k, "\n";
  }
  %count = %keys;
}
 
__DATA__
Usage: aggregate [-h] [-m <time|line>] [-t <# of seconds or lines>] [-k <keys file>]
 
Read lines from STDIN.  Print lines by frequency per input lines or time.
 
  -h    show help (this message)
  -m    mode.  one of 'time' or 'line'.  defaults to 'line'.
  -t    aggregation size.  an integer.  value is # of lines ('line' mode) or # of
        seconds ('time' mode) after which an aggregation is triggered.  defaults to 100.
  -k    keys file.  a text file of strings to *exactly* match in the input, one per line.
        if a keys file is provided, lines not present in the keys file will be silently
        ignored.

Administration
Analytics
Perl

Comments (0)

Permalink

shuffle – randomize a stream of data

Here’s another little shell utility I’ve been sitting on for a while. This one shuffles the line-oriented data read from a pipe. It has the notion of buffering and partial flushing so we can handle streams / very large data sets.

#!/usr/bin/perl
$|++;
use strict;
use Getopt::Long;
 
my $USAGE = join '', <DATA>;
 
my $B = 0;
my $D = 1;
my $H = 0;
 
GetOptions ("buffer|b=i"   => \$B,
            "draw|d=i"     => \$D,
            "help|h"       => \$H,
           ); 
 
if ( $D == 1 && $B > 0 ) {
  $D = $B;
}
 
if (
  ($B < 0) ||
  ($D < 1) ||
  ($B > $D) ||
  ($H)
) {
  print $USAGE and exit(1);
}
 
 
my @buf = ();
 
while ( my $element = <> ) {
  #buffer whole stream
  if ( $B == 0 ) {
    push @buf, $element;
  }
  #no-op
  elsif ( $B == 1 ) {
    print $element;
  }
  #buffer window
  else {
    push @buf, $element;
    if ( scalar( @buf ) >= $D && scalar( @buf ) > $B ) {
      flush();
    }
  }
}
flush();
 
sub flush {
  for ( my $j = scalar( @buf ) - 1 ; $j >= 0 ; $j-- ) {
    my $swap = int(rand($j));
    if ( $swap != $j ) {
      ($buf[ $j ], $buf[ $swap ]) = ($buf[ $swap ], $buf[ $j ]);
    }
  }
  while ( scalar( @buf ) - 1 > $B - $D ) {
    print shift @buf;
  }
}
 
 
__DATA__
Usage: shuffle [-h] [-b <buffer size>] [-d <draw size>]
 
Shuffle lines from a stream on STDIN.  Write lines to STDOUT.
 
  -h    show help (this message)
  -b    buffer size
        (default 0.  indicates shuffle whole stream, then write)
        range: 1..
  -d    draw size
        (defaults to value of -b.  number of items to remove from the
        buffer when it fills)
        range: 1..buffer size
 
You have to parameters available (besides -h for help).
 
* buffer size (-b).  Determines how many elements to temporarily hold
before shuffling.  The advantage of this buffer is to allow shuffling on
very long streams that would not fit into system memory.  The
disadavantage is that it is not a truly random shuffle, as each input
element can appear at most buffer-size positions away from the original
position.  Buffer size defaults to zero, so make sure to set it if your
data set size is large.
 
* draw size (-d).  Determines how frequently the buffer is shuffled and
flushed.  Rather than shuffling/flushing all elements in the buffer, only
do D elements.  The advantage here is elements can appear more than
buffer-size positions away from the original position.  The disadvantage
is that shuffling is done B/D times more frequently.  Draw size defaults
to buffer size, and has no effect.  Set it to 1 to maximize randomness.
 
Copyright/License:
 
  Allen Day <allenday@ucla.edu>, licensed under GPL 2006-2008

Administration
Analytics
Perl

Comments (0)

Permalink

sample – probabilistic sampling from a stream of lines

I’m frequently monitoring webservers, cache servers, database servers, etc by tailing their log files. See my previous post on making logs easier to monitor by color.

Sometimes you also have too much data, and you don’t want to look at all of it. Use this to sample.

sample source:

#!/usr/bin/perl
$|++;
use strict;
use Getopt::Long;
 
my $USAGE = join '', <DATA>;
 
my $T = 0;
my $K = 0;
my $P = 1;
my $H = 0;
my $N = 0;
my $S = 0;
 
GetOptions ("time|t=i"     => \$T,
            "number|n=i"   => \$N,
            "count|k=i"    => \$K,
            "prob|p=f"     => \$P,
            "shuffle|s"    => \$S,
            "help|h"       => \$H,
           ); 
 
if (
  ($T > 0 && $P != 1) ||
  ($K > 0 && $P != 1) ||
  ($K < 0 || $P < 0 || $T < 0 || $N < 0 || $P > 1 ) ||
  ($T > 0 && $N > 0) ||
  ($H)
) {
  print $USAGE and exit(1);
}
 
my $position = 0;
my @buf = ();
my $before = time();
 
while ( my $element = <> ) {
  # sample full stream, report at the end
  # sample K elements every T seconds
  if ( $K > 0 ) {
    if ( scalar( @buf ) < $K ) {
      push @buf, [$position, $element];
    }
    elsif ( $K/$position < rand() ) {
      my $index = int(rand($K));
      $buf[ $index ] = [$position, $element]; #save position for sort
    }
    #time-based K-sampling
    if ( $T > 0 && time() > $before + $T ) {
      flush();
    }
    #event-based K-sampling
    elsif ( $N > 0 && $position > $N ) {
      flush();
    }
  }
  # sample with probability
  elsif ( $P < 1 && rand() < $P ) {
    print $element;
  }
  $position++;
}
flush();
 
sub flush {
  $before = time();
  #Knuth shuffle
  if ( $S ) {
    for ( my $j = scalar( @buf ) - 1 ; $j >= 0 ; $j-- ) {
      my $swap = int(rand($j));
      if ( $swap != $j ) {
        ($buf[ $j ], $buf[ $swap ]) = ($buf[ $swap ], $buf[ $j ]);
      }
      print $buf[ $j ]->[ 1 ];
    }
  }
  else {
    foreach my $b ( sort {$a->[0] <=> $b->[0]} @buf ) {
      print $b->[1];
    }
  }
  @buf = ();
  $position = 0;
}
 
 
__DATA__
Usage: sample -[[h][p][t[k[n]]]]
 
Sample lines from a stream on STDIN.  Write lines to STDOUT.
 
  -h    show help (this message)
  -k    sample K elements from stream
        (default 0)
        range: 0..
  -p    sample elements from stream with probability
        (default 1)
        range: 0 <= p <= 1
  -n    sample over windows of N elements
        (default 0)
        range: 0..
  -t    sample over windows of T seconds
        (default 0, instantaneous with -p, infinity with -k)
        range: 0..
  -s    shuffle outputs
        (default false)
 
There are two modes of sampling:
 
  * sample with probability (-p)
  * sample a fixed number of elements (-k)
 
Both modes sample over a given time interval in seconds (-t).
-t defaults to zero (process full stream).  -p can only be
used alone.  -n can only be used with -k
 
Examples:
 
  * sample K elements from a stream:
    cat /etc/passwd | sample -k 5
 
  * sample 1% of elements from a stream:
    tail -f /var/logs/httpd/access_log | sample -p 0.01
 
  * sample K elements from a stream every 30 seconds:
    tail -f /var/logs/httpd/access_log | sample -k 5 -t 30
 
  * sample K elements from a stream every 30 seconds, shuffled:
    tail -f /var/logs/httpd/access_log | sample -k 5 -t 30 -s
 
  * sample K elements from a stream every 100 elements:
    tail -f /var/logs/httpd/access_log | sample -k 5 -n 100
 
Copyright/License:
 
  Allen Day <allenday@ucla.edu>, licensed under GPL 2006-2008

Administration
Analytics
Perl

Comments (0)

Permalink

MogileFS + FUSE + Bigfile support

Jordan and I have been migrating all the TinyTube data over to a MogileFS storage system.

Seems to be a very easy-to-use solution for scalable storage of lots of little files. Basically just works out of the box, including the alpha-quality mount-filepaths FUSE adapter available for MogileFS in the SixApart SVN.

I have two problems with mount-filepaths though:

  1. Doesn’t recognize data loaded outside FUSE. It assumes I want to see/read files that were loaded via its mapping of FUSE directory/file names to MogileFS keys, and it doesn’t show other keys at all. I have lots of preloaded data, so this is a dealbreaker.
  2. Doesn’t support large files. MogileFS has a 64MB limit per file, and if you want to load a bigger file, it splits it into chunks. This FUSE adapter is not aware of how to deal with big files.

So I did some heavy modification of the script. Here’s my version. No directory support, but it lets me read my big files by the original key (i.e. a stat on “bigfile1.mpg” may trigger a stat on “_big_info:bigfile1.mpg”), and any other file I’ve loaded outside of FUSE (e.g. with mogtool).

You can name it as myfuse.pl and then, assuming the script is in the current directory, use it to mount a filesystem like so: mkdir ./myfuse; perl ./myfuse.pl ./myfuse.

Maybe I’ll get around to re-formatting this as a patch for SixApart — but probably not.

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Fuse;
use LWP::UserAgent;
use LWP::Simple;
use List::MoreUtils qw(uniq);
use MogileFS::Client;
use Path::Class;
use POSIX qw(ENOENT EISDIR EINVAL);
my $DEBUG = 0;
our (%FILE_CACHE, $CACHE_SIZE, $CACHE_AGE) = (('/' => {size => 0, age => 0}));
 
# create client object w/ server-configured namespace and IPs of trackers
my $ua = LWP::UserAgent->new;
my $class = 'default';
my $mogilefs = MogileFS::Client->new(
  domain => 'ifap',
  hosts  => [ '10.10.0.100:6001', '10.10.0.101:6001' ],
);
 
my ($mountpoint) = "";
$mountpoint = shift(@ARGV) if @ARGV;
Fuse::main(
    debug      => $DEBUG,
    mountpoint => $mountpoint,
    threaded   => 1,
 
    getattr    => "main::e_getattr",
    getdir     => "main::e_getdir",
    mknod      => "main::e_mknod",
    open       => "main::e_open",
    read       => "main::e_read",
    rename     => "main::e_rename",
    statfs     => "main::e_statfs",
    unlink     => "main::e_unlink",
    write      => "main::e_write",
);
 
sub e_getattr {
    my $filename = shift;
    $filename =~ s#^.*/##;
    warn "main::e_getattr $filename\n" if $DEBUG;
 
    my ( $size, $modes );
    my ( $dev, $ino, $rdev, $blocks, $gid, $uid, $nlink, $blksize )
        = ( 0, 0, 0, 1, 0, 0, 1, 1024 );
    my ( $atime, $ctime, $mtime ) = ( time, time, time );
 
    if ( $filename !~ m{\.} ) {
        #        warn "directory!";
        $size  = 0;
        $modes = ( 0040 << 9 ) + 0755;
    } else {
        #        warn "file!";
        $size  = 123;
        $modes = ( 0100 << 9 ) + 0644;
 
        my @paths = $mogilefs->get_paths( $filename, { noverify => 1 } );
        my ( $content_type, $document_length, $modified_time, $expires, $server );
 
        if ( scalar( @paths ) ) {
            ( $content_type, $document_length, $modified_time, $expires, $server ) = head( $paths[0] );
            $size = $document_length;
            ( $atime, $ctime, $mtime ) = ($modified_time) x 3;
        }
        else {
            @paths = $mogilefs->get_paths( '_big_info:' . $filename, { noverify => 1 } );
            if ( scalar( @paths ) ) {
                my $data = $mogilefs->get_file_data( '_big_info:' . $filename );
                my ( $_des, $_type, $_compress, $_filename, $_chunks, $_size, undef, @_parts ) = split /\n/, $$data;
                ( $size ) = $_size =~ m#^size (\d+)$#;
                OUTER: foreach my $_part ( @_parts ) {
                    $_part =~ m#paths: (.+?)$#;
                    my @_paths = split ', ', $_part;
                    foreach my $_path ( @_paths ) {
                        ( $content_type, $document_length, $modified_time, $expires, $server ) = head( $paths[0] );
                        ( $atime, $ctime, $mtime ) = ($modified_time) x 3;
                        last OUTER if $server;
                    }
                }
            }
        }
        return -ENOENT() unless @paths;
    }
 
    warn(
        join(
            ",",
            (   $dev,   $ino,     $modes, $nlink, $uid,
                $gid,   $rdev,    $size,  $atime, $mtime,
                $ctime, $blksize, $blocks
            )
        ),
        "\n"
    ) if $DEBUG;
 
    return (
        $dev,  $ino,   $modes, $nlink, $uid,     $gid, $rdev,
        $size, $atime, $mtime, $ctime, $blksize, $blocks
    );
}
 
sub e_getdir {
    my $prefix = shift;
    warn "main::e_getdir $prefix\n" if $DEBUG;
    my @filenames;
    $mogilefs->foreach_key(
#        prefix => $prefix,
        sub {
          my $filename = shift;
          push @filenames, $filename;
          push @filenames, file( $filename )->parent();
        }
    );
 
    @filenames = uniq @filenames;
    warn "returning: @filenames\n" if $DEBUG;
    return ( @filenames, 0 );
}
 
sub e_mknod {
    my $filename = shift;
    $filename =~ s#.*/##;
 
    warn "main::e_mknod $filename\n" if $DEBUG;
 
    my $fh = $mogilefs->new_file($filename, undef);
    if ( $fh ) {
        print $fh "\n";
 
        unless ($fh->close) {
            my ($code, $str) = ($mogilefs->errcode || -1, $mogilefs->errstr || '');
            warn "Error creating file:$code: $str" if $DEBUG;
            $! = $str;
            $? = $code;
            return -1;
        }
        return 0;
    }
    else {
        return -1;
    }
}
 
sub e_open {
    my $filename = shift;
    $filename =~ s#.*/##;
    warn "main::e_open $filename\n" if $DEBUG;
 
    return -EISDIR() unless $filename =~ m{\.};
    my @paths = $mogilefs->get_paths( $filename, { noverify => 1 } );
    if ( ! scalar( @paths ) ) {
        @paths = $mogilefs->get_paths( '_big_info:' . $filename, { noverify => 1 } );
    }
    return -ENOENT() unless @paths;
    return 0;
}
 
sub e_read {
    my ( $filename, $length, $offset ) = @_;
    $filename =~ s#.*/##;
    warn "main::e_read $filename $length $offset\n" if $DEBUG;
 
    return -EISDIR() if $filename =~ m{/$};
 
 
 
    my $maxoff = $offset + ( $length - 1 );
warn "requested offset=$offset length=$length bytes=$maxoff" if $DEBUG;
    my @paths = $mogilefs->get_paths( $filename, { noverify => 1 } );
    my $size;
    my ( $content_type, $document_length, $modified_time, $expires, $server );
 
    if ( scalar( @paths ) ) {
warn "it's a small file" if $DEBUG;
        ( $content_type, $document_length, $modified_time, $expires, $server ) = head( $paths[0] );
        $size = $document_length;
        return 0 if $offset == $document_length;
 
        $maxoff = $document_length if $maxoff > $document_length;
        my $range = $offset . "-" . $maxoff;
        warn "  Range: bytes=$range\n" if $DEBUG;
        my $response = $ua->get( $paths[0], "Range" => "bytes=$range" );
        if ( $response->is_success ) {
            return $response->content;
        } else {
            warn $response->as_string if $DEBUG;
        }
    }
    else {
        @paths = $mogilefs->get_paths( '_big_info:' . $filename, { noverify => 1 } );
        return -ENOENT() unless @paths;
warn "it's a big file" if $DEBUG;
 
        my $data = $mogilefs->get_file_data( '_big_info:' . $filename );
        my ( $_des, $_type, $_compress, $_filename, $_chunks, $_size, undef, @_parts ) = split /\n/, $$data;
        ( $size ) = $_size =~ m#^size (\d+)$#;
        return 0 if $offset == $size;
 
        $maxoff = $size if $maxoff > $size;
 
        my $part_min = -1;
        my $part_max = -1;
#part 2 bytes=23080964 md5=af45f7ac80ca34328db3c90de1db1ab0 paths: http://10.10.0.100:7500/dev8/0/000/119/0000119969.fid, http://10.10.0.101:7500/dev2/0/000/119/0000119969.fid
 
        my $buf = '';
        foreach my $_part ( @_parts ) {
            my ( $_bytes, $_paths ) = $_part =~ m#bytes=(\d+) .+? paths: (.+?)$#;
 
            $part_min = $part_max > 0 ? $part_max + 1 : 0;
            $part_max += $_bytes;
 
 
warn "examining part $part_min -> $part_max" if $DEBUG;
 
            #chunk too early
            next if ( $part_max < $offset );
            #chunk too late;
            next if ( $part_min > $maxoff );
 
warn "using part $part_min -> $part_max" if $DEBUG;
 
            my @_paths = split ', ', $_paths;
            foreach my $_path ( @_paths ) {
warn "offset=$offset part_min=$part_min maxoff=$maxoff part_max=$part_max" if $DEBUG;
                my $range = ($offset-$part_min) . "-" . ($maxoff-$part_min);
warn "getting Range: bytes=$range" if $DEBUG;
                my $response = $ua->get( $_path, "Range" => "bytes=$range" );
                if ( $response->is_success() ) {
                    $buf .= $response->content();
warn "data length=".length($buf) if $DEBUG;
                    if ( length( $buf ) == $length ) {
warn "got all the data (1)!" if $DEBUG;
                        return $buf;
                    }
                    elsif ( $offset + length( $buf ) == $maxoff ) {
warn "got all the data (2)!" if $DEBUG;
                        return $buf;
                    }
                    $offset += length( $buf );
                    last;
                }
            }
        }
        if ( length( $buf ) != $length ) {
            return -ENOENT();
        }
        else {
            return $buf;
        }
    }
}
 
sub e_rename {
    my ( $old, $new ) = @_;
    $old =~ s#.*/##;
    $new =~ s#.*/##;
 
    warn "main::e_rename: $old -> $new" if $DEBUG;
 
    # Rename this file
    $mogilefs->rename($old, $new);
 
    return 0;
}
 
sub e_statfs { return 255, 1, 1, 1, 1, 2 }
 
sub e_unlink {
    my $filename = shift;
    $filename =~ s#.*/##;
 
    warn "main::e_unlink: $filename" if $DEBUG;
 
    $mogilefs->delete($filename);
 
    return 0;
}
 
sub e_write {
    my ( $filename, $buf, $offset ) = @_;
    $filename =~ s#.*/##;
 
    warn("main::e_write: $filename pos=$offset len=".length($buf)) if $DEBUG;
 
    my $finfo = get_file_info($filename);
 
    return -ENOENT() unless $finfo;
 
    my $cont = get_file_data($filename);
 
    substr($$cont, $offset, length($buf), $buf);
 
    $mogilefs->store_content($filename, undef, $cont);
    rm_file_cache($filename);
 
    return length($buf);
}
 
#################################################
sub logmsg { warn(join "\t",@_) if $DEBUG }
 
sub get_file_data {
    my ($file) = @_;
    my $entry = $FILE_CACHE{$file};
    my $meta  = $FILE_CACHE{'/'};
 
    if ($entry) {
        # See if this data is too old
        if ((time - $entry->{created}) < $CACHE_AGE) {
            logmsg(1, "-- get_file_data: hit");
 
            # If its still valid, return it
            return $entry->{data};
        } else {
            logmsg(1, "-- get_file_data: miss - expired");
 
            rm_file_cache($file);
        }
    }
 
    my $cont = $mogilefs->get_file_data($file);
    my $size = length($$cont);
 
    if ($meta->{size} + $size > $CACHE_SIZE) {
        # If adding this would go beyond our max cache size, delete things until
        # we can fit it
        foreach my $f (sort {$a->{age} <=> $b->{age}} keys %FILE_CACHE) {
            next if $f eq '/';
 
            my $rm_size = rm_file_cache($f);
 
            logmsg(1, "-- get_file_data: purging - $rm_size bytes");
 
            last if $meta->{size} + $size < $CACHE_SIZE;
        }
    }
 
    logmsg(1, "-- get_file_data: added - $size bytes");
 
    # Create a new entry
    $FILE_CACHE{$file} = {created => time,
                          size    => $size,
                          data    => $cont};
    $meta->{size} += $size;
 
    return $cont;
}
 
sub get_file_info {
    my ($path) = @_;
 
    if ($path eq '/') {
        return {name         => '/',
                is_directory => 1};
    }
    else {
      foreach my $f ( $mogilefs->list_keys( $path ) ) {
        return $f if $f eq $path;
      }
    }
    return undef;
}
 
sub rm_file_cache {
    my ($file) = @_;
    my $entry = delete $FILE_CACHE{$file};
    return unless $entry;
 
    # Decrement how large our cache size is
    my $size = $entry->{size};
    $FILE_CACHE{'/'}->{size} -= $size;
 
    return $size;
}
 
__DATA__
#des no description
#type file
#compressed 0
#filename somefile.mpeg
#chunks 2
#size 90189828
#
#part 1 bytes=67108864 md5=8066369552b71fd49cfbe9ccdce74051 paths: http://10.10.0.100:7500/dev6/0/000/119/0000119968.fid, http://10.10.0.101:7500/dev2/0/000/119/0000119968.fid
#part 2 bytes=23080964 md5=af45f7ac80ca34328db3c90de1db1ab0 paths: http://10.10.0.100:7500/dev8/0/000/119/0000119969.fid, http://10.10.0.101:7500/dev2/0/000/119/0000119969.fid

Administration
Distributed Systems
Perl
Scalability

Comments (2)

Permalink

pcoc – Piped Command Output Colorizer

I’m frequently monitoring webservers, cache servers, database servers, etc by tailing their log files, e.g.

tail -f /etc/httpd/logs/access_log

I like the –color option provided by grep, but found it to be too limited (only one allowed, no wildcard support). After a bit of searching to see if a tool existed for doing arbitrary colorizing, I found
acoc, the Arbitrary Command Output Colourer.

…which almost did what I needed, but couldn’t read from a pipe. So I wrote pcoc, the Piped Command Output Colorizer. I’m only publishing this because I’ve been using it for about 1 1/2 years, and still find it useful.

Source code at the end of this post. Here’s an example that highlights iPhone/iPod user agents and requests with a 500/400/404 HTTP response:

tail -f ./logs/access_log | pcoc -f '(iPod)=bold cyan' -f '(iPhone)=bold magenta' -f '\b(500|404|400)\b=red on_black'

Sorry, no screenshots :( .

pcoc source:

#!/usr/bin/perl
use strict;
use Getopt::Long;
use Term::ANSIColor qw(colored);
$|++;
 
my %format = ();
GetOptions( "format|f=s" => \%format);
 
if ( ! keys %format ) {
  print <<"EOF";
Synopsis:
        pcoc - Piped Command Output Colorizer.  Inspired by acoc.
 
Usage:
 
        $0 -f '<regex1>=<color1>' -f '<regex2>=<color2>'
 
$0 reads from a pipe and colorizes each line based on format (-f) parameters.
 
Arguments:
 
-f '<regex>=<color>'  Required, multiple values okay. 
 
        <regex>: A regular expression from which \$1 will be colorized
 
        <color>: One or more colorization keywords, see perldoc
        Term::ANSIColor, but briefly they are:
 
        boldness:
                bold
        foreground:
                red yellow green blue magenta cyan black white
        background:
                on_red on_yellow on_green on_blue on_magenta on_cyan
                on_black on_white
 
Examples:
 
        #highlight the account's shell in bold green
        cat /etc/passwd | $0 -f '.+:([^:]+)\$=bold green'
 
        #... and the username in red with black background
        cat /etc/passwd | $0 -f '([^:]+)=red on_black' -f '.+:([^:]+)\$=bold green'
 
Copyright/License:
 
        Allen Day <allenday\@ucla.edu>, licensed under GPL 2006-2008
 
EOF
  exit(1);
}
 
while ( my $line = <> ) {
  chomp( $line );
  foreach my $f ( keys %format ) {
    my @c = split ',', $format{ $f };
 
    if ( $line =~ qr/$f/ ) {
      while ( my ( $s, $t ) = $f =~ m/^(.*?)\(+(.+?)\)+/ ) {
        my $c = pop @c || last;
        $line =~ s/($s)($t)/$1.colored($2,$c)/e;
        $f =~ s/^(.*?)\((.+?)\)/$1$2/;
      }
    }
  }
  print "$line\n";
}

Administration
Analytics
Perl

Comments (0)

Permalink