Software

How to fix the meetup.com broken exported calendars.

I’m a big fan of meetup.com, but they’re so tragically unhip when it comes to mashups/integration/web 2.0.  One of my biggest gripes until about 6 months ago was that they had no facility (besides the API) for exporting a calendar of meetups to my calendar app (I use Google Calendar), or to any other calendar app for that matter.

They introduced an export feature recently, but it’s pretty useless.  Here’s why: they offer only two calendars:

  • [Calendar A] contains all upcoming items in all your meetup groups
  • [Calendar B] contains upcoming items which you have RSVP’d with “yes” or “maybe”.

That’s it.  The exported calendars don’t even contain links that allow you to RSVP from directly inside your calendar; you have to click through to the meetup.com site, log in, then RSVP.  Ugh.

 

Come on, product guys.  What’s really called for is 4 separate calendars.

  • [Calendar "yes"] All groups, “yes” events
  • [Calendar "maybe"] All groups, “maybe” events
  • [Calendar "no"] All groups, “no” events
  • [Calendar "none"] All groups, events to which I have not yet submitted an RSVP.
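To make the four-calendar idea concrete, here is a minimal sketch in Python (not the Perl I actually run) that buckets events by RSVP status. The event dicts and the `split_by_rsvp` helper are made up for illustration; only the `myrsvp` field name comes from the API response my script reads, and treating a missing or unknown RSVP as "none" is my assumption.

```python
CALENDARS = ("yes", "maybe", "no", "none")


def split_by_rsvp(events):
    """Group event dicts into one list per RSVP status."""
    buckets = {name: [] for name in CALENDARS}
    for event in events:
        # Assumption: a missing or unrecognized RSVP counts as "none".
        status = event.get("myrsvp") or "none"
        if status not in buckets:
            status = "none"
        buckets[status].append(event)
    return buckets


# Hypothetical sample events, not real API output.
events = [
    {"id": 1, "name": "Perl Mongers", "myrsvp": "yes"},
    {"id": 2, "name": "R Users", "myrsvp": "maybe"},
    {"id": 3, "name": "Hadoop Night", "myrsvp": None},
]
calendars = split_by_rsvp(events)
```

Each of the four lists would then be rendered as its own iCalendar feed.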

I was finally just pissed off enough about the status quo that I fixed it for myself, and below I share the code.  You can try it out here: http://spicylogic.com/allenday/cgi-bin/mu.cgi?key=<your_api_key>&user=<your_member_id>&cal=<calendar> 

where <your_api_key> can be found here, <your_member_id> is your numeric meetup.com member ID (the script refuses to run without it), and <calendar> is one of “yes”, “no”, “none”, “maybe”.

Okay, here’s the code.  Install it on your own machine if possible, my ISP will appreciate it.  If you find fuckups, let me know and I’ll update the post.

#!/usr/bin/perl
use strict;
use CGI qw(:standard);
use Date::Manip qw(ParseDate ParseDateString ParseDateDelta DateCalc UnixDate);
use Date::Parse;
use HTML::Entities;
use LWP::Simple qw(get);
use XML::DOM;
 
use constant URL_EVENTS => 'http://api.meetup.com/events?key=%s&member_id=%d&format=xml';
 
print header(q(text/calendar));
 
my $parser = new XML::DOM::Parser ();
 
my $mode = param( 'cal' );
my $key  = param( 'key' );
my $user = param( 'user' );
 
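if ( ! $mode || ! $key || ! $user ) {
  # all three CGI parameters are required
  die "usage: mu.cgi?key=<api_key>&user=<member_id>&cal=<yes|no|maybe|none>\n";
}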
 
my $events_url = sprintf( URL_EVENTS, $key, $user );
#warn $events_url;
my $events_txt = get( $events_url );
#warn $events_txt;
my $events_dom = $parser->parse( $events_txt );
#warn $events_dom;
 
print qq(BEGIN:VCALENDAR\nPRODID:-//Meetup Inc//RemoteApi//EN\nVERSION:2.0\nMETHOD:PUBLISH\nCALSCALE:GREGORIAN\nX-ORIGINAL-URL:http://www.meetup.com/\nX-WR-CALNAME:mu $mode\n);
 
my $events = $events_dom->getElementsByTagName( 'item' );
for ( my $i = 0 ; $i < $events->getLength() ; $i++ ) {
  my $event = $events->item( $i );
  my $n_id    = $event->getElementsByTagName( 'id'             )->item( 0 )->getFirstChild();
  my $n_rsvp  = $event->getElementsByTagName( 'myrsvp'         )->item( 0 )->getFirstChild();
  my $n_addr0 = $event->getElementsByTagName( 'venue_name'     )->item( 0 )->getFirstChild();
  my $n_addr1 = $event->getElementsByTagName( 'venue_address1' )->item( 0 )->getFirstChild();
  my $n_addr2 = $event->getElementsByTagName( 'venue_address2' )->item( 0 )->getFirstChild();
  my $n_addr3 = $event->getElementsByTagName( 'venue_address3' )->item( 0 )->getFirstChild();
  my $n_addr4 = $event->getElementsByTagName( 'venue_city'     )->item( 0 )->getFirstChild();
  my $n_addr5 = $event->getElementsByTagName( 'venue_state'    )->item( 0 )->getFirstChild();
  my $n_addr6 = $event->getElementsByTagName( 'venue_zip'      )->item( 0 )->getFirstChild();
  my $n_desc  = $event->getElementsByTagName( 'description'    )->item( 0 )->getFirstChild();
  my $n_link  = $event->getElementsByTagName( 'event_url'      )->item( 0 )->getFirstChild();
  my $n_name  = $event->getElementsByTagName( 'name'           )->item( 0 )->getFirstChild();
  my $n_lat   = $event->getElementsByTagName( 'venue_lat'      )->item( 0 )->getFirstChild();
  my $n_lon   = $event->getElementsByTagName( 'venue_lon'      )->item( 0 )->getFirstChild();
  my $n_start_time  = $event->getElementsByTagName( 'time'           )->item( 0 )->getFirstChild();
 
  my $start_time;
  my $end_time;
 
  #my $dummy_time = "20000101T000000Z";
  # CREATED/DTSTAMP carry a trailing Z (UTC), so build them from gmtime, not localtime
  my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(time());
  my $dummy_time = sprintf( q(%04d%02d%02dT%02d%02d%02dZ), $year + 1900, $mon + 1, $mday, $hour, $min, $sec );
 
  if ( $n_start_time ) {
    my ($ss,$mm,$hh,$day,$month,$year,$zone);
 
    ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime( $n_start_time->toString() );
    $start_time = sprintf( q(%04d%02d%02dT%02d%02d%02dZ), $year + 1900, $month + 1, $day, $hh, $mm, $ss );
 
    my $eday = $day;
    if ( $hh == 23 ) {
      $eday = $day + 1;
    }
    my $ehh = ($hh + 1) % 24;
    $end_time   = sprintf( q(%04d%02d%02dT%02d%02d%02dZ), $year + 1900, $month + 1, $eday, $ehh, $mm, $ss );
  }
  else {
    $start_time = '';
    $end_time = '';
  }
 
  # skip events whose <myrsvp> element is empty instead of calling toString() on undef
  if ( $n_rsvp && $mode eq $n_rsvp->toString() ) {
    my $id   = $n_id->toString();
    my $name = $n_name ? $n_name->toString() : "";
    my $desc = $n_desc ? $n_desc->toString() : "";
    my $addr = ( $n_addr0 ? $n_addr0->toString().', ' : "" )
             . ( $n_addr1 ? $n_addr1->toString().', ' : "" )
             . ( $n_addr2 ? $n_addr2->toString().', ' : "" )
             . ( $n_addr3 ? $n_addr3->toString().', ' : "" )
             . ( $n_addr4 ? $n_addr4->toString().', ' : "" )
             . ( $n_addr5 ? $n_addr5->toString().', ' : "" )
             . ( $n_addr6 ? $n_addr6->toString() : "" );
    #$desc =~ s/(.)/(ord($1) > 127) ? "" : $1/egs;
 
    $name = HTML::Entities::decode_entities( $name );
    $desc = HTML::Entities::decode_entities( $desc );
    $addr = HTML::Entities::decode_entities( $addr );
    $name =~ s/,/\\,/g;
    $desc =~ s/,/\\,/g;
    $addr =~ s/,/\\,/g;
 
    # iCalendar wants literal "\n" sequences, not raw newlines, inside DESCRIPTION
    $desc =~ s#\n#\\n#gs;
    $desc .= "\\n\\n\\nGoing?\\n\\n";
    foreach my $response ( qw( yes no maybe ) ) {
      $desc .= uc($response).qq(: http://api.meetup.com/rsvp?event_id=$id&key=$key&rsvp=$response\\n);
    }
    }
 
    # empty string (not undef) when coordinates are missing, so sprintf stays quiet
    my $geo = ( $n_lat && $n_lon ) ? "GEO:" . $n_lat->toString() . ";" . $n_lon->toString() . "\n" : "";
 
    #print sprintf( qq(BEGIN:VEVENT\nSUMMARY:%s\nDESCRIPTION:%s\nLAST-MODIFIED:%s\nUID:%s\nCLASS:%s\nCREATED:%s\nDTSTAMP:%s\nDTSTART:%s\nDTEND:%s\nLOCATION:%s\n\nURL:%s\nEND:VEVENT\n),
    print sprintf( qq(BEGIN:VEVENT\nSUMMARY:%s\nDESCRIPTION:%s\nLAST-MODIFIED:%s\nUID:%s\nCLASS:%s\nCREATED:%s\nDTSTAMP:%s\nDTSTART:%s\nDTEND:%s\n%sLOCATION:%s\nURL:%s\nEND:VEVENT\n),
      $name,
      $desc,
      $start_time,
      "event_$id\@meetup.com",
      "PUBLIC",
      $dummy_time,
      $dummy_time,
      $start_time,
      $end_time,
      $geo,
      $addr,
      $n_link ? $n_link->toString() : "",
    );
  }
}
 
print qq(END:VCALENDAR\n);
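The script above escapes commas by hand and works out the end hour with modular arithmetic, which gets the day wrong at month boundaries. Here is a hedged sketch, in Python rather than Perl, of the same two steps: escaping a text value the way RFC 5545 describes for TEXT, and deriving a one-hour DTEND with real date arithmetic. The function names are mine, and the sketch assumes the start time is already in UTC.

```python
from datetime import datetime, timedelta


def ical_escape(text):
    """Escape backslash, semicolon, comma and newlines for an iCalendar TEXT value."""
    return (text.replace("\\", "\\\\")   # backslashes first, so later escapes are not doubled
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\r\n", "\\n")
                .replace("\n", "\\n"))


def ical_times(start, duration=timedelta(hours=1)):
    """Return (DTSTART, DTEND) strings; assumes `start` is already UTC."""
    fmt = "%Y%m%dT%H%M%SZ"
    return start.strftime(fmt), (start + duration).strftime(fmt)


# An event starting at 23:30 on the last day of a month rolls over cleanly.
dtstart, dtend = ical_times(datetime(2009, 1, 31, 23, 30, 0))
summary = ical_escape("Beer, pizza; talks\nBring a laptop")
```

Letting the date library do the rollover avoids the hour-23 special case entirely.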

Administration
Life
Networking
Perl
Software


Synthetic GFF Dataset for Genome Browser Benchmark

I deployed a GBrowse/Chado installation last week at Dow AgroSciences.  It got me thinking about how slow and basic the searches are with the Bio::DB::Das::Chado* adaptor, and whether it would be nice to use SOLR here.

I made up a test dataset of gene/mRNA/exon 3-tiered feature groups by permuting some gene model data from the knownGene annotation set of the hg18 build of the human genome.  You can grab the data set and the script used to generate it here.  There are several files named mRNA.EN.txt.gz that contain gzipped gene models, where N=3..7 means the file holds 10^N models, uniformly distributed across a 500-megabase reference sequence.
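For a feel of what such a file looks like, here is a rough Python sketch that emits gene/mRNA/exon groups at uniformly random positions on a 500-megabase reference. It is not the script I used: it draws random gene lengths instead of permuting real knownGene models, and the sequence name, source column, ID scheme, two-exon layout and size range are all placeholders.

```python
import random


def gff_line(seqid, ftype, start, end, strand, attrs):
    """Format one 9-column GFF3 record (score and phase left as '.')."""
    return "\t".join([seqid, "synthetic", ftype, str(start), str(end),
                      ".", strand, ".", attrs])


def synthetic_models(n_models, ref_length=500_000_000, seqid="ref", seed=0):
    """Yield GFF3 lines for n_models gene/mRNA/exon groups."""
    rng = random.Random(seed)
    for n in range(n_models):
        gene_len = rng.randint(1_000, 50_000)          # hypothetical size range
        start = rng.randint(1, ref_length - gene_len)  # uniform over the reference
        end = start + gene_len
        strand = rng.choice("+-")
        gene_id, mrna_id = f"gene{n}", f"mRNA{n}"
        yield gff_line(seqid, "gene", start, end, strand, f"ID={gene_id}")
        yield gff_line(seqid, "mRNA", start, end, strand,
                       f"ID={mrna_id};Parent={gene_id}")
        # Two placeholder exons, one at each end of the transcript.
        exon_len = gene_len // 4
        for k, (s, e) in enumerate([(start, start + exon_len),
                                    (end - exon_len, end)]):
            yield gff_line(seqid, "exon", s, e, strand,
                           f"ID=exon{n}.{k};Parent={mrna_id}")


lines = list(synthetic_models(10 ** 3))  # N=3, i.e. 1000 models
```

Fixing the seed keeps the dataset reproducible, which matters if other people are going to benchmark against the same features.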

I’m planning to load these data into a couple of different systems and then compare performance on some of the typical Bio::DB::GFF API calls.  I can personally test on:

  • Chado
  • The default Bio::DB::GFF schema (does it have a name?)
  • The SOLR backend I’m about to implement

I know there are other feature DBs out there.  It would be good to include them as well in a later pass or to have someone else contribute the data once I get the benchmarking script written.

Genomics
Informatics
Java
Perl
Scalability
Science


Taste item-item recommender example

I threw together a Mahout/Taste-based item-item recommender last night.

	// Needs java.io.File and java.io.FileNotFoundException, plus the Taste classes
	// from org.apache.mahout.cf.taste.* (subpackages depend on the Mahout version).
	public static void itemItemRecommendations(String path, String file) {
		File f = new File(path, file);
		try {
			DataModel model = new FileDataModel(f);
			model.refresh(null);
			ItemSimilarity itemSimilarity = new LogLikelihoodSimilarity(model);
			ItemBasedRecommender itemRecommender = new GenericItemBasedRecommender(model, itemSimilarity);
			// For every item, print its (up to) 50 most similar items that score at least 0.7.
			for ( Item i : model.getItems() ) {
				for ( RecommendedItem j : itemRecommender.mostSimilarItems(i.getID(), 50) ) {
					if ( j.getValue() >= 0.7 ) {
						System.out.println(i.getID() + "\t" + j.getItem().getID() + "\t" + String.format("%.3f", j.getValue()));
					}
				}
			}
		} catch (FileNotFoundException e) {
			// Input file is missing or unreadable.
			e.printStackTrace();
		} catch (TasteException e) {
			// Taste failed while loading the model or computing similarities.
			e.printStackTrace();
		}
	}

This outputs item1 –recommends–>item2 pairs with a weight. I’m taking this and putting it into a Solr document so I can display related item2s alongside item1 when it’s viewed.
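To get a feel for where that weight comes from, here is a small Python sketch of the log-likelihood ratio over a 2x2 co-occurrence table, mapped into 0..1 as 1 - 1/(1 + LLR). That mapping is my understanding of what LogLikelihoodSimilarity does, not something I have checked line by line, so treat it as an assumption. The counts are: k11 users who rated both items, k12 only the first, k21 only the second, k22 neither.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def similarity(k11, k12, k21, k22):
    """Map the ratio into [0, 1); assumed to mirror the Taste weight."""
    return 1.0 - 1.0 / (1.0 + log_likelihood_ratio(k11, k12, k21, k22))


independent = similarity(1, 1, 1, 1)     # no association at all
co_rated = similarity(10, 0, 0, 10)      # items always rated together
```

Note that only co-occurrence counts go in here; the preference values themselves play no part in this measure.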

Input data are comma-delimited tuples like so:

1fe7401b81eed49353d0cbeba5383848,5212,0.6
3c1832954a6e8781836fed670bb37b24,5212,1
70273e4c7c77700ee97acb8d0306c405,5213,0.8
1f057ccde135acbc881008bbf466e7e1,5213,1
51d44c7baca65ad39d11ba87bf2d438b,5213,1
adc924559b37114cd97d1f5cf7c71419,5213,1
78e254b4a11e61d76ff63cea02de4de8,5213,1
5c373ec7d9ad4a6f392c291d8ccba5ce,5213,0.2
fab8537564094fa8885f6214e6b682e1,5213,1
127f46aabcdbc2d2d04da8398a996c75,5213,1

Works great. Thanks Sean.

Analytics
Java
Mahout


Upcoming AI / Machine Learning Conferences

A (partial) list I found today. Doesn’t include NIPS, so I’m not sure how exhaustive it is, but it has a bunch I haven’t seen before.

http://www.kmining.com/info_conferences.html

Analytics
Informatics
Mathematics
Networking
Science
Software
Statistics


Parallel DNS reverse lookups

Need to do lots of reverse DNS lookups for some reason? Maybe b/c you’re trying to get a seed list for a web crawl or hack attempt on a bunch of ISPs. Who cares. Here’s a quick way to generate names from a big list of IPs like:

1.1.1.1
1.1.1.2
[...]
254.254.254.253
254.254.254.254

We can use Hadoop streaming to chunk the list so we can do the DNS lookups in parallel. Easy, and it requires little to no thought:

./bin/hadoop jar contrib/streaming/*-streaming.jar -input /home/aday/classC.dat -output /home/aday/classC_dns.dat -mapper 'perl -ne '\''print `host $_`'\''' -numReduceTasks 0

We wrap the host call in Perl backticks so that a failed lookup (a non-zero exit code from host) doesn’t fail the map task; Perl just prints whatever host wrote to stdout, error message included.
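If you’d rather not shell out to host for every line, a streaming mapper can do the lookup itself. This is a sketch in Python, not something I ran on the cluster: it reads one IP per line and writes a tab-separated ip/name record, printing the error text instead of dying when a lookup fails. The `resolve` parameter is only there so the function can be exercised without a network.

```python
import socket
import sys


def reverse_lookup(ip, resolve=socket.gethostbyaddr):
    """Return the primary host name for ip, or an ERROR string on failure."""
    try:
        return resolve(ip)[0]
    except OSError as err:  # socket.herror and socket.gaierror are OSError subclasses
        return f"ERROR {err}"


def map_lines(lines, resolve=socket.gethostbyaddr):
    """Turn lines of IPs into 'ip<TAB>name' records, skipping blank lines."""
    for line in lines:
        ip = line.strip()
        if ip:
            yield f"{ip}\t{reverse_lookup(ip, resolve)}"


if __name__ == "__main__" and not sys.stdin.isatty():
    # When run as a streaming mapper, Hadoop feeds the input split on stdin.
    for record in map_lines(sys.stdin):
        print(record)
```

Saved as a script, it would be passed to -mapper in place of the perl one-liner, keeping -numReduceTasks 0.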

Distributed Systems
Hadoop
Java


ZIP code demographic data with Perl

I needed some demographics data earlier this week and tried using the SF3 files from census.gov’s “Census 2000” data set.

What a time sink. Ugh.

The methods used are very well documented, and I learned a lot about the census. What I was not able to learn, however, was how to actually extract the data from the flat files. Look at what Joshua Tauberer went through to get some idea of the pain level.

Finally I got fed up and wrote a screen scraper for ZIPskinny.com in Perl. It’s one-off crappy code. You can get it from CPAN under namespace Geo::Demo::Zipskinny.

Hope it saves you some time. Leave me a comment if you have working code that can deal with SF3 files.

Here’s a little ZIP code to rich-vs-poor plot I made earlier.

Analytics
Perl
Science
Statistics


Java port of GNU getopt

This looks useful
http://www.urbanophile.com/arenn/hacking/getopt/gnu.getopt.Getopt.html

Java


Webserver logs access time by region/language

As anyone with a popular website knows, there’s a big difference in the resources required for peak vs. off-peak hours, and you typically have to pay for peak usage even if you don’t always use it (e.g. 95th percentile bandwidth billing).

Frugal as I am, I was curious to see if I could increase traffic during what are off-peak hours. Seemed sensible that people in different regions of the world might be accessing during off-hours.

So I aggregated data by country code/language and 10-minute time segment, applied a Daniell smoothing kernel (a sliding window) of 6 segments (1 hour), and plotted a row-scaled heatmap in R. Rows are clustered so similar access patterns are next to one another, with the left-hand-side dendrogram indicating dissimilarity between rows. Yellow-white is a traffic burst. I’ll post the code and data later for how I made this.

access times by country/language

As it turns out, the main off-peak trough corresponds to the middle of the Pacific ocean. Kinda watery for people to live there. Oh well, I tried.
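Until I post the real R code, here is a rough Python sketch of the aggregation and smoothing steps described above: bin hits into 10-minute segments per region, then run a 6-segment moving average over each row. The input format (region, seconds since midnight) is made up, and the trailing window that wraps around midnight is my simplification, not necessarily how R’s Daniell kernel centers its weights.

```python
SEGMENTS_PER_DAY = 24 * 6  # 144 ten-minute segments


def bin_hits(hits):
    """hits: iterable of (region, seconds_since_midnight) pairs."""
    counts = {}
    for region, seconds in hits:
        row = counts.setdefault(region, [0] * SEGMENTS_PER_DAY)
        row[(seconds // 600) % SEGMENTS_PER_DAY] += 1
    return counts


def smooth(row, window=6):
    """Equal-weight moving average over the last `window` segments, circular."""
    n = len(row)
    return [sum(row[(i - k) % n] for k in range(window)) / window
            for i in range(n)]


# Hypothetical hits: two in the first segment, one in the second.
counts = bin_hits([("us/en", 0), ("us/en", 599), ("us/en", 600)])
smoothed = smooth(counts["us/en"])
```

Each smoothed row would then be scaled and handed to the heatmap.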

Analytics
R
Science
Statistics


R Matrix sparseVector operations

I’ve only done minus(vec1,vec2) so far. More to come.

library(Matrix);
 
#F = strsplit(as.character(mm[1,2]),', ')[[1]]
#G = matrix( as.numeric(unlist(strsplit(F[c(-1,-length(F))],':'))), nrow=2 )
#tt = new('dsparseVector', x=G[2,], i=as.integer(G[1,]), length=max(as.integer(G[1,])))
 
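minus = function(v1,v2) {
  # union of the stored (1-based) indices of both vectors
  i = sort(union(v1@i,v2@i));
  s = length(i);
 
  x = vector(mode='numeric',length=s);
  for ( k in seq_len(s) ) {
    z = i[k];
    # <= rather than <, so an entry at the last position is not dropped
    if ( z <= length(v1) ) {
      x[k] = as.numeric(v1[z]);
    }
    if ( z <= length(v2) ) {
      x[k] = x[k] - as.numeric(v2[z]);
    }
  }
  # the result is as long as the longer input, not as the largest stored index
  new("dsparseVector", x=x, i=i, length=max(length(v1),length(v2)))
}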
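The same union-of-indices idea is easy to check outside R. A minimal Python sketch, assuming each sparse vector is just a dict from 1-based index to value (this is not how the Matrix package stores things):

```python
def minus(v1, v2):
    """Subtract sparse vector v2 from v1; both map index -> nonzero value."""
    result = {}
    for i in sorted(set(v1) | set(v2)):
        diff = v1.get(i, 0.0) - v2.get(i, 0.0)
        if diff != 0.0:          # keep the result sparse
            result[i] = diff
    return result


difference = minus({1: 2.0, 3: 1.0}, {3: 1.0, 5: 4.0})
```

Unlike the R version, this drops entries that cancel to zero rather than storing them explicitly.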

R
Statistics


WordPress – collapse redundant tags

I’ve been experimenting with automation of WordPress posts. I’m probably doing something wrong in the way I make the XML-RPC calls, but I find that I end up with redundant tags in my database. For instance, if I tag two separate, RPC-posted posts with “orange”, I get two different tags both called “orange”. Until I figure out how to fix this properly, here’s a little script that cleans up the database by consolidating all redundantly named tags into one tag. You might want to back up your database before using this…

#!/usr/bin/perl
use strict;
use DBI;
 
######configuration
my $PREFIX = 'wp_h5otpn_';
my $DB = '';
my $HOST = '';
my $USER = '';
my $PASS = '';
######
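# DBI reports connection failures in $DBI::errstr, not $!; RaiseError makes failed statements die too
my $dbh = DBI->connect(qq(dbi:mysql:database=$DB;host=$HOST), $USER, $PASS, { RaiseError => 1 }) or die $DBI::errstr;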
 
my $term_sth   = $dbh->prepare(qq(SELECT * FROM (SELECT name, count(name) AS c FROM ${PREFIX}terms GROUP BY name) AS d WHERE d.c > 1));
my $name_sth   = $dbh->prepare(qq(SELECT term_id FROM ${PREFIX}terms WHERE name = ?));
my $update_sth = $dbh->prepare(qq(UPDATE ${PREFIX}term_relationships SET term_taxonomy_id = (SELECT term_taxonomy_id FROM ${PREFIX}term_taxonomy WHERE term_id = ?) WHERE term_taxonomy_id = (SELECT term_taxonomy_id FROM ${PREFIX}term_taxonomy WHERE term_id = ?)));
my $delete1_sth = $dbh->prepare(qq(DELETE FROM ${PREFIX}term_taxonomy WHERE term_id = ?));
my $delete2_sth = $dbh->prepare(qq(DELETE FROM ${PREFIX}terms WHERE term_id = ?));
$term_sth->execute();
 
while ( my ( $name, $count ) = $term_sth->fetchrow_array() ) {
  $name_sth->execute( $name );
  my $new = undef;
  while ( my ( $term_id ) = $name_sth->fetchrow_array() ) {
    if ( ! $new ) {
      $new = $term_id;
      next;
    }
    warn "$name\t$term_id\t->\t$new";
    $update_sth->execute( $new, $term_id );
    $delete1_sth->execute( $term_id );
    $delete2_sth->execute( $term_id );
  }
}
 
__DATA__
SELECT t.term_id, t.name, r.*, s.* FROM wp_h5otpn_terms AS t, wp_h5otpn_term_taxonomy AS r, wp_h5otpn_term_relationships AS s WHERE s.term_taxonomy_id = r.term_taxonomy_id AND r.term_id = t.term_id AND r.taxonomy = 'post_tag' AND t.name = 'whatever';
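If you want to try the consolidation logic without touching a live blog, here is a sketch in Python against an in-memory SQLite database. It is a stand-in, not the script above: the three tables carry only the columns the cleanup touches, there is no table prefix, and it assumes each term has exactly one term_taxonomy row. `UPDATE OR IGNORE` is SQLite syntax; it covers a post that carries both copies of a duplicated tag.

```python
import sqlite3


def tt_id(cur, term_id):
    row = cur.execute(
        "SELECT term_taxonomy_id FROM term_taxonomy WHERE term_id = ?",
        (term_id,)).fetchone()
    return row[0] if row else None


def collapse_duplicate_tags(conn):
    """Keep the lowest term_id per name and repoint everything else at it."""
    cur = conn.cursor()
    dupes = cur.execute(
        "SELECT name FROM terms GROUP BY name HAVING COUNT(*) > 1").fetchall()
    for (name,) in dupes:
        ids = [r[0] for r in cur.execute(
            "SELECT term_id FROM terms WHERE name = ? ORDER BY term_id",
            (name,)).fetchall()]
        keep, extras = ids[0], ids[1:]
        keep_tt = tt_id(cur, keep)
        for term_id in extras:
            old_tt = tt_id(cur, term_id)
            cur.execute("UPDATE OR IGNORE term_relationships "
                        "SET term_taxonomy_id = ? WHERE term_taxonomy_id = ?",
                        (keep_tt, old_tt))
            # Rows the update skipped were already tagged with the kept term.
            cur.execute("DELETE FROM term_relationships WHERE term_taxonomy_id = ?",
                        (old_tt,))
            cur.execute("DELETE FROM term_taxonomy WHERE term_id = ?", (term_id,))
            cur.execute("DELETE FROM terms WHERE term_id = ?", (term_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE terms (term_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY,
                            term_id INTEGER, taxonomy TEXT);
CREATE TABLE term_relationships (object_id INTEGER, term_taxonomy_id INTEGER,
                                 PRIMARY KEY (object_id, term_taxonomy_id));
INSERT INTO terms VALUES (1, 'orange'), (2, 'orange'), (3, 'apple');
INSERT INTO term_taxonomy VALUES (10, 1, 'post_tag'), (20, 2, 'post_tag'),
                                 (30, 3, 'post_tag');
INSERT INTO term_relationships VALUES (100, 10), (101, 20), (102, 30);
""")
collapse_duplicate_tags(conn)
```

After the call, both “orange” posts point at the surviving tag and the duplicate term is gone.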

Administration
WordPress
