Random musings

Eclipse setup notes for Android and PhoneGap Development

  1. Install Eclipse for Java
  2. Install Android SDK
  3. Install Eclipse plugins
    • Subclipse http://subclipse.tigris.org/update_1.6.x
    • Egit http://download.eclipse.org/egit/updates
    • PyDev http://pydev.org/updates
    • ADT http://dl-ssl.google.com/android/eclipse

Random musings


8 Keys to Effective Crowdsourcing

The key to effective crowdsourcing is effective communication.  You communicate with your crowdsourced workers so that you can train them.  Training has a measurable cost, and you want to minimize this cost to make most effective use of your time and your budget.

Consider the situation when you’re in a professional position, or the flipside and you’re training someone to take on a new role.  Assuming you are/have the “right” person with regard to relevant skills to perform the requisite tasks, why is training required?  Knowledge transfer needs to occur.  The same is also true for crowdsourced workers.  So how can we effectively transfer knowledge to workers who may only be spending a few seconds on your task?

Key 1: Be consistent.

Use similar phrasings and images for all of your task descriptions.  This allows workers to come up to speed in a minimum amount of time.  Imagine how hard it would be to read your email if each message opened in a differently styled window.  Similar phrasings/images are just one example of how to employ…

Key 2: Use variables.

Smartsheet.com got this right.  Have a look at these 2 tasks submitted from Smartsheet to Amazon’s Mechanical Turk:

Look closely at what’s going on here.  The two tasks’ input variables (Blog Name and Blog URL) are identical; only their values change.  Note also that there are 2114 tasks just like this available.  Workers like to have lots of very similar tasks because…

Key 3: Batch tasks.

Crowdsourced workers like batches of similar tasks because it presents an opportunity for them to set up a workflow, or even write a small computer program to do the tasks for them, for you.  The cost of learning how to do a task is amortized over the entire batch, letting them make more efficient use of time (and letting you make more efficient use of your budget).
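As a rough illustration of Keys 2 and 3 together, here is a minimal Python sketch that renders a batch of tasks from one template. The template wording, the variable names (`blog_name`, `blog_url`) and the example rows are all made up for illustration; they are not the actual Smartsheet task text.

```python
from string import Template

# Hypothetical task wording; only the two variables change between tasks.
TASK_TEMPLATE = Template(
    "Visit the blog '$blog_name' at $blog_url and answer the question below."
)

def build_batch(rows):
    """Render one task description per row of input variables."""
    return [TASK_TEMPLATE.substitute(row) for row in rows]

# Made-up input rows standing in for the real list of blogs.
rows = [
    {"blog_name": "Example Blog A", "blog_url": "http://example.com/a"},
    {"blog_name": "Example Blog B", "blog_url": "http://example.com/b"},
]

batch = build_batch(rows)
for task in batch:
    print(task)
```

Because every task shares the same wording, a worker only has to learn the task once, and the rest of the batch differs only in the substituted values.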

Key 4: Be visual.

The adage “a picture is worth one thousand words” couldn’t be more fitting to communicating with crowdsourced workers.  Images are very information dense, are more friendly to scanning, and are able to more quickly communicate non-linear process structure when compared to text.  The most effective visual tool I have found thus far is to…

Key 5: Use flow charts.

Consider learning to use flow charts, and also to extend your visual vocabulary.  I’m an avid user of OmniGraffle for creating diagrams for crowdsourcing (as well as for myself).  I’ll be presenting some flow charts in the future.  You will find that by presenting your task graphically and in a formal way as a flow chart (as opposed to simply giving graphical examples), users will do more work for the same price because you’ve made it easier for them.  The flow chart also forces you to be clear about what you want, which brings us to…

Key 6: Know what you want.  Be unambiguous.

Know what you expect the worker to do for you.  Make each task so simple that it’s virtually impossible for a worker to do it incorrectly.  Break up complex tasks into their most elementary pieces.  Ideally one task = one decision.  Make each task closed-ended.  Do not leave any room for ambiguity.

Designing tasks in this way requires more effort on your part, but will result in less money spent and higher-quality results.

Key 7: Improve through iteration.

Being unambiguous on the first try is nigh on impossible.  It’s for the same reason that you “bounce” ideas off of your peers/friends — to see how your approach to an idea or task might be sub-optimal or misunderstood.

Iteratively remove ambiguity.  Submit a sample of tasks from a larger batch with a test task description.  See where the crowdsourced workers make mistakes.  Re-examine your task description to a) find the misunderstanding, and b) disambiguate it.
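One way to pull that trial sample, sketched in Python. The function name `pilot_sample`, the sample size of 10 and the `task-N` placeholders are assumptions for illustration only.

```python
import random

def pilot_sample(tasks, k, seed=0):
    """Pick k tasks at random for a trial run; the rest wait for the revised description."""
    rng = random.Random(seed)  # fixed seed so the same pilot set can be reproduced
    chosen = set(rng.sample(range(len(tasks)), k))
    pilot = [t for i, t in enumerate(tasks) if i in chosen]
    held_back = [t for i, t in enumerate(tasks) if i not in chosen]
    return pilot, held_back

# Placeholder task list standing in for a real batch.
tasks = [f"task-{i}" for i in range(100)]
pilot, held_back = pilot_sample(tasks, k=10)
print(len(pilot), len(held_back))
```

Submit only `pilot` with the draft description, review the mistakes, revise, and then release `held_back`.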

Key 8: Build validators into your tasks.

Make sure the worker’s work is validated before it gets to you.  This could mean having workers check each other’s work, and can even involve some fancy statistics.  It could also mean writing a bit of JavaScript or some other backend system to validate worker inputs (e.g. if you ask for a document of at least 300 words, count the words with JavaScript before the worker submits).  This is getting a bit more advanced, but it opens up opportunities for more complex tasks by delegating part of the work to the computer.
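Two such validators, sketched in Python as they might run on a backend (the post suggests JavaScript for the in-browser version; the logic is the same). The function names, the whitespace-based definition of a "word" and the agreement threshold of 2 are assumptions, and a simple majority vote stands in for the "fancy statistics".

```python
from collections import Counter

def meets_min_words(text, minimum=300):
    """True if the submission has at least `minimum` whitespace-separated words."""
    return len(text.split()) >= minimum

def majority_answer(answers, min_agreement=2):
    """Return the most common answer if enough workers agree on it, else None."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= min_agreement else None

# A 299-word submission is rejected, a 300-word one passes.
print(meets_min_words("word " * 299), meets_min_words("word " * 300))

# Three workers label the same item; two agree, so their answer is accepted.
print(majority_answer(["cat", "cat", "dog"]))
```

When no answer reaches the threshold, `majority_answer` returns `None`, which can be used as a signal to send the item out to more workers.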

Computing
Crowdsourcing
Random musings
Scalability


G1 cupcake upgrade

I’m using a T-Mobile Android G1 on the AT&T network. I don’t have a T-Mobile SIM at all, and I wasn’t getting the Cupcake upgrade through the system menu, whether on AT&T EDGE or WiFi.

So I followed this guide: http://www.androidandme.com/2009/05/guides/beginners-guide-for-rooting-your-android-g1-to-install-cupcake/

Worked great and easy/no-hassle to follow for an Android n00b like myself. Best and most comprehensive one I could find.

Random musings


Standalone BitTorrent Checksum Verification Tool

I’m writing some scripts to let me automate the downloading and seeding of torrents. The idea is to have torrents pulled in from RSS or a screenscrape (much as Azureus does this, but I want to script everything with Python/Mainline BitTorrent, bash, and Perl), then to sit on the torrents for a day or so after their last mtime, then checksum them and if they’re good move them elsewhere for watching, etc.
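The "sit on it for a day after its last mtime" step could look something like this in Python. This is a sketch only: the function name `is_settled` and the one-day default are assumptions, not part of the scripts described here.

```python
import os
import time

def is_settled(path, min_age_seconds=24 * 60 * 60, now=None):
    """True if the file at `path` was last modified at least min_age_seconds ago."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path) >= min_age_seconds
```

A download whose mtime is still changing is presumably still being written, so only files for which `is_settled(...)` is true would move on to checksumming. The `now` parameter exists so the check can be exercised without actually waiting a day.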

Part of this requires checksumming the files, and Mainline doesn’t ship with a standalone utility to do this. So I wrote one in Perl, see below. This only handles single-file torrents for now (i.e. no directories of files).

#!/usr/bin/perl
# Verify a single-file torrent's payload against the SHA-1 piece hashes
# stored in its .torrent metainfo. Prints '.' per good piece, 'x' per bad one.
use strict;
use warnings;
use Convert::Bencode_XS qw(bdecode);
use Digest::SHA1 qw(sha1);
 
$|++;  # unbuffered output so progress shows up immediately
 
my $base = shift @ARGV or die "usage: $0 BASE\n";
my $torrent = "$base.torrent";
 
# Slurp the metainfo file in binary mode
open( my $t, '<', $torrent ) or die "can't open $torrent: $!";
binmode( $t );
my $torrent_data = do { local $/; <$t> };
close( $t );
 
my $metainfo = bdecode( $torrent_data );
 
my $file_name = "$base/" . $metainfo->{'info'}->{'name'};
my $piece_length = $metainfo->{'info'}->{'piece length'};
 
# 'pieces' is a concatenation of 20-byte SHA-1 digests, one per piece
my $pieces = $metainfo->{'info'}->{'pieces'};
my @pieces = ();
my $offset = 0;
while ( $offset < length( $pieces ) ) {
  push @pieces, substr( $pieces, $offset, 20 );
  $offset += 20;
}
 
open( my $f, '<', $file_name ) or die "can't open $file_name: $!";
binmode( $f );
foreach my $p ( @pieces ) {
  my $buf = '';
  # Sequential reads; the final piece may be shorter than $piece_length
  read( $f, $buf, $piece_length );
  print( $p eq sha1( $buf ) ? '.' : 'x' );
}
close( $f );
print "\n";
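For comparison, here is the same piece check sketched in Python with only the standard library. It is an illustration under assumptions, not a port of the script above: the tiny `bdecode` is a hand-rolled decoder (not the Convert::Bencode_XS API), the payload and torrent are built in memory instead of being read from disk, and the names `check_pieces`, `payload` and `demo` are made up.

```python
import hashlib

def bdecode(data, i=0):
    """Decode one bencoded value from bytes `data` at index i; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                      # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                      # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                      # dict: d<key><value>...e
        i += 1
        d = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)        # byte string: <length>:<bytes>
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length

def check_pieces(payload, piece_length, pieces):
    """Yield True/False per piece: does its SHA-1 match the stored 20-byte digest?"""
    for n in range(0, len(pieces), 20):
        start = (n // 20) * piece_length
        chunk = payload[start:start + piece_length]
        yield hashlib.sha1(chunk).digest() == pieces[n:n + 20]

# Build a made-up single-file torrent in memory (35-byte payload, 16-byte pieces).
payload = b"hello world, this is a test payload"
piece_length = 16
digests = b"".join(
    hashlib.sha1(payload[i:i + piece_length]).digest()
    for i in range(0, len(payload), piece_length)
)
torrent = (
    b"d4:infod6:lengthi%de4:name4:demo12:piece lengthi%de6:pieces%d:"
    % (len(payload), piece_length, len(digests))
) + digests + b"ee"

meta, _ = bdecode(torrent)
info = meta[b"info"]
good = list(check_pieces(payload, info[b"piece length"], info[b"pieces"]))
bad = list(check_pieces(b"J" + payload[1:], info[b"piece length"], info[b"pieces"]))
print("".join("." if ok else "x" for ok in good))
print("".join("." if ok else "x" for ok in bad))
```

Flipping the first byte of the payload only invalidates the first piece, which is the point of per-piece hashes: a bad download can be narrowed down to the pieces that need re-fetching.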

Random musings


a small world / celebrity encounter

1. Noticed a new Tesla Motors showroom in Los Angeles at Sepulveda/405 on Santa Monica Blvd last night on my drive to my guitar lesson. Made a mental note to stop by and check it out; it looks like they have some museum-style exhibits of the car (fuel cell cutaways, etc.).

2. Was talking it up to some friends/coworkers earlier today.

3. Was planning to go to the Hadoop Meetup tonight at Mahalo, but skipped it and worked late.

4. Saw an orange Lotus pass me on the way home… but wait… it has a TESLA logo! I pursued, thinking it was kind of late for a test drive.

5. Pulled up at a red light to listen to the silence / congratulate the driver on his nice ride. Honked my horn.

6. Driver rolls down the window and turns to me, and it’s none other than the CEO of Mahalo, Jason Calacanis!

Life
Networking
Random musings


Blogroll fleshed out

I’ve had several requests, both recently and in the past, for my newsfeed reading list.

I don’t really use a proper newsreader. I use the “Web Clips” option in Gmail. It’s like a poor man’s newsreader. You can add RSS feeds under Settings > Web Clips, and article titles will appear in the bar above your mailbox, interleaved among the advertising that Google places there. You can only have up to 40 (shame on Google!) but it works well enough.

Anyway, I dumped out the links today into my blogroll. You can see them in the sidebar.

Random musings


Building a men’s wardrobe from scratch, part 2

In part 1, I described my experience with trying various designers for fit, and assessing what type of wardrobe items I should buy to begin. I also described where I’ve been considering picking up some items, and how I might go about doing that.

In evaluating the possible strategies for building the wardrobe, I’m considering the following factors:

  • time units to acquire a garment
  • effort per unit time to acquire a garment
  • garment price, as a fraction of retail
  • garment quality

These are not necessarily conflicting factors, but there are definitely some inverse relationships. For instance, if I want to minimize time and effort, price will certainly go up. So I can wait; I’m not in a rush. I’m also not much of an active shopper and I don’t want to spend lots of time running around town or shopping online, so I want to reduce effort. I also don’t want to compromise on quality, and I’m willing to pay more to get what I want. I’m also willing to wait longer to get a lower price.

Here’s what I’m doing:

  1. Subscribe to department store mailing lists. Here are some deep links to sign up for:

    This way I’ll be sure to be notified of clearances like the Barneys Barker Hangar warehouse sale. My impression so far is that Barneys sends out a ton of spam — like 1-2 per day! Yuck! I haven’t received any mail from the other two yet in the ~5 days I’ve been on-list.

  2. Subscribe to eBay watch lists. You can set up a watch, which works like a brokerage trade trigger: it emails you when new items are listed that match your search terms/categories/sizes.
    • For example, I’m subscribed to a search for Kiton 42L sportcoat.
    • I’m also subscribed to the word couture along with some other terms like so. You see this word appear in the higher-end lines for many designers, or sartorially-oriented sellers on eBay will use it in their titles/product descriptions. Along the same lines, you could subscribe to Purple Label to get alerts on high-end Ralph Lauren items if you like those.
  3. Subscribe to the AskAndyAboutClothes sale forum. It appears to be a very active sale forum. The sellers are frequently announcing sales in there and linking to their eBay profiles. I’m using this as a form of vetting of the sellers on eBay as the AAAC forum seems reputable. This subscription doesn’t allow filtering of items by size, etc. The very low prices more than make up for the effort of checking the forum regularly.

Business
Life
Random musings


Building a men’s wardrobe from scratch, part 1

swatches

I’ve been looking into building a wardrobe for a few months now. I’m going to summarize what I’ve learned in this post. Mainly through hyperlinks. I’m writing at novice level of knowledge at best, so caveat emptor with all of this.

Motivation for this post: I’ve been finding that with greater frequency I’m not able to attend social events because I don’t own formal (or even semi-formal) clothing. That’s right, I only own jeans, short-sleeve button-up shirts, t-shirts, flip flops, and sneakers. Goose thief on the AskAndyAboutClothes forum put it really well, so I’ll quote from his thread:

I am new to this forum, but have been lurking for a few months. I would humbly like to consult the collective wisdom of this community for assistance.

I am a 31 year old writer and have recently moved to Los Angeles. Like many in my field, I kick it casual in t-shirts and jeans with a pair of sneakers to round out the slacker uniform.

Recently I suited up for a meeting to accept a new job. The suit was OTR, but fit well and I splurged on a MTM shirt.

As far as shirts go, there is no going back. A garment made to fit my measurements not only makes sense, but paying a craftsman is a rewarding experience. The difference in comfort was also incredible.

What surprised me most however was how differently people treated me. Not that they are rude to me when I am not dressed up, but by me doing so, it actually seemed to brighten the mood of those I walked by.

Which led me to ask the question. If I feel more comfortable, look better, and have greater power to please – why am I still dressing like a man child?

Nice. Sounds a lot like me. Even lives in LA.

Anyway, first thing I did a couple months back was go to several department stores in Los Angeles. First stop was Bloomingdale’s in Century City, followed by Alandales in Culver City, and finally in Beverly Hills I went to Barney’s NY, Neiman Marcus, and Saks Fifth Avenue. I did this over the space of 2 days, and tried on every designer label I could find so I could calibrate quickly and compare everything while I was in the mode of comparing what to me seemed like very similar items.

I was surprised to find there is actually quite a bit of variety, and found there were generally two types of coats, those I liked and those I didn’t :) . Seriously though, there are some coats which are Neapolitan-style, and others that are Roman. They differ in the amount of structure built into the garment. A salesman at Barney’s described the Roman style as being more like a suit of armor. It definitely felt like that, and I definitely preferred the Neapolitan style.

I also noticed that the amount of handwork that goes into each garment really does make a huge difference in my perceived quality of the fit and appearance of the garment. On the appearance it’s possible that the fabrics just looked nicer, but I got to the point that I could tell the mass-produced coats from the handmade coats even without looking at the label. They’re that different. Noticeably more comfortable. Visual detail is also noticeable. For instance, the garments with more handwork have little things like pique/contrast stitching on the lapels and working buttonholes on the sleeves. Of course, the price of a garment goes up along with the number of hours of human labor used to make it. I found that I particularly liked the garments from Kiton, Isaia, and Zegna. Dolce & Gabbana and Theory were also nice.

Now, I’m just starting to build out a wardrobe, so I’m looking for a few basic key pieces from which I can get the most value. A sportcoat fits the bill here. As I learned, a sportcoat is a specific type of coat that generally has the pockets sewn on the outside with flaps at the top. It’s meant to be worn with jeans or pants. A blazer is a subclass of sportcoat that is made of solid fabric, typically navy, charcoal, or black. All other sportcoats are made of patterned fabric.

Again, not looking to spend a lot of money. It seems there are several ways to stretch the dollars.

  1. Sale shop. Last-call clearance items are typically 50-80% off. I’m now on several dept. store mailing lists for a few months. The best sale is around the beginning of the year because it’s where you can get garments that can be worn year-round, as opposed to the lighter weight garments that would be available in a summer or fall clearance sale.
  2. Outlet malls. Maybe Palm Springs or Las Vegas for those also in SoCal. This might not work for me, as I’m a 42L which is a slightly unusual size.
  3. Barneys has a warehouse clearance sale twice a year at the Barker Hangar in Santa Monica, approximately in August and February.
  4. eBay. I can occasionally find a Kiton 42L on eBay, but it’s hard. I’ve bid on a few items, but I don’t really know yet what a good deal is. I’m also a little freaked out about the prevalence of counterfeit garments on eBay. I’ll keep looking and post on this again later. It looks promising.
  5. Thrift stores. Apparently you can get good stuff at Goodwill, etc. Maybe not in this economy though, it could be well picked over. I also don’t know which Goodwill stores have the good stuff. Beverly Hills maybe? Need to do more research here.
  6. Student deals. I’m not a student anymore, but for those readers who are… some of the manufacturers give steep discounts for grad students. Brooks Brothers is apparently having a 40% discount for grad students right now. Check it out.

I looked into other guides on how to build a wardrobe. I found some good stuff here:

  • askandyaboutclothes.com – amazing article on pattern matching on an amazing site. I’ve been doing a lot of reading here for the last couple of days. Wow.
  • Wikihow has an article on building a basic men’s wardrobe. Seems like reasonable advice. AskAndy… also has some great threads/advice on this topic, like this one and this one.

I really liked the sales staff in Alandales. Stan was the salesman helping me, and he gave me a lot of attention and answered a lot of questions. These guys are working w/o commission, and it showed. I didn’t feel pressured/ignored like I did in the dept. stores (exception: Barneys was also great), and the staff seemed knowledgeable.

I’m going to take the advice of the Wikihow article and get a few made-to-measure (MTM) shirts from Alandales. They take measurements and send them off to a tailor who makes shirts for them. Seems like an inexpensive way to get the process going, and I’ll need a few shirts no matter what other items I buy anyway. I might also buy an off-the-rack (OTR) high-end designer shirt from eBay to see if it’s worth spending any money here.

To be continued…

Business
Life
Random musings


Some Kickass WordPress Plugins

I was looking for a plugin to count unique views per post and found Lester Chan’s bunch of WordPress plugins. I’ll be adding some of these shortly.

One of the most viewed posts on this blog is one of my first: A review of Costco stainless steel cookware. Would you have guessed that? I wonder if it’s highly rated… we’ll find out.

I’m still surprised it’s my #1 post. Perhaps it’s more evidence that I should be posting more non-technical stuff, and some kind of argument against making a super-niche blog.

Life
PHP
Random musings


Thoughts on Hadoop JobTracker/TaskTracker Scheduling

Had a brief, interesting conversation on freenode #hadoop today with Rapleaf engineer Nathan Marz about scheduling in Hadoop.

Pretty much supports my sense that scheduling is not Hadoop’s strong suit. It’s really pretty shitty. Would be great to see some more cross-pollination between the Beowulf (SGE, PBS, Globus) and MapReduce (Hadoop, HBase) communities. The former have more mature scheduling, resource management and permissions models. They don’t really do a good job of providing a framework for distributed, parallel computing at the application level, though; everything is roll-your-own. Perhaps Hadoop could be integrated as a parallel environment to consume resources from an SGE master [1, 2] rather than managing its own mapper/reducer pools.

A less ambitious scheduler improvement is to modify the way the Hadoop scheduler allocates map/reduce resources. The main itch I’m trying to scratch right now has to do with the coupling of map/reduce allocation. There are some cases where it seems this shouldn’t be done. Read the dialog with Nathan below if you care to know more.

allenday is it possible to decouple mapper and reducer slot allocation for jobs?
allenday i mean, if a job is #1 in the MR queue, but it is not yet ready to reduce, can it be prevented from consuming reducer slots?
nathanmarz allenday: i think that would be hard… reducing starts while the mapping is happening (copy stage)
allenday nathanmarz, i frequently find that while the reduce has “started”, it can just sit there for a long time doing nothing
allenday this is most common with nutch
allenday so there could be a bunch of other jobs further back in the queue that get starved for reduces b/c the head of the queue is squatting on the slots
nathanmarz it just sits there in the reduce phase?
allenday for sure nutch does, yeah
allenday during fetch, when it crawling
nathanmarz i see
nathanmarz i don’t have that much familiarity with nutch
nathanmarz is it possible to increase the number of reducers?
allenday yep, but then you can get into i/o trouble later
nathanmarz for the job i mean, not the cluster
allenday oh
allenday it sounds like you propose having these squatters consume minimal # of reducers (e.g. only 1)
nathanmarz actually, the opposite
nathanmarz let’s say you have 16 reduce slots
nathanmarz and the job i set to use 16 reducers
nathanmarz each one of those reducers potentially has to go over a lot of data
nathanmarz if the job is instead set to use a lot more reducers, like 100 or something
nathanmarz than an individual reducer will go a lot faster
nathanmarz and potentially, those freed reduce slots will go to jobs with higher priority
allenday ok, so you introduce priority to bump the further back ahead in the queue
nathanmarz yea
allenday is that settable in jobconf?
nathanmarz you can set num reducers
allenday so let’s suppose the job that squats on reduce slots gets to the head of the queue. regardless of if it has 16 or 100 reducers configured
nathanmarz JobConf#setNumReduceTasks
allenday and that it it still in map phase only. has not begun reducing yet
allenday until one of those reduces finishes (i.e. the map has finished) all slots are still filled
allenday it’s only when the first reduce finishes that the job at #2 can take over a reduce slot
nathanmarz right
nathanmarz yea that’s true
allenday that’s bad
nathanmarz this scheme doesn’t help until mappers finished
allenday you really want this #1 job
allenday when it is allocating reducers
allenday to have low priority in acquiring the slots
nathanmarz right
nathanmarz well you don’t want it to acquire any slots until mappers finish
allenday so you give reduce slots to #2, #3, #4, etc. until everyone who wants slots has them. then you assign to #1
allenday or until #1 is ready…
allenday is it just me or does the queueing system in hadoop kind of suck?
allenday i am coming here from sun grid which puts a lot of emphasis on this aspect
nathanmarz well, the priority system will work if you start job #1 after the other jobs
nathanmarz if you start the other jobs after #1 then they will get starved of reducers
allenday heh, but the whole reason it is in #1 is because it was submitted first, right? isn’t hadoop FIFO wrt jobs?
nathanmarz if they’re the same priority
nathanmarz so maybe decreasing the reducers job #1 uses is the way to go
nathanmarz set it so it doesn’t use all the reduce slots on the cluster
allenday i need to do some research to see if there are jira open for improving the scheduler. or if there are some commercial plugins to improve the scheduling
nathanmarz definitely room for improvement, agreed
allenday yeah, that was what i thought you meant initially. it’s a hack too though, and breaks down when the number of jobs gets large
allenday i’m surprised they are coupled. do you understand how it works when the mapper hands off to the reducer?
allenday b/c i don’t and i need to
nathanmarz yes
allenday can i get the 2min version?
nathanmarz the reason the reducers start while the mappers are running is because there’s some work they can do without all the map data
nathanmarz each reducer needs to copy the relevant outputs from all the mappers to its machine
nathanmarz this is called the “copy” phase and can occur in parallel with mapping
allenday ok, i’ve seen that
allenday so what we need is a flag taht indicates there will be no data to copy until maps all finish
nathanmarz yea
nathanmarz a flag that says not to pipeline the process
allenday default behavior is to have the flag off and copy greedily
allenday which is like it does now
allenday turn the flag on says to wait until upstream map finishes before grabbing a reduce slot and kicking off the copy
allenday **all upstream maps
nathanmarz http://hadoop.apache.org/core/docs/current/hadoop-default.html
nathanmarz those are all the hadoop config parameters
nathanmarz you might be able to find something in there
allenday yeah, i fiind goodies in there every time i read that page :)
allenday i am only ~1mo into hadoop
allenday here’s another scheduling related question/issue i’m having
allenday i find that job i/o and cpu usage tend to synchronize after a while
allenday b/c if there is a slow moving job in the queue, all the others tend to get jammed behind it
allenday have you seen this?
nathanmarz no, i haven’t
nathanmarz but that’s interesting
allenday it comes back to resource (mis)allocation by the scheduler
nathanmarz how are you measuring that?
allenday it’s this same issue where jobs will consume all the slots
allenday so if you have a slow moving thing blocking all the resources
allenday no one else can get past
allenday then when the slow moving job finishes, the others all start getting processed very quickly (high cpu load during map), then as they begin to finish there is a flurry of i/o
allenday it’s like congestion on the freeway where one car slams on the breaks it sends this wave of traffic jam behind it
allenday assuming the freeway is already close to capacity (not sparse)
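To make the slot-squatting problem from the chat concrete, here is a toy Python model of reduce-slot allocation. It is not Hadoop code and does not reflect any real scheduler API: the function name, the job dictionaries and the `wait_for_maps` flag are all hypothetical, the flag being the one proposed in the conversation above.

```python
def allocate_reduce_slots(jobs, total_slots, wait_for_maps=False):
    """Hand out reduce slots to jobs in FIFO order (toy model).

    jobs: list of dicts with 'name', 'reducers' (slots wanted) and
          'maps_done' (True once every map task has finished).
    wait_for_maps: hypothetical flag; when True, a job whose maps are still
          running is served only after every ready job has its slots.
    Returns {job name: slots granted}.
    """
    granted = {job["name"]: 0 for job in jobs}
    free = total_slots
    if wait_for_maps:
        ready = [j for j in jobs if j["maps_done"]]
        waiting = [j for j in jobs if not j["maps_done"]]
        order = ready + waiting
    else:
        order = list(jobs)  # greedy: the head of the queue grabs slots first
    for job in order:
        take = min(job["reducers"], free)
        granted[job["name"]] = take
        free -= take
    return granted

# Made-up queue: a crawl job still mapping sits ahead of two jobs ready to reduce.
queue = [
    {"name": "crawl", "reducers": 16, "maps_done": False},
    {"name": "report", "reducers": 8, "maps_done": True},
    {"name": "index", "reducers": 4, "maps_done": True},
]

print(allocate_reduce_slots(queue, total_slots=16))
print(allocate_reduce_slots(queue, total_slots=16, wait_for_maps=True))
```

With greedy allocation the crawl job takes all 16 slots and the other two get none; with the flag on, the two ready jobs get their 8 and 4 slots and the crawl job picks up the remaining 4. The model ignores the copy phase that overlaps with mapping, which is exactly the work a real version of this flag would give up.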

Distributed Systems
Hadoop
Java
Random musings
SGE
