I’m in the process of setting up an analytic workflow at BiggerBoat. It looks like the main data structure around here will be the sparse matrix, so I’ve been playing with open-source technologies for storing sparse matrices. Apache Hadoop’s HBase looks like a good choice for now, maybe Hive later.
Right now I’m getting familiar with the former. As part of this, I’m improving the docs on the wiki to make them friendlier to users (as opposed to core developers). My documentation goal right now is to add some example code for data transformations. There are already lots of Hadoop examples for text -> text mapping, e.g. grep, cat, etc. For HBase, not so much. The cases I have in mind:
- text to text (done, many examples)
- flatfile to HBase table (there is a bulk loader in the HBase wiki; I haven’t tried it yet)
- HBase table to flatfile
- HBase table to HBase table
I’ll be adding updated, complete, and simple code for the latter two (three?) to the HBase/MapReduce page in the next few days.
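To make the three HBase-related cases in the list concrete, here is a minimal sketch in plain Python. It does not use the HBase or Hadoop APIs at all: a dict of dicts (row key -> {column: value}) stands in for a sparse table, the tab-separated `row<TAB>column<TAB>value` flatfile layout is an assumption, and the function names are made up for illustration. The real versions would be MapReduce jobs.

```python
from collections import defaultdict


def flatfile_to_table(lines):
    """Parse 'row<TAB>column<TAB>value' lines into a sparse table.

    The table is a dict of dicts, a stand-in for an HBase table:
    only cells that actually appear in the input are stored.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        row, column, value = line.split("\t")
        table[row][column] = float(value)
    return dict(table)


def table_to_flatfile(table):
    """Dump the sparse table back to tab-separated lines, sorted by row then column."""
    lines = []
    for row in sorted(table):
        for column in sorted(table[row]):
            lines.append(f"{row}\t{column}\t{table[row][column]}")
    return lines


def table_to_table(table, fn):
    """Build a new table by applying fn to every stored cell value."""
    return {
        row: {column: fn(value) for column, value in columns.items()}
        for row, columns in table.items()
    }


if __name__ == "__main__":
    source = ["r1\ta\t1.0", "r2\tb\t2.5", "r1\tc\t3.0"]
    table = flatfile_to_table(source)
    doubled = table_to_table(table, lambda v: v * 2)
    for line in table_to_flatfile(doubled):
        print(line)
```

The point of the dict-of-dicts layout is that absent cells cost nothing, which is the property that makes a column-oriented store attractive for sparse matrices.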