<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Allen Day's Blog &#187; Mathematics</title>
	<atom:link href="http://www.spicylogic.com/allenday/blog/category/science/mathematics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spicylogic.com/allenday/blog</link>
	<description>♥data♥</description>
	<lastBuildDate>Mon, 21 Jun 2010 23:28:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Upcoming AI / Machine Learning Conferences</title>
		<link>http://www.spicylogic.com/allenday/blog/2008/12/05/upcoming-ai-machine-learning-conferences/</link>
		<comments>http://www.spicylogic.com/allenday/blog/2008/12/05/upcoming-ai-machine-learning-conferences/#comments</comments>
		<pubDate>Fri, 05 Dec 2008 19:49:13 +0000</pubDate>
		<dc:creator>allenday</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[Science]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.spicylogic.com/allenday/blog/2008/12/05/upcoming-ai-machine-learning-conferences/</guid>
		<description><![CDATA[A (partial) list I found today.  Doesn&#8217;t include NIPS, so I&#8217;m not sure how exhaustive it is, but it has a bunch I haven&#8217;t seen before.
http://www.kmining.com/info_conferences.html
]]></description>
			<content:encoded><![CDATA[<p>A (partial) list I found today.  Doesn&#8217;t include NIPS, so I&#8217;m not sure how exhaustive it is, but it has a bunch I haven&#8217;t seen before.</p>
<p><a href="http://www.kmining.com/info_conferences.html">http://www.kmining.com/info_conferences.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spicylogic.com/allenday/blog/2008/12/05/upcoming-ai-machine-learning-conferences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sparse Matrices in R</title>
		<link>http://www.spicylogic.com/allenday/blog/2008/05/05/sparse-matrices-in-r/</link>
		<comments>http://www.spicylogic.com/allenday/blog/2008/05/05/sparse-matrices-in-r/#comments</comments>
		<pubDate>Mon, 05 May 2008 07:16:28 +0000</pubDate>
		<dc:creator>allenday</dc:creator>
				<category><![CDATA[Informatics]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.spicylogic.com/allenday/blog/?p=10</guid>
		<description><![CDATA[
I&#8217;ve had a need over the last week to work with some sparse matrix data in R.  This was a totally new problem for me, and I can now sympathize with anyone else having to do this and will document the experience.
It seems that the de-facto standard for moving sparse matrices around is [...]]]></description>
			<content:encoded><![CDATA[<div style="float:right"><img style="width:300px" src="http://www.mathworks.com/matlabcentral/files/17385/dijkstra.jpg"/></div>
<p>I&#8217;ve had a need over the last week to work with some sparse matrix data in <a href="http://cran.r-project.org">R</a>.  This was a totally new problem for me, and I can now sympathize with anyone else having to do this and will document the experience.</p>
<p>It seems that the de-facto standard for moving sparse matrices around is the <a href="http://math.nist.gov/MatrixMarket/collections/hb.html">Harwell-Boeing file format</a>, aka &#8220;harbo&#8221;.  It&#8217;s a horrible and largely undocumented fixed-width (think Fortran) file format.  The best documentation I could find was in the source code <a href="http://acts.nersc.gov/tau/programs/pdgssvx/dreadhb.c">here</a>, although you may be able to piece more of it together with <a href="http://www.koders.com/default.aspx?s=harwell-boeing">Koders</a>.  R does include a harbo reader as part of the <a href="http://cran.r-project.org/web/packages/SparseM/">SparseM</a> package.</p>
<p>Given that I&#8217;m more comfortable working in Perl than in R or Fortran, I decided to have a look on CPAN to see what was available.  As it turns out, there is a package called <a href="http://search.cpan.org/~tpederse/Text-SenseClusters-1.01/">Text::SenseClusters</a> from <a href="http://www.d.umn.edu/~tpederse/senseclusters.html">Ted Pedersen</a> that ships with a nifty program, <a href="http://search.cpan.org/~tpederse/Text-SenseClusters-0.98/Toolkit/svd/mat2harbo.pl">mat2harbo.pl</a>.  I found the preferred sparse matrix &#8220;mat&#8221; format used by Text::SenseClusters to be more reasonable than harbo. Here&#8217;s an example.</p>
<pre>5 5 15
2 9 4 9
1 6 2 5 3 7 4 8 5 6
1 4 2 5
1 7 2 6 3 7
1 9 2 8 3 9</pre>
<p>There is a header line, <code>5 5 15</code>, that defines the matrix rows, columns, and number of non-null fields.  Each subsequent (possibly blank) line gives index/value pairs for the non-null positions in that row.  Easy!</p>
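<p>To make the layout concrete, here is a minimal base-R sketch that expands the example above into an ordinary dense matrix.  The helper name <code>mat.to.dense</code> is made up for this illustration, and it assumes single-space separators and 1-based column indices, as in the example.</p>

```r
# Hypothetical helper (not part of any package): expand "mat" lines
# into a dense matrix. Assumes single-space separators and 1-based
# column indices, as in the example above.
mat.to.dense = function(mat.lines) {
  header = as.integer(strsplit(mat.lines[1], ' ')[[1]])
  dense = matrix(0, nrow = header[1], ncol = header[2])
  for (i in seq_len(header[1])) {
    fields = as.numeric(strsplit(mat.lines[i + 1], ' ')[[1]])
    if (length(fields) == 0) next            # blank line: empty row
    idx = seq(1, length(fields), by = 2)     # odd fields are column indices
    dense[i, fields[idx]] = fields[idx + 1]  # even fields are the values
  }
  dense
}

example = c('5 5 15',
            '2 9 4 9',
            '1 6 2 5 3 7 4 8 5 6',
            '1 4 2 5',
            '1 7 2 6 3 7',
            '1 9 2 8 3 9')
dense = mat.to.dense(example)
print(dense)
print(sum(dense != 0))   # number of non-null fields; compare with the header
```

<p>A dense matrix is only practical for toy inputs like this one; the point is just to show how the header and the index/value pairs fit together.</p>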
<p>At this point I was formulating a plan to:</p>
<ol>
<li>use my matrix writer to write in &#8220;mat&#8221; format to <code>file1.mat</code>.</li>
<li>convert <code>file1.mat</code> to <code>file2.harbo</code> using <code>mat2harbo.pl</code> from Text::SenseClusters.</li>
<li>import <code>file2.harbo</code> into R using the <code>read.matrix.hb()</code> function in the <code>SparseM</code> package.</li>
<li>convert the SparseM matrix to an R graph (<code>graph</code> package).</li>
<li>get back to my original problem&#8230; analyzing the matrix in R with <a href="http://www.boost.org/">Boost</a> via the <code><a href="http://cran.r-project.org/web/packages/RBGL/">RBGL</a></code> package.</li>
</ol>
<p>Well, it wasn&#8217;t that easy.</p>
<p>Step 1 went okay.  Step 2 had problems with null columns, and had some glitches in the output format.  Some of these glitches were easy to fix (e.g. matrix definition of &#8220;rra&#8221; to &#8220;RRA&#8221;), but others were very difficult due to the fact that mat2harbo.pl didn&#8217;t provide &#8220;full&#8221; harbo support, and the SparseM reader needed some of the file format features that weren&#8217;t supported.</p>
<p>So I wrote my own &#8220;mat&#8221; file -&gt; R <code>matrix.csr</code> object constructor.  Here it is:</p>

<div class="wp_syntax"><div class="code"><pre class="c">read.<span style="color: #202020;">matrix</span>.<span style="color: #202020;">pair</span> <span style="color: #66cc66;">=</span> <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #66cc66;">&#40;</span>file,debug<span style="color: #66cc66;">=</span><span style="color: #000000; font-weight: bold;">FALSE</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
mat.<span style="color: #202020;">lines</span> <span style="color: #66cc66;">=</span> readLines<span style="color: #66cc66;">&#40;</span>file<span style="color: #66cc66;">&#41;</span>;
header <span style="color: #66cc66;">=</span> mat.<span style="color: #202020;">lines</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span>;
F <span style="color: #66cc66;">=</span> strsplit<span style="color: #66cc66;">&#40;</span>header,<span style="color: #ff0000;">' '</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span>;
nrow <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">integer</span><span style="color: #66cc66;">&#40;</span>F<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>;
ncol <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">integer</span><span style="color: #66cc66;">&#40;</span>F<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>;
nelem <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">integer</span><span style="color: #66cc66;">&#40;</span>F<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">3</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
ja <span style="color: #66cc66;">=</span> vector<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;list&quot;</span>,nrow<span style="color: #66cc66;">&#41;</span>;
ra <span style="color: #66cc66;">=</span> vector<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;list&quot;</span>,nrow<span style="color: #66cc66;">&#41;</span>;
ia <span style="color: #66cc66;">=</span> vector<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;list&quot;</span>,nrow<span style="color: #66cc66;">&#41;</span>;
ia<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #b1b100;">for</span> <span style="color: #66cc66;">&#40;</span> i in <span style="color: #cc66cc;">2</span><span style="color: #66cc66;">:</span><span style="color: #66cc66;">&#40;</span>nrow<span style="color: #cc66cc;">+1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span> <span style="color: #339933;">#nrow</span>
mat.<span style="color: #202020;">line</span> <span style="color: #66cc66;">=</span> strsplit<span style="color: #66cc66;">&#40;</span>mat.<span style="color: #202020;">lines</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span>,<span style="color: #ff0000;">' '</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span>;
<span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span> length<span style="color: #66cc66;">&#40;</span>mat.<span style="color: #202020;">line</span><span style="color: #66cc66;">&#41;</span> &gt; <span style="color: #cc66cc;">0</span> <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
<span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span> debug <span style="color: #66cc66;">&#41;</span> print<span style="color: #66cc66;">&#40;</span>paste<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'non-empty row'</span>,i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
ja<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> mat.<span style="color: #202020;">line</span><span style="color: #66cc66;">&#91;</span>  seq<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,length<span style="color: #66cc66;">&#40;</span>mat.<span style="color: #202020;">line</span><span style="color: #66cc66;">&#41;</span>,by<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span>;
ra<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> mat.<span style="color: #202020;">line</span><span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">+</span>seq<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,length<span style="color: #66cc66;">&#40;</span>mat.<span style="color: #202020;">line</span><span style="color: #66cc66;">&#41;</span>,by<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span>;
ia<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> ia<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">+</span> length<span style="color: #66cc66;">&#40;</span>mat.<span style="color: #202020;">line</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">/</span><span style="color: #cc66cc;">2</span>;
<span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span> debug <span style="color: #66cc66;">&#41;</span> print<span style="color: #66cc66;">&#40;</span>paste<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'  pos:'</span>,ja<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span> debug <span style="color: #66cc66;">&#41;</span> print<span style="color: #66cc66;">&#40;</span>paste<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'  dat:'</span>,ra<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #66cc66;">&#123;</span>
<span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span> debug <span style="color: #66cc66;">&#41;</span> print<span style="color: #66cc66;">&#40;</span>paste<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'    empty row'</span>,i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
ia<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">=</span> ia<span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span>i<span style="color: #cc66cc;">-1</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span>;
<span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
ans.<span style="color: #202020;">ja</span> <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">integer</span><span style="color: #66cc66;">&#40;</span>unlist<span style="color: #66cc66;">&#40;</span>ja<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
ans.<span style="color: #202020;">ra</span> <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">numeric</span><span style="color: #66cc66;">&#40;</span>unlist<span style="color: #66cc66;">&#40;</span>ra<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>; <span style="color: #339933;"># numeric, so non-integer values are not truncated</span>
ans.<span style="color: #202020;">ia</span> <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">integer</span><span style="color: #66cc66;">&#40;</span>unlist<span style="color: #66cc66;">&#40;</span>ia<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
dimension <span style="color: #66cc66;">=</span> as.<span style="color: #202020;">integer</span><span style="color: #66cc66;">&#40;</span>c<span style="color: #66cc66;">&#40;</span>nrow,ncol<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #b1b100;">if</span> <span style="color: #66cc66;">&#40;</span> debug <span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
print<span style="color: #66cc66;">&#40;</span>paste<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'nrow'</span>,nrow<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
print<span style="color: #66cc66;">&#40;</span>paste<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'ncol'</span>,ncol<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
print<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'ra'</span><span style="color: #66cc66;">&#41;</span>;print<span style="color: #66cc66;">&#40;</span>ans.<span style="color: #202020;">ra</span><span style="color: #66cc66;">&#41;</span>;
print<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'ja'</span><span style="color: #66cc66;">&#41;</span>;print<span style="color: #66cc66;">&#40;</span>ans.<span style="color: #202020;">ja</span><span style="color: #66cc66;">&#41;</span>;
print<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'ia'</span><span style="color: #66cc66;">&#41;</span>;print<span style="color: #66cc66;">&#40;</span>ans.<span style="color: #202020;">ia</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#125;</span>
rd.<span style="color: #202020;">o</span> <span style="color: #66cc66;">=</span> new<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;matrix.csr&quot;</span>, ra <span style="color: #66cc66;">=</span> ans.<span style="color: #202020;">ra</span>, ja <span style="color: #66cc66;">=</span> ans.<span style="color: #202020;">ja</span>, ia <span style="color: #66cc66;">=</span> ans.<span style="color: #202020;">ia</span>, dimension <span style="color: #66cc66;">=</span> dimension<span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#125;</span></pre></div></div>

<p>This let me just read the &#8220;mat&#8221; file directly into R.  After that, the conversion to a graph object seems to work okay.  I say &#8220;seems to&#8221; because <strike>I&#8217;m still waiting</strike> for the SparseM -&gt; graph conversion routine to finish.  It&#8217;s a 50K x 50K matrix with about 2 million edges, so it&#8217;s taking a little while&#8230;</p>
<p>The conversion took about as long as it took me to post this.  Everything is fine.  Now I can get back to doing all-by-all <a href="http://en.wikipedia.org/wiki/Dijkstra's_algorithm">Dijkstra</a> on the graph, or at least to finding a reasonably fast way to allow for one-off queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spicylogic.com/allenday/blog/2008/05/05/sparse-matrices-in-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
