Since Ruby is my new favorite toy, I thought it would be fun to try my hand at C extensions.  I came across David Blei’s C code for Latent Dirichlet Allocation and it looked simple enough to convert into a Ruby module.  Ruby makes it very easy to wrap a few C functions (which is good to know if you need a really fast implementation of something that gets called a lot).  Wrapping a whole C library is slightly harder, but not horribly so.  Most of my challenge was probably that it had been so long since I wrote anything in C.

Since the code is open source, I decided to release the Ruby wrapper as a gem on GitHub.  I chose GitHub over RubyForge, because it uses Git and freakin’ rocks.  But GitHub is a story for another day.  Feel free to contribute and extend the project if you’re so inclined.

A basic usage example:

require 'lda'
# create an Lda object for training
lda = Lda::Lda.new
corpus = Lda::Corpus.new("data/data_file.dat")
lda.corpus = corpus
# run EM algorithm using random starting points
lda.em("random")
lda.load_vocabulary("data/vocab.txt")
# print the top 20 words per topic
lda.print_topics(20)
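
The example above takes for granted that a data file and a vocabulary file already exist.  Here is a rough sketch of how you might generate them from tokenized text.  It assumes the wrapper reads the same sparse format as Blei's lda-c: one document per line, written as the number of unique terms followed by index:count pairs, with the vocabulary file listing one word per line in index order.  The helper name build_lda_c_corpus is made up for illustration and is not part of the gem.

```ruby
# Hypothetical helper (not part of the lda gem): turns tokenized documents
# into lda-c style corpus lines plus an ordered vocabulary list.
# Assumes the lda-c layout: "<unique term count> <index>:<count> ..."
def build_lda_c_corpus(documents)
  vocab = {} # word => integer index, assigned in order of first appearance

  lines = documents.map do |tokens|
    counts = Hash.new(0)
    tokens.each do |word|
      index = (vocab[word] ||= vocab.size)
      counts[index] += 1
    end
    pairs = counts.sort.map { |index, count| "#{index}:#{count}" }
    ([counts.size] + pairs).join(" ")
  end

  [lines, vocab.keys]
end

docs = [%w[apple banana apple], %w[banana cherry]]
lines, vocab = build_lda_c_corpus(docs)
puts lines # one line per document, ready for the data file
puts vocab # one word per line, ready for the vocabulary file
```

Writing lines.join("\n") and vocab.join("\n") out with File.write would give you the two files passed to Lda::Corpus.new and load_vocabulary above, provided the format assumption holds for your version of the gem.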

You can also download the gem from GitHub directly:

gem sources -a http://gems.github.com
sudo gem install ealdent-lda-ruby

You only need the first line if you haven’t added GitHub to your sources before.

Comments
  1. [...] lda, machine learning, nlp, ruby, rubygems, topic modeling. Leave a Comment A while back I ported David Blei’s lda-c code for performing Latent Dirichlet Allocation to Ruby.  Basically I [...]

  2. Owen Dall says:

    Thanks much for this, Jason. Been discussing LDA vs. LSI with Jimmy Lin of UMD. Was glad to find a Ruby implementation!

    Owen

  3. Marc says:

    Thank you! Didn’t really expect to find an LDA library for ruby, but here it is :)

  4. dfrankow says:

    What is the exact format for data_file.dat and vocab.txt?

    I know they are like SVMlight, but I don’t remember that format. Words are indexes, then it is a doc per line, but I don’t remember the specifics.
