Latent Dirichlet Allocation in C
Latent Dirichlet allocation
This is a C implementation of variational EM for latent Dirichlet
allocation (LDA), a topic model for text or other discrete data. LDA
allows you to analyze a corpus and extract the topics that combine
to form its documents. For an example, see the topics estimated from
a small corpus of Associated Press documents, listed below. LDA is
fully described in Blei et al. (2003).
This code contains:
an implementation of variational inference for the per-document topic proportions and per-word topic assignments
a variational EM procedure for estimating the topics and the exchangeable Dirichlet hyperparameter
2246 documents from the Associated Press [download].
Top 20 words from 100 topics estimated from the AP corpus [pdf].
Bug fixes and updates
To learn about bug fixes and updates, and to discuss LDA and related
techniques, please join the topic-models mailing list,
topic-models [at] lists.cs.princeton.edu.