Modeling the Evolution of Science

David M. Blei and John D. Lafferty
Princeton University and Carnegie Mellon University


This browseable 75-topic dynamic topic model of the journal Science (1880-2002) is part of the on-line supplement to the submission "Modeling the Evolution of Science." The browser allows a user to visualize the dynamic topic model and to use the hidden topics it has uncovered to guide an exploration of the original collection of documents. Each article was OCRed by JSTOR, which graciously supplied the data.

Topic time-line pages

The first page of the browser contains the top five words from each topic, taken every ten years. Clicking on a particular topic at a particular year leads to a page that contains the top 100 words from that topic, links to its distribution for previous and future years, and the articles that exhibit that topic with the highest proportion. These topic pages can be explored to see how a topic has changed over time, and to explore an organization of articles according to the topics they exhibit.
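The topic pages rest on two simple lookups: ranking words within a topic's distribution at a given time slice, and ranking articles by how much of a topic they exhibit. The sketch below shows both. It assumes, hypothetically, that the fitted model is stored as NumPy arrays: `beta` of shape (T, K, V) holding each topic's word distribution per time slice, and `theta` of shape (D, K) holding each article's topic proportions. The function names and the one-slice-per-year layout are illustrative assumptions, not the browser's actual code.

```python
import numpy as np

def top_words(beta, vocab, t, k, n=5):
    """Return the n highest-probability words of topic k at time slice t.

    beta: hypothetical array of shape (T, K, V); beta[t, k] is a
    distribution over the vocabulary for topic k at slice t.
    """
    idx = np.argsort(beta[t, k])[::-1][:n]  # indices sorted by descending probability
    return [vocab[i] for i in idx]

def timeline(beta, vocab, k, years, every=10, n=5):
    """Top words of topic k, sampled every `every` years.

    Assumes years[t] is the calendar year of time slice t.
    """
    return {y: top_words(beta, vocab, t, k, n)
            for t, y in enumerate(years) if y % every == 0}

def top_documents(theta, k, n=10):
    """Indices of the n documents with the highest proportion of topic k.

    theta: hypothetical array of shape (D, K); each row is one
    document's topic proportions.
    """
    return [int(d) for d in np.argsort(theta[:, k])[::-1][:n]]
```

Following the links to previous and future years then amounts to calling `top_words` with the same `k` and a different `t`, which is what makes the evolution of a single topic visible.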

Document pages

Clicking on the title of an article leads to a page that lists the main topics it exhibits, along with other similar articles according to the time-corrected topic similarity metric described in the paper. For users who have access to JSTOR, the title of the article at the top of the page links to the original scanned document.
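The details of the time-corrected similarity metric are not given on this page, so the sketch below swaps in a plain Hellinger distance between documents' topic proportions to show how a "similar articles" list can be ranked. This is a substitute technique, not the paper's metric; the `theta` layout (one row of topic proportions per document) and the function names are hypothetical.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def similar_documents(theta, d, n=5):
    """Indices of the n documents closest to document d by Hellinger distance.

    theta: hypothetical array of shape (D, K) of topic proportions.
    Document d itself is excluded from the result.
    """
    dists = [hellinger(theta[d], theta[j]) for j in range(len(theta))]
    order = [int(j) for j in np.argsort(dists) if j != d]
    return order[:n]
```

Comparing topic proportions rather than raw words is what lets articles written decades apart, with different vocabularies, still be matched.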

An example

As an example, begin with the topic from 1890 with top words {steam, gas, engine, power, water}. The top panel provides links to this topic in the past and future. Clicking ahead to 1950, the same topic has top words {air, rubber, water, gas, glass}. Clicking to 1995, the top words are {materials, polymer, polymers, glass, devices}.

Staying on the 1995 page, the right panel has links to documents that are found to exhibit this topic. Clicking on "Electric Cars and Lead" leads to the corresponding document page. At the top are the topics that this document exhibits. Here, these include {energy, fusion, power, cost, emissions} in addition to the original topic. Under the related topics are other documents that are similar to the source document according to time-corrected document similarity. Among the similar documents are "More Power, More Pollution" from 1969 and "Notes on Engineering" from 1897.

With the basic elements of topic pages, document pages, and links between them, we can explore the entire collection of Science according to the hidden topical structure that the dynamic topic model has uncovered. This provides an organizational structure to the corpus that would not be feasible to build by hand.