Word Cloud 2:

  Now I want to create another word cloud, this time from a different document: a list of the titles of every thesis written in Kansas State University's Department of Geography since 2000. The goal is to find out which words appear most often in thesis titles. I will use code very similar to before, but with a different set of custom stopwords removed. Observe below:

> library("tm")
> library("SnowballC")
> library("wordcloud")
> library("RColorBrewer")
> # Read the titles, one per line, and build a corpus from them
> filePath <- "GeographySince2000.txt"
> text <- readLines(filePath)
> docs <- Corpus(VectorSource(text))
> # Normalize the text: collapse whitespace, drop digits and punctuation, lowercase
> docs <- tm_map(docs, stripWhitespace)
> docs <- tm_map(docs, removeNumbers)
> docs <- tm_map(docs, removePunctuation)
> docs <- tm_map(docs, content_transformer(tolower))
> # Remove standard English stopwords, then place names, directions, and other terms I chose to exclude
> docs <- tm_map(docs, removeWords, stopwords("english"))
> docs <- tm_map(docs, removeWords, c("kansas", "missouri", "american", "nebraska", "america",
+     "colorado", "india", "flint", "eastern", "northern", "southern", "western", "northeast",
+     "teton", "hantavirus", "among", "north", "south", "east", "west", "central", "using",
+     "along", "great", "united", "states"))
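> # Term-document matrix: terms are rows, so row sums give the total count of each term
> dtm <- TermDocumentMatrix(docs)
> m <- as.matrix(dtm)
> v <- sort(rowSums(m), decreasing = TRUE)
> d <- data.frame(word = names(v), freq = v)
> # Fix the random seed so the cloud layout is reproducible
> set.seed(1234)
> wordcloud(words = d$word, freq = d$freq, min.freq = 3, max.words = 200,
+     random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))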

The resulting word cloud, based on a text document containing the titles of every thesis produced by students in Kansas State University's Department of Geography since 2000.

  Based on this word cloud, "change" is the most frequently used term in these titles once the stopwords are removed. One of the hallmark features of science and math is their ability to observe change, which is probably why "change" appears more often than any other word in this cloud.
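
  A claim like this is easier to check against the frequency table than against the image. With the objects built above, head(d, 10) lists the ten most frequent terms. The sketch below shows the same counting step on its own. It is only an illustration under stated assumptions: the three titles are made up (they are not from GeographySince2000.txt), the stopword list is a short stand-in for stopwords("english") plus my custom terms, and it uses base R in place of the tm package so that it runs without the data file.

```r
# Hypothetical titles standing in for GeographySince2000.txt (not real data)
titles <- c(
  "Land cover change in the Flint Hills",
  "Climate change and water use in western Kansas",
  "Mapping urban land use change with remote sensing"
)

# Lowercase, replace punctuation and digits with spaces, then split on whitespace
clean <- gsub("[[:punct:][:digit:]]+", " ", tolower(titles))
words <- unlist(strsplit(clean, "[[:space:]]+"))
words <- words[nzchar(words)]

# Small illustrative stopword list (an assumption, much shorter than the real one)
stops <- c("in", "the", "and", "with", "kansas", "flint", "western")
words <- words[!words %in% stops]

# Count each term and sort by frequency, mirroring sort(rowSums(m), decreasing = TRUE)
v_demo <- sort(table(words), decreasing = TRUE)
d_demo <- data.frame(word = names(v_demo), freq = as.integer(v_demo),
                     stringsAsFactors = FALSE)

# Show the most frequent terms
print(head(d_demo, 5))
```

  In this toy example "change" ranks first only because I wrote the titles that way. The point is the method: sort the counts and read the top rows, rather than judging by font size in the cloud.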