Word Cloud 2:
Now I want to create another word cloud, but this time using a different document. This document contains the titles of every thesis written in the Kansas State University's Department of Geography since 2000. I want to create a word cloud to find out which words are used the most in creating a title for a thesis statement. I will use very similar code as before, but with a different choice of stopwords removed. Observe below:
> library("tm")
> library("SnowballC")
> library("wordcloud")
> library("RColorBrewer")
> filePath <- "GeographySince2000.txt"
> text <- readLines(filePath)
> docs <- Corpus(VectorSource(text))
> docs <- tm_map(docs, stripWhitespace)
> docs <- tm_map(docs, removeNumbers)
> docs <- tm_map(docs, removePunctuation)
> docs <- tm_map(docs, content_transformer(tolower))
> docs <- tm_map(docs, removeWords, stopwords("english"))
> docs <- tm_map(docs, removeWords, c("kansas","missouri","american","nebraska","america","colorado","india","flint","eastern","northern","southern","western","northeast","teton","hantavirus","among","north","south","east","west","central","using","along","great","united","states"))
> dtm <- TermDocumentMatrix(docs)
> m <- as.matrix(dtm)
> v <- sort(rowSums(m), decreasing=TRUE)
> d <- data.frame(word = names(v), freq = v)
> set.seed(1234)
> wordcloud(words = d$word, freq = d$freq, min.freq = 3, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
Based on this word cloud, the term "change" is used the most. One of the hallmark features of science and math is its ability to observe change. This is probably why the term "change" is used the most among all words in this cloud.