Word Cloud:
In this assignment, we will use R to create a word cloud of a speech of our choice. To do this on your own, you must install the following R packages on your computer: tm, SnowballC, wordcloud, and RColorBrewer. My professor provided an R script that I could use for this assignment. I will use it to create a word cloud for Jesus's Sermon on the Mount, delivered in 33 AD. In this speech, Jesus lays out his moral teachings to an audience on a hill near the Sea of Galilee in northern Israel. The full text of the speech can be found in any Bible in Matthew 5, 6, and 7. I transcribed the speech from the King James Bible into a .txt file so that it could be analyzed in R and a word cloud created from it. Click here to view this file. Below is the code I used to create this word cloud.
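Before running the script, the four packages have to be installed once. The following is a minimal sketch, not part of my professor's script, that checks which of them are missing. The variable names are my own, and the install line is left commented out because it needs an internet connection.

```r
# The four packages named in the text.
needed <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# requireNamespace() returns TRUE when a package is installed and can be loaded.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them (needs an internet connection):
  # install.packages(missing_pkgs)
} else {
  message("All four packages are installed.")
}
```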
NOTE: In the fifth line of the code, the filename of the .txt file may differ, depending on what you chose to call it. The file just needs to be in the working directory R is using, and you need to refer to it by its exact name, whatever that is.
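If R cannot find the file, it helps to check the name and the working directory first. Here is a small sketch of such a check. The helper name check_text_file is hypothetical (my own, not from any package), and the filename is the one I use in my script.

```r
# Hypothetical helper: reports whether a file is visible from R's working directory.
check_text_file <- function(path) {
  found <- file.exists(path)
  if (found) {
    message("Found: ", normalizePath(path))
  } else {
    message("No file named '", path, "' in ", getwd())
    # List the .txt files R can see, to help spot a misspelled name.
    print(list.files(pattern = "\\.txt$"))
  }
  invisible(found)
}

# Same filename as in the fifth line of the script; change it to match your file.
check_text_file("Sermon on the Mount.txt")
```

If the file is reported as missing, either move it into the directory shown in the message or change the directory with setwd().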
> library("tm")
> library("SnowballC")
> library("wordcloud")
> library("RColorBrewer")
> filePath <- "Sermon on the Mount.txt"
> text <- readLines(filePath)
> docs <- Corpus(VectorSource(text))
> docs <- tm_map(docs, stripWhitespace)
> docs <- tm_map(docs, removeNumbers)
> docs <- tm_map(docs, removePunctuation)
> docs <- tm_map(docs, content_transformer(tolower))
> docs <- tm_map(docs, removeWords, stopwords("english"))
> docs <- tm_map(docs, removeWords, c("thee", "thou", "shall", "thy", "thine", "shalt", "unto", "whosoever", "neither", "every", "may", "yea", "nay", "yet", "hath", "doth", "say", "seeth", "heareth", "lest", "doest", "knoweth", "whatsoever", "asketh", "leadeth", "will", "verily", "bringeth", "prayest", "wherefore", "till", "clothe", "mine"))
> dtm <- TermDocumentMatrix(docs)
> m <- as.matrix(dtm)
> v <- sort(rowSums(m), decreasing=TRUE)
> d <- data.frame(word = names(v), freq = v)
> set.seed(1234)
> wordcloud(words = d$word, freq = d$freq, min.freq = 2, max.words = 200, random.order = FALSE, rot.per = 0.35, colors = brewer.pal(8, "Dark2"))
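To make clearer what the script does between cleaning the text and drawing the cloud, here is a rough base-R sketch of the same idea: lowercase the text, strip punctuation, drop stop words, and count what is left. It does not use tm, so it is only an illustration of the counting step, not a replacement for the script. The two sample lines are from the Lord's Prayer (Matthew 6:9-10) and stand in for the full file, and the short stop_list is made up for this example.

```r
# Two sample lines from Matthew 6:9-10 (King James Version), standing in for the full file.
sample_text <- c(
  "Our Father which art in heaven, Hallowed be thy name.",
  "Thy kingdom come. Thy will be done in earth, as it is in heaven."
)

# A tiny stop list made up for this example (the script uses stopwords("english")
# plus a hand-written list of archaic words).
stop_list <- c("our", "which", "art", "in", "be", "thy", "as", "it", "is", "will")

clean <- tolower(sample_text)                       # lowercase everything
clean <- gsub("[[:punct:]]", "", clean)             # remove punctuation
tokens <- unlist(strsplit(clean, "[[:space:]]+"))   # split into words
tokens <- tokens[!tokens %in% stop_list]            # drop stop words

# Count each word and sort from most to least frequent,
# like sort(rowSums(m), decreasing = TRUE) in the script.
v <- sort(table(tokens), decreasing = TRUE)
d <- data.frame(word = names(v), freq = as.vector(v))

head(d, 3)  # "heaven" comes first with a count of 2
```

In the real script, running head(d, 10) after the data.frame line shows the ten most frequent words, which is a quick way to check what the cloud will emphasize.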
Here is the word cloud we get:
As you would expect from a text translated in the early 17th century (the King James Bible was published in 1611), there is a lot of archaic language: words like thee, thou, thine, etc. I could not find a script that removes them, so I had to list them by hand (the second removeWords call in the code) to eliminate as many uninformative words as possible and make this word cloud as informative as it can be. The words that Jesus uses the most are "heaven" and "father". This makes sense in a religious text, since the term "father" refers to God, and God is in Heaven.
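One thing that might speed up the manual step is flagging words that end in -eth or -est, since many of the archaic verbs I removed (seeth, heareth, prayest, and so on) follow that pattern. This is only a heuristic sketch of my own, not something from the professor's script, and it produces false positives, so the results still need to be reviewed by hand. The words vector here is a made-up sample; in the real script you would use d$word instead.

```r
# A made-up sample of words; in the real script this would be d$word.
words <- c("heaven", "father", "seeth", "heareth", "knoweth",
           "prayest", "doest", "rest", "earth", "kingdom")

# Flag words ending in "eth" or "est" as candidates for the archaic stop list.
candidates <- grep("(eth|est)$", words, value = TRUE)

print(candidates)
# "seeth" "heareth" "knoweth" "prayest" "doest" "rest"
```

Note that "rest" is flagged even though it is an ordinary word, which is why the list should be checked before passing it to removeWords. Words like thee, thou, and thine do not follow any ending pattern and still have to be typed in manually.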