Summarizes the sento_corpus object and returns insights about the evolution of documents, features and tokens over time.
corpus_summarize(x, by = "day", features = NULL)
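Argument | Description
---|---
x | a sento_corpus object created with sento_corpus.
by | a single character value specifying the time interval over which the statistics are calculated, for example "day" (the default) or "month".
features | a character vector selecting the subset of features to summarize. With the default NULL, no subset is selected.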
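To make the role of `by` concrete, here is a minimal base-R sketch of how dates can be bucketed into periods. The helper `period_start()` is hypothetical and not part of sentometrics. It is an assumption for illustration only and relies on base R's `cut.Date()`. The package may group dates differently.

```r
# Hypothetical helper, not part of sentometrics: map each date to the first
# day of the period it falls in, one way a `by` value such as "day" or
# "month" could group documents.
period_start <- function(dates, by = "day") {
  # cut.Date() accepts breaks such as "day", "week", "month", "quarter" and
  # "year", and labels each bin with the first date of that bin.
  as.Date(as.character(cut(as.Date(dates), breaks = by)))
}

dates <- as.Date(c("1995-01-03", "1995-01-17", "1995-02-01"))
period_start(dates, by = "month")
# returns 1995-01-01, 1995-01-01, 1995-02-01
```

With `by = "day"`, every date is its own period, so the statistics are reported per calendar day.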
Returns a list containing:

- a data.table with, for each date, the number of documents, the total, average, minimum and maximum number of tokens, and the number of texts per feature;
- a list with three plots representing the above statistics.
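The statistics listed above can be illustrated with a small base-R sketch on a toy table. The function `summarize_by_period()` and the column names (`n_tokens`, `total_tokens`, and so on) are all hypothetical, and this is not how sentometrics computes them. The sketch makes three assumptions. First, token counts are already available, whereas the package tokenizes the texts itself. Second, a text counts towards a feature when its feature value is positive. Third, dates are grouped with `cut.Date()`.

```r
# Hypothetical sketch, NOT the sentometrics implementation: per-period
# statistics from a toy corpus table with one row per text. Assumes token
# counts are precomputed in `n_tokens` (sentometrics tokenizes the texts
# itself) and that a text counts towards a feature when its value is > 0.
summarize_by_period <- function(docs, by = "month", features = NULL) {
  # Default to every column that is neither the date nor the token count.
  if (is.null(features)) features <- setdiff(names(docs), c("date", "n_tokens"))
  # Label each text with the first day of its period (e.g. the month start).
  period <- as.character(cut(as.Date(docs$date), breaks = by))
  groups <- split(seq_len(nrow(docs)), period)
  rows <- lapply(names(groups), function(p) {
    idx <- groups[[p]]
    tok <- docs$n_tokens[idx]
    # Number of texts in this period with a positive value for each feature.
    feat_counts <- vapply(features, function(f) sum(docs[[f]][idx] > 0), integer(1))
    data.frame(date = as.Date(p), documents = length(idx),
               total_tokens = sum(tok), mean_tokens = mean(tok),
               min_tokens = min(tok), max_tokens = max(tok),
               t(feat_counts), check.names = FALSE)
  })
  do.call(rbind, rows)
}

docs <- data.frame(
  date = as.Date(c("1995-01-03", "1995-01-17", "1995-02-01")),
  n_tokens = c(120, 80, 200),
  wsj = c(1, 0, 1),
  wapo = c(0, 1, 0)
)
summarize_by_period(docs, by = "month", features = c("wsj", "wapo"))
```

In this toy data, January 1995 has 2 documents with 200 tokens in total (mean 100, minimum 80, maximum 120), one of which is a wsj text and one a wapo text.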
This function summarizes the sento_corpus object by generating statistics about documents, features and tokens over time. The insights can be narrowed down to a chosen set of metadata features. Texts are tokenized the same way as in the sentiment calculation of compute_sentiment.
Jeroen Van Pelt, Samuel Borms, Andres Algaba
```r
library("sentometrics")

data("usnews", package = "sentometrics")
corpus <- sento_corpus(usnews)

# summary of corpus by day
summary1 <- corpus_summarize(corpus)

# summary of corpus by month for both journals
summary2 <- corpus_summarize(corpus, by = "month", features = c("wsj", "wapo"))
```