Summarizes the sento_corpus
object and returns insights about the evolution of
documents, features and tokens over time.
corpus_summarize(x, by = "day", features = NULL)
x: a sento_corpus
object created with sento_corpus.
by: a single character
vector specifying the time interval (frequency) over which the statistics
are calculated, for example "day" (the default) or "month".
features: a character
vector used to select a subset of the features to analyse. Defaults to NULL.
Returns a list
containing:
a data.table
with, for each date, the number of documents, the total, average, minimum and maximum
number of tokens, and the number of texts per feature.
a list
with three plots representing the above statistics.
This function summarizes the sento_corpus
object by generating statistics about
documents, features and tokens over time. The insights can be narrowed down to a chosen set of metadata
features. The same tokenization as in the sentiment calculation in compute_sentiment
is used.
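To make the kind of statistics described above concrete, here is a minimal base R sketch that computes per-period document counts, token statistics and texts per feature on a toy corpus. It is not the sentometrics implementation: the data, the helper name summarize_by and the output column names are hypothetical, and whitespace splitting stands in for the package's tokenizer.

```r
# Toy corpus (hypothetical data, not the usnews dataset).
docs <- data.frame(
  date  = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-02-10")),
  texts = c("stocks rise on strong earnings",
            "markets fall",
            "central bank holds rates steady today",
            "oil prices jump"),
  wsj   = c(1, 0, 1, 0),  # metadata features: non-zero if the text belongs to it
  wapo  = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Assumption: whitespace tokenization, a simplification of the real tokenizer.
docs$ntokens <- lengths(strsplit(docs$texts, "\\s+"))

# Hypothetical helper: aggregate the corpus statistics per day or per month.
summarize_by <- function(d, by = c("day", "month")) {
  by <- match.arg(by)
  fmt <- if (by == "day") "%Y-%m-%d" else "%Y-%m"
  groups <- split(d, format(d$date, fmt))
  out <- do.call(rbind, lapply(names(groups), function(p) {
    g <- groups[[p]]
    data.frame(
      period      = p,
      documents   = nrow(g),
      totalTokens = sum(g$ntokens),
      meanTokens  = mean(g$ntokens),
      minTokens   = min(g$ntokens),
      maxTokens   = max(g$ntokens),
      wsj         = sum(g$wsj > 0),   # number of texts per feature
      wapo        = sum(g$wapo > 0),
      stringsAsFactors = FALSE
    )
  }))
  rownames(out) <- NULL
  out
}

by_day   <- summarize_by(docs, "day")
by_month <- summarize_by(docs, "month")
print(by_month)
```

Switching from "day" to "month" only changes the grouping key, which is the role the by argument plays in the function documented here.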
library("sentometrics")

data("usnews", package = "sentometrics")
corpus <- sento_corpus(usnews)

# summary of corpus by day
summary1 <- corpus_summarize(corpus)

# summary of corpus by month for both journals
summary2 <- corpus_summarize(corpus, by = "month",
                             features = c("wsj", "wapo"))