Summarizes the sento_corpus object and returns insights about the evolution of documents, features and tokens over time.

corpus_summarize(x, by = "day", features = NULL)

Arguments

x

is a sento_corpus object created with sento_corpus.

by

a single character vector to specify the frequency time interval over which the statistics need to be calculated.

features

a character vector that can be used to select a subset of the features to analyse.

Value

Returns a list containing:

stats

a data.table with, for each date, the number of documents; the total, average, minimum and maximum number of tokens; and the number of texts per feature.

plots

a list with three plots representing the above statistics.

Details

This function summarizes the sento_corpus object by generating statistics about documents, features and tokens over time. The insights can be narrowed down to a chosen set of metadata features through the features argument. Tokenization is the same as in the sentiment calculation in compute_sentiment.

Examples

data("usnews", package = "sentometrics")

corpus <- sento_corpus(usnews)

# summary of corpus by day
summary1 <- corpus_summarize(corpus)

# summary of corpus by month for both journals
summary2 <- corpus_summarize(corpus, by = "month",
                             features = c("wsj", "wapo"))
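
The returned list can then be inspected element by element. The following is a minimal sketch that relies only on the two element names documented under Value (stats and plots). The names of the individual plots are not listed on this page, so the sketch looks them up with names() and indexes by position rather than assuming them; the object name summ is purely illustrative.

```r
library(sentometrics)

data("usnews", package = "sentometrics")
corpus <- sento_corpus(usnews)

# monthly summary restricted to the two journal features
summ <- corpus_summarize(corpus, by = "month", features = c("wsj", "wapo"))

# first rows of the statistics table (a data.table)
head(summ$stats)

# names of the available plots (not assumed here, just listed)
names(summ$plots)

# display the first plot in the list
print(summ$plots[[1]])
```

Indexing by position avoids depending on plot names that may differ across package versions.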