Summarizes the sento_corpus object and returns insights about the evolution of documents, features and tokens over time.

corpus_summarize(x, by = "day", features = NULL)

Arguments

x

a sento_corpus object created with sento_corpus.

by

a single character value specifying the time interval over which the statistics are calculated, for example "day" (the default) or "month".

features

a character vector selecting a subset of the features to analyse. If NULL (the default), all features are used.

Value

A list containing:

stats

a data.table with, for each date, the number of documents; the total, average, minimum and maximum number of tokens; and the number of texts per feature.

plots

a list with three plots representing the above statistics.

Details

This function summarizes a sento_corpus object by generating statistics about documents, features and tokens over time. The insights can be narrowed down to a chosen set of metadata features through the features argument. Tokens are counted with the same tokenization as used for the sentiment calculation in compute_sentiment.
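The per-period statistics described above can be illustrated with a small base R sketch. It is not the package's implementation: the toy data, the column names and the whitespace tokenizer are hypothetical assumptions (compute_sentiment uses its own tokenization), and counting a text for a feature when its feature value is above zero is also an assumption.

```r
# Hypothetical toy corpus: dates, texts and one feature column (values in [0, 1])
docs <- data.frame(
  date  = as.Date(c("2020-01-03", "2020-01-17", "2020-02-05")),
  texts = c("stocks rise again", "markets fall", "bonds rally on news today"),
  wsj   = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# Assumption: plain whitespace tokenization, not the package's tokenizer
docs$tokens <- lengths(strsplit(docs$texts, "\\s+"))

# Assign each document to a monthly period, analogous to by = "month"
docs$period <- as.Date(format(docs$date, "%Y-%m-01"))

# Hypothetical helper computing the statistics for one period
summarize_period <- function(d) {
  data.frame(
    date        = d$period[1],
    documents   = nrow(d),
    totalTokens = sum(d$tokens),
    meanTokens  = mean(d$tokens),
    minTokens   = min(d$tokens),
    maxTokens   = max(d$tokens),
    wsj         = sum(d$wsj > 0)  # texts with a non-zero feature value
  )
}

# Split by period, summarize each group and stack the results
stats <- do.call(rbind, lapply(split(docs, docs$period), summarize_period))
rownames(stats) <- NULL
print(stats)
```

Changing the format string (for example to "%Y-01-01") would group by year instead; the package's own handling of by may differ.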

Author

Jeroen Van Pelt, Samuel Borms, Andres Algaba

Examples

data("usnews", package = "sentometrics")
corpus <- sento_corpus(usnews)

# summary of corpus by day
summary1 <- corpus_summarize(corpus)

# summary of corpus by month for both journals
summary2 <- corpus_summarize(corpus, by = "month", features = c("wsj", "wapo"))