Aggregates sentiment measures by combining them across the provided lexicons, features, and time weighting scheme dimensions. For do.global = FALSE, the combination occurs by taking the mean of the relevant measures. For do.global = TRUE, this function aggregates all sentiment measures into a weighted global textual sentiment measure for each of the dimensions.
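The mean-based merging used when do.global = FALSE can be illustrated on toy data in base R. This is only a sketch of the idea, not the package's implementation: the data frame, the "lexicon--feature--time" column naming, and the names lex1, lex2, feat1, tw1 and lex12 are assumptions made for illustration.

```r
# Toy sentiment measures: one row per date, one column per measure.
# The "lexicon--feature--time" naming is an assumption for this sketch.
measures <- data.frame(
  date = as.Date(c("2020-01-01", "2020-02-01")),
  "lex1--feat1--tw1" = c(0.2, 0.4),
  "lex2--feat1--tw1" = c(0.6, 0.0),
  check.names = FALSE
)

# Merging lex1 and lex2 into a hypothetical "lex12" amounts to taking,
# per date, the mean of the measures that differ only in their lexicon.
toMerge <- c("lex1--feat1--tw1", "lex2--feat1--tw1")
merged <- data.frame(
  date = measures$date,
  "lex12--feat1--tw1" = rowMeans(measures[, toMerge]),
  check.names = FALSE
)
```

Here merged holds a single measure per date, the average of the two lexicon-specific measures it replaces.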
# S3 method for class 'sento_measures'
aggregate(
  x,
  features = NULL,
  lexicons = NULL,
  time = NULL,
  do.global = FALSE,
  do.keep = FALSE,
  ...
)
x: a sento_measures object created using sento_measures.
features: a list with unique features to aggregate at given name, e.g., list(feat12 = c("feat1", "feat2")). See x$features for the exact names to use. Use NULL (default) to apply no merging across this dimension. If do.global = TRUE, should be a numeric vector of weights, of size length(x$features), in the same order. A value of NULL means equally weighted.
lexicons: a list with unique lexicons to aggregate at given name, e.g., list(lex12 = c("lex1", "lex2")). See x$lexicons for the exact names to use. Use NULL (default) to apply no merging across this dimension. If do.global = TRUE, should be a numeric vector of weights, of size length(x$lexicons), in the same order. A value of NULL means equally weighted.
time: a list with unique time weighting schemes to aggregate at given name, e.g., list(tw12 = c("tw1", "tw2")). See x$time for the exact names to use. Use NULL (default) to apply no merging across this dimension. If do.global = TRUE, should be a numeric vector of weights, of size length(x$time), in the same order. A value of NULL means equally weighted.
do.global: a logical indicating if the sentiment measures should be aggregated into weighted global sentiment indices.
do.keep: a logical indicating if the original sentiment measures should be kept (i.e., if do.keep = TRUE, the aggregated sentiment measures are added to the current sentiment measures as additional indices).
...: not used.
If do.global = FALSE, a modified sento_measures object with the aggregated sentiment measures, including updated information and statistics, while the original sentiment scores data.table is left untouched.
If do.global = TRUE, a data.table with the different types of weighted global sentiment measures, named "globLex", "globFeat", "globTime" and "global", with "date" as the first column. The last measure is an average of the three other measures.
If do.global = TRUE, the measures are constructed from weights that indicate the importance (and sign) of each component along the lexicons, features, and time dimensions. There is no restriction on the allowed weights. For example, the global index based on the supplied lexicon weights ("globLex") is obtained by first multiplying every sentiment measure by its corresponding weight (that is, the weight given to the lexicon the sentiment is computed with), then taking the average per date.
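The weighting described above can be sketched in base R on a toy matrix. This is a minimal illustration under stated assumptions, not the package's code: the "lexicon--feature--time" column naming, the component names (lexA, lexB, f1, f2, tw1), the weights, and the helper weighted_mean_by() are all hypothetical.

```r
# Toy measures: 2 lexicons x 2 features x 1 time scheme, on two dates (rows).
# Column naming "lexicon--feature--time" is an assumption for this sketch.
measures <- cbind(
  "lexA--f1--tw1" = c(1, 2),
  "lexA--f2--tw1" = c(3, 4),
  "lexB--f1--tw1" = c(5, 6),
  "lexB--f2--tw1" = c(7, 8)
)

# Recover the component of each dimension from the column names.
parts <- do.call(rbind, strsplit(colnames(measures), "--", fixed = TRUE))
lexOfCol <- parts[, 1]
featOfCol <- parts[, 2]
timeOfCol <- parts[, 3]

# Hypothetical weights per component; signs and magnitudes are unrestricted.
wLex <- c(lexA = 0.3, lexB = 0.1)
wFeat <- c(f1 = 1, f2 = -0.5)
wTime <- c(tw1 = 1)

# Hypothetical helper: scale each column by the weight of its component,
# then take the average per date (row).
weighted_mean_by <- function(m, w) rowMeans(sweep(m, 2, unname(w), `*`))

globLex <- weighted_mean_by(measures, wLex[lexOfCol])
globFeat <- weighted_mean_by(measures, wFeat[featOfCol])
globTime <- weighted_mean_by(measures, wTime[timeOfCol])

# The overall index is the average of the three dimension-specific indices.
global <- (globLex + globFeat + globTime) / 3
```

Each of globLex, globFeat and globTime uses the same set of measures and differs only in which dimension supplies the weights, which is why a negative feature weight flips the contribution of every measure computed for that feature.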
data("usnews", package = "sentometrics")
data("list_lexicons", package = "sentometrics")
data("list_valence_shifters", package = "sentometrics")
# construct a sento_measures object to start with
corpus <- sento_corpus(corpusdf = usnews)
corpusSample <- quanteda::corpus_sample(corpus, size = 500)
l <- sento_lexicons(list_lexicons[c("LM_en", "HENRY_en")],
                    list_valence_shifters[["en"]])
ctr <- ctr_agg(howTime = c("equal_weight", "linear"),
               by = "year", lag = 3)
sento_measures <- sento_measures(corpusSample, l, ctr)

# aggregation across specified components
smAgg <- aggregate(sento_measures,
                   time = list(W = c("equal_weight", "linear")),
                   features = list(journals = c("wsj", "wapo")),
                   do.keep = TRUE)

# aggregation in full
dims <- get_dimensions(sento_measures)
smFull <- aggregate(sento_measures,
                    lexicons = list(L = dims[["lexicons"]]),
                    time = list(T = dims[["time"]]),
                    features = list(F = dims[["features"]]))

# "global" aggregation
smGlobal <- aggregate(sento_measures, do.global = TRUE,
                      lexicons = c(0.3, 0.1),
                      features = c(1, -0.5, 0.3, 1.2),
                      time = NULL)
if (FALSE) { # \dontrun{
  # aggregation won't work, but produces informative error message
  aggregate(sento_measures,
            time = list(W = c("equal_weight", "almon1")),
            lexicons = list(LEX = c("LM_en")),
            features = list(journals = c("notInHere", "wapo")))
} # }