Extract documents related to sentiment peaks

This function extracts the documents with the most extreme sentiment (lowest, highest, or both in absolute terms). The extracted documents are unique: each document is returned at most once, even when, for example, all of the most extreme sentiment values (across the sentiment calculation methods) belong to a single document.
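The uniqueness rule can be illustrated with a small base R sketch. It assumes, for illustration only, that all sentiment values are pooled across methods, ranked by absolute value, and that duplicate document identifiers are dropped after ranking. This is not necessarily how peakdocs() is implemented internally; the toy scores and the method names "m1" and "m2" are hypothetical.

```r
# Toy scores: rows are documents, columns are two hypothetical
# sentiment calculation methods. Document d1 holds both extremes.
scores <- cbind(m1 = c(0.9, -0.1, 0.2),
                m2 = c(0.8, 0.3, -0.4))
docIds <- c("d1", "d2", "d3")

vals <- as.vector(scores)              # pool all values, column by column
ids  <- rep(docIds, ncol(scores))      # document identifier of each value
ord  <- order(abs(vals), decreasing = TRUE)

naive <- ids[ord][1:2]                 # "d1" "d1": one document twice
uniq  <- unique(ids[ord])[1:2]         # "d1" "d3": two distinct documents
```

Without the unique() step, a document that dominates every method would fill several of the n slots on its own.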

peakdocs(sentiment, n = 10, type = "both", do.average = FALSE)

Arguments

sentiment: a sentiment object created using compute_sentiment or as.sentiment.
n: a positive numeric value indicating the number of documents associated with sentiment peaks to extract. If n < 1, it is interpreted as a quantile (for example, 0.07 means the 7% most extreme documents).
type: a character value, either "pos", "neg" or "both", to look for the n documents related to the most positive, the most negative or the most extreme (in absolute terms) sentiment occurrences, respectively.
do.average: a logical indicating whether peaks should be selected based on the average sentiment value per document.
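How the three arguments could interact is sketched below with a hypothetical helper, select_peaks(), which is not part of sentometrics. It works on a plain numeric matrix (one row per document, one column per sentiment calculation method) rather than on a sentiment object, and it rests on several assumptions: a fractional n is converted to a document count by rounding down, do.average = TRUE averages each document's values across the method columns, and ranking happens on the pooled values before duplicates are removed.

```r
# Hypothetical helper (not the sentometrics implementation).
#   scores: numeric matrix, one row per document, one column per method
#   ids:    character vector of document identifiers, one per row
select_peaks <- function(scores, ids, n = 10, type = "both",
                         do.average = FALSE) {
  type <- match.arg(type, c("both", "pos", "neg"))
  nDocs <- nrow(scores)
  # Assumption: n < 1 is a fraction of documents, rounded down
  if (n < 1) n <- floor(n * nDocs)
  stopifnot(n >= 1, n <= nDocs)
  if (do.average) {
    vals <- rowMeans(scores, na.rm = TRUE)  # one average value per document
    valIds <- ids
  } else {
    vals <- as.vector(scores)               # every value of every method
    valIds <- rep(ids, ncol(scores))
  }
  key <- switch(type,
                both = abs(vals),  # largest absolute values rank first
                pos  = vals,       # most positive values rank first
                neg  = -vals)      # most negative values rank first
  ord <- order(key, decreasing = TRUE, na.last = NA)  # drop missing values
  unique(valIds[ord])[seq_len(n)]           # first n distinct documents
}

# Toy data: four documents scored by two hypothetical methods
scores <- cbind(m1 = c(0.5, -0.8, 0.1, 0.3),
                m2 = c(0.4, -0.2, 0.6, -0.1))
ids <- c("d1", "d2", "d3", "d4")

select_peaks(scores, ids, n = 2)                     # "d2" "d3"
select_peaks(scores, ids, n = 2, type = "pos")       # "d3" "d1"
select_peaks(scores, ids, n = 2, type = "neg")       # "d2" "d4"
select_peaks(scores, ids, n = 2, do.average = TRUE)  # "d2" "d1"
select_peaks(scores, ids, n = 0.5)                   # "d2" "d3"
```

Note how averaging changes the outcome: d3 has the second most extreme single value (0.6), but its average (0.35) is less extreme than that of d1 (0.45).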

Value

A vector of type "character" corresponding to the n extracted document identifiers.

Author

Samuel Borms

Examples

library("sentometrics")

set.seed(505)

data("usnews", package = "sentometrics")
data("list_lexicons", package = "sentometrics")
data("list_valence_shifters", package = "sentometrics")

l <- sento_lexicons(list_lexicons[c("LM_en", "HENRY_en")])

corpus <- sento_corpus(corpusdf = usnews)
corpusSample <- quanteda::corpus_sample(corpus, size = 200)
sent <- compute_sentiment(corpusSample, l, how = "proportionalPol")

# extract the peaks
peaksAbs <- peakdocs(sent, n = 5)
peaksAbsQuantile <- peakdocs(sent, n = 0.50)
peaksPos <- peakdocs(sent, n = 5, type = "pos")
peaksNeg <- peakdocs(sent, n = 5, type = "neg")