R/sentolexicons.R
sento_lexicons.Rd
Structures the provided lexicon(s) and, optionally, valence words. For example, one can combine
(part of) the built-in lexicons from data("list_lexicons") with other lexicons, and add one of the
built-in valence word lists from data("list_valence_shifters"). This function makes the output
coherent by converting all words to lowercase and checking for duplicates. All entries consisting
of more than one word are discarded, as required for bag-of-words sentiment analysis.
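The cleanup described above (lowercasing, duplicate checking, dropping multi-word entries) can be sketched in base R. This is only an illustration, not the package's implementation: clean_lexicon is a hypothetical helper, and keeping the first occurrence of a duplicated word is an assumption, since the text does not say how duplicates are resolved.

```r
# Hypothetical helper (not part of sentometrics): mimics the described cleanup.
clean_lexicon <- function(lex) {
  # first column = words, second column = polarity scores
  out <- data.frame(x = tolower(trimws(lex[[1]])), y = lex[[2]],
                    stringsAsFactors = FALSE)
  # discard entries consisting of more than one word
  out <- out[!grepl("\\s", out$x), , drop = FALSE]
  # assumption: on duplicates, keep the first occurrence
  out <- out[!duplicated(out$x), , drop = FALSE]
  rownames(out) <- NULL
  out
}

raw <- data.frame(w = c("Nice", "nice", "not good", "boring"),
                  s = c(2, 1, -1, -1), stringsAsFactors = FALSE)
cleaned <- clean_lexicon(raw)
print(cleaned)
```

Here "Nice" and "nice" collapse into one entry and "not good" is dropped because it contains two words.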
sento_lexicons(lexiconsIn, valenceIn = NULL, do.split = FALSE)
lexiconsIn
a named list of (raw) lexicons, each element a data.table or a data.frame with, respectively, a
character column (the words) and a numeric column (the polarity scores). This argument can be
one of the built-in lexicons accessible via sentometrics::list_lexicons.
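As a rough illustration of the expected input shape, the sketch below builds a named list of two-column data frames and checks each element. The names lexicons_in, otherLexicon and has_expected_shape are hypothetical, and the check only reflects the description above, not the validation the function itself performs.

```r
# A named list of raw lexicons: words first, polarity scores second.
lexicons_in <- list(
  myLexicon    = data.frame(w = c("nice", "boring"), s = c(2, -1),
                            stringsAsFactors = FALSE),
  otherLexicon = data.frame(word = c("good", "bad"), score = c(1, -1),
                            stringsAsFactors = FALSE)
)

# Hypothetical check: a character column followed by a numeric column.
has_expected_shape <- function(lex) {
  is.data.frame(lex) && ncol(lex) == 2 &&
    is.character(lex[[1]]) && is.numeric(lex[[2]])
}

shape_ok  <- vapply(lexicons_in, has_expected_shape, logical(1))
all_named <- !is.null(names(lexicons_in)) && all(nzchar(names(lexicons_in)))
print(shape_ok)
```

The column names of the raw input are free to differ between lexicons; only their order and types matter in this sketch.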
valenceIn
a single valence word list as a data.table or a data.frame with, respectively, an "x" and a "y" or
"t" column. The first column has the words, "y" has the values for bigram shifting, and "t" has
the types of the valence shifter for a clustered approach to sentiment calculation (supported
types: 1 = negators, 2 = amplifiers, 3 = deamplifiers, 4 = adversative conjunctions). Type 4 is
only used in a clusters-based sentence-level sentiment calculation. If three columns are provided,
only the first two will be considered. This argument can be one of the built-in valence word lists
accessible via sentometrics::list_valence_shifters. A word that appears in both a lexicon and the
valence word list is prioritized as a lexical entry during sentiment calculation. If NULL, valence
shifting is not applied in the sentiment analysis.
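To make the column layout concrete, here is a small base R sketch of a valence word list. The words and the "y" values are invented for illustration, and the objects valence_in, bigram_val, cluster_val, type_labels and lexicon_words are hypothetical names, not part of the package.

```r
# Illustrative valence word list; the "y" values are made up.
valence_in <- data.frame(
  x = c("not", "very", "barely", "but"),
  y = c(-1, 1.8, 0.2, 1),
  t = c(1L, 2L, 3L, 4L),
  stringsAsFactors = FALSE
)

# Two-column variants: "y" for bigram shifting, "t" for the clustered approach.
bigram_val  <- valence_in[, c("x", "y")]
cluster_val <- valence_in[, c("x", "t")]

# Supported types as listed in the argument description.
type_labels <- c("negator", "amplifier", "deamplifier", "adversative conjunction")
labels <- setNames(type_labels[valence_in$t], valence_in$x)

# Words present in both a lexicon and the valence list; per the description,
# these are treated as lexical entries during sentiment calculation.
lexicon_words <- c("nice", "boring", "barely")
overlap <- intersect(lexicon_words, valence_in$x)
print(overlap)
```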
do.split
a logical that, if TRUE, splits every lexicon into a separate positive polarity and negative
polarity lexicon.
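The effect of splitting can be sketched as follows. split_lexicons is a hypothetical helper written for this illustration; the _POS and _NEG suffixes follow the element names used in the examples (such as GI_en_POS), while dropping zero scores is an assumption.

```r
# Hypothetical helper: split each lexicon by the sign of its scores.
split_lexicons <- function(lexicons) {
  out <- list()
  for (nm in names(lexicons)) {
    lex <- lexicons[[nm]]
    pos <- lex[lex$y > 0, , drop = FALSE]
    neg <- lex[lex$y < 0, , drop = FALSE]  # assumption: zero scores are dropped
    rownames(pos) <- NULL
    rownames(neg) <- NULL
    out[[paste0(nm, "_POS")]] <- pos
    out[[paste0(nm, "_NEG")]] <- neg
  }
  out
}

lexicons <- list(myLexicon = data.frame(x = c("nice", "boring", "great"),
                                        y = c(2, -1, 3),
                                        stringsAsFactors = FALSE))
split_out <- split_lexicons(lexicons)
print(names(split_out))
```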
A list of class sento_lexicons with each lexicon as a separate element according to its name, as a
data.table, and optionally an element named valence that comprises the valence words. Every "x"
column contains the words, every "y" column contains the scores. The "t" column for valence
shifters contains the different types.
data("list_lexicons", package = "sentometrics")
data("list_valence_shifters", package = "sentometrics")
# lexicons straight from built-in word lists
l1 <- sento_lexicons(list_lexicons[c("LM_en", "HENRY_en")])
# including a self-made lexicon, with and without valence shifters
lexIn <- c(list(myLexicon = data.table::data.table(w = c("nice", "boring"), s = c(2, -1))),
list_lexicons[c("GI_en")])
valIn <- list_valence_shifters[["en"]]
l2 <- sento_lexicons(lexIn)
l3 <- sento_lexicons(lexIn, valIn)
l4 <- sento_lexicons(lexIn, valIn[, c("x", "y")], do.split = TRUE)
l5 <- sento_lexicons(lexIn, valIn[, c("x", "t")], do.split = TRUE)
l6 <- l5[c("GI_en_POS", "valence")] # preserves sento_lexicons class
if (FALSE) { # not run
  # include lexicons from lexicon package
  lexIn2 <- list(hul = lexicon::hash_sentiment_huliu, joc = lexicon::hash_sentiment_jockers)
  l7 <- sento_lexicons(c(lexIn, lexIn2), valIn)
}
if (FALSE) { # not run
  # faulty extraction
  l5["valence"]
  l2[0]
  l3[22]
  # no replacement allowed
  l4[1] <- l2[1]
  l4[[1]] <- l2[[1]]
  l4$GI_en_NEG <- l2$myLexicon
}