Here is an overview of some (not all!) of the anticipated developments, as well as known bugs and minor unfinished business. The main objective is to converge towards a stable 1.0.0 release. If you want to help out with any of these, contact the maintainer or file a pull request on GitHub.
- Add straightforward topic modelling functionality into the `add_features()` function (or as part of the `sento_train()` function, to for instance generate a lexicon from a corpus).
- Expand the number of available models in the `sento_model()` function (e.g. constrained regression, PCA).
- Implement an optimization approach in the `aggregate.sento_measures(..., do.global = TRUE)` function to extract optimized weights across dimensions (and possibly make it available through the `sento_model()` function); this includes allowing weights to be set in the `aggregate.sento_measures()` function instead of averaging by default.
- Implement fast textual sentiment computation for lexicons with ngrams.
- Add a `head.sento_measures()` and a `tail.sento_measures()` function.
- Implement a structure to support high-frequency intraday aggregation.
- Make more lexicons available (e.g. German and Spanish).
- Give the user more control over the `glmnet` parameters in the `sento_model()` function.
- Write a helper function to aggregate an `attributions` object into clusters.
- Resolve the inconsistency with `data.frame` input columns (`"(doc_)id"`) in the `tm` corpus creators.
- Prepare a functional CRAN version of ...
- Find additional computational speed gains (especially after recent additions that introduced some overhead).
- Add a `"binary"` option to `get_hows()[["words"]]` that turns the sentiment computation into an indicator-like calculation (value of 1 if a text contains at least one lexicon word).
- Optimize the parallelization of iterative model runs (e.g. avoid unnecessary copying of objects across cores).
- Add a `delete_features()` function as an intuitive counterpart to `add_features()`.
- Fix the issue that column names of the sentiment measures output do not handle special characters (e.g. é) well, yet such characters still get through.
- Handle `matrix` input in the `sento_model(..., y, ...)` function more consistently.
- Add references to the external `textdata` package in the examples (e.g. for extra lexicons).
- Be more flexible about the features in a `sento_corpus` object by also allowing values outside the [0, 1] interval.
- Make sure subsetting does not maintain a `sentiment` object when it is not supposed to.
- Remove all but one of the duplicate entries in the ...
- Make sure you can also add the `"language"` identifier to a corpus with ...
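The ngram item in the list above can be sketched in a few lines of base R. This is an illustrative toy with a made-up lexicon, not the sentometrics implementation:

```r
# Toy scoring of a text against a lexicon that mixes unigrams and bigrams.
# The lexicon words and scores below are made up for illustration.
lexicon <- c("good" = 1, "bad" = -1, "not good" = -1.5)

score_text <- function(text, lexicon) {
  tokens <- tolower(unlist(strsplit(text, "\\s+")))
  # candidate grams: the unigrams plus all adjacent-token bigrams
  grams <- c(tokens, paste(head(tokens, -1), tail(tokens, -1)))
  # named lookup gives NA for grams not in the lexicon; drop those in the sum
  sum(lexicon[grams], na.rm = TRUE)
}

score_text("The movie was not good", lexicon)
# → -0.5 (both "good" and "not good" match; a real implementation
#   would also have to resolve such overlapping matches)
```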
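The intraday aggregation item boils down to mapping document timestamps into finer-than-daily buckets before averaging. A minimal base-R sketch, with invented timestamps and scores:

```r
# Average document-level sentiment into hourly buckets, the kind of
# structure a high-frequency intraday aggregation would need.
times <- as.POSIXct(c("2024-01-02 09:15:00", "2024-01-02 09:45:00",
                      "2024-01-02 10:05:00"), tz = "UTC")
sentiment <- c(0.2, -0.1, 0.4)

# truncate each timestamp to its hour and average within buckets
hour_bucket <- format(times, "%Y-%m-%d %H:00")
tapply(sentiment, hour_bucket, mean)
# → 2024-01-02 09:00: 0.05, 2024-01-02 10:00: 0.4
```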
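As for exposing more `glmnet` control: the arguments below (`alpha`, `lambda`, `standardize`, `penalty.factor`) are actual `glmnet::glmnet()` parameters a wrapper could pass through; the data is simulated, and the snippet assumes the `glmnet` package is installed:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100)

fit <- glmnet(
  x, y,
  alpha = 0.5,                              # elastic-net mixing (0 = ridge, 1 = lasso)
  lambda = 10^seq(0, -3, length.out = 50),  # user-supplied regularization path
  standardize = TRUE,                       # scale predictors before fitting
  penalty.factor = c(0, 1, 1, 1, 1)         # leave the first predictor unpenalized
)
coef(fit, s = 0.01)                         # coefficients at a chosen lambda
```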
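Finally, the proposed `"binary"` option amounts to replacing a within-document count by an indicator. A plain-R sketch of the difference (illustrative function and lexicon, not the sentometrics API):

```r
# Counts-based vs. binary ("at least one lexicon word") document scores.
lexicon_words <- c("good", "great", "bad")

doc_scores <- function(texts, lexicon_words, how = c("counts", "binary")) {
  how <- match.arg(how)
  sapply(texts, function(text) {
    tokens <- tolower(unlist(strsplit(text, "\\s+")))
    n <- sum(tokens %in% lexicon_words)        # lexicon hits in this text
    if (how == "binary") as.numeric(n > 0) else n
  }, USE.NAMES = FALSE)
}

texts <- c("good good great", "nothing to see here")
doc_scores(texts, lexicon_words, "counts")  # → 3 0
doc_scores(texts, lexicon_words, "binary")  # → 1 0
```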