Here is an overview of some (not all!) of the anticipated developments, as well as known bugs and minor unfinished business. The main objective is to converge towards a stable 1.0.0 release. If you want to help out with any of these, contact the maintainer or file a pull request on GitHub.
- Add straightforward topic modelling functionality into the `add_features()` function (or as part of the `sento_train()` function, to for instance generate a lexicon from a corpus).
- Expand the number of available models in the `sento_model()` function (e.g. constrained regression, PCA).
- Implement an optimization approach in the `aggregate.sento_measures(..., do.global = TRUE)` function to extract optimized weights across dimensions (and possibly make it available through the `sento_model()` function); this includes allowing weights to be set in the `aggregate.sento_measures()` function instead of averaging by default.
- Implement fast textual sentiment computation for lexicons with ngrams.
- Add a `head.sento_measures()` and a `tail.sento_measures()` function.
- Implement a structure to support high-frequency intraday aggregation.
- Make more lexicons available (e.g. German and Spanish).
- Give the user more control over the `glmnet` parameters in the `sento_model()` function.
- Write a helper function to aggregate an `attributions` object into clusters.
- Resolve the inconsistency with `data.frame` input columns (`"(doc_)id"`) in the `tm` corpus creators.
- Prepare a functional CRAN version of ...
- Find additional computational speed gains (especially after recent additions that introduced some overhead).
- Add a `"binary"` option to `get_hows()[["words"]]` that turns the sentiment computation into an indicator-like calculation (value of 1 if a text contains at least one lexicon word).
- Optimize the parallelization of iterative model runs (e.g. avoid unnecessary copying of objects across cores).
- Add a `delete_features()` function as an intuitive counterpart to `add_features()`.
- Fix the issue that column names of the sentiment measures output do not handle special characters (e.g. é) well, yet such characters still get through.
- Handle `matrix` input in the `sento_model(..., y, ...)` function more consistently.
- Add references to the external `textdata` package in the examples (e.g. for extra lexicons).
- Be more flexible about the features in a `sento_corpus` object by also allowing values outside the [0, 1] interval.
- Make sure subsetting does not maintain a `sentiment` object when it is not supposed to.
- Remove all but one of the duplicate entries in the ...
- Make sure you can also add the `"language"` identifier to a corpus with ...
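The ngram item in the list above can be sketched in a few lines of base R. This is an illustrative toy with a made-up lexicon, not the sentometrics implementation:

```r
# Toy scoring of a text against a lexicon that mixes unigrams and bigrams.
# The lexicon words and scores below are made up for illustration.
lexicon <- c("good" = 1, "bad" = -1, "not good" = -1.5)

score_text <- function(text, lexicon) {
  tokens <- tolower(unlist(strsplit(text, "\\s+")))
  # candidate grams: the unigrams plus all adjacent-token bigrams
  grams <- c(tokens, paste(head(tokens, -1), tail(tokens, -1)))
  # named lookup gives NA for grams not in the lexicon; drop those in the sum
  sum(lexicon[grams], na.rm = TRUE)
}

score_text("The movie was not good", lexicon)
# → -0.5 (both "good" and "not good" match; a real implementation
#   would also have to resolve such overlapping matches)
```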
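The intraday aggregation item boils down to mapping document timestamps into finer-than-daily buckets before averaging. A minimal base-R sketch, with invented timestamps and scores:

```r
# Average document-level sentiment into hourly buckets, the kind of
# structure a high-frequency intraday aggregation would need.
times <- as.POSIXct(c("2024-01-02 09:15:00", "2024-01-02 09:45:00",
                      "2024-01-02 10:05:00"), tz = "UTC")
sentiment <- c(0.2, -0.1, 0.4)

# truncate each timestamp to its hour and average within buckets
hour_bucket <- format(times, "%Y-%m-%d %H:00")
tapply(sentiment, hour_bucket, mean)
# → 2024-01-02 09:00: 0.05, 2024-01-02 10:00: 0.4
```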
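As for exposing more `glmnet` control: the arguments below (`alpha`, `lambda`, `standardize`, `penalty.factor`) are actual `glmnet::glmnet()` parameters a wrapper could pass through; the data is simulated, and the snippet assumes the `glmnet` package is installed:

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(100)

fit <- glmnet(
  x, y,
  alpha = 0.5,                              # elastic-net mixing (0 = ridge, 1 = lasso)
  lambda = 10^seq(0, -3, length.out = 50),  # user-supplied regularization path
  standardize = TRUE,                       # scale predictors before fitting
  penalty.factor = c(0, 1, 1, 1, 1)         # leave the first predictor unpenalized
)
coef(fit, s = 0.01)                         # coefficients at a chosen lambda
```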
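Finally, the proposed `"binary"` option amounts to replacing a within-document count by an indicator. A plain-R sketch of the difference (illustrative function and lexicon, not the sentometrics API):

```r
# Counts-based vs. binary ("at least one lexicon word") document scores.
lexicon_words <- c("good", "great", "bad")

doc_scores <- function(texts, lexicon_words, how = c("counts", "binary")) {
  how <- match.arg(how)
  sapply(texts, function(text) {
    tokens <- tolower(unlist(strsplit(text, "\\s+")))
    n <- sum(tokens %in% lexicon_words)        # lexicon hits in this text
    if (how == "binary") as.numeric(n > 0) else n
  }, USE.NAMES = FALSE)
}

texts <- c("good good great", "nothing to see here")
doc_scores(texts, lexicon_words, "counts")  # → 3 0
doc_scores(texts, lexicon_words, "binary")  # → 1 0
```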