Sets up a control object for linear or nonlinear modeling of a response variable on a large panel of
textual sentiment measures (and potentially other variables). See sento_model for details on the
estimation and calibration procedure.
ctr_model(
model = c("gaussian", "binomial", "multinomial"),
type = c("BIC", "AIC", "Cp", "cv"),
do.intercept = TRUE,
do.iter = FALSE,
h = 0,
oos = 0,
do.difference = FALSE,
alphas = seq(0, 1, by = 0.2),
lambdas = NULL,
nSample = NULL,
trainWindow = NULL,
testWindow = NULL,
start = 1,
do.shrinkage.x = FALSE,
do.progress = TRUE,
nCore = 1
)
Arguments
model |
a character vector with one of the following: "gaussian" (linear regression), "binomial"
(binomial logistic regression), or "multinomial" (multinomial logistic regression). |
type |
a character vector indicating which model calibration approach to use. Supports "BIC",
"AIC" and "Cp" (Mallows's Cp) as sparse-regression adapted information criteria (Tibshirani and Taylor,
2012; Zou, Hastie and Tibshirani, 2007), and "cv" (cross-validation based on the train
function from the caret package). The adapted information criteria are only available for linear regression
(model = "gaussian"). |
do.intercept |
a logical; if TRUE (the default), an intercept is fitted. |
do.iter |
a logical; TRUE induces an iterative estimation of models at the given nSample size and
performs the associated out-of-sample prediction exercise through time. |
h |
an integer value that shifts the time series to obtain the desired prediction setup; h = 0 means
no change to the input data (nowcasting, assuming the data is properly aligned), h > 0 shifts the dependent
variable by h periods (i.e., rows) further in time (forecasting), and h < 0 shifts the independent variables by
abs(h) periods. |
oos |
a non-negative integer to indicate the number of periods to skip from the end of the training sample
up to the out-of-sample prediction(s). This is used either in the cross-validation based calibration approach
(if type = "cv"), or for the iterative out-of-sample prediction analysis (if do.iter = TRUE). For
instance, given \(t\), the (first) out-of-sample prediction is computed at \(t +\) oos \(+ 1\). |
do.difference |
a logical; TRUE will difference the target variable y supplied in the
sento_model function, using the absolute value of the h argument as lag. This requires
abs(h) > 0. For example, if h = 2, and assuming the y variable is properly aligned
date-wise with the explanatory variables denoted by \(X\) (the sentiment measures and the other variables in x),
the regression will be of \(y_{t + 2} - y_t\) on \(X_t\). If h = -2, the regression fitted is
\(y_{t + 2} - y_t\) on \(X_{t+2}\). The argument is always kept at FALSE if the model argument is one of
c("binomial", "multinomial"). |
alphas |
a numeric vector of the alphas to test for during calibration, between 0 and 1. A value of
0 pertains to Ridge regression, a value of 1 to LASSO regression; values in between are pure elastic net. |
lambdas |
a numeric vector of the lambdas to test for during calibration, \(\geq 0\).
A value of zero means no regularization, and thus requires care when the data is fat (more regressors than
observations). By default set to NULL, such that the lambdas sequence is generated by the glmnet function
or set to 10^seq(2, -2, length.out = 100) in case of cross-validation. |
nSample |
a positive integer as the size of the sample for model estimation at every iteration (ignored if
do.iter = FALSE). |
trainWindow |
a positive integer as the size of the training sample for cross-validation (ignored if
type != "cv"). |
testWindow |
a positive integer as the size of the test sample for cross-validation (ignored if
type != "cv"). |
start |
a positive integer to indicate at which point the iteration has to start (ignored if
do.iter = FALSE). For example, given 100 possible iterations, start = 70 leads to model estimations
only for the last 31 samples. |
do.shrinkage.x |
a logical vector to indicate which of the other regressors provided through the x
argument of the sento_model function should be subject to shrinkage (TRUE). If the argument is of
length one, it applies to all external regressors. |
do.progress |
a logical; if TRUE, progress statements are displayed during model calibration. |
nCore |
a positive integer to indicate the number of cores to use for a parallel iterative model
estimation (do.iter = TRUE). The %dopar% construct from the foreach package is used. By default,
nCore = 1, which implies no parallelization. No progress statements are displayed when nCore > 1.
For cross-validation models, parallelization can also be carried out for a single-shot model (do.iter = FALSE),
whenever a parallel backend is set up. See the examples in sento_model. |
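The conventions described for h, do.difference, lambdas and start can be sketched in base R. This is only an illustration of the alignment and arithmetic stated in the argument descriptions, not the package's internal code; the toy vectors y and x and the names yLead, xBase, yDiff, lambdasCV and nIter are hypothetical.

```r
# Toy data: y and x are assumed to be aligned row by row (same dates).
n <- 10
y <- (1:n)^2   # hypothetical target variable
x <- 1:n       # hypothetical single regressor
h <- 2

# h > 0 (forecasting): pair X_t with y_{t+h}.
# The last h rows of x have no matching target and drop out.
yLead <- y[(1 + h):n]   # y_{t+h} for t = 1, ..., n - h
xBase <- x[1:(n - h)]   # X_t for the same t

# do.difference = TRUE with h = 2: the target becomes y_{t+2} - y_t.
yDiff <- diff(y, lag = abs(h))   # element t equals y[t + 2] - y[t]

# Default lambdas grid in case of cross-validation, as given for the lambdas argument.
lambdasCV <- 10^seq(2, -2, length.out = 100)

# start = 70 with 100 possible iterations: iterations 70, 71, ..., 100 are run.
nIter <- 100
start <- 70
length(start:nIter)   # 31
```

Both yLead and yDiff have n - abs(h) elements, which is why abs(h) > 0 is needed for the differencing to be meaningful.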
Value
A list encapsulating the control parameters.
References
Tibshirani and Taylor (2012). Degrees of freedom in LASSO problems.
The Annals of Statistics 40, 1198-1232, doi: 10.1214/12-AOS1003.
Zou, Hastie and Tibshirani (2007). On the degrees of freedom of the LASSO.
The Annals of Statistics 35, 2173-2192, doi: 10.1214/009053607000000127.
See also
Author
Samuel Borms, Keven Bluteau
Examples
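A minimal sketch of how the control object might be set up for the three calibration setups described above. It assumes ctr_model() is loaded from the sentometrics package and uses only arguments from the usage block; the specific values (window sizes, sample size, horizon) are illustrative choices, not recommendations.

```r
# Assumption: ctr_model() is provided by the sentometrics package.
# The calls are skipped when the package is not installed.
if (requireNamespace("sentometrics", quietly = TRUE)) {
  # Information criterion based calibration, single-shot model (do.iter = FALSE).
  ctrIC <- sentometrics::ctr_model(model = "gaussian", type = "BIC", do.iter = FALSE,
                                   h = 0, alphas = seq(0, 1, by = 0.10))

  # Iterative out-of-sample exercise: 100 observations per estimation,
  # a 4-period horizon on the differenced target, skipping 3 periods.
  ctrIter <- sentometrics::ctr_model(model = "gaussian", type = "AIC", do.iter = TRUE,
                                     h = 4, nSample = 100, do.difference = TRUE, oos = 3)

  # Cross-validation based calibration with explicit training and test windows.
  ctrCV <- sentometrics::ctr_model(model = "gaussian", type = "cv", do.iter = FALSE,
                                   h = 0, trainWindow = 250, testWindow = 4, oos = 0)
}
```

Each returned list would then be passed on to sento_model, which carries out the actual estimation and calibration.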