Sets up a control object for linear or nonlinear modeling of a response variable onto a large panel of textual sentiment measures (and potentially other variables). See sento_model for details on the estimation and calibration procedure.
ctr_model(
model = c("gaussian", "binomial", "multinomial"),
type = c("BIC", "AIC", "Cp", "cv"),
do.intercept = TRUE,
do.iter = FALSE,
h = 0,
oos = 0,
do.difference = FALSE,
alphas = seq(0, 1, by = 0.2),
lambdas = NULL,
nSample = NULL,
trainWindow = NULL,
testWindow = NULL,
start = 1,
do.shrinkage.x = FALSE,
do.progress = TRUE,
nCore = 1
)
model: a character vector with one of the following: "gaussian" (linear regression), "binomial" (binomial logistic regression), or "multinomial" (multinomial logistic regression).
type: a character vector indicating which model calibration approach to use. Supports "BIC", "AIC" and "Cp" (Mallows's Cp) as sparse-regression-adapted information criteria (Tibshirani and Taylor, 2012; Zou, Hastie and Tibshirani, 2007), and "cv" (cross-validation based on the train function from the caret package). The adapted information criteria are only available for linear regression (model = "gaussian").
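For orientation, the textbook Gaussian forms of these criteria are written out below. This is a sketch of the idea rather than the exact scaling used internally: the "adapted" versions replace the parameter count with an estimate \(\widehat{df}\) of the degrees of freedom of the sparse fit, and for the LASSO, Zou, Hastie and Tibshirani (2007) show that the number of nonzero coefficients is an unbiased estimate of it. The symbols \(n\) (number of observations), RSS (residual sum of squares) and \(\hat{\sigma}^2\) (error variance estimate) are introduced only for this illustration; lower criterion values indicate a better trade-off between fit and complexity.

```latex
\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2\,\widehat{df}, \qquad
\mathrm{BIC} = n \log\!\left(\frac{\mathrm{RSS}}{n}\right) + \log(n)\,\widehat{df}, \qquad
C_p = \frac{\mathrm{RSS}}{\hat{\sigma}^2} - n + 2\,\widehat{df},
\qquad \text{with } \widehat{df}_{\mathrm{LASSO}} = \#\{\, j : \hat{\beta}_j \neq 0 \,\}.
```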
do.intercept: a logical; if TRUE (the default), an intercept is fitted.
do.iter: a logical; TRUE induces an iterative estimation of models at the given nSample size and performs the associated out-of-sample prediction exercise through time.
h: an integer value that shifts the time series to obtain the desired prediction setup. h = 0 means no change to the input data (nowcasting, assuming the data are aligned properly); h > 0 shifts the dependent variable by h periods (i.e., rows) further in time (forecasting); h < 0 shifts the independent variables by h periods.
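The forecasting case (h > 0) can be pictured with the base R sketch below, which pairs \(y_{t+h}\) with \(X_t\) and drops the rows that have no partner. shift_for_horizon() is a hypothetical helper written for this illustration, not a sentometrics function, and it deliberately covers only h >= 0.

```r
# Hypothetical helper (not part of sentometrics): pair y[t + h] with X[t, ]
# for a forecast horizon h >= 0, as described for the h argument.
shift_for_horizon <- function(y, X, h) {
  stopifnot(h >= 0, length(y) == nrow(X))
  n <- length(y)
  if (h == 0) return(list(y = y, X = X))  # nowcasting: data left unchanged
  list(y = y[(1 + h):n],                   # target moved h rows further in time
       X = X[1:(n - h), , drop = FALSE])   # last h rows of X have no target
}

y <- c(10, 11, 12, 13, 14, 15)
X <- matrix(1:6, ncol = 1)                 # a single sentiment measure
aligned <- shift_for_horizon(y, X, h = 2)
cbind(y = aligned$y, x = aligned$X[, 1])   # y[3] sits next to X[1], and so on
```

With h = 2, the six observations shrink to four pairs, the first being y = 12 with x = 1.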
oos: a non-negative integer to indicate the number of periods to skip from the end of the training sample up to the out-of-sample prediction(s). This is used either in the cross-validation based calibration approach (if type = "cv") or in the iterative out-of-sample prediction analysis (if do.iter = TRUE). For instance, given a training sample that ends at \(t\), the (first) out-of-sample prediction is computed at \(t + \text{oos} + 1\).
do.difference: a logical; TRUE will difference the target variable y supplied in the sento_model function, using the absolute value of the h argument as the lag, which requires abs(h) > 0. For example, if h = 2, and assuming the y variable is properly aligned date-wise with the explanatory variables denoted by \(X\) (the sentiment measures and other variables in x), the regression will be of \(y_{t + 2} - y_t\) on \(X_t\). If h = -2, the regression fitted is of \(y_{t + 2} - y_t\) on \(X_{t+2}\). The argument is always kept at FALSE if the model argument is one of c("binomial", "multinomial").
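The two regressions in the example can be reproduced on toy data with base R. The sketch builds the lag-abs(h) difference with diff() and then selects the rows of \(X\) that the description pairs with it for h = 2 and h = -2; the data and variable names are made up for the illustration.

```r
# Toy data: six aligned observations of y and one explanatory variable
y <- c(100, 102, 101, 105, 107, 106)
X <- matrix(1:6, ncol = 1)
k <- 2                                   # abs(h), the differencing lag
n <- length(y)

dy <- diff(y, lag = k)                   # dy[t] = y[t + 2] - y[t], t = 1..4

# h = 2: regress y[t + 2] - y[t] on X[t]
X_h_pos <- X[1:(n - k), , drop = FALSE]
# h = -2: regress y[t + 2] - y[t] on X[t + 2]
X_h_neg <- X[(1 + k):n, , drop = FALSE]

cbind(dy = dy, x_h2 = X_h_pos[, 1], x_hm2 = X_h_neg[, 1])
```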
alphas: a numeric vector of the alphas to test for during calibration, between 0 and 1. A value of 0 pertains to Ridge regression and a value of 1 to LASSO regression; values in between give an elastic net that mixes both penalties.
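Assuming the penalized fits are done with glmnet (as the lambdas argument indicates), alpha enters through glmnet's elastic net penalty. For the Gaussian case, the glmnet documentation states the objective as below; for the logistic models the squared-error term is replaced by the negative log-likelihood. Setting \(\alpha = 0\) leaves only the ridge term and \(\alpha = 1\) only the LASSO term.

```latex
\min_{\beta_0,\,\beta} \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
  + \lambda \left[ \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 + \alpha \lVert \beta \rVert_1 \right]
```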
lambdas: a numeric vector of the lambdas to test for during calibration, \(\geq 0\). A value of zero means no regularization and thus requires care when the data are fat (i.e., have more regressors than observations). By default set to NULL, such that the lambda sequence is generated by the glmnet function, or set to 10^seq(2, -2, length.out = 100) in case of cross-validation.
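The default cross-validation grid can be inspected directly in base R: it holds 100 values spaced evenly on the log10 scale, decreasing from 100 to 0.01, so consecutive values differ by a constant factor.

```r
# Default lambda grid for cross-validation, as stated for the lambdas argument
lambdas_cv <- 10^seq(2, -2, length.out = 100)

length(lambdas_cv)   # 100 candidate values
range(lambdas_cv)    # smallest 0.01, largest 100

# Evenly spaced on the log10 scale: every step shrinks lambda by the same factor
ratios <- lambdas_cv[-1] / lambdas_cv[-length(lambdas_cv)]
```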
nSample: a positive integer as the size of the sample for model estimation at every iteration (ignored if do.iter = FALSE).
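How nSample and oos lay out the iterative exercise can be pictured with the base R sketch below. It assumes a rolling window of fixed length nSample that moves forward one row per iteration, with each prediction made oos + 1 rows after the window ends (following the oos description); rolling_windows() is a hypothetical helper for this illustration, not a sentometrics function.

```r
# Hypothetical helper (not part of sentometrics): fixed-size rolling estimation
# windows, each followed by one out-of-sample prediction oos + 1 rows later.
rolling_windows <- function(N, nSample, oos = 0) {
  nIter <- N - nSample - oos          # last window must leave room to predict
  stopifnot(nIter >= 1)
  lapply(seq_len(nIter), function(i) {
    trainEnd <- i + nSample - 1
    list(train = i:trainEnd, predict = trainEnd + oos + 1)
  })
}

w <- rolling_windows(N = 10, nSample = 5, oos = 2)
length(w)   # number of iterations
w[[1]]      # estimation rows and prediction row of the first iteration
```

With 10 observations, nSample = 5 and oos = 2, this yields three iterations; the first estimates on rows 1 to 5 and predicts row 8, the last estimates on rows 3 to 7 and predicts row 10.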
trainWindow: a positive integer as the size of the training sample for cross-validation (ignored if type != "cv").
testWindow: a positive integer as the size of the test sample for cross-validation (ignored if type != "cv").
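The cross-validation itself is delegated to caret's train function; the base R sketch below only mimics the kind of time-slice layout that trainWindow, testWindow and oos describe. It assumes consecutive slices that move forward one row at a time, with a gap of oos rows between each training and test block. time_slices() is a hypothetical helper, not caret or sentometrics code.

```r
# Hypothetical helper: trainWindow rows for fitting, then (after skipping oos
# rows) testWindow rows for evaluation; slices advance one row at a time.
time_slices <- function(n, trainWindow, testWindow, oos = 0) {
  nSlices <- n - trainWindow - oos - testWindow + 1
  stopifnot(nSlices >= 1)
  lapply(seq_len(nSlices), function(i) {
    trainEnd <- i + trainWindow - 1
    list(train = i:trainEnd,
         test  = (trainEnd + oos + 1):(trainEnd + oos + testWindow))
  })
}

s <- time_slices(n = 12, trainWindow = 6, testWindow = 2, oos = 1)
length(s)   # number of train/test slices
s[[1]]      # first slice
```

With n = 12, trainWindow = 6, testWindow = 2 and oos = 1, there are four slices; the first fits on rows 1 to 6 and evaluates on rows 8 and 9.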
start: a positive integer to indicate at which point the iteration has to start (ignored if do.iter = FALSE). For example, given 100 possible iterations, start = 70 leads to model estimations only for the last 31 samples.
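The count of 31 follows because both the first and the last iteration are included, as a quick base R check shows.

```r
n_iter <- 100                 # total number of possible iterations
start  <- 70                  # value of the start argument
run    <- seq(start, n_iter)  # iterations that are actually estimated
length(run)                   # n_iter - start + 1 = 31
```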
do.shrinkage.x: a logical vector to indicate which of the other regressors provided through the x argument of the sento_model function should be subject to shrinkage (TRUE). If the argument is of length one, it applies to all external regressors.
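One plausible reading of the length-one rule is sketched below: the setting is recycled over the external regressors and mapped to per-column penalty factors, where 0 means no shrinkage, in the spirit of glmnet's penalty.factor argument. Whether sento_model uses penalty.factor internally is an assumption, as is the choice to always penalize the sentiment measures; shrinkage_factors() is a hypothetical helper.

```r
# Hypothetical helper: per-column penalty factors (1 = shrunk, 0 = not shrunk).
# Assumes sentiment measures are always penalized and only the external
# regressors in x are controlled by do.shrinkage.x.
shrinkage_factors <- function(nSentiment, nExternal, do.shrinkage.x) {
  stopifnot(length(do.shrinkage.x) %in% c(1, nExternal))
  shrinkX <- rep_len(do.shrinkage.x, nExternal)  # length one: applies to all
  c(rep(1, nSentiment), as.numeric(shrinkX))
}

pf_all  <- shrinkage_factors(nSentiment = 3, nExternal = 2, do.shrinkage.x = FALSE)
pf_some <- shrinkage_factors(nSentiment = 3, nExternal = 2, do.shrinkage.x = c(TRUE, FALSE))
```

Here pf_all leaves both external regressors unpenalized, while pf_some shrinks only the first one.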
do.progress: a logical; if TRUE, progress statements are displayed during model calibration.
nCore: a positive integer to indicate the number of cores to use for a parallel iterative model estimation (do.iter = TRUE). We use the %dopar% construct from the foreach package. By default, nCore = 1, which implies no parallelization. No progress statements are displayed whatsoever when nCore > 1. For cross-validation models, parallelization can also be carried out for a single-shot model (do.iter = FALSE), whenever a parallel backend is set up. See the examples in sento_model.
A list encapsulating the control parameters.
Tibshirani and Taylor (2012). Degrees of freedom in LASSO problems. The Annals of Statistics 40, 1198-1232, doi:10.1214/12-AOS1003.
Zou, Hastie and Tibshirani (2007). On the degrees of freedom of the LASSO. The Annals of Statistics 35, 2173-2192, doi:10.1214/009053607000000127.
# information criterion based model control functions
ctrIC1 <- ctr_model(model = "gaussian", type = "BIC", do.iter = FALSE, h = 0,
alphas = seq(0, 1, by = 0.10))
ctrIC2 <- ctr_model(model = "gaussian", type = "AIC", do.iter = TRUE, h = 4, nSample = 100,
do.difference = TRUE, oos = 3)
# cross-validation based model control functions
ctrCV1 <- ctr_model(model = "gaussian", type = "cv", do.iter = FALSE, h = 0,
trainWindow = 250, testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV2 <- ctr_model(model = "binomial", type = "cv", h = 0, trainWindow = 250,
testWindow = 4, oos = 0, do.progress = TRUE)
ctrCV3 <- ctr_model(model = "multinomial", type = "cv", h = 2, trainWindow = 250,
testWindow = 4, oos = 2, do.progress = TRUE)
ctrCV4 <- ctr_model(model = "gaussian", type = "cv", do.iter = TRUE, h = 0, trainWindow = 45,
testWindow = 4, oos = 0, nSample = 70, do.progress = TRUE)