Set up control for sentiment-based sparse regression modeling

Sets up a control object for linear or nonlinear modeling of a response variable on a large panel of textual sentiment measures (and potentially other variables). See sento_model for details on the estimation and calibration procedure.

ctr_model(
  model = c("gaussian", "binomial", "multinomial"),
  type = c("BIC", "AIC", "Cp", "cv"),
  do.intercept = TRUE,
  do.iter = FALSE,
  h = 0,
  oos = 0,
  do.difference = FALSE,
  alphas = seq(0, 1, by = 0.2),
  lambdas = NULL,
  nSample = NULL,
  trainWindow = NULL,
  testWindow = NULL,
  start = 1,
  do.shrinkage.x = FALSE,
  do.progress = TRUE,
  nCore = 1
)

Arguments

model: a character vector with one of the following: "gaussian" (linear regression), "binomial" (binomial logistic regression), or "multinomial" (multinomial logistic regression).
type: a character vector indicating which model calibration approach to use. Supports "BIC", "AIC" and "Cp" (Mallows's Cp) as information criteria adapted to sparse regression (Tibshirani and Taylor, 2012; Zou, Hastie and Tibshirani, 2007), and "cv" (cross-validation based on the train function from the caret package). The adapted information criteria are only available for linear regression (model = "gaussian").
do.intercept: a logical; if TRUE (the default), an intercept is fitted.
do.iter: a logical, TRUE induces an iterative estimation of models at the given nSample size and performs the associated out-of-sample prediction exercise through time.
h: an integer value that shifts the time series to obtain the desired prediction setup; h = 0 means no change to the input data (nowcasting, assuming the data is aligned properly), h > 0 shifts the dependent variable by h periods (i.e., rows) further in time (forecasting), h < 0 shifts the independent variables by abs(h) periods.
oos: a non-negative integer to indicate the number of periods to skip from the end of the training sample up to the out-of-sample prediction(s). This is either used in the cross-validation based calibration approach (if type = "cv"), or for the iterative out-of-sample prediction analysis (if do.iter = TRUE). For instance, given \(t\), the (first) out-of-sample prediction is computed at \(t +\) oos \(+ 1\).
do.difference: a logical; TRUE differences the target variable y supplied to the sento_model function, using the absolute value of the h argument as the lag, which requires abs(h) > 0. For example, if h = 2, and assuming the y variable is properly aligned date-wise with the explanatory variables denoted by \(X\) (the sentiment measures and the other variables in x), the regression will be of \(y_{t + 2} - y_t\) on \(X_t\). If h = -2, the regression fitted is \(y_{t + 2} - y_t\) on \(X_{t+2}\). The argument is always kept at FALSE if the model argument is one of c("binomial", "multinomial").
alphas: a numeric vector of the alphas to test for during calibration, each between 0 and 1. A value of 0 pertains to Ridge regression, a value of 1 to LASSO regression; values in between are pure elastic net.
lambdas: a numeric vector of the lambdas to test for during calibration, \(\geq 0\). A value of zero means no regularization, and thus requires care when the data is fat (more regressors than observations). By default set to NULL, such that the lambdas sequence is generated by the glmnet function, or set to 10^seq(2, -2, length.out = 100) in case of cross-validation.
nSample: a positive integer as the size of the sample for model estimation at every iteration (ignored if do.iter = FALSE).
trainWindow: a positive integer as the size of the training sample for cross-validation (ignored if type != "cv").
testWindow: a positive integer as the size of the test sample for cross-validation (ignored if type != "cv").
start: a positive integer to indicate at which point the iteration has to start (ignored if do.iter = FALSE). For example, given 100 possible iterations, start = 70 leads to model estimations only for the last 31 samples.
do.shrinkage.x: a logical vector to indicate which of the other regressors provided through the x argument of the sento_model function should be subject to shrinkage (TRUE). If the argument is of length one, it applies to all external regressors.
do.progress: a logical, if TRUE progress statements are displayed during model calibration.
nCore: a positive integer to indicate the number of cores to use for a parallel iterative model estimation (do.iter = TRUE). We use the %dopar% construct from the foreach package. By default, nCore = 1, which implies no parallelization. No progress statements are displayed whatsoever when nCore > 1. For cross-validation models, parallelization can also be carried out for a single-shot model (do.iter = FALSE), whenever a parallel backend is set up. See the examples in sento_model.
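
The alignment implied by h and do.difference can be illustrated with a few lines of base R. This is only a sketch of the pairings stated in the argument descriptions above, on made-up toy data; the object names (y, x, lag, plain, differenced, differencedNeg) are hypothetical, and the code is not the package's internal implementation.

```r
# Toy data (made up): y and x are aligned date-wise, one row per period t = 1, ..., 6.
y <- c(10, 12, 15, 19, 24, 30)
x <- c(1, 2, 3, 4, 5, 6)
n <- length(y)
lag <- 2  # plays the role of abs(h)

# h = 2, do.difference = FALSE: y_{t+2} is paired with x_t.
plain <- data.frame(target = y[(lag + 1):n], x = x[1:(n - lag)])

# h = 2, do.difference = TRUE: y_{t+2} - y_t is paired with x_t.
dy <- diff(y, lag = lag)  # element t holds y_{t+2} - y_t
differenced <- data.frame(target = dy, x = x[1:(n - lag)])

# h = -2, do.difference = TRUE: y_{t+2} - y_t is paired with x_{t+2}.
differencedNeg <- data.frame(target = dy, x = x[(lag + 1):n])

print(plain)
print(differenced)
print(differencedNeg)
```

In each case abs(h) rows are lost, so the regression uses n - abs(h) observations.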

Value

A list encapsulating the control parameters.

References

Tibshirani and Taylor (2012). Degrees of freedom in LASSO problems. The Annals of Statistics 40, 1198-1232, doi:10.1214/12-AOS1003.

Zou, Hastie and Tibshirani (2007). On the degrees of freedom of the LASSO. The Annals of Statistics 35, 2173-2192, doi:10.1214/009053607000000127.

Author

Samuel Borms, Keven Bluteau

Examples
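
The calls below sketch how a few typical control objects might be set up, using only the arguments listed in the signature above. The specific values (window sizes, nSample, horizons) are illustrative assumptions, and the block is guarded so that it is skipped when the sentometrics package is not installed.

```r
# Sketch only: the argument values are illustrative, not recommendations.
ctrs <- list()
if (requireNamespace("sentometrics", quietly = TRUE)) {
  # Information-criterion calibration (linear regression only), single estimation.
  ctrs$IC1 <- sentometrics::ctr_model(model = "gaussian", type = "BIC",
                                      do.iter = FALSE, h = 0,
                                      alphas = seq(0, 1, by = 0.10))

  # Iterative out-of-sample analysis: forecast 4 periods ahead on the
  # differenced target, skipping 3 periods after each training sample.
  ctrs$IC2 <- sentometrics::ctr_model(model = "gaussian", type = "AIC",
                                      do.iter = TRUE, h = 4, nSample = 100,
                                      do.difference = TRUE, oos = 3)

  # Cross-validation calibration for a binomial logistic regression.
  ctrs$CV1 <- sentometrics::ctr_model(model = "binomial", type = "cv", h = 0,
                                      trainWindow = 250, testWindow = 4, oos = 0,
                                      do.progress = TRUE)

  # The returned value is a list of control parameters; inspect its structure.
  str(ctrs$IC1)
}
```

Each resulting control object would then be passed to sento_model together with the sentiment measures and the response variable.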