Quantile Regressions in Stan: Part III

stan
quantile
Author

Sean Pinkney

Published

October 23, 2022

This is part III of the quantile regression series. Parts I and II covered Bayesian quantile regression using the asymmetric Laplace distribution, the augmented scheme, and the score likelihood. In this post I’m going to show the multiple quantile regression given in Wu and Narisetty (2021), a follow-up to part II.

Multiple quantile regression with score

The score method incorporates the regressors into the objective function through the score \[ s_\tau(\beta) = \sum_{i=1}^{n} x_i \psi_\tau(y_i - x_i^{\top}\beta), \] where \(\psi_\tau(u) = \tau - I(u < 0)\).
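To make the notation concrete, here is a minimal R sketch of \(\psi_\tau\) and the single-quantile score. The names `psi` and `score_tau` are mine, and the sketch assumes a design matrix `x` (including an intercept column), response `y`, coefficient vector `beta`, and quantile level `tau`.

# psi_tau(u) = tau - I(u < 0)
psi <- function(u, tau) tau - (u < 0)

# s_tau(beta) = sum_i x_i * psi_tau(y_i - x_i' beta)
score_tau <- function(beta, y, x, tau) {
  u <- y - x %*% beta               # residuals
  as.vector(t(x) %*% psi(u, tau))   # length-p score vector
}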

Wu and Narisetty (2021) consider the following “working” likelihood \[ \mathcal{L}(y \mid x,\, \beta) = C \exp\bigg(-\frac{1}{2n} s_\tau(\beta)^{\top} W s_\tau(\beta) \bigg), \] where \(W\) is a \(p \times p\) positive definite weight matrix and \(C\) is a constant free of \(\beta\). Because the exponent is a quadratic form in the score, the working likelihood is maximized where the score is closest to zero, that is, at the usual quantile regression estimate.

The \(W\) matrix for multiple quantiles is given as

\[ W = (Q \otimes G )^{-1}, \text{ where } Q = (\min(\tau_i, \tau_j) - \tau_i \tau_j)_{ij}, \; G = \frac{1}{n}\sum_{i=1}^n x_i x_i^{\top}. \]

A nice property of \(W\) is that it depends only on the data and the chosen quantile levels, so it can be constructed once, outside of the MCMC iterations.
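Here is a hedged R sketch of how \(W\) and the log working likelihood could be assembled, reusing `score_tau` from the sketch above; `make_W` and `log_working_lik` are names of my own, and note that the Stan model below additionally scales the score by a parameter \(\sigma\).

make_W <- function(x, taus) {
  n <- nrow(x)
  Q <- outer(taus, taus, pmin) - outer(taus, taus)  # min(tau_i, tau_j) - tau_i * tau_j
  G <- crossprod(x) / n                             # (1/n) sum_i x_i x_i'
  solve(kronecker(Q, G))                            # W = inverse of kronecker(Q, G)
}

log_working_lik <- function(betas, y, x, taus, W) {
  # betas: p x K matrix, one column of coefficients per quantile level
  s <- as.vector(sapply(seq_along(taus),
                        function(k) score_tau(betas[, k], y, x, taus[k])))
  -drop(t(s) %*% W %*% s) / (2 * nrow(x))           # log likelihood up to the constant C
}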

Score function

The following Stan model implements the score likelihood for multiple quantile levels \(\tau\).

Show the code
functions{
  matrix kronecker(matrix A, matrix B) {
    matrix[rows(A) * rows(B), cols(A) * cols(B)] C;
    int m = rows(A);
    int n = cols(A);
    int p = rows(B);
    int q = cols(B);
    for (i in 1:m) {
      for (j in 1:n) {
        int row_start = (i - 1) * p + 1;
        int row_end = (i - 1) * p + p;
        int col_start = (j - 1) * q + 1;
        int col_end = (j - 1) * q + q;
        C[row_start:row_end, col_start:col_end] = A[i, j] * B;
      }
    }
    return C;
  }
  
  // elementwise |u| + (2q - 1) * u, i.e. twice the check loss u * (q - I(u < 0))
  vector q_loss(real q, vector u) {
    return abs(u) + (2 * q - 1) * u;
  }
    
  // x' * q_loss(residuals) for each quantile level, flattened to a (K * P)-vector
  vector score(vector q, vector y, matrix x, array[] vector beta) {
    int N = num_elements(y);
    int P = num_elements(beta[, 1]);
    int K = num_elements(q);
    matrix[K, P] out;

    for (k in 1:K) {
      out[k] = transpose(transpose(x) * q_loss(q[k], y - x * to_vector(beta[, k])));
    }

    return to_vector(out);
  }
    
  // weight matrix W, computed from the data and quantile levels only
  matrix make_w(matrix x, vector q) {
    int N = rows(x);
    int P = cols(x);
    int m = num_elements(q);
    matrix[m * P, m * P] out;
      
    matrix[m, m] Q;
    matrix[P, P] G = crossprod(x);
    
  //  G[1:P, 1] /= log(N);
  //  G[1, 2:P] = transpose(G[2:P, 1]);
      
    for (i in 1:m) {
      Q[i, i] = q[i] * (1 - q[i]);
      for (j in 1:i - 1) {
        Q[i, j] = min([q[i], q[j]]) - q[i] * q[j];
        Q[j, i] = Q[i, j];
      }
    }
  
    return kronecker(N * inverse(G), inverse(Q));
  }
}
data {
  int N;               // Number of observations
  int P;               // Number of predictors
  int K;               // Number of quantiles
  vector[K] q;
  vector[N] y;         
  matrix[N, P] x;
}
transformed data {
  matrix[K * P, K * P] W = make_w(x, q);
}
parameters {
  // can add ordered constraint here
   array[P] vector[K] beta;
   real<lower=0> sigma;
}
model {
  vector[K * P] score_vec = score(q, y, x, beta) / sigma;
  
  for (i in 1:K) 
    beta[, i] ~ normal(0, 4);
  
  sigma ~ exponential(1);
 
  // log of the working likelihood, with the score scaled by sigma
  target += -quad_form(W, score_vec) / (2 * N * K) - K * N * log(sigma);

}
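The model object `multiple_score_qr` used in the sampling calls below is this Stan program compiled with cmdstanr; a minimal sketch, assuming the program is saved as `multiple_score_qr.stan` (the file name is my assumption):

library(cmdstanr)
# compile the Stan program above; adjust the path to wherever you saved it
multiple_score_qr <- cmdstan_model("multiple_score_qr.stan")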

I’ll simulate the same data as in part II:

Show the code
set.seed(12312)
library(quantreg)
N     <- 1000
x     <- runif(N, max=10)
alpha <- -1
beta  <- 2
y     <- alpha + beta * x + rnorm(N, sd = .5 * x)
q     <- c(0.05, 0.5, 0.95)

# frequentist estimate
out_freq <- quantreg::rq(y ~ x, tau = q)

out_score <- multiple_score_qr$sample(
  data = list(N = N,
              P = 2,
              K = length(q),
              q = q,
              y = y,
              x = as.matrix(data.frame(alpha = 1,  x = x))),
  seed = 12123123, 
  parallel_chains = 4,
  iter_warmup = 500,
  iter_sampling = 500,
  refresh = 0,
  show_messages = FALSE
)
Running MCMC with 4 parallel chains...

Chain 2 finished in 1.8 seconds.
Chain 1 finished in 2.0 seconds.
Chain 3 finished in 1.9 seconds.
Chain 4 finished in 2.1 seconds.

All 4 chains finished successfully.
Mean chain execution time: 2.0 seconds.
Total execution time: 2.3 seconds.
Show the code
out_score$summary()
# A tibble: 8 × 10
  variable       mean    median     sd    mad       q5       q95  rhat ess_bulk
  <chr>         <dbl>     <dbl>  <dbl>  <dbl>    <dbl>     <dbl> <dbl>    <dbl>
1 lp__      -3083.    -3083.    1.81   1.73   -3087.   -3081.     1.00     746.
2 beta[1,1]    -1.40     -1.39  0.274  0.286     -1.85    -0.969  1.00     872.
3 beta[2,1]     1.30      1.30  0.0430 0.0437     1.23     1.37   1.00     905.
4 beta[1,2]    -0.867    -0.864 0.124  0.124     -1.07    -0.667  1.00     971.
5 beta[2,2]     1.97      1.97  0.0206 0.0196     1.94     2.00   1.00    1000.
6 beta[1,3]    -0.905    -0.936 0.218  0.207     -1.21    -0.503  1.00     971.
7 beta[2,3]     2.83      2.84  0.0456 0.0447     2.75     2.90   1.01     944.
8 sigma         1.69      1.69  0.0229 0.0239     1.66     1.73   1.00    1558.
# … with 1 more variable: ess_tail <dbl>
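The posterior means track the frequentist fit fairly closely. A quick way to check this with the objects already created is to print the `rq` coefficient matrix next to the posterior means of `beta`:

# frequentist coefficients for tau = 0.05, 0.5, 0.95 (one column per quantile)
coef(out_freq)

# posterior means of beta[p, k]; rows correspond to (intercept, slope) for each quantile
out_score$summary("beta")[, c("variable", "mean")]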

Test on ImmunogG data

Show the code
library(Brq)
data("ImmunogG")
dat <- data.frame(y = ImmunogG$IgG, 
                  alpha = 1, 
                  x = ImmunogG$Age, 
                  xsq = ImmunogG$Age^2)

out_score_mod <- multiple_score_qr$sample(
  data = list(N = nrow(ImmunogG),
              P = 3,
              K = 3,
              q = c(0.05, 0.5, 0.95),
              y = ImmunogG$IgG,
              x = as.matrix(dat[, 2:4])),
  seed = 12123123, 
  parallel_chains = 4,
  max_treedepth = 12,
  iter_warmup = 500,
  iter_sampling = 500,
  refresh = 0,
  show_messages = FALSE
)
Running MCMC with 4 parallel chains...

Chain 1 finished in 5.8 seconds.
Chain 2 finished in 6.3 seconds.
Chain 4 finished in 6.9 seconds.
Chain 3 finished in 7.1 seconds.

All 4 chains finished successfully.
Mean chain execution time: 6.5 seconds.
Total execution time: 7.2 seconds.
Show the code
out_freq_1 <- rq(y ~ x + xsq, 
               data = dat,
               tau = 0.05)
out_freq_2 <- rq(y ~ x + xsq, 
               data = dat,
               tau = 0.5)
out_freq_3 <- rq(y ~ x + xsq, 
               data = dat,
               tau = 0.95)

Results

Modified score:

Show the code
out_score_mod$summary()
# A tibble: 11 × 10
   variable       mean    median     sd    mad       q5       q95  rhat ess_bulk
   <chr>         <dbl>     <dbl>  <dbl>  <dbl>    <dbl>     <dbl> <dbl>    <dbl>
 1 lp__      -605.     -605.     2.25   2.11   -609.    -602.      1.00     749.
 2 beta[1,1]    0.648     0.682  0.443  0.441    -0.116    1.33    1.00     770.
 3 beta[2,1]    1.20      1.20   0.387  0.376     0.545    1.85    1.01     695.
 4 beta[3,1]   -0.138    -0.139  0.0648 0.0620   -0.247   -0.0296  1.01     727.
 5 beta[1,2]    2.69      2.68   0.316  0.319     2.18     3.22    1.01     677.
 6 beta[2,2]    1.20      1.21   0.238  0.235     0.810    1.58    1.01     652.
 7 beta[3,2]   -0.0773   -0.0784 0.0374 0.0374   -0.137   -0.0137  1.01     703.
 8 beta[1,3]    7.12      7.12   0.609  0.620     6.16     8.18    1.00     791.
 9 beta[2,3]   -0.385    -0.393  0.494  0.505    -1.19     0.416   1.01     707.
10 beta[3,3]    0.237     0.239  0.0788 0.0785    0.105    0.366   1.01     705.
11 sigma        1.19      1.19   0.0270 0.0278    1.15     1.23    1.00    1373.
# … with 1 more variable: ess_tail <dbl>

Quantreg

Quantreg:

Show the code
summary(out_freq_1, se = "iid")

Call: rq(formula = y ~ x + xsq, tau = 0.05, data = dat)

tau: [1] 0.05

Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept)  0.76514  0.30477    2.51058  0.01259
x            1.19143  0.24792    4.80577  0.00000
xsq         -0.13371  0.04037   -3.31187  0.00104
Show the code
summary(out_freq_2, se = "iid")

Call: rq(formula = y ~ x + xsq, tau = 0.5, data = dat)

tau: [1] 0.5

Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept)  2.80112  0.59354    4.71936  0.00000
x            1.15865  0.48282    2.39976  0.01703
xsq         -0.07523  0.07863   -0.95677  0.33947
Show the code
summary(out_freq_3, se = "iid")

Call: rq(formula = y ~ x + xsq, tau = 0.95, data = dat)

tau: [1] 0.95

Coefficients:
            Value    Std. Error t value  Pr(>|t|)
(Intercept)  7.23563  0.67142   10.77659  0.00000
x           -0.48165  0.54617   -0.88186  0.37857
xsq          0.24602  0.08895    2.76597  0.00603
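To put the two sets of estimates side by side, one option (a small sketch using the fits above; the column names are mine) is to bind the three frequentist coefficient vectors into a matrix and compare them against the beta[ , 1], beta[ , 2], and beta[ , 3] posterior means:

# frequentist coefficients for tau = 0.05, 0.5, 0.95 side by side
cbind(tau_05 = coef(out_freq_1),
      tau_50 = coef(out_freq_2),
      tau_95 = coef(out_freq_3))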

References

Wu, Teng, and Naveen N. Narisetty. 2021. “Bayesian Multiple Quantile Regression for Linear Models Using a Score Likelihood.” Bayesian Analysis 16 (3): 875–903. https://doi.org/10.1214/20-BA1217.