Helpful Stan Functions
Multivariate Wallenius' Noncentral Hypergeometric Distribution Functions

Functions

real multi_wallenius_integral (real t, real xc, array[] real theta, array[] real x_r, array[] int x_i)
 
real multi_wallenius_lpmf (data array[] int k, vector m, vector p, data array[] real x_r, data real tol)
 

Detailed Description

The probability mass function of the multivariate Wallenius' noncentral hypergeometric distribution is given by

\[ f(\mathbf{x} \mid \mathbf{m}, \mathbf{\omega}) = \left(\prod_{i=1}^c \binom{m_i}{x_i} \right) \int_0^1 \prod_{i=1}^c (1-t^{\omega_i/D})^{x_i} \operatorname{d}t \]

where \( \mathbf{m} = (m_1, \ldots, m_c) \in \mathbb{N}^c \) gives the size of each of the \(c\) groups in the population, so the total population size is \( N = \sum_{i=1}^c m_i \). The vector \( \mathbf{x} = (x_1, \ldots, x_c) \) gives the number of realizations from each group. The weights of the groups are contained in the \(c\)-sized simplex \( \mathbf{\omega} \), and \( D \) is

\[ D = \mathbf{\omega} \cdot (\mathbf{m} - \mathbf{x}) = \sum_{i=1}^c \omega_i(m_i - x_i). \]
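
As a small worked illustration of the formula (the numbers are chosen here purely for illustration and do not come from the library), take \( c = 2 \), \( \mathbf{m} = (3, 2) \), \( \mathbf{x} = (1, 1) \) and \( \mathbf{\omega} = (0.6, 0.4) \). Then

\[ D = 0.6\,(3 - 1) + 0.4\,(2 - 1) = 1.6, \qquad \frac{\omega_1}{D} = 0.375, \qquad \frac{\omega_2}{D} = 0.25, \]

\[ f(\mathbf{x} \mid \mathbf{m}, \mathbf{\omega}) = \binom{3}{1}\binom{2}{1} \int_0^1 (1 - t^{0.375})(1 - t^{0.25}) \operatorname{d}t = 6 \left( 1 - \frac{8}{11} - \frac{4}{5} + \frac{8}{13} \right) = \frac{378}{715} \approx 0.5287. \]

The same value follows from enumerating the two draw orders directly: \( \frac{1.8}{2.6} \cdot \frac{0.8}{2.0} + \frac{0.8}{2.6} \cdot \frac{1.8}{2.2} \approx 0.5287 \).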

real multi_wallenius_integral(real t,             // function argument
                              real xc,            // complement of t (unused here)
                              array[] real theta, // parameters: theta[1] = D, then the weights
                              array[] real x_r,   // data (real)
                              array[] int x_i) {  // data (integer)
  real Dinv = 1 / theta[1];
  int Cp1 = num_elements(x_i);
  int n = x_i[1]; // first integer entry; not used below
  real v = 1;
  // Product over groups of (1 - t^(omega_i / D))^(x_i)
  for (i in 2 : Cp1)
    v *= pow(1 - t ^ (theta[i] * Dinv), x_i[i]);
  return v;
}

real multi_wallenius_lpmf(data array[] int k, vector m, vector p,
                          data array[] real x_r, data real tol) {
  int C = num_elements(m);
  // D = omega . (m - x), with the counts stored in k[2 : C + 1]
  real D = dot_product(to_row_vector(p), (m - to_vector(k[2 : C + 1])));
  real lp = log(integrate_1d(multi_wallenius_integral, 0, 1,
                             append_array({D}, to_array_1d(p)), x_r, k, tol));
  // Add the log binomial coefficient of each group
  for (i in 1 : C)
    lp += -log1p(m[i]) - lbeta(m[i] - k[i + 1] + 1, k[i + 1] + 1);
  return lp;
}
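
The final loop of multi_wallenius_lpmf adds the log binomial coefficients through the beta function. The step is not spelled out in the source, so the short derivation below is offered as a reading aid, using only the standard identity \( B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b) \):

\[ B(m_i - x_i + 1,\, x_i + 1) = \frac{(m_i - x_i)!\, x_i!}{(m_i + 1)!} \quad\Longrightarrow\quad \binom{m_i}{x_i} = \frac{1}{(m_i + 1)\, B(m_i - x_i + 1,\, x_i + 1)}, \]

\[ \log \binom{m_i}{x_i} = -\log(1 + m_i) - \log B(m_i - x_i + 1,\, x_i + 1), \]

which matches the term -log1p(m[i]) - lbeta(m[i] - k[i + 1] + 1, k[i + 1] + 1) in the code.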

Function Documentation

◆ multi_wallenius_integral()

real multi_wallenius_integral(real t, real xc, array[] real theta, array[] real x_r, array[] int x_i)

Multivariate Wallenius' Noncentral Hypergeometric Integral

This function evaluates the integrand at \( t \); integrate_1d then integrates it over \( [0, 1] \).

\[ I(t \mid \mathbf{\omega}, D) = \prod_{i=1}^c (1-t^{\omega_i/D})^{x_i} \]

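Parameters
t      Real number on [0,1]
xc     Complement of t (its distance to the nearest integration limit), as supplied by integrate_1d; unused by this function
theta  Array of real parameters: D followed by the group weights
x_r    Array of data (real)
x_i    Array of data (integer)
Returns
The value of the integrand at t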

◆ multi_wallenius_lpmf()

real multi_wallenius_lpmf(data array[] int k, vector m, vector p, data array[] real x_r, data real tol)

Multivariate Wallenius' Noncentral Hypergeometric Distribution

Note that Stan cannot estimate discrete parameters, so the realizations k must be data. This is enforced via the integral: k is passed to integrate_1d as integer data. To estimate missing counts, an approximation is needed. One way to do this is to modify the function to take a vector of real-valued counts instead of an array of integers.

Parameters
k      Array of integer data of length C + 1; the function body reads the per-group counts from elements 2 through C + 1
m      Vector of population margin sizes
p      Simplex of margin probabilities
x_r    Array of data (real)
tol    Tolerance of integration function
Returns
Log probability
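
A minimal sketch of how multi_wallenius_lpmf might be called from a Stan program is given below. It is an illustration under stated assumptions rather than a documented usage: the include path must resolve to the definition file on your system, the tolerance value is arbitrary, and the layout of k (first element holding the total number of draws, followed by the per-group counts) is inferred from the function body above.

```stan
functions {
  // Assumes the definition file is on the include path
  #include multi_wallenius_hypergeometric.stanfunctions
}
data {
  int<lower=1> C;               // number of groups
  vector<lower=0>[C] m;         // size of each group
  // Assumed layout: k[1] = total draws, k[2 : C + 1] = per-group counts
  array[C + 1] int<lower=0> k;
}
transformed data {
  array[0] real x_r;            // the integrand reads no real-valued data
  real tol = 1e-8;              // illustrative integration tolerance
}
parameters {
  simplex[C] p;                 // group weights
}
model {
  // Implicit uniform prior on the simplex p
  target += multi_wallenius_lpmf(k | m, p, x_r, tol);
}
```

Because p enters the integrand through theta, integrate_1d can propagate gradients with respect to the weights, while k stays fixed as data.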