spcal.dists

Distributions and related code.

spcal.dists.lognormal

Lognormal distribution.

spcal.dists.lognormal.cdf(x: ndarray, mu: float, sigma: float) ndarray

Cumulative distribution function of a log-normal distribution.

Parameters:
  • x – x values

  • mu – mean of underlying normal distribution

  • sigma – shape parameter

Returns:

CDF at all x

spcal.dists.lognormal.pdf(x: ndarray, mu: float, sigma: float) ndarray

Probability density function of a log-normal distribution.

Parameters:
  • x – x values

  • mu – mean of underlying normal distribution

  • sigma – shape parameter

Returns:

PDF at all x

spcal.dists.lognormal.quantile(quantile: ndarray, mu: float, sigma: float) ndarray

Quantile (inverse CDF) function of a log-normal distribution.

Parameters:
  • quantile – values at which to evaluate

  • mu – mean of underlying normal distribution

  • sigma – shape parameter

Returns:

value of the quantile function at each entry of quantile
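The three log-normal functions above are related through the standard normal distribution. The following is a minimal scalar sketch of those textbook relations using only the Python standard library. It is not spcal's implementation: the function names are hypothetical, and it works on single floats rather than arrays.

```python
import math
from statistics import NormalDist


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a log-normal at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of a log-normal at x > 0, written with the error function."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def lognormal_quantile(q: float, mu: float, sigma: float) -> float:
    """Inverse CDF: exponentiate the matching normal quantile (0 < q < 1)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))
```

Passing the result of the quantile function back through the CDF should recover q, which is a quick way to check any implementation of this pair.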

spcal.dists.normal

Normal distribution.

spcal.dists.normal.cdf(x: ndarray, mu: float = 0.0, sigma: float = 1.0) ndarray

Cumulative distribution function of a normal distribution.

Parameters:
  • x – values

  • mu – mean

  • sigma – standard deviation

Returns:

CDF at all x

spcal.dists.normal.erf(x: float | ndarray) float | ndarray

Error function approximation.

The maximum error is 1.5e-7 [1].

Parameters:

x – value

Returns:

approximation of error function
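One widely used approximation with a stated maximum absolute error of 1.5e-7 is formula 7.1.26 of Abramowitz and Stegun. Whether spcal uses exactly this formula is an assumption. The sketch below, with the hypothetical name erf_approx, shows the technique for scalar input.

```python
import math

# Coefficients of Abramowitz and Stegun formula 7.1.26.
_P = 0.3275911
_A = (0.254829592, -0.284496736, 1.421413741, -1.453152027, 1.061405429)


def erf_approx(x: float) -> float:
    """Polynomial approximation of erf(x); an illustration, not spcal's code."""
    # erf is odd, so evaluate at |x| and restore the sign afterwards.
    sign = -1.0 if x < 0.0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + _P * x)
    # Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    poly = 0.0
    for a in reversed(_A):
        poly = (poly + a) * t
    return sign * (1.0 - poly * math.exp(-x * x))
```

Comparing erf_approx against math.erf over a grid of inputs is a simple way to confirm the error bound.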

References

spcal.dists.normal.erfinv(x: float | ndarray) float | ndarray

The inverse error function.

Maximum error is ~ 1.061e-9.

Parameters:

x – input, in the range -1 to 1

Returns:

inverse error

spcal.dists.normal.pdf(x: ndarray, mu: float = 0.0, sigma: float = 1.0) ndarray

Probability density function of a normal distribution.

Parameters:
  • x – values

  • mu – mean

  • sigma – standard deviation

Returns:

PDF at all x

spcal.dists.normal.quantile(x: ndarray, mu: float = 0.0, sigma: float = 1.0) ndarray

Quantile (inverse-CDF) function of a normal distribution.

Parameters:
  • x – quantile values

  • mu – mean

  • sigma – standard deviation

Returns:

quantile at all x
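A general normal quantile is the standard normal quantile shifted by mu and scaled by sigma. The standard quantile can in turn be written as sqrt(2) * erfinv(2p - 1). The scalar sketch below uses statistics.NormalDist from the standard library as a stand-in for the standard quantile. The name normal_quantile_sketch is hypothetical and this is not spcal's code.

```python
from statistics import NormalDist


def normal_quantile_sketch(p: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Quantile of N(mu, sigma) for 0 < p < 1."""
    # Shift and scale the standard normal quantile.
    return mu + sigma * NormalDist().inv_cdf(p)
```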

spcal.dists.normal.standard_quantile(p: float | ndarray) float | ndarray

Approximation of the standard normal quantile.

The maximum error is 1.5e-9 [2].

Parameters:

p – quantile, in the range 0 to 1

Returns:

quantile of the standard normal at p

References

spcal.dists.poisson

Poisson distribution.

spcal.dists.poisson.cdf(k: ndarray, lam: float) ndarray

Poisson cumulative distribution function.

\(\sum_{j=0}^{\lfloor k \rfloor} \text{PMF}(j, \lambda)\)

Parameters:
  • k – index values, integer

  • lam – expected rate of occurrences

Returns:

CDF at all k

spcal.dists.poisson.pdf(k: ndarray, lam: float) ndarray

Poisson probability mass function.

\(\frac{\lambda^k e^{-\lambda}}{k!}\)

Parameters:
  • k – index values, integer

  • lam – expected rate of occurrences

Returns:

PMF at all k
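A minimal scalar sketch of the standard Poisson PMF, λ^k e^{-λ} / k!, and of the CDF as a sum of PMF terms up to floor(k). The function names are hypothetical and the code is not taken from spcal. It assumes lam > 0 and evaluates the PMF in log space so that large k does not overflow the factorial.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(N = k) for N ~ Poisson(lam), with lam > 0 and integer k >= 0."""
    # log(lam^k * exp(-lam) / k!) = k*log(lam) - lam - lgamma(k + 1)
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_cdf(k: float, lam: float) -> float:
    """P(N <= k): sum of the PMF for j = 0 .. floor(k)."""
    if k < 0:
        return 0.0
    return sum(poisson_pmf(j, lam) for j in range(math.floor(k) + 1))
```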

spcal.dists.poisson.quantile(q: float, lam: float) int

Poisson quantile function.
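The source does not document the convention used here. A common definition, assumed in this sketch, is the smallest integer k whose cumulative probability reaches q. The name poisson_quantile_sketch and the input limits are illustrative choices, not spcal's.

```python
import math


def poisson_quantile_sketch(q: float, lam: float) -> int:
    """Smallest k with P(N <= k) >= q for N ~ Poisson(lam)."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    if not 0.0 < lam < 700.0:
        # exp(-lam) underflows for very large lam; this sketch does not handle that.
        raise ValueError("lam must be in (0, 700)")
    k = 0
    pmf = math.exp(-lam)  # P(N = 0)
    cdf = pmf
    # Walk up the PMF with the recurrence P(k) = P(k - 1) * lam / k.
    # Stop if the PMF underflows so that q very close to 1 cannot loop forever.
    while cdf < q and pmf > 0.0:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k
```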

spcal.dists.util

Compound-Poisson calculations.

spcal.dists.util.compound_poisson_lognormal_quantile_approximation(q: float, lam: float, mu: float, sigma: float) float

Approximation of a compound Poisson-Lognormal quantile.

Calculates the zero-truncated quantile of the distribution by approximating the log-normal sum for each value k given by the Poisson distribution. The CDF is calculated for each log-normal, weighted by the Poisson PMF at k. The quantile is taken from the sum of the weighted CDFs.

The error is below 5 % for lam < 100.0 and sigma < 0.5.

Parameters:
  • q – quantile

  • lam – mean of the Poisson distribution

  • mu – log mean of the log-normal distribution

  • sigma – log stddev of the log-normal distribution

Returns:

the q-th quantile of the compound Poisson-Lognormal
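The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not spcal's implementation. It assumes the sum of k log-normals is replaced by a Fenton-Wilkinson log-normal, the Poisson weights are renormalised to exclude k = 0, and the quantile is found by bisection on log(x). All function names are hypothetical.

```python
import math


def fenton_wilkinson(n: int, mu: float, sigma: float) -> tuple[float, float]:
    """Log-normal (mu, sigma) matching the mean and variance of a sum of n."""
    var = math.log((math.exp(sigma**2) - 1.0) / n + 1.0)
    return math.log(n) + mu + 0.5 * (sigma**2 - var), math.sqrt(var)


def zt_components(lam, mu, sigma, rel_tol=1e-12, max_k=10_000):
    """Zero-truncated Poisson weights and log-normal parameters for k = 1, 2, ..."""
    norm = -math.expm1(-lam)  # 1 - P(0), computed without cancellation
    pmf = math.exp(-lam)
    covered = 0.0
    weights, params = [], []
    for k in range(1, max_k + 1):
        pmf *= lam / k  # Poisson recurrence P(k) = P(k - 1) * lam / k
        weights.append(pmf / norm)
        params.append(fenton_wilkinson(k, mu, sigma))
        covered += pmf
        if norm - covered <= rel_tol * norm:  # remaining tail is negligible
            break
    return weights, params


def mixture_cdf(x, weights, params):
    """Sum of the log-normal CDFs at x, weighted by the truncated Poisson PMF."""
    total = 0.0
    for w, (m, s) in zip(weights, params):
        total += w * 0.5 * (1.0 + math.erf((math.log(x) - m) / (s * math.sqrt(2.0))))
    return total


def cpln_quantile_sketch(q, lam, mu, sigma):
    """Zero-truncated compound Poisson-Lognormal quantile, by bisection."""
    weights, params = zt_components(lam, mu, sigma)
    # Bracket the root in log space, ten standard deviations either side.
    lo = min(m - 10.0 * s for m, s in params)
    hi = max(m + 10.0 * s for m, s in params)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(math.exp(mid), weights, params) < q:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))
```

For very small lam nearly all of the weight falls on k = 1, so the result should approach the quantile of a single log-normal, which gives a simple sanity check.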

spcal.dists.util.compound_poisson_lognormal_quantile_lookup(q: ndarray | float, lam: ndarray | float, mu: ndarray | float, sigma: ndarray | float) ndarray | float

The quantile of a compound Poisson-Lognormal distribution.

Interpolates values from a simulation of 1e10 zero-truncated values. The lookup table spans lambda values from 0.01 to 100.0, sigma values from 0.25 to 0.95 and zero-truncated quantiles from 1e-3 to 1.0 - 1e-7. Maximum error is ~ 0.2 %.

Parameters:
  • q – quantile

  • lam – mean of the Poisson distribution

  • mu – log mean of the log-normal distribution

  • sigma – log stddev of the log-normal distribution

Returns:

the q-th quantile of the compound Poisson-Lognormal

spcal.dists.util.extract_compound_poisson_lognormal_parameters(x: ndarray, mask: ndarray | None = None) ndarray

Finds the parameters of compound-Poisson-lognormal distributed data, x.

\[\begin{split}N &\sim Poisson(\lambda) \\ X &\sim Lognormal(\mu, \sigma) \\ Y &= \sum_{n=1}^{N} X_{n}\end{split}\]

The value of \(\lambda\) is extracted from the fraction of zeros in x, \(P(0)\).

\[\lambda = -\log{P(0)}\]

The expected value and variance of the underlying lognormal are extracted from the mean and variance of x.

\[\begin{split}E(Y) &= \lambda E(X) \\ V(Y) &= \lambda E(X^2)\end{split}\]

Parameters \(\mu\) and \(\sigma\) are then extracted using the method of moments.

Parameters:
  • x – raw ICP-ToF signal of shape (samples, features)

  • mask – mask of valid values, defaults to all non-nan

Returns:

array of […, (lambda, mu, sigma)]
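The method-of-moments step can be made explicit. For a log-normal, E(X) = exp(μ + σ²/2) and E(X²) = exp(2μ + 2σ²), so σ² = log(E(X²) / E(X)²) and μ = log E(X) - σ²/2. The sketch below applies this to a flat list of values. It ignores the mask argument and the (samples, features) shape, and both function names are hypothetical. The simulator is included only to produce test data.

```python
import math
import random
from statistics import fmean


def extract_cpln_parameters_sketch(y):
    """Estimate (lam, mu, sigma) from 1-D data; the data must contain zeros."""
    p_zero = sum(1 for v in y if v == 0.0) / len(y)
    lam = -math.log(p_zero)                      # lam = -log P(0)
    mean_y = fmean(y)
    var_y = fmean((v - mean_y) ** 2 for v in y)
    mean_x = mean_y / lam                        # E(Y) = lam * E(X)
    mean_x2 = var_y / lam                        # V(Y) = lam * E(X^2)
    sigma2 = math.log(mean_x2 / mean_x**2)
    mu = math.log(mean_x) - 0.5 * sigma2
    return lam, mu, math.sqrt(sigma2)


def simulate_cpln(size, lam, mu, sigma, seed=0):
    """Draw compound Poisson-Lognormal samples (illustration only)."""
    rng = random.Random(seed)
    limit = math.exp(-lam)
    out = []
    for _ in range(size):
        # Knuth's method: count uniform draws until their product
        # falls to exp(-lam) or below.
        n, prod = 0, rng.random()
        while prod > limit:
            n += 1
            prod *= rng.random()
        out.append(math.fsum(rng.lognormvariate(mu, sigma) for _ in range(n)))
    return out
```

With a few hundred thousand simulated values, the recovered parameters should land close to the ones used for the simulation.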

spcal.dists.util.extract_compound_poisson_lognormal_parameters_iterative(x: ndarray, alpha: float = 1e-05, dilation: int = 50, max_iters: int = 100, iter_eps: float = 0.01, bounds: ndarray | None = None) tuple[float, float, float]

Finds the parameters of compound-Poisson-lognormal distributed data, x.

Parameters are found iteratively using extract_compound_poisson_lognormal_parameters: a threshold based on the current parameters is set, then the parameters are extracted again. This is repeated until either the threshold or both µ and σ no longer change. Parameters can be confined using the bounds argument, which is useful for reducing iterations in samples with many particles. By default only σ is bounded, between 0.2 and 1.0.

Parameters:
  • x – data

  • alpha – alpha value to use during thresholding

  • dilation – number of points to remove around detected peaks

  • max_iters – maximum number of iterations

  • iter_eps – smallest change in threshold allowed

  • bounds – array of shape (3, 2) of parameter bounds

Returns:

lam, mu, sigma

spcal.dists.util.sum_iid_lognormals(n: int | ndarray, mu: float, sigma: float, method: str = 'Fenton-Wilkinson') tuple[float | ndarray, float | ndarray]

Sum of n independent, identically distributed log-normal distributions.

The sum is approximated by another log-normal distribution, defined by the returned parameters. By default, the Fenton-Wilkinson approximation is used for good right-tail accuracy [3].

Parameters:
  • n – int or array of ints

  • mu – log mean of the underlying distributions

  • sigma – log stddev of the underlying distributions

  • method – approximation to use, ‘Fenton-Wilkinson’ or ‘Lo’

Returns:

mu, sigma of the log-normal approximation
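The Fenton-Wilkinson approximation picks the log-normal whose mean and variance equal those of the sum. A minimal scalar sketch of that moment matching follows. The function name is hypothetical and the 'Lo' method is not covered.

```python
import math


def sum_iid_lognormals_fw(n: int, mu: float, sigma: float) -> tuple[float, float]:
    """(mu, sigma) of the log-normal matching the first two moments of the sum."""
    # Variance matching: exp(sigma_sum^2) - 1 = (exp(sigma^2) - 1) / n
    sigma2_sum = math.log((math.exp(sigma**2) - 1.0) / n + 1.0)
    # Mean matching: exp(mu_sum + sigma_sum^2 / 2) = n * exp(mu + sigma^2 / 2)
    mu_sum = math.log(n) + mu + 0.5 * (sigma**2 - sigma2_sum)
    return mu_sum, math.sqrt(sigma2_sum)
```

For n = 1 the function returns the input parameters unchanged, and for any n the mean of the approximating log-normal equals n times the mean of a single term.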

References

spcal.dists.util.zero_trunc_quantile(lam: ndarray | float, y: ndarray | float) ndarray | float

Returns the zero-truncated Poisson quantile.

Parameters:
  • lam – Poisson rate parameter(s)

  • y – quantile(s) of non-truncated dist

Returns:

quantile(s) of the zero-truncated dist
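One plausible reading of this conversion, assumed here rather than confirmed by the source, is that it removes the probability mass at zero, P(0) = exp(-lam), and rescales what remains: y_zt = (y - P(0)) / (1 - P(0)). The scalar sketch below uses a hypothetical name and clips results below zero, which is also an assumption.

```python
import math


def zero_trunc_quantile_sketch(lam: float, y: float) -> float:
    """Map a quantile of Poisson(lam) to the zero-truncated distribution."""
    p_zero = math.exp(-lam)
    # Quantiles at or below P(0) lie entirely within the removed zero mass.
    return max((y - p_zero) / (1.0 - p_zero), 0.0)
```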