Hypothesis Testing#
Test Statistics#
Test statistics for hypothesis testing.
- class everwillow.hypotest.test_statistics.TestStatistic[source]
Bases:
ModuleAbstract base class for test statistics.
Test statistics compute likelihood ratios and store relevant data in the TestStatResult.extras dict. The statistical interpretation (p-values) is handled separately by Distribution classes.
- Subclasses must implement:
_compute: Compute the core test statistic formula.
- compute(nll_fn, params, observation, poi_key, poi_test, **fit_kwargs)[source]
Compute the test statistic.
- Parameters:
nll_fn (Callable[[PyTree, PyTree], float]) – Negative log-likelihood function taking (params, observation).
params (State) – Initial parameter state.
observation (PyTree) – Observed data passed to nll_fn.
poi_key (str | tuple[str | int, ...]) – Canonical key for the parameter of interest, e.g. “mu”.
poi_test (float) – Test value for the POI.
**fit_kwargs (Any) – Additional arguments passed to fit().
- Returns:
TestStatResult with value, test, q_asimov, and extras.
- Return type:
TestStatResult
- class everwillow.hypotest.test_statistics.CowanTestStatistic(mu_asimov=0.0)[source]
Bases:
TestStatisticCowan test statistics with Asimov-based variance estimation.
There are five Cowan test statistics:
TMu: used for two-sided confidence intervals
TMuTilde: used for confidence intervals with a positive signal (Feldman-Cousins)
Q0: used for discovery tests (rejecting the \(\mu = 0\) hypothesis)
QMu: used for exclusion of a non-zero signal hypothesis
QMuTilde: used for exclusion of a non-zero signal hypothesis with a positive signal
This subclass extends
TestStatisticwith methods to efficiently compute the variance of \(\hat{\mu}\) using the Asimov dataset, as described in the Cowan paper.Asimov data can be provided in two ways:
asimov_observation: pre-computed Asimov dataset.predict_fn: generate Asimov atmu_asimov(default depends on the test statistic; override via themu_asimovkwarg).
If neither is provided,
q_asimovwill be None. This can cause p-value computations for test statistics that requireq_asimovto fail.- mu_asimov
Default POI value for Asimov dataset generation.
- Type:
- compute(nll_fn, params, observation, poi_key, poi_test, *, asimov_observation=None, predict_fn=None, mu_asimov=None, **fit_kwargs)[source]
Compute the test statistic.
- Parameters:
nll_fn (Callable[[PyTree, PyTree], float]) – Negative log-likelihood function taking (params, observation).
params (State) – Initial parameter state.
observation (PyTree) – Observed data passed to nll_fn.
poi_key (str | tuple[str | int, ...]) – Canonical key for the parameter of interest, e.g. “mu”.
poi_test (float) – Test value for the POI.
asimov_observation (PyTree | None) – Pre-computed Asimov dataset.
predict_fn (Callable[[State], PyTree] | None) – Function to generate expected observation from parameters.
mu_asimov (float | None) – POI value at which to generate the Asimov dataset. Defaults to
self.mu_asimov.**fit_kwargs (Any) – Additional arguments passed to fit().
- Returns:
TestStatResult with value, test, q_asimov, and extras.
- Return type:
TestStatResult
- class everwillow.hypotest.test_statistics.QTilde(mu_asimov=0.0)[source]
Bases:
CowanTestStatisticModified profile likelihood ratio for upper limits (Eq. 16).
The test statistic is:
\[\begin{split}\tilde{q}_\mu = \begin{cases} -2 \ln \frac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(0, \hat{\hat{\theta}}(0))} & \text{if } \hat{\mu} < 0 \\ -2 \ln \frac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(\hat{\mu}, \hat{\theta})} & \text{if } 0 \leq \hat{\mu} \leq \mu \\ 0 & \text{if } \hat{\mu} > \mu \end{cases}\end{split}\]where \(\hat{\hat{\theta}}(\mu)\) is the conditional MLE of the nuisance parameters given \(\mu\), and \(\hat{\mu}, \hat{\theta}\) are the unconditional MLEs. The boundary at \(\hat{\mu} > \mu\) protects against excluding signal when there is an upward fluctuation.
This is the standard test statistic for CLs upper limit calculations.
- class everwillow.hypotest.test_statistics.QMu(mu_asimov=0.0)[source]
Bases:
CowanTestStatisticProfile likelihood ratio for upper limits (Eq. 14).
The test statistic is:
\[\begin{split}q_\mu = \begin{cases} -2 \ln \lambda(\mu) & \text{if } \hat{\mu} \leq \mu \\ 0 & \text{if } \hat{\mu} > \mu \end{cases}\end{split}\]where \(\lambda(\mu) = L(\mu, \hat{\hat{\theta}}(\mu)) / L(\hat{\mu}, \hat{\theta})\) is the profile likelihood ratio. The boundary at \(\hat{\mu} > \mu\) protects against excluding signal when there is an upward fluctuation.
- class everwillow.hypotest.test_statistics.Q0(mu_asimov=1.0)[source]
Bases:
CowanTestStatisticDiscovery test statistic for testing \(\mu = 0\) (Eq. 12).
The test statistic is:
\[\begin{split}q_0 = \begin{cases} -2 \ln \lambda(0) & \text{if } \hat{\mu} \geq 0 \\ 0 & \text{if } \hat{\mu} < 0 \end{cases}\end{split}\]where \(\lambda(0) = L(0, \hat{\hat{\theta}}(0)) / L(\hat{\mu}, \hat{\theta})\) is the profile likelihood ratio evaluated at \(\mu = 0\). The boundary at \(\hat{\mu} < 0\) prevents “discovery” of negative signal.
- mu_asimov
Default POI value for Asimov generation. Defaults to 1.0 (signal hypothesis).
- Type:
- compute(nll_fn, params, observation, poi_key, poi_test, *, asimov_observation=None, predict_fn=None, mu_asimov=None, **fit_kwargs)[source]
Compute q_0 discovery test statistic.
Note
The
poi_testargument is ignored; Q0 always tests μ=0 by design.- Return type:
TestStatResult
- class everwillow.hypotest.test_statistics.TMu(mu_asimov=0.0)[source]
Bases:
CowanTestStatisticProfile likelihood ratio for two-sided confidence intervals (Eq. 8).
The test statistic is:
\[t_\mu = -2 \ln \lambda(\mu)\]where \(\lambda(\mu) = L(\mu, \hat{\hat{\theta}}(\mu)) / L(\hat{\mu}, \hat{\theta})\) is the profile likelihood ratio. No boundary is applied; \(t_\mu\) can take any non-negative value regardless of \(\hat{\mu}\).
Distributions#
Distributions for converting test statistics to p-values.
- class everwillow.hypotest.distributions.Distribution[source]
Bases:
ModuleAbstract base for test statistic distributions.
- Subclasses must implement:
null_pval: p-value under null hypothesis (\(\mu'= \mu\) where \(\mu\) is the hypothesis being tested).alt_pval: p-value under an alternative hypothesis (\(\mu'=0\) for exclusion, \(\mu'=1\) for discovery).
- abstractmethod null_pval(result)[source]
p-value under the null hypothesis (\(\mu' = \mu\)).
- Parameters:
result (TestStatResult) – Test statistic result.
- Returns:
Null p-value, or None if required data (e.g. q_asimov) is missing.
- Return type:
Array | None
- abstractmethod alt_pval(result)[source]
p-value under an alternative hypothesis.
- Parameters:
result (TestStatResult) – Test statistic result.
- Returns:
Alternative p-value, or None if required data (e.g. q_asimov) is missing.
- Return type:
Array | None
- null_significance(result)[source]
Significance under the null hypothesis: \(Z = \Phi^{-1}(1 - p_\text{null})\).
- Parameters:
result (TestStatResult) – Test statistic result.
- Returns:
Significance Z, or None if pnull is None.
- Return type:
Array | None
- alt_significance(result)[source]
Significance under the alternative hypothesis: \(Z = \Phi^{-1}(1 - p_\text{alt})\).
- Parameters:
result (TestStatResult) – Test statistic result.
- Returns:
Significance Z, or None if palt is None.
- Return type:
Array | None
- expected_pvalues(result)[source]
Compute expected p-values at standard sigma bands.
- Parameters:
result (TestStatResult) – Test statistic result.
- Returns:
ExpectedBands with (pnull, palt) at each sigma level.
- Raises:
NotImplementedError – If the distribution does not support expected p-value computation.
- Return type:
ExpectedBands | None
- class everwillow.hypotest.distributions.TMuAsymptotic[source]
Bases:
DistributionAsymptotic distribution for \(t_\mu\) (two-sided, Eq. 38).
Used with the \(t_\mu\) test statistic for two-sided confidence intervals.
- cdf(q, mu, mu_prime, sigma)[source]
CDF: \(F(t_\mu \mid \mu') = \Phi(\sqrt{t} + \frac{\mu-\mu'}{\sigma}) + \Phi(\sqrt{t} - \frac{\mu-\mu'}{\sigma}) - 1\).
- Return type:
- null_pval(result)[source]
Null p-value: \(p = 2(1 - \Phi(\sqrt{t_\mu}))\). No \(\sigma\) needed.
- Return type:
- class everwillow.hypotest.distributions.TMuTildeAsymptotic[source]
Bases:
DistributionAsymptotic distribution for \(\tilde{t}_\mu\) (two-sided with physical bound, Eq. 40/44).
Used with the \(\tilde{t}_\mu\) test statistic for two-sided tests with the physical constraint \(\mu \geq 0\). The CDF has a piecewise structure with the \(\Phi + \Phi - 1\) form in both regions (Eq. 44).
- cdf(q, mu, mu_prime, sigma)[source]
CDF: \(F(\tilde{t}_\mu \mid \mu')\) — piecewise at threshold \(\mu^2/\sigma^2\).
- Return type:
- null_pval(result)[source]
Null p-value (\(\mu' = \mu\)), where \(q_A = \mu^2/\sigma^2\).
\[\begin{split}p_{\mu'=\mu} = \begin{cases} 2\bigl(1 - \Phi(\sqrt{\tilde{t}})\bigr) & \text{if } \tilde{t} \leq q_A \\ 2 - \Phi(\sqrt{\tilde{t}}) - \Phi\!\left(\frac{\tilde{t} + q_A}{2\sqrt{q_A}}\right) & \text{if } \tilde{t} > q_A \end{cases}\end{split}\]- Return type:
Array | None
- alt_pval(result)[source]
Alt p-value (\(\mu' = 0\)), where \(q_A = \mu^2/\sigma^2\).
\[\begin{split}p_{\mu'=0} = \begin{cases} 2 - \Phi(\sqrt{\tilde{t}} + \sqrt{q_A}) - \Phi(\sqrt{\tilde{t}} - \sqrt{q_A}) & \text{if } \tilde{t} \leq q_A \\ 2 - \Phi(\sqrt{\tilde{t}} + \sqrt{q_A}) - \Phi\!\left(\frac{\tilde{t} - q_A}{2\sqrt{q_A}}\right) & \text{if } \tilde{t} > q_A \end{cases}\end{split}\]- Return type:
Array | None
- class everwillow.hypotest.distributions.Q0Asymptotic[source]
Bases:
DistributionAsymptotic distribution for \(q_0\) (discovery, Eq. 49).
Used with the \(q_0\) test statistic for discovery significance.
- cdf(q, mu, mu_prime, sigma)[source]
CDF: \(F(q_0 \mid \mu') = \Phi(\sqrt{q_0} - \mu'/\sigma)\).
- Return type:
- null_pval(result)[source]
Null p-value: \(p = 1 - \Phi(\sqrt{q_0})\). No \(\sigma\) needed.
- Return type:
- alt_pval(result)[source]
Alt p-value: \(p = 1 - \Phi(\sqrt{q_0} - \sqrt{q_A})\).
\(q_A = \mu_\text{asimov}^2/\sigma^2\) (Asimov under signal), so \(\sqrt{q_A} = \mu_\text{asimov}/\sigma\).
- Return type:
Array | None
- expected_pvalues(result)[source]
Expected p-values at \(\pm N\sigma\) fluctuations under signal hypothesis.
\(q_A = \mu_\text{asimov}^2/\sigma^2\) (Asimov under signal), so \(\sqrt{q_A} = \mu_\text{asimov}/\sigma\). \(q = \max(0, \sqrt{q_A} + N)^2\). Upward fluctuations (\(+N\)) increase discovery significance, opposite to exclusion tests.
- Parameters:
result (TestStatResult) – Must contain
q_asimovfor \(\sqrt{q_A}\).- Returns:
ExpectedBands with (pnull, palt) at each sigma level, or None if q_asimov is missing.
- Return type:
ExpectedBands | None
- class everwillow.hypotest.distributions.QMuAsymptotic[source]
Bases:
DistributionAsymptotic distribution for \(q_\mu\) (upper limit, Eq. 57).
Used with the \(q_\mu\) test statistic for upper limit calculations.
- cdf(q, mu, mu_prime, sigma)[source]
CDF: \(F(q_\mu \mid \mu') = \Phi(\sqrt{q_\mu} - (\mu - \mu')/\sigma)\).
- Return type:
- null_pval(result)[source]
Null p-value: \(p = 1 - \Phi(\sqrt{q_\mu})\). No \(\sigma\) needed.
- Return type:
- alt_pval(result)[source]
Alt p-value: \(p = 1 - \Phi(\sqrt{q_\mu} - \sqrt{q_A})\).
\(q_A = \mu^2/\sigma^2\) (Asimov under \(\mu'=0\)), so \(\sqrt{q_A} = \mu/\sigma\).
- Return type:
Array | None
- expected_pvalues(result)[source]
Expected p-values at \(\pm N\sigma\) fluctuations under background-only.
\(q_A = \mu^2/\sigma^2\) (Asimov under \(\mu'=0\)), so \(\sqrt{q_A} = \mu/\sigma\). At band \(N\), the expected \(\hat{\mu} = N\sigma\), giving \(\sqrt{q} = \max(0, \mu/\sigma - N)\). Synthetic TestStatResult objects are passed through the existing null_pval/alt_pval methods to reuse the CDF logic.
- Parameters:
result (TestStatResult) – Must contain
q_asimovfor \(\sigma\) extraction.- Returns:
ExpectedBands with (pnull, palt) at each sigma level, or None if q_asimov is missing.
- Return type:
ExpectedBands | None
- class everwillow.hypotest.distributions.QTildeAsymptotic[source]
Bases:
DistributionAsymptotic distribution for \(\tilde{q}_\mu\) (upper limit with physical bound, Eq. 64).
Used with the \(\tilde{q}_\mu\) test statistic for hypothesis testing with the physical constraint \(\mu \geq 0\). The CDF is piecewise at \(\tilde{q} = \mu^2/\sigma^2 = q_\text{asimov}\).
- cdf(q, mu, mu_prime, sigma)[source]
CDF: \(F(\tilde{q}_\mu \mid \mu')\) — piecewise at threshold \(\mu^2/\sigma^2\).
- Return type:
- null_pval(result)[source]
Null p-value (\(\mu' = \mu\)), where \(q_A = \mu^2/\sigma^2\).
\[\begin{split}p_{\mu'=\mu} = \begin{cases} 1 - \Phi(\sqrt{\tilde{q}}) & \text{if } \tilde{q} \leq q_A \\ 1 - \Phi\!\left(\frac{\tilde{q} + q_A}{2\sqrt{q_A}}\right) & \text{if } \tilde{q} > q_A \end{cases}\end{split}\]- Return type:
Array | None
- alt_pval(result)[source]
Alt p-value (\(\mu' = 0\)), where \(q_A = \mu^2/\sigma^2\).
\[\begin{split}p_{\mu'=0} = \begin{cases} 1 - \Phi(\sqrt{\tilde{q}} - \sqrt{q_A}) & \text{if } \tilde{q} \leq q_A \\ 1 - \Phi\!\left(\frac{\tilde{q} - q_A}{2\sqrt{q_A}}\right) & \text{if } \tilde{q} > q_A \end{cases}\end{split}\]- Return type:
Array | None
- expected_pvalues(result)[source]
Expected p-values at \(\pm N\sigma\) fluctuations under background-only.
At band \(N\), \(\hat{\mu} = N\sigma\), so the expected test statistic is (with \(q_A = \mu^2/\sigma^2\)):
\[\begin{split}\tilde{q}_\text{exp} = \begin{cases} \max(0,\; \mu/\sigma - N)^2 & \text{if } N \geq 0 \\ (\mu/\sigma)^2 - 2(\mu/\sigma)\,N & \text{if } N < 0 \end{cases}\end{split}\]- Parameters:
result (TestStatResult) – Must contain
q_asimovfor \(\sigma\) extraction.- Returns:
ExpectedBands with (pnull, palt) at each sigma level, or None if q_asimov is missing.
- Return type:
ExpectedBands | None
- class everwillow.hypotest.distributions.EmpiricalDistribution(q_null, q_alt=None)[source]
Bases:
DistributionBase class for distributions built from toy test statistics.
Stores the raw test statistic arrays from toy generation and provides the
from_toysfactory method. Subclass this and overridenull_pval/alt_pvalto implement custom p-value computation methods (e.g. KDE smoothing, tail extrapolation).- q_null
Test statistics under the tested hypothesis (poi_test).
- Type:
- q_alt
Test statistics under the alternative hypothesis (poi_alt). None if poi_alt was not provided to the ToyGenerator.
- Type:
jax.Array | None
- classmethod from_toys(toys)[source]
Construct from a ToyResult.
- Parameters:
toys (ToyResult) – Raw toy generation output containing q_null and optionally q_alt.
- Returns:
An instance of this distribution class.
- Return type:
EmpiricalDistribution
- class everwillow.hypotest.distributions.SimpleEmpiricalDistribution(q_null, q_alt=None)[source]
Bases:
EmpiricalDistributionEmpirical p-values via simple tail counting.
\(p_\text{null} = \text{fraction of } q_\text{null} \geq q_\text{obs}\), \(p_\text{alt} = \text{fraction of } q_\text{alt} \geq q_\text{obs}\).
- null_pval(result)[source]
Empirical p-value under tested hypothesis: fraction of \(q_\text{null} \geq q_\text{obs}\).
- Return type:
- alt_pval(result)[source]
Empirical p-value under alternative: fraction of \(q_\text{alt} \geq q_\text{obs}\).
Returns None if q_alt was not provided (no alternative toys generated).
- Return type:
Array | None
- expected_pvalues(result)[source]
Compute expected p-values at standard sigma bands using toy quantiles.
Uses quantiles of q_alt at standard sigma percentiles as synthetic test statistic values, then evaluates empirical p-values at each via
_build_expected_bands.- Parameters:
result (TestStatResult) – Test statistic result (used as template for synthetic results).
- Returns:
ExpectedBands with empirical p-values at each sigma level.
- Raises:
ValueError – If q_alt is None (no alternative toys generated).
- Return type:
ExpectedBands
Calculators#
Hypothesis test calculators.
- class everwillow.hypotest.calculators.HypoTestCalculator(nll_fn, params, observation, poi_key, test_statistic=<factory>, distribution=<factory>)[source]
Bases:
ModuleGeneric hypothesis test calculator.
Orchestrates hypothesis testing by: 1. Computing the test statistic on observed data 2. Delegating p-value computation to a Distribution object
The calculator stores all model-specific arguments at construction time, so
test(poi_test)only takes the varying parameter. Additional keyword arguments totest()are forwarded to the test statistic.- nll_fn
Negative log-likelihood function taking (params, observation).
- Type:
Callable[[jaxtyping.PyTree, jaxtyping.PyTree], float]
- params
Initial parameter state.
- observation
Observed data passed to nll_fn.
- Type:
jaxtyping.PyTree
- test_statistic
Test statistic to use. Defaults to QTilde.
- Type:
everwillow._src.inference.hypotest.test_statistics.TestStatistic
- distribution
Distribution for p-value computation. Defaults to QTildeAsymptotic.
- Type:
everwillow._src.inference.hypotest.distributions.Distribution
- test(poi_test, **kwargs)[source]
Run hypothesis test.
- cls(result)[source]
Compute CLs = pnull / palt from a hypothesis test result.
- Parameters:
result (HypoTestResult) – HypoTestResult from
test().- Returns:
CLs value, or None if either p-value is None.
- Return type:
Array | None
- expected(result)[source]
Compute expected p-values at standard sigma bands.
Delegates to the distribution’s
expected_pvaluesmethod.- Parameters:
result (HypoTestResult) – HypoTestResult from
test().- Returns:
ExpectedBands with p-values at each sigma level.
- Raises:
NotImplementedError – If the distribution does not support expected p-value computation.
- Return type:
ExpectedBands | None
- class everwillow.hypotest.calculators.AsymptoticCalculator(nll_fn, params, observation, poi_key, test_statistic=<factory>, distribution=<factory>, predict_fn=None, mu_asimov=0.0, asimov_observation=None)[source]
Bases:
HypoTestCalculatorCalculator for Cowan et al. asymptotic hypothesis tests.
Extends
HypoTestCalculatorwith Asimov dataset configuration. These fields are injected into the test statistic call automatically bytest().The Asimov dataset can be provided in two ways:
Pre-computed: pass
asimov_observationdirectly. This is useful when the Asimov dataset is expensive to generate or when the model prediction function is not available (e.g. combined models with multiple observation channels).On-the-fly: pass
predict_fnandmu_asimov. The Asimov dataset is generated at eachtest()call by setting the POI tomu_asimovand callingpredict_fn.
When both are provided,
asimov_observationtakes precedence andpredict_fn/mu_asimovare ignored.Example
>>> calc = AsymptoticCalculator( ... nll_fn=nll_fn, params=params, observation=observed, ... poi_key="mu", predict_fn=my_predict_fn, ... ) >>> result = calc.test(poi_test=1.0)
- predict_fn
Function mapping parameter state to expected observation. Used to create the Asimov dataset at
mu_asimov.- Type:
Callable[[everwillow._src.statelib.state.State], jaxtyping.PyTree] | None
- mu_asimov
POI value for Asimov dataset generation. Defaults to 0.0 (background-only, for exclusion tests). Use 1.0 for discovery tests.
- Type:
- asimov_observation
Pre-computed Asimov dataset. When provided, this is used directly instead of generating one via
predict_fn/mu_asimov.- Type:
jaxtyping.PyTree | None
- test(poi_test, **kwargs)[source]
Run asymptotic hypothesis test.
Injects
predict_fn,mu_asimov, andasimov_observationfrom init fields, unless overridden in kwargs.
Results#
Result containers for hypothesis testing.
- class everwillow.hypotest.results.TestStatResult(value, test, q_asimov=None, extras=<factory>)[source]
Bases:
ModuleResult of computing a test statistic.
- value
Test statistic value.
- Type:
- test
POI value being tested (\(\mu\)).
- Type:
- q_asimov
Test statistic evaluated on Asimov data. None if not computed.
- Type:
jax.Array | None
- class everwillow.hypotest.results.ToyResult(q_null, q_alt=None)[source]
Bases:
ModuleRaw output from toy generation.
Contains the test statistic arrays under both hypotheses, decoupled from any particular p-value computation method.
- q_null
Test statistic values under the tested hypothesis (poi_test).
- Type:
- q_alt
Test statistic values under the alternative hypothesis (poi_alt). None if poi_alt was not provided to the ToyGenerator.
- Type:
jax.Array | None
- class everwillow.hypotest.results.BandValues(minus_2sigma, minus_1sigma, median, plus_1sigma, plus_2sigma)[source]
Bases:
ModuleScalar values at standard \(\pm N\sigma\) fluctuation bands.
Supports iteration via
for name, value in bv, indexing viabv["median"], andlen(bv) == 5.dict(bv)produces a{name: value}mapping, andBandValues(**dict(bv))roundtrips.- minus_2sigma
Value at \(-2\sigma\) fluctuation.
- Type:
- minus_1sigma
Value at \(-1\sigma\) fluctuation.
- Type:
- median
Value at median (\(0\sigma\)).
- Type:
- plus_1sigma
Value at \(+1\sigma\) fluctuation.
- Type:
- plus_2sigma
Value at \(+2\sigma\) fluctuation.
- Type:
- class everwillow.hypotest.results.ExpectedBands(null_pvalue, alt_pvalue, cl_s, null_sig, alt_sig)[source]
Bases:
ModuleExpected quantities at standard sigma bands.
All derived quantities (CLs, significance) are eagerly computed at construction time so access is a simple attribute lookup.
- null_pvalue
p-value under null hypothesis (\(p_\mu\)) at each band.
- Type:
everwillow._src.inference.hypotest.results.BandValues
- alt_pvalue
p-value under alternative hypothesis (\(\text{CL}_b\)) at each band.
- Type:
everwillow._src.inference.hypotest.results.BandValues
- cl_s
\(\text{CL}_s = p_\text{null}/p_\text{alt}\) at each band.
- Type:
everwillow._src.inference.hypotest.results.BandValues
- null_sig
Null significance \(\Phi^{-1}(1 - p_\text{null})\) at each band.
- Type:
everwillow._src.inference.hypotest.results.BandValues
- alt_sig
Alternative significance \(\Phi^{-1}(1 - p_\text{alt})\) at each band.
- Type:
everwillow._src.inference.hypotest.results.BandValues
- class everwillow.hypotest.results.HypoTestResult(q_obs, pnull, palt, test_stat_result)[source]
Bases:
ModuleResult of a hypothesis test.
- q_obs
Observed test statistic value.
- Type:
- pnull
p-value under the tested hypothesis (poi_test).
- Type:
jax.Array | None
- palt
p-value under the alternative hypothesis (poi_alt / background-only).
- Type:
jax.Array | None
- test_stat_result
Full test statistic result with fit information.
- Type:
everwillow._src.inference.hypotest.results.TestStatResult
Toys#
Toy generation for hypothesis testing.
- class everwillow.hypotest.toys.ToyGenerator(test_statistic, ntoys=1000, map_fn=<function vmap>)[source]
Bases:
ModuleGenerates toy experiments for hypothesis testing.
Creates toys under the null hypothesis (poi_null) and optionally under an alternative hypothesis (poi_alt). Returns a ToyResult with raw test statistic arrays that can be fed into any EmpiricalDistribution subclass for p-value computation.
- test_statistic
Test statistic to compute for each toy.
- Type:
everwillow._src.inference.hypotest.test_statistics.TestStatistic
- ntoys
Number of toys per hypothesis. Defaults to 1000.
- Type:
- map_fn
Function that maps a scalar function over an array of keys. Defaults to
jax.vmap. Replace with e.g.lambda fn: partial(jax.lax.map, fn, batch_size=8)to process toys in groups instead of all at once, or a Python loop for step-through debugging.- Type:
Callable
Example
>>> toy_gen = ToyGenerator(test_statistic=QTilde(), ntoys=10000) >>> toys = toy_gen.generate( ... nll_fn, params, observed, "mu", 1.0, ... poi_alt=0.0, ... key=jax.random.key(42), ... predict_fn=my_predict_fn, ... ) >>> # Choose how to interpret the toys (open-world) >>> dist = SimpleEmpiricalDistribution.from_toys(toys) >>> # Use with HypoTestCalculator >>> calc = HypoTestCalculator( ... nll_fn=nll_fn, ... params=params, ... observation=observed, ... poi_key="mu", ... test_statistic=QTilde(), ... distribution=dist, ... ) >>> result = calc.test(1.0)
- generate(nll_fn, params, observation, poi_key, poi_null, *, poi_alt=None, key, sample_fn=None, predict_fn=None, **fit_kwargs)[source]
Generate toys and return raw test statistic arrays.
- Parameters:
nll_fn (Callable[[PyTree, PyTree], float]) – Negative log-likelihood function taking (params, observation).
params (State) – Initial parameter state.
observation (PyTree) – Observed data (used to profile nuisance parameters).
poi_key (str | tuple[str | int, ...]) – Canonical key for the parameter of interest, e.g. “mu”.
poi_null (float) – Null hypothesis POI value. Toys generated under this hypothesis populate q_null. The test statistic is evaluated at this value for each toy.
poi_alt (float | None) – Alternative hypothesis POI value. If provided, toys are generated under both hypotheses. If None, only null toys are generated and q_alt will be None in the result. For exclusion tests, typically 0.0. For discovery, typically 1.0.
key (Key[Array, ''] | UInt32[Array, '2']) – JAX PRNG key for reproducibility.
sample_fn (Callable[[State, Key[Array, ''] | UInt32[Array, '2']], PyTree] | None) – Function to generate toy data. Called as sample_fn(params_state, key) -> toy_observation. If None, a default Poisson sampler is created using predict_fn.
predict_fn (Callable[[State], PyTree] | None) – Function returning expected observation given parameters. Used to create default Poisson sampler if sample_fn is None.
**fit_kwargs (Any) – Additional arguments passed to fit().
- Returns:
ToyResult with q_null (always) and q_alt (if poi_alt provided).
- Raises:
ValueError – If neither sample_fn nor predict_fn is provided.
- Return type:
ToyResult
Upper Limits#
Upper limit finding via root search.
- everwillow.hypotest.upper_limit.expected_upper_limit(band_objective_fn, bounds, level=0.05, **solver_kwargs)[source]
Find expected upper limits at each sigma band.
Calls
upper_limit()five times — once per band — extracting the corresponding scalar from theBandValuesreturned byband_objective_fn.- Parameters:
band_objective_fn (Callable[[float], BandValues]) – Function mapping POI value to
BandValuesof the objective quantity (e.g. expected CLs at each sigma band).bounds (tuple[float, float]) – (lower, upper) search range for POI value.
level (float) – Target level (default 0.05 for 95% CL).
**solver_kwargs (Any) – Additional arguments passed to
upper_limit()(e.g.,rtol,atol,max_steps).
- Returns:
BandValues where each entry is the upper limit at that sigma band.
- Return type:
BandValues
Example
>>> def band_cls_objective(poi): ... result = calc.test(poi) ... bands = calc.expected(result) ... return bands.cl_s >>> brazil = expected_upper_limit(band_cls_objective, bounds=(0, 5)) >>> for name, val in brazil: ... print(f" {name}: {float(val):.4f}")
- everwillow.hypotest.upper_limit.upper_limit(objective_fn, bounds, level=0.05, *, solver=None, rtol=0.0001, atol=1e-06, max_steps=100)[source]
Find POI value where objective function equals target level.
Uses root-finding to find where objective_fn(poi) = level. Pure JAX implementation via optimistix, fully JIT-compatible.
This is a generic root finder - the user composes the objective function to implement their desired exclusion criterion (CLs, p_alt, etc.).
- Parameters:
objective_fn (Callable[[float], Array]) – Function mapping POI value to quantity of interest. Must be JAX-traceable (no float() calls on traced values). Should be monotonic within bounds for reliable convergence.
bounds (tuple[float, float]) – (lower, upper) search range for POI value.
level (float) – Target value for the objective function (default 0.05).
solver (AbstractRootFinder | None) – Root-finding solver to use. Defaults to optx.Bisection.
rtol (float) – Relative tolerance for convergence (used by default solver).
atol (float) – Absolute tolerance for convergence (used by default solver).
max_steps (int) – Maximum iterations.
- Returns:
POI value where objective_fn(poi) = level.
- Return type:
Note
The objective function is JIT-compiled by optimistix. Avoid calling float() or other Python operations that break JAX tracing.
Examples
>>> # Find where CLs = 0.05 (95% CL upper limit) >>> def cls_objective(poi): ... return calc.cls(calc.test(poi)) >>> limit = upper_limit(cls_objective, bounds=(0, 5), level=0.05)
>>> # With custom solver >>> solver = optx.Newton(rtol=1e-5, atol=1e-5) >>> limit = upper_limit(cls_objective, bounds=(0, 5), solver=solver)
- everwillow.hypotest.upper_limit.upper_limit_scan(objective_fn, scan, level=0.05)[source]
Find POI value where objective equals target level via grid scan.
Evaluates the objective function on a grid of POI values, then interpolates to find where it crosses the target level. Fully JIT-compatible via jax.vmap and jnp.interp.
This is useful when: - The objective function is expensive and you want to reuse evaluations - You need to visualize the objective curve - Root-finding fails due to non-monotonicity
- Parameters:
objective_fn (Callable[[float], Array]) – Function mapping POI value to quantity of interest. Must be JAX-traceable.
scan (Array) – Array of POI values to evaluate. Should be monotonically increasing and span the expected limit location.
level (float) – Target value for the objective function (default 0.05).
- Returns:
POI value where objective_fn(poi) = level, found by interpolation.
- Return type:
Note
The accuracy depends on the density of scan points near the crossing. For CLs limits, the objective typically decreases as POI increases.
Examples
>>> # Scan CLs on a grid >>> scan = jnp.linspace(0, 2, 50) >>> limit = upper_limit_scan( ... lambda poi: calc.cls(calc.test(poi)), ... scan, ... level=0.05, ... )
>>> # With finer grid near expected limit >>> scan = jnp.concatenate([ ... jnp.linspace(0, 0.5, 10), ... jnp.linspace(0.5, 1.5, 30), ... jnp.linspace(1.5, 3, 10), ... ]) >>> limit = upper_limit_scan(cls_objective, scan, level=0.05)
- everwillow.hypotest.upper_limit.upper_limit_toys(objective_fn, bounds, key, level=0.05, *, tol=0.01, max_iterations=100)[source]
Find POI value where stochastic objective equals target level.
Uses bisection search with fresh PRNG keys at each iteration. Pure JAX implementation via jax.lax.while_loop, fully JIT-compatible.
- Parameters:
objective_fn (Callable[[float, Key[Array, ''] | UInt32[Array, '2']], Array]) – Function mapping (poi, key) to quantity of interest. Should be monotonic (typically decreasing) as POI increases. Must be JAX-traceable (no float() calls on traced values).
bounds (tuple[float, float]) – (lower, upper) search range for POI value.
key (Key[Array, ''] | UInt32[Array, '2']) – JAX PRNG key for reproducibility.
level (float) – Target value for the objective function (default 0.05).
tol (float) – Stop when objective is within tol of level (default 1e-2).
max_iterations (int) – Maximum bisection iterations (default 100).
- Returns:
POI value where objective_fn(poi, key) ≈ level.
- Return type:
Note
The result has statistical uncertainty from Monte Carlo sampling. The tolerance should account for this.
Examples
>>> # CLs-based upper limit with toys >>> def cls_objective(poi, key): ... result = toy_calc( ... nll_fn, params, "mu", poi, ... sample_fn=sample_fn, nll_factory=nll_factory, key=key ... ) ... return result.cl_s >>> limit = upper_limit_toys(cls_objective, bounds=(0, 5), key=key)
Utilities#
Utilities for hypothesis testing.
- everwillow.hypotest.utils.cl_s(pnull, palt)[source]
Compute \(\text{CL}_s = p_\text{null} / p_\text{alt}\).
\(\text{CL}_s = P(q \geq q_\text{obs} \mid \text{signal+background}) / P(q \geq q_\text{obs} \mid \text{background})\)
The CLs method protects against excluding signal hypotheses when there is no sensitivity: if palt is small (background also finds data unlikely), CLs stays large.
- everwillow.hypotest.utils.constrained_fit(nll_fn, params, observation, poi_fixed, **fit_kwargs)[source]
Perform constrained fit, merging POI constraint with user-fixed params.
Merges
poi_fixed(the POI constraint from the test statistic) with any user-specifiedfixedparams infit_kwargs. When all parameters end up fixed, the NLL is evaluated directly without running the optimizer.- Parameters:
nll_fn (Callable[[PyTree, PyTree], float]) – Negative log-likelihood function taking (params, observation).
params (State) – Initial parameter state.
observation (PyTree) – Observed data passed to nll_fn.
poi_fixed (State) – State specifying the POI value to constrain.
**fit_kwargs (Any) – Additional arguments passed to fit(). If
fixedis present, it is merged withpoi_fixed(poi_fixedwins on overlapping keys).
- Returns:
FitResult with fitted parameters and NLL value.
- Return type:
- everwillow.hypotest.utils.make_asimov(predict_fn, params, poi_key, mu_asimov)[source]
Generate an Asimov dataset at a given POI value.
Sets the POI to
mu_asimovin the parameter state and callspredict_fnto produce the expected observation.- Parameters:
predict_fn (Callable[[State], PyTree]) – Function mapping parameter state to expected observation.
params (State) – Parameter state (used as template).
poi_key (str | tuple[str | int, ...]) – Canonical key for the parameter of interest, e.g. “mu”.
mu_asimov (float) – POI value at which to generate the Asimov dataset.
- Returns:
Expected observation (Asimov dataset).
- Return type:
PyTree
- everwillow.hypotest.utils.sigma_from_asimov(mu, q_asimov, mu_asimov=0.0)[source]
Extract \(\sigma\) (uncertainty on \(\hat{\mu}\)) from an Asimov test statistic.
Uses the relation \(t_{\mu,A} \approx (\mu - \mu')^2/\sigma^2\) to solve for \(\sigma\).
- Parameters:
- Returns:
Estimated \(\sigma = |\mu - \mu_\text{asimov}| / \sqrt{q_\text{asimov}}\).
- Return type: