I’m seeking some wisdom on how to construct a hierarchical Bayesian model. The project is too complex to usefully include here as-is. I could craft a minimal toy example, but its simplicity would push the discussion in directions that don’t apply to the complex case, so I hope discussing it conceptually will suffice.
The project uses a dataset with over 10 features, plus the physical observable that we ultimately want to predict, for each of hundreds of different physical sites.
Overview

The first stage of analysis is to fit a global model (a log-linear GLM with a negative binomial error structure) on all the data to determine global coefficients for each of the features.

In the second stage, these global coefficients are applied to each row (site) of the dataset, and a second round of modelling is performed to determine site-specific factors that account for hidden structural drivers of systematic deviation from the global model.

Finally, predictions of the physical observable are inferred for each site.
The log-linear GLM of stage 1 produces posteriors for each of the coefficients. Applying these coefficient distributions to any individual physical site gives a distribution of the global model's "mean" observable at that site. The subsequent modelling of deviations of the observable from this "global mean" should certainly incorporate this entire distribution.
I think the most MCMC-like way to do this would be for the PyMC3 sampler to draw a different sample from the global-model trace with each step it tests/takes in the second-stage modelling process. This could perhaps be done by:
1. defining a uniform integer distribution, e.g. between 0 and 4999 if the global model was run with 5000 draws;
2. applying the coefficients from the trace draw selected in step 1 to a physical site;
3. modelling the deviations to infer posterior distributions over the site-specific parameters.
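In plain NumPy terms, the first two steps amount to something like the following sketch, which treats the flattened stage-1 trace as arrays of posterior draws (all names and shapes here are invented for illustration):

```python
import numpy as np

# Stand-in for a flattened stage-1 trace: one row of coefficients per MCMC draw.
rng = np.random.default_rng(42)
n_draws, n_features = 5000, 10
beta0_draws = rng.normal(size=n_draws)
beta_draws = rng.normal(size=(n_draws, n_features))

x_site = rng.normal(size=n_features)  # feature vector for one physical site

# Step 1-2: draw a random trace index (the "uniform prior" idea) and apply
# that single coefficient draw to the site, via the log link.
idx = rng.integers(0, n_draws)
mu_site = np.exp(beta0_draws[idx] + beta_draws[idx] @ x_site)

# Vectorised over the whole trace: the full distribution of the global
# "mean" observable at this site, which step 3 would model deviations from.
mu_dist = np.exp(beta0_draws + beta_draws @ x_site)
```

The question is whether step 1 can be wired into the second-stage PyMC3 model itself rather than done outside it like this.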
For this to work, though, I need the PyMC3 model to sample from the uniform "prior" while leaving it out of the energy calculation. In a sense I need it to be "deterministic", but in the sense that it's predefined and fixed: it should not be part of the parameter space that the model tries to narrow into a collection of posteriors.
Or is that a naïve approach? Is the two-stage split only conceptual, and should a hierarchical Bayesian model like this always be fitted in a single stage, even if that creates a significantly more complex parameter space? This seems to be the case in all the simple examples I can find, but I don't know whether it has to be this way for more complex cases. I'm also not sure it's possible to impose defined error structures on only part of the parameter space. If that were possible, I could use stage 1 to define good priors for the global subset of parameters in stage 2, and presumably the posteriors for those global parameters wouldn't change much. But that seems sloppy.
I could also, of course, use brute-force sampling of the stage-1 traces in the second-stage (site-specific) modelling, but that would be slow and inefficient and, I would think, antithetical to the MCMC approach.
I would be grateful for any guidance!
Source: Python Questions