Consider the Bayesian Posterior from a Diffusion Perspective
If one looks carefully at how the diffusion model equations are written down, it becomes clear that there is a connection between the Bayesian posterior and current diffusion models. In Bayesian statistics, we have $\theta \sim \pi(\theta)$ and ${\bf X} = \{X_1,\cdots,X_T\} \sim f({\bf x}\mid \theta)$. The posterior then factorizes as
$f(\theta \mid {\bf X}) \propto f({\bf X} \mid \theta)\,\pi(\theta) = \prod_{t=1}^T f(X_t\mid X_{t-1},\cdots,X_1, \theta)\, \pi(\theta),$ with the convention that the $t=1$ factor is simply $f(X_1\mid \theta)$.
Now, peel off the final likelihood factor:
$\prod_{t=1}^T f(X_t\mid X_{t-1},\cdots,X_1, \theta)\, \pi(\theta) = f(X_T\mid X_{T-1},\cdots,X_1, \theta) \prod_{t=1}^{T-1} f(X_t\mid X_{t-1},\cdots,X_1, \theta)\, \pi(\theta) \propto f(X_T\mid X_{T-1},\cdots,X_1, \theta)\, f(\theta \mid X_{T-1},\cdots,X_1),$
where the last proportionality holds because, by Bayes' theorem, the remaining $T-1$ likelihood factors times the prior are proportional to $f(\theta \mid X_{T-1},\cdots,X_1)$. This gives us the key result that one can update the Bayesian posterior one data point at a time.
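To see the one-data-point-at-a-time update concretely, here is a minimal sketch for a conjugate Normal–Normal model with known observation variance (the model and all numbers are illustrative assumptions, not part of the argument above); the sequential updates recover exactly the same posterior as the batch formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative conjugate model (assumption): theta ~ N(mu0, tau0^2) and
# X_t | theta ~ N(theta, sigma^2) i.i.d., so every posterior stays Gaussian.
mu0, tau0, sigma = 0.0, 1.0, 0.5
theta_true = rng.normal(mu0, tau0)
X = rng.normal(theta_true, sigma, size=20)

def update(mu, tau2, x, sigma2):
    """Fold a single new observation x into the current posterior N(mu, tau2)."""
    tau2_new = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    mu_new = tau2_new * (mu / tau2 + x / sigma2)
    return mu_new, tau2_new

# Sequential: update the posterior one data point at a time.
mu, tau2 = mu0, tau0 ** 2
for x in X:
    mu, tau2 = update(mu, tau2, x, sigma ** 2)

# Batch posterior from the standard conjugate formula, for comparison.
tau2_batch = 1.0 / (1.0 / tau0 ** 2 + len(X) / sigma ** 2)
mu_batch = tau2_batch * (mu0 / tau0 ** 2 + X.sum() / sigma ** 2)

print(mu, tau2)              # sequential result
print(mu_batch, tau2_batch)  # matches the batch result up to floating point
```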
Now, let's reverse this process in the spirit of diffusion models.
Now, we denote $\hat{\theta}_T \sim f(\theta \mid X_{1:T})$. We can approximate
$\hat{\theta}_{T-1} \approx \alpha(T)\, \hat{\theta}_T + (1-\alpha(T))\, \epsilon$, where $\epsilon \sim \pi(\theta)$, and one can learn $\epsilon = \hat{\theta}_{T-1} - \hat{\theta}_T$ using ideas from diffusion. Indeed, this gives us a very good use case for diffusion methods whenever the posterior is hard to sample from directly.
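As a rough illustration of what this could look like in code, the sketch below starts from samples of a (purely illustrative) Gaussian posterior $f(\theta \mid X_{1:T})$, repeatedly mixes in fresh prior draws with a hypothetical schedule $\alpha(t)$, and fits a stand-in linear "denoiser" that predicts $\epsilon$ at each step. The numbers, the schedule $\alpha(t) = t/(t+1)$, and the least-squares regressor are assumptions made only to show the shape of the procedure; a real method would replace the regressor with a learned network $\hat{\epsilon}_\phi(\theta, t)$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative numbers only (assumptions): a Gaussian posterior f(theta | X_{1:T})
# with mean mu_T and std tau_T, and a Gaussian prior pi(theta) = N(mu0, tau0^2).
mu_T, tau_T = 0.8, 0.1
mu0, tau0 = 0.0, 1.0
T = 20

def alpha(t):
    # Hypothetical mixing schedule: close to 1 for large t, so the first few
    # steps away from the posterior inject only a little prior noise.
    return t / (t + 1.0)

# "Noising" chain: start from posterior samples theta_hat_T and repeatedly mix in
# fresh prior draws, theta_hat_{t-1} = alpha(t) * theta_hat_t + (1 - alpha(t)) * eps,
# drifting back toward the prior pi(theta).
n = 10_000
theta_hat = rng.normal(mu_T, tau_T, size=n)  # samples of theta_hat_T
training_pairs = []                          # (t-1, theta_hat_{t-1}, eps) seen by a denoiser
for t in range(T, 0, -1):
    eps = rng.normal(mu0, tau0, size=n)      # eps ~ pi(theta)
    theta_prev = alpha(t) * theta_hat + (1 - alpha(t)) * eps
    training_pairs.append((t - 1, theta_prev, eps))
    theta_hat = theta_prev

# Stand-in for the learned denoiser: per-step linear least squares predicting eps
# from theta_hat_{t-1} (linear suffices here because everything is jointly Gaussian).
for t, th, eps in training_pairs[:3]:
    A = np.stack([th, np.ones_like(th)], axis=1)
    coef, *_ = np.linalg.lstsq(A, eps, rcond=None)
    print(f"step {t}: eps ~= {coef[0]:.3f} * theta + {coef[1]:.3f}")
```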
References
Chung H, Kim J, McCann M T, et al. Diffusion Posterior Sampling for General Noisy Inverse Problems. The Eleventh International Conference on Learning Representations (ICLR), 2023.
Gupta S, Jalal A, Parulekar A, et al. Diffusion Posterior Sampling is Computationally Intractable. arXiv preprint arXiv:2402.12727, 2024.
Wu L, Trippe B, Naesseth C, et al. Practical and Asymptotically Exact Conditional Sampling in Diffusion Models. Advances in Neural Information Processing Systems, 36, 2023.