Suppose we have a prior probability $P(\theta)$ and we make an observation $x$ whose likelihood is $L(\theta|x) = P(x| \theta)$. According to Bayesian probability theory, the updated (posterior) probability is

$\displaystyle P(\theta |x) = \frac{P(x|\theta) P(\theta) }{ P(x) }, \quad P(x) = \int P(x|\theta') P(\theta') d\theta'$

When $P(\theta|x)$ and $P(\theta)$ belong to the same family of distributions, they are called conjugate distributions, and $P(\theta)$ is called a conjugate prior for the likelihood $P(x|\theta)$.

Suppose we have no knowledge of the prior probability. A fair assumption is then that $P(\theta) = 1$, i.e. the prior is uniform, so the posterior probability is proportional to the likelihood function, that is,

$P(\theta | x ) \propto L(\theta | x) = P(x|\theta)$.

Now suppose we know the prior probability. If the prior is “stable”, then after updating with new information the new (posterior) probability should have the same functional form as the old (prior) probability!

When the likelihood function is a binomial distribution, the beta distribution turns out to be the “eigen” distribution whose form is unchanged by the update.

$\displaystyle P(\theta | a, b) = \frac{\theta^{(a-1)} (1-\theta)^{(b-1)}}{Beta(a,b) }$

where $Beta(a,b)$ is the beta function, which serves as the normalization factor.

After $s$ successful trials and $r$ failed trials, the posterior is

$\displaystyle P(\theta | s, r) = \frac{\theta^{(s+a-1)} (1-\theta)^{(r+b-1)}}{Beta(s+a,r+b) }$
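The update rule above can be checked numerically. The following is a minimal sketch using only the Python standard library: it compares the closed-form posterior $Beta(s+a, r+b)$ with a brute-force evaluation of prior times likelihood on a grid. The function names (`beta_pdf`, `update`, `grid_posterior`) and the values of $a, b, s, r$ are illustrative assumptions, not part of the original derivation.

```python
import math

def beta_pdf(theta, a, b):
    """Beta density: theta^(a-1) (1-theta)^(b-1) / Beta(a, b)."""
    # log Beta(a, b) via log-gamma, for numerical stability
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_pdf = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm
    return math.exp(log_pdf)

def update(a, b, s, r):
    """Conjugate update: s successes and r failures turn Beta(a, b) into Beta(a + s, b + r)."""
    return a + s, b + r

def grid_posterior(a, b, s, r, n=2000):
    """Brute-force Bayes: prior * likelihood on a midpoint grid, normalized numerically."""
    thetas = [(i + 0.5) / n for i in range(n)]
    # The binomial coefficient is constant in theta and cancels in the normalization.
    unnorm = [beta_pdf(t, a, b) * t**s * (1 - t)**r for t in thetas]
    evidence = sum(unnorm) / n  # midpoint-rule estimate of the normalizing integral
    return thetas, [u / evidence for u in unnorm]

# Illustrative values (assumed for this sketch)
a, b, s, r = 2.0, 3.0, 7, 3
a_post, b_post = update(a, b, s, r)
thetas, post = grid_posterior(a, b, s, r)
max_diff = max(abs(p - beta_pdf(t, a_post, b_post)) for t, p in zip(thetas, post))
print(a_post, b_post, max_diff)
```

The largest difference between the two curves should be at the level of the grid's integration error, which suggests that the posterior is indeed a beta distribution with shifted parameters.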

When $a = b = 1$, $Beta(1,1) = 1$ and the prior is uniform. The posterior then has the same $\theta$-dependence as the binomial distribution, i.e. it is the likelihood function normalized over $\theta$.
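As a quick sanity check of the $a = b = 1$ case, the sketch below compares the posterior density $Beta(s+1, r+1)$ with the binomial probability of $s$ successes and $r$ failures at several values of $\theta$. If the two share the same $\theta$-dependence, their ratio should not depend on $\theta$; from the normalization factors one expects it to be $s + r + 1$. The helper names and the chosen $s, r$ are assumptions made for illustration.

```python
import math

def beta_pdf(theta, a, b):
    """Beta density, with Beta(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta**(a - 1) * (1 - theta)**(b - 1) / norm

def binomial_pmf(s, r, theta):
    """Probability of s successes and r failures in s + r trials."""
    return math.comb(s + r, s) * theta**s * (1 - theta)**r

# Illustrative counts (assumed for this sketch)
s, r = 7, 3
ratios = [beta_pdf(t, s + 1, r + 1) / binomial_pmf(s, r, t)
          for t in (0.1, 0.3, 0.5, 0.9)]
print(ratios)  # every entry should be close to s + r + 1
```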

It is interesting to write the Bayesian equation in a “complete” form

$\displaystyle P(\theta| s + a , r + b) \propto P(s , r | \theta) P(\theta | a, b)$

Unfortunately, the beta distribution is undefined for $a = 0$ or $b = 0$; therefore, when no “prior” trials were taken, there is no “eigen” probability.

Remark: this topic is strongly related to Laplace's rule of succession.
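To make the connection concrete: with a uniform prior ($a = b = 1$), the probability that the next trial succeeds is the mean of the posterior $Beta(s+1, r+1)$, which is $(s+1)/(s+r+2)$. This is the rule of succession. A minimal sketch with exact fractions follows; the function names and the counts are illustrative assumptions.

```python
from fractions import Fraction

def beta_mean(a, b):
    """Mean of Beta(a, b), which is a / (a + b)."""
    return Fraction(a, a + b)

def rule_of_succession(s, n):
    """Laplace's estimate (s + 1) / (n + 2) after s successes in n trials."""
    return Fraction(s + 1, n + 2)

# Illustrative counts (assumed for this sketch)
s, r = 7, 3
posterior_mean = beta_mean(1 + s, 1 + r)  # uniform prior, a = b = 1
laplace = rule_of_succession(s, s + r)
print(posterior_mean, laplace)  # prints "2/3 2/3"
```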

Update: the reason for wanting the same functional form for the prior and the posterior is that the inference of the mean and variance is more “consistent”.