## Parseval theorem

When a function $f(x)$ can be expressed as a linear combination of a orthogonal basis $\phi_n(x)$, i.e.

$\displaystyle f(x) = \sum_n a_n \phi_n(x)$

$\displaystyle \langle \phi_n|\phi_m \rangle = \delta_{nm}$

then, the integration

$\displaystyle \int |f(x)|^2 dx = \sum_n |a_n|^2 \langle \phi_n|\phi_n \rangle = \sum_n |a_n|^2$

That is.

Using this theorem, many complicated integration can be calculated as a sum.

## Solving general eigenvalue problem

Given a matrix $H$ we can find it eigen value on a given basis set $|\phi_i\rangle$. Suppose the eigen vector is

$|\psi \rangle = \sum_i \alpha_i |\phi_i\rangle$

Put in the eigen equation

$H|\psi\rangle = \epsilon |\psi\rangle \implies \sum_i H|\phi_i\rangle \alpha_i = \epsilon \sum_i |\phi_i\rangle \alpha_i$

We can act $\langle \phi_k|$ on the left, but in general, the basis set are not orthogonal.

$H_{ij} \alpha_j = \epsilon S_{ij} \alpha_j \to H\cdot \alpha = \epsilon S\cdot \alpha$

$H_{ij} = \langle \phi_i|H|\phi_j\rangle , S_{ij} = \langle \phi_i|\phi_j\rangle$

This is the General Eigen Value Problem.

One way to solve the problem, is the “reconfigure” the basis so that it is orthogonal. However, in computation, non-orthogonal basis could give supreme advantage. So, the other way is split the problem. First solve the $S\cdot v = \sigma v$,

The eigen system of $S$ is

$S = P^T \cdot \sigma \cdot P, P\cdot P^T = P^T \cdot P = I$

Here, $\sigma$ is a diagonal matrix of eigen value. Now, we define a new non-unitary matrix

$\displaystyle R_{ij} = \frac{P_{ij}}{\sqrt{\sigma_j}}$

Notices that $R^T \neq R^{-1}$

Thus,

$R^T \cdot S\cdot R = I \implies S = R^{-T} \cdot R^{-1}$

We know that, the form $R^T \cdot Q \cdot R$ is a transform that from one basis to another basis, i.e.

$|\hat{\phi}_i \rangle = R|\phi_i\rangle$

$|\psi \rangle = \sum_i \alpha_i |\phi_i\rangle = \sum_i \alpha R^{-1} |\hat{\phi}_i\rangle = \sum_i \beta_i |\hat{\phi}_i \rangle \implies \alpha = R\cdot \beta$

and for any operator,

$\hat{Q} = R^T\cdot Q\cdot R$

We put this back to the general problem

$H\cdot\alpha = \epsilon S\cdot \alpha = \epsilon R^{-T} \cdot R^{-1} \alpha$

$\implies R^T\cdot H \cdot \alpha = \epsilon R^{-1} \alpha$

$\implies \hat{H}\cdot \beta = R^T\cdot H \cdot R \cdot \beta = \epsilon \beta$

Thus, we can solve the $\hat{H}$, get the eigen system, then use $R\cdot \beta = \alpha$

For example,

$\displaystyle \phi_1 = (1,0), \phi_2 = \frac{1}{\sqrt{2}}(1,1)$

$S = \begin{pmatrix} 1 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 1 \end{pmatrix}$

The eigen system is

$P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$

$\sigma = \begin{pmatrix} 1 + \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 - \frac{1}{\sqrt{2}} \end{pmatrix}$

The matrix $R$ is

$R = \begin{pmatrix} \frac{1}{\sqrt{2+\sqrt{2}}} & -\frac{1}{\sqrt{2-\sqrt{2}}} \\ \frac{1}{\sqrt{2+\sqrt{2}}} & \frac{1}{\sqrt{2-\sqrt{2}}} \end{pmatrix}$

We can verify that

$R^T \cdot S \cdot R = I$

$R^{-T} \cdot R^{-1} = S$

$R^T \cdot R = \begin{pmatrix} \frac{1}{2}(3-\sqrt{5}) & 0 \\ 0 & \frac{1}{2}(3+\sqrt{5}) \end{pmatrix}$

$R \cdot R^T = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$

Let a Hamilton is

$H = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$

The Hamiltonian in the basis is

$H_S = [H_S]_{ij} = \phi_i \cdot H\cdot \phi_j = \begin{pmatrix} 2 & 2 \\ 2 & 3 \end{pmatrix}$

The eigen values in the S frame are

$\displaystyle \sigma_s = \frac{1}{2}(5 \pm \sqrt{17})$

which is not the correct eigen values.

Now, we transform the Hamiltonian into orthogonal basis

$\displaystyle H_R = R^T \cdot H_S \cdot R = \begin{pmatrix} \frac{15+\sqrt{5}}{10} & -\frac{1}{\sqrt{5}} \\ -\frac{1}{\sqrt{5}} & \frac{3}{2} - \frac{1}{2\sqrt{5}} \end{pmatrix}$

The eigen values are

$\sigma_\pm = 2, 1$

where are the correct pair. The un-normalized eigen vector are

$\displaystyle \beta_{\pm} = \left(\frac{1}{2}(-1 \pm \sqrt{5}), 1\right)$

We can verify that

$\alpha_{\pm} = R\cdot \beta_{\pm}$

$H_S \cdot \alpha_\pm = \sigma_\pm S\cdot \alpha_\pm$

## Conjugate Prior

Suppose we have a prior probability $P(\theta)$, and we have a observation from the prior probability that the likelihood is $L(\theta|x) = P(x| \theta)$, thus, according to the Bayesian probability theory, the updated, posterior probability is

$\displaystyle P(\theta |x) = \frac{P(x|\theta) P(\theta) }{ \int P(x|\theta') P(\theta') d\theta' = P(x)}$

Here, $P(\theta|x), P(\theta)$ are called conjugate distribution. $P(\theta)$ is called conjugate prior to the likelihood $P(x|theta)$.

suppose we have no knowledge on the prior probability, it is a fair assumption that $P(\theta) = 1$ or uniform, then, the posterior probability is proportional to the likelihood function, i.e.

$P(\theta | x ) \propto L(\theta | x) = P(x|\theta)$.

Now, suppose we know the prior probability, after updated with new information, if the prior probability is “stable”, the new (or posterior) probability should have the similar functional form as the odd (prior) probability!

When the likelihood function is Binomial distribution, it is found that the beta distribution is the “eigen” distribution that is unchanged after update.

$\displaystyle P(\theta | a, b) = \frac{\theta^{(a-1)} (1-\theta)^{(b-1)}}{Beta(a,b) }$

where $Beta(a,b)$ is the beta function, which served as the normalization factor.

After $s$ success trails and $r$ failure trial, the posterior is

$\displaystyle P(\theta | s, r) = \frac{\theta^{(s+a-1)} (1-\theta)^{(r+b-1)}}{Beta(s+a,r+b) }$

When $a = b = 1 \implies Beta(1,1) = 1$, the posterior probability is reduced to binomial distribution and equal to the Likelihood function.

It is interesting to write the Bayesian equation in a “complete” form

$\displaystyle P(\theta| s + a , r + b) \propto P(s + a , r+ b | \theta) P(\theta | a, b)$

Unfortunately, the beta distribution is undefined for $a = 0 || b = 0$, therefore, when no “prior” trials was taken, there is no “eigen” probability.

Remark: this topic is strongly related to the Laplace rule of succession.

Update: the reason for same functional form of prior and posterior is that the inference of mean, variant is more “consistence”.

## Probability Limit of a null experiment

Suppose we want to measure the probability of tail of a coin. And found that after 100 toss, no tail is appear, with is the upper limit of the probability of tail?

Suppose we have a coin, we toss it for 1 time, it show head, obviously, we cannot say the coin is biased or not, because the probability base on the result has not much information. So, we toss it 2 times, it shows head again, and if we toss it 10 time, it shows 10 heads. So, we are quite confident that the coin is biased to head, but how biased it is?

Or, we ask, what is the likelihood of the probability of head?

The probability of having $r$ head in $n$ toss, given that the probability of head is $p$ is

$P( r \textrm{head} | p ) = C^n_r p^r (1-p)^{n-r}$

Now, we want to find out the “inverse”

$P( p = x | r \textrm{head} )$

This will generate a likelihood function.

$L(x | r \textrm{head}) = P( \textrm{head} | x ) = C^n_r x^r (1-x)^{n-r}$

The peak position of the likelihood function for binomial distribution is

$\displaystyle x_{\max} = \frac{r}{n}$

Since the distribution has a width, we can estimate the width by a given confident level.

From the graph, more the toss, the smaller of the width, and narrower of the uncertainly of the probability.

As the binomial distribution approach normal distribution when $n \to \infty$.

$\displaystyle C^n_r p^r(1-p)^n \to \frac{1}{\sqrt{2\pi \sigma}} \exp\left( - \frac{(r- \mu)^2}{2\sigma^2} \right)$

where $\mu = n p$ and $\sigma^2 = n p (1-p)$.

When $r = 0$, no head was shown in $n$ toss, the likelihood function is

$L(x | n, 0) = (1 - x )^n$

The Full-width Half-maximum is

$\displaystyle x_{\textrm{FWHM}} = 1- \left(\frac{1}{2}\right)^{\frac{1}{n}}$

We can assign this is a “rough” the upper limit for the probability of head.

## Covariant and Contravariant

These concepts are very confusing. I made a diagram to help.

$e, \hat{e}$ are two different basis or coordinate of a vector. The hat just an indicator for the difference. Many people use $e'$, but I feel it is confusing when the prime enters the transformation elements $\alpha$.

The Direct transform of a basis is

$\hat{e}_k = \alpha_{k}^{i} e_i, \hat{e} = T\cdot e$

The Inverse transform is

$e = T^{-1} \cdot \hat{e} \implies \alpha_{i}^k \alpha_{k}^{j} = \delta_{ij}$

Notice that the reciprocal basis transform can be different, $\beta \neq \alpha$.

$\beta^l_j = g^{lk} \alpha_k^i g_{ij}$

But for some cases they are equal, for example, if the basis rotated, the reciprocal basis also rotated by the same rotation.

A vector

$\vec{v} = v^i e_i = v_i e^i = v \cdot e$

In the change of basis,

$\vec{v} = v \cdot T^{-1} \cdot T \cdot e$

we can see the coordinate is always opposite to the change of basis. When the basis is under Direct transform, the coordinate is under Inverse transform, therefore, it is contravariant.

## on general quantum propagator

A general quantum propagator take the exponential form,

$D(\epsilon) = \exp(-i \epsilon P),$

where $\epsilon$ has a unit with $q$, which is a general coordinate, and $P$ is the Hermitian operator of a coordinate $p$, such that

$\displaystyle P |q \rangle = -i \frac{d}{dq} |q \rangle ,$

$\displaystyle p = \frac{dL}{d\dot{q}},$

where $L$ is the system Lagrangian.

I discuss this topic on a previous post. I found that there is not a strong reason that a unitary operator has to be exponential. I mean, it can, does not mean it must.

Why a general quantum propagator take the exponential form?

Consider the $P$ is a GENERATOR of movement, so that

$\displaystyle P |q \rangle = -i \frac{d}{dq} |q \rangle ,$

The above relation means that the amount or magnitude of the first order change of $q$ is $i P$.

Thus, the first order of the propagator is

$D^{(1)}(\epsilon) |q \rangle = |q + \epsilon \rangle = (1 + i\epsilon P )|q \rangle$

The higher order change, using Taylor series of $|q + \epsilon \rangle$ around $q$ , we can expect the complete form of the propagator should be taking exponential form.

This approach to understand this is kind of reverse, compare to the previous post.

The answer the that question is:

Since the canonical momentum is the magnitude of the first order change of the canonical position. According to Taylor series, the canonical translation operator must take an exponential form.

## Does the existence of Planck constant indicate that we are living in a computer simulation?

Every time I come across with approximation, I feel kind of reluctant and uncomfortable, especially in theory. For example, The rotation operator usually introduced using

$D(\delta\omega) = 1 + \delta\omega J_z$

And any higher order terms like $\delta\omega^n, n>1$ will be neglected. Claiming that those term are too small to consider.

What if, the real world has infinite digit to store an information? such that those higher order terms are not NEGLECTABLE at all.

We know that there are Planck scale, or Planck unit for almost all physical quantities, like Planck length, Planck time, Planck constant for energy and so on. Everything smaller than those unit are “undefine” or “not known”. Therefore, the higher order terms are safely be neglected as the $\delta$ is already “infinitesimal”.

This situation is similar to computer that we have finite digit to store a number, everything smaller than the smallest digit will be neglected. That causing floating point error. Therefore, in high precision calculation, the memory has to be very large.

These connection, make me wonder,

Does the existence of Planck constant indicate that we are living in a computer simulation?