Hartree method for Helium ground state

Leave a comment

After long preparation, I am ready to do this problem.

The two electron in the helium ground state occupy same spacial orbital but difference spin. Thus, the total wavefunction is

\displaystyle \Psi(x,y) = \frac{1}{\sqrt{2}}(\uparrow \downarrow - \downarrow \uparrow) \psi(x) \psi(y)

Since the Coulomb potential is spin-independent, the Hartree-Fock method reduce to Hartree method. The Hartree operator is

F(x) = H(x) + \langle \psi(y)|G(x,y) |\psi(y) \rangle

where the single-particle Hamiltonian and mutual interaction are

\displaystyle H(x) = -\frac{\hbar^2}{2m} \nabla^2 - \frac{Ze^2}{4\pi\epsilon_0 x} = -\frac{1}{2}\nabla^2 - \frac{Z}{x}

\displaystyle G(x,y) = \frac{e^2}{4\pi\epsilon_0|x-y|} = \frac{1}{|x-y|}

In the last step, we use atomic unit, such that \hbar = 1, m=1, e^2 = 4\pi\epsilon_0. And the energy is in unit of Hartree, 1 \textrm{H} = 27.2114 \textrm{eV}.

We are going to use Hydrogen-like orbital as a basis set.

\displaystyle b_i(r) = R_{nl}(r)Y_{lm}(\Omega) \\= \sqrt{\frac{(n-l-1)!Z}{n^2(n+l)!}}e^{-\frac{Z}{n}r} \left( \frac{2Z}{n}r \right)^{l+1} L_{n-l-1}^{2l+1}\left( \frac{2Z}{n} r \right) \frac{1}{r} Y_{lm}(\Omega)

I like the left the 1/r, because in the integration \int b^2 r^2 dr, the r^2 can be cancelled. Also, the i = nlm is a compact index of the orbital.

Using basis set expansion, we need to calculate the matrix elements of

\displaystyle H_{ij}=\langle b_i(x) |H(x)|b_j(x)\rangle = -\delta \frac{Z^2}{2n^2}

\displaystyle G_{ij}^{hk} = \langle b_i(x) b_h(y) |G(x,y) |b_j(x) b_k(y) \rangle

Now, we will concentrate on evaluate the mutual interaction integral.

Using the well-known expansion,

\displaystyle G(x,y) = \frac{1}{|x-y|}=\frac{1}{r_{12}} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \frac{4\pi}{2l+1} \frac{r_<^l}{r_>^{l+1}} Y_{lm}^{*}(\Omega_1)Y_{lm}(\Omega_2)

The angular integral

\displaystyle \langle Y_i(x) Y_h(y)| Y_{lm}^{*}(x) Y_{lm}(y) | Y_j(x) Y_k(y) \rangle \\ = \big( \int Y_i^{*}(x) Y_{lm}^{*}(x) Y_j(x) dx \big) \big( \int Y_h^{*}(y) Y_{lm}(y) Y_k(y) dy \big)

where the integral \int dx = \int_{0}^{\pi} \int_{0}^{2\pi} \sin(\theta_x) d\theta_x d\phi_x .

From this post, the triplet integral of spherical harmonic is easy to compute.

\displaystyle \int Y_h^{*}(y) Y_{lm}(y) Y_k(y) dy = \sqrt{\frac{(2l+1)(2l_k+1)}{4\pi (2l_h+1)}} C_{l0l_k0}^{l_h0} C_{lm l_km_k}^{l_hm_h}

The Clebsch-Gordon coefficient imposed a restriction on l,m.

The radial part,

\displaystyle \langle R_i(x) R_h(y)| \frac{r_<^l}{r_>^{l+1}} | R_j(x) R_k(y) \rangle \\ = \int_0^{\infty} \int_{0}^{\infty} R_i(x) R_h(y) \frac{r_<^l}{r_>^{l+1}} R_j(x) R_k(y) y^2 x^2 dy dx \\ = \int_0^{\infty} R_i(x) R_j(x) \\ \left( \int_{0}^{x} R_h(y) R_k(y) \frac{y^l}{x^{l+1}} y^2dy  + \int_{x}^{\infty} R_h(x)R_k(x) \frac{x^l}{y^{l+1}}  y^2 dy   \right) x^2 dx

The algebraic calculation of the integral is complicated, but after the restriction of l from the Clebsch-Gordon coefficient, only few terms need to be calculated.

The general consideration is done. now, we use the first 2 even states as a basis set.

\displaystyle b_{1s}(r) = R_{10}(r)Y_{00}(\Omega) = 2Z^{3/2}e^{-Zr}Y_{00}(\Omega)

\displaystyle b_{2s}(r) = R_{20}(r)Y_{00}(\Omega) = \frac{1}{\sqrt{8}}Z^{3/2}(2-Zr)e^{-Zr/2}Y_{00}(\Omega)

These are both s-state orbital. Thus, the Clebsch-Gordon coefficient

\displaystyle C_{lm l_k m_k}^{l_h m_h} = C_{lm00}^{00}

The radial sum only has 1 term. And the mutual interaction becomes

\displaystyle G(x,y) = \frac{1}{|x-y|}=\frac{1}{r_{12}} = 4\pi \frac{1}{r_>} Y_{00}^{*}(\Omega_1)Y_{00}(\Omega_2)

The angular part

\displaystyle \langle Y_i(x) Y_h(y)| Y_{lm}^{*}(x) Y_{lm}(y) | Y_j(x) Y_k(y) \rangle = \frac{1}{4\pi}

Thus, the mutual interaction energy is

G_{ij}^{hk} = \displaystyle \langle b_i(x) b_h(y) |G(x,y) |b_j(x) b_k(y) \rangle = \langle R_i(x) R_h(y)| \frac{1}{r_>} | R_j(x) R_k(y) \rangle

The radial part

G_{ij}^{hk} = \displaystyle \langle R_i(x) R_h(y)| \frac{1}{r_>} | R_j(x) R_k(y) \rangle \\ \begin{pmatrix} G_{11}^{hk} & G_{12}^{hk} \\ G_{21}^{hk} & G_{22}^{hk} \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} G_{11}^{11} & G_{11}^{12} \\ G_{11}^{21} & G_{11}^{22} \end{pmatrix} & \begin{pmatrix} G_{12}^{11} & G_{12}^{12} \\ G_{12}^{21} & G_{12}^{22} \end{pmatrix} \\ \begin{pmatrix} G_{21}^{11} & G_{21}^{12} \\ G_{21}^{21} & G_{21}^{22} \end{pmatrix} & \begin{pmatrix} G_{22}^{11} & G_{22}^{12} \\ G_{22}^{21} & G_{22}^{22} \end{pmatrix} \end{pmatrix} \\= \begin{pmatrix} \begin{pmatrix} 1.25 & 0.17871 \\ 0.17871 & 0.419753 \end{pmatrix} & \begin{pmatrix} 0.17871 & 0.0438957 \\ 0.0439857 & 0.0171633 \end{pmatrix} \\ \begin{pmatrix} 0.17871 & 0.0438957 \\ 0.0438957 & 0.0171633 \end{pmatrix} & \begin{pmatrix} 0.419753 & 0.0171633 \\ 0.0171633 & 0.300781 \end{pmatrix} \end{pmatrix}

We can easy to see that G_{ij}^{hk} = G_{ji}^{hk} = G_{ij}^{kh} = G_{hk}^{ij} = G_{ji}^{kh} . Thus, if we flatten the matrix of matrix, it is Hermitian, or symmetric.

Now, we can start doing the Hartree method.

The general solution of the wave function is

\psi(x) = a_1 b_{1s}(x) + a_2 b_{2s}(x)

The Hartree matrix is

F_{ij} = H_{ij} + \sum_{h,k} a_h a_k G_{ij}^{hk}

The first trial wave function are the Hydrogen-like orbital,

\psi^{(0)}(x) = b_{1s}(r)

F_{ij}^{(0)} = \begin{pmatrix} -2 & 0 \\ 0 & -0.5 \end{pmatrix}  + \begin{pmatrix} 1.25 & 0.17871 \\ 0.17817 & 0.419753 \end{pmatrix}

Solve for eigen system, we have the energy after 1st trial,

\epsilon^{(1)} = -0.794702 , (a_1^{(1)}, a_2^{(1)}) = (-0.970112, 0.242659)

After 13th trial,

\epsilon^{(13)} = -0.880049 , (a_1^{(13)}, a_2^{(13)}) = (-0.981015, 0.193931)

F_{ij}^{(13)} = \begin{pmatrix} -2 & 0 \\ 0 & -0.5 \end{pmatrix}  + \begin{pmatrix} 1.15078 & 0.155932 \\ 0.155932 & 0.408748 \end{pmatrix}

Thus, the mixing of the 2s state is only 3.7%.

Since the eigen energy contains the 1-body energy and 2-body energy. So, the total energy for 2 electrons is

E_2 = 2 * \epsilon^{(13)} - G = -2.82364 \textrm{H} = -76.835 \textrm{eV}

In which ,

G = \langle \psi(x) \psi(y) |G(x,y) |\psi(x) \psi(y) \rangle = 1.06354 \textrm{H} = 28.9403 \textrm{eV}

So the energies for

From He to He++.  E_2 = -2.82364 \textrm{H} = -76.835 \textrm{eV}
From He+ to He++, E_1^+ = -Z^2/2 = 2 \textrm{H} = -54.422 \textrm{eV} .
From He to He+, is E_1 = E_2 - E_1^+ = -0.823635 \textrm{H} =  -22.4123 \textrm{eV}

The experimental 1 electron ionization energy for Helium atom is

E_1(exp) = -0.90357 \textrm{H} = -24.587 \textrm{eV}
E_1^+(exp) = -1.99982 \textrm{H} = -54.418 \textrm{eV}
E_2(exp) = -2.90339 \textrm{H} = -79.005 \textrm{eV}

The difference with experimental value is 2.175 eV. The following plot shows the Coulomb potential, the screening due to the existence of the other electron, the resultant mean field, the energy, and r \psi(x)


Usually, the Hartree method will under estimate the energy, because it neglected the correlation, for example, pairing and spin dependence. In our calculation, the E_2 energy is under estimated.

From the (a_1^{(13)}, a_2^{(13)}) = (-0.981015, 0.193931) , we can see, the mutual interaction between 1s and 2s state is attractive. While the interaction between 1s-1s and 2s-2s states are repulsive. The repulsive can be easily understood. But I am not sure how to explain the attractive between 1s-2s state.

Since the mass correction and the fine structure correction is in order of 10^{-3} \textrm{eV} , so the missing 0.2 eV should be due to something else, for example, the incomplete basis set.

If the basis set only contain the 1s orbit, the mutual interaction is 1.25 Hartree = 34.014 eV. Thus, the mixing reduce the interaction by 5.07 eV, just for 3.7% mixing

I included the 3s state,

\epsilon^{(13)} = -0.888475 , (a_1^{(13)}, a_2^{(13)}, a_3^{(13)}) = (0.981096, -0.181995, -0.06579)

The mutual energy is further reduced to 1.05415 Hartree = 28.6848 eV. The E_2 = -77.038 \textrm{eV} . If 4s orbital included, the E_2 = -77.1058 \textrm{eV} . We can expect, if more orbital in included, the E_2 will approach to E_2(exp).



Integrations of Laguerre polynomial

Leave a comment

Since the Laguerre polynomial is deeply connected to the hydrogen-like electron orbital. The radial solution is

\displaystyle R_{nl}(r) = A \frac{1}{r} \exp(- \frac{x}{2}) x^{l+1}L_{n-l-1}^{2l+1}(x)

\displaystyle x = \frac{2Z}{na_0} r

From the normalization condition, we can get the coefficient A

\displaystyle \int_0^{\infty} R_{nl}^2(r) r^2 dr

Beside of calculating the normalization factor, the general expectation value also need to evaluation of the integration of the Laguerre polynomial.

In here, we will show the calculation for 3 integrals:

\displaystyle \int_0^{\infty} x^{\alpha+m} e^{-x} L_n^\alpha(x) dx,  n,\alpha\geq0

\displaystyle \int_0^{\infty} x^{\alpha+k} e^{-x} (L_n^\alpha(x))^2 dx

\displaystyle \int_0^{\infty} x^{\alpha +k} e^{-x} L_m^{\beta}(x) L_n^\alpha(x) dx

Using the Rodrigues formula

\displaystyle L_n^\alpha(x) = \frac{1}{n!} x^{-\alpha} e^x \frac{d^n}{dx^n}(e^{-x}x^{n+\alpha})

\displaystyle \int_0^{\infty} x^{\alpha+m} e^{-x} L_n^\alpha(x) dx = \frac{1}{n!} \int_0^{\infty} x^{m} \frac{d^n}{dx^n}(e^{-x}x^{n+\alpha})dx

For m=0,

\displaystyle \int_0^{\infty} x^{\alpha+m} e^{-x} L_n^\alpha(x) dx \\ = \frac{1}{n!} \int_0^{\infty} \frac{d^n}{dx^n}(e^{-x}x^{n+\alpha})dx \\ = \frac{1}{n!} (F_{n-1}(\infty)-F_{n-1}(0) )

\displaystyle F_{n-1}(x) = \int_0^{\infty} \frac{d^n}{dx^n}(e^{-x}x^{n+\alpha})dx \\ = \sum_{i=0}^{n-1} (-1)^i \begin{pmatrix} n-1 \\ i \end{pmatrix} \frac{(n+\alpha)!}{(\alpha+1+i)!} x^{\alpha+1+i} e^{-x}

It is obvious that F(\infty) = 0 due to the e^{-x}. Since \alpha>0, then x^{\alpha+1+i} = 0 at x = 0. Thus,

\displaystyle \int_0^{\infty} x^{\alpha+m} e^{-x} L_n^\alpha(x) dx = 0 ,  m=0

Using integration by path,

\displaystyle \int_0^{\infty} x^{m} d \left(\frac{d^{n-1}}{dx^{n-1}}(e^{-x}x^{n+\alpha}) \right)dx \\ = [ x^{m} \frac{d^{n-1}}{dx^{n-1}}(e^{-x}x^{n+\alpha}) ]_0^{\infty} - (m) \int_0^{\infty} x^{m-1} \frac{d^{n-1}}{dx^{n-1}}(e^{-x}x^{n+\alpha})dx

For the same reason, the first term is zero for \alpha + m \geq 0. Repeat the integration by path k times,

\displaystyle \int_0^{\infty} x^{m} d \left(\frac{d^{n-1}}{dx^{n-1}}(e^{-x}x^{n+\alpha}) \right)dx \\ = (-1)^k \frac{(m)!}{(m-k)!} \int_0^{\infty} x^{m-k} \frac{d^{n-k}}{dx^{n-k}}(e^{-x}x^{n+\alpha})dx

When m \geq n || 0 > m, k can go to n, then,

\displaystyle \int_0^{\infty} x^{m} d \left(\frac{d^{n-1}}{dx^{n-1}}(e^{-x}x^{n+\alpha}) \right)dx \\ = (-1)^{n} \frac{(m)!}{(m-n)!} \int_0^{\infty} x^{m-n}(e^{-x}x^{n+\alpha})dx \\ = (-1)^{n} \frac{(m)!}{(m-n)!} (\alpha+m)!

When n > m > 0, k can only go to m. then,

\displaystyle \int_0^{\infty} x^{m} d \left(\frac{d^{n-1}}{dx^{n-1}}(e^{-x}x^{n+\alpha}) \right)dx \\ = (-1)^{m} (m)! \int_0^{\infty} \frac{d^{n-k}}{dx^{n-k}}(e^{-x}x^{n+\alpha})dx \\ = (-1)^{m} (m)!  \frac{d^{n-m-1}}{dx^{n-m-1}}(e^{-x}x^{n+\alpha}) = 0

For the similar result as F_{n-1}(x), the result is zero.

To summarize, the following formula suitable for all m > n || 0>m

\displaystyle \int_0^{\infty} x^{\alpha+m} e^{-x} L_n^\alpha(x) dx = (-1)^n (\alpha+m)! \begin{pmatrix} m \\ n \end{pmatrix}

To evaluate the integral,

\displaystyle \int_0^{\infty} x^{\alpha+k} e^{-x} (L_n^\alpha(x))^2 dx

We need the know that the Laguerre polynomial can be expressed as,

\displaystyle L_n^\alpha(x) = \sum_{i=0}^{n} (-1)^i \begin{pmatrix} n+\alpha \\ n-i \end{pmatrix} \frac{x^i}{i!}

Thus, we can evaluate

\displaystyle \int_0^{\infty} x^{\alpha+k} e^{-x} x^i L_n^\alpha(x) dx

Then combine everything,

\displaystyle \int_0^{\infty} x^{\alpha+k+i} e^{-x} L_n^\alpha(x) dx = (-1)^n (\alpha+i+k)! \begin{pmatrix} i+k \\ n \end{pmatrix}

put inside the series expression,

\displaystyle \int_0^{\infty} x^{\alpha+k} e^{-x} (L_n^\alpha(x))^2 dx = \sum_{i=0}^{n} (-1)^{i+n} \begin{pmatrix} n+\alpha \\ n-i \end{pmatrix} \begin{pmatrix} i+k \\ n \end{pmatrix} \frac{(\alpha+k+i)!}{i!}

In general, i + k \geq n || 0 > i+k .

For k = 0 , the sum only has 1 term for i = n

\displaystyle \int_0^{\infty} x^{\alpha} e^{-x} (L_n^\alpha(x))^2 dx = \frac{(\alpha+n)!}{n!}

Using the formula, for k=1, we have

\displaystyle \int_0^{\infty} x^{\alpha+1} e^{-x} (L_n^\alpha(x))^2 dx = \frac{(\alpha+n)!}{n!} (2n+\alpha +1)

To evaluate the expectation value of radial function

\displaystyle \langle R_{nl} | \frac{1}{r^m} | R_{nl} \rangle

We have to calculate the integral,

\displaystyle \int_0^\infty e^{-x} x^{2l+2-m} \left( L_{n-l-1}^{2l+1} (x) \right)^2 dx

For m = 0, we get the normalization constant,

\displaystyle A^2 \frac{na_0}{2Z} \int_0^{\infty} e^{-x} x^{2l+2} \left( L_{n-l-1}^{2l+1}(x) \right)^2 dx = 1

\displaystyle \langle \frac{1}{r} \rangle = \frac{1}{n^2} \frac{Z}{a_0}

\displaystyle \langle \frac{1}{r^2} \rangle = \frac{2}{n^3(2l+1)} \frac{Z^2}{a_0^2}

In general,

\displaystyle \langle \frac{1}{r^m} \rangle = \left(\frac{Z}{a_0}\right)^m \frac{2^{m-1}(n-l-1)!}{n^{m+1}(n+l)!} G_{nlm}

\displaystyle G_{nlm} = \sum_{i=0}^{n-l-1} (-1)^{n-l-1+i} \begin{pmatrix} n+l \\ n-l-1+i \end{pmatrix} \begin{pmatrix} 1-m+i \\ n-l-1 \end{pmatrix} \frac{(2l+2-m+i)!}{i!}

For the integral, for m \neq n

\displaystyle \int_0^{\infty} x^{\alpha +k} e^{-x} L_m^{\beta}(x) L_n^\alpha(x) dx

When k = 0, without lost of generality, when m < n , the integral is zero.

When k \neq 0, m<n

\displaystyle \int_0^{\infty} x^{\alpha +k} e^{-x} L_m^{\beta}(x) L_n^\alpha(x) dx \\ = \sum_{i=0}^{m} \frac{(-1)^i}{i!}\begin{pmatrix} m+\beta \\ m-i \end{pmatrix} \int_0^{\infty} x^{\alpha+k+i} e^{-x} L_n^\alpha(x) dx \\ = \sum_{i=0}^{m} \frac{(-1)^{i+n}}{i!}\begin{pmatrix} m+\beta \\ m-i \end{pmatrix} \begin{pmatrix} k+i \\ n \end{pmatrix} (\alpha+k+i)!

The sum only non-zero when m\geq i > n-k or -k > i \geq 0

This shows that the Laguerre polynomial is orthogonal with weighting e^{-x} x^{\alpha}.


Algorithm of Wavelet Transform (with Qt class)

Leave a comment

There are many kind of wavelet transform, and I think the names are quite confusing.

For instance, there are continuous and discrete wavelet transforms, in which, the “continuous” and “discrete” are for the wavelet parameters, not for the “data” itself. Therefore, for discrete data, there are “continuous” and “discrete” wavelet transforms, and for function, there are also “continuous” and “discrete” wavelet transforms.

In here, we will focus on discrete wavelet transform for function first. This discrete wavelet transform is also called as wavelet series, which express a compact support function into series of wavelet.

For simplicity, we also focus on orthonormal wavelet.

As the wavelet span the entire space, any compact function can be expressed as

\displaystyle f(t) = \sum_{j,k} \left<f(t)|\psi_{j,k}(t)\right> \psi_{j,k}(t)

\psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k)

where j, k are integer.

Now, we move to discrete data discrete wavelet transform. The data is discrete, we can imagine only t_n = t_0 + n \Delta points are known with finite n .

\displaystyle f_n = f(t_n) = \sum_{j,k} \left<f_n|\psi_{j,k}(t_n) \right> \psi_{j,k}(t_n)

the integration becomes a finite sum.

Without loss of generality, we can set t_0 = 0, \Delta = 1, and then the time axis becomes an integer number axis. We found that j  \leq 0 as the wavelet can only be expand, not shrink. Because there are finite number of data point, i.e. n < \infty, -Log_2(n) < j \leq 0 .

However, this double summation for each f_n is very time consuming. There is a Fast Discrete Wavelet Transform. Before we continuous, we must study the wavelet.

From the last post, we know that the scaling function that generate a MRA must be:

\displaystyle \phi(t) = \sum_{k} g_0(k) \phi(2t-k)

\left<\phi(t-k) | \phi(t-k') \right> = \delta_{kk'}

, where k are integer. The set of shifted scaling function span a space V_0 . For the wavelet,

\displaystyle \psi(t) = \sum_{k} g_1(k) \psi(2t-k)

\left<\psi(t-k) | \psi(t-k') \right> = \delta_{kk'}

The set of shifted wavelet span a space W_0, so that W_0 \perp V_0, so that

\left<\phi(t-k)|\psi(t-k') \right> = 0

Since the wavelet is generated from the scaling function, we expect the coefficient of g_0(k) and g_1(k) are related. In fact, the relationship for orthonormal scaling function and wavelet is

g_1(k) = (-1)^k g_0(1-k)

For discrete data x_i , it can be decomposed into the MRA space. We start by the largest V_0 space, where the wavelet is most shrunken.

\displaystyle x_i = \sum_{k} v_{0,k} \phi(i-k)

to decompose to the V_{-1} and W_{-1} space. We can use the nested property of the MRA space, \phi(2t) can be decomposed into \phi(t-k) and \psi(t-k) ,

\displaystyle \psi(2t-l) = \sum_{k} h_0(2k-l) \phi(t-k) + h_1(2k-l) \psi(t-k)

where (given that \phi(t) and $\latex \psi(t)$ are orthonormal ),

h_0(2k-l) = \left< \phi(2t-l) | \phi(t-k) \right>

h_1(2k-l) = \left< \phi(2t-l) | \psi(t-k) \right>

Therefore, using the coefficient of h_0 and h_1, the wavelet coefficient v_{0,k} can be decomposed to

\displaystyle v_{s-1,k} = \sum_{l} h_0(2k-l) v_{s,l}  

\displaystyle w_{s-1,k} = \sum_{l} h_1(2k-l) v_{s,l}

in graphic representation


This is a fast discrete wavelet transform.

Due to the nested space of MRA, we also expect that the coefficient h_0 and h_1 are related to g_0 . For orthonormal wavelet,

\displaystyle h_0(k) = \frac{1}{2} g_0(-k)

\displaystyle h_1(k) = \frac{1}{2} (-1)^{k} g_0 (k+1)

Since the g_0 is finite, the g_1, h_0, h_1 are all finite. That greatly reduce the computation cost of the discrete wavelet transform.

To reconstruct the discrete data x_i, we don’t need to use

\displaystyle v_{s+1,l} = \sum_{k} v_{s,k} \phi(l - k) + w_{s,k} \psi(l-k)

using the nested space of MRA, \psi(t) = \sum_{k} g_1(k) \psi(2t-k) ,

\displaystyle v_{s+1,l} = \sum_{k} g_0(l-2k) v_{s,k} + g_1(l-2k) w_{s,k}

in graphical representation,


I attached the wavelet transfrom class for Qt, feel free to modify.



in the code, the data did not transform to MRA space. The code treats the data already in the MRA space. Some people said this is a “crime”. But for the seek of “speed”, it is no need to map the original discrete data into MRA space. But i agree, for continuous function, we must map to MRA space.



Wavelet Analysis or MRA


Although the Fourier transform is a very powerful tool for data analysis, it has some limit due to lack of time information. From physics point of view, any time-data should live in time-frequency space. Since the Fourier transform has very narrow frequency resolution, according to  uncertainty principle, the time resolution will be very large, therefore, no time information can be given by Fourier transform.

Usually, such limitation would not be a problem. However, when analysis musics, long term performance of a device, or seismic survey, time information is very crucial.

To over come this difficulty, there a short-time Fourier transform (STFT) was developed. The idea is the applied a time-window (a piecewise uniform function, or Gaussian) on the data first, then FT. By applying the time-window on difference time of the data (or shifting the window), we can get the time information. However, since the frequency range of the time-window  always covers the low frequency, this means the high frequency  signal is hard to extract.

To improve the STFT, the time-window can be scaled (usually by 2). When the time window is shrink by factor of 2, the frequency range is expanded by factor of 2. If we can subtract the frequency ranges for the time-window and the shrink-time-window, the high frequency range is isolated.

To be more clear, let say the time-window function be

\phi_{[0,1)}(t) = 1 , 0 \leq t < 1

its FT is

\hat{\phi}(\omega) = sinc(\pi \omega)

Lets also define a dilation operator

Df(t) = \sqrt{2} f(2t)

the factor \sqrt{2} is for normalization.

The FT of D\phi(t) has smaller frequency range, like the following graph.


We can subtract the orange and blue curve to get the green curve. Then FT back the green curve to get the high-frequency time-window.

We can see that, we can repeat the dilation, or anti-dilation infinite time. Because of this, we can drop the FT basis Exp(-2\pi i t \omega), only use the low-pass time-window to see the low-frequency behaviour of the data, and use the high-pass time-window to see the high-frequency behaviour of the data. Now, we stepped into the Multi-resolution analysis (MRA).

In MRA, the low-pass time-window is called scaling function \phi(t) , and the high-pass time-window is called wavelet \psi(t).

Since the scaling function is craetd by dilation, it has the property

\phi(t) = \sum_{k} g_{0}(k) \phi(2t-k)

where k is integer. This means the vector space span by {\phi(t-k)}_{k}=V_0 is a subspace of the dilated space DV_0 =V_1. The dilation can be go one forever, so that the whole frequency domain will be covered by V_{\infty}.

Also, the space span by the wavelet, {\psi(t-k)}=W_0, is also a subspace of V_1. Thus, we can imagine the structure of MRA is:


Therefore, any function f(t) can also be expressed into the wavelet spaces. i.e.

f(t) = \sum_{j,k} w_{j,k} 2^{j/2}\psi(2^j t - k)

where j, k are integers.

I know this introduction is very rough, but it gives a relatively smooth transition from FT to WT (wavelet transform), when compare to the available material on the web.





Levenberg-Marquardt Algorithm

Leave a comment

In pervious post, we shows the Gauss-Newton method for fitting non-linear function. The disadvantage of that method is that the inverse matrix could be ill-defined. This makes the method unstable.

Back to the basic, we want to minimize the sum of square of residual (SSR). The SSR is,

SSR(\beta) = (Y - f(\beta))^T\cdot (Y-f(\beta))

The derivative on \beta,

\frac{d}{d\beta} SSR(\beta) = -2 (Y-f(\beta))^T \cdot \nabla f(\beta)

Many literatures denote \nabla f = J, which is the Jacobian. The second derivative of f is Hessian matrix H = \nabla^2 f \sim J^T\cdot J.

The Gradient Descent method is that ,

h = \beta - \beta_0 = \alpha J^T \cdot (Y - f(\beta_0))

where \alpha is a step size. The gradient descent changes the SSR using the steepest path. The step size \alpha has to be adjusted. The simplest way to adjust is testing the \delta = SSR(\beta_0 + h) - SSR(\beta_0). If \delta < 0 , the \alpha increases, else decreases. This method is slow but stable. It is slow because of finding the \alpha . It is stable because the method is always computable.

Thus, we have 2 methods, Gradient Descent is stable and slow, Gauss-Newton method is unstable but fast. Levenberg-Marquardt Algorithm combined this 2 methods so that it is stable and fast by solving,

(J^T \cdot J + \lambda I) h = J^T \cdot (Y - f)

where \lambda is an adjustable parameter. When \lambda >> 1 , the J^T\cdot J is neglectable and the method becomes Gradient Descent with small \alpha. When the \lambda << 1, the method becomes Gauss-Newton method.

Usually, the \lambda_0 is small. The Gauss-Newton method is very good near the minimum of SSR, while Gradient Descent is better far away.

When the \delta < 0, \lambda_{i+1} = \lambda_i / 10 , else \lambda_{i+1} = \lambda_i * 10. I don’t know the exact reason for this setting. In fact if you set oppositely, the method is still work in most cases.

The method add \lambda I on the J^T\cdot J , the inverse is always well-define. Therefore, this method is stable.

Non-linear Regression

Leave a comment

The fit equation is

Y = f(A) + \epsilon

We assume near Y , the curvy subspace of f(A) can be approximated by a plane.  This, using Taylor series,

Y = f(A_0) + F(A_0) \cdot (A - A_0)  + \cdots,

where F(A_0) is divergence of f(A) at A_0.

Using same technique in linear regression,

A - A_0 = (F(A_0)^T \cdot F(A_0))^{-1} \cdot F(A_0) \cdot ( Y-f(A_0))

With an initial guess, the interaction should approach to the best estimated parmeter \hat{A}.

The covariance is

Var(A) = \sigma^2 (F(A)^T \cdot F(A))^{-1}

The above method is also called Gauss-Newton method.

Multi-dimension Linear Regression

Leave a comment

In the field of science, collecting data and fitting it with model is essential. The most common type of fitting is 1-dimensional fitting, as there is only one independent variable. By fitting, we usually mean the least-squared method.

Suppose we want to find the n parameters in a linear function

f(x_1, x_2,\cdots, x_n) = \sum_{i=1} a_i x_i

with m observed experimental data

Y_j = f(x_{1j}, x_{2j}, \cdot, x_{nj} + \epsilon_j= \sum_{i=1} a_i x_{ij}+ \epsilon_j

Thus, we have a matrix equation

Y=X \cdot A + \epsilon

where Y is a m-dimensional data column vector, A is a n-dimensional parameter column vector, and X is a n-m non-square matrix.

In order to get the n parameter, the number of data m >= n. when m=n, it is not really a fitting because of degree-of-freedom is DF = m-n = 0, so that the fitting error is infinity.

The least square method in matrix algebra is like calculation. Take both side with transpose of X

X^T \cdot Y = (X^T \cdot X) \cdot A + X^T \cdot \epsilon

(X^T\cdot X)^{-1} \cdot X^T \cdot Y = A + (X^T \cdot X)^{-1} \cdot X^T \cdot \epsilon

Since the expectation of the \epsilon is zero. Thus the expected parameter is

A = (X^T \cdot X)^{-1} \cdot X^T \cdot Y

The unbiased variance is

\sigma^2 = (Y - X\cdot A)^T \cdot (Y - X\cdot A) / DF

where DF is the degree of freedom, which is the number of value that are free to vary. Many people will confuse by the “-1” issue. In fact, if you only want to calculate the sum of square of residual SSR, the degree of freedom is always m - n.

The covariance of the estimated parameters is

Var(A) = \sigma^2 (X^T\cdot X)^{-1}

This is only a fast-food notices on the linear regression. This has a geometrical meaning  that the matrix X is the sub-space of parameters with basis formed by the column vectors of X. Y is a bit out-side the sub-space. The linear regression is a method to find the shortest distance from Y to the sub-space X .

The from of the variance can be understood using Taylor series. This can be understood using variance in matrix notation Var(A) = E( A - E(A) )^T \cdot E(A  - E(A)) .




Older Entries