Multi-dimension Linear Regression


In science, collecting data and fitting it with a model is essential. The most common type of fitting is 1-dimensional fitting, where there is only one independent variable. By fitting, we usually mean the least-squares method.

Suppose we want to find the n parameters in a linear function

f(x_1, x_2, \cdots, x_n) = \sum_{i=1}^{n} a_i x_i

with m observed experimental data points (j = 1, \cdots, m)

Y_j = f(x_{1j}, x_{2j}, \cdots, x_{nj}) + \epsilon_j = \sum_{i=1}^{n} a_i x_{ij} + \epsilon_j

Thus, we have a matrix equation

Y=X \cdot A + \epsilon

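where Y is an m-dimensional data column vector, A is an n-dimensional parameter column vector, and X is an m \times n matrix (non-square in general) whose j-th row holds the values x_{1j}, \cdots, x_{nj} of the j-th observation.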

In order to determine the n parameters, the number of data points must satisfy m >= n. When m = n, it is not really a fit: the degree of freedom is DF = m-n = 0, so the fitting error cannot be estimated (the variance defined below becomes 0/0).

The least-squares method in matrix algebra works like an ordinary calculation. Multiply both sides from the left by the transpose of X

X^T \cdot Y = (X^T \cdot X) \cdot A + X^T \cdot \epsilon

(X^T\cdot X)^{-1} \cdot X^T \cdot Y = A + (X^T \cdot X)^{-1} \cdot X^T \cdot \epsilon

Since the expectation of \epsilon is zero, the expected parameter is

A = (X^T \cdot X)^{-1} \cdot X^T \cdot Y

The unbiased variance is

\sigma^2 = (Y - X\cdot A)^T \cdot (Y - X\cdot A) / DF

where DF is the degree of freedom, which is the number of values that are free to vary. Many people are confused by the “-1” issue. In fact, for the sum of squared residuals (SSR), the degree of freedom is always m - n, the number of data points minus the number of fitted parameters.

The covariance of the estimated parameters is

Var(A) = \sigma^2 (X^T\cdot X)^{-1}
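The formulas above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper names (`transpose`, `matmul`, `inverse`, `fit_linear`) and the sample data are made up for this note, and the Gauss-Jordan inverse is fine for a small, well-conditioned X^T \cdot X but is not meant for serious numerical work.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    # Plain matrix product of two lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_linear(X, Y):
    """X: m rows of n values, Y: m values. Returns (A, sigma2, cov). Needs m > n."""
    m, n = len(X), len(X[0])
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    # A = (X^T X)^{-1} X^T Y
    A = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[y] for y in Y]))]
    residuals = [y - sum(a * x for a, x in zip(A, row)) for row, y in zip(X, Y)]
    # sigma^2 = SSR / DF with DF = m - n
    sigma2 = sum(r * r for r in residuals) / (m - n)
    # Var(A) = sigma^2 (X^T X)^{-1}
    cov = [[sigma2 * v for v in row] for row in XtX_inv]
    return A, sigma2, cov

# Hypothetical data, roughly Y = 2 x_1 + 3 x_2 with small errors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [2.1, 2.9, 5.2, 6.9]
A, sigma2, cov = fit_linear(X, Y)
```

The diagonal of `cov` gives the variance of each fitted parameter, and the off-diagonal entries give their covariance.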

This is only a fast-food note on linear regression. It has a geometrical meaning: the column vectors of X form the basis of a sub-space, and X \cdot A is a point in that sub-space. Y lies a bit outside the sub-space. Linear regression is a method to find the point of the sub-space with the shortest distance to Y.
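The shortest-distance picture can be written out explicitly. As a short sketch (the symbol \hat{Y} for the closest point is introduced here and is not part of the notes above): the closest point of the sub-space is the one whose residual is perpendicular to every column of X, which, assuming X^T \cdot X is invertible, gives back the same estimate of A.

```latex
X^T \cdot (Y - X \cdot A) = 0
\quad \Rightarrow \quad
A = (X^T \cdot X)^{-1} \cdot X^T \cdot Y ,
\qquad
\hat{Y} = X \cdot A = X \cdot (X^T \cdot X)^{-1} \cdot X^T \cdot Y
```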

The form of the variance can be understood using a Taylor series. It can also be understood using the variance in matrix notation, Var(A) = E( (A - E(A)) \cdot (A - E(A))^T ) .
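A sketch of the matrix-notation argument, under the assumption (not stated above) that the errors are independent with equal variance, E(\epsilon \cdot \epsilon^T) = \sigma^2 I. Here \hat{A} denotes the estimated parameters and A the true ones; the hat is added only to tell the two apart.

```latex
\hat{A} - E(\hat{A}) = (X^T \cdot X)^{-1} \cdot X^T \cdot \epsilon

Var(\hat{A})
= E\left( (\hat{A} - E(\hat{A})) \cdot (\hat{A} - E(\hat{A}))^T \right)
= (X^T \cdot X)^{-1} \cdot X^T \cdot E(\epsilon \cdot \epsilon^T) \cdot X \cdot (X^T \cdot X)^{-1}
= \sigma^2 (X^T \cdot X)^{-1}
```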





Goodness of Fit


We assume each data point is taken from a distribution with mean \mu and variance \sigma^2

Y\sim D(\mu, \sigma^2)

in which the mean can be a function of X.

For example, we have data Y_i that is related to an independent variable X_i. We would like to know the relationship between Y_i and X_i, so we fit a function y = f(x).

After the fitting (least-squares method), we will have a residual for each data point, with y_i = f(X_i) the fitted value,

e_i = y_i - Y_i

This residual should follow the distribution

e \sim D(0, \sigma_e^2)

The goodness of fit is a measure of whether the distribution of the residuals agrees with the experimental error of each point, i.e. \sigma .

Thus, we divide each residual by its \sigma and define the chi-squared

\chi^2 = \sum_i e_i^2 / \sigma_{e_i}^2 .
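As a minimal sketch of this definition (the function name `chi_squared` and the numbers are hypothetical, chosen only for illustration), each residual is divided by the error of its own point before squaring and summing:

```python
def chi_squared(fitted, observed, sigmas):
    """Sum over points of (e_i / sigma_i)^2, with e_i = y_i - Y_i."""
    if not (len(fitted) == len(observed) == len(sigmas)):
        raise ValueError("all inputs must have the same length")
    return sum(((y - Y) / s) ** 2 for y, Y, s in zip(fitted, observed, sigmas))

# Hypothetical fitted values y_i, data Y_i and errors sigma_i.
fitted = [1.0, 2.0, 3.0]
observed = [1.5, 2.0, 2.0]
sigmas = [0.5, 1.0, 0.5]
chi2 = chi_squared(fitted, observed, sigmas)
```

A point with a large error contributes little to the sum even if its residual is large, which is the reason for dividing by \sigma_{e_i} point by point.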

We can see that the distribution of

e/\sigma_e \sim D(0, 1)

has zero mean and unit variance. If D is a normal distribution, the sum of the squares of these terms follows the chi-squared distribution. Its mean is the degree of freedom DF. Note that the mean and the peak of the chi-squared distribution are not the same: the peak is at DF-2 (for DF >= 2).
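The difference between the mean and the peak can be checked numerically. The sketch below evaluates the standard chi-squared density on a grid; the helper names and the choice DF = 5 are assumptions made for this example. For k >= 2 degrees of freedom the closed-form peak of the density is at k - 2, while the mean is k.

```python
import math

def chi2_pdf(x, k):
    """Chi-squared probability density with k degrees of freedom, for x > 0."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def mean_and_peak(k, x_max=100.0, step=0.001):
    """Estimate the mean (midpoint sum) and the peak position on a uniform grid."""
    xs = [step * (i + 0.5) for i in range(int(x_max / step))]
    pdf = [chi2_pdf(x, k) for x in xs]
    mean = sum(x * p for x, p in zip(xs, pdf)) * step
    peak = xs[max(range(len(xs)), key=pdf.__getitem__)]
    return mean, peak

mean, peak = mean_and_peak(5)  # mean close to 5, peak close to 3
```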

In the case we don’t know the error, the sample variance of the residuals is our best estimator of the true variance. The unbiased sample variance is

\sigma_s^2 = \sum_i e_i^2 / DF ,

where DF is the degree of freedom. In the case of f(x) = a x + b fitted to n data points, DF = n-2: one degree of freedom is used by a, the coefficient of x, and one by b, the coefficient of the constant 1.

on angular momentum adding & rotation operator


Angular momentum comes in 2 kinds: orbital angular momentum L , which comes from a particle executing orbital motion in 3-dimensional space, and spin S , which is an internal degree of freedom, as if the particle were “orbiting” on the spot.

Thus, a general quantum state for a particle should not contain just the spatial part and the time part, but also the spin, since a complete state should contain all degrees of freedom.

\left| \Psi \right> = \left| x,t \right> \bigotimes \left| s \right>

When we “add” the orbital angular momentum and the spin together, we are actually doing:

J = L \bigotimes 1 + 1 \bigotimes S

where the 1 with L is the identity of the spin-space and the 1 with S is the identity of the 3-D space.
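The tensor-product sum can be made concrete with small matrices. The sketch below is only an illustration: it assumes orbital l = 1 (3x3 matrices) and spin 1/2 (2x2 matrices) in units of \hbar, and the helper names (`kron`, `identity`, `add`, `matmul`, `commutator`) are made up for this note.

```python
def kron(A, B):
    """Kronecker (tensor) product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)]

s = 2 ** -0.5
# Example operators in units of hbar: l = 1 for L, spin 1/2 for S.
Lz = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]
Lx = [[0, s, 0], [s, 0, s], [0, s, 0]]
Sz = [[0.5, 0], [0, -0.5]]
Sy = [[0, -0.5j], [0.5j, 0]]

# J_z = L_z (x) 1 + 1 (x) S_z acts on the 6-dimensional product space.
Jz = add(kron(Lz, identity(2)), kron(identity(3), Sz))
jz_values = [Jz[i][i] for i in range(6)]

# In this construction, L_x (x) 1 and 1 (x) S_y act on different factors.
comm = commutator(kron(Lx, identity(2)), kron(identity(3), Sy))
max_comm = max(abs(v) for row in comm for v in row)
```

The diagonal of `Jz` lists the sums m_l + m_s, and `max_comm` comes out as zero up to rounding, because (L \bigotimes 1)(1 \bigotimes S) and (1 \bigotimes S)(L \bigotimes 1) are both equal to L \bigotimes S.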

The above is discussed in J. J. Sakurai’s book.

The mathematics of L and S is completely the same as far as the rotation operator is concerned.

R_J (\theta) = Exp( - \frac {i}{\hbar} \theta J)

where J can be either L or S.

L can only act on the spatial state, while S can only act on the spin-state, i.e.:

R_L(\theta) \left| s \right> = \left| s\right>

R_S(\theta) \left| x \right> = \left| x\right>

L_z can only take integral values (in units of \hbar ), but S_z can take both half-integral and integral values. A half-integral value of S_z means the spin-state has to be rotated through 2 full cycles in order to be the same again.
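The 2-cycle statement follows directly from the rotation operator in the eigenbasis of J_z, where it is diagonal with entries exp(-i \theta m). A minimal sketch, with m in units of \hbar and a hypothetical helper name `rotation_phases`:

```python
import cmath
import math

def rotation_phases(m_values, theta):
    """Diagonal entries of exp(-i * theta * J_z / hbar) in the J_z eigenbasis."""
    return [cmath.exp(-1j * theta * m) for m in m_values]

spin_half = [0.5, -0.5]   # S_z eigenvalues for spin 1/2
orbital_l1 = [1, 0, -1]   # L_z eigenvalues for l = 1

one_turn_spin = rotation_phases(spin_half, 2 * math.pi)      # entries close to -1
two_turns_spin = rotation_phases(spin_half, 4 * math.pi)     # entries close to +1
one_turn_orbital = rotation_phases(orbital_l1, 2 * math.pi)  # entries close to +1
```

After one full turn the spin-1/2 state picks up an overall factor of -1, and only the second turn brings it back, while the integral-m state is already back after one turn.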

Now suppose the difference between L and S is just man-made, and the degree of freedom in the spin-space actually comes from some real geometry in a higher dimension. Then the orbital angular momentum could actually change the spin state:

L \left| s \right> = \left | s' \right > = c \left| s \right>

The effect is very small, and the rotation only gives a phase:

R_L (\theta) \left| s\right > = Exp( - \frac {i}{\hbar} \theta c )\left| s \right>

c is very small, but if we can rotate the state through a very large angle, its effect can be seen by comparing with the rotation by spin.

\left< R_L(\omega t) + R_S(\omega t) \right> = 2 \left( 1 + \cos( \omega ( c - 1 ) t ) \right)
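One way to read this expression, offered as an assumption of this note rather than something stated above: treat the two rotations as pure phase factors, with the spin eigenvalue and \hbar set to 1 and c real, and take the squared modulus of their sum.

```latex
\left| e^{-i c \omega t} + e^{-i \omega t} \right|^2
= 2 + e^{-i (c-1) \omega t} + e^{ i (c-1) \omega t}
= 2 \left( 1 + \cos( \omega (c-1) t ) \right)
```

The beat frequency \omega (c-1) is what a long measurement would be sensitive to.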

The experiment can be done as follows. We apply a rotating magnetic field at the Larmor frequency. At a very low temperature, the spin is isolated and T_1 and T_2 are effectively \infty . The difference due to c will show up in a very long measurement, as an interference pattern.

If c is a complex number, it will cause a decay, which will be reflected in the interference pattern.

If we find this c, then we can reveal the other spatial dimension!


The problem is: how can we act the orbital angular momentum on the spin without the effect of the spin angular momentum, since L and S are always coupled?

One possibility is to make S zero: in a system of an electron and a positron, the total spin can be zero.

Another possibility is to act S on the spatial part, which would change the energy level.


A more fundamental problem is: why do L and S commute? The possibility of writing

\left| \Psi \right> = \left| x,t \right> \bigotimes \left| s \right>

is due to the operators commuting with each other. But why?

If we break L down into the position operator x and the momentum operator p, the question becomes: why do x and S commute, or p and S commute?

[x,S]=0 ?

[p,S]=0 ?

[p_x, S_y] \ne 0 ?

I will prove it later.


Another problem is how to evaluate the Poisson bracket, since L and S do not act on spaces of the same dimension. Maybe we can write the eigenket in vector form:

\begin {pmatrix} \left|x, t \right> \\ \left|s\right> \end {pmatrix}

I am not sure.



Any vector operator must satisfy the following equation, due to rotation symmetry.

[V_i, J_j] = i \hbar V_k , \quad (i, j, k) cyclic


where J is the generator of rotation. But I am not sure whether it is restricted to real-space rotation. Anyway, spin is a vector operator, thus

[S_x, L_y] = i \hbar S_z = - [S_y, L_x]

So, if the relation holds with J = L, then L and S do not commute.