## Goodness of Fit

We assume each data point is drawn from a distribution with mean $\mu$ and variance $\sigma^2$:

$Y\sim D(\mu, \sigma^2)$

where the mean can be a function of $X$.

For example, suppose we have data $Y_i$ that depend on an independent variable $X_i$. To find the relationship between $Y_i$ and $X_i$, we fit a function $y = f(x)$.

After the fit (e.g., by the least-squares method), each data point has a residual

$e_i = y_i - Y_i$, where $y_i = f(X_i)$ is the fitted value.

If the model is correct, the residuals should follow the distribution

$e \sim D(0, \sigma_e^2)$

The goodness of fit measures whether the spread of the residuals agrees with the experimental error $\sigma$ of each point.

We therefore divide each residual by its error and define the chi-squared

$\chi^2 = \sum_i \frac{e_i^2}{\sigma_{e_i}^2}$.

Each normalized residual then follows

$e/\sigma_e \sim D(0, 1)$

The sum of the squares of these normalized residuals, $\chi^2$, follows the chi-squared distribution (strictly, when $D$ is Gaussian). Its mean equals the number of degrees of freedom $DF$. Note that the mean and the peak of the chi-squared distribution are not the same: the peak is at $DF-2$ (for $DF \ge 2$).
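A quick numerical check of these two statements, in Mathematica (the language used for the example at the end of these notes). The choice of $k = 10$ degrees of freedom and of 20000 repetitions is an arbitrary assumption; sums of squared standard-normal draws stand in for the normalized residuals.

```mathematica
k = 10;  (* degrees of freedom, an arbitrary choice for this check *)

(* Sum of squares of k independent standard-normal draws, repeated 20000 times *)
chi2Samples = Table[Total[RandomVariate[NormalDistribution[0, 1], k]^2], {20000}];
Mean[chi2Samples]                                      (* close to k *)

(* Exact mean, and the location of the peak of the chi-squared PDF *)
Mean[ChiSquareDistribution[k]]                         (* exactly k *)
FindMaximum[PDF[ChiSquareDistribution[k], x], {x, k}]  (* peak at x = k - 2 *)
```

The simulated mean lands near 10, while the PDF peaks at 8, two units below the mean.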

If we don't know the error, the sample variance of the residuals is our best estimate of the true variance. The unbiased sample variance is

$\sigma_s^2 = \frac{1}{DF}\sum_i e_i^2$,

where $DF$ is the number of degrees of freedom: the number of data points $n$ minus the number of fitted parameters. For $f(x) = a x + b$, both $a$ and $b$ are fitted, so $DF = n-2$. If $b$ were fixed in advance instead of fitted, it would use up no degree of freedom and $DF = n-1$.
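A minimal sketch of the whole procedure in Mathematica, assuming hypothetical data: a straight line $y = 2x + 1$ sampled at 50 points with a known Gaussian error $\sigma = 0.5$. `LinearModelFit` performs the least-squares fit; the variable names (`xs`, `ys`, `e`, `dof`) are illustrative only.

```mathematica
(* Hypothetical data: y = 2 x + 1 plus Gaussian noise with a known error sigma = 0.5 *)
n = 50; sigma = 0.5;
xs = Range[n]/10.;
ys = 2 xs + 1 + RandomVariate[NormalDistribution[0, sigma], n];

(* Least-squares fit of f(x) = a x + b *)
fit = LinearModelFit[Transpose[{xs, ys}], x, x];
e = fit["FitResiduals"];

dof = n - 2;                           (* two fitted parameters, a and b *)
chi2 = Total[e^2/sigma^2]              (* should be of order dof *)
chi2/dof                               (* reduced chi-squared, close to 1 for a good fit *)

(* If sigma were unknown: unbiased estimate of the variance from the residuals *)
sigmaS2 = Total[e^2]/dof
sigmaS2 == fit["EstimatedVariance"]    (* LinearModelFit also divides by n - 2 *)
```

The last line is a cross-check: the error variance that `LinearModelFit` reports uses the same $n - 2$ denominator.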

## Weighted mean and error

Suppose we have $n$ values $x_i$ with errors $\sigma_i$.

With weights $w_i$, and assuming the $x_i$ are uncorrelated, the weighted mean and its variance are

$X= \sum x_i w_i / \sum w_i$

$S^2 = \sum w_i^2 \sigma_i^2 / (\sum w_i)^2$

When combining measurements, the usual choice is inverse-variance weighting, which minimizes $S^2$:

$w_i = 1/\sigma_i^2$

and the variance of the weighted mean becomes

$S^2 = \sum{\frac{1}{\sigma_i^2}} / (\sum{\frac{1}{\sigma_i^2}})^2 = 1 / \sum \frac{1}{\sigma_i^2}$
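A small numerical sketch of inverse-variance weighting, assuming three hypothetical measurements of the same quantity. It evaluates both the general formula for $S$ and the simplified one.

```mathematica
(* Hypothetical measurements of the same quantity and their errors *)
xs = {10.2, 9.8, 10.5};
sigmas = {0.2, 0.1, 0.4};

w = 1/sigmas^2;                         (* inverse-variance weights *)
X = Total[w xs]/Total[w]                (* weighted mean *)
S = Sqrt[Total[w^2 sigmas^2]]/Total[w]  (* error of the weighted mean, general formula *)
S == 1/Sqrt[Total[1/sigmas^2]]          (* same value as the simplified form *)
```

The weighted mean sits closest to 9.8, the most precise of the three measurements, and its error is smaller than any individual $\sigma_i$.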

Example.

Suppose we measure a quantity $n$ times and every measurement has the same intrinsic error $\sigma_0$. All weights are then equal, and we can take

$w_i = 1/n$

$X = \sum x_i / n$

$S^2 = \sum \sigma_0^2/n^2 = \sigma_0^2 /n$

Therefore, as we take more data, the error of the mean decreases as $1/\sqrt{n}$.
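The $1/\sqrt{n}$ scaling can be checked by simulation. This sketch assumes $\sigma_0 = 2$ and repeats each experiment 5000 times; `stdOfMean` is a hypothetical helper defined here.

```mathematica
sigma0 = 2;  (* assumed intrinsic error of a single measurement *)

(* Spread of the mean of n measurements, estimated from 5000 repeated experiments *)
stdOfMean[n_] := StandardDeviation[
   Table[Mean[RandomVariate[NormalDistribution[0, sigma0], n]], {5000}]];

(* Columns: n, simulated error of the mean, sigma0/Sqrt[n] *)
Table[{n, stdOfMean[n], N[sigma0/Sqrt[n]]}, {n, {4, 16, 64}}] // TableForm
```

Each fourfold increase in $n$ should roughly halve the simulated error of the mean.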

For a sample of size $n$ from a normal distribution, the estimators of the mean and variance are

$X =\sum x_i/n$

$S^2 = \sum (x_i-X)^2 / (n-1)$

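Don't confuse the sample variance with the intrinsic error, even though they look similar: $S^2$ here estimates the variance of a single measurement, $\sigma_0^2$, whereas the variance of the mean is $\sigma_0^2/n$, estimated by $S^2/n$.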
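A short sketch comparing these two estimators with Mathematica's built-in `Mean` and `Variance` (which divides by $n-1$), on hypothetical data drawn with mean 1 and sigma 2:

```mathematica
data = RandomVariate[NormalDistribution[1, 2], 100];
n = Length[data];

xbar = Total[data]/n;                  (* sample mean *)
s2 = Total[(data - xbar)^2]/(n - 1);   (* unbiased sample variance *)

(* The built-ins implement the same formulas *)
{xbar == Mean[data], s2 == Variance[data]}

(* s2 estimates the variance of ONE measurement (sigma0^2 = 4 here);
   the variance of the mean is s2/n, about 4/100 *)
{s2, s2/n}
```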

To explain the formula for the weighted variance, we need the basic rules for combining random variables.

For a random variable that follows a distribution with mean $\mu$ and variance $\sigma^2$,

$X \sim D(\mu, \sigma^2)$

a new random variable built from it by a linear transformation follows

$Z=aX+b \sim D(a \mu + b, a^2\sigma^2)$

For two independent random variables $X$ and $Y$, a linear combination follows

$Z=aX + bY \sim D(a \mu_X + b \mu_Y, a^2\sigma_X^2 + b^2 \sigma_Y^2)$
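Applying this rule repeatedly to $n$ independent values $x_i$ with variances $\sigma_i^2$, with coefficients $a_i = w_i / \sum_j w_j$, reproduces the weighted-variance formula of the previous section:

$S^2 = \mathrm{Var}\left(\sum_i \frac{w_i}{\sum_j w_j} x_i\right) = \sum_i \frac{w_i^2}{(\sum_j w_j)^2}\,\sigma_i^2 = \frac{\sum_i w_i^2 \sigma_i^2}{(\sum_i w_i)^2}$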

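There is a catch: the rule requires $X$ and $Y$ to be independent. It still holds when $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$ (for example, two independent measurements of the same quantity), because equal mean and variance do not make two variables dependent. It fails when $Y$ is the same random variable as $X$: then $X + X = 2X$, whose variance is $4\sigma_X^2$ rather than $2\sigma_X^2$.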
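A simulation sketch of these rules, assuming $\sigma = 2$ and $10^5$ draws; `xs` and `ys` are independent samples with the same mean and variance.

```mathematica
sigma = 2; m = 100000;  (* assumed sigma and number of draws *)
xs = RandomVariate[NormalDistribution[1, sigma], m];
ys = RandomVariate[NormalDistribution[1, sigma], m];  (* independent, same mean and variance *)

Variance[3 xs + 5]  (* a^2 sigma^2: about 36 *)
Variance[xs + ys]   (* independent: sigma^2 + sigma^2, about 8 *)
Variance[xs + xs]   (* same variable, X + X = 2 X: about 4 sigma^2 = 16, not 8 *)
```

Equal means and variances do not break the rule for `xs + ys`; only reusing the same draws (`xs + xs`) does.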

## Maximum Likelihood

In data analysis, especially when the number of data points is small, the maximum likelihood method is a mathematical tool for finding the parameters of the distribution that best fit the data. For independent data points, the likelihood is the product of the PDF evaluated at each point, and in practice we maximize its logarithm.

The idea is described on Wikipedia. For illustration, I generate 100 data points from a Gaussian distribution with mean 1 and sigma 2.

In Mathematica,

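```mathematica
Data = RandomVariate[NormalDistribution[1, 2], 100];
(* For each grid point {mean, sigma}, sum the log of the PDF over the data;
   summing logs equals the log of the product but avoids numerical underflow *)
MaxLikeliHood = Flatten[Table[{
     mean,
     sigma,
     Sum[Log[PDF[NormalDistribution[mean, sigma], Data[[i]]]], {i, 1, Length[Data]}]
     },
    {mean, -3, 3, 1}, {sigma, 0.5, 3.5, 0.5}], 1];
```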

This calculates the log-likelihood on a grid with the mean from -3 to 3 in steps of 1 and sigma from 0.5 to 3.5 in steps of 0.5. To find the grid point with the maximum log-likelihood:

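```mathematica
logL = MaxLikeliHood[[All, 3]];    (* the log-likelihood column *)
maxValue = Max[logL];
maxN = Position[logL, maxValue];   (* row(s) where the maximum occurs *)
MaxLikeliHood[[Flatten[maxN]]]     (* {mean, sigma, log-likelihood} at the maximum *)
```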

The result is

{{1,2.,-217.444}}

which recovers the true mean and sigma, to within the grid spacing.
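The grid search can only return grid points. A sketch of a continuous alternative, assuming the same kind of data: `FindMaximum` on `LogLikelihood`, compared with the closed-form Gaussian result and with `FindDistributionParameters`. The symbols `mu` and `sd` are illustrative names.

```mathematica
Clear[mu, sd];
Data = RandomVariate[NormalDistribution[1, 2], 100];

(* Continuous maximization of the log-likelihood instead of a grid search *)
FindMaximum[{LogLikelihood[NormalDistribution[mu, sd], Data], sd > 0},
  {{mu, 0}, {sd, 1}}]

(* For a Gaussian the maximum has a closed form: the sample mean and the
   standard deviation with a 1/n (not 1/(n-1)) denominator *)
{Mean[Data], Sqrt[Total[(Data - Mean[Data])^2]/Length[Data]]}

(* Built-in alternative for fitting distribution parameters *)
FindDistributionParameters[Data, NormalDistribution[mu, sd]]
```

Note that the maximum-likelihood sigma uses $1/n$, so it is slightly smaller than the square root of the unbiased $S^2$ from the weighted-mean section.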