## Testing Hypothesis

Testing hypothesis may be the most used and most misunderstood statistics tool. When we do even a simple fitting, and want to evaluate the fitting result, we have to use the hypothesis testing. One common quantity used is the reduced chi-squared.

A hypothesis testing means given an observation and hypothesis, Is the hypothesis NOT true? right, hypothesis test never tell us the trueness  of the hypothesis, but the wrongness of it. The core of the test is “Can we reject the null hypothesis?

There are one-tailed and two-tailed testing, as a result, the p-value has different meanings.

https://en.wikipedia.org/wiki/One-_and_two-tailed_tests

https://en.wikipedia.org/wiki/P-value

The p-value is the probability that the model agree with the observation. when the p-value too small, smaller than the confident level, the null hypothesis Rejected. But if the p-value is very large, in a 1-tailed test, we cannot say the null hypothesis is true, but we can say the null hypothesis CANNOT be rejected.

In 2-tailed test, there are two p-values, corresponding to each tail.

https://en.wikipedia.org/wiki/Confidence_interval

https://en.wikipedia.org/wiki/Type_I_and_type_II_errors

## Variance and Sigma in finding resolution

I have been mis-understood for a while.

First of all, the variance of a distribution is not equal its sigma, except for Normal distribution.

In an observation, there should be an intrinsic variance, for example, some hole size, or physical windows. And there is a resolution from the detection. As a result, we observed an overall effect of the intrinsic variance and the detector resolution. In an data analysis, one of the goal is the find out the resolution.

Lets denote the random variable of the intrinsic variance is

$X \sim D(\mu, \sigma^2)$

and the resolution of the detection is an other random variable of Normal distribution,

$Y \sim N(0, s^2)$

Then, we observed,

$Z = X + Y \sim D(\mu, \sigma^2 + s^2)$,

according to the algebra of distribution.

If the $\sigma >> s$ and the the intrinsic distribution is NOT a gaussian, say, it is a uniform distribution, then, the observed distribution is NOT a gaussian. One why to get the resolution is to do de-convolution. Since we are not interesting on the intrinsic distribution but the resolution, thus we can simply use the variance of the intrinsic distribution and the variance of the observed distribution to extract the resolution.

When $\sigma <= s$, the observed distribution is mostly a gaussian-like. and we can approximate the observed variance as the squared-sigma of a gaussian fit.

for example, in deducing the time resolution using time-difference method with a help of a tracking, that a narrow width of the position was gated.

The narrow width of the position is equivalent to a uniform distribution of time-difference. Thus, the time resolution is deduced using the observed variance and the variance of the uniform distribution. For the width of the position is $\Delta X$, the width of the time difference is

$\Delta t = \Delta X / c/\beta$,

Thus,

$Var(resol.) = Var(obser.) - Var(\Delta t)$

The variance of an uniform distribution is 1/12 of the square of the width.

$Var(\Delta t) = (\Delta t)^2/ 12 \neq (\Delta t)^2$

The effect of the factor 1/12 is very serious when the resolution is similar with the width of the time difference. But can be neglected when $Var(resol.) >> Var( \Delta t)$.

Because of missing this 1/12 factor, the resolution will be smaller than actual resolution.

Here is an example, I generated 10,000 data, the intrinsic distribution is a uniform distribution from 0 to 100, the resolution is a gaussian of sigma of 20. The result distribution is a convolution of them.

When find resolution using projection from a 2-D distribution, we should be careful about the projected 1-D distribution and its variance. For example, a uniform 2-D disk projected to 1-D distribution, the 1-D distribution is not uniform but a half circle,

$pdf =f(x) =\sqrt{(r^2-x^2)}$

the variance is $(0.5 r)^2$.

the formula to calculate variance is

$Var(X) = \int x^2 f(x) dx / \int f(x) dx$,

where $f(x)$ is the pdf.

ToDo, estimate the error of the deduced resolution, with number of data. For small number of data, the error should be large. but how large?

## Goodness of Fit

We assumed each data point is taking from a distribution with mean $\mu$ and variance $\sigma^2$

$Y\sim D(\mu, \sigma^2)$

in which, the mean can be a function of X.

For example, we have a data $Y_i$, it has relation with an independent variable $X_i$. We would like to know the relationship between $Y_i$ and $X_i$, so we fit a function $y = f(x)$.

After the fitting (least square method), we will have so residual for each of the data

$e_i = y_i - Y_i$

This residual  should be follow the distribution

$e \sim D(0, \sigma_e^2)$

The goodness of fit, is a measure, to see the distribution of the residual, agree with the experimental error of each point, i.e. $\sigma$

Thus, we would like to divide the residual with $\sigma$ and define the chi-squared

$\chi^2 = (\sum (e_i^2)/\sigma_{e_i}^2 )$.

we can see, the distribution of

$e/\sigma_e \sim D(0, 1)$

and the sum of this distribution would be the chi-squared distribution. It has a mean of the degree of freedom $DF$. Note that the mean and the peak of the chi-squared distribution is not the same that the peak at  $DF-1$.

In the case we don’t know the error, then, the sample variance of the residual is out best estimator of the true variance. The unbiased sample variance is

$\sigma_s^2 = Var(e)/DF$,

where $DF$ is degree of freedom. In the cause of $f(x) = a x + b$, the $DF = n-1$, because there is 1 degree o freedom used in x. And because the 1  with the b is fixed, it provides no degree of freedom.

## Weighted mean and error

We have n values of $x_i$ and error $\sigma_i$,

With a weighting $w_i$, the uncorrelated weighted mean and error is

$X= \sum x_i w_i / \sum w_i$

$S^2 = \sum w_i^2 \sigma_i^2 / (\sum w_i)^2$

when combining data, the weighting is

$w_i = 1/\sigma_i^2$

and the weighted error becomes

$S^2 = \sum{\frac{1}{\sigma_i^2}} / (\sum{\frac{1}{\sigma_i^2}})^2 = 1 / \sum \frac{1}{\sigma_i^2}$

Example.

we measured a quantity n times, we can assume the intrinsic error of the data is fixed. Thus,

$w_i = 1/n$

$X = \sum x_i / n$

$S^2 = \sum \sigma_0^2/n^2 = \sigma_0^2 /n$

Therefore, when we take more and more data, the error is proportional to $1/\sqrt{n}$.

In normal distribution, the sample of size n, the estimator of the sample mean and sample variance are

$X =\sum x_i/n$

$S^2 = \sum (x_i-X)^2 / (n-1)$

Don’t mix up the sample variance and intrinsic error, although they are very similar.

To explain the formula of the weighted variance, we have to go to the foundation of the algebra of distribution.

For a random variable follow a distribution with mean $\mu$ and variance $\sigma^2$,

$X \sim D(\mu, \sigma^2)$

Another random variable built on it,

$Z=aX+b \sim D(a \mu + b, a^2\sigma^2)$

The adding of two independent random variables is

$Z=aX + bY \sim D(a \mu_X + b \mu_Y, a^2\sigma_X^2 + b^2 \sigma_Y^2)$

But there is a catch, when the $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$, The rule does not apply. But lets look back, if the mean and variance are the same, the two distribution does not really independent.

## Maximum Likelihood

In data analysis, especially the number of data is small, in order to found out the parameter of the distribution, which fit the data the best, maximum likelihood method is a mathematical tool to do so.

The ideal can be found in Wikipedia. For illustration, I generate 100 data points from a Gaussian distribution with mean = 1, and sigma = 2.

In Mathematica,

Data = RandomVariate[NormalDistribution[1, 2], 100]
MaxLikeliHood = Flatten[Table[{
mean,
sigma,
Log[Product[PDF[NormalDistribution[mean, sigma], Data[[i]]] // N, {i, 1,100}]],
},
{mean, -3, 3, 1}, {sigma, 0.5, 3.5, 0.5}], 1]



This calculate the a table of mean form -3 to 3, step 1, sigma from 0.5 to 3.5 step 0.5. To find the maximum of the LogProduct in the table:

maxL=MaxLikeliHood[[1 ;; -1, 3]];
Max[%]
maxN = Position[maxL[[1 ;; -1, 3]], %]
maxL[[Flatten[maxN]]]



The result is

{{1,2.,-217.444}}

which is the correct mean and sigma.

## estimate the mean reaction rate given that no event in time interval

The count follows Poisson Distribution:

$P(n|\lambda) = \frac{(\lambda T)^n}{n!} Exp(-\lambda T)$

this means, the probability of number of count, n, happened in time interval T given that the mean rate is $\lambda$.

now, we measured no count in time interval T, what is the mean rate for given confident interval?

if the count is zero, then the probability is:

$P(0|\lambda) = Exp(-\lambda T )$

and we can treat this as a new probability density function that, the probability of count rate is $\lambda$ in time interval T.

normalized this pdf.

$P(\lambda) = T Exp(-\lambda T )$

we can see, for $\lambda = 0$, the probability is 1. of course, if the count rate is 0, then no count is 100 %. therefore, we want to estimate the maximum count rate it will be to give “zero count “.

Now we have to introduce the Confident Interval (CL). this is the chance that the “interest” is true. in here, our interest is the “count rate is smaller then some value “. Thus, the total probability is:

$1- CL = \int_0^(\lambda_0) P(\lambda) d\lambda$

$\lambda = \frac{1}{T} ln(\frac{1}{CL})$

Lets give an example. suppose we count nothing in 10 second. what is the count rate for 95% Confident Interval?

plug in the equation and give $\lambda = 0.005$.

Now, we can see how the Confident Interval means. 95% means, in 100 measurement of 10 second interval, there is 95% that no count. thus, in 1000 second time interval, there is at most 5 count. which is same to say  $\lambda = 0.005$.

or, if we put the  $\lambda = 0.005$ into the Poisson distribution.

$P(0|0.005) = Exp(-0.05) = 0.951$

this is same that it has 95% give 0 count.

## On NMR noise reduction

there are many ways to reduce the signal-to-noise ratio in NMR.

here, we will show the simplest one – by measuring it many times.

the reason behind this is the different of statistic nature between signal and noise.

for signal, the correlation between each measurement is strong. or to say, if you measured the signal is around 10 this time, it will probable measure it around 10 next time.

however, the noise is different, it is random in nature. so, in average, it will equal to zero.

in mathematics, we use the standard deviation $\sigma$ and covariance $cov$ to show it.

$\sigma(S_T)=\sigma(\sum S_i) = \sqrt{ \sum ( \sigma( S_i)^2 + 2cov(S_i,S_j) ) }$

$\sigma(N_T)=\sigma(\sum N_i) = \sqrt{ \sum ( \sigma( N_i)^2 + 2cov(N_i,N_j) ) }$

since the signal should be correlated, $cov(S_i , S_j) = \sigma(S_i) \sigma(S_j)$ thus,

$\sigma(S_T) = n \sigma(S_i)$

but the noise should have no correlation, $cov(N_i, N_j) = 0$

$\sigma(N_T) = \sqrt{n} \sigma(N_i)$

Thus, the signal-to-noise level will be

$\sigma( S/N _T ) = \sqrt{n} \sigma (S/N)$

For 100 times measurements, the signal-to-noise ratio will be 10 times stronger.