Although the Fourier transform (FT) is a very powerful tool for data analysis, it has a limitation: it carries no time information. From a physics point of view, any time series lives in time-frequency space. The Fourier basis functions have perfectly sharp frequencies, so, according to the uncertainty principle, their time resolution is infinitely poor: each one extends over the whole time axis. Therefore, the Fourier transform cannot tell us when a given frequency component occurs.

Usually, this limitation is not a problem. However, when analyzing music, the long-term performance of a device, or seismic surveys, time information is crucial.
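To make this concrete, here is a small Python sketch (the sample rate, note frequencies, and the helper names `dft` and `tone` are our own illustrative choices). Two "melodies" built from the same two notes played in opposite order have the same Fourier magnitude spectrum. Swapping the two halves is a circular shift, which only multiplies each Fourier coefficient by a phase, so the spectrum alone cannot say which note came first.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT using the basis exp(-2*pi*i*k*n/N)."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def tone(freq, n_samples, rate):
    """A pure sine at `freq` Hz, sampled `rate` times per second."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(n_samples)]


RATE = 64                      # samples per second (illustrative choice)
low = tone(4, RATE, RATE)      # 1 s of a 4 Hz "note"
high = tone(12, RATE, RATE)    # 1 s of a 12 Hz "note"

melody_a = low + high          # low note first, then high note
melody_b = high + low          # same notes, opposite order

mag_a = [abs(c) for c in dft(melody_a)]
mag_b = [abs(c) for c in dft(melody_b)]

# The signals differ, but their magnitude spectra agree up to rounding error.
print(max(abs(p - q) for p, q in zip(mag_a, mag_b)))
```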

To overcome this difficulty, the short-time Fourier transform (STFT) was developed. The idea is to first apply a time window (a piecewise-constant box function, or a Gaussian) to the data and then take the FT. By applying the window at different times (i.e. shifting the window along the data), we obtain time information. However, since the spectrum of the time window is always concentrated around the low frequencies, high-frequency signals are hard to extract.
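A minimal STFT sketch in Python, assuming a rectangular (box) window, non-overlapping frames, and a naive DFT (`stft`, `dft`, `tone`, and all parameter values are our own illustrative choices, not a library API). Applied to a signal whose frequency jumps from 4 Hz to 12 Hz halfway through, the per-window spectral peaks recover when the change happens.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT using the basis exp(-2*pi*i*k*n/N)."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def stft(x, width, hop, window=None):
    """Slide a window of `width` samples along x in steps of `hop`,
    multiply each segment by the window, and take its DFT.
    Defaults to a rectangular (box) window."""
    if window is None:
        window = [1.0] * width
    frames = []
    for start in range(0, len(x) - width + 1, hop):
        segment = [w * v for w, v in zip(window, x[start:start + width])]
        frames.append(dft(segment))
    return frames


RATE = 64  # samples per second (illustrative choice)


def tone(freq, n_samples):
    return [math.sin(2 * math.pi * freq * n / RATE) for n in range(n_samples)]


signal = tone(4, RATE) + tone(12, RATE)  # 4 Hz for 1 s, then 12 Hz for 1 s

WIDTH = 32  # a 0.5 s window, so DFT bins are RATE / WIDTH = 2 Hz apart
peaks = []
for i, frame in enumerate(stft(signal, WIDTH, hop=WIDTH)):
    mags = [abs(c) for c in frame[: WIDTH // 2 + 1]]  # non-negative frequencies
    peak_hz = max(range(len(mags)), key=mags.__getitem__) * RATE / WIDTH
    peaks.append(peak_hz)
    print(f"window starting at t = {i * WIDTH / RATE:.1f} s: peak at {peak_hz:.0f} Hz")
# Peaks: 4, 4, 12, 12 Hz, so the jump is located in the third window.
```

Shrinking `WIDTH` sharpens the time resolution but widens the bin spacing `RATE / WIDTH`; with one fixed window, the STFT cannot improve both at once.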

To improve on the STFT, the time window can be rescaled (usually by a factor of 2). When the time window shrinks by a factor of 2, its frequency range expands by a factor of 2. If we subtract the frequency range of the original window from that of the shrunken window, the high-frequency range is isolated.

To make this concrete, let the time-window function be

$\phi_{[0,1)}(t) = 1$ for $0 \leq t < 1$, and $0$ otherwise (written simply as $\phi(t)$ below);

its FT is

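$\hat{\phi}(\omega) = e^{-i\pi\omega}\,\mathrm{sinc}(\pi \omega)$, where $\mathrm{sinc}(x) = \sin(x)/x$, so that $|\hat{\phi}(\omega)| = |\mathrm{sinc}(\pi \omega)|$.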
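The transform can be checked numerically. Below is a Python sketch, assuming a simple midpoint-rule quadrature (`numeric_ft` and `phi_hat` are our own helper names). The exact result is $e^{-i\pi\omega}\,\mathrm{sinc}(\pi\omega)$ with $\mathrm{sinc}(x) = \sin(x)/x$; the phase factor appears because the window starts at $t = 0$ instead of being centred at the origin, and the magnitude vanishes at every nonzero integer $\omega$.

```python
import cmath
import math


def phi(t):
    """Box time window: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def numeric_ft(f, omega, a=-1.0, b=2.0, n=3000):
    """Midpoint-rule approximation of the integral of f(t) * exp(-2*pi*i*t*omega) over [a, b)."""
    h = (b - a) / n
    return h * sum(
        f(a + (m + 0.5) * h) * cmath.exp(-2j * math.pi * (a + (m + 0.5) * h) * omega)
        for m in range(n)
    )


def phi_hat(omega):
    """Closed form exp(-i*pi*omega) * sinc(pi*omega), with sinc(x) = sin(x)/x."""
    x = math.pi * omega
    sinc = 1.0 if x == 0 else math.sin(x) / x
    return cmath.exp(-1j * x) * sinc


for omega in (0.0, 0.5, 1.0, 1.5, 3.0):
    # Differences are small discretization errors only.
    print(omega, abs(numeric_ft(phi, omega) - phi_hat(omega)))
```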

Let's also define a dilation operator

$Df(t) = \sqrt{2} f(2t)$

where the factor $\sqrt{2}$ keeps the $L^2$ norm unchanged.
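A quick check of this normalization, as a Python sketch (`dilate` and `l2_norm` are our own helper names, and the integral is approximated with a midpoint rule). Substituting $u = 2t$ gives $\int |\sqrt{2} f(2t)|^2\,dt = \int |f(u)|^2\,du$, so each application of $D$ halves the width of the box, raises its height by $\sqrt{2}$, and leaves its norm at 1.

```python
import math


def phi(t):
    """Box time window: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def dilate(f):
    """The dilation operator D: (D f)(t) = sqrt(2) * f(2t)."""
    return lambda t: math.sqrt(2.0) * f(2.0 * t)


def l2_norm(f, a=-1.0, b=2.0, n=3000):
    """Midpoint-rule approximation of sqrt(integral of f(t)^2 over [a, b))."""
    h = (b - a) / n
    return math.sqrt(h * sum(f(a + (m + 0.5) * h) ** 2 for m in range(n)))


f = phi
for level in range(4):
    # The window gets narrower and taller, but its L2 norm stays 1.0.
    print(level, round(l2_norm(f), 12))
    f = dilate(f)
```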

The FT of $D\phi(t)$ covers a frequency range twice as wide, like the following graph.

We can take the difference of the orange and blue curves to get the green curve. Then we inverse-FT the green curve to obtain the high-frequency time window.
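Here is a numerical stand-in for the plot, as a Python sketch using exact closed forms for the FT of a box (`box_ft`, `phi_hat`, `d_phi_hat`, `high_pass_hat`, `high_pass_window`, and `haar_hat` are our own helper names). It checks that $\widehat{D\phi}(\omega) = \tfrac{1}{\sqrt{2}}\hat{\phi}(\omega/2)$ reaches twice as far in frequency. For the subtraction we assume the two curves are scaled to agree at $\omega = 0$, i.e. we subtract $\hat{\phi}(\omega)$ from $\hat{\phi}(\omega/2)$, the FT of $2\phi(2t)$. Since the FT is linear, transforming the difference back gives $2\phi(2t) - \phi(t)$, which is $+1$ on $[0, 1/2)$ and $-1$ on $[1/2, 1)$: the Haar wavelet.

```python
import cmath
import math


def box_ft(omega, width, height=1.0):
    """Exact FT (basis exp(-2*pi*i*t*omega)) of `height` on [0, width), 0 elsewhere."""
    if omega == 0:
        return complex(height * width)
    return height * (1 - cmath.exp(-2j * math.pi * omega * width)) / (2j * math.pi * omega)


def phi(t):
    """Box time window on [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def phi_hat(omega):
    """FT of phi."""
    return box_ft(omega, 1.0)


def d_phi_hat(omega):
    """FT of D phi = sqrt(2) * phi(2t): a box of width 1/2 and height sqrt(2)."""
    return box_ft(omega, 0.5, math.sqrt(2.0))


# At omega = 1 the spectrum of phi is zero, but that of D phi is not (about 0.45):
# the dilated window reaches twice as far in frequency.
print(abs(phi_hat(1.0)), abs(d_phi_hat(1.0)))


def high_pass_hat(omega):
    """Wide spectrum minus narrow spectrum, scaled so both equal 1 at omega = 0.
    phi_hat(omega / 2) is the FT of 2 * phi(2t)."""
    return phi_hat(omega / 2) - phi_hat(omega)


def high_pass_window(t):
    """Inverse FT of high_pass_hat; by linearity of the FT it is 2 * phi(2t) - phi(t)."""
    return 2 * phi(2 * t) - phi(t)


def haar_hat(omega):
    """Exact FT of the Haar wavelet (+1 on [0, 1/2), -1 on [1/2, 1))."""
    half = box_ft(omega, 0.5)
    return half - cmath.exp(-1j * math.pi * omega) * half


print([high_pass_window(t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)])  # [1.0, 1.0, -1.0, -1.0, 0.0]
# The subtracted spectrum equals the Haar wavelet's spectrum up to rounding error.
print(max(abs(high_pass_hat(w) - haar_hat(w)) for w in (0.0, 0.3, 1.0, 2.5, 7.0)))
```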

This dilation (or its inverse) can be repeated indefinitely. Because of this, we can drop the Fourier basis $e^{-2\pi i t \omega}$ altogether and use only the low-pass time window to examine the low-frequency behaviour of the data, and the high-pass time window to examine its high-frequency behaviour. This is the starting point of multi-resolution analysis (MRA).

In MRA, the low-pass time window is called the scaling function $\phi(t)$, and the high-pass time window is called the wavelet $\psi(t)$.

Since the scaling function is created by dilation, it satisfies the refinement (two-scale) relation

$\phi(t) = \sum_{k} g_{0}(k) \phi(2t-k)$

where $k$ runs over the integers. This means that the space $V_0$ spanned by $\lbrace\phi(t-k)\rbrace_{k}$ is a subspace of the dilated space $V_1 = DV_0$. The dilation can go on forever, so that in the limit $V_{\infty}$ the whole frequency domain is covered.
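For the box window, the refinement relation holds with $g_0(0) = g_0(1) = 1$ and all other coefficients zero: a unit box is exactly two half-width boxes side by side. The Python sketch below checks this on dyadic points, then illustrates the nesting $V_0 \subset V_1 \subset \cdots$. For the box, projecting onto $V_j$ means averaging over intervals of length $2^{-j}$, so a function already in $V_0$ is unchanged by every projection, while a smooth function is approximated better as $j$ grows. The fine sample grid, the interval $[0, 4)$, the test functions, and all helper names are our own choices.

```python
import math


def phi(t):
    """Box (Haar) scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


# Refinement coefficients g_0(k) for the box; all other k give zero.
G0 = {0: 1.0, 1: 1.0}


def refined(t):
    """Right-hand side of the refinement relation: sum_k g_0(k) * phi(2t - k)."""
    return sum(c * phi(2 * t - k) for k, c in G0.items())


dyadic_points = [m / 64 for m in range(-64, 128)]
print(all(phi(t) == refined(t) for t in dyadic_points))  # True

SAMPLES_PER_UNIT = 256  # fine grid standing in for continuous t (our choice)
T_MAX = 4               # work on the interval [0, 4)
grid = [i / SAMPLES_PER_UNIT for i in range(T_MAX * SAMPLES_PER_UNIT)]


def project(samples, j):
    """Project sampled data onto V_j by averaging over intervals of length 2**-j."""
    block = SAMPLES_PER_UNIT >> j  # samples per dyadic interval (valid for 0 <= j <= 8)
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# A function in V_0 (constant on each [k, k + 1)) is reproduced exactly by every
# finer projection, because V_0 is contained in V_1, V_2, ...
step = [[3.0, -1.0, 2.0, 0.0][int(t)] for t in grid]
step_errors = [rms_error(step, project(step, j)) for j in range(4)]
print(step_errors)  # [0.0, 0.0, 0.0, 0.0]

# A smooth function is only approximated, and the error shrinks as j grows.
smooth = [math.sin(2 * math.pi * t / T_MAX) for t in grid]
smooth_errors = [rms_error(smooth, project(smooth, j)) for j in range(7)]
print([round(e, 4) for e in smooth_errors])
```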

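Also, the space $W_0$ spanned by the wavelets $\lbrace\psi(t-k)\rbrace_{k}$ is a subspace of $V_1$; in fact, $W_0$ is the orthogonal complement of $V_0$ inside $V_1$, so $V_1 = V_0 \oplus W_0$. Repeating this at every scale, we can picture the structure of MRA as

$V_{j+1} = V_j \oplus W_j, \qquad V_J = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_{J-1}.$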
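For the box scaling function, the matching wavelet is the Haar wavelet $\psi(t) = \phi(2t) - \phi(2t-1)$, i.e. coefficients $(1, -1)$ in place of the $(1, 1)$ of $g_0$. The Python sketch below (helper names and the midpoint-rule inner product are our own choices) checks that $\psi$ is orthogonal to every $\phi(t-k)$, that its integer shifts are orthonormal, and that both half-width boxes spanning $V_1$ can be rebuilt from $\phi$ and $\psi$, which is what $V_1 = V_0 \oplus W_0$ means here.

```python
def phi(t):
    """Box (Haar) scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def psi(t):
    """Haar wavelet, built inside V_1: psi(t) = phi(2t) - phi(2t - 1)."""
    return phi(2 * t) - phi(2 * t - 1)


def shift(f, k):
    """Return the translate t -> f(t - k)."""
    return lambda t: f(t - k)


def inner(f, g, a=-3.0, b=4.0, n=7000):
    """Midpoint-rule approximation of the L2 inner product of f and g over [a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (m + 0.5) * h) * g(a + (m + 0.5) * h) for m in range(n))


# W_0 is orthogonal to V_0: <psi, phi(. - k)> is (numerically) 0 for every k.
print([round(inner(psi, shift(phi, k)), 9) for k in range(-2, 3)])
# The integer shifts of psi are orthonormal: 1 for k = 0, 0 otherwise.
print([round(inner(psi, shift(psi, k)), 9) for k in range(-2, 3)])

# V_1 = V_0 (+) W_0: both half-width boxes spanning V_1 come back from phi and psi.
grid = [m / 16 for m in range(-16, 32)]
print(all(
    phi(2 * t) == (phi(t) + psi(t)) / 2 and phi(2 * t - 1) == (phi(t) - psi(t)) / 2
    for t in grid
))  # True
```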

Therefore, any (square-integrable) function $f(t)$ can also be expanded in the wavelet spaces, i.e.

$f(t) = \sum_{j,k} w_{j,k} 2^{j/2}\psi(2^j t - k)$

where $j$ (the scale index) and $k$ (the shift index) are integers, and $w_{j,k}$ are the wavelet coefficients.
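In practice the coefficients are computed level by level with a pair of filters rather than by integrating against each $2^{j/2}\psi(2^j t - k)$. Below is a minimal Python sketch of a multi-level discrete Haar wavelet transform, assuming the orthonormal Haar filters (pairwise sums and differences divided by $\sqrt{2}$, matching the $\sqrt{2}$ in $D$) and a signal length divisible by $2^{\text{levels}}$; the function names are our own. The low-pass step plays the role of the scaling function and the high-pass step that of the wavelet. A short spike on a flat background shows up only in the detail coefficients whose time support covers it, which is exactly the time information the plain FT lacks, and the transform inverts exactly.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One analysis step: pairwise low-pass (sum) and high-pass (difference) outputs."""
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return approx, detail


def haar_dwt(x, levels):
    """Multi-level Haar DWT (len(x) must be divisible by 2**levels).
    Returns the coarsest approximation and the detail lists, finest scale first."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt by undoing each step, coarsest scale first."""
    x = list(approx)
    for detail in reversed(details):
        x = [v for a, d in zip(x, detail) for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return x


signal = [1.0] * 16
signal[11] = 5.0  # a short "event" on a flat background

approx, details = haar_dwt(signal, levels=4)
for j, detail in enumerate(details):
    # Only the coefficient whose time support covers sample 11 is non-zero:
    # level 1 [5], level 2 [2], level 3 [1], level 4 [0]
    print("level", j + 1, [k for k, c in enumerate(detail) if abs(c) > 1e-12])

rebuilt = haar_idwt(approx, details)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # rounding error only
```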

I know this introduction is very rough, but compared with the material available on the web, it gives a relatively smooth transition from the FT to the wavelet transform (WT).