Well, I'm very interested in this topic as well.
I can't tell you everything but a few things:
A Wavelet transform is a generalization/improvement to a Fourier Transform.
If you Fourier Transform a signal, you essentially turn time into frequency and frequency into time.
The uses of this are manifold. For one, it helps computers understand inputs the way humans do (what we hear is essentially processed by a Fourier transform to group information and such).
It can be used to compress data (some data corresponds to a function that's hard to calculate but has a fairly simple Fourier representation), and it's frequently used to prove things in maths and physics.
Overall, wavelets have already helped in a LOT of applications and continue to do so.
However, BEFORE a Fourier transform you only know when you reach what value; AFTER it, you only know, essentially, how often you reach what value.
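To make that loss of time information concrete, here is a minimal sketch with NumPy. The sampling rate and the two tone frequencies are values I picked for illustration. Two signals contain the same two tones in opposite order, and their magnitude spectra come out the same, so the spectrum alone cannot tell you which tone came first.

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (assumed for this sketch)
t = np.arange(fs) / fs         # one second of samples
half = len(t) // 2

low = np.sin(2 * np.pi * 50 * t[:half])    # 50 Hz tone lasting 0.5 s
high = np.sin(2 * np.pi * 120 * t[:half])  # 120 Hz tone lasting 0.5 s

a = np.concatenate([low, high])  # low tone first, then high tone
b = np.concatenate([high, low])  # high tone first, then low tone

# Magnitude spectra: "how often", with the "when" thrown away
spec_a = np.abs(np.fft.rfft(a))
spec_b = np.abs(np.fft.rfft(b))
freqs = np.fft.rfftfreq(len(a), 1 / fs)

# The two strongest frequency bins of each signal
peaks_a = np.sort(freqs[np.argsort(spec_a)[-2:]])
peaks_b = np.sort(freqs[np.argsort(spec_b)[-2:]])

print("same magnitude spectrum:", np.allclose(spec_a, spec_b))
print("peaks of a:", peaks_a, "peaks of b:", peaks_b)
```

The order of the tones is not gone completely, it sits in the phase of the Fourier coefficients, but it is not readable from the magnitudes in any direct way.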
Some applications need BOTH kinds of information.
To provide that, wavelet transforms were introduced.
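Roughly speaking, those compare the function with a wavelet, a short wave packet that windows the function so that mostly a finite time interval contributes. (Strictly, multiplying by a window and then Fourier transforming is the short-time Fourier transform; a wavelet transform instead integrates the function against shifted and stretched copies of the wavelet, but the idea is very similar.)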
So to speak, it weights the function to a time-chunk.
If you transform THEN, you get a little less information about the frequencies of the WHOLE band, but you get most of the frequency information for the current time chunk, along with that time.
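The "weight to a time chunk, then transform" idea can be sketched like this. Note the assumptions: I use a Gaussian weight and a plain FFT, which strictly speaking is a short-time Fourier (Gabor) transform rather than a full wavelet transform, and the helper name `dominant_frequency_near` as well as all numbers are made up for the example.

```python
import numpy as np

fs = 1000                 # sampling rate in Hz (assumed)
t = np.arange(fs) / fs    # one second of samples

# 50 Hz during the first half second, 120 Hz during the second half
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 50 * t),
                  np.sin(2 * np.pi * 120 * t))

def dominant_frequency_near(signal, t, center, width, fs):
    """Weight the signal with a Gaussian around `center` (seconds),
    Fourier transform the result, and return the strongest frequency."""
    window = np.exp(-0.5 * ((t - center) / width) ** 2)
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return freqs[np.argmax(spectrum)]

early = dominant_frequency_near(signal, t, center=0.25, width=0.05, fs=fs)
late = dominant_frequency_near(signal, t, center=0.75, width=0.05, fs=fs)
print("around 0.25 s:", early, "Hz; around 0.75 s:", late, "Hz")
```

The two calls should report roughly 50 Hz and 120 Hz. Making `width` smaller pins down the time better but smears the frequency peak, which is the trade-off described above.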
In a discrete wavelet transform, this is done at a fixed, discrete set of times and scales.
In a continuous wavelet transform, it's done for continuous time (and scale).
Waves, as you might know, suffer from the frequency/time dualism.
You can only have so much information about frequency OR time.
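Here is a small numerical illustration of that trade-off, as a sketch under my own assumptions: the pulse is a Gaussian, the spreads are standard deviations of the energy distributions in time and in frequency, and the function name `spreads` is hypothetical. For a Gaussian the product of the two spreads should land near 1/(4*pi), which is the smallest value the uncertainty relation allows.

```python
import numpy as np

def spreads(width, fs=1000.0, duration=8.0):
    """Return (time spread, frequency spread) of a Gaussian pulse.

    `width` is the Gaussian's standard deviation in seconds. Both spreads
    are standard deviations of the normalized energy distributions.
    """
    n = int(fs * duration)
    t = (np.arange(n) - n / 2) / fs           # time axis centered on 0
    pulse = np.exp(-0.5 * (t / width) ** 2)

    p_t = pulse ** 2                          # energy distribution in time
    p_t /= p_t.sum()
    sigma_t = np.sqrt(np.sum(p_t * t ** 2))   # mean is 0 by symmetry

    f = np.fft.fftfreq(n, 1 / fs)
    p_f = np.abs(np.fft.fft(pulse)) ** 2      # energy distribution in frequency
    p_f /= p_f.sum()
    sigma_f = np.sqrt(np.sum(p_f * f ** 2))   # mean is 0 by symmetry
    return sigma_t, sigma_f

for width in (0.05, 0.2):
    sigma_t, sigma_f = spreads(width)
    print(f"width={width}: sigma_t={sigma_t:.4f} s, sigma_f={sigma_f:.4f} Hz, "
          f"product={sigma_t * sigma_f:.4f}")
print("lower bound 1/(4*pi):", 1 / (4 * np.pi))
```

The narrow pulse is sharp in time and broad in frequency, the wide pulse is the other way round, and the product stays put.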
Normally, you plot functions in the time domain.
Some (in fact very many) problems are easier to solve (or only solvable in the first place) if you look at the frequency domain - that led to the Fourier transform.
The wavelet transform is essentially somewhere in between, giving you "the best of both worlds": you get a bit of frequency information and a bit of time information. How much of each mostly depends on your choice of wavelet basis.
Famous examples:
Haar Wavelet:
http://en.wikipedia.org/wiki/Haar_wavelet - the simplest and most used one of them all. Afaik exclusively discrete wavelet Transform (DWT)
Shannon Wavelet:
http://en.wikipedia.org/wiki/Shannon_wavelet - the dual of the Haar wavelet, based on the sinc function. Afaik it's not used much because it gives too little time information. Or something along those lines.
Mexican Hat
http://en.wikipedia.org/wiki/Mexican_hat_wavelet (apparently the image got removed) If I'm not mistaken, it can be used for both DWT and Continuous Wavelet Transform (CWT)
Gabor Filter
http://en.wikipedia.org/wiki/Gabor_filter - the "ideal" wavelet in the sense that it satisfies the time-frequency uncertainty relation (the Gabor limit) with equality, i.e. it reaches the highest possible joint accuracy in frequency and time. Mostly used for CWT but I've seen DWT applications for it too.
Morlet Wavelet
http://en.wikipedia.org/wiki/Morlet_wavelet - nearly the same as Gabor. Slightly simpler and thus slightly more used, despite being slightly less useful (afaik, I could easily be wrong there). CWT and probably also DWT.
Daubechies Wavelet
http://en.wikipedia.org/wiki/Daubechies_wavelet - I think mostly DWT. The weirdest one of them all to me.
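Since the Haar wavelet is the simplest of the list, here is a minimal sketch of one level of a Haar DWT in plain NumPy. The function names `haar_step` and `haar_step_inverse` are my own, and I assume an even-length input and the orthonormal 1/sqrt(2) scaling. Each pair of neighbouring samples is turned into a scaled sum (coarse "approximation") and a scaled difference ("detail").

```python
import numpy as np

def haar_step(x):
    """One level of the Haar DWT for an even-length 1-D signal.

    Returns (approx, detail): scaled sums and differences of neighbouring
    sample pairs.
    """
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def haar_step_inverse(approx, detail):
    """Undo haar_step and rebuild the original samples."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# A flat signal with one abrupt jump inside the third pair of samples
x = np.array([4.0, 4.0, 4.0, 4.0, 9.0, 1.0, 4.0, 4.0])
approx, detail = haar_step(x)

print("approx:", approx)
print("detail:", detail)
print("reconstruction ok:", np.allclose(haar_step_inverse(approx, detail), x))
```

The detail coefficients are zero wherever the signal is flat and non-zero only for the pair containing the jump, so you learn both that something "high frequency" happened and roughly where. Applying `haar_step` again to `approx` would give the next, coarser level.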
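Essentially, if I got it right, you'd multiply a given function by a shifted and stretched copy of one of the above functions and integrate the product, once for every shift and stretch you care about. (Multiplying by a window and then applying a Fourier transform to the result is the closely related short-time Fourier transform.)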
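A minimal sketch of that for the continuous case, assuming the usual formulation in which the signal is correlated with shifted and scaled copies of the wavelet (so there is no separate Fourier transform step). I use the Mexican hat shape without its conventional normalization constant, and the names `mexican_hat` and `cwt_row` plus all numbers are mine, not from any library.

```python
import numpy as np

def mexican_hat(t, scale):
    """Mexican hat shaped wavelet stretched by `scale` (seconds).
    The usual constant prefactor is left out for simplicity."""
    x = t / scale
    return (1 - x ** 2) * np.exp(-0.5 * x ** 2) / np.sqrt(scale)

def cwt_row(signal, scale, fs):
    """Wavelet coefficients at one scale for every time shift."""
    half = int(5 * scale * fs)              # cut the wavelet off at 5 scales
    t = np.arange(-half, half + 1) / fs
    kernel = mexican_hat(t, scale)
    # The wavelet is symmetric, so convolution equals correlation here.
    return np.convolve(signal, kernel, mode="same") / fs

fs = 1000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs
signal = np.zeros(fs)
signal[300] = 1.0              # a single spike at t = 0.3 s

row = cwt_row(signal, scale=0.01, fs=fs)
peak_time = t[np.argmax(np.abs(row))]
print("strongest response at t =", peak_time, "s")
```

Calling `cwt_row` for a range of scales and stacking the rows gives the familiar time-scale picture. The response is largest where the spike sits, which is the time information a plain Fourier transform would have hidden.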
Note that all those wavelets kind of look like waves that are limited in extent, or at least nearly so. Most (all but Haar) of my examples are technically infinitely wide. However, they drop off quickly (the Shannon wavelet being the exception, since it decays slowly), which makes the far-out parts of the functions way less important to the overall result and thus gives you more time information even after transforming into the frequency domain.
Edit: It sort of comes down to wave-particle duality
http://en.wikipedia.org/wiki/Wave%E2%80%93particle_duality where the time domain would be the trajectory of a particle that's perfectly localized while the frequency domain would be all you need to describe a totally unlocalized (as in infinite) wave.
So "normal" functions give you the particle picture, Fourier transforms give you the wave picture and wavelet transforms give you the "particle-and-wave" picture. In the case of your brain, "picture" can even be taken literally. Our retina and our brain essentially do a continuous Gabor wavelet transform to make sense of things. This allows us to do edge detection, which is very important in further tasks such as shape recognition and content grouping. (For instance, I can tell that you are human. I can tell that your nose is a nose. And I can tell that your nose is part of you. - Something that most AIs currently fail to do. They can EITHER identify full humans (or in most cases, only their faces) but not PARTS of the face, OR they can only recognize PARTS of the face (and then use that information to deduce the presence of a face) but not the whole face at once. This kind of grouping and subgrouping is currently the subject of lots of research, as far as I saw - there already ARE AIs that can recognize multiple subgroups though.)