Audible sounds are vibrations of the medium around us at frequencies within the range of human hearing. Sound waves always propagate through a physical, compressible medium, such as air. The air oscillates longitudinally, back and forth, and causes our tympanic membrane to vibrate in the same way. However, rather than noticing the differences in air pressure around us, we recognise the frequencies at which the vibrations occur. This already constitutes a natural level of abstraction between the information which is registered by our ear, the differences in air pressure, and the information which is perceived by our brain, the corresponding frequencies.
The same is the case for digitally recorded sound waves, the audio data, which can be used for digital sound analysis: they are air pressure values sampled at regularly spaced moments in time. The volume of the sound waves can be extracted from those data easily: it is defined by the maximum and minimum pressure values within a certain duration.
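As a rough illustration of this volume extraction, the following Python sketch computes the difference between the maximum and the minimum value in consecutive windows of a synthetic signal. The sample rate, the window size and the function name `peak_to_peak` are assumptions chosen for the example and do not come from any particular library.

```python
import math


def peak_to_peak(samples, window_size):
    """Return max - min for each consecutive, non-overlapping window."""
    volumes = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        volumes.append(max(window) - min(window))
    return volumes


SAMPLE_RATE = 8000  # samples per second (assumed for this example)

# One second of a 440 Hz tone: quiet in the first half, louder in the second.
samples = [
    (0.2 if n < SAMPLE_RATE // 2 else 0.8)
    * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]

# 800 samples correspond to 100 ms at the assumed sample rate.
volumes = peak_to_peak(samples, window_size=800)
print([round(v, 2) for v in volumes])
```

The first five windows report a smaller value than the last five, which mirrors the change in loudness halfway through the signal.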
In order to extract the frequency values from those data, a mathematical method called the Fourier Transform is used. Using the Fourier Transform, all the simultaneously occurring frequencies of a sound can be calculated. From those it is then possible to determine the base (fundamental) frequency of a tone together with its overtone spectrum and, based on that, any other information relating to the frequency aspects of the sound. The frequencies and volumes of a sound usually change over time, which is reflected in the analysis.
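A minimal sketch of this step, assuming a steady synthetic tone whose partials fall exactly onto the analysis frequencies, could look as follows. It uses a plain discrete Fourier Transform written out by hand; the names `dft_magnitudes`, `partials` and the threshold of 0.1 are hypothetical choices for the illustration. Taking the lowest detected frequency as the base frequency is a simplification that works for this test tone but not for every real sound.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin below the Nyquist bin.

    Scaled so that a sine of amplitude 1 yields a magnitude of 1.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(acc) * 2 / n)
    return magnitudes


SAMPLE_RATE = 1024  # Hz (assumed)
N = 512             # number of samples, so each bin is 2 Hz wide

# Test tone: base frequency 100 Hz with overtones at 200 Hz and 300 Hz.
partials = {100: 1.0, 200: 0.5, 300: 0.25}  # frequency in Hz -> amplitude
samples = [
    sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE)
        for f, a in partials.items())
    for t in range(N)
]

magnitudes = dft_magnitudes(samples)
bin_width = SAMPLE_RATE / N

# Keep only the bins that clearly carry energy (threshold is an assumption).
present = [k * bin_width for k, m in enumerate(magnitudes) if m > 0.1]
fundamental = min(present)
overtones = [f for f in present if f != fundamental]
print(fundamental, overtones)
```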
However, precisely defining a frequency requires data that remain constant for a certain length of time. Such data do not exist for a sound which is changing quickly, and as a consequence the Fourier Transform yields blurred results.
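The blurring can be observed in a small experiment. The sketch below compares a steady tone with a tone whose frequency glides upwards during the analysed stretch, and counts how many frequency bins carry at least half of the strongest magnitude. The signal length, the glide range and the helper `bins_above` are assumptions made for the illustration only.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Unscaled magnitude of each DFT bin below the Nyquist bin."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2)
    ]


def bins_above(magnitudes, fraction):
    """Count the bins whose magnitude exceeds a fraction of the maximum."""
    peak = max(magnitudes)
    return sum(1 for m in magnitudes if m > fraction * peak)


N = 256  # samples in the analysed stretch (assumed)

# Steady tone: exactly 20 cycles within the analysed stretch.
steady = [math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

# Gliding tone: frequency rises linearly from 10 to 30 cycles per stretch.
# The phase is the integral of the frequency: 10*x + 10*x**2 for x in [0, 1).
chirp = [
    math.sin(2 * math.pi * (10 * (t / N) + 10 * (t / N) ** 2))
    for t in range(N)
]

steady_spread = bins_above(dft_magnitudes(steady), 0.5)
chirp_spread = bins_above(dft_magnitudes(chirp), 0.5)
print(steady_spread, chirp_spread)
```

The steady tone concentrates in a single bin, whereas the gliding tone spreads over many bins, none of which can be called its frequency.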
In order to sharpen those data and to extract precise frequency information, a reassignment of the values to their closest peak can be applied. This reassignment requires a time window length to be chosen: a longer window enhances the focus on the harmonics of the sound, a shorter window the focus on its impulses. In a complex harmonic situation, such as in rapidly changing spectra or in human speech, we can choose how strongly to focus on either the harmonics or the impulses, but we are not able to precisely define both at the same time.
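The trade-off behind the choice of window length can be made concrete with some simple arithmetic, leaving the reassignment step itself aside. The sketch below assumes a plain discrete Fourier Transform over one window and a sample rate of 48000 Hz; the function name `resolutions` is hypothetical. In practice the effective resolution also depends on the shape of the window, which is ignored here.

```python
def resolutions(window_length, sample_rate):
    """Return (time span in seconds, bin spacing in Hz) for one window."""
    time_span = window_length / sample_rate    # how long one window lasts
    bin_spacing = sample_rate / window_length  # distance between DFT bins
    return time_span, bin_spacing


SAMPLE_RATE = 48000  # Hz (assumed)

for window_length in (256, 1024, 4096):
    time_span, bin_spacing = resolutions(window_length, SAMPLE_RATE)
    print(f"{window_length:5d} samples: "
          f"{time_span * 1000:6.2f} ms per window, "
          f"{bin_spacing:8.3f} Hz between bins")
```

Whatever window length is chosen, the product of the two values stays at 1: a window that lasts longer separates neighbouring harmonics more finely but smears impulses over a longer time, and a shorter window does the opposite.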