As explored previously, I have been looking at the analysis features of the Minim library, which allow Processing to determine certain features and variables within a sound file. I did this through a feature called a beat listener, which relayed three outputs to Processing that could then be manipulated. After exploring this feature further, I found that there was no real musical relation to what the library was detecting; it was more a pitch that was being recognised.
Another method, which I haven’t explored yet, is frequency analysis. The Minim library provides an example of this in which log averages are used to divide the audio into separate parts. It’s described as:
- “FFT you will get a frequency domain described by two arrays that are each 1024 values long”
- “Each point of the FFT describes the spectral density of a frequency band centred on a frequency that is a fraction of the sampling rate.”
- “Given a sample buffer of 1024 samples that were sampled at 44100 Hz, a 1024 point FFT will give us a frequency spectrum of 512 points, with a total bandwidth of 22050 Hz”
- “We’ve got an FFT with 512 spectrum values, but we want to represent the spectrum as 32 bands, so we’ve decided to simply group together frequency bands by averaging their spectrum values.”
- “Each point i in the FFT represents a frequency band centred on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz, with the exception of spectrum[0] and spectrum[512], whose bandwidth is 1/1024 * 22050 = 21.5332031 Hz.”
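To check the arithmetic in these quotes, here is a small plain-Java sketch of the two formulas. The class and method names are my own, not part of Minim, and I am assuming the two exceptions with the halved bandwidth are the first and last points of the spectrum (indices 0 and timeSize/2).

```java
class FftBinMath {
    // Centre frequency in Hz of spectrum point i: i / timeSize * sampleRate.
    static double centreFrequency(int i, int timeSize, double sampleRate) {
        return (double) i / timeSize * sampleRate;
    }

    // Bandwidth in Hz of spectrum point i: 2 / timeSize * Nyquist,
    // halved for the two end points (assumed to be 0 and timeSize / 2).
    static double bandwidth(int i, int timeSize, double sampleRate) {
        double nyquist = sampleRate / 2.0;
        boolean endPoint = (i == 0 || i == timeSize / 2);
        return (endPoint ? 1.0 : 2.0) / timeSize * nyquist;
    }

    public static void main(String[] args) {
        int timeSize = 1024;        // samples per buffer
        double sampleRate = 44100.0;
        for (int i : new int[] {0, 1, 5, 512}) {
            System.out.printf("point %d: centre %.2f Hz, bandwidth %.4f Hz%n",
                    i,
                    centreFrequency(i, timeSize, sampleRate),
                    bandwidth(i, timeSize, sampleRate));
        }
    }
}
```

With a 1024-sample buffer at 44100 Hz, the ordinary points come out about 43 Hz wide and the two end points about 21.5 Hz wide, matching the quoted figures.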
The averaging works from an FFT, which analyses the loaded file one buffer at a time. A buffer of 1024 samples gives 512 frequency points, each covering a small band of the sound's spectrum. Together these points span 0 to 22050 Hz, which is half of the 44100 Hz sample rate. From there the points can be broken down into even sections, and an average taken of each. In this example they are broken down into 32 averages.
The first issue with this is that the most useful part of the spectrum is below 15000 Hz. The upper part isn't discarded; it is just reached less often, meaning there is little activity there. It would become useful when breaking down music that uses high-pitched tones, like dance music, which has sharper increases. Secondly, the lower sections have far too much activity, meaning they will seem to remain at the same level. They need to be broken down into smaller averages to give them more specific values.
Minim recommends the following averages to give the best range of results, from a sample rate of 44100 Hz:
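The even grouping can be sketched in plain Java as below. This is only my illustration of the idea, not Minim's own code: the class name is made up, and I assume the number of spectrum values divides evenly by the number of bands (512 values into 32 bands gives 16 values per band).

```java
class LinearAverages {
    // Splits the spectrum into `bands` equal-sized groups and averages each group.
    // Assumes spectrum.length is an exact multiple of `bands`.
    static float[] average(float[] spectrum, int bands) {
        int perBand = spectrum.length / bands;
        float[] averages = new float[bands];
        for (int b = 0; b < bands; b++) {
            float sum = 0;
            for (int j = 0; j < perBand; j++) {
                sum += spectrum[b * perBand + j];
            }
            averages[b] = sum / perBand;
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up spectrum: strong activity low down, fading towards the top.
        float[] spectrum = new float[512];
        for (int i = 0; i < spectrum.length; i++) {
            spectrum[i] = 100f / (1 + i);
        }
        float[] averages = average(spectrum, 32);
        for (int b = 0; b < averages.length; b++) {
            System.out.printf("band %d: %.3f%n", b, averages[b]);
        }
    }
}
```

In an actual sketch the spectrum values would come from Minim's FFT object rather than a made-up array.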
11025 to 22050 Hz
5512 to 11025 Hz
2756 to 5512 Hz
1378 to 2756 Hz
689 to 1378 Hz
344 to 689 Hz
172 to 344 Hz
86 to 172 Hz
43 to 86 Hz
22 to 43 Hz
11 to 22 Hz
0 to 11 Hz
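Each range in this list is half the width of the one above it, starting from half the sample rate. A short plain-Java sketch of that halving follows; the names are mine, and the whole-number values in the list are rounded versions of what it produces.

```java
class OctaveBands {
    // Returns {low, high} pairs in Hz, highest band first.
    // Each band is half as wide as the one before; the last band starts at 0.
    static double[][] bands(double sampleRate, int count) {
        double[][] out = new double[count][2];
        double high = sampleRate / 2.0; // top of the spectrum (Nyquist frequency)
        for (int k = 0; k < count; k++) {
            double low = (k == count - 1) ? 0.0 : high / 2.0;
            out[k][0] = low;
            out[k][1] = high;
            high = high / 2.0;
        }
        return out;
    }

    public static void main(String[] args) {
        for (double[] band : bands(44100.0, 12)) {
            System.out.printf("%.2f to %.2f Hz%n", band[0], band[1]);
        }
    }
}
```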
Understanding the example
Using this knowledge, I can now make sense of the example.
The image and video show:
- The raw FFT points show there is activity across the entire spectrum, even a small amount in the high sections.
- Even averages applied to the FFT leave the top 40% of the frequency points looking unchanged. You can clearly see there is activity in that section, but averaging it out makes it minuscule.
- Log averages applied to the FFT size each average according to where the activity is. The lower sections get narrow averages, while the higher sections are given a larger sample size, meaning they have more frequency points to react to. This gives the averages a spread that is more relevant to the audio's active FFT points.
In regards to my issue of finding a more reliable and accurate way to represent the music, I feel this could be a solution. From this I could take each average section and apply if statements, similar to the beat listener example, to make graphical elements react to the audio.
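To see why the log version behaves differently, here is a simplified octave grouping in plain Java. It is my own sketch rather than Minim's implementation: it groups by array index, assumes the spectrum length is a power of two (512 here), and returns the bands highest first, in the same order as the list of ranges earlier on.

```java
class OctaveAverages {
    // Averages the spectrum in octave-sized groups, highest octave first.
    // Assumes spectrum.length is a power of two.
    static float[] average(float[] spectrum) {
        int n = spectrum.length;
        // One band per halving, plus a final band for index 0 (512 points -> 10 bands).
        int bands = Integer.numberOfTrailingZeros(n) + 1;
        float[] averages = new float[bands];
        int high = n; // exclusive upper index of the current band
        for (int b = 0; b < bands; b++) {
            int low = (b == bands - 1) ? 0 : high / 2;
            float sum = 0;
            for (int i = low; i < high; i++) {
                sum += spectrum[i];
            }
            averages[b] = sum / (high - low);
            high = low;
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up spectrum: strong activity low down, fading towards the top.
        float[] spectrum = new float[512];
        for (int i = 0; i < spectrum.length; i++) {
            spectrum[i] = 100f / (1 + i);
        }
        float[] averages = average(spectrum);
        for (int b = 0; b < averages.length; b++) {
            System.out.printf("octave band %d: %.3f%n", b, averages[b]);
        }
    }
}
```

With 512 points the top band averages 256 points, the next 128, and so on down to single points at the bottom, so the busy low end keeps its detail while the quiet high end is pooled.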
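As a rough idea of how those if statements might look, here is a plain-Java sketch that checks each average against a threshold. The class name, the threshold values and the sample averages are all hypothetical. In a real Processing sketch I would expect the averages to be read from Minim's FFT with getAvg(i) after calling forward() on the audio buffer, and each true result would drive a graphical element.

```java
class BandTriggers {
    // Returns one flag per band: true when that band's average exceeds its threshold.
    static boolean[] triggered(float[] averages, float[] thresholds) {
        boolean[] result = new boolean[averages.length];
        for (int i = 0; i < averages.length; i++) {
            if (averages[i] > thresholds[i]) {
                result[i] = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values for three bands: low, mid and high.
        float[] averages = {12.0f, 3.5f, 0.4f};
        float[] thresholds = {10.0f, 5.0f, 0.2f};
        boolean[] hits = triggered(averages, thresholds);
        for (int i = 0; i < hits.length; i++) {
            System.out.println("band " + i + (hits[i] ? ": react" : ": idle"));
        }
    }
}
```

The thresholds would need tuning by ear for each band, since the low bands sit at much higher levels than the high ones.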