source file: mills2.txt
Date: Sat, 14 Oct 1995 07:39:20 -0700
From: "John H. Chalmers"
From: mclaren
Subject: Tuning & psychoacoustics - post 20 of 25
---
As is now evident, the Fourier transform is at best poorly suited to the analysis of real-world sounds. Noise, inharmonic partials and radical phase changes ensure that many real-world instrument tones are poorly modelled as a sum of pure sinusoids near-constant in magnitude and phase. This is complicated by the fact that "...much of the characteristic sound of an instrument is in its transient regions, such as the attack portion of its tone." [Moorer, J. A., "Signal Processing Aspects of Computer Music - A Survey," Computer Music Journal, Vol. 2, No. 2, pg. 7]

To date, the aperiodic and chaotic attack portions of instrumental notes have consistently resisted analysis, as have the residual stochastic portions of the sound, which cannot be analyzed successfully by current techniques: a variety of work-arounds are generally used to "smooth over" the radical phase and magnitude discontinuities generated by Fourier analysis of the attack transients, or to parametrize the chaotic-attractor "residual" and mimic it with some kind of bandlimited noise generator added to the resynthesized signal.

"Each...spectrogram tells how amplitude and phase vary as a function of frequency. This process of analysis and resynthesis has been called a phase vocoder. Serra made use of the process to his ends, taking successive spectra at intervals of around 10 milliseconds.

"Such successive spectra do not in themselves give a deep insight into musical sounds. Serra's innovation was to use successive spectra in dividing the signal into two parts--a deterministic, or predictable, part and a stochastic, or unpredictable, noisy part. The deterministic part Serra took to be clear peaks which in several successive spectra change just a little in amplitude and phase. This part of the spectrum Serra resynthesized by generating the individual sinusoidal components whose amplitudes, frequencies, and phases changed with time in the fashion indicated by the successive spectra. (...) [Serra] replaced this [non-deterministic] part of the spectrum with a noise that had roughly the same overall spectrum as his stochastic part but that didn't match it in waveform.

"Serra tested this division of the signal into a deterministic and stochastic part...by listening separately to the deterministic and stochastic parts, and then adding them and listening to their sum. A piano sound reconstructed from the deterministic spectra alone didn't sound like a piano. With the stochastic or noise portions added, it sounded just like a piano. The same was true of a guitar, a flute, a drum, even the human voice." [Pierce, J.R., "The Science of Musical Sound," 2nd ed., 1992, pg. 106]

Moreover, Fourier techniques work even to a rough approximation only with a small class of harmonic-series instruments (Western brass instruments, double reeds and strings, the harp). "Most percussion instruments (drums, bells) are inharmonic. This means that to describe them with sinusoids often requires a large number of such. There are some synthesis techniques for creating what often turn out to be quite convincing drum-like or gong-like sounds, but to date there are few analysis techniques that can be used with inharmonic sounds." [Moorer, J. A., "Signal Processing Aspects of Computer Music - A Survey," Computer Music Journal, Vol. 2, No. 2, pg. 7]
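The deterministic-plus-stochastic split that Pierce describes can be made concrete in a few lines of Python. What follows is a deliberately crude sketch of the idea, not Serra's actual spectral-modeling code: real SMS tracks partials from frame to frame, whereas this toy version simply keeps the strongest bins of each STFT frame. The frame size, the "keep the 40 strongest bins" rule, and the name crude_sms_split are arbitrary assumptions of mine.

    import numpy as np
    from scipy.signal import stft, istft

    def crude_sms_split(x, fs, nperseg=1024, n_peaks=40):
        """Very rough split of x into a 'deterministic' part (strong spectral
        peaks kept with their measured magnitudes and phases) and a 'stochastic'
        part (everything else, replaced by noise with roughly the same
        frame-by-frame magnitude spectrum but an unrelated waveform)."""
        _, _, X = stft(x, fs, nperseg=nperseg)          # successive spectra
        det = np.zeros_like(X)
        for j in range(X.shape[1]):                     # one frame at a time
            frame = X[:, j]
            strongest = np.argsort(np.abs(frame))[-n_peaks:]
            det[strongest, j] = frame[strongest]        # keep magnitude AND phase
        resid = X - det                                 # what the peaks miss
        # Replace the residual with noise shaped by the residual's magnitude
        # spectrum, in the spirit of what Pierce reports Serra doing.
        random_phase = np.exp(2j * np.pi * np.random.rand(*resid.shape))
        stoch = np.abs(resid) * random_phase
        _, x_det = istft(det, fs, nperseg=nperseg)
        _, x_stoch = istft(stoch, fs, nperseg=nperseg)
        return x_det, x_stoch

Listening to x_det by itself, to x_stoch by itself, and then to their sum is the kind of test Pierce reports: the sinusoidal part alone sounds thin and synthetic until the noise part is added back in.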
Bearing in mind that this puts all of Javanese and Balinese and Thai and most African and South American music off-limits to Fourier analysis (because these cultures use mainly inharmonic instruments), the value of Fourier analysis becomes questionable in the context of world music.

The net result is that conventional FFT analysis *sometimes* tells us *something* about what is going on in *portions* of *a few notes* played by *a few instruments.* However, the Fourier transform is far from the universal mathematical Swiss Army Knife it has been touted as being. For example, suppose we try to increase frequency resolution by taking an FFT of tens of thousands of points--we have only 2 ways of doing this. [1] Pad tens of thousands of zeroes onto the end of each wavecycle, which merely refines the accuracy with which each bin's frequency is specified but does not tell us anything about what's going on between the frequency bins (where most of the interesting and complex behavior of real-world instruments takes place); or [2] extend our FFT over multiple wavecycles, which does give us some information about what's going on between frequency bins because the fundamental of the wavecycle is apt to change over the course of several periods--but this dodge lumps all the spectral changes in 2, 3, or more wavecycles into a single analysis frame and thus "smears out" the spectral changes of the sound in time. If this sounds like "Catch-22," guess what? It is. (A numerical sketch of both dodges follows the list below.)

Heisenberg's Uncertainty Principle is actually an outgrowth of the basic characteristic of wave motion: to wit, you can accurately measure *either* the frequency of a changing waveform *or* the moment at which it occurs, but you cannot measure *both* precisely at the *same time.* And when you increase the precision with which you localize the wavetrain in time, you consequently decrease the precision with which you measure the wavetrain's frequency. This has profoundly important consequences for the FFT.

[1] Increasing your time resolution (that is, making the FFT snapshots closer together in time) decreases your frequency resolution (because you must therefore take smaller FFTs over those smaller time-lengths).

[2] Increasing your frequency resolution (that is, taking the FFT snapshot over a bunch of different spectrally-evolving wavecycles) decreases your time resolution (because all the spectral changes in each frequency line are lumped into a single frequency and averaged over the number of wavecycles you're looking at).

[3] Increasing the number of phase points, to give you a more precise measurement of the exact amount by which a partial is detuned from the others, also increases the number of frequency bins--and to do this you must extend the FFT over a longer time-period, which in turn means you're lumping your phase changes together and averaging them out, which entirely defeats your purpose.

[4] Decreasing the number of frequency points, to narrow down the time window over which you take the FFT "snapshot," also decreases the number of phase points--which divides the fundamental frequency into a smaller number of divisions and defeats your purpose by making the measured frequency changes coarser.
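Both dodges are easy to see in a few lines of numpy. The sketch below is my own illustration, not anything from the sources cited above: the 440 Hz and 460 Hz partials, the 1024-sample snapshot and the 32768-point padding are arbitrary assumptions, and it pads a fixed-length snapshot rather than individual wavecycles.

    import numpy as np

    fs = 44100
    t = np.arange(4096) / fs
    x = np.sin(2*np.pi*440*t) + np.sin(2*np.pi*460*t)   # two partials 20 Hz apart

    short = x[:1024]                                    # ~23 ms snapshot

    # Dodge [1]: zero-pad the short snapshot out to 32768 points.  The spectrum
    # is sampled on a much finer frequency grid, but the two partials still fuse
    # into a single lump: padding only interpolates the same smeared spectrum,
    # it adds no information about what happens "between the bins."
    X_pad = np.abs(np.fft.rfft(short, n=32768))
    f_pad = np.fft.rfftfreq(32768, 1/fs)

    # Dodge [2]: analyze a longer stretch of signal (4096 samples, ~93 ms).
    # Now the two partials resolve into separate peaks--but any spectral change
    # inside those 93 ms has been averaged into one analysis frame.
    X_long = np.abs(np.fft.rfft(x))
    f_long = np.fft.rfftfreq(len(x), 1/fs)

    for name, freqs, mags in [("zero-padded 1024-point snapshot", f_pad, X_pad),
                              ("true 4096-point window", f_long, X_long)]:
        band = (freqs > 400) & (freqs < 480)
        top2 = freqs[band][np.argsort(mags[band])[-2:]]
        print(name, "-> two largest bins near", np.round(np.sort(top2), 1), "Hz")

With these numbers the zero-padded analysis still reports one merged lump (its two largest bins sit next to each other somewhere between the true frequencies), while the longer window separates the partials--at the cost of averaging everything that happened across the full 93 milliseconds into a single frame.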
Because the discrete Fourier transform is cyclic and imposes an infinite periodicity (both supersonic and subsonic) on your spectrum, you must use a limited fixed sampling rate and band-limit your input samples to avoid aliasing (that is, to keep the lowest supersonic and the highest subsonic frequency components from bleeding into your audible frequency bins and contaminating them with inharmonic-sounding spurious garbage). But once you fix your sampling rate, you've thrown out all information between samples. This means that there's no way to reconstruct the detail in the waveform "between" the samples, because it's gone. You've dumped it out. You've thrown out the baby with the bathwater. The price you pay for perfect reconstruction of a signal is that the signal that spews out of your mathematical analysis algorithm is sometimes quite different from the real analog signal that came in. "Sometimes" because if there's very little noise and the partials are almost perfectly harmonic, you get good results with the Fourier transform even in its discrete version. Alas, most sounds are *not* noise-free and perfectly harmonic. (A few lines of code below show this fold-over in action.)

"If you have a hammer, everything in the world looks like a nail." This is nowhere more true than of the Fourier transform. Many writers have begun articles on tuning or acoustics with statements along the lines of: "Musical sounds are made up of sinusoidal frequency and phase components..." No! Wrong! Completely false! Classic error. These people have mistaken the *map* for the *territory.* They have confused the *mathematical model* of the physical phenomenon with the *physical phenomenon* itself. *Sounds* are displacements of air molecules. *Sinusoids* are ideal mathematical entities, infinite in temporal extent and perfectly periodic. Sometimes one or another mathematical model works well in analyzing acoustical phenomena; other times they all work well; sometimes *none* yield useful results. Different mathematical techniques and different conceptual models are required for different acoustical phenomena, as Risset points out. There is no "one size fits all." Yet this is exactly what the Fourier tykes would have us believe.

The universe exhibits what the mathematicians call "the inexhaustibility of the real." Goedel proved this in 1930: the universe is ultimately more complex than any statements we can make about it mathematically. Or, to put it another way, there are an infinite number of true but unprovable propositions. Clearly proponents of this or that tuning system who write circularly-reasoned arguments a la "that's the beauty of mathematics--it has an inescapable logic" haven't collided with the real world in the form of a sound that turns to junk when you Fourier analyze it. For example, breathy vocal sounds. Or flute multiphonics, or a cymbal clash, or a gamelan bar note.

It's worth remembering that Fourier never came up with his transform to solve the problem of frequency analysis. He used it as a clever dodge to solve the problem of heat conduction in a metal bar. It worked well on that problem, but it has since been greatly extended--in some cases, overextended. The wavelet transform, a probably superior mathematical method for analyzing acoustic phenomena, dates from only 1987. Since then we've had very few useful & powerful transforms. Mathematical progress has been slow.
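For what it's worth, the fold-over mentioned above is trivial to demonstrate. The sketch below is my own illustration with arbitrary numbers: an 8000 Hz sampling rate and a 5000 Hz test tone, i.e. a tone above the 4000 Hz Nyquist limit.

    import numpy as np

    fs = 8000                          # deliberately low sampling rate
    n = np.arange(fs)                  # one second of samples
    f_in = 5000                        # above the 4000 Hz Nyquist frequency

    x = np.sin(2 * np.pi * f_in * n / fs)

    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    print("input tone:            ", f_in, "Hz")
    print("strongest analyzed bin:", freqs[np.argmax(spectrum)], "Hz")
    # The 5000 Hz tone shows up at 3000 Hz (fs - f_in): the information needed
    # to tell the two apart lived "between the samples," and once the sampling
    # rate is fixed it is gone, so the DFT's built-in periodicity folds the tone
    # down into the audible bins as a spurious inharmonic component.

Band-limiting the input before sampling is the only defense; after the fact, the 3000 Hz alias is indistinguishable from a real 3000 Hz partial.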
The Walsh Transform is no big help--it is acoustically "brittle," and if too few Walsh components are used in reconstructing a sound, the output is intolerably buzzy and distorted-sounding. Bart Kosko's fuzzy Kalman filter promises some insight into acoustic transformations, but the amount of ground gained has been small...and the rate of progress slow.

In the end, the real world is very *very* VERY complex. Our linear parametric mathematical models apply with accuracy only to a tiny class of physical phenomena. It has become increasingly (distressingly!) clear over the last few years that sounds are not "infinite harmonic wavetrains containing perfectly harmonic overtones with a few stochastic ergodic noise components mixed in." Rather, real computer analysis of actual instrument timbres shows with brutal clarity that real-world sounds run the entire gamut from chaotic strange attractors to pure noise to semi-noise/semi-pitched to strongly pitched narrowband noise to mostly harmonic sounds with a rock-steady fundamental. No one mathematical analysis technique is adequate to model all these ranges of behavior. And when you realize that something as simple as the flute can exhibit the entire range (overblown multiphonics, semi-pitched "breathy" whispery notes, flutter-tongued notes, strongly pitched notes in the lower registers with a steady fundamental), you start to realize that the FFT is useful for only a very limited class of musical sounds.

Thus it is hardly surprising that the to-date largely FFT-obsessed model of the ear as frequency analyzer has had extremely limited success in explaining real-world musical preferences, the effects of real-world intervals on listeners, and the real-world propensity of composers, performers and audiences to prefer intervals not well defined by the integer eigenvalue vocabulary of the Fourier transform.

--mclaren