Gerhard Junker (Vienna / Austria)
An empirical study on the psychoacoustical phenomenon of temporal masking effects.
Summary
This study is based on the model of subjective tone duration and examines the psychoacoustical phenomenon of temporal masking effects with regard to its musical application. In an empirical experiment, ten musicians performed phrases of the F major Invention by J.S. Bach, which were recorded with a MIDI sequencer. The synthetic piano and organ sounds used were varied with respect to the parameters articulation, envelope and dynamics. Sounds that are objectively shorter but give a subjectively longer auditory impression were contrasted with objectively longer tones. The statistical evaluation confirmed a significant influence of the envelope on the performed note length, especially for the articulation portato.
Zusammenfassung (German summary, in English translation)
An empirical study on the psychoacoustical phenomenon of subjective tone duration.
This study is based on the model of subjective tone duration and presents an analysis of this psychoacoustical phenomenon with regard to its musical application. In a practical experiment with ten test persons, phrases of the F major Invention by J.S. Bach were recorded with a software-based MIDI sequencer. The synthesized piano and organ sound material was varied in the parameters musical articulation, envelope and key velocity. Sounds giving a subjectively longer auditory impression of an objectively measurably shorter sound event were contrasted in the experiment with objectively longer-sounding tone impulses. The statistical evaluation showed, particularly for the articulation portato, a clear influence of the envelope of the source sound on the note length actually played.
Sommaire (French summary, in English translation)
An empirical study on a psychoacoustical phenomenon: temporal effects in masking.
This study, based on the model of subjective tone duration, deals with the musical application of the psychoacoustical phenomenon of temporal effects in masking. In an empirical study, ten musicians were asked to play phrases of the Invention in F major by J.S. Bach, which were recorded with a MIDI sequencer. The synthetic piano and organ sounds chosen for this test were varied with respect to musical articulation, envelope and dynamics. Notes that give the impression of being longer than they really are were compared with notes that are actually longer. The statistical evaluation confirms that the envelope has a significant influence on the note length, especially in the case of the articulation portato.
Keywords
cognitive musicology, envelope-variation, masking, MIDI, MIDI-sequencer, musical interpretation, psychoacoustics, psychomusicology, sound analysis, subjective tone duration, temporal masking effects.
1. Introduction
The subject of masking is of considerable importance to music. Several well-known papers and experiments of this century deal with the masking effect and other psychoacoustical phenomena (Wegel and Lane 1924, Egan and Hake 1950, Feldkeller and Zwicker 1967). New aspects were explored by Zwicker (1970) and, above all, by Fastl (1974, 1975, 1979, 1981, 1982). Past research primarily established the basic principles of masking in a generally valid way. Building on the scientific findings about temporal masking effects, this study examines how musicians respond and behave in a music-oriented application when masking patterns occur. As a result, performance rules were elaborated so that the findings are available as a starting point for developers of MIDI software and for other interested persons or institutions.
Every sonic phenomenon requires some duration. The subjective tone duration Ts is defined as the specific auditory sensation of duration evoked by a sound with the given physical duration Ti (Fastl 1975). Fig. 1 shows an idealized model of a masking threshold. The dotted line represents the threshold of audibility; the subjective tone duration Ts is defined here at 10 dB above the absolute threshold, which is the minimum stimulus that evokes a response in a specified fraction of the trials. Fig. 1 shows that the masking effect extends beyond the objective tone duration, that is, it is not limited to the time during which the test-tone impulse is physically present.
Fig. 1: Masking threshold for a short test-tone impulse Ti at the masking level Lt.
This paper does not describe additional experiments in basic masking research. The main interest of this study is the practical application of current psychoacoustical knowledge by performing musicians. The working assumption is that instrumentalists apply the established findings on subjective auditory sensation instinctively, that is, that they automatically play the notes of a given score in the subjectively correct length. A secondary interest of the experiment was to find out whether this psychoacoustical knowledge needs to be implemented in existing MIDI sequencers.
The purpose of this study was to determine how performing musicians handle the interaction between objective physical duration parameters and their specific interpretation. To this end, the following experiment was carried out:
2. Procedure
The envelopes of synthetic piano and organ sounds were varied with a digital sampler (Roland S 760, 44.1 kHz, 10 MB sample RAM). This was done by means of an envelope follower (time variable amplifier, TVA). The same sound was given three different dynamic envelopes (Fig. 2). Dynamic envelope 1 represents a piano or organ sound (Ti = 121 ms) that reaches its maximum intensity immediately (0 ms attack time). After the MIDI note-off there is no decay; the sound falls back to silence without any delay (0 ms release time). For dynamic envelope 2, the attack and release times were chosen to be shorter than the masked threshold (Fastl 1975: 289). In this experiment Ti for envelope 2 was fixed between 160 ms and 170 ms, depending on sound and frequency; the attack time was less than 15 ms and the release time about 30 ms. For dynamic envelope 3, the objective tone duration was chosen to be longer than the masking threshold of the original piano or organ impulse used with envelope 1, so that the objective and the subjective auditory sensation are approximately equal (Ti = Tm’).
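The role of attack and release time can be illustrated with a simple gain curve. The following Python sketch is only an illustration under stated assumptions: it uses linear ramps, the names `envelope_gain` and `sounding_duration_ms` are hypothetical, and the held time of 100 ms in the test values is arbitrary. The actual envelope shape of the sampler and the exact relation of these times to Ti are not specified here.

```python
def envelope_gain(t_ms, held_ms, attack_ms, release_ms):
    """Gain between 0 and 1 at time t_ms for a key held for held_ms.

    Assumption: linear attack and release ramps, held_ms >= attack_ms.
    """
    if t_ms < 0:
        return 0.0
    if t_ms < attack_ms:                      # rising edge after note-on
        return t_ms / attack_ms
    if t_ms <= held_ms:                       # key held, full level
        return 1.0
    if t_ms < held_ms + release_ms:           # falling edge after note-off
        return 1.0 - (t_ms - held_ms) / release_ms
    return 0.0                                # silence


def sounding_duration_ms(held_ms, release_ms):
    """Time from note-on until the gain has fallen back to zero."""
    return held_ms + release_ms


# Envelope 1 style: 0 ms attack, 0 ms release (abrupt on and off).
print(envelope_gain(50, 100, 0, 0), envelope_gain(100.5, 100, 0, 0))
# Envelope 2 style: 15 ms attack, 30 ms release (illustrative values).
print(envelope_gain(7.5, 100, 15, 30), sounding_duration_ms(100, 30))
```

With a non-zero release time the sound continues after the MIDI note-off, which is why the objective tone duration can exceed the recorded note length.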
Fig. 2: Variation of the envelope (1; 2; 3) of a test-tone impulse Ti at the masking level Lt. The dotted line represents the masked threshold of the original sound Ti. Tm specifies the subjective tone duration. Ti = Tm’ (envelope 3): equality of the subjective and objective tone duration.
"The attack is normally the most tonally complex part of the envelope, and provides most of the auditory attributes by which one instrument is distinguished from another. Although only a small fraction of a second of the sound will have been removed, this is often sufficient to render the resultant sound unrecognizable." (Dobson 1992, 68-69) There are many reasons to suppose that musicians adapt their way of playing to the different dynamic envelopes. Hypothetically, envelope types 1 and 2 should be treated in the same way, while envelope 3 should lead to a shorter performed note length.
3. Experiment
The experiment (Fig. 3) was carried out with 10 pianists, all of whom met the requirements for musical qualification. Each of them had to play the first phrase (4 measures of the soprano voice) of the F major Invention by J.S. Bach several times in the articulations legato, portato and staccato. The four-measure phrase was played on a master keyboard (Roland A 80) and recorded with a MIDI sequencer (Cubase Score 1.0). Both the piano sound and the organ sound were played with MIDI velocity sensitivity enabled; in addition, each articulation type and envelope type was played once with the velocity sensitivity of the sampler's organ sound disabled. This was done to find out whether the touch sensitivity of the instrument influences the musical performance. The tempo was 120 quarter notes per minute (1/8 note = 250 ms, 1/16 note = 125 ms) for all articulation types, which is faster than a usual interpretation. This was necessary to stay within the limits for masking thresholds (Zwicker 1970, Fastl 1982).
Fig. 3: Structure of the experiment: variation of articulation (by the musician), envelope and sound (digital sampler), recorded with a software-based MIDI sequencer.
To cover all combinations (3 sound settings × 3 articulations × 3 envelopes), the four-measure phrase had to be played 27 times; to avoid arbitrary results, a total of 54 performances had to be played. The overall duration of the experiment was about 30 minutes. The order of the envelope variations was random. The instrument sounds (organ or piano) and the order of the articulations were not mixed, in order to guarantee a constant musical feeling during the test session and to avoid distracting the musicians. All participants were instructed to keep to the requested articulation, even where it seemed musically implausible for their interpretation. Before the experiment started, each participant was given unrestricted practice time to become comfortable with the keyboard and to get an auditory impression of the sound material to come.
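The design described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the names `build_session`, `SOUNDS` and `REPETITIONS` are hypothetical, and it is an assumption that each of the 27 combinations was played twice and that the envelope order was shuffled within each block of sound and articulation.

```python
import itertools
import random

SOUNDS = [
    "piano (velocity sensitive)",
    "organ (velocity sensitive)",
    "organ (not velocity sensitive)",
]
ARTICULATIONS = ["legato", "portato", "staccato"]
ENVELOPES = [1, 2, 3]
REPETITIONS = 2  # assumption: 27 combinations x 2 = 54 performances


def build_session(seed=None):
    """Return the list of (sound, articulation, envelope) takes for one musician.

    Sound and articulation keep a fixed order; only the envelope order
    is randomized inside each sound/articulation block.
    """
    rng = random.Random(seed)
    takes = []
    for sound, articulation in itertools.product(SOUNDS, ARTICULATIONS):
        block = ENVELOPES * REPETITIONS
        rng.shuffle(block)
        takes.extend((sound, articulation, env) for env in block)
    return takes


session = build_session(seed=1)
print(len(session), "takes,", len(set(session)), "distinct combinations")
```

Keeping sound and articulation in fixed blocks mirrors the stated aim of a constant musical feeling, while the shuffled envelope order prevents the musicians from anticipating the envelope.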
4. Evaluation
To keep maximum flexibility for evaluation and interpretation, every single MIDI recording was converted separately into an ASCII file (54 files per person, each with 35 MIDI notes and thus 35 velocity and note-length values). This was done with a dedicated PC program written in C. The statistical analysis was carried out with SPSS 6.0.1® for Windows®.
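How such a file might be summarized can be sketched in Python. The layout of the ASCII files is not described here, so the sketch assumes one note per line with note length and velocity separated by whitespace; the function names are hypothetical, the sample values are synthetic, and it is an assumption that S denotes the sample standard deviation. The original analysis was done in SPSS, not with this code.

```python
import statistics


def parse_take(text):
    """Parse one take: one note per line, 'notelength velocity' (assumed layout)."""
    notes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        length, velocity = line.split()
        notes.append((int(length), int(velocity)))
    return notes


def summarize(values):
    """Return arithmetical mean (Xam), standard deviation (S) and median (Xme)."""
    return {
        "Xam": statistics.mean(values),
        "S": statistics.stdev(values),   # sample standard deviation (assumption)
        "Xme": statistics.median(values),
    }


# Synthetic values for illustration only, NOT data from the experiment.
sample = "100 64\n110 70\n120 66\n"
notes = parse_take(sample)
print(summarize([length for length, _ in notes]))
```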
Figure 4 gives an example of the profile of a single performed phrase, showing the parameters note length and MIDI velocity for one person. This style of representation allows an exact and individual interpretation of the musical phrase. The note length and velocity of each note can be read at a glance, and characteristic attributes of the phrase are clearly visible. The last note (No. 35) was excluded from the evaluation. The sequencer value for the MIDI note length (variable "Notenlänge") is given in ppq (pulses per quarter note). In this sequencer application one quarter note is subdivided into 384 pulses; at a tempo of 120 quarter notes per minute (500 ms per quarter note), one pulse therefore corresponds to approximately 1.3 ms. The accuracy in this experiment was better than 0.7 ms.
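The relation between sequencer pulses and milliseconds follows from the resolution of 384 pulses per quarter note and the tempo of 120 quarter notes per minute. A minimal Python sketch of this arithmetic (the function names are assumptions for illustration):

```python
PPQ = 384        # pulses per quarter note, as stated for this sequencer
TEMPO_BPM = 120  # quarter notes per minute


def ms_per_pulse(bpm=TEMPO_BPM, ppq=PPQ):
    """Duration of one sequencer pulse in milliseconds."""
    return 60_000.0 / (bpm * ppq)


def pulses_to_ms(pulses, bpm=TEMPO_BPM, ppq=PPQ):
    """Convert a note length in pulses to milliseconds."""
    return pulses * ms_per_pulse(bpm, ppq)


pulse_ms = ms_per_pulse()
print(round(pulse_ms, 3))      # one pulse in ms, about 1.302
print(round(pulse_ms / 2, 3))  # largest rounding error to the nearest pulse, about 0.651
print(pulses_to_ms(192))       # nominal eighth note (half of 384 pulses)
```

Half a pulse, about 0.65 ms, is the largest error introduced when a duration is rounded to the nearest pulse, which is consistent with the stated accuracy of better than 0.7 ms.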
Fig. 4: Example of a profile of note length and velocity. Person 1, instrument organ (touch sensitive), articulation legato, envelope 1 (HK 1). Variables: lang - note length
The following table summarizes the evaluation for all ten participants. The 27 combinations are grouped into two categories: the arithmetical mean of the quavers and of the semiquavers of the whole phrase. As expected, the note lengths for envelope types 1 and 2 correlate significantly for all articulations and sounds. The comparison with the note lengths for envelope 3 shows interesting results: for the articulation legato the arithmetical mean shows no or only minuscule deviation, in most cases less than 3 ms, for all instrument sounds. For the articulation staccato the result is nearly the same, apart from some irregularities. Finally, the predicted variation appears for the articulation portato: with envelope 3 the note length was shortened by up to 11 ms.
Note length, quavers (1/8)
| | | piano, velocity sensitive | | | organ, velocity sensitive | | | organ, not velocity sensitive | | |
| articulation | env. | Xam | S | Xme | Xam | S | Xme | Xam | S | Xme |
| legato | 1 | 201.98 | 27.2 | 201 | 199.43 | 22.8 | 199 | 201.67 | 20.7 | 200 |
| legato | 2 | 201.53 | 24.9 | 199 | 199.25 | 37.4 | 198 | 200.95 | 23.5 | 199 |
| legato | 3 | 200.52 | 23.7 | 198 | 197.62 | 24.0 | 197 | 198.74 | 23.6 | 197 |
| portato | 1 | 112.85 | 25.2 | 113 | 115.31 | 26.1 | 117 | 115.58 | 25.3 | 117 |
| portato | 2 | 110.48 | 27.1 | 112 | 114.40 | 25.9 | 117 | 113.85 | 27.7 | 115 |
| portato | 3 | 107.34 | 29.1 | 106 | 110.00 | 22.8 | 111 | 105.44 | 23.5 | 106 |
| staccato | 1 | 60.44 | 18.6 | 58 | 66.93 | 17.7 | 64 | 66.05 | 17.9 | 64 |
| staccato | 2 | 59.55 | 18.1 | 58 | 64.63 | 16.3 | 64 | 66.47 | 17.3 | 65 |
| staccato | 3 | 59.42 | 14.2 | 60 | 63.65 | 15.0 | 63 | 65.15 | 16.9 | 63 |

Note length, semiquavers (1/16)
| | | piano, velocity sensitive | | | organ, velocity sensitive | | | organ, not velocity sensitive | | |
| articulation | env. | Xam | S | Xme | Xam | S | Xme | Xam | S | Xme |
| legato | 1 | 105.71 | 19.9 | 105 | 105.52 | 18.7 | 104 | 104.81 | 17.9 | 102 |
| legato | 2 | 105.77 | 20.5 | 105 | 103.53 | 19.8 | 103 | 105.40 | 18.5 | 103 |
| legato | 3 | 104.94 | 19.6 | 105 | 103.48 | 17.8 | 101 | 103.61 | 18.7 | 101 |
| portato | 1 | 78.73 | 21.4 | 77 | 77.65 | 19.0 | 75 | 78.37 | 20.3 | 75 |
| portato | 2 | 78.89 | 22.9 | 76 | 77.01 | 20.1 | 74 | 76.44 | 20.3 | 73 |
| portato | 3 | 77.18 | 22.4 | 74 | 74.50 | 19.8 | 72 | 72.46 | 20.2 | 70 |
| staccato | 1 | 58.44 | 20.2 | 55 | 59.16 | 19.0 | 57 | 59.53 | 18.5 | 57 |
| staccato | 2 | 58.85 | 19.2 | 56 | 58.95 | 17.8 | 56 | 58.21 | 18.0 | 57 |
| staccato | 3 | 57.36 | 18.1 | 57 | 57.62 | 17.2 | 55 | 55.63 | 16.2 | 55 |
Abbr.: env. = envelope, Xam = arithmetical mean, S = standard deviation, Xme = median.
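The shortening for the articulation portato can be checked directly against the arithmetical means (Xam) of the quavers in the table. The following Python sketch copies these nine values from the table and computes the differences between envelope 1 and envelopes 2 and 3, in the units of the table; the names `PORTATO_QUAVERS` and `shortening` are assumptions for illustration.

```python
# Arithmetical means (Xam) for portato quavers, copied from the table.
# Keys: sound setting; values: means for envelope 1, 2 and 3.
PORTATO_QUAVERS = {
    "piano, velocity sensitive": (112.85, 110.48, 107.34),
    "organ, velocity sensitive": (115.31, 114.40, 110.00),
    "organ, not velocity sensitive": (115.58, 113.85, 105.44),
}


def shortening(means):
    """Return (env1 - env2, env1 - env3), rounded to two decimals."""
    env1, env2, env3 = means
    return round(env1 - env2, 2), round(env1 - env3, 2)


for sound, means in PORTATO_QUAVERS.items():
    d12, d13 = shortening(means)
    print(f"{sound}: env1 - env2 = {d12}, env1 - env3 = {d13}")
```

The largest difference between envelope 1 and envelope 3 occurs for the organ sound without velocity sensitivity (10.14), while the differences between envelopes 1 and 2 stay below 2.4 for all three sound settings.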
A more detailed evaluation of the whole sample shown above revealed some global trends (Junker 1995). In a further step, a cluster evaluation was carried out to analyse possible similarities between individual musicians.
4.1 Cluster evaluation
The cluster evaluation was used to establish groups of persons with shared attributes. Because the most evident deviations occurred there, the primary emphasis was placed on a distinctive playing style and a typical choice of note length for a specific envelope variation in the articulation portato. The analysis also considered the possible sensitivity of the participants to different instrument sounds. Irregular data from staccato phrases were excluded from the analysis. Because the legato note lengths were consistently regular, this articulation type had only a minor influence on the final results.
Two categories with similar attributes could be isolated; one person was excluded from the interpretation because of irregularities:
Cluster 1 consists of five musicians and can be characterized as follows: the variation of the envelope has no significant influence on the performed note length (Fig. 5). Comparing all three envelope types gives an average deviation of less than 3 ms. Likewise, no sensitivity to specific instrument sounds could be detected in this category.
Cluster 2 (4 persons) is much more interesting. As in cluster 1, the envelope variation has no significant influence in this group when the phrases are played legato or staccato; when the phrases are articulated portato, however, remarkable differences in note length appear. The phrases performed with envelopes 1 and 2 correlate significantly, whereas for envelope type 3 the performed note length is shortened by almost 40 ms (Fig. 5), and in single cases by even more. For cluster 2 the largest deviations occur at notes No. 17 to No. 21. The tendency after note No. 3 is especially remarkable: the graphs "LANG1" and "LANG2" run more or less in parallel, whereas the curve "LANG3" deviates significantly. Evidently, from this point on the persons of cluster 2 adapted their performance behaviour to the modified envelope. From note No. 4 onwards, the note-length values for "LANG3" are almost always shorter.
Fig. 5: Comparison of note-length profiles for cluster 1 (above) and cluster 2 (below). Instrument organ (not touch sensitive), articulation portato.
In contrast to the less sensitive musicians of cluster 1 (Fig. 6), the persons of cluster 2 show a sensitivity to specific instrument sounds, especially in the articulations portato and staccato. Phrases played with the piano sound resulted in markedly shorter note lengths. The maximum reduction in note length is about 35 ms, at note No. 18. For the semiquavers no significant regularity in the deviation is detectable. Remarkably, the deviation does not seem to be caused by any specific envelope. There is no adaptation after a few played notes; the reduction of the note length for cluster 2 starts with the first performed note. Evidently, the people in this group have a different affinity or relationship to the instrument sounds used alternately in this experiment. Interestingly, this characteristic feature did not appear for the articulation legato.
Fig. 6: The influence of the instrument sounds on note-length profiles for cluster 1 (above) and cluster 2 (below). Articulation portato.
The next graph (Fig. 7) addresses the question of whether different articulation styles influence the performed MIDI velocity. A predominant regularity is indicated: the curves run in parallel, and only for the articulation staccato does an irregularity appear at note No. 9. On average the articulation portato has the highest velocity values, and the velocity values for legato tend to be the lowest. The analysis of the influence of the envelope type did not yield significant findings. Figure 8 demonstrates the remarkable correlation: all envelope settings result in almost congruent curves. In the same way, no typical influence could be established for the use of the organ sound without velocity sensitivity. Additional analyses gave no significant indication of a specific sensitivity to touch-sensitive or non-touch-sensitive sounds.
Fig. 7: The influence of the articulation on velocity profiles (n=10). Instrument: organ, envelope 1; touch sensitive.
Fig. 8: The influence of the envelope on velocity profiles for cluster 1. Instrument: organ, articulation "portato"; touch sensitive. Variables: VELO HK1 - envelope 1
This study did not confirm the hypothesis that the parameters note length and velocity are inversely proportional. No significant or fundamental pattern indicating a shorter note length for a high velocity value (and vice versa) could be observed, neither for a specific articulation nor for any instrument sound or envelope used in this experiment.
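One simple way to look for such a relation is the Pearson correlation between the note lengths and the velocities of a phrase. The Python sketch below is an illustration only: the test actually used in the study is not specified here, the function name `pearson_r` is an assumption, and a negative linear correlation is a weaker condition than strict inverse proportionality.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic values for illustration only, NOT data from the experiment:
# every increase in note length is matched by a decrease in velocity.
lengths = [100, 110, 120, 130]
velocities = [90, 80, 70, 60]
print(pearson_r(lengths, velocities))
```

A coefficient close to -1 would support the hypothesis for a given phrase, while values near 0, as the findings above suggest, would indicate no linear relation between the two parameters.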
Fig. 9: Comparison of note length and velocity. Instrument: organ, touch sensitive. Articulation "legato", envelope 2.
Figure 9 shows an example of the comparison of note length and velocity. Both variables (LANG and VELO) are represented by the arithmetical mean (bold line), the standard deviation (box) and the maximum/minimum (thin line, cumulation 95%). Any relation can be inspected and interpreted note by note, for example the extended note-length ranges for notes No. 10, No. 13 and No. 22, or the relatively constant MIDI velocity of each single note.
5. Summary of facts
The results of this study are:
The experiment, which used a software-based MIDI sequencer, turned out to be a reliable method. To improve the psychoacoustical performance of software-based MIDI sequencers, the aspects of the following model could be applied (Fig. 10).
Fig. 10: Sequencer model for six crotchets based on the findings of this study for the articulation portato.
Figure 10 shows a simplified representation of the list editor of a MIDI sequencer, containing six crotchets. Different dynamic values are indicated by shading (pp to ff). The graph below the list editor displays, in simplified form, the expected masking threshold of each single note depending on velocity and envelope type. MIDI notes drawn at full length represent sounds performed with envelope types 1 and 2. MIDI notes without the light-shaded extension stand for sounds with a long attack time and release time. In this case the objective tone duration of the synthetic instrument is naturally longer than the duration of the MIDI note, and consequently longer than the conceptual model of the masking threshold for the MIDI notes.
This paper is an abridged version of a more detailed study (Junker 1995). The outline given here does not represent a final result. More extensive series of experiments would be necessary to develop a model for practical realization. This article merely introduces a method that may help in designing a better (more humanized) generation of computers for musical creativity.
References
Ackermann, Philipp
1991: Computer und Musik. Eine Einführung in die digitale Klang- und Musikverarbeitung. Wien: Springer Verlag.
Backus, John
1977: The acoustical foundation of music. New York-London: Norton & Company.
Bengtsson, Ingmar and Gabrielsson, Alf
1983: Analysis and synthesis of musical rhythm. In: Sundberg, J.: Studies of music performance: 27-60. Stockholm.
Burghardt, H.
1972: Zusammenhang zwischen subjektiver und objektiver Dauer von Schallen. München.
Butler, David
1982: The musicians guide to perception and cognition. New York: Schirmer Books.
Cogan, Robert
1984: New images on musical sound. Cambridge-London: Harvard University Press.
Desain, Peter and Honing, Henkjan
1990: The quantisation of musical time: a connectionist approach. In: Computer Music Journal 14/3: 56-66. Cambridge.
Dobson, Richard
1992: A dictionary of electronic and computer music technology. Instruments, terms, techniques. Oxford: University Press.
Egan and Hake
1950: On the masking pattern of a simple auditory stimulus. In: JASA 22: 622-630.
Fastl, Hugo
1974: Mithörschwellen als Maß für das zeitliche und spektrale Auflösungsvermögen des Gehörs. München.
1975: Mithörschwelle und subjektive Dauer. In: Acustica 32: 288-290.
1979a: Der Einfluß der zeitlichen Struktur von Tönen auf die Addition von Teillautheiten. In: Acustica 21: 16-25.
1979b: Temporal masking effects. In: Acustica 43: 282-294. Stuttgart.
1981: Psychoakustik musikalischer Klänge und ihre Beziehung zur Musiktheorie. In: Terhardt, E. et al. Stand und Entwicklung der musikalischen Akustik, 71-76. Umschau.
1982: Beschreibung dynamischer Hörempfindungen an Hand von Mithörschwellen-Mustern. Freiburg: Hochschulsammlung Ingenieurwissenschaft Nachrichtentechnik Band 7, Hochschulverlag.
Feldkeller and Zwicker
1967: Das Ohr als Nachrichtenempfänger. Stuttgart.
Fletcher, Neville H.
1991: The physics of musical instruments. New York: Springer.
Junker, Gerhard
1995: MIDI-Synthese in der kognitiven Musikologie: eine exemplarische Studie des psychoakustischen Phänomens der subjektiven Tondauer. Diplomarbeit Universität Wien.
Moore, Brian C.J.
1985: An introduction to the psychology of hearing. London
Palmer, C. and Brown, J.C.
1991: Investigations in the amplitude of sounded piano tones. In: JASA 90/1: 60-66.
Repp, Bruno H.
1992: Some empirical observations on sound level properties of recorded piano tones. In: JASA 93/2: 1136-1144.
Roederer, Juan G.
1993: Physikalische und psychoakustische Grundlagen der Musik. Berlin: Springer.
Sundberg, Johan
1983: Studies of music performance. Stockholm: The Royal Swedish Academy of Music.
Sundberg, Johan
1991: The science of musical sounds. New York: Academic Press.
Wegel and Lane
1924: The auditory masking of one pure tone by another and its probable relation to the dynamics of the inner ear. Phys. Rev. 23, Ser. 2: 266-285.
Zwicker, Eberhard
1970: Subjektive und objektive Dauer von Schallimpulsen und Schallpausen. In: Acustica 22: 214-218. Stuttgart.
1982: Psychoakustik. Berlin: Springer.