Temporal envelope and temporal fine structure

Temporal envelope (ENV) and temporal fine structure (TFS) are variations in the amplitude and frequency of sounds over time. These temporal fluctuations underlie many aspects of auditory perception, including the perception of loudness, pitch and timbre, and sound localization.

When sounds are processed by the peripheral auditory system, they are decomposed into a set of frequency bands. The resulting narrowband signals convey information on different time scales, ranging from less than one millisecond to several hundred milliseconds^[1]^,^[2]^,^[3]^,^[4]^,^[5]^,^[6]^,^[7]. A separation between the temporal envelope (the slow fluctuations) and the temporal fine structure (the fast fluctuations) within each frequency band has been proposed to explain several aspects of auditory perception. A series of psychophysical, electrophysiological and computational studies based on this envelope/fine-structure dichotomy has examined the role of these temporal cues in sound identification and communication, how the cues are processed by the peripheral and central auditory system, and how ageing and cochlear damage affect this processing. Although the dichotomy is still debated, and the question of how the auditory system encodes these cues remains open, these studies have led to a number of applications in fields including speech processing, clinical audiology, and the rehabilitation of sensorineural hearing loss with cochlear implants and hearing aids.

Definition

The terms temporal envelope and temporal fine structure can cover different notions from one study to another. An important distinction is the one between the physical (i.e., acoustical) and the biological (or perceptual) descriptions of these features.

Any sound covering a limited band of frequencies (a narrowband signal) can be described as an envelope (ENV_p, where p denotes the physical signal) modulating a rapidly oscillating carrier, the temporal fine structure (TFS_p)^[8].
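The envelope/carrier description of a narrowband signal can be illustrated with a short numerical sketch. It assumes the Hilbert-transform (analytic signal) approach commonly associated with the cited reference: ENV_p is taken as the magnitude of the analytic signal and TFS_p as the cosine of its phase. The function names (`analytic_signal`, `env_tfs`) and the test tone (a 1000 Hz carrier with 8 Hz amplitude modulation) are illustrative choices, not taken from the source.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D array, computed with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def env_tfs(x):
    """Split a narrowband signal into envelope and unit-amplitude carrier."""
    z = analytic_signal(x)
    env = np.abs(z)            # slow amplitude fluctuations (ENV_p)
    tfs = np.cos(np.angle(z))  # fast oscillation with unit amplitude (TFS_p)
    return env, tfs

# Hypothetical narrowband test signal: 1000 Hz carrier, 8 Hz modulation.
fs = 16000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of samples
fc, fm, depth = 1000.0, 8.0, 0.5
x = (1.0 + depth * np.sin(2 * np.pi * fm * t)) * np.cos(2 * np.pi * fc * t)

env, tfs = env_tfs(x)
print("envelope range:", env.min(), env.max())
print("largest reconstruction error:", np.max(np.abs(env * tfs - x)))
```

By construction the product `env * tfs` returns the original signal, so the decomposition loses no information; it only separates the slow and the fast parts of the waveform.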

In everyday life, most sounds, including speech and music, are broadband: their energy is spread across the frequency spectrum, and there is no well-defined way to represent the signal in terms of ENV_p and TFS_p. In a functioning cochlea, however, sounds are decomposed by the basilar membrane (BM) into a series of narrowband signals^[9]. The vibration at each hair cell can therefore be regarded as an envelope, ENV_BM, superimposed on a temporal fine structure, TFS_BM^[10]. These components depend on the position considered along the basilar membrane. At the apex, which responds to low frequencies, the fluctuations of ENV_BM and TFS_BM are relatively slow, whereas they are fastest at the basal end, which responds to high frequencies^[10].
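The decomposition of a broadband sound into narrowband signals can be sketched with a bank of band-pass filters. The example below is a deliberate idealization and not a cochlear model: it assumes brick-wall filters applied in the frequency domain, octave-wide bands with hypothetical edges, and white noise as the broadband input. It only shows that the fine structure oscillates more slowly in low-frequency bands than in high-frequency ones; the helper names are illustrative.

```python
import numpy as np

def band_pass(x, fs, lo, hi):
    """Ideal (brick-wall) band-pass filter applied in the frequency domain."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def filterbank(x, fs, edges):
    """Split x into adjacent bands delimited by consecutive edges (Hz)."""
    return [band_pass(x, fs, lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]

def zero_crossing_frequency(x, fs):
    """Rough oscillation rate in Hz: half the number of sign changes per second."""
    negative = np.signbit(x)
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings * fs / (2.0 * len(x))

fs = 16000                                   # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)               # fixed seed for repeatability
noise = rng.standard_normal(fs)              # one second of broadband noise
edges = [100, 200, 400, 800, 1600, 3200, 6400]  # hypothetical band edges

bands = filterbank(noise, fs, edges)
rates = [zero_crossing_frequency(band, fs) for band in bands]
for lo, hi, rate in zip(edges[:-1], edges[1:], rates):
    print(f"{lo:5d}-{hi:5d} Hz band: fine structure near {rate:7.1f} Hz")
```

Each band output is narrowband, so it can in turn be described by an envelope and a fine structure, which is what makes ENV_BM and TFS_BM well defined at each place along the membrane.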

Through the mechanoelectrical transduction performed by the hair cells, both ENV_BM and TFS_BM are transmitted by the auditory nerve in the form of action potentials^[11], giving rise to ENV_n and TFS_n. TFS_n (the neural TFS) is encoded mainly by neurons tuned to low audio frequencies, whereas ENV_n (the neural envelope) is encoded mainly by neurons tuned to high audio frequencies^[12]^,^[13]. For a broadband signal, it is not possible to manipulate TFS_p without affecting ENV_BM and ENV_n and, conversely, it is not possible to manipulate ENV_p without affecting TFS_BM and TFS_n^[14]^,^[15].

Roles in speech and music perception

ENV_p plays a crucial role in many aspects of auditory perception, including the perception of speech and music^[2]^,^[7]^,^[16]^,^[17]. Speech thus remains intelligible to some extent even when the TFS_p information it contains is artificially removed^[18]. Similarly, when the TFS_p of one sentence is combined with the ENV_p of a second sentence, only the words of the second sentence are understood^[19]. The ENV_p components most important for speech understanding fluctuate at rates below 16 Hz (roughly corresponding to the syllable rate)^[20]^,^[21]^,^[22].
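The combination of the envelope of one sound with the fine structure of another can be sketched as follows. This is a minimal single-band illustration under stated assumptions: the two inputs are synthetic amplitude-modulated tones rather than sentences, the Hilbert-based decomposition is assumed, and the whole signal is treated as one band, whereas studies of such hybrid sounds typically apply the exchange band by band after a filterbank. The name `chimaera` and all parameter values are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D array, computed with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def chimaera(env_source, tfs_source):
    """Impose the envelope of env_source on the fine structure of tfs_source."""
    env = np.abs(analytic_signal(env_source))
    tfs = np.cos(np.angle(analytic_signal(tfs_source)))
    return env * tfs

fs = 16000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs          # one second of samples

# Sound A: slow 4 Hz envelope on a 500 Hz carrier.
sound_a = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)
# Sound B: 10 Hz envelope on a 2000 Hz carrier.
sound_b = (1.0 + 0.8 * np.sin(2 * np.pi * 10 * t)) * np.cos(2 * np.pi * 2000 * t)

# Hybrid: envelope of A carried by the fine structure of B.
hybrid = chimaera(sound_a, sound_b)
expected = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 2000 * t)
print("largest deviation from expected hybrid:", np.max(np.abs(hybrid - expected)))
```

The hybrid keeps the 4 Hz amplitude pattern of the first sound while oscillating at the 2000 Hz carrier of the second, which is the kind of stimulus used to ask which cue listeners rely on.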

The processing of TFS_p information plays a role in pitch perception, an ability that matters for music perception but also for speech perception, since pitch contributes to vocal prosody. TFS_p cues are thus important for identifying the talker, as well as the emotions and intentions conveyed by prosody^[4]. In tonal languages, they also play a fundamental role in conveying phonetic content^[23]. In addition, several studies using vocoded speech have suggested that TFS_p cues contribute to intelligibility^[24]. Although it is difficult, if not impossible, to isolate the effect of TFS_p from that of ENV_p^[17]^,^[25], some studies of hearing-impaired listeners indicate that speech perception in background noise requires effective TFS_p processing^[26]^,^[27]. In music, the slow variations of ENV_p convey rhythm and tempo, whereas faster variations convey the onsets and offsets of sounds, which are important for timbre perception^[28].

See also

Sound envelope

References

This article is based in whole or in part on the English Wikipedia article "Temporal envelope and fine structure" (see the list of authors).

[1] Neal F. Viemeister and Christopher J. Plack, Human Psychophysics, New York, Springer, 1993 (ISBN 978-1-4612-7644-9, DOI 10.1007/978-1-4612-2728-1_4), pp. 116–154
[2] S. Rosen, "Temporal information in speech: acoustic, auditory and linguistic aspects", Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, vol. 336, no. 1278, June 1992, pp. 367–73 (PMID 1354376, DOI 10.1098/rstb.1992.0070)
[3] R. Drullman, "Temporal envelope and fine structure cues for speech intelligibility", The Journal of the Acoustical Society of America, vol. 97, no. 1, January 1995, pp. 585–92 (PMID 7860835)
[4] B. C. Moore, "The role of temporal fine structure processing in pitch perception, masking, and speech perception for normal-hearing and hearing-impaired people", Journal of the Association for Research in Otolaryngology, vol. 9, no. 4, December 2008, pp. 399–406 (PMID 18855069, PMCID PMC2580810, DOI 10.1007/s10162-008-0143-x)
[5] E. De Boer, "Pitch of inharmonic signals", Nature, vol. 178, no. 4532, September 1956, pp. 535–6 (PMID 13358790)
[6] F. G. Zeng, K. Nie, S. Liu, G. Stickney, E. Del Rio, Y. Y. Kong and H. Chen, "On the dichotomy in auditory perception between temporal envelope and fine structure cues", The Journal of the Acoustical Society of America, vol. 116, no. 3, September 2004, pp. 1351–4 (PMID 15478399)
[7] Reinier Plomp, "Perception of speech as a modulated signal", Proceedings of the 10th International Congress of Phonetic Sciences, Utrecht, 1983, pp. 19–40
[8] David Hilbert, Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen, Leipzig, B. G. Teubner, 1912
[9] M. A. Ruggero, "Response to noise of auditory nerve fibers in the squirrel monkey", Journal of Neurophysiology, vol. 36, no. 4, July 1973, pp. 569–87 (PMID 4197339, DOI 10.1152/jn.1973.36.4.569)
[10] Brian C. J. Moore, Auditory Processing of Temporal Fine Structure: Effects of Age and Hearing Loss, New Jersey, World Scientific Publishing Company, 4 May 2014 (ISBN 9789814579650)
[11] P. X. Joris, D. H. Louage, L. Cardoen and M. van der Heijden, "Correlation index: a new metric to quantify temporal coding", Hearing Research, vol. 216-217, June 2006, pp. 19–30 (PMID 16644160, DOI 10.1016/j.heares.2006.03.010)
[12] P. X. Joris, D. H. Louage, L. Cardoen and M. van der Heijden, "Correlation index: a new metric to quantify temporal coding", Hearing Research, vol. 216-217, June 2006, pp. 19–30 (PMID 16644160, DOI 10.1016/j.heares.2006.03.010)
[13] M. G. Heinz and J. Swaminathan, "Quantifying envelope and fine-structure coding in auditory nerve responses to chimaeric speech", Journal of the Association for Research in Otolaryngology, vol. 10, no. 3, September 2009, pp. 407–23 (PMID 19365691, PMCID PMC3084379, DOI 10.1007/s10162-009-0169-8)
[14] Peter L. Søndergaard, Rémi Decorsière and Torsten Dau, "On the relationship between multi-channel envelope and temporal fine structure", Proceedings of the International Symposium on Auditory and Audiological Research, vol. 3, 15 December 2011, pp. 363–370
[15] S. Shamma and C. Lorenzi, "On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system", The Journal of the Acoustical Society of America, vol. 133, no. 5, May 2013, pp. 2818–33 (PMID 23654388, PMCID PMC3663870, DOI 10.1121/1.4795783)
[16] D. J. Van Tasell, S. D. Soli, V. M. Kirby and G. P. Widin, "Speech waveform envelope cues for consonant recognition", The Journal of the Acoustical Society of America, vol. 82, no. 4, October 1987, pp. 1152–1161 (ISSN 0001-4966, PMID 3680774)
[17] O. Ghitza, "On the upper cutoff frequency of the auditory critical-band envelope detectors in the context of speech perception", The Journal of the Acoustical Society of America, vol. 110, no. 3 Pt 1, September 2001, pp. 1628–1640 (ISSN 0001-4966, PMID 11572372)
[18] R. V. Shannon, F. G. Zeng, V. Kamath and J. Wygonski, "Speech recognition with primarily temporal cues", Science (New York, N.Y.), vol. 270, no. 5234, 13 October 1995, pp. 303–304 (ISSN 0036-8075, PMID 7569981)
[19] Zachary M. Smith, Bertrand Delgutte and Andrew J. Oxenham, "Chimaeric sounds reveal dichotomies in auditory perception", Nature, vol. 416, no. 6876, 7 March 2002, pp. 87–90 (ISSN 0028-0836, PMID 11882898, PMCID PMC2268248, DOI 10.1038/416087a)
[20] R. Drullman, J. M. Festen and R. Plomp, "Effect of temporal envelope smearing on speech reception", The Journal of the Acoustical Society of America, vol. 95, no. 2, February 1994, pp. 1053–1064 (ISSN 0001-4966, PMID 8132899)
[21] Léo Varnet, Maria Clemencia Ortiz-Barajas, Ramón Guevara Erra and Judit Gervain, "A cross-linguistic study of speech modulation spectra", The Journal of the Acoustical Society of America, vol. 142, no. 4, October 2017, p. 1976 (ISSN 1520-8524, PMID 29092595, DOI 10.1121/1.5006179)
[22] Nandini C. Singh and Frédéric E. Theunissen, "Modulation spectra of natural sounds and ethological theories of auditory processing", The Journal of the Acoustical Society of America, vol. 114, no. 6 Pt 1, December 2003, pp. 3394–3411 (ISSN 0001-4966, PMID 14714819)
[23] Fan-Gang Zeng, Kaibao Nie, Ginger S. Stickney and Ying-Yee Kong, "Speech recognition with amplitude and frequency modulations", Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 7, 15 February 2005, pp. 2293–2298 (ISSN 0027-8424, PMID 15677723, PMCID PMC546014, DOI 10.1073/pnas.0406460102)
[24] Christian Lorenzi, Gaëtan Gilbert, Héloïse Carn and Stéphane Garnier, "Speech perception problems of the hearing impaired reflect inability to use temporal fine structure", Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 49, 5 December 2006, pp. 18866–18869 (ISSN 0027-8424, PMID 17116863, PMCID PMC1693753, DOI 10.1073/pnas.0607364103)
[25] Frédéric Apoux, Sarah E. Yoho, Carla L. Youngdahl and Eric W. Healy, "Role and relative contribution of temporal envelope and fine structure cues in sentence recognition by normal-hearing listeners", The Journal of the Acoustical Society of America, vol. 134, no. 3, September 2013, pp. 2205–2212 (ISSN 1520-8524, PMID 23967950, PMCID PMC3765279, DOI 10.1121/1.4816413)
[26] Olaf Strelcyk and Torsten Dau, "Relations between frequency selectivity, temporal fine-structure processing, and speech reception in impaired hearing", The Journal of the Acoustical Society of America, vol. 125, no. 5, May 2009, pp. 3328–3345 (ISSN 1520-8524, PMID 19425674, DOI 10.1121/1.3097469)
[27] Kathryn Hopkins and Brian C. J. Moore, "The effects of age and cochlear hearing loss on temporal fine structure sensitivity, frequency selectivity, and speech reception in noise", The Journal of the Acoustical Society of America, vol. 130, no. 1, July 2011, pp. 334–349 (ISSN 1520-8524, PMID 21786903, DOI 10.1121/1.3585848)
[28] P. Iverson and C. L. Krumhansl, "Isolating the dynamic attributes of musical timbre", The Journal of the Acoustical Society of America, vol. 94, no. 5, November 1993, pp. 2595–2603 (ISSN 0001-4966, PMID 8270737)
