Displaying 1 - 70 of 70
-
Alagöz, G., Eising, E., Mekki, Y., Bignardi, G., Fontanillas, P., 23andMe Research Team, Nivard, M. G., Luciano, M., Cox, N. J., Fisher, S. E., & Gordon, R. L. (2025). The shared genetic architecture and evolution of human language and musical rhythm. Nature Human Behaviour, 9, 376-390. doi:10.1038/s41562-024-02051-y.
Abstract
Rhythm and language-related traits are phenotypically correlated, but their genetic overlap is largely unknown. Here, we leveraged two large-scale genome-wide association studies performed to shed light on the shared genetics of rhythm (N=606,825) and dyslexia (N=1,138,870). Our results reveal an intricate shared genetic and neurobiological architecture, and lay groundwork for resolving longstanding debates about the potential co-evolution of human language and musical traits. -
Bethke, S., Meyer, A. S., & Hintz, F. (2025). The German Auditory and Image (GAudI) vocabulary test: A new German receptive vocabulary test and its relationships to other tests measuring linguistic experience. PLOS ONE, 20: e0318115. doi:10.1371/journal.pone.0318115.
Abstract
Humans acquire word knowledge through producing and comprehending spoken and written language. Word learning continues into adulthood and knowledge accumulates across the lifespan. Therefore, receptive vocabulary size is often conceived of as a proxy for linguistic experience and plays a central role in assessing individuals’ language proficiency. There is currently no valid open access test available for assessing receptive vocabulary size in German-speaking adults. We addressed this gap and developed the German Auditory and Image Vocabulary Test (GAudI). In the GAudI, participants are presented with spoken test words and have to indicate their meanings by selecting the corresponding picture from a set of four alternatives. Here we describe the development of the test and provide evidence for its validity. Specifically, we report a study in which 168 German-speaking participants completed the GAudI and five other tests tapping into linguistic experience: one test measuring print exposure, two tests measuring productive vocabulary, one test assessing knowledge of book language grammar, and a test of receptive vocabulary that was normed in adolescents. The psychometric properties of the GAudI and its relationships to the other tests demonstrate that it is a suitable tool for measuring receptive vocabulary size. We offer an open-access digital test environment that can be used for research purposes, accessible via https://ems13.mpi.nl/bq4_customizable_de/researchers_welcome.php. -
Bujok, R., Meyer, A. S., & Bosker, H. R. (2025). Audiovisual perception of lexical stress: Beat gestures and articulatory cues. Language and Speech, 68(1), 181-203. doi:10.1177/00238309241258162.
Abstract
Human communication is inherently multimodal. Auditory speech, but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). However, less is known about the influence of visual information on the perception of suprasegmental aspects of speech like lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. However, in audiovisual conditions, we were not able to find an effect of visual articulatory cues. In contrast, we found that the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts. -
Coopmans, C. W., De Hoop, H., Tezcan, F., Hagoort, P., & Martin, A. E. (2025). Language-specific neural dynamics extend syntax into the time domain. PLOS Biology, 23: e3002968. doi:10.1371/journal.pbio.3002968.
Abstract
Studies of perception have long shown that the brain adds information to its sensory analysis of the physical environment. A touchstone example for humans is language use: to comprehend a physical signal like speech, the brain must add linguistic knowledge, including syntax. Yet, syntactic rules and representations are widely assumed to be atemporal (i.e., abstract and not bound by time), so they must be translated into time-varying signals for speech comprehension and production. Here, we test 3 different models of the temporal spell-out of syntactic structure against brain activity of people listening to Dutch stories: an integratory bottom-up parser, a predictive top-down parser, and a mildly predictive left-corner parser. These models build exactly the same structure but differ in when syntactic information is added by the brain—this difference is captured in the (temporal distribution of the) complexity metric “incremental node count.” Using temporal response function models with both acoustic and information-theoretic control predictors, node counts were regressed against source-reconstructed delta-band activity acquired with magnetoencephalography. Neural dynamics in left frontal and temporal regions most strongly reflect node counts derived by the top-down method, which postulates syntax early in time, suggesting that predictive structure building is an important component of Dutch sentence comprehension. The absence of strong effects of the left-corner model further suggests that its mildly predictive strategy does not represent Dutch language comprehension well, in contrast to what has been found for English. Understanding when the brain projects its knowledge of syntax onto speech, and whether this is done in language-specific ways, will inform and constrain the development of mechanistic models of syntactic structure building in the brain. -
Goral, M., Antolovic, K., Hejazi, Z., & Schulz, F. M. (2025). Using a translanguaging framework to examine language production in a trilingual person with aphasia. Clinical Linguistics & Phonetics, 39(1), 1-20. doi:10.1080/02699206.2024.2328240.
Abstract
When language abilities in aphasia are assessed in clinical and research settings, the standard practice is to examine each language of a multilingual person separately. But many multilingual individuals, with and without aphasia, mix their languages regularly when they communicate with other speakers who share their languages. We applied a novel approach to scoring language production of a multilingual person with aphasia. Our aim was to discover whether the assessment outcome would differ meaningfully when we count accurate responses in only the target language of the assessment session versus when we apply a translanguaging framework, that is, count all accurate responses, regardless of the language in which they were produced. The participant is a Farsi-German-English speaking woman with chronic moderate aphasia. We examined the participant’s performance on two picture-naming tasks, an answering wh-question task, and an elicited narrative task. The results demonstrated that scores in English, the participant’s third-learned and least-impaired language did not differ between the two scoring methods. Performance in German, the participant’s moderately impaired second language benefited from translanguaging-based scoring across the board. In Farsi, her weakest language post-CVA, the participant’s scores were higher under the translanguaging-based scoring approach in some but not all of the tasks. Our findings suggest that whether a translanguaging-based scoring makes a difference in the results obtained depends on relative language abilities and on pragmatic constraints, with additional influence of the linguistic distances between the languages in question. -
Karaca, F., Brouwer, S., Unsworth, S., & Huettig, F. (2025). Child heritage speakers’ reading skills in the majority language and exposure to the heritage language support morphosyntactic prediction in speech. Bilingualism: Language and Cognition. Advance online publication. doi:10.1017/S1366728925000331.
Abstract
We examined the morphosyntactic prediction ability of child heritage speakers and the role of reading skills and language experience in predictive processing. Using visual world eye-tracking, we focused on predictive use of case-marking cues in Turkish with monolingual (N=49, Mage=83 months) and heritage children, who were early bilinguals of Turkish and Dutch (N=30, Mage=90 months). We found quantitative differences in magnitude of the prediction ability of monolingual and heritage children; however, their overall prediction ability was on par. The heritage speakers’ prediction ability was facilitated by their reading skills in Dutch, but not in Turkish as well as by their heritage language exposure, but not by engagement in literacy activities. These findings emphasize the facilitatory role of reading skills and spoken language experience in predictive processing. This study is the first to show that in a developing bilingual mind, effects of reading-on-prediction can take place across modalities and across languages.Additional information
data and analysis scripts -
Karaca, F. (2025). On knowing what lies ahead: The interplay of prediction, experience, and proficiency. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
link to Radboud Repository -
Matetovici, M., Spruit, A., Colonnesi, C., Garnier‐Villarreal, M., & Noom, M. (2025). Parent and child gender effects in the relationship between attachment and both internalizing and externalizing problems of children between 2 and 5 years old: A dyadic perspective. Infant Mental Health Journal: Infancy and Early Childhood. Advance online publication. doi:10.1002/imhj.70002.
Abstract
Acknowledging that the parent–child attachment is a dyadic relationship, we investigated differences between pairs of parents and preschool children based on gender configurations in the association between attachment and problem behavior. We looked at mother–daughter, mother–son, father–daughter, and father–son dyads, but also compared mothers and fathers, daughters and sons, and same versus different gender pairs. We employed multigroup structural equation modeling to explore moderation effects of gender in a sample of 446 independent pairs of parents and preschool children (2–5 years old) from the Netherlands. A stronger association between both secure and avoidant attachment and internalizing problems was found for father–son dyads compared to father–daughter dyads. A stronger association between both secure and avoidant attachment and externalizing problems was found for mother–son dyads compared to mother–daughter and father–daughter dyads. Sons showed a stronger negative association between secure attachment and externalizing problems, a stronger positive association between avoidant attachment and externalizing problems, and a stronger negative association between secure attachment and internalizing problems compared to daughters. These results provide evidence for gender moderation and demonstrate that a dyadic approach can reveal patterns of associations that would not be recognized if parent and child gender effects were assessed separately.Additional information
analysis code -
Mazzini*, S., Seijdel*, N., & Drijvers*, L. (2025). Autistic individuals benefit from gestures during degraded speech comprehension. Autism, 29(2), 544-548. doi:10.1177/13623613241286570.
Abstract
*All authors contributed equally to this work
Meaningful gestures enhance degraded speech comprehension in neurotypical adults, but it is unknown whether this is the case for neurodivergent populations, such as autistic individuals. Previous research demonstrated atypical multisensory and speech-gesture integration in autistic individuals, suggesting that integrating speech and gestures may be more challenging and less beneficial for speech comprehension in adverse listening conditions in comparison to neurotypicals. Conversely, autistic individuals could also benefit from additional cues to comprehend speech in noise, as they encounter difficulties in filtering relevant information from noise. We here investigated whether gestural enhancement of degraded speech comprehension differs for neurotypical (n = 40, mean age = 24.1) compared to autistic (n = 40, mean age = 26.8) adults. Participants watched videos of an actress uttering a Dutch action verb in clear or degraded speech accompanied with or without a gesture, and completed a free-recall task. Gestural enhancement was observed for both autistic and neurotypical individuals, and did not differ between groups. In contrast to previous literature, our results demonstrate that autistic individuals do benefit from gestures during degraded speech comprehension, similar to neurotypicals. These findings provide relevant insights to improve communication practices with autistic individuals and to develop new interventions for speech comprehension. -
Ye, C., McQueen, J. M., & Bosker, H. R. (2025). Effect of auditory cues to lexical stress on the visual perception of gestural timing. Attention, Perception & Psychophysics. Advance online publication. doi:10.3758/s13414-025-03072-z.
Abstract
Speech is often accompanied by gestures. Since beat gestures—simple nonreferential up-and-down hand movements—frequently co-occur with prosodic prominence, they can indicate stress in a word and hence influence spoken-word recognition. However, little is known about the reverse influence of auditory speech on visual perception. The current study investigated whether lexical stress has an effect on the perceived timing of hand beats. We used videos in which a disyllabic word, embedded in a carrier sentence (Experiment 1) or in isolation (Experiment 2), was coupled with an up-and-down hand beat, while varying their degrees of asynchrony. Results from Experiment 1, a novel beat timing estimation task, revealed that gestures were estimated to occur closer in time to the pitch peak in a stressed syllable than their actual timing, hence reducing the perceived temporal distance between gestures and stress by around 60%. Using a forced-choice task, Experiment 2 further demonstrated that listeners tended to perceive a gesture, falling midway between two syllables, on the syllable receiving stronger cues to stress than the other, and this auditory effect was greater when gestural timing was most ambiguous. Our findings suggest that f0 and intensity are the driving force behind the temporal attraction effect of stress on perceived gestural timing. This study provides new evidence for auditory influences on visual perception, supporting bidirectionality in audiovisual interaction between speech-related signals that occur in everyday face-to-face communication. -
Mooijman, S., Schoonen, R., Goral, M., Roelofs, A., & Ruiter, M. B. (2025). Why do bilingual speakers with aphasia alternate between languages? A study into their experiences and mixing patterns. Aphasiology. Advance online publication. doi:10.1080/02687038.2025.2452928.
Abstract
Background
The factors that contribute to language alternation by bilingual speakers with aphasia have been debated. Some studies suggest that atypical language mixing results from impairments in language control, while others posit that mixing is a way to enhance communicative effectiveness. To address this question, most prior research examined the appropriateness of language mixing in connected speech tasks.
Aims
The goal of this study was to provide new insight into the question whether language mixing in aphasia reflects a strategy to enhance verbal effectiveness or involuntary behaviour resulting from impaired language control.
Methods & procedures
Semi-structured web-based interviews with bilingual speakers with aphasia (N = 19) with varying language backgrounds were conducted. The interviews were transcribed and coded for: (1) Self-reports regarding language control and compensation, (2) instances of language mixing, and (3) in two cases, instances of repair initiation.
Outcomes & results
The results showed that several participants reported language control difficulties but that the knowledge of additional languages could also be recruited to compensate for lexical retrieval problems. Most participants showed no or very few instances of mixing and the observed mixes appeared to adhere to the pragmatic context and known functions of switching. Three participants exhibited more marked switching behaviour and reported corresponding difficulties with language control. Instances of atypical mixing did not coincide with clear problems initiating conversational repair.
Conclusions
Our study highlights the variability in language mixing patterns of bilingual speakers with aphasia. Furthermore, most of the individuals in the study appeared to be able to effectively control their languages, and to alternate between their languages for compensatory purposes. Control deficits resulting in atypical language mixing were observed in a small number of participants. -
Papoutsi, C., Tourtouri, E. N., Piai, V., Lampe, L. F., & Meyer, A. S. (2025). Fast and slow errors: What naming latencies of errors reveal about the interplay of attentional control and word planning in speeded picture naming. Journal of Experimental Psychology: Learning, Memory, and Cognition. Advance online publication. doi:10.1037/xlm0001472.
Abstract
Speakers sometimes produce lexical errors, such as saying “salt” instead of “pepper.” This study aimed to better understand the origin of lexical errors by assessing whether they arise from a hasty selection and premature decision to speak (premature selection hypothesis) or from momentary attentional disengagement from the task (attentional lapse hypothesis). We analyzed data from a speeded picture naming task (Lampe et al., 2023) and investigated whether lexical errors are produced as fast as target (i.e., correct) responses, thus arising from premature selection, or whether they are produced more slowly than target responses, thus arising from lapses of attention. Using ex-Gaussian analyses, we found that lexical errors were slower than targets in the tail, but not in the normal part of the response time distribution, with the tail effect primarily resulting from errors that were not coordinates, that is, members of the target’s semantic category. Moreover, we compared the coordinate errors and target responses in terms of their word-intrinsic properties and found that they were overall more frequent, shorter, and acquired earlier than targets. Given the present findings, we conclude that coordinate errors occur due to a premature selection but in the context of intact attentional control, following the same lexical constraints as targets, while other errors, given the variability in their nature, may vary in their origin, with one potential source being lapses of attention. -
Rohrer, P. L., Bujok, R., Van Maastricht, L., & Bosker, H. R. (2025). From “I dance” to “she danced” with a flick of the hands: Audiovisual stress perception in Spanish. Psychonomic Bulletin & Review. Advance online publication. doi:10.3758/s13423-025-02683-9.
Abstract
When talking, speakers naturally produce hand movements (co-speech gestures) that contribute to communication. Evidence in Dutch suggests that the timing of simple up-and-down, non-referential “beat” gestures influences spoken word recognition: the same auditory stimulus was perceived as CONtent (noun, capitalized letters indicate stressed syllables) when a beat gesture occurred on the first syllable, but as conTENT (adjective) when the gesture occurred on the second syllable. However, these findings were based on a small number of minimal pairs in Dutch, limiting the generalizability of the findings. We therefore tested this effect in Spanish, where lexical stress is highly relevant in the verb conjugation system, distinguishing bailo, “I dance” with word-initial stress from bailó, “she danced” with word-final stress. Testing a larger sample (N = 100), we also assessed whether individual differences in working memory capacity modulated how much individuals relied on the gestures in spoken word recognition. The results showed that, similar to Dutch, Spanish participants were biased to perceive lexical stress on the syllable that visually co-occurred with a beat gesture, with the effect being strongest when the acoustic stress cues were most ambiguous. No evidence was found for by-participant effect sizes to be influenced by individual differences in phonological or visuospatial working memory. These findings reveal gestural-speech coordination impacts lexical stress perception in a language where listeners are regularly confronted with such lexical stress contrasts, highlighting the impact of gestures’ timing on prominence perception and spoken word recognition. -
Roos, N. M. (2025). Naming a picture in context: Paving the way to investigate language recovery after stroke. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
link to Radboud Repository -
Sander, J., Zhang, Y., & Rowland, C. F. (2025). Language acquisition occurs in multimodal social interaction: A commentary on Karadöller, Sümer and Özyürek [invited commentary]. First Language: advance online publication. doi:10.1177/01427237251326984.
Abstract
We argue that language learning occurs in triadic interactions, where caregivers and children engage not only with each other but also with objects, actions and non-verbal cues that shape language acquisition. We illustrate this using two studies on real-time interactions in spoken and signed language. The first examines shared book reading, showing how caregivers use speech, gestures and gaze coordination to establish joint attention, facilitating word-object associations. The second study explores joint attention in spoken and signed interactions, demonstrating that signing dyads rely on a wider range of multimodal behaviours – such as touch, vibrations and peripheral gaze – compared to speaking dyads. Our data highlight how different language modalities shape attentional strategies. We advocate for research that fully incorporates the dynamic interplay between language, attention and environment. -
Severijnen, G. G. A. (2025). A blessing in disguise: How prosodic variability challenges but also aids successful speech perception. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Ter Bekke, M. (2025). On how gestures facilitate prediction and fast responding during conversation. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Ter Bekke, M., Drijvers, L., & Holler, J. (2025). Co-speech hand gestures are used to predict upcoming meaning. Psychological Science, 36(4), 237-248. doi:10.1177/09567976251331041.
Abstract
In face-to-face conversation, people use speech and gesture to convey meaning. Seeing gestures alongside speech facilitates comprehenders’ language processing, but crucially, the mechanisms underlying this facilitation remain unclear. We investigated whether comprehenders use the semantic information in gestures, typically preceding related speech, to predict upcoming meaning. Dutch adults listened to questions asked by a virtual avatar. Questions were accompanied by an iconic gesture (e.g., typing) or meaningless control movement (e.g., arm scratch) followed by a short pause and target word (e.g., “type”). A Cloze experiment showed that gestures improved explicit predictions of upcoming target words. Moreover, an EEG experiment showed that gestures reduced alpha and beta power during the pause, indicating anticipation, and reduced N400 amplitudes, demonstrating facilitated semantic processing. Thus, comprehenders use iconic gestures to predict upcoming meaning. Theories of linguistic prediction should incorporate communicative bodily signals as predictive cues to capture how language is processed in face-to-face interaction.Additional information
supplementary material -
Van Geert, E., Ding, R., & Wagemans, J. (2025). A cross-cultural comparison of aesthetic preferences for neatly organized compositions: Native Chinese- versus Native Dutch-speaking samples. Empirical Studies of the Arts, 43(1), 250-275. doi:10.1177/02762374241245917.
Abstract
Do aesthetic preferences for images of neatly organized compositions (e.g., images collected on blogs like Things Organized Neatly©) generalize across cultures? In an earlier study, focusing on stimulus and personal properties related to order and complexity, Western participants indicated their preference for one of two simultaneously presented images (100 pairs). In the current study, we compared the data of the native Dutch-speaking participants from this earlier sample (N = 356) to newly collected data from a native Chinese-speaking sample (N = 220). Overall, aesthetic preferences were quite similar across cultures. When relating preferences for each sample to ratings of order, complexity, soothingness, and fascination collected from a Western, mainly Dutch-speaking sample, the results hint at a cross-culturally consistent preference for images that Western participants rate as more ordered, but a cross-culturally diverse relation between preferences and complexity.Additional information
VanGeert_Ding_Wagemans_2024suppl_cross cultural comparison of....pdf -
He, J. (2023). Coordination of spoken language production and comprehension: How speech production is affected by irrelevant background speech. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Anichini, M., de Reus, K., Hersh, T. A., Valente, D., Salazar-Casals, A., Berry, C., Keller, P. E., & Ravignani, A. (2023). Measuring rhythms of vocal interactions: A proof of principle in harbour seal pups. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 378(1875): 20210477. doi:10.1098/rstb.2021.0477.
Abstract
Rhythmic patterns in interactive contexts characterize human behaviours such as conversational turn-taking. These timed patterns are also present in other animals, and often described as rhythm. Understanding fine-grained temporal adjustments in interaction requires complementary quantitative methodologies. Here, we showcase how vocal interactive rhythmicity in a non-human animal can be quantified using a multi-method approach. We record vocal interactions in harbour seal pups (Phoca vitulina) under controlled conditions. We analyse these data by combining analytical approaches, namely categorical rhythm analysis, circular statistics and time series analyses. We test whether pups' vocal rhythmicity varies across behavioural contexts depending on the absence or presence of a calling partner. Four research questions illustrate which analytical approaches are complementary versus orthogonal. For our data, circular statistics and categorical rhythms suggest that a calling partner affects a pup's call timing. Granger causality suggests that pups predictively adjust their call timing when interacting with a real partner. Lastly, the ADaptation and Anticipation Model estimates statistical parameters for a potential mechanism of temporal adaptation and anticipation. Our analytical complementary approach constitutes a proof of concept; it shows feasibility in applying typically unrelated techniques to seals to quantify vocal rhythmic interactivity across behavioural contexts.Additional information
supplemental information -
Bartolozzi, F. (2023). Repetita Iuvant? Studies on the role of repetition priming as a supportive mechanism during conversation. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Byun, K.-S. (2023). Establishing intersubjectivity in cross-signing. PhD Thesis, Radboud University Nijmegen, Nijmegen.
-
Çetinçelik, M., Rowland, C. F., & Snijders, T. M. (2023). Ten-month-old infants’ neural tracking of naturalistic speech is not facilitated by the speaker’s eye gaze. Developmental Cognitive Neuroscience, 64: 101297. doi:10.1016/j.dcn.2023.101297.
Abstract
Eye gaze is a powerful ostensive cue in infant-caregiver interactions, with demonstrable effects on language acquisition. While the link between gaze following and later vocabulary is well-established, the effects of eye gaze on other aspects of language, such as speech processing, are less clear. In this EEG study, we examined the effects of the speaker’s eye gaze on ten-month-old infants’ neural tracking of naturalistic audiovisual speech, a marker for successful speech processing. Infants watched videos of a speaker telling stories, addressing the infant with direct or averted eye gaze. We assessed infants’ speech-brain coherence at stress (1–1.75 Hz) and syllable (2.5–3.5 Hz) rates, tested for differences in attention by comparing looking times and EEG theta power in the two conditions, and investigated whether neural tracking predicts later vocabulary. Our results showed that infants’ brains tracked the speech rhythm both at the stress and syllable rates, and that infants’ neural tracking at the syllable rate predicted later vocabulary. However, speech-brain coherence did not significantly differ between direct and averted gaze conditions and infants did not show greater attention to direct gaze. Overall, our results suggest significant neural tracking at ten months, related to vocabulary development, but not modulated by speaker’s gaze.Additional information
supplementary material -
Chen, A., Çetinçelik, M., Roncaglia-Denissen, M. P., & Sadakata, M. (2023). Native language, L2 experience, and pitch processing in music. Linguistic Approaches to Bilingualism, 13(2), 218-237. doi:10.1075/lab.20030.che.
Abstract
The current study investigated how the role of pitch in one’s native language and L2 experience influenced musical melodic processing by testing Turkish and Mandarin Chinese advanced and beginning learners of English as an L2. Pitch has a lower functional load and shows a simpler pattern in Turkish than in Chinese as the former only contrasts between presence and the absence of pitch elevation, while the latter makes use of four different pitch contours lexically. Using the Musical Ear Test as the tool, we found that the Chinese listeners outperformed the Turkish listeners, and the advanced L2 learners outperformed the beginning learners. The Turkish listeners were further tested on their discrimination of bisyllabic Chinese lexical tones, and again an L2 advantage was observed. No significant difference was found for working memory between the beginning and advanced L2 learners. These results suggest that richness of tonal inventory of the native language is essential for triggering a music processing advantage, and on top of the tone language advantage, the L2 experience yields a further enhancement. Yet, unlike the tone language advantage that seems to relate to pitch expertise, learning an L2 seems to improve sound discrimination in general, and such improvement exhibits in non-native lexical tone discrimination. -
Coopmans, C. W. (2023). Triangles in the brain: The role of hierarchical structure in language use. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Coopmans, C. W., Struiksma, M. E., Coopmans, P. H. A., & Chen, A. (2023). Processing of grammatical agreement in the face of variation in lexical stress: A mismatch negativity study. Language and Speech, 66(1), 202-213. doi:10.1177/00238309221098116.
Abstract
Previous electroencephalography studies have yielded evidence for automatic processing of syntax and lexical stress. However, these studies looked at both effects in isolation, limiting their generalizability to everyday language comprehension. In the current study, we investigated automatic processing of grammatical agreement in the face of variation in lexical stress. Using an oddball paradigm, we measured the Mismatch Negativity (MMN) in Dutch-speaking participants while they listened to Dutch subject–verb sequences (linguistic context) or acoustically similar sequences in which the subject was replaced by filtered noise (nonlinguistic context). The verb forms differed in the inflectional suffix, rendering the subject–verb sequences grammatically correct or incorrect, and leading to a difference in the stress pattern of the verb forms. We found that the MMNs were modulated in both the linguistic and nonlinguistic condition, suggesting that the processing load induced by variation in lexical stress can hinder early automatic processing of grammatical agreement. However, as the morphological differences between the verb forms correlated with differences in number of syllables, an interpretation in terms of the prosodic structure of the sequences cannot be ruled out. Future research is needed to determine which of these factors (i.e., lexical stress, syllabic structure) most strongly modulate early syntactic processing.Additional information
supplementary material -
Coopmans, C. W., Mai, A., Slaats, S., Weissbart, H., & Martin, A. E. (2023). What oscillations can do for syntax depends on your theory of structure building. Nature Reviews Neuroscience, 24, 723. doi:10.1038/s41583-023-00734-5.
-
Coopmans, C. W., Kaushik, K., & Martin, A. E. (2023). Hierarchical structure in language and action: A formal comparison. Psychological Review, 130(4), 935-952. doi:10.1037/rev0000429.
Abstract
Since the cognitive revolution, language and action have been compared as cognitive systems, with cross-domain convergent views recently gaining renewed interest in biology, neuroscience, and cognitive science. Language and action are both combinatorial systems whose mode of combination has been argued to be hierarchical, combining elements into constituents of increasingly larger size. This structural similarity has led to the suggestion that they rely on shared cognitive and neural resources. In this article, we compare the conceptual and formal properties of hierarchy in language and action using set theory. We show that the strong compositionality of language requires a particular formalism, a magma, to describe the algebraic structure corresponding to the set of hierarchical structures underlying sentences. When this formalism is applied to actions, it appears to be both too strong and too weak. To overcome these limitations, which are related to the weak compositionality and sequential nature of action structures, we formalize the algebraic structure corresponding to the set of actions as a trace monoid. We aim to capture the different system properties of language and action in terms of the distinction between hierarchical sets and hierarchical sequences and discuss the implications for the way both systems could be represented in the brain. -
Doerig, A., Sommers, R. P., Seeliger, K., Richards, B., Ismael, J., Lindsay, G. W., Kording, K. P., Konkle, T., Van Gerven, M. A. J., Kriegeskorte, N., & Kietzmann, T. C. (2023). The neuroconnectionist research programme. Nature Reviews Neuroscience, 24, 431-450. doi:10.1038/s41583-023-00705-w.
Abstract
Artificial neural networks (ANNs) inspired by biology are beginning to be widely used to model behavioural and neural data, an approach we call ‘neuroconnectionism’. ANNs have been not only lauded as the current best models of information processing in the brain but also criticized for failing to account for basic cognitive functions. In this Perspective article, we propose that arguing about the successes and failures of a restricted set of current ANNs is the wrong approach to assess the promise of neuroconnectionism for brain science. Instead, we take inspiration from the philosophy of science, and in particular from Lakatos, who showed that the core of a scientific research programme is often not directly falsifiable but should be assessed by its capacity to generate novel insights. Following this view, we present neuroconnectionism as a general research programme centred around ANNs as a computational language for expressing falsifiable theories about brain computation. We describe the core of the programme, the underlying computational framework and its tools for testing specific neuroscientific hypotheses and deriving novel understanding. Taking a longitudinal view, we review past and present neuroconnectionist projects and their responses to challenges and argue that the research programme is highly progressive, generating new and otherwise unreachable insights into the workings of the brain. -
Dong, T., & Toneva, M. (2023). Modeling brain responses to video stimuli using multimodal video transformers. In Proceedings of the Conference on Cognitive Computational Neuroscience (CCN 2023) (pp. 194-197).
Abstract
Prior work has shown that internal representations of artificial neural networks can significantly predict brain responses elicited by unimodal stimuli (i.e. reading a book chapter or viewing static images). However, the computational modeling of brain representations of naturalistic video stimuli, such as movies or TV shows, still remains underexplored. In this work, we present a promising approach for modeling vision-language brain representations of video stimuli by a transformer-based model that represents videos jointly through audio, text, and vision. We show that the joint representations of vision and text information are better aligned with brain representations of subjects watching a popular TV show. We further show that the incorporation of visual information improves brain alignment across several regions that support language processing. -
Drijvers, L., & Mazzini, S. (2023). Neural oscillations in audiovisual language and communication. In Oxford Research Encyclopedia of Neuroscience. Oxford: Oxford University Press. doi:10.1093/acrefore/9780190264086.013.455.
Abstract
How do neural oscillations support human audiovisual language and communication? Considering the rhythmic nature of audiovisual language, in which stimuli from different sensory modalities unfold over time, neural oscillations represent an ideal candidate to investigate how audiovisual language is processed in the brain. Modulations of oscillatory phase and power are thought to support audiovisual language and communication in multiple ways. Neural oscillations synchronize by tracking external rhythmic stimuli or by re-setting their phase to presentation of relevant stimuli, resulting in perceptual benefits. In particular, synchronized neural oscillations have been shown to subserve the processing and the integration of auditory speech, visual speech, and hand gestures. Furthermore, synchronized oscillatory modulations have been studied and reported between brains during social interaction, suggesting that their contribution to audiovisual communication goes beyond the processing of single stimuli and applies to natural, face-to-face communication.
There are still some outstanding questions that need to be answered to reach a better understanding of the neural processes supporting audiovisual language and communication. In particular, it is not entirely clear yet how the multitude of signals encountered during audiovisual communication are combined into a coherent percept and how this is affected during real-world dyadic interactions. In order to address these outstanding questions, it is fundamental to consider language as a multimodal phenomenon, involving the processing of multiple stimuli unfolding at different rhythms over time, and to study language in its natural context: social interaction. Other outstanding questions could be addressed by implementing novel techniques (such as rapid invisible frequency tagging, dual-electroencephalography, or multi-brain stimulation) and analysis methods (e.g., using temporal response functions) to better understand the relationship between oscillatory dynamics and efficient audiovisual communication. -
Düngen, D., & Ravignani, A. (2023). The paradox of learned song in a semi-solitary mammal. Ethology, 129(9), 445-497. doi:10.1111/eth.13385.
Abstract
Learning can occur via trial and error; however, learning from conspecifics is faster and more efficient. Social animals can easily learn from conspecifics, but how do less social species learn? In particular, birds provide astonishing examples of social learning of vocalizations, while vocal learning from conspecifics is much less understood in mammals. We present a hypothesis aimed at solving an apparent paradox: how can harbor seals (Phoca vitulina) learn their song when their whole lives are marked by loose conspecific social contact? Harbor seal pups are raised individually by their mostly silent mothers. Pups' first few weeks of life show developed vocal plasticity; these weeks are followed by relatively silent years until sexually mature individuals start singing. How can this rather solitary life lead to a learned song? Why do pups display vocal plasticity at a few weeks of age, when this is apparently not needed? Our hypothesis addresses these questions and tries to explain how vocal learning fits into the natural history of harbor seals, and potentially other less social mammals. We suggest that harbor seals learn during a sensitive period within puppyhood, where they are exposed to adult males singing. In particular, we hypothesize that, to make this learning possible, the following happens concurrently: (1) mothers give birth right before male singing starts, (2) pups enter a sensitive learning phase around weaning time, which (3) coincides with their foraging expeditions at sea which, (4) in turn, coincide with the peak singing activity of adult males. In other words, harbor seals show vocal learning as pups so they can acquire elements of their future song from adults, and solitary adults can sing because they have acquired these elements as pups. We review the available evidence and suggest that pups learn adult vocalizations because they are born exactly at the right time to eavesdrop on singing adults. We conclude by advancing empirical predictions and testable hypotheses for future work. -
Düngen, D., Sarfati, M., & Ravignani, A. (2023). Cross-species research in biomusicality: Methods, pitfalls, and prospects. In E. H. Margulis, P. Loui, & D. Loughridge (
Eds. ), The science-music borderlands: Reckoning with the past and imagining the future (pp. 57-95). Cambridge, MA, USA: The MIT Press. doi:10.7551/mitpress/14186.003.0008. -
Eekhof, L. S., Van Krieken, K., Sanders, J., & Willems, R. M. (2023). Engagement with narrative characters: The role of social-cognitive abilities and linguistic viewpoint. Discourse Processes, 60(6), 411-439. doi:10.1080/0163853X.2023.2206773.
Abstract
This article explores the role of text and reader characteristics in character engagement experiences. In an online study, participants completed several self-report and behavioral measures of social-cognitive abilities and read two literary narratives in which the presence of linguistic viewpoint markers was varied using a highly controlled manipulation strategy. Afterward, participants reported on their character engagement experiences. A principal component analysis on participants’ responses revealed the multidimensional nature of character engagement, which included both self- and other-oriented emotional responses (e.g., empathy, personal distress) as well as more cognitive responses (e.g., identification, perspective taking). Furthermore, character engagement was found to rely on a wide range of social-cognitive abilities but not on the presence of viewpoint markers. Finally, and most importantly, we did not find convincing evidence for an interplay between social-cognitive abilities and the presence of viewpoint markers. These findings suggest that readers rely on their social-cognitive abilities to engage with the inner worlds of fictional others, more so than on the lexical cues of those inner worlds provided by the text. -
Egger, J. (2023). Need for speed? The role of speed of processing in early lexical development. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
link to Radboud Repository -
Eijk, L. (2023). Linguistic alignment: The syntactic, prosodic, and segmental phonetic levels. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Garrido Rodriguez, G., Norcliffe, E., Brown, P., Huettig, F., & Levinson, S. C. (2023). Anticipatory processing in a verb-initial Mayan language: Eye-tracking evidence during sentence comprehension in Tseltal. Cognitive Science, 47(1): e13292. doi:10.1111/cogs.13219.
Abstract
We present a visual world eye-tracking study on Tseltal (a Mayan language) and investigate whether verbal information can be used to anticipate an upcoming referent. Basic word order in transitive sentences in Tseltal is Verb-Object-Subject (VOS). The verb is usually encountered first, making argument structure and syntactic information available at the outset, which should facilitate anticipation of the post-verbal arguments. Tseltal speakers listened to verb-initial sentences with either an object-predictive verb (e.g., ‘eat’) or a general verb (e.g., ‘look for’) (e.g., “Ya slo’/sle ta stukel on te kereme”, Is eating/is looking (for) by himself the avocado the boy/ “The boy is eating/is looking (for) an avocado by himself”) while seeing a visual display showing one potential referent (e.g., avocado) and three distractors (e.g., bag, toy car, coffee grinder). We manipulated verb type (predictive vs. general) and recorded participants' eye-movements while they listened and inspected the visual scene. Participants’ fixations to the target referent were analysed using multilevel logistic regression models. Shortly after hearing the predictive verb, participants fixated the target object before it was mentioned. In contrast, when the verb was general, fixations to the target only started to increase once the object was heard. Our results suggest that Tseltal hearers pre-activate semantic features of the grammatical object prior to its linguistic expression. This provides evidence from a verb-initial language for online incremental semantic interpretation and anticipatory processing during language comprehension. These processes are comparable to the ones identified in subject-initial languages, which is consistent with the notion that different languages follow similar universal processing principles. -
Giglio, L. (2023). Speaking in the Brain: How the brain produces and understands language. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
González-Peñas, J., De Hoyos, L., Díaz-Caneja, C. M., Andreu-Bernabeu, Á., Stella, C., Gurriarán, X., Fañanás, L., Bobes, J., González-Pinto, A., Crespo-Facorro, B., Martorell, L., Vilella, E., Muntané, G., Molto, M. D., Gonzalez-Piqueras, J. C., Parellada, M., Arango, C., & Costas, J. (2023). Recent natural selection conferred protection against schizophrenia by non-antagonistic pleiotropy. Scientific Reports, 13: 15500. doi:10.1038/s41598-023-42578-0.
Abstract
Schizophrenia is a debilitating psychiatric disorder associated with a reduced fertility and decreased life expectancy, yet common predisposing variation substantially contributes to the onset of the disorder, which poses an evolutionary paradox. Previous research has suggested balanced selection, a mechanism by which schizophrenia risk alleles could also provide advantages under certain environments, as a reliable explanation. However, recent studies have shown strong evidence against a positive selection of predisposing loci. Furthermore, evolutionary pressures on schizophrenia risk alleles could have changed throughout human history as new environments emerged. Here in this study, we used 1000 Genomes Project data to explore the relationship between schizophrenia predisposing loci and recent natural selection (RNS) signatures after the human diaspora out of Africa around 100,000 years ago on a genome-wide scale. We found evidence for significant enrichment of RNS markers in derived alleles arisen during human evolution conferring protection to schizophrenia. Moreover, both partitioned heritability and gene set enrichment analyses of mapped genes from schizophrenia predisposing loci subject to RNS revealed a lower involvement in brain and neuronal related functions compared to those not subject to RNS. Taken together, our results suggest non-antagonistic pleiotropy as a likely mechanism behind RNS that could explain the persistence of schizophrenia common predisposing variation in human populations due to its association to other non-psychiatric phenotypes. -
Huisman, J. L. A., Van Hout, R., & Majid, A. (2023). Cross-linguistic constraints and lineage-specific developments in the semantics of cutting and breaking in Japonic and Germanic. Linguistic Typology, 27(1), 41-75. doi:10.1515/lingty-2021-2090.
Abstract
Semantic variation in the cutting and breaking domain has been shown to be constrained across languages in a previous typological study, but it was unclear whether Japanese was an outlier in this domain. Here we revisit cutting and breaking in the Japonic language area by collecting new naming data for 40 videoclips depicting cutting and breaking events in Standard Japanese, the highly divergent Tohoku dialects, as well as four related Ryukyuan languages (Amami, Okinawa, Miyako and Yaeyama). We find that the Japonic languages recapitulate the same semantic dimensions attested in the previous typological study, confirming that semantic variation in the domain of cutting and breaking is indeed cross-linguistically constrained. We then compare our new Japonic data to previously collected Germanic data and find that, in general, related languages resemble each other more than unrelated languages, and that the Japonic languages resemble each other more than the Germanic languages do. Nevertheless, English resembles all of the Japonic languages more than it resembles Swedish. Together, these findings show that the rate and extent of semantic change can differ between language families, indicating the existence of lineage-specific developments on top of universal cross-linguistic constraints. -
Hustá, C., Nieuwland, M. S., & Meyer, A. S. (2023). Effects of picture naming and categorization on concurrent comprehension: Evidence from the N400. Collabra: Psychology, 9(1): 88129. doi:10.1525/collabra.88129.
Abstract
n conversations, interlocutors concurrently perform two related processes: speech comprehension and speech planning. We investigated effects of speech planning on comprehension using EEG. Dutch speakers listened to sentences that ended with expected or unexpected target words. In addition, a picture was presented two seconds after target onset (Experiment 1) or 50 ms before target onset (Experiment 2). Participants’ task was to name the picture or to stay quiet depending on the picture category. In Experiment 1, we found a strong N400 effect in response to unexpected compared to expected target words. Importantly, this N400 effect was reduced in Experiment 2 compared to Experiment 1. Unexpectedly, the N400 effect was not smaller in the naming compared to categorization condition. This indicates that conceptual preparation or the decision whether to speak (taking place in both task conditions of Experiment 2) rather than processes specific to word planning interfere with comprehension.Additional information
EEG data, experimental scripts, and analysis scripts -
Jadoul, Y., Düngen, D., & Ravignani, A. (2023). Live-tracking acoustic parameters in animal behavioural experiments: Interactive bioacoustics with parselmouth. In A. Astolfi, F. Asdrubali, & L. Shtrepi (
Eds. ), Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023 (pp. 4675-4678). Torino: European Acoustics Association.Abstract
Most bioacoustics software is used to analyse the already collected acoustics data in batch, i.e., after the data-collecting phase of a scientific study. However, experiments based on animal training require immediate and precise reactions from the experimenter, and thus do not easily dovetail with a typical bioacoustics workflow. Bridging this methodological gap, we have developed a custom application to live-monitor the vocal development of harbour seals in a behavioural experiment. In each trial, the application records and automatically detects an animal's call, and immediately measures duration and acoustic measures such as intensity, fundamental frequency, or formant frequencies. It then displays a spectrogram of the recording and the acoustic measurements, allowing the experimenter to instantly evaluate whether or not to reinforce the animal's vocalisation. From a technical perspective, the rapid and easy development of this custom software was made possible by combining multiple open-source software projects. Here, we integrated the acoustic analyses from Parselmouth, a Python library for Praat, together with PyAudio and Matplotlib's recording and plotting functionality, into a custom graphical user interface created with PyQt. This flexible recombination of different open-source Python libraries allows the whole program to be written in a mere couple of hundred lines of code -
Jodzio, A., Piai, V., Verhagen, L., Cameron, I., & Indefrey, P. (2023). Validity of chronometric TMS for probing the time-course of word production: A modified replication. Cerebral Cortex, 33(12), 7816-7829. doi:10.1093/cercor/bhad081.
Abstract
In the present study, we used chronometric TMS to probe the time-course of 3 brain regions during a picture naming task. The left inferior frontal gyrus, left posterior middle temporal gyrus, and left posterior superior temporal gyrus were all separately stimulated in 1 of 5 time-windows (225, 300, 375, 450, and 525 ms) from picture onset. We found posterior temporal areas to be causally involved in picture naming in earlier time-windows, whereas all 3 regions appear to be involved in the later time-windows. However, chronometric TMS produces nonspecific effects that may impact behavior, and furthermore, the time-course of any given process is a product of both the involved processing stages along with individual variation in the duration of each stage. We therefore extend previous work in the field by accounting for both individual variations in naming latencies and directly testing for nonspecific effects of TMS. Our findings reveal that both factors influence behavioral outcomes at the group level, underlining the importance of accounting for individual variations in naming latencies, especially for late processing stages closer to articulation, and recognizing the presence of nonspecific effects of TMS. The paper advances key considerations and avenues for future work using chronometric TMS to study overt production. -
Karadöller, D. Z., Sumer, B., Ünal, E., & Özyürek, A. (2023). Late sign language exposure does not modulate the relation between spatial language and spatial memory in deaf children and adults. Memory & Cognition, 51, 582-600. doi:10.3758/s13421-022-01281-7.
Abstract
Prior work with hearing children acquiring a spoken language as their first language shows that spatial language and cognition are related systems and spatial language use predicts spatial memory. Here, we further investigate the extent of this relationship in signing deaf children and adults and ask if late sign language exposure, as well as the frequency and the type of spatial language use that might be affected by late exposure, modulate subsequent memory for spatial relations. To do so, we compared spatial language and memory of 8-year-old late-signing children (after 2 years of exposure to a sign language at the school for the deaf) and late-signing adults to their native-signing counterparts. We elicited picture descriptions of Left-Right relations in Turkish Sign Language (Türk İşaret Dili) and measured the subsequent recognition memory accuracy of the described pictures. Results showed that late-signing adults and children were similar to their native-signing counterparts in how often they encoded the spatial relation. However, late-signing adults but not children differed from their native-signing counterparts in the type of spatial language they used. However, neither late sign language exposure nor the frequency and type of spatial language use modulated spatial memory accuracy. Therefore, even though late language exposure seems to influence the type of spatial language use, this does not predict subsequent memory for spatial relations. We discuss the implications of these findings based on the theories concerning the correspondence between spatial language and cognition as related or rather independent systems. -
Lei, A., Willems, R. M., & Eekhof, L. S. (2023). Emotions, fast and slow: Processing of emotion words is affected by individual differences in need for affect and narrative absorption. Cognition and Emotion, 37(5), 997-1005. doi:10.1080/02699931.2023.2216445.
Abstract
Emotional words have consistently been shown to be processed differently than neutral words. However, few studies have examined individual variability in emotion word processing with longer, ecologically valid stimuli (beyond isolated words, sentences, or paragraphs). In the current study, we re-analysed eye-tracking data collected during story reading to reveal how individual differences in need for affect and narrative absorption impact the speed of emotion word reading. Word emotionality was indexed by affective-aesthetic potentials (AAP) calculated by a sentiment analysis tool. We found that individuals with higher levels of need for affect and narrative absorption read positive words more slowly. On the other hand, these individual differences did not influence the reading time of more negative words, suggesting that high need for affect and narrative absorption are characterised by a positivity bias only. In general, unlike most previous studies using more isolated emotion word stimuli, we observed a quadratic (U-shaped) effect of word emotionality on reading speed, such that both positive and negative words were processed more slowly than neutral words. Taken together, this study emphasises the importance of taking into account individual differences and task context when studying emotion word processing. -
Levshina, N., Namboodiripad, S., Allassonnière-Tang, M., Kramer, M., Talamo, L., Verkerk, A., Wilmoth, S., Garrido Rodriguez, G., Gupton, T. M., Kidd, E., Liu, Z., Naccarato, C., Nordlinger, R., Panova, A., & Stoynova, N. (2023). Why we need a gradient approach to word order. Linguistics, 61(4), 825-883. doi:10.1515/ling-2021-0098.
Abstract
This article argues for a gradient approach to word order, which treats word order preferences, both within and across languages, as a continuous variable. Word order variability should be regarded as a basic assumption, rather than as something exceptional. Although this approach follows naturally from the emergentist usage-based view of language, we argue that it can be beneficial for all frameworks and linguistic domains, including language acquisition, processing, typology, language contact, language evolution and change, and formal approaches. Gradient approaches have been very fruitful in some domains, such as language processing, but their potential is not fully realized yet. This may be due to practical reasons. We discuss the most pressing methodological challenges in corpus-based and experimental research of word order and propose some practical solutions.Additional information
The datasets and code used for the quantitative case studies can be found in th… -
Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). Opening up ChatGPT: Tracking Openness, Transparency, and Accountability in Instruction-Tuned Text Generators. In CUI '23: Proceedings of the 5th International Conference on Conversational User Interfaces. doi:10.1145/3571884.3604316.
Abstract
Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.
-
Liesenfeld, A., Lopez, A., & Dingemanse, M. (2023). The timing bottleneck: Why timing and overlap are mission-critical for conversational user interfaces, speech recognition and dialogue systems. In Proceedings of the 24rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDial 2023). doi:10.18653/v1/2023.sigdial-1.45.
Abstract
Speech recognition systems are a key intermediary in voice-driven human-computer interaction. Although speech recognition works well for pristine monologic audio, real-life use cases in open-ended interactive settings still present many challenges. We argue that timing is mission-critical for dialogue systems, and evaluate 5 major commercial ASR systems for their conversational and multilingual support. We find that word error rates for natural conversational data in 6 languages remain abysmal, and that overlap remains a key challenge (study 1). This impacts especially the recognition of conversational words (study 2), and in turn has dire consequences for downstream intent recognition (study 3). Our findings help to evaluate the current state of conversational ASR, contribute towards multidimensional error analysis and evaluation, and identify phenomena that need most attention on the way to build robust interactive speech technologies. -
Mamus, E., Speed, L. J., Rissman, L., Majid, A., & Özyürek, A. (2023). Lack of visual experience affects multimodal language production: Evidence from congenitally blind and sighted people. Cognitive Science, 47(1): e13228. doi:10.1111/cogs.13228.
Abstract
The human experience is shaped by information from different perceptual channels, but it is still debated whether and how differential experience influences language use. To address this, we compared congenitally blind, blindfolded, and sighted people's descriptions of the same motion events experienced auditorily by all participants (i.e., via sound alone) and conveyed in speech and gesture. Comparison of blind and sighted participants to blindfolded participants helped us disentangle the effects of a lifetime experience of being blind versus the task-specific effects of experiencing a motion event by sound alone. Compared to sighted people, blind people's speech focused more on path and less on manner of motion, and encoded paths in a more segmented fashion using more landmarks and path verbs. Gestures followed the speech, such that blind people pointed to landmarks more and depicted manner less than sighted people. This suggests that visual experience affects how people express spatial events in the multimodal language and that blindness may enhance sensitivity to paths of motion due to changes in event construal. These findings have implications for the claims that language processes are deeply rooted in our sensory experiences. -
Mazzini, S., Holler, J., & Drijvers, L. (2023). Studying naturalistic human communication using dual-EEG and audio-visual recordings. STAR Protocols, 4(3): 102370. doi:10.1016/j.xpro.2023.102370.
Abstract
We present a protocol to study naturalistic human communication using dual-EEG and audio-visual recordings. We describe preparatory steps for data collection including setup preparation, experiment design, and piloting. We then describe the data collection process in detail which consists of participant recruitment, experiment room preparation, and data collection. We also outline the kinds of research questions that can be addressed with the current protocol, including several analysis possibilities, from conversational to advanced time-frequency analyses.
For complete details on the use and execution of this protocol, please refer to Drijvers and Holler (2022).Additional information
additionally, exemplary data and scripts -
Mickan, A., McQueen, J. M., Brehm, L., & Lemhöfer, K. (2023). Individual differences in foreign language attrition: A 6-month longitudinal investigation after a study abroad. Language, Cognition and Neuroscience, 38(1), 11-39. doi:10.1080/23273798.2022.2074479.
Abstract
While recent laboratory studies suggest that the use of competing languages is a driving force in foreign language (FL) attrition (i.e. forgetting), research on “real” attriters has failed to demonstrate
such a relationship. We addressed this issue in a large-scale longitudinal study, following German students throughout a study abroad in Spain and their first six months back in Germany. Monthly,
percentage-based frequency of use measures enabled a fine-grained description of language use.
L3 Spanish forgetting rates were indeed predicted by the quantity and quality of Spanish use, and
correlated negatively with L1 German and positively with L2 English letter fluency. Attrition rates
were furthermore influenced by prior Spanish proficiency, but not by motivation to maintain
Spanish or non-verbal long-term memory capacity. Overall, this study highlights the importance
of language use for FL retention and sheds light on the complex interplay between language
use and other determinants of attrition. -
Nota, N., Trujillo, J. P., & Holler, J. (2023). Specific facial signals associate with categories of social actions conveyed through questions. PLoS One, 18(7): e0288104. doi:10.1371/journal.pone.0288104.
Abstract
The early recognition of fundamental social actions, like questions, is crucial for understanding the speaker’s intended message and planning a timely response in conversation. Questions themselves may express more than one social action category (e.g., an information request “What time is it?”, an invitation “Will you come to my party?” or a criticism “Are you crazy?”). Although human language use occurs predominantly in a multimodal context, prior research on social actions has mainly focused on the verbal modality. This study breaks new ground by investigating how conversational facial signals may map onto the expression of different types of social actions conveyed through questions. The distribution, timing, and temporal organization of facial signals across social actions was analysed in a rich corpus of naturalistic, dyadic face-to-face Dutch conversations. These social actions were: Information Requests, Understanding Checks, Self-Directed questions, Stance or Sentiment questions, Other-Initiated Repairs, Active Participation questions, questions for Structuring, Initiating or Maintaining Conversation, and Plans and Actions questions. This is the first study to reveal differences in distribution and timing of facial signals across different types of social actions. The findings raise the possibility that facial signals may facilitate social action recognition during language processing in multimodal face-to-face interaction.Additional information
supporting information -
Nota, N., Trujillo, J. P., Jacobs, V., & Holler, J. (2023). Facilitating question identification through natural intensity eyebrow movements in virtual avatars. Scientific Reports, 13: 21295. doi:10.1038/s41598-023-48586-4.
Abstract
In conversation, recognizing social actions (similar to ‘speech acts’) early is important to quickly understand the speaker’s intended message and to provide a fast response. Fast turns are typical for fundamental social actions like questions, since a long gap can indicate a dispreferred response. In multimodal face-to-face interaction, visual signals may contribute to this fast dynamic. The face is an important source of visual signalling, and previous research found that prevalent facial signals such as eyebrow movements facilitate the rapid recognition of questions. We aimed to investigate whether early eyebrow movements with natural movement intensities facilitate question identification, and whether specific intensities are more helpful in detecting questions. Participants were instructed to view videos of avatars where the presence of eyebrow movements (eyebrow frown or raise vs. no eyebrow movement) was manipulated, and to indicate whether the utterance in the video was a question or statement. Results showed higher accuracies for questions with eyebrow frowns, and faster response times for questions with eyebrow frowns and eyebrow raises. No additional effect was observed for the specific movement intensity. This suggests that eyebrow movements that are representative of naturalistic multimodal behaviour facilitate question recognition. -
Nota, N., Trujillo, J. P., & Holler, J. (2023). Conversational eyebrow frowns facilitate question identification: An online study using virtual avatars. Cognitive Science, 47(12): e13392. doi:10.1111/cogs.13392.
Abstract
Conversation is a time-pressured environment. Recognizing a social action (the ‘‘speech act,’’ such as a question requesting information) early is crucial in conversation to quickly understand the intended message and plan a timely response. Fast turns between interlocutors are especially relevant for responses to questions since a long gap may be meaningful by itself. Human language is multimodal, involving speech as well as visual signals from the body, including the face. But little is known about how conversational facial signals contribute to the communication of social actions. Some of the most prominent facial signals in conversation are eyebrow movements. Previous studies found links between eyebrow movements and questions, suggesting that these facial signals could contribute to the rapid recognition of questions. Therefore, we aimed to investigate whether early eyebrow movements (eyebrow frown or raise vs. no eyebrow movement) facilitate question identification. Participants were instructed to view videos of avatars where the presence of eyebrow movements accompanying questions was manipulated. Their task was to indicate whether the utterance was a question or a statement as accurately and quickly as possible. Data were collected using the online testing platform Gorilla. Results showed higher accuracies and faster response times for questions with eyebrow frowns, suggesting a facilitative role of eyebrow frowns for question identification. This means that facial signals can critically contribute to the communication of social actions in conversation by signaling social action-specific visual information and providing visual cues to speakers’ intentions.Additional information
link to preprint -
Nota, N. (2023). Talking faces: The contribution of conversational facial signals to language use and processing. PhD Thesis, Radboud University Nijmegen, Nijmegen.
-
Quaresima, A., Fitz, H., Duarte, R., Van den Broek, D., Hagoort, P., & Petersson, K. M. (2023). The Tripod neuron: A minimal structural reduction of the dendritic tree. The Journal of Physiology, 601(15), 3007-3437. doi:10.1113/JP283399.
Abstract
Neuron models with explicit dendritic dynamics have shed light on mechanisms for coincidence detection, pathway selection and temporal filtering. However, it is still unclear which morphological and physiological features are required to capture these phenomena. In this work, we introduce the Tripod neuron model and propose a minimal structural reduction of the dendritic tree that is able to reproduce these computations. The Tripod is a three-compartment model consisting of two segregated passive dendrites and a somatic compartment modelled as an adaptive, exponential integrate-and-fire neuron. It incorporates dendritic geometry, membrane physiology and receptor dynamics as measured in human pyramidal cells. We characterize the response of the Tripod to glutamatergic and GABAergic inputs and identify parameters that support supra-linear integration, coincidence-detection and pathway-specific gating through shunting inhibition. Following NMDA spikes, the Tripod neuron generates plateau potentials whose duration depends on the dendritic length and the strength of synaptic input. When fitted with distal compartments, the Tripod encodes previous activity into a dendritic depolarized state. This dendritic memory allows the neuron to perform temporal binding, and we show that it solves transition and sequence detection tasks on which a single-compartment model fails. Thus, the Tripod can account for dendritic computations previously explained only with more detailed neuron models or neural networks. Due to its simplicity, the Tripod neuron can be used efficiently in simulations of larger cortical circuits. -
Rasenberg, M. (2023). Mutual understanding from a multimodal and interactional perspective. PhD Thesis, Radboud University Nijmegen, Nijmegen.
Additional information
full text via Radboud Repository -
Roos, N. M., Takashima, A., & Piai, V. (2023). Functional neuroanatomy of lexical access in contextually and visually guided spoken word production. Cortex, 159, 254-267. doi:10.1016/j.cortex.2022.10.014.
Abstract
Lexical access is commonly studied using bare picture naming, which is visually guided, but in real-life conversation, lexical access is more commonly contextually guided. In this fMRI study, we examined the underlying functional neuroanatomy of contextually and visually guided lexical access, and its consistency across sessions. We employed a context-driven picture naming task with fifteen healthy speakers reading incomplete sentences (word-by-word) and subsequently naming the picture depicting the final word. Sentences provided either a constrained or unconstrained lead–in setting for the picture to be named, thereby approximating lexical access in natural language use. The picture name could be planned either through sentence context (constrained) or picture appearance (unconstrained). This procedure was repeated in an equivalent second session two to four weeks later with the same sample to test for test-retest consistency. Picture naming times showed a strong context effect, confirming that constrained sentences speed up production of the final word depicted as an image. fMRI results showed that the areas common to contextually and visually guided lexical access were left fusiform and left inferior frontal gyrus (both consistently active across-sessions), and middle temporal gyrus. However, non-overlapping patterns were also found, notably in the left temporal and parietal cortices, suggesting a different neural circuit for contextually versus visually guided lexical access.Additional information
supplementary material -
Sander, J., Lieberman, A., & Rowland, C. F. (2023). Exploring joint attention in American Sign Language: The influence of sign familiarity. In M. Goldwater, F. K. Anggoro, B. K. Hayes, & D. C. Ong (
Eds. ), Proceedings of the 45th Annual Meeting of the Cognitive Science Society (CogSci 2023) (pp. 632-638).Abstract
Children’s ability to share attention with another social partner (i.e., joint attention) has been found to support language development. Despite the large amount of research examining the effects of joint attention on language in hearing population, little is known about how deaf children learning sign languages achieve joint attention with their caregivers during natural social interaction and how caregivers provide and scaffold learning opportunities for their children. The present study investigates the properties and timing of joint attention surrounding familiar and novel naming events and their relationship to children’s vocabulary. Naturalistic play sessions of caretaker-child-dyads using American Sign Language were analyzed in regards to naming events of either familiar or novel object labeling events and the surrounding joint attention events. We observed that most naming events took place in the context of a successful joint attention event and that sign familiarity was related to the timing of naming events within the joint attention events. Our results suggest that caregivers are highly sensitive to their child’s visual attention in interactions and modulate joint attention differently in the context of naming events of familiar vs. novel object labels. -
Severijnen, G. G. A., Bosker, H. R., & McQueen, J. M. (2023). Syllable rate drives rate normalization, but is not the only factor. In R. Skarnitzl, & J. Volín (
Eds. ), Proceedings of the 20th International Congress of the Phonetic Sciences (ICPhS 2023) (pp. 56-60). Prague: Guarant International.Abstract
Speech is perceived relative to the speech rate in the context. It is unclear, however, what information listeners use to compute speech rate. The present study examines whether listeners use the number of
syllables per unit time (i.e., syllable rate) as a measure of speech rate, as indexed by subsequent vowel perception. We ran two rate-normalization experiments in which participants heard duration-matched word lists that contained either monosyllabic
vs. bisyllabic words (Experiment 1), or monosyllabic vs. trisyllabic pseudowords (Experiment 2). The participants’ task was to categorize an /ɑ-aː/ continuum that followed the word lists. The monosyllabic condition was perceived as slower (i.e., fewer /aː/ responses) than the bisyllabic and
trisyllabic condition. However, no difference was observed between bisyllabic and trisyllabic contexts. Therefore, while syllable rate is used in perceiving speech rate, other factors, such as fast speech processes, mean F0, and intensity, must also influence rate normalization. -
Severijnen, G. G. A., Di Dona, G., Bosker, H. R., & McQueen, J. M. (2023). Tracking talker-specific cues to lexical stress: Evidence from perceptual learning. Journal of Experimental Psychology: Human Perception and Performance, 49(4), 549-565. doi:10.1037/xhp0001105.
Abstract
When recognizing spoken words, listeners are confronted by variability in the speech signal caused by talker differences. Previous research has focused on segmental talker variability; less is known about how suprasegmental variability is handled. Here we investigated the use of perceptual learning to deal with between-talker differences in lexical stress. Two groups of participants heard Dutch minimal stress pairs (e.g., VOORnaam vs. voorNAAM, “first name” vs. “respectable”) spoken by two male talkers. Group 1 heard Talker 1 use only F0 to signal stress (intensity and duration values were ambiguous), while Talker 2 used only intensity (F0 and duration were ambiguous). Group 2 heard the reverse talker-cue mappings. After training, participants were tested on words from both talkers containing conflicting stress cues (“mixed items”; e.g., one spoken by Talker 1 with F0 signaling initial stress and intensity signaling final stress). We found that listeners used previously learned information about which talker used which cue to interpret the mixed items. For example, the mixed item described above tended to be interpreted as having initial stress by Group 1 but as having final stress by Group 2. This demonstrates that listeners learn how individual talkers signal stress and use that knowledge in spoken-word recognition.Additional information
XHP-2022-2184_Supplemental_materials_xhp0001105.docx -
Skirgård, H., Haynie, H. J., Blasi, D. E., Hammarström, H., Collins, J., Latarche, J. J., Lesage, J., Weber, T., Witzlack-Makarevich, A., Passmore, S., Chira, A., Maurits, L., Dinnage, R., Dunn, M., Reesink, G., Singer, R., Bowern, C., Epps, P. L., Hill, J., Vesakoski, O. Skirgård, H., Haynie, H. J., Blasi, D. E., Hammarström, H., Collins, J., Latarche, J. J., Lesage, J., Weber, T., Witzlack-Makarevich, A., Passmore, S., Chira, A., Maurits, L., Dinnage, R., Dunn, M., Reesink, G., Singer, R., Bowern, C., Epps, P. L., Hill, J., Vesakoski, O., Robbeets, M., Abbas, N. K., Auer, D., Bakker, N. A., Barbos, G., Borges, R. D., Danielsen, S., Dorenbusch, L., Dorn, E., Elliott, J., Falcone, G., Fischer, J., Ghanggo Ate, Y., Gibson, H., Göbel, H.-P., Goodall, J. A., Gruner, V., Harvey, A., Hayes, R., Heer, L., Herrera Miranda, R. E., Hübler, N., Huntington-Rainey, B. H., Ivani, J. K., Johns, M., Just, E., Kashima, E., Kipf, C., Klingenberg, J. V., König, N., Koti, A., Kowalik, R. G. A., Krasnoukhova, O., Lindvall, N. L. M., Lorenzen, M., Lutzenberger, H., Martins, T. R., Mata German, C., Van der Meer, S., Montoya Samamé, J., Müller, M., Muradoglu, S., Neely, K., Nickel, J., Norvik, M., Oluoch, C. A., Peacock, J., Pearey, I. O., Peck, N., Petit, S., Pieper, S., Poblete, M., Prestipino, D., Raabe, L., Raja, A., Reimringer, J., Rey, S. C., Rizaew, J., Ruppert, E., Salmon, K. K., Sammet, J., Schembri, R., Schlabbach, L., Schmidt, F. W., Skilton, A., Smith, W. D., De Sousa, H., Sverredal, K., Valle, D., Vera, J., Voß, J., Witte, T., Wu, H., Yam, S., Ye, J., Yong, M., Yuditha, T., Zariquiey, R., Forkel, R., Evans, N., Levinson, S. C., Haspelmath, M., Greenhill, S. J., Atkinson, Q., & Gray, R. D. (2023). Grambank reveals the importance of genealogical constraints on linguistic diversity and highlights the impact of language loss. Science Advances, 9(16): eadg6175. doi:10.1126/sciadv.adg6175.
Abstract
While global patterns of human genetic diversity are increasingly well characterized, the diversity of human languages remains less systematically described. Here, we outline the Grambank database. With over 400,000 data points and 2400 languages, Grambank is the largest comparative grammatical database available. The comprehensiveness of Grambank allows us to quantify the relative effects of genealogical inheritance and geographic proximity on the structural diversity of the world’s languages, evaluate constraints on linguistic diversity, and identify the world’s most unusual languages. An analysis of the consequences of language loss reveals that the reduction in diversity will be strikingly uneven across the major linguistic regions of the world. Without sustained efforts to document and revitalize endangered languages, our linguistic window into human history, cognition, and culture will be seriously fragmented. -
Slaats, S., Weissbart, H., Schoffelen, J.-M., Meyer, A. S., & Martin, A. E. (2023). Delta-band neural responses to individual words are modulated by sentence processing. The Journal of Neuroscience, 43(26), 4867-4883. doi:10.1523/JNEUROSCI.0964-22.2023.
Abstract
To understand language, we need to recognize words and combine them into phrases and sentences. During this process, responses to the words themselves are changed. In a step towards understanding how the brain builds sentence structure, the present study concerns the neural readout of this adaptation. We ask whether low-frequency neural readouts associated with words change as a function of being in a sentence. To this end, we analyzed an MEG dataset by Schoffelen et al. (2019) of 102 human participants (51 women) listening to sentences and word lists, the latter lacking any syntactic structure and combinatorial meaning. Using temporal response functions and a cumulative model-fitting approach, we disentangled delta- and theta-band responses to lexical information (word frequency), from responses to sensory- and distributional variables. The results suggest that delta-band responses to words are affected by sentence context in time and space, over and above entropy and surprisal. In both conditions, the word frequency response spanned left temporal and posterior frontal areas; however, the response appeared later in word lists than in sentences. In addition, sentence context determined whether inferior frontal areas were responsive to lexical information. In the theta band, the amplitude was larger in the word list condition around 100 milliseconds in right frontal areas. We conclude that low-frequency responses to words are changed by sentential context. The results of this study speak to how the neural representation of words is affected by structural context, and as such provide insight into how the brain instantiates compositionality in language. -
Snijders Blok, L., Verseput, J., Rots, D., Venselaar, H., Innes, A. M., Stumpel, C., Õunap, K., Reinson, K., Seaby, E. G., McKee, S., Burton, B., Kim, K., Van Hagen, J. M., Waisfisz, Q., Joset, P., Steindl, K., Rauch, A., Li, D., Zackai, E. H., Sheppard, S. E. and 29 moreSnijders Blok, L., Verseput, J., Rots, D., Venselaar, H., Innes, A. M., Stumpel, C., Õunap, K., Reinson, K., Seaby, E. G., McKee, S., Burton, B., Kim, K., Van Hagen, J. M., Waisfisz, Q., Joset, P., Steindl, K., Rauch, A., Li, D., Zackai, E. H., Sheppard, S. E., Keena, B., Hakonarson, H., Roos, A., Kohlschmidt, N., Cereda, A., Iascone, M., Rebessi, E., Kernohan, K. D., Campeau, P. M., Millan, F., Taylor, J. A., Lochmüller, H., Higgs, M. R., Goula, A., Bernhard, B., Velasco, D. J., Schmanski, A. A., Stark, Z., Gallacher, L., Pais, L., Marcogliese, P. C., Yamamoto, S., Raun, N., Jakub, T. E., Kramer, J. M., Den Hoed, J., Fisher, S. E., Brunner, H. G., & Kleefstra, T. (2023). A clustering of heterozygous missense variants in the crucial chromatin modifier WDR5 defines a new neurodevelopmental disorder. Human Genetics and Genomics Advances, 4(1): 100157. doi:10.1016/j.xhgg.2022.100157.
Abstract
WDR5 is a broadly studied, highly conserved key protein involved in a wide array of biological functions. Among these functions, WDR5 is a part of several protein complexes that affect gene regulation via post-translational modification of histones. We collected data from 11 unrelated individuals with six different rare de novo germline missense variants in WDR5; one identical variant was found in five individuals, and another variant in two individuals. All individuals had neurodevelopmental disorders including speech/language delays (N=11), intellectual disability (N=9), epilepsy (N=7) and autism spectrum disorder (N=4). Additional phenotypic features included abnormal growth parameters (N=7), heart anomalies (N=2) and hearing loss (N=2). Three-dimensional protein structures indicate that all the residues affected by these variants are located at the surface of one side of the WDR5 protein. It is predicted that five out of the six amino acid substitutions disrupt interactions of WDR5 with RbBP5 and/or KMT2A/C, as part of the COMPASS (complex proteins associated with Set1) family complexes. Our experimental approaches in Drosophila melanogaster and human cell lines show normal protein expression, localization and protein-protein interactions for all tested variants. These results, together with the clustering of variants in a specific region of WDR5 and the absence of truncating variants so far, suggest that dominant-negative or gain-of-function mechanisms might be at play. All in all, we define a neurodevelopmental disorder associated with missense variants in WDR5 and a broad range of features. This finding highlights the important role of genes encoding COMPASS family proteins in neurodevelopmental disorders. -
Stärk, K., Kidd, E., & Frost, R. L. A. (2023). Close encounters of the word kind: Attested distributional information boosts statistical learning. Language Learning, 73(2), 341-373. doi:10.1111/lang.12523.
Abstract
Statistical learning, the ability to extract regularities from input (e.g., in language), is likely supported by learners’ prior expectations about how component units co-occur. In this study, we investigated how adults’ prior experience with sublexical regularities in their native language influences performance on an empirical language learning task. Forty German-speaking adults completed a speech repetition task in which they repeated eight-syllable sequences from two experimental languages: one containing disyllabic words comprised of frequently occurring German syllable transitions (naturalistic words) and the other containing words made from unattested syllable transitions (non-naturalistic words). The participants demonstrated learning from both naturalistic and non-naturalistic stimuli. However, learning was superior for the naturalistic sequences, indicating that the participants had used their existing distributional knowledge of German to extract the naturalistic words faster and more accurately than the non-naturalistic words. This finding supports theories of statistical learning as a form of chunking, whereby frequently co-occurring units become entrenched in long-term memory. -
Tezcan, F., Weissbart, H., & Martin, A. E. (2023). A tradeoff between acoustic and linguistic feature encoding in spoken language comprehension. eLife, 12: e82386. doi:10.7554/eLife.82386.
Abstract
When we comprehend language from speech, the phase of the neural response aligns with particular features of the speech input, resulting in a phenomenon referred to as neural tracking. In recent years, a large body of work has demonstrated the tracking of the acoustic envelope and abstract linguistic units at the phoneme and word levels, and beyond. However, the degree to which speech tracking is driven by acoustic edges of the signal, or by internally-generated linguistic units, or by the interplay of both, remains contentious. In this study, we used naturalistic story-listening to investigate (1) whether phoneme-level features are tracked over and above acoustic edges, (2) whether word entropy, which can reflect sentence- and discourse-level constraints, impacted the encoding of acoustic and phoneme-level features, and (3) whether the tracking of acoustic edges was enhanced or suppressed during comprehension of a first language (Dutch) compared to a statistically familiar but uncomprehended language (French). We first show that encoding models with phoneme-level linguistic features, in addition to acoustic features, uncovered an increased neural tracking response; this signal was further amplified in a comprehended language, putatively reflecting the transformation of acoustic features into internally generated phoneme-level representations. Phonemes were tracked more strongly in a comprehended language, suggesting that language comprehension functions as a neural filter over acoustic edges of the speech signal as it transforms sensory signals into abstract linguistic units. We then show that word entropy enhances neural tracking of both acoustic and phonemic features when sentence- and discourse-context are less constraining. When language was not comprehended, acoustic features, but not phonemic ones, were more strongly modulated, but in contrast, when a native language is comprehended, phoneme features are more strongly modulated. Taken together, our findings highlight the flexible modulation of acoustic, and phonemic features by sentence and discourse-level constraint in language comprehension, and document the neural transformation from speech perception to language comprehension, consistent with an account of language processing as a neural filter from sensory to abstract representations. -
Uluşahin, O., Bosker, H. R., McQueen, J. M., & Meyer, A. S. (2023). No evidence for convergence to sub-phonemic F2 shifts in shadowing. In R. Skarnitzl, & J. Volín (
Eds. ), Proceedings of the 20th International Congress of the Phonetic Sciences (ICPhS 2023) (pp. 96-100). Prague: Guarant International.Abstract
Over the course of a conversation, interlocutors sound more and more like each other in a process called convergence. However, the automaticity and grain size of convergence are not well established. This study therefore examined whether female native Dutch speakers converge to large yet sub-phonemic shifts in the F2 of the vowel /e/. Participants first performed a short reading task to establish baseline F2s for the vowel /e/, then shadowed 120 target words (alongside 360 fillers) which contained one instance of a manipulated vowel /e/ where the F2 had been shifted down to that of the vowel /ø/. Consistent exposure to large (sub-phonemic) downward shifts in F2 did not result in convergence. The results raise issues for theories which view convergence as a product of automatic integration between perception and production. -
Zhang, Y., Ding, R., Frassinelli, D., Tuomainen, J., Klavinskis-Whiting, S., & Vigliocco, G. (2023). The role of multimodal cues in second language comprehension. Scientific Reports, 13: 20824. doi:10.1038/s41598-023-47643-2.
Abstract
In face-to-face communication, multimodal cues such as prosody, gestures, and mouth movements can play a crucial role in language processing. While several studies have addressed how these cues contribute to native (L1) language processing, their impact on non-native (L2) comprehension is largely unknown. Comprehension of naturalistic language by L2 comprehenders may be supported by the presence of (at least some) multimodal cues, as these provide correlated and convergent information that may aid linguistic processing. However, it is also the case that multimodal cues may be less used by L2 comprehenders because linguistic processing is more demanding than for L1 comprehenders, leaving more limited resources for the processing of multimodal cues. In this study, we investigated how L2 comprehenders use multimodal cues in naturalistic stimuli (while participants watched videos of a speaker), as measured by electrophysiological responses (N400) to words, and whether there are differences between L1 and L2 comprehenders. We found that prosody, gestures, and informative mouth movements each reduced the N400 in L2, indexing easier comprehension. Nevertheless, L2 participants showed weaker effects for each cue compared to L1 comprehenders, with the exception of meaningful gestures and informative mouth movements. These results show that L2 comprehenders focus on specific multimodal cues – meaningful gestures that support meaningful interpretation and mouth movements that enhance the acoustic signal – while using multimodal cues to a lesser extent than L1 comprehenders overall.Additional information
supplementary materials
Share this page