-
Akamine, S., Ghaleb, E., Rasenberg, M., Fernandez, R., Meyer, A. S., & Özyürek, A. (2024). Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2435-2442).
Abstract
When we communicate with others, we often repeat aspects of each other's communicative behavior such as sentence structures and words. Such behavioral alignment has been mostly studied for speech or text. Yet, language use is mostly multimodal, flexibly using speech and gestures to convey messages. Here, we explore the use of alignment in speech (words) and co-speech gestures (iconic gestures) in a referential communication task aimed at finding labels for novel objects in interaction. In particular, we investigate how people flexibly use lexical and gestural alignment to create shared labels for novel objects and whether alignment in speech and gesture are related over time. The present study shows that interlocutors establish shared labels multimodally, and alignment in words and iconic gestures are used throughout the interaction. We also show that the amount of lexical alignment positively associates with the amount of gestural alignment over time, suggesting a close relationship between alignment in the vocal and manual modalities.
Additional information
link to eScholarship -
Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).
Abstract
Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions---expressions with a common lexical core used by both speakers within a dialogue---and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.
Additional information
link to eScholarship -
Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).
Abstract
Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis. -
Ghaleb, E., Khaertdinov, B., Pouw, W., Rasenberg, M., Holler, J., Ozyurek, A., & Fernandez, R. (2024). Learning co-speech gesture representations in dialogue through contrastive learning: An intrinsic evaluation. In Proceedings of the 26th International Conference on Multimodal Interaction (ICMI 2024) (pp. 274-283).
Abstract
In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gesture representations considering gestures’ variability and relationship with speech? This paper tackles this challenge by employing self-supervised contrastive learning techniques to learn gesture representations from skeletal and speech information. We propose an approach that includes both unimodal and multimodal pre-training to ground gesture representations in co-occurring speech. For training, we utilize a face-to-face dialogue dataset rich with representational iconic gestures. We conduct thorough intrinsic evaluations of the learned representations through comparison with human-annotated pairwise gesture similarity. Moreover, we perform a diagnostic probing analysis to assess the possibility of recovering interpretable gesture features from the learned representations. Our results show a significant positive correlation with human-annotated gesture similarity and reveal that the similarity between the learned representations is consistent with well-motivated patterns related to the dynamics of dialogue interaction. Moreover, our findings demonstrate that several features concerning the form of gestures can be recovered from the latent representations. Overall, this study shows that multimodal contrastive learning is a promising approach for learning gesture representations, which opens the door to using such representations in larger-scale gesture analysis studies. -
Hagoort, P., & Özyürek, A. (2024). Extending the architecture of language from a multimodal perspective. Topics in Cognitive Science. Advance online publication. doi:10.1111/tops.12728.
Abstract
Language is inherently multimodal. In spoken languages, combined spoken and visual signals (e.g., co-speech gestures) are an integral part of linguistic structure and language representation. This requires an extension of the parallel architecture, which needs to include the visual signals concomitant to speech. We present the evidence for the multimodality of language. In addition, we propose that distributional semantics might provide a format for integrating speech and co-speech gestures in a common semantic representation. -
Karadöller, D. Z., Sümer, B., Ünal, E., & Özyürek, A. (2024). Sign advantage: Both children and adults’ spatial expressions in sign are more informative than those in speech and gestures combined. Journal of Child Language, 51(4), 876-902. doi:10.1017/S0305000922000642.
Abstract
Expressing Left-Right relations is challenging for speaking-children. Yet, this challenge was absent for signing-children, possibly due to iconicity in the visual-spatial modality of expression. We investigate whether there is also a modality advantage when speaking-children’s co-speech gestures are considered. Eight-year-old child and adult hearing monolingual Turkish speakers and deaf signers of Turkish-Sign-Language described pictures of objects in various spatial relations. Descriptions were coded for informativeness in speech, sign, and speech-gesture combinations for encoding Left-Right relations. The use of co-speech gestures increased the informativeness of speakers’ spatial expressions compared to speech-only. This pattern was more prominent for children than adults. However, signing-adults and children were more informative than child and adult speakers even when co-speech gestures were considered. Thus, both speaking- and signing-children benefit from iconic expressions in visual modality. Finally, in each modality, children were less informative than adults, pointing to the challenge of this spatial domain in development. -
Karadöller, D. Z., Peeters, D., Manhardt, F., Özyürek, A., & Ortega, G. (2024). Iconicity and gesture jointly facilitate learning of second language signs at first exposure in hearing non-signers. Language Learning, 74(4), 781-813. doi:10.1111/lang.12636.
Abstract
When learning a spoken second language (L2), words overlapping in form and meaning with one’s native language (L1) help break into the new language. When non-signing speakers learn a sign language as L2, such forms are absent because of the modality differences (L1:speech, L2:sign). In such cases, non-signing speakers might use iconic form-meaning mappings in signs or their own gestural experience as gateways into the to-be-acquired sign language. Here, we investigated how both these factors may contribute jointly to the acquisition of sign language vocabulary by hearing non-signers. Participants were presented with three types of sign in NGT (Sign Language of the Netherlands): arbitrary signs, iconic signs with high or low gesture overlap. Signs that were both iconic and highly overlapping with gestures boosted learning most at first exposure, and this effect remained the day after. Findings highlight the influence of modality-specific factors supporting the acquisition of a signed lexicon. -
Karadöller*, D. Z., Sümer*, B., & Özyürek, A. (2024). First-language acquisition in a multimodal language framework: Insights from speech, gesture, and sign. First Language. Advance online publication. doi:10.1177/01427237241290678.
* = shared first authorship
Abstract
Children across the world acquire their first language(s) naturally, regardless of typology or modality (e.g. sign or spoken). Various attempts have been made to explain the puzzle of language acquisition using several approaches, trying to understand to what extent it can be explained by what children bring to language-learning situations as well as what they learn from the input and the interactive context. However, most of these approaches consider only speech development, thus ignoring the inherently multimodal nature of human language. As a multimodal view of language is becoming more widely adopted for the study of adult language, a multimodal approach to language acquisition is inevitable. Not only do children have the capacity to learn spoken and sign language equally easily, but spoken language acquisition consists of learning to coordinate linguistic expressions in both modalities, that is, in both speech and gesture. To provide a step forward in this direction, this article aims to synthesize findings from research studies that take a multimodal perspective on language acquisition in different sign and spoken languages, including the development of speech and accompanying gestures. Our review shows that while some aspects of language acquisition seem to be modality-independent, others might differ according to the affordances of each modality when used separately as well as together (either in sign, speech, and/or gesture). We argue that these findings need to be integrated into our understanding of language acquisition. We also identify which areas need future research for both spoken and sign language acquisition, taking into account not only multimodal but also cross-linguistic variation. -
Sekine, K., & Özyürek, A. (2024). Children benefit from gestures to understand degraded speech but to a lesser extent than adults. Frontiers in Psychology, 14: 1305562. doi:10.3389/fpsyg.2023.1305562.
Abstract
The present study investigated to what extent children, compared to adults, benefit from gestures to disambiguate degraded speech by manipulating speech signals and manual modality. Dutch-speaking adults (N = 20) and 6- and 7-year-old children (N = 15) were presented with a series of video clips in which an actor produced a Dutch action verb with or without an accompanying iconic gesture. Participants were then asked to repeat what they had heard. The speech signal was either clear or altered into 4- or 8-band noise-vocoded speech. Children had more difficulty than adults in disambiguating degraded speech in the speech-only condition. However, when presented with both speech and gestures, children reached a comparable level of accuracy to that of adults in the degraded-speech-only condition. Furthermore, for adults, the enhancement of gestures was greater in the 4-band condition than in the 8-band condition, whereas children showed the opposite pattern. Gestures help children to disambiguate degraded speech, but children need more phonological information than adults to benefit from use of gestures. Children’s multimodal language integration needs to further develop to adapt flexibly to challenging situations such as degraded speech, as tested in our study, or instances where speech is heard with environmental noise or through a face mask.
Additional information
supplemental material -
Ünal, E., Mamus, E., & Özyürek, A. (2024). Multimodal encoding of motion events in speech, gesture, and cognition. Language and Cognition, 16(4), 785-804. doi:10.1017/langcog.2023.61.
Abstract
How people communicate about motion events and how this is shaped by language typology are mostly studied with a focus on linguistic encoding in speech. Yet, human communication typically involves an interactional exchange of multimodal signals, such as hand gestures that have different affordances for representing event components. Here, we review recent empirical evidence on multimodal encoding of motion in speech and gesture to gain a deeper understanding of whether and how language typology shapes linguistic expressions in different modalities, and how this changes across different sensory modalities of input and interacts with other aspects of cognition. Empirical evidence strongly suggests that Talmy’s typology of event integration predicts multimodal event descriptions in speech and gesture and visual attention to event components prior to producing these descriptions. Furthermore, variability within the event itself, such as type and modality of stimuli, may override the influence of language typology, especially for expression of manner. -
Eijk, L., Rasenberg, M., Arnese, F., Blokpoel, M., Dingemanse, M., Doeller, C. F., Ernestus, M., Holler, J., Milivojevic, B., Özyürek, A., Pouw, W., Van Rooij, I., Schriefers, H., Toni, I., Trujillo, J. P., & Bögels, S. (2022). The CABB dataset: A multimodal corpus of communicative interactions for behavioural and neural analyses. NeuroImage, 264: 119734. doi:10.1016/j.neuroimage.2022.119734.
Abstract
We present a dataset of behavioural and fMRI observations acquired in the context of humans involved in multimodal referential communication. The dataset contains audio/video and motion-tracking recordings of face-to-face, task-based communicative interactions in Dutch, as well as behavioural and neural correlates of participants’ representations of dialogue referents. Seventy-one pairs of unacquainted participants performed two interleaved interactional tasks in which they described and located 16 novel geometrical objects (i.e., Fribbles) yielding spontaneous interactions of about one hour. We share high-quality video (from three cameras), audio (from head-mounted microphones), and motion-tracking (Kinect) data, as well as speech transcripts of the interactions. Before and after engaging in the face-to-face communicative interactions, participants’ individual representations of the 16 Fribbles were estimated. Behaviourally, participants provided a written description (one to three words) for each Fribble and positioned them along 29 independent conceptual dimensions (e.g., rounded, human, audible). Neurally, fMRI signal evoked by each Fribble was measured during a one-back working-memory task. To enable functional hyperalignment across participants, the dataset also includes fMRI measurements obtained during visual presentation of eight animated movies (35 minutes total). We present analyses for the various types of data demonstrating their quality and consistency with earlier research. Besides high-resolution multimodal interactional data, this dataset includes different correlates of communicative referents, obtained before and after face-to-face dialogue, allowing for novel investigations into the relation between communicative behaviours and the representational space shared by communicators. This unique combination of data can be used for research in neuroscience, psychology, linguistics, and beyond. -
Kan, U., Gökgöz, K., Sumer, B., Tamyürek, E., & Özyürek, A. (2022). Emergence of negation in a Turkish homesign system: Insights from the family context. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 387-389). Nijmegen: Joint Conference on Language Evolution (JCoLE). -
Rasenberg, M., Pouw, W., Özyürek, A., & Dingemanse, M. (2022). The multimodal nature of communicative efficiency in social interaction. Scientific Reports, 12: 19111. doi:10.1038/s41598-022-22883-w.
Abstract
How does communicative efficiency shape language use? We approach this question by studying it at the level of the dyad, and in terms of multimodal utterances. We investigate whether and how people minimize their joint speech and gesture efforts in face-to-face interactions, using linguistic and kinematic analyses. We zoom in on other-initiated repair, a conversational microcosm where people coordinate their utterances to solve problems with perceiving or understanding. We find that efforts in the spoken and gestural modalities are wielded in parallel across repair turns of different types, and that people repair conversational problems in the most cost-efficient way possible, minimizing the joint multimodal effort for the dyad as a whole. These results are in line with the principle of least collaborative effort in speech and with the reduction of joint costs in non-linguistic joint actions. The results extend our understanding of those coefficiency principles by revealing that they pertain to multimodal utterance design.
Additional information
Data and analysis scripts -
Rasenberg, M., Özyürek, A., Bögels, S., & Dingemanse, M. (2022). The primacy of multimodal alignment in converging on shared symbols for novel referents. Discourse Processes, 59(3), 209-236. doi:10.1080/0163853X.2021.1992235.
Abstract
When people establish shared symbols for novel objects or concepts, they have been shown to rely on the use of multiple communicative modalities as well as on alignment (i.e., cross-participant repetition of communicative behavior). Yet these interactional resources have rarely been studied together, so little is known about if and how people combine multiple modalities in alignment to achieve joint reference. To investigate this, we systematically track the emergence of lexical and gestural alignment in a referential communication task with novel objects. Quantitative analyses reveal that people frequently use a combination of lexical and gestural alignment, and that such multimodal alignment tends to emerge earlier compared to unimodal alignment. Qualitative analyses of the interactional contexts in which alignment emerges reveal how people flexibly deploy lexical and gestural alignment (independently, simultaneously or successively) to adjust to communicative pressures. -
Schubotz, L., Özyürek, A., & Holler, J. (2022). Individual differences in working memory and semantic fluency predict younger and older adults' multimodal recipient design in an interactive spatial task. Acta Psychologica, 229: 103690. doi:10.1016/j.actpsy.2022.103690.
Abstract
Aging appears to impair the ability to adapt speech and gestures based on knowledge shared with an addressee (common ground-based recipient design) in narrative settings. Here, we test whether this extends to spatial settings and is modulated by cognitive abilities. Younger and older adults gave instructions on how to assemble 3D-models from building blocks on six consecutive trials. We induced mutually shared knowledge by either showing speaker and addressee the model beforehand, or not. Additionally, shared knowledge accumulated across the trials. Younger and crucially also older adults provided recipient-designed utterances, indicated by a significant reduction in the number of words and of gestures when common ground was present. Additionally, we observed a reduction in semantic content and a shift in cross-modal distribution of information across trials. Rather than age, individual differences in verbal and visual working memory and semantic fluency predicted the extent of addressee-based adaptations. Thus, in spatial tasks, individual cognitive abilities modulate the interactive language use of both younger and older adults.
Additional information
1-s2.0-S0001691822002050-mmc1.docx -
Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of efficient communication in language: A comparison of silent gesture and sign language. Cognitive Science, 46(5): 13133. doi:10.1111/cogs.13133.
Abstract
Sign languages use multiple articulators and iconicity in the visual modality which allow linguistic units to be organized not only linearly but also simultaneously. Recent research has shown that users of an established sign language such as LIS (Italian Sign Language) use simultaneous and iconic constructions as a modality-specific resource to achieve communicative efficiency when they are required to encode informationally rich events. However, it remains to be explored whether the use of such simultaneous and iconic constructions recruited for communicative efficiency can be employed even without a linguistic system (i.e., in silent gesture) or whether they are specific to linguistic patterning (i.e., in LIS). In the present study, we conducted the same experiment as in Slonimska et al. with 23 Italian speakers using silent gesture and compared the results of the two studies. The findings showed that while simultaneity was afforded by the visual modality to some extent, its use in silent gesture was nevertheless less frequent and qualitatively different than when used within a linguistic system. Thus, the use of simultaneous and iconic constructions for communicative efficiency constitutes an emergent property of sign languages. The present study highlights the importance of studying modality-specific resources and their use for linguistic expression in order to promote a more thorough understanding of the language faculty and its modality-specific adaptive capabilities. -
Slonimska, A., Özyürek, A., & Capirci, O. (2022). Simultaneity as an emergent property of sign languages. In A. Ravignani, R. Asano, D. Valente, F. Ferretti, S. Hartmann, M. Hayashi, Y. Jadoul, M. Martins, Y. Oseki, E. D. Rodrigues, O. Vasileva, & S. Wacewicz (Eds.), The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE) (pp. 678-680). Nijmegen: Joint Conference on Language Evolution (JCoLE). -
Sumer, B., & Özyürek, A. (2022). Cross-modal investigation of event component omissions in language development: A comparison of signing and speaking children. Language, Cognition and Neuroscience, 37(8), 1023-1039. doi:10.1080/23273798.2022.2042336.
Abstract
Language development research suggests a universal tendency for children to be under-informative in narrating motion events by omitting components such as Path, Manner or Ground. However, this assumption has not been tested for children acquiring sign language. Due to the affordances of the visual-spatial modality of sign languages for iconic expression, signing children might omit event components less frequently than speaking children. Here we analysed motion event descriptions elicited from deaf children (4–10 years) acquiring Turkish Sign Language (TİD) and their Turkish-speaking peers. While children omitted all types of event components more often than adults, signing children and adults encoded more Path and Manner in TİD than their peers in Turkish. These results provide more evidence for a general universal tendency for children to omit event components as well as a modality bias for sign languages to encode both Manner and Path more frequently than spoken languages. -
Sumer, B., & Özyürek, A. (2022). Language use in deaf children with early-signing versus late-signing deaf parents. Frontiers in Communication, 6: 804900. doi:10.3389/fcomm.2021.804900.
Abstract
Previous research has shown that spatial language is sensitive to the effects of delayed language exposure. Locative encodings of late-signing deaf adults varied from those of early-signing deaf adults in the preferred types of linguistic forms. In the current study, we investigated whether such differences would be found in spatial language use of deaf children with deaf parents who are either early or late signers of Turkish Sign Language (TİD). We analyzed locative encodings elicited from these two groups of deaf children for the use of different linguistic forms and the types of classifier handshapes. Our findings revealed differences between these two groups of deaf children in their preferred types of linguistic forms, which showed parallels to differences between late versus early deaf adult signers as reported by earlier studies. Deaf children in the current study, however, were similar to each other in the type of classifier handshapes that they used in their classifier constructions. Our findings have implications for expanding current knowledge on to what extent variation in language input (i.e., from early vs. late deaf signers) is reflected in children’s productions as well as the role of linguistic input on language development in general. -
Ter Bekke, M., Özyürek, A., & Ünal, E. (2022). Speaking but not gesturing predicts event memory: A cross-linguistic comparison. Language and Cognition, 14(3), 362-384. doi:10.1017/langcog.2022.3.
Abstract
Every day people see, describe, and remember motion events. However, the relation between multimodal encoding of motion events in speech and gesture, and memory is not yet fully understood. Moreover, whether language typology modulates this relation remains to be tested. This study investigates whether the type of motion event information (path or manner) mentioned in speech and gesture predicts which information is remembered and whether this varies across speakers of typologically different languages. Dutch- and Turkish-speakers watched and described motion events and completed a surprise recognition memory task. For both Dutch- and Turkish-speakers, manner memory was at chance level. Participants who mentioned path in speech during encoding were more accurate at detecting changes to the path in the memory task. The relation between mentioning path in speech and path memory did not vary cross-linguistically. Finally, the co-speech gesture did not predict memory above mentioning path in speech. These findings suggest that how speakers describe a motion event in speech is more important than the typology of the speakers’ native language in predicting motion event memory. The motion event videos are available for download for future research at https://osf.io/p8cas/.
Additional information
S1866980822000035sup001.docx -
Trujillo, J. P., Özyürek, A., Kan, C., Sheftel-Simanova, I., & Bekkering, H. (2022). Differences in functional brain organization during gesture recognition between autistic and neurotypical individuals. Social Cognitive and Affective Neuroscience, 17(11), 1021-1034. doi:10.1093/scan/nsac026.
Abstract
Persons with and without autism process sensory information differently. Differences in sensory processing are directly relevant to social functioning and communicative abilities, which are known to be hampered in persons with autism. We collected functional magnetic resonance imaging (fMRI) data from 25 autistic individuals and 25 neurotypical individuals while they performed a silent gesture recognition task. We exploited brain network topology, a holistic quantification of how networks within the brain are organized to provide new insights into how visual communicative signals are processed in autistic and neurotypical individuals. Performing graph theoretical analysis, we calculated two network properties of the action observation network: local efficiency, as a measure of network segregation, and global efficiency, as a measure of network integration. We found that persons with autism and neurotypical persons differ in how the action observation network is organized. Persons with autism utilize a more clustered, local-processing-oriented network configuration (i.e., higher local efficiency), rather than the more integrative network organization seen in neurotypicals (i.e., higher global efficiency). These results shed new light on the complex interplay between social and sensory processing in autism.
Additional information
nsac026_supp.zip -
Ünal, E., Manhardt, F., & Özyürek, A. (2022). Speaking and gesturing guide event perception during message conceptualization: Evidence from eye movements. Cognition, 225: 105127. doi:10.1016/j.cognition.2022.105127.
Abstract
Speakers’ visual attention to events is guided by linguistic conceptualization of information in spoken language production and in language-specific ways. Does production of language-specific co-speech gestures further guide speakers’ visual attention during message preparation? Here, we examine the link between visual attention and multimodal event descriptions in Turkish. Turkish is a verb-framed language where speakers’ speech and gesture show language specificity with path of motion mostly expressed within the main verb accompanied by path gestures. Turkish-speaking adults viewed motion events while their eye movements were recorded during non-linguistic (viewing-only) and linguistic (viewing-before-describing) tasks. The relative attention allocated to path over manner was higher in the linguistic task compared to the non-linguistic task. Furthermore, the relative attention allocated to path over manner within the linguistic task was higher when speakers (a) encoded path in the main verb versus outside the verb and (b) used additional path gestures accompanying speech versus not. Results strongly suggest that speakers’ visual attention is guided by language-specific event encoding not only in speech but also in gesture. This provides evidence consistent with models that propose integration of speech and gesture at the conceptualization level of language production and suggests that the links between the eye and the mouth may be extended to the eye and the hand. -
Allen, S., Ozyurek, A., Kita, S., Brown, A., Furman, R., Ishizuka, T., & Fujii, M. (2007). Language-specific and universal influences in children's syntactic packaging of manner and path: A comparison of English, Japanese, and Turkish. Cognition, 102, 16-48. doi:10.1016/j.cognition.2005.12.006.
Abstract
Different languages map semantic elements of spatial relations onto different lexical and syntactic units. These crosslinguistic differences raise important questions for language development in terms of how this variation is learned by children. We investigated how Turkish-, English-, and Japanese-speaking children (mean age 3;8) package the semantic elements of Manner and Path onto syntactic units when both the Manner and the Path of the moving Figure occur simultaneously and are salient in the event depicted. Both universal and language-specific patterns were evident in our data. Children used the semantic-syntactic mappings preferred by adult speakers of their own languages, and even expressed subtle syntactic differences that encode different relations between Manner and Path in the same way as their adult counterparts (i.e., Manner causing vs. incidental to Path). However, not all types of semantics-syntax mappings were easy for children to learn (e.g., expressing Manner and Path elements in two verbal clauses). In such cases, Turkish- and Japanese-speaking children frequently used syntactic patterns that were not typical in the target language but were similar to patterns used by English-speaking children, suggesting some universal influence. Thus, both language-specific and universal tendencies guide the development of complex spatial expressions. -
Furman, R., & Ozyurek, A. (2007). Development of interactional discourse markers: Insights from Turkish children's and adults' narratives. Journal of Pragmatics, 39(10), 1742-1757. doi:10.1016/j.pragma.2007.01.008.
Abstract
Discourse markers (DMs) are linguistic elements that index different relations and coherence between units of talk (Schiffrin, Deborah, 1987. Discourse Markers. Cambridge University Press, Cambridge). Most research on the development of these forms has focused on conversations rather than narratives and furthermore has not directly compared children's use of DMs to adult usage. This study examines the development of three DMs (şey ‘uuhh’, yani ‘I mean’, işte ‘y’know’) that mark interactional levels of discourse in oral Turkish narratives in 60 Turkish children (3-, 5- and 9-year-olds) and 20 Turkish-speaking adults. The results show that the frequency and functions of DMs change with age. Children learn şey, which mainly marks exchange level structures, earliest. However, yani and işte have multi-functions such as marking both information states and participation frameworks and are consequently learned later. Children also use DMs with different functions than adults. Overall, the results show that learning to use interactional DMs in narratives is complex and goes beyond age 9, especially for multi-functional DMs that index an interplay of discourse coherence at different levels. -
Gürcanli, Ö., Nakipoglu Demiralp, M., & Ozyurek, A. (2007). Shared information and argument omission in Turkish. In H. Caunt-Nulton, S. Kulatilake, & I. Woo (Eds.), Proceedings of the 31st Annual Boston University Conference on Language Development (pp. 267-273). Somerville, Mass: Cascadilla Press. -
Kelly, S. D., & Ozyurek, A. (Eds.). (2007). Gesture, language, and brain [Special Issue]. Brain and Language, 101(3). -
Kita, S., Ozyurek, A., Allen, S., Brown, A., Furman, R., & Ishizuka, T. (2007). Relations between syntactic encoding and co-speech gestures: Implications for a model of speech and gesture production. Language and Cognitive Processes, 22(8), 1212-1236. doi:10.1080/01690960701461426.
Abstract
Gestures that accompany speech are known to be tightly coupled with speech production. However, little is known about the cognitive processes that underlie this link. Previous cross-linguistic research has provided preliminary evidence for online interaction between the two systems based on the systematic co-variation found between how different languages syntactically package Manner and Path information of a motion event and how gestures represent Manner and Path. Here we elaborate on this finding by testing whether speakers within the same language gesturally express Manner and Path differently according to their online choice of syntactic packaging of Manner and Path, or whether gestural expression is pre-determined by a habitual conceptual schema congruent with the linguistic typology. Typologically congruent and incongruent syntactic structures for expressing Manner and Path (i.e., in a single clause or multiple clauses) were elicited from English speakers. We found that gestural expressions were determined by the online choice of syntactic packaging rather than by a habitual conceptual schema. It is therefore concluded that speech and gesture production processes interface online at the conceptual planning phase. Implications of the findings for models of speech and gesture production are discussed. -
Kita, S., & Ozyurek, A. (2007). How does spoken language shape iconic gestures? In S. Duncan, J. Cassell, & E. Levy (Eds.), Gesture and the dynamic dimension of language (pp. 67-74). Amsterdam: Benjamins. -
Ozyurek, A., Willems, R. M., Kita, S., & Hagoort, P. (2007). On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials. Journal of Cognitive Neuroscience, 19(4), 605-616. doi:10.1162/jocn.2007.19.4.605.
Abstract
During language comprehension, listeners use the global semantic representation from previous sentence or discourse context to immediately integrate the meaning of each upcoming word into the unfolding message-level representation. Here we investigate whether communicative gestures that often spontaneously co-occur with speech are processed in a similar fashion and integrated to previous sentence context in the same way as lexical meaning. Event-related potentials were measured while subjects listened to spoken sentences with a critical verb (e.g., knock), which was accompanied by an iconic co-speech gesture (i.e., KNOCK). Verbal and/or gestural semantic content matched or mismatched the content of the preceding part of the sentence. Despite the difference in the modality and in the specificity of meaning conveyed by spoken words and gestures, the latency, amplitude, and topographical distribution of both word and gesture mismatches are found to be similar, indicating that the brain integrates both types of information simultaneously. This provides evidence for the claim that neural processing in language comprehension involves the simultaneous incorporation of information coming from a broader domain of cognition than only verbal semantics. The neural evidence for similar integration of information from speech and gesture emphasizes the tight interconnection between speech and co-speech gestures. -
Ozyurek, A. (2007). Processing of multi-modal semantic information: Insights from cross-linguistic comparisons and neurophysiological recordings. In T. Sakamoto (Ed.), Communicating skills of intention (pp. 131-142). Tokyo: Hituzi Syobo Publishing. -
Ozyurek, A., & Kelly, S. D. (2007). Gesture, language, and brain. Brain and Language, 101(3), 181-185. doi:10.1016/j.bandl.2007.03.006. -
Ozyurek, A., Kita, S., Allen, S., Furman, R., & Brown, A. (2007). How does linguistic framing of events influence co-speech gestures? Insights from crosslinguistic variations and similarities. In K. Liebal, C. Müller, & S. Pika (Eds.), Gestural communication in nonhuman and human primates (pp. 199-218). Amsterdam: Benjamins.
Abstract
What are the relations between linguistic encoding and gestural representations of events during online speaking? The few studies that have been conducted on this topic have yielded somewhat incompatible results with regard to whether and how gestural representations of events change with differences in the preferred semantic and syntactic encoding possibilities of languages. Here we provide large-scale semantic, syntactic and temporal analyses of speech-gesture pairs that depict 10 different motion events from 20 Turkish and 20 English speakers. We find that the gestural representations of the same events differ across languages when they are encoded by different syntactic frames (i.e., verb-framed or satellite-framed). However, where there are similarities across languages, such as omission of a certain element of the event in the linguistic encoding, gestural representations also look similar and omit the same content. The results are discussed in terms of what gestures reveal about the influence of language-specific encoding on on-line thinking patterns and the underlying interactions between speech and gesture during the speaking process. -
Willems, R. M., Ozyurek, A., & Hagoort, P. (2007). When language meets action: The neural integration of gesture and speech. Cerebral Cortex, 17(10), 2322-2333. doi:10.1093/cercor/bhl141.
Abstract
Although generally studied in isolation, language and action often co-occur in everyday life. Here we investigated one particular form of simultaneous language and action, namely speech and gestures that speakers use in everyday communication. In a functional magnetic resonance imaging study, we identified the neural networks involved in the integration of semantic information from speech and gestures. Verbal and/or gestural content could be integrated easily or less easily with the content of the preceding part of speech. Premotor areas involved in action observation (Brodmann area [BA] 6) were found to be specifically modulated by action information "mismatching" to a language context. Importantly, an increase in integration load of both verbal and gestural information into prior speech context activated Broca's area and adjacent cortex (BA 45/47). A classical language area, Broca's area, is not only recruited for language-internal processing but also when action observation is integrated with speech. These findings provide direct evidence that action and language processing share a high-level neural integration system.