-
Akamine, S., Ghaleb, E., Rasenberg, M., Fernandez, R., Meyer, A. S., & Özyürek, A. (2024). Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2435-2442).
Abstract
When we communicate with others, we often repeat aspects of each other's communicative behavior such as sentence structures and words. Such behavioral alignment has been mostly studied for speech or text. Yet, language use is mostly multimodal, flexibly using speech and gestures to convey messages. Here, we explore the use of alignment in speech (words) and co-speech gestures (iconic gestures) in a referential communication task aimed at finding labels for novel objects in interaction. In particular, we investigate how people flexibly use lexical and gestural alignment to create shared labels for novel objects and whether alignment in speech and gesture are related over time. The present study shows that interlocutors establish shared labels multimodally, and alignment in words and iconic gestures are used throughout the interaction. We also show that the amount of lexical alignment positively associates with the amount of gestural alignment over time, suggesting a close relationship between alignment in the vocal and manual modalities.
Additional information
link to eScholarship -
Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (Eds.), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).
Abstract
Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions---expressions with a common lexical core used by both speakers within a dialogue---and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.
Additional information
link to eScholarship -
Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).
Abstract
Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis. -
Hagoort, P., & Özyürek, A. (2024). Extending the architecture of language from a multimodal perspective. Topics in Cognitive Science. Advance online publication. doi:10.1111/tops.12728.
Abstract
Language is inherently multimodal. In spoken languages, combined spoken and visual signals (e.g., co-speech gestures) are an integral part of linguistic structure and language representation. This requires an extension of the parallel architecture, which needs to include the visual signals concomitant to speech. We present the evidence for the multimodality of language. In addition, we propose that distributional semantics might provide a format for integrating speech and co-speech gestures in a common semantic representation. -
Karadöller, D. Z., Sümer, B., Ünal, E., & Özyürek, A. (2024). Sign advantage: Both children and adults’ spatial expressions in sign are more informative than those in speech and gestures combined. Journal of Child Language, 51(4), 876-902. doi:10.1017/S0305000922000642.
Abstract
Expressing Left-Right relations is challenging for speaking-children. Yet, this challenge was absent for signing-children, possibly due to iconicity in the visual-spatial modality of expression. We investigate whether there is also a modality advantage when speaking-children’s co-speech gestures are considered. Eight-year-old child and adult hearing monolingual Turkish speakers and deaf signers of Turkish-Sign-Language described pictures of objects in various spatial relations. Descriptions were coded for informativeness in speech, sign, and speech-gesture combinations for encoding Left-Right relations. The use of co-speech gestures increased the informativeness of speakers’ spatial expressions compared to speech-only. This pattern was more prominent for children than adults. However, signing-adults and children were more informative than child and adult speakers even when co-speech gestures were considered. Thus, both speaking- and signing-children benefit from iconic expressions in visual modality. Finally, in each modality, children were less informative than adults, pointing to the challenge of this spatial domain in development. -
Karadöller, D. Z., Peeters, D., Manhardt, F., Özyürek, A., & Ortega, G. (2024). Iconicity and gesture jointly facilitate learning of second language signs at first exposure in hearing non-signers. Language Learning, 74(4), 781-813. doi:10.1111/lang.12636.
Abstract
When learning a spoken second language (L2), words overlapping in form and meaning with one’s native language (L1) help break into the new language. When non-signing speakers learn a sign language as L2, such forms are absent because of the modality differences (L1:speech, L2:sign). In such cases, non-signing speakers might use iconic form-meaning mappings in signs or their own gestural experience as gateways into the to-be-acquired sign language. Here, we investigated how both these factors may contribute jointly to the acquisition of sign language vocabulary by hearing non-signers. Participants were presented with three types of sign in NGT (Sign Language of the Netherlands): arbitrary signs, iconic signs with high or low gesture overlap. Signs that were both iconic and highly overlapping with gestures boosted learning most at first exposure, and this effect remained the day after. Findings highlight the influence of modality-specific factors supporting the acquisition of a signed lexicon. -
Karadöller*, D. Z., Sümer*, B., & Özyürek, A. (2024). First-language acquisition in a multimodal language framework: Insights from speech, gesture, and sign. First Language. Advance online publication. doi:10.1177/01427237241290678.
* = shared first authorship
Abstract
Children across the world acquire their first language(s) naturally, regardless of typology or modality (e.g. sign or spoken). Various attempts have been made to explain the puzzle of language acquisition using several approaches, trying to understand to what extent it can be explained by what children bring to language-learning situations as well as what they learn from the input and the interactive context. However, most of these approaches consider only speech development, thus ignoring the inherently multimodal nature of human language. As a multimodal view of language is becoming more widely adopted for the study of adult language, a multimodal approach to language acquisition is inevitable. Not only do children have the capacity to learn spoken and sign language equally easily, but spoken language acquisition consists of learning to coordinate linguistic expressions in both modalities, that is, in both speech and gesture. To provide a step forward in this direction, this article aims to synthesize findings from research studies that take a multimodal perspective on language acquisition in different sign and spoken languages, including the development of speech and accompanying gestures. Our review shows that while some aspects of language acquisition seem to be modality-independent, others might differ according to the affordances of each modality when used separately as well as together (either in sign, speech, and/or gesture). We argue that these findings need to be integrated into our understanding of language acquisition. We also identify which areas need future research for both spoken and sign language acquisition, taking into account not only multimodal but also cross-linguistic variation. -
Sekine, K., & Özyürek, A. (2024). Children benefit from gestures to understand degraded speech but to a lesser extent than adults. Frontiers in Psychology, 14: 1305562. doi:10.3389/fpsyg.2023.1305562.
Abstract
The present study investigated to what extent children, compared to adults, benefit from gestures to disambiguate degraded speech by manipulating speech signals and manual modality. Dutch-speaking adults (N = 20) and 6- and 7-year-old children (N = 15) were presented with a series of video clips in which an actor produced a Dutch action verb with or without an accompanying iconic gesture. Participants were then asked to repeat what they had heard. The speech signal was either clear or altered into 4- or 8-band noise-vocoded speech. Children had more difficulty than adults in disambiguating degraded speech in the speech-only condition. However, when presented with both speech and gestures, children reached a comparable level of accuracy to that of adults in the degraded-speech-only condition. Furthermore, for adults, the enhancement of gestures was greater in the 4-band condition than in the 8-band condition, whereas children showed the opposite pattern. Gestures help children to disambiguate degraded speech, but children need more phonological information than adults to benefit from use of gestures. Children’s multimodal language integration needs to further develop to adapt flexibly to challenging situations such as degraded speech, as tested in our study, or instances where speech is heard with environmental noise or through a face mask.
Additional information
supplemental material -
Ünal, E., Mamus, E., & Özyürek, A. (2024). Multimodal encoding of motion events in speech, gesture, and cognition. Language and Cognition, 16(4), 785-804. doi:10.1017/langcog.2023.61.
Abstract
How people communicate about motion events and how this is shaped by language typology are mostly studied with a focus on linguistic encoding in speech. Yet, human communication typically involves an interactional exchange of multimodal signals, such as hand gestures that have different affordances for representing event components. Here, we review recent empirical evidence on multimodal encoding of motion in speech and gesture to gain a deeper understanding of whether and how language typology shapes linguistic expressions in different modalities, and how this changes across different sensory modalities of input and interacts with other aspects of cognition. Empirical evidence strongly suggests that Talmy’s typology of event integration predicts multimodal event descriptions in speech and gesture and visual attention to event components prior to producing these descriptions. Furthermore, variability within the event itself, such as type and modality of stimuli, may override the influence of language typology, especially for expression of manner. -
Goldin-Meadow, S., Chee So, W., Ozyurek, A., & Mylander, C. (2008). The natural order of events: how speakers of different languages represent events nonverbally. Proceedings of the National Academy of Sciences of the USA, 105(27), 9163-9168. doi:10.1073/pnas.0710060105.
Abstract
To test whether the language we speak influences our behavior even when we are not speaking, we asked speakers of four languages differing in their predominant word orders (English, Turkish, Spanish, and Chinese) to perform two nonverbal tasks: a communicative task (describing an event by using gesture without speech) and a noncommunicative task (reconstructing an event with pictures). We found that the word orders speakers used in their everyday speech did not influence their nonverbal behavior. Surprisingly, speakers of all four languages used the same order and on both nonverbal tasks. This order, actor–patient–act, is analogous to the subject–object–verb pattern found in many languages of the world and, importantly, in newly developing gestural languages. The findings provide evidence for a natural order that we impose on events when describing and reconstructing them nonverbally and exploit when constructing language anew.
Additional information
GoldinMeadow_2008_naturalSuppl.pdf -
Ozyurek, A., Kita, S., Allen, S., Brown, A., Furman, R., & Ishizuka, T. (2008). Development of cross-linguistic variation in speech and gesture: motion events in English and Turkish. Developmental Psychology, 44(4), 1040-1054. doi:10.1037/0012-1649.44.4.1040.
Abstract
The way adults express manner and path components of a motion event varies across typologically different languages both in speech and cospeech gestures, showing that language specificity in event encoding influences gesture. The authors tracked when and how this multimodal cross-linguistic variation develops in children learning Turkish and English, 2 typologically distinct languages. They found that children learn to speak in language-specific ways from age 3 onward (i.e., English speakers used 1 clause and Turkish speakers used 2 clauses to express manner and path). In contrast, English- and Turkish-speaking children’s gestures looked similar at ages 3 and 5 (i.e., separate gestures for manner and path), differing from each other only at age 9 and in adulthood (i.e., English speakers used 1 gesture, but Turkish speakers used separate gestures for manner and path). The authors argue that this pattern of the development of cospeech gestures reflects a gradual shift to language-specific representations during speaking and shows that looking at speech alone may not be sufficient to understand the full process of language acquisition. -
Perniss, P. M., & Ozyurek, A. (2008). Representations of action, motion and location in sign space: A comparison of German (DGS) and Turkish (TID) sign language narratives. In J. Quer (Ed.), Signs of the time: Selected papers from TISLR 8 (pp. 353-376). Seedorf: Signum Press. -
Senghas, A., Kita, S., & Ozyurek, A. (2008). Children creating core properties of language: Evidence from an emerging sign language in Nicaragua. In K. A. Lindgren, D. DeLuca, & D. J. Napoli (Eds.), Signs and Voices: Deaf Culture, Identity, Language, and Arts. Washington, DC: Gallaudet University Press. -
Willems, R. M., Ozyurek, A., & Hagoort, P. (2008). Seeing and hearing meaning: ERP and fMRI evidence of word versus picture integration into a sentence context. Journal of Cognitive Neuroscience, 20, 1235-1249. doi:10.1162/jocn.2008.20085.
Abstract
Understanding language always occurs within a situational context and, therefore, often implies combining streams of information from different domains and modalities. One such combination is that of spoken language and visual information, which are perceived together in a variety of ways during everyday communication. Here we investigate whether and how words and pictures differ in terms of their neural correlates when they are integrated into a previously built-up sentence context. This is assessed in two experiments looking at the time course (measuring event-related potentials, ERPs) and the locus (using functional magnetic resonance imaging, fMRI) of this integration process. We manipulated the ease of semantic integration of word and/or picture to a previous sentence context to increase the semantic load of processing. In the ERP study, an increased semantic load led to an N400 effect which was similar for pictures and words in terms of latency and amplitude. In the fMRI study, we found overlapping activations to both picture and word integration in the left inferior frontal cortex. Specific activations for the integration of a word were observed in the left superior temporal cortex. We conclude that despite obvious differences in representational format, semantic information coming from pictures and words is integrated into a sentence context in similar ways in the brain. This study adds to the growing insight that the language system incorporates (semantic) information coming from linguistic and extralinguistic domains with the same neural time course and by recruitment of overlapping brain areas. -
Zwitserlood, I., Ozyurek, A., & Perniss, P. M. (2008). Annotation of sign and gesture cross-linguistically. In O. Crasborn, E. Efthimiou, T. Hanke, E. D. Thoutenhoofd, & I. Zwitserlood (Eds.), Construction and Exploitation of Sign Language Corpora. 3rd Workshop on the Representation and Processing of Sign Languages (pp. 185-190). Paris: ELDA.
Abstract
This paper discusses the construction of a cross-linguistic, bimodal corpus containing three modes of expression: expressions from two sign languages, speech and gestural expressions in two spoken languages and pantomimic expressions by users of two spoken languages who are requested to convey information without speaking. We discuss some problems and tentative solutions for the annotation of utterances expressing spatial information about referents in these three modes, suggesting a set of comparable codes for the description of both sign and gesture. Furthermore, we discuss the processing of entered annotations in ELAN, e.g. relating descriptive annotations to analytic annotations in all three modes and performing relational searches across annotations on different tiers. -
Kuntay, A., & Ozyurek, A. (2002). Joint attention and the development of the use of demonstrative pronouns in Turkish. In B. Skarabela, S. Fish, & A. H. Do (Eds.), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 336-347). Somerville, MA: Cascadilla Press. -
Ozyurek, A. (2002). Do speakers design their co-speech gestures for their addressees? The effects of addressee location on representational gestures. Journal of Memory and Language, 46(4), 688-704. doi:10.1006/jmla.2001.2826.
Abstract
Do speakers use spontaneous gestures accompanying their speech for themselves or to communicate their message to their addressees? Two experiments show that speakers change the orientation of their gestures depending on the location of shared space, that is, the intersection of the gesture spaces of the speakers and addressees. Gesture orientations change more frequently when they accompany spatial prepositions such as into and out, which describe motion that has a beginning and end point, rather than across, which depicts an unbounded path across space. Speakers change their gestures so that they represent the beginning and end point of motion INTO or OUT by moving into or out of the shared space. Thus, speakers design their gestures for their addressees and therefore use them to communicate. This has implications for the view that gestures are a part of language use as well as for the role of gestures in speech production. -
Ozyurek, A. (2002). Speech-gesture relationship across languages and in second language learners: Implications for spatial thinking and speaking. In B. Skarabela, S. Fish, & A. H. Do (Eds.), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 500-509). Somerville, MA: Cascadilla Press.