Displaying 1 - 19 of 19
-
Akamine, S., Ghaleb, E., Rasenberg, M., Fernandez, R., Meyer, A. S., & Özyürek, A. (2024). Speakers align both their gestures and words not only to establish but also to maintain reference to create shared labels for novel objects in interaction. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (
Eds. ), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 2435-2442).Abstract
When we communicate with others, we often repeat aspects of each other's communicative behavior such as sentence structures and words. Such behavioral alignment has been mostly studied for speech or text. Yet, language use is mostly multimodal, flexibly using speech and gestures to convey messages. Here, we explore the use of alignment in speech (words) and co-speech gestures (iconic gestures) in a referential communication task aimed at finding labels for novel objects in interaction. In particular, we investigate how people flexibly use lexical and gestural alignment to create shared labels for novel objects and whether alignment in speech and gesture are related over time. The present study shows that interlocutors establish shared labels multimodally, and alignment in words and iconic gestures are used throughout the interaction. We also show that the amount of lexical alignment positively associates with the amount of gestural alignment over time, suggesting a close relationship between alignment in the vocal and manual modalities.Additional information
link to eScholarship -
Ghaleb, E., Rasenberg, M., Pouw, W., Toni, I., Holler, J., Özyürek, A., & Fernandez, R. (2024). Analysing cross-speaker convergence through the lens of automatically detected shared linguistic constructions. In L. K. Samuelson, S. L. Frank, A. Mackey, & E. Hazeltine (
Eds. ), Proceedings of the 46th Annual Meeting of the Cognitive Science Society (CogSci 2024) (pp. 1717-1723).Abstract
Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions---expressions with a common lexical core used by both speakers within a dialogue---and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.Additional information
link to eScholarship -
Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Ozyurek, A., & Fernandez, R. (2024). Cospeech gesture detection through multi-phase sequence labeling. In Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024) (pp. 4007-4015).
Abstract
Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and re-
traction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework’s capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis. -
Hagoort, P., & Özyürek, A. (2024). Extending the architecture of language from a multimodal perspective. Topics in Cognitive Science. Advance online publication. doi:10.1111/tops.12728.
Abstract
Language is inherently multimodal. In spoken languages, combined spoken and visual signals (e.g., co-speech gestures) are an integral part of linguistic structure and language representation. This requires an extension of the parallel architecture, which needs to include the visual signals concomitant to speech. We present the evidence for the multimodality of language. In addition, we propose that distributional semantics might provide a format for integrating speech and co-speech gestures in a common semantic representation. -
Karadöller, D. Z., Sümer, B., Ünal, E., & Özyürek, A. (2024). Sign advantage: Both children and adults’ spatial expressions in sign are more informative than those in speech and gestures combined. Journal of Child Language, 51(4), 876-902. doi:10.1017/S0305000922000642.
Abstract
Expressing Left-Right relations is challenging for speaking-children. Yet, this challenge was absent for signing-children, possibly due to iconicity in the visual-spatial modality of expression. We investigate whether there is also a modality advantage when speaking-children’s co-speech gestures are considered. Eight-year-old child and adult hearing monolingual Turkish speakers and deaf signers of Turkish-Sign-Language described pictures of objects in various spatial relations. Descriptions were coded for informativeness in speech, sign, and speech-gesture combinations for encoding Left-Right relations. The use of co-speech gestures increased the informativeness of speakers’ spatial expressions compared to speech-only. This pattern was more prominent for children than adults. However, signing-adults and children were more informative than child and adult speakers even when co-speech gestures were considered. Thus, both speaking- and signing-children benefit from iconic expressions in visual modality. Finally, in each modality, children were less informative than adults, pointing to the challenge of this spatial domain in development. -
Karadöller, D. Z., Peeters, D., Manhardt, F., Özyürek, A., & Ortega, G. (2024). Iconicity and gesture jointly facilitate learning of second language signs at first exposure in hearing non-signers. Language Learning, 74(4), 781-813. doi:10.1111/lang.12636.
Abstract
When learning a spoken second language (L2), words overlapping in form and meaning with one’s native language (L1) help break into the new language. When non-signing speakers learn a sign language as L2, such forms are absent because of the modality differences (L1:speech, L2:sign). In such cases, non-signing speakers might use iconic form-meaning mappings in signs or their own gestural experience as gateways into the to-be-acquired sign language. Here, we investigated how both these factors may contribute jointly to the acquisition of sign language vocabulary by hearing non-signers. Participants were presented with three types of sign in NGT (Sign Language of the Netherlands): arbitrary signs, iconic signs with high or low gesture overlap. Signs that were both iconic and highly overlapping with gestures boosted learning most at first exposure, and this effect remained the day after. Findings highlight the influence of modality-specific factors supporting the acquisition of a signed lexicon. -
Karadöller*, D. Z., Sümer*, B., & Özyürek, A. (2024). First-language acquisition in a multimodal language framework: Insights from speech, gesture, and sign. First Language. Advance online publication. doi:10.1177/01427237241290678.
Abstract
*=shared first authorship
Children across the world acquire their first language(s) naturally, regardless of typology or modality (e.g. sign or spoken). Various attempts have been made to explain the puzzle of language acquisition using several approaches, trying to understand to what extent it can be explained by what children bring to language-learning situations as well as what they learn from the input and the interactive context. However, most of these approaches consider only speech development, thus ignoring the inherently multimodal nature of human language. As a multimodal view of language is becoming more widely adopted for the study of adult language, a multimodal approach to language acquisition is inevitable. Not only do children have the capacity to learn spoken and sign language equally easily, but spoken language acquisition consists of learning to coordinate linguistic expressions in both modalities, that is, in both speech and gesture. To provide a step forward in this direction, this article aims to synthesize findings from research studies that take a multimodal perspective on language acquisition in different sign and spoken languages, including the development of speech and accompanying gestures. Our review shows that while some aspects of language acquisition seem to be modality-independent, others might differ according to the affordances of each modality when used separately as well as together (either in sign, speech, and/or gesture). We argue that these findings need to be integrated into our understanding of language acquisition. We also identify which areas need future research for both spoken and sign language acquisition, taking into account not only multimodal but also cross-linguistic variation. -
Sekine, K., & Özyürek, A. (2024). Children benefit from gestures to understand degraded speech but to a lesser extent than adults. Frontiers in Psychology, 14: 1305562. doi:10.3389/fpsyg.2023.1305562.
Abstract
The present study investigated to what extent children, compared to adults, benefit from gestures to disambiguate degraded speech by manipulating speech signals and manual modality. Dutch-speaking adults (N = 20) and 6- and 7-year-old children (N = 15) were presented with a series of video clips in which an actor produced a Dutch action verb with or without an accompanying iconic gesture. Participants were then asked to repeat what they had heard. The speech signal was either clear or altered into 4- or 8-band noise-vocoded speech. Children had more difficulty than adults in disambiguating degraded speech in the speech-only condition. However, when presented with both speech and gestures, children reached a comparable level of accuracy to that of adults in the degraded-speech-only condition. Furthermore, for adults, the enhancement of gestures was greater in the 4-band condition than in the 8-band condition, whereas children showed the opposite pattern. Gestures help children to disambiguate degraded speech, but children need more phonological information than adults to benefit from use of gestures. Children’s multimodal language integration needs to further develop to adapt flexibly to challenging situations such as degraded speech, as tested in our study, or instances where speech is heard with environmental noise or through a face mask.Additional information
supplemental material -
Ünal, E., Mamus, E., & Özyürek, A. (2024). Multimodal encoding of motion events in speech, gesture, and cognition. Language and Cognition, 16(4), 785-804. doi:10.1017/langcog.2023.61.
Abstract
How people communicate about motion events and how this is shaped by language typology are mostly studied with a focus on linguistic encoding in speech. Yet, human communication typically involves an interactional exchange of multimodal signals, such as hand gestures that have different affordances for representing event components. Here, we review recent empirical evidence on multimodal encoding of motion in speech and gesture to gain a deeper understanding of whether and how language typology shapes linguistic expressions in different modalities, and how this changes across different sensory modalities of input and interacts with other aspects of cognition. Empirical evidence strongly suggests that Talmy’s typology of event integration predicts multimodal event descriptions in speech and gesture and visual attention to event components prior to producing these descriptions. Furthermore, variability within the event itself, such as type and modality of stimuli, may override the influence of language typology, especially for expression of manner. -
Emmorey, K., & Ozyurek, A. (2014). Language in our hands: Neural underpinnings of sign language and co-speech gesture. In M. S. Gazzaniga, & G. R. Mangun (
Eds. ), The cognitive neurosciences (5th ed., pp. 657-666). Cambridge, Mass: MIT Press. -
Furman, R., Kuntay, A., & Ozyurek, A. (2014). Early language-specificity of children's event encoding in speech and gesture: Evidence from caused motion in Turkish. Language, Cognition and Neuroscience, 29, 620-634. doi:10.1080/01690965.2013.824993.
Abstract
Previous research on language development shows that children are tuned early on to the language-specific semantic and syntactic encoding of events in their native language. Here we ask whether language-specificity is also evident in children's early representations in gesture accompanying speech. In a longitudinal study, we examined the spontaneous speech and cospeech gestures of eight Turkish-speaking children aged one to three and focused on their caused motion event expressions. In Turkish, unlike in English, the main semantic elements of caused motion such as Action and Path can be encoded in the verb (e.g. sok- ‘put in’) and the arguments of a verb can be easily omitted. We found that Turkish-speaking children's speech indeed displayed these language-specific features and focused on verbs to encode caused motion. More interestingly, we found that their early gestures also manifested specificity. Children used iconic cospeech gestures (from 19 months onwards) as often as pointing gestures and represented semantic elements such as Action with Figure and/or Path that reinforced or supplemented speech in language-specific ways until the age of three. In the light of previous reports on the scarcity of iconic gestures in English-speaking children's early productions, we argue that the language children learn shapes gestures and how they get integrated with speech in the first three years of life. -
Holler, J., Schubotz, L., Kelly, S., Hagoort, P., Schuetze, M., & Ozyurek, A. (2014). Social eye gaze modulates processing of speech and co-speech gesture. Cognition, 133, 692-697. doi:10.1016/j.cognition.2014.08.008.
Abstract
In human face-to-face communication, language comprehension is a multi-modal, situated activity. However, little is known about how we combine information from different modalities during comprehension, and how perceived communicative intentions, often signaled through visual signals, influence this process. We explored this question by simulating a multi-party communication context in which a speaker alternated her gaze between two recipients. Participants viewed speech-only or speech + gesture object-related messages when being addressed (direct gaze) or unaddressed (gaze averted to other participant). They were then asked to choose which of two object images matched the speaker’s preceding message. Unaddressed recipients responded significantly more slowly than addressees for speech-only utterances. However, perceiving the same speech accompanied by gestures sped unaddressed recipients up to a level identical to that of addressees. That is, when unaddressed recipients’ speech processing suffers, gestures can enhance the comprehension of a speaker’s message. We discuss our findings with respect to two hypotheses attempting to account for how social eye gaze may modulate multi-modal language comprehension. -
Ortega, G., Sumer, B., & Ozyurek, A. (2014). Type of iconicity matters: Bias for action-based signs in sign language acquisition. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (
Eds. ), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1114-1119). Austin, Tx: Cognitive Science Society.Abstract
Early studies investigating sign language acquisition claimed
that signs whose structures are motivated by the form of their
referent (iconic) are not favoured in language development.
However, recent work has shown that the first signs in deaf
children’s lexicon are iconic. In this paper we go a step
further and ask whether different types of iconicity modulate
learning sign-referent links. Results from a picture description
task indicate that children and adults used signs with two
possible variants differentially. While children signing to
adults favoured variants that map onto actions associated with
a referent (action signs), adults signing to another adult
produced variants that map onto objects’ perceptual features
(perceptual signs). Parents interacting with children used
more action variants than signers in adult-adult interactions.
These results are in line with claims that language
development is tightly linked to motor experience and that
iconicity can be a communicative strategy in parental input. -
Ozyurek, A. (2014). Hearing and seeing meaning in speech and gesture: Insights from brain and behaviour. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 369(1651): 20130296. doi:10.1098/rstb.2013.0296.
Abstract
As we speak, we use not only the arbitrary form–meaning mappings of the speech channel but also motivated form–meaning correspondences, i.e. iconic gestures that accompany speech (e.g. inverted V-shaped hand wiggling across gesture space to demonstrate walking). This article reviews what we know about processing of semantic information from speech and iconic gestures in spoken languages during comprehension of such composite utterances. Several studies have shown that comprehension of iconic gestures involves brain activations known to be involved in semantic processing of speech: i.e. modulation of the electrophysiological recording component N400, which is sensitive to the ease of semantic integration of a word to previous context, and recruitment of the left-lateralized frontal–posterior temporal network (left inferior frontal gyrus (IFG), medial temporal gyrus (MTG) and superior temporal gyrus/sulcus (STG/S)). Furthermore, we integrate the information coming from both channels recruiting brain areas such as left IFG, posterior superior temporal sulcus (STS)/MTG and even motor cortex. Finally, this integration is flexible: the temporal synchrony between the iconic gesture and the speech segment, as well as the perceived communicative intent of the speaker, modulate the integration process. Whether these findings are special to gestures or are shared with actions or other visual accompaniments to speech (e.g. lips) or other visual symbols such as pictures are discussed, as well as the implications for a multimodal view of language. -
Peeters, D., Azar, Z., & Ozyurek, A. (2014). The interplay between joint attention, physical proximity, and pointing gesture in demonstrative choice. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (
Eds. ), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1144-1149). Austin, Tx: Cognitive Science Society. -
Sumer, B., Perniss, P., Zwitserlood, I., & Ozyurek, A. (2014). Learning to express "left-right" & "front-behind" in a sign versus spoken language. In P. Bello, M. Guarini, M. McShane, & B. Scassellati (
Eds. ), Proceedings of the 36th Annual Meeting of the Cognitive Science Society (CogSci 2014) (pp. 1550-1555). Austin, Tx: Cognitive Science Society.Abstract
Developmental studies show that it takes longer for
children learning spoken languages to acquire viewpointdependent
spatial relations (e.g., left-right, front-behind),
compared to ones that are not viewpoint-dependent (e.g.,
in, on, under). The current study investigates how
children learn to express viewpoint-dependent relations
in a sign language where depicted spatial relations can be
communicated in an analogue manner in the space in
front of the body or by using body-anchored signs (e.g.,
tapping the right and left hand/arm to mean left and
right). Our results indicate that the visual-spatial
modality might have a facilitating effect on learning to
express these spatial relations (especially in encoding of
left-right) in a sign language (i.e., Turkish Sign
Language) compared to a spoken language (i.e.,
Turkish). -
Kuntay, A., & Ozyurek, A. (2002). Joint attention and the development of the use of demonstrative pronouns in Turkish. In B. Skarabela, S. Fish, & A. H. Do (
Eds. ), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 336-347). Somerville, MA: Cascadilla Press. -
Ozyurek, A. (2002). Do speakers design their co-speech gestures for their addresees? The effects of addressee location on representational gestures. Journal of Memory and Language, 46(4), 688-704. doi:10.1006/jmla.2001.2826.
Abstract
Do speakers use spontaneous gestures accompanying their speech for themselves or to communicate their message to their addressees? Two experiments show that speakers change the orientation of their gestures depending on the location of shared space, that is, the intersection of the gesture spaces of the speakers and addressees. Gesture orientations change more frequently when they accompany spatial prepositions such as into and out, which describe motion that has a beginning and end point, rather than across, which depicts an unbounded path across space. Speakers change their gestures so that they represent the beginning and end point of motion INTO or OUT by moving into or out of the shared space. Thus, speakers design their gestures for their addressees and therefore use them to communicate. This has implications for the view that gestures are a part of language use as well as for the role of gestures in speech production. -
Ozyurek, A. (2002). Speech-gesture relationship across languages and in second language learners: Implications for spatial thinking and speaking. In B. Skarabela, S. Fish, & A. H. Do (
Eds. ), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 500-509). Somerville, MA: Cascadilla Press.
Share this page