Yayun Zhang

Publications

  • Amatuni, A., Schroer, S. E., Zhang, Y., Peters, R. E., Reza, M. A., Crandall, D., & Yu, C. (2021). In-the-moment visual information from the infant's egocentric view determines the success of infant word learning: A computational study. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 265-271). Vienna: Cognitive Science Society.

    Abstract

    Infants learn the meaning of words from accumulated experiences of real-time interactions with their caregivers. To study the effects of visual sensory input on word learning, we recorded infants' view of the world using head-mounted eye trackers during free-flowing play with a caregiver. While playing, infants were exposed to novel label-object mappings and later learning outcomes for these items were tested after the play session. In this study we use a classification-based approach to link properties of infants' visual scenes during naturalistic labeling moments to their word learning outcomes. We find that a model which integrates both highly informative and ambiguous sensory evidence is a better fit to infants' individual learning outcomes than models where either type of evidence is taken alone, and that raw labeling frequency is unable to account for the word learning differences we observe. Here we demonstrate how a computational model, using only raw pixels taken from the egocentric scene image, can derive insights on human language learning.
  • Falk, J. J., Zhang, Y., Scheutz, M., & Yu, C. (2021). Parents adaptively use anaphora during parent-child social interaction. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 1472-1478). Vienna: Cognitive Science Society.

    Abstract

    Anaphora, a ubiquitous feature of natural language, poses a particular challenge to young children as they first learn language due to its referential ambiguity. In spite of this, parents and caregivers use anaphora frequently in child-directed speech, potentially presenting a risk to effective communication if children do not yet have the linguistic capabilities of resolving anaphora successfully. Through an eye-tracking study in a naturalistic free-play context, we examine the strategies that parents employ to calibrate their use of anaphora to their child's linguistic development level. We show that, in this way, parents are able to intuitively scaffold the complexity of their speech such that greater referential ambiguity does not hurt overall communication success.
  • Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences of the United States of America, 118(52): e2107019118. doi:10.1073/pnas.2107019118.

    Abstract

    The learning of first object names is deemed a hard problem due to the uncertainty inherent in mapping a heard name to the intended referent in a cluttered and variable world. However, human infants readily solve this problem. Despite considerable theoretical discussion, relatively little is known about the uncertainty infants face in the real world. We used head-mounted eye tracking during parent–infant toy play and quantified the uncertainty by measuring the distribution of infant attention to the potential referents when a parent named both familiar and unfamiliar toy objects. The results show that infant gaze upon hearing an object name is often directed to a single referent which is equally likely to be a wrong competitor or the intended target. This bimodal gaze distribution clarifies and redefines the uncertainty problem and constrains possible solutions.
  • Zhang, Y., Yurovsky, D., & Yu, C. (2021). Cross-situational learning from ambiguous egocentric input is a continuous process: Evidence using the human simulation paradigm. Cognitive Science, 45(7): e13010. doi:10.1111/cogs.13010.

    Abstract

    Recent laboratory experiments have shown that both infant and adult learners can acquire word-referent mappings using cross-situational statistics. The vast majority of the work on this topic has used unfamiliar objects presented on neutral backgrounds as the visual contexts for word learning. However, these laboratory contexts are much different than the real-world contexts in which learning occurs. Thus, the feasibility of generalizing cross-situational learning beyond the laboratory is in question. Adapting the Human Simulation Paradigm, we conducted a series of experiments examining cross-situational learning from children's egocentric videos captured during naturalistic play. Focusing on individually ambiguous naming moments that naturally occur during toy play, we asked how statistical learning unfolds in real time through accumulating cross-situational statistics in naturalistic contexts. We found that even when learning situations were individually ambiguous, learners' performance gradually improved over time. This improvement was driven in part by learners' use of partial knowledge acquired from previous learning situations, even when they had not yet discovered correct word-object mappings. These results suggest that word learning is a continuous process by means of real-time information integration.
  • Zhang, Y., Amatuni, A., Cain, E., Wang, X., Crandall, D., & Yu, C. (2021). Human learners integrate visual and linguistic information cross-situational verb learning. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (Eds.), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2267-2273). Vienna: Cognitive Science Society.

    Abstract

    Learning verbs is challenging because it is difficult to infer the precise meaning of a verb when there are a multitude of relations that one can derive from a single event. To study this verb learning challenge, we used children's egocentric view collected from naturalistic toy-play interaction as learning materials and investigated how visual and linguistic information provided in individual naming moments as well as cross-situational information provided from multiple learning moments can help learners resolve this mapping problem using the Human Simulation Paradigm. Our results show that learners benefit from seeing children's egocentric views compared to third-person observations. In addition, linguistic information can help learners identify the correct verb meaning by eliminating possible meanings that do not belong to the linguistic category. Learners are also able to integrate visual and linguistic information both within and across learning situations to reduce the ambiguity in the space of possible verb meanings.
