Displaying 1 - 8 of 8
-
Tsutsui, S., Wang, X., Weng, G., Zhang, Y., Crandall, D., & Yu, C. (2022). Action recognition based on cross-situational action-object statistics. In Proceedings of the 2022 IEEE International Conference on Development and Learning (ICDL 2022).
Abstract
Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition models with greater generalization ability. To do this, we take inspiration from a cognitive mechanism called cross-situational learning, which states that human learners extract the meaning of concepts by observing instances of the same concept across different situations. We perform controlled experiments with various types of action-object associations, and identify key properties of action-object co-occurrence in training data that lead to better classifiers. Given that these properties are missing in the datasets that are typically used to train action classifiers in the computer vision literature, our work provides useful insights on how we should best construct datasets for efficiently training for better generalization. -
Zhang, Y., & Yu, C. (2022). Examining real-time attention dynamics in parent-infant picture book reading. In J. Culbertson, A. Perfors, H. Rabagliati, & V. Ramenzoni (
Eds. ), Proceedings of the 44th Annual Conference of the Cognitive Science Society (CogSci 2022) (pp. 1367-1374). Toronto, Canada: Cognitive Science Society.Abstract
Picture book reading is a common word-learning context from which parents repeatedly name objects to their child and it has been found to facilitate early word learning. To learn the correct word-object mappings in a book-reading context, infants need to be able to link what they see with what they hear. However, given multiple objects on every book page, it is not clear how infants direct their attention to objects named by parents. The aim of the current study is to examine how infants mechanistically discover the correct word-object mappings during book reading in real time. We used head-mounted eye-tracking during parent-infant picture book reading and measured the infant's moment-by-moment visual attention to the named referent. We also examined how gesture cues provided by both the child and the parent may influence infants' attention to the named target. We found that although parents provided many object labels during book reading, infants were not able to attend to the named objects easily. However, their abilities to follow and use gestures to direct the other social partner’s attention increase the chance of looking at the named target during parent naming. -
Amatuni, A., Schroer, S. E., Zhang, Y., Peters, R. E., Reza, M. A., Crandall, D., & Yu, C. (2021). In-the-moment visual information from the infant's egocentric view determines the success of infant word learning: A computational study. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (
Eds. ), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 265-271). Vienna: Cognitive Science Society.Abstract
Infants learn the meaning of words from accumulated experiences of real-time interactions with their caregivers. To study the effects of visual sensory input on word learning, we recorded infant's view of the world using head-mounted eye trackers during free-flowing play with a caregiver. While playing, infants were exposed to novel label-object mappings and later learning outcomes for these items were tested after the play session. In this study we use a classification based approach to link properties of infants' visual scenes during naturalistic labeling moments to their word learning outcomes. We find that a model which integrates both highly informative and ambiguous sensory evidence is a better fit to infants' individual learning outcomes than models where either type of evidence is taken alone, and that raw labeling frequency is unable to account for the word learning differences we observe. Here we demonstrate how a computational model, using only raw pixels taken from the egocentric scene image, can derive insights on human language learning. -
Falk, J. J., Zhang, Y., Scheutz, M., & Yu, C. (2021). Parents adaptively use anaphora during parent-child social interaction. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (
Eds. ), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 1472-1478). Vienna: Cognitive Science Society.Abstract
Anaphora, a ubiquitous feature of natural language, poses a particular challenge to young children as they first learn language due to its referential ambiguity. In spite of this, parents and caregivers use anaphora frequently in child-directed speech, potentially presenting a risk to effective communication if children do not yet have the linguistic capabilities of resolving anaphora successfully. Through an eye-tracking study in a naturalistic free-play context, we examine the strategies that parents employ to calibrate their use of anaphora to their child's linguistic development level. We show that, in this way, parents are able to intuitively scaffold the complexity of their speech such that greater referential ambiguity does not hurt overall communication success. -
Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences of the United States of America, 118(52): e2107019118. doi:10.1073/pnas.2107019118.
Abstract
The learning of first object names is deemed a hard problem due to the uncertainty inherent in mapping a heard name to the intended referent in a cluttered and variable world. However, human infants readily solve this problem. Despite considerable theoretical discussion, relatively little is known about the uncertainty infants face in the real world. We used head-mounted eye tracking during parent–infant toy play and quantified the uncertainty by measuring the distribution of infant attention to the potential referents when a parent named both familiar and unfamiliar toy objects. The results show that infant gaze upon hearing an object name is often directed to a single referent which is equally likely to be a wrong competitor or the intended target. This bimodal gaze distribution clarifies and redefines the uncertainty problem and constrains possible solutions. -
Zhang, Y., Yurovsky, D., & Yu, C. (2021). Cross-situational learning from ambiguous egocentric input is a continuous process: Evidence using the human simulation paradigm. Cognitive Science, 45(7): e13010. doi:10.1111/cogs.13010.
Abstract
Recent laboratory experiments have shown that both infant and adult learners can acquire word-referent mappings using cross-situational statistics. The vast majority of the work on this topic has used unfamiliar objects presented on neutral backgrounds as the visual contexts for word learning. However, these laboratory contexts are much different than the real-world contexts in which learning occurs. Thus, the feasibility of generalizing cross-situational learning beyond the laboratory is in question. Adapting the Human Simulation Paradigm, we conducted a series of experiments examining cross-situational learning from children's egocentric videos captured during naturalistic play. Focusing on individually ambiguous naming moments that naturally occur during toy play, we asked how statistical learning unfolds in real time through accumulating cross-situational statistics in naturalistic contexts. We found that even when learning situations were individually ambiguous, learners' performance gradually improved over time. This improvement was driven in part by learners' use of partial knowledge acquired from previous learning situations, even when they had not yet discovered correct word-object mappings. These results suggest that word learning is a continuous process by means of real-time information integration. -
Zhang, Y., Amatuni, A., Cain, E., Wang, X., Crandall, D., & Yu, C. (2021). Human learners integrate visual and linguistic information cross-situational verb learning. In T. Fitch, C. Lamm, H. Leder, & K. Teßmar-Raible (
Eds. ), Proceedings of the 43rd Annual Conference of the Cognitive Science Society (CogSci 2021) (pp. 2267-2273). Vienna: Cognitive Science Society.Abstract
Learning verbs is challenging because it is difficult to infer the precise meaning of a verb when there are a multitude of relations that one can derive from a single event. To study this verb learning challenge, we used children's egocentric view collected from naturalistic toy-play interaction as learning materials and investigated how visual and linguistic information provided in individual naming moments as well as cross-situational information provided from multiple learning moments can help learners resolve this mapping problem using the Human Simulation Paradigm. Our results show that learners benefit from seeing children's egocentric views compared to third-person observations. In addition, linguistic information can help learners identify the correct verb meaning by eliminating possible meanings that do not belong to the linguistic category. Learners are also able to integrate visual and linguistic information both within and across learning situations to reduce the ambiguity in the space of possible verb meanings. -
Zhang, Y., Amatuni, A., Crain, E., & Yu, C. (2020). Seeking meaning: Examining a cross-situational solution to learn action verbs using human simulation paradigm. In S. Denison, M. Mack, Y. Xu, & B. C. Armstrong (
Eds. ), Proceedings of the 42nd Annual Meeting of the Cognitive Science Society (CogSci 2020) (pp. 2854-2860). Montreal, QB: Cognitive Science Society.Abstract
To acquire the meaning of a verb, language learners not only need to find the correct mapping between a specific verb and an action or event in the world, but also infer the underlying relational meaning that the verb encodes. Most verb naming instances in naturalistic contexts are highly ambiguous as many possible actions can be embedded in the same scenario and many possible verbs can be used to describe those actions. To understand whether learners can find the correct verb meaning from referentially ambiguous learning situations, we conducted three experiments using the Human Simulation Paradigm with adult learners. Our results suggest that although finding the right verb meaning from one learning instance is hard, there is a statistical solution to this problem. When provided with multiple verb learning instances all referring to the same verb, learners are able to aggregate information across situations and gradually converge to the correct semantic space. Even in cases where they may not guess the exact target verb, they can still discover the right meaning by guessing a similar verb that is semantically close to the ground truth.
Share this page