Jinbiao Yang

Publications

Displaying 1 - 5 of 5
  • Yang, J. (2022). Discovering the units in language cognition: From empirical evidence to a computational model. PhD Thesis, Radboud University Nijmegen, Nijmegen.
  • Yang, J., Van den Bosch, A., & Frank, S. L. (2022). Unsupervised text segmentation predicts eye fixations during reading. Frontiers in Artificial Intelligence, 5: 731615. doi:10.3389/frai.2022.731615.

    Abstract

    Words typically form the basis of psycholinguistic and computational linguistic studies about sentence processing. However, recent evidence shows the basic units during reading, i.e., the items in the mental lexicon, are not always words, but could also be sub-word and supra-word units. To recognize these units, human readers require a cognitive mechanism to learn and detect them. In this paper, we assume eye fixations during reading reveal the locations of the cognitive units, and that the cognitive units are analogous with the text units discovered by unsupervised segmentation models. We predict eye fixations by model-segmented units on both English and Dutch text. The results show the model-segmented units predict eye fixations better than word units. This finding suggests that the predictive performance of model-segmented units indicates their plausibility as cognitive units. The Less-is-Better (LiB) model, which finds the units that minimize both long-term and working memory load, offers advantages both in terms of prediction score and efficiency among alternative models. Our results also suggest that modeling the least-effort principle for the management of long-term and working memory can lead to inferring cognitive units. Overall, the study supports the theory that the mental lexicon stores not only words but also smaller and larger units, suggests that fixation locations during reading depend on these units, and shows that unsupervised segmentation models can discover these units.
  • Teng, X., Ma, M., Yang, J., Blohm, S., Cai, Q., & Tian, X. (2020). Constrained structure of ancient Chinese poetry facilitates speech content grouping. Current Biology, 30, 1299-1305. doi:10.1016/j.cub.2020.01.059.

    Abstract

    Ancient Chinese poetry is constituted by structured language that deviates from ordinary language usage [1, 2]; its poetic genres impose unique combinatory constraints on linguistic elements [3]. How does the constrained poetic structure facilitate speech segmentation when common linguistic [4, 5, 6, 7, 8] and statistical cues [5, 9] are unreliable to listeners in poems? We generated artificial Jueju, which arguably has the most constrained structure in ancient Chinese poetry, and presented each poem twice as an isochronous sequence of syllables to native Mandarin speakers while conducting magnetoencephalography (MEG) recording. We found that listeners deployed their prior knowledge of Jueju to build the line structure and to establish the conceptual flow of Jueju. Unprecedentedly, we found a phase precession phenomenon indicating predictive processes of speech segmentation—the neural phase advanced faster after listeners acquired knowledge of incoming speech. The statistical co-occurrence of monosyllabic words in Jueju negatively correlated with speech segmentation, which provides an alternative perspective on how statistical cues facilitate speech segmentation. Our findings suggest that constrained poetic structures serve as a temporal map for listeners to group speech contents and to predict incoming speech signals. Listeners can parse speech streams by using not only grammatical and statistical cues but also their prior knowledge of the form of language.

    Additional information

    Supplemental Information
  • Yang, J., Van den Bosch, A., & Frank, S. L. (2020). Less is Better: A cognitively inspired unsupervised model for language segmentation. In M. Zock, E. Chersoni, A. Lenci, & E. Santus (Eds.), Proceedings of the Workshop on the Cognitive Aspects of the Lexicon ( 28th International Conference on Computational Linguistics) (pp. 33-45). Stroudsburg: Association for Computational Linguistics.

    Abstract

    Language users process utterances by segmenting them into many cognitive units, which vary in their sizes and linguistic levels. Although we can do such unitization/segmentation easily, its cognitive mechanism is still not clear. This paper proposes an unsupervised model, Less-is-Better (LiB), to simulate the human cognitive process with respect to language unitization/segmentation. LiB follows the principle of least effort and aims to build a lexicon which minimizes the number of unit tokens (alleviating the effort of analysis) and number of unit types (alleviating the effort of storage) at the same time on any given corpus. LiB’s workflow is inspired by empirical cognitive phenomena. The design makes the mechanism of LiB cognitively plausible and the computational requirement light-weight. The lexicon generated by LiB performs the best among different types of lexicons (e.g. ground-truth words) both from an information-theoretical view and a cognitive view, which suggests that the LiB lexicon may be a plausible proxy of the mental lexicon.

    Additional information

    full text via ACL website
  • Yang, J., Cai, Q., & Tian, X. (2020). How do we segment text? Two-stage chunking operation in reading. eNeuro, 7(3): ENEURO.0425-19.2020. doi:10.1523/ENEURO.0425-19.2020.

    Abstract

    Chunking in language comprehension is a process that segments continuous linguistic input into smaller chunks that are in the reader’s mental lexicon. Effective chunking during reading facilitates disambiguation and enhances efficiency for comprehension. However, the chunking mechanisms remain elusive, especially in reading given that information arrives simultaneously yet the written systems may not have explicit cues for labeling boundaries such as Chinese. What are the mechanisms of chunking that mediates the reading of the text that contains hierarchical information? We investigated this question by manipulating the lexical status of the chunks at distinct levels in four-character Chinese strings, including the two-character local chunk and four-character global chunk. Male and female human participants were asked to make lexical decisions on these strings in a behavioral experiment, followed by a passive reading task when their electroencephalography (EEG) was recorded. The behavioral results showed that the lexical decision time of lexicalized two-character local chunks was influenced by the lexical status of the four-character global chunk, but not vice versa, which indicated the processing of global chunks possessed priority over the local chunks. The EEG results revealed that familiar lexical chunks were detected simultaneously at both levels and further processed in a different temporal order – the onset of lexical access for the global chunks was earlier than that of local chunks. These consistent results suggest a two-stage operation for chunking in reading–– the simultaneous detection of familiar lexical chunks at multiple levels around 100 ms followed by recognition of chunks with global precedence.

Share this page