Harbusch, K., & Kempen, G. (2007). Clausal coordinate ellipsis in German: The TIGER treebank as a source of evidence. In J. Nivre, H. J. Kaalep, K. Muischnek, & M. Koit (Eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007) (pp. 81-88). Tartu: University of Tartu.
Abstract
Syntactic parsers and generators need high-quality grammars of coordination and coordinate ellipsis, structures that occur very frequently but are much less well understood theoretically than many other domains of grammar. Modern grammars of coordinate ellipsis are based nearly exclusively on linguistic judgments (intuitions). The extent to which grammar rules based on this type of empirical evidence generate all and only the structures in text corpora is unknown. As part of a project on the development of a grammar and a generator for coordinate ellipsis in German, we undertook an extensive exploration of the TIGER treebank, a syntactically annotated corpus of about 50,000 newspaper sentences. We report (1) frequency data for the various patterns of coordinate ellipsis, and (2) several rarely (but regularly) occurring ‘fringe deviations’ from the intuition-based rules for several ellipsis types. This information can help improve parser and generator performance.
Harbusch, K., Breugel, C., Koch, U., & Kempen, G. (2007). Interactive sentence combining and paraphrasing in support of integrated writing and grammar instruction: A new application area for natural language sentence generators. In S. Busemann (Ed.), Proceedings of the 11th European Workshop on Natural Language Generation (ENLG07) (pp. 65-68). ACL Anthology.
Abstract
The potential of sentence generators as engines in Intelligent Computer-Assisted Language Learning and teaching (ICALL) software has hardly been explored. We sketch the prototype of COMPASS, a system that supports integrated writing and grammar curricula for 10- to 14-year-old students in elementary or secondary school. The system enables first- or second-language teachers to design controlled writing exercises, in particular of the “sentence combining” variety. The system includes facilities for error diagnosis and online feedback. Syntactic structures built by students or by the system can be displayed as easily understood phrase-structure or dependency trees, adapted to the student’s level of grammatical knowledge. The heart of the system is a specially designed generator capable of lexically guided sentence generation, of generating syntactic paraphrases, and of displaying syntactic structures visually.
Kempen, G. (2007). De kunst van het weglaten: Elliptische nevenschikking in een model van de spreker [The art of omission: Elliptical coordination in a model of the speaker]. In F. Moerdijk, A. van Santen, & R. Tempelaars (Eds.), Leven met woorden: Afscheidsbundel voor Piet van Sterkenburg (pp. 397-407). Leiden: Brill.
Abstract
This paper is an abridged version (in Dutch) of an in-press article by the same author (Kempen, G. (2008/9). Clausal coordination and coordinate ellipsis in a model of the speaker. To be published in: Linguistics). The two papers present a psycholinguistically inspired approach to the syntax of clause-level coordination and coordinate ellipsis. This approach departs from the assumption that coordinations are structurally similar to so-called appropriateness repairs, an important type of self-repair in spontaneous speech. Coordinate structures and appropriateness repairs can both be viewed as “update” constructions. Updating is defined as a special sentence production mode that efficiently revises or augments existing sentential structure in response to modifications in the speaker’s communicative intention. This perspective is shown to offer an empirically satisfactory and theoretically parsimonious account of two prominent types of coordinate ellipsis, namely Forward Conjunction Reduction (FCR) and Gapping (including Long-Distance Gapping and Subgapping). They are analyzed as different manifestations of “incremental updating”: efficient updating of only part of the existing sentential structure. Based on empirical data from Dutch and German, novel treatments are proposed for both types of clausal coordinate ellipsis. Two other forms of coordinate ellipsis, SGF (“Subject Gap in Finite clauses with fronted verb”) and Backward Conjunction Reduction (BCR; also known as Right Node Raising or RNR), are shown to be incompatible with the notion of incremental updating. Alternative theoretical interpretations of these phenomena are proposed. The four types of clausal coordinate ellipsis (SGF, Gapping, FCR and BCR) are argued to originate in four different stages of sentence production: Intending (i.e. preparing the communicative intention), Conceptualization, Grammatical Encoding, and Phonological Encoding, respectively.
Kuiper, K., Van Egmond, M.-E., Kempen, G., & Sprenger, S. A. (2007). Slipping on superlemmas: Multiword lexical items in speech production. The Mental Lexicon, 2(3), 313-357.
Abstract
Only relatively recently have theories of speech production concerned themselves with the part that idioms and other multi-word lexical items (MLIs) play in the processes of speech production. Two theories of speech production that attempt to account for the accessing of idioms are the model of Cutting and Bock (1997) and superlemma theory (Sprenger, 2003; Sprenger, Levelt, & Kempen, 2006). Much of the data supporting theories of speech production comes either from time course experiments or from slips of the tongue (Bock & Levelt, 1994). The latter are of two kinds: experimentally induced (Baars, 1992) or naturally observed (Fromkin, 1980). Cutting and Bock use experimentally induced speech errors, while Sprenger et al. use time course experiments. The missing data type that bears on speech production involving MLIs is that of naturally occurring slips. In this study, data taken from naturally observed slips involving English and Dutch MLIs are brought to bear on these theories. The data are taken initially from a corpus of just over 1000 naturally observed English slips involving MLIs (the Tuggy corpus). Our argument proceeds as follows. First, we show that slips occur independently of whether or not MLIs are involved. In other words, speech production proceeds in certain of its aspects as though there were no MLI present. We illustrate these slips from the Tuggy data. Second, we investigate the predictions of superlemma theory. Superlemma theory (Sprenger et al., 2006) accounts for the selection of MLIs and how their properties enter processes of speech production. It predicts certain activation patterns dependent on an MLI being selected. Each such pattern might give rise to slips of the tongue. This set of predictions is tested against the Tuggy data. Each of the predicted activation patterns yields a significant number of slips.
These findings are therefore compatible with a view of MLIs as single units insofar as their activation by lexical concepts is concerned. However, the theory also predicts that some slips are unlikely to occur. We confirm that such slips are not present in the data. These findings are further corroborated by reference to a second, smaller dataset of slips involving Dutch MLIs (the Kempen corpus). We then use slips involving irreversible binomials to distinguish between the two accounts: these slips support the predictions of superlemma theory but not those of the Cutting and Bock model.
Kempen, G. (1977). Building a psychologically plausible sentence generator. In P. A. M. Seuren (Ed.), Symposium on semantic theory: held at Nijmegen, March 14-18, 1977 / Volume 9 (pp. 107-117). Nijmegen: Katholieke Universiteit Nijmegen.
Abstract
The psychological process of translating semantic into syntactic structures has dynamic properties such as the following. (1) The speaker is able to start pronouncing an utterance before having worked out the semantic content he wishes to express. Selection of semantic content and construction of syntactic form proceed partially in parallel. (2) The human sentence generator takes as input not only a specification of semantic content but also some indication of desired syntactic shape. Such indications, if present, do not complicate the generation process but make it easier. (3) Certain regularities of speech errors suggest a two-stage generation process: stage I constructs the “syntactic skeleton” of an utterance; stage II provides the skeleton with morpho-phonological information. An outline is given of the type of grammar used by a sentence generation system embodying these characteristics. The system is being implemented on a computer.
Kempen, G. (1977). Conceptualizing and formulating in sentence production. In S. Rosenberg (Ed.), Sentence production: Developments in research and theory (pp. 259-274). Hillsdale, NJ: Erlbaum.
Kempen, G. (1977). [Review of the book Explorations in cognition by D. Norman, D. Rumelhart and the LNR Research Group]. Journal of Psycholinguistic Research, 6(2), 184-186. doi:10.1007/BF01074377.
Kempen, G. (1977). Man's sentence generator: Aspects of its control structure. In M. De Mey, R. Pinxten, M. Poriau, & E. Vandamme (Eds.), International workshop on the cognitive viewpoint. Ghent: University of Ghent, Communication & Cognition.
Kempen, G. (1977). Onder woorden brengen: Psychologische aspecten van expressief taalgebruik [Putting into words: Psychological aspects of expressive language use] [Inaugural lecture]. Groningen: Wolters-Noordhoff.
Abstract
Address delivered upon acceptance of the office of lector (reader) in the psychology of language at the Katholieke Universiteit Nijmegen on Friday, 10 June 1977.
Kempen, G. (1977). Wat is psycholinguistiek? [What is psycholinguistics?] In B. T. M. Tervoort (Ed.), Wetenschap en taal: Het verschijnsel taal van verschillende zijden benaderd (pp. 86-99). Muiderberg: Coutinho.
Kempen, G., & Maassen, B. (1977). The time course of conceptualizing and formulating processes during the production of simple sentences. In Proceedings of The Third Prague Conference on the Psychology of Human Learning and Development. Prague: Institute of Psychology.
Abstract
The psychological process of producing sentences includes conceptualization (selecting to-be-expressed conceptual content) and formulation (translating conceptual content into syntactic structures of a language). There is ample evidence, both intuitive and experimental, that the conceptualizing and formulating processes often proceed concurrently, not strictly serially. James Lindsley (Cognitive Psych., 1975, 7, 1-19; J. Psycholinguistic Res., 1976, 5, 331-354) has developed a concurrent model which proved successful in an experimental situation where simple English Subject-Verb (SV) sentences such as “The boy is greeting” and “The girl is kicking” were produced as descriptions of pictures which showed actor and action. The dependent measure was reaction time, defined as the interval between the moment a picture appeared on a screen and the onset of the speaker’s vocal utterance. Lindsley was able to show, among other things, that the formulation process for an SV sentence does not start immediately after the actor of a picture (that is, the conceptual content underlying the surface Subject phrase) has been identified, but is somewhat delayed. According to Lindsley, the delay is needed in order to prevent dysfluencies (hesitations) between surface Subject and verb. We replicated Lindsley’s data for Dutch. However, his model proved inadequate when we added Dutch Verb-Subject (VS) constructions, which are obligatory in certain syntactic contexts but synonymous with their SV counterparts. A sentence production theory being developed by the first author is able to provide an accurate account of the data. The above-mentioned delay is attributed to certain precautions the sentence generator has to take in the case of SV but not VS sentences. These precautions are related to the goal of attaining syntactic coherence of the utterance as a whole, not to the prevention of dysfluencies.