Gerard Kempen

Publications

  • Harbusch, K., & Kempen, G. (2007). Clausal coordinate ellipsis in German: The TIGER treebank as a source of evidence. In J. Nivre, H. J. Kaalep, K. Muischnek, & M. Koit (Eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007) (pp. 81-88). Tartu: University of Tartu.

    Abstract

    Syntactic parsers and generators need high-quality grammars of coordination and coordinate ellipsis. These structures occur very frequently but are much less well understood theoretically than many other domains of grammar. Modern grammars of coordinate ellipsis are based nearly exclusively on linguistic judgments (intuitions). It is unknown to what extent grammar rules based on this type of empirical evidence generate all and only the structures found in text corpora. As part of a project on the development of a grammar and a generator for coordinate ellipsis in German, we undertook an extensive exploration of the TIGER treebank, a syntactically annotated corpus of about 50,000 newspaper sentences. We report (1) frequency data for the various patterns of coordinate ellipsis, and (2) a number of rarely (but regularly) occurring ‘fringe deviations’ from the intuition-based rules for several ellipsis types. This information can help improve parser and generator performance.
  • Harbusch, K., Breugel, C., Koch, U., & Kempen, G. (2007). Interactive sentence combining and paraphrasing in support of integrated writing and grammar instruction: A new application area for natural language sentence generators. In S. Busemann (Ed.), Proceedings of the 11th European Workshop on Natural Language Generation (ENLG07) (pp. 65-68). ACL Anthology.

    Abstract

    The potential of sentence generators as engines in Intelligent Computer-Assisted Language Learning and teaching (ICALL) software has hardly been explored. We sketch the prototype of COMPASS, a system that supports integrated writing and grammar curricula for 10- to 14-year-old pupils in elementary or secondary school. The system enables first- or second-language teachers to design controlled writing exercises, in particular of the “sentence combining” variety. The system includes facilities for error diagnosis and online feedback. Syntactic structures built by students or by the system can be displayed as easily understood phrase-structure or dependency trees, adapted to the student’s level of grammatical knowledge. The heart of the system is a specially designed generator capable of lexically guided sentence generation, of generating syntactic paraphrases, and of displaying syntactic structures visually.
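  • Kempen, G. (2007). De kunst van het weglaten: Elliptische nevenschikking in een model van de spreker [The art of omission: Elliptical coordination in a model of the speaker]. In F. Moerdijk, A. van Santen, & R. Tempelaars (Eds.), Leven met woorden: Afscheidsbundel voor Piet van Sterkenburg [Living with words: Farewell volume for Piet van Sterkenburg] (pp. 397-407). Leiden: Brill.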

    Abstract

    This paper is an abridged Dutch version of an in-press article by the same author (Kempen, G., 2008/9, Clausal coordination and coordinate ellipsis in a model of the speaker, to be published in Linguistics). The two papers present a psycholinguistically inspired approach to the syntax of clause-level coordination and coordinate ellipsis. The approach starts from the assumption that coordinations are structurally similar to so-called appropriateness repairs, an important type of self-repair in spontaneous speech. Coordinate structures and appropriateness repairs can both be viewed as “update” constructions. Updating is defined as a special sentence production mode that efficiently revises or augments existing sentential structure in response to modifications in the speaker’s communicative intention. This perspective is shown to offer an empirically satisfactory and theoretically parsimonious account of two prominent types of coordinate ellipsis, namely Forward Conjunction Reduction (FCR) and Gapping (including Long-Distance Gapping and Subgapping). They are analyzed as different manifestations of “incremental updating”, that is, efficient updating of only part of the existing sentential structure. Based on empirical data from Dutch and German, novel treatments are proposed for both types of clausal coordinate ellipsis. Two other forms of coordinate ellipsis are shown to be incompatible with the notion of incremental updating: SGF (“Subject Gap in Finite clauses with fronted verb”) and Backward Conjunction Reduction (BCR; also known as Right Node Raising or RNR). Alternative theoretical interpretations of these phenomena are proposed. The four types of clausal coordinate ellipsis (SGF, Gapping, FCR and BCR) are argued to originate in four different stages of sentence production: Intending (i.e., preparing the communicative intention), Conceptualization, Grammatical Encoding, and Phonological Encoding, respectively.
  • Kuiper, K., Van Egmond, M.-E., Kempen, G., & Sprenger, S. A. (2007). Slipping on superlemmas: Multiword lexical items in speech production. The Mental Lexicon, 2(3), 313-357.

    Abstract

    Only relatively recently have theories of speech production concerned themselves with the part that idioms and other multiword lexical items (MLIs) play in the processes of speech production. Two theories that attempt to account for the accessing of idioms in speech production are the model of Cutting and Bock (1997) and superlemma theory (Sprenger, 2003; Sprenger, Levelt, & Kempen, 2006). Much of the data supporting theories of speech production comes either from time-course experiments or from slips of the tongue (Bock & Levelt, 1994). Slips are of two kinds: experimentally induced (Baars, 1992) or naturally observed (Fromkin, 1980). Cutting and Bock use experimentally induced speech errors, while Sprenger et al. use time-course experiments. The missing type of data bearing on speech production involving MLIs is naturally occurring slips. In this study, data from naturally observed slips involving English and Dutch MLIs are brought to bear on these theories. The data are taken initially from a corpus of just over 1,000 naturally observed English slips involving MLIs (the Tuggy corpus). Our argument proceeds as follows. First, we show that slips occur independently of whether or not MLIs are involved. In other words, in certain respects speech production proceeds as though no MLI were present. We illustrate these slips with examples from the Tuggy data. Second, we investigate the predictions of superlemma theory. Superlemma theory (Sprenger et al., 2006) accounts for the selection of MLIs and for how their properties enter processes of speech production. It predicts certain activation patterns that depend on an MLI being selected. Each such pattern might give rise to slips of the tongue. This set of predictions is tested against the Tuggy data. Each of the predicted activation patterns yields a significant number of slips. These findings are therefore compatible with a view of MLIs as single units insofar as their activation by lexical concepts is concerned. However, the theory also predicts that some slips are unlikely to occur. We confirm that such slips are not present in the data. These findings are further corroborated by a second, smaller dataset of slips involving Dutch MLIs (the Kempen corpus). We then use slips involving irreversible binomials to distinguish between the two theories. These slips support the predictions of superlemma theory but not the predictions of the Cutting and Bock model.
  • Kempen, G., & Kolk, H. (1980). Apentaal, een kwestie van intelligentie, niet van taalaanleg [Ape language: A matter of intelligence, not of language aptitude]. Cahiers Biowetenschappen en Maatschappij, 6, 31-36.
  • Kempen, G., & Van Wijk, C. (1980). Leren formuleren: Hoe uit opstellen een objektieve index voor formuleervaardigheid afgeleid kan worden [Learning to formulate: How an objective index of formulation skill can be derived from essays]. De Psycholoog, 15, 609-621.
  • Van Wijk, C., & Kempen, G. (1980). Functiewoorden: Een inventarisatie voor het Nederlands [Function words: An inventory for Dutch]. ITL: Review of Applied Linguistics, 53-68.