Gerard Kempen

Publications

  • Harbusch, K., & Kempen, G. (2007). Clausal coordinate ellipsis in German: The TIGER treebank as a source of evidence. In J. Nivre, H. J. Kaalep, K. Muischnek, & M. Koit (Eds.), Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007) (pp. 81-88). Tartu: University of Tartu.

    Abstract

    Syntactic parsers and generators need high-quality grammars of coordination and coordinate ellipsis, structures that occur very frequently but are much less well understood theoretically than many other domains of grammar. Modern grammars of coordinate ellipsis are based nearly exclusively on linguistic judgments (intuitions). The extent to which grammar rules based on this type of empirical evidence generate all and only the structures in text corpora is unknown. As part of a project on the development of a grammar and a generator for coordinate ellipsis in German, we undertook an extensive exploration of the TIGER treebank, a syntactically annotated corpus of about 50,000 newspaper sentences. We report (1) frequency data for the various patterns of coordinate ellipsis, and (2) several rarely (but regularly) occurring ‘fringe deviations’ from the intuition-based rules for several ellipsis types. This information can help improve parser and generator performance.
  • Harbusch, K., Breugel, C., Koch, U., & Kempen, G. (2007). Interactive sentence combining and paraphrasing in support of integrated writing and grammar instruction: A new application area for natural language sentence generators. In S. Busemann (Ed.), Proceedings of the 11th European Workshop on Natural Language Generation (ENLG07) (pp. 65-68). ACL Anthology.

    Abstract

    The potential of sentence generators as engines in Intelligent Computer-Assisted Language Learning and teaching (ICALL) software has hardly been explored. We sketch the prototype of COMPASS, a system that supports integrated writing and grammar curricula for 10- to 14-year-old elementary or secondary schoolers. The system enables first- or second-language teachers to design controlled writing exercises, in particular of the “sentence combining” variety. The system includes facilities for error diagnosis and on-line feedback. Syntactic structures built by students or system can be displayed as easily understood phrase-structure or dependency trees, adapted to the student’s level of grammatical knowledge. The heart of the system is a specially designed generator capable of lexically guided sentence generation, of generating syntactic paraphrases, and of displaying syntactic structures visually.
  • Kempen, G. (2007). De kunst van het weglaten: Elliptische nevenschikking in een model van de spreker. In F. Moerdijk, A. van Santen, & R. Tempelaars (Eds.), Leven met woorden: Afscheidsbundel voor Piet van Sterkenburg (pp. 397-407). Leiden: Brill.

    Abstract

    This paper is an abridged version (in Dutch) of an in-press article by the same author (Kempen, G. (2008/9). Clausal coordination and coordinate ellipsis in a model of the speaker. To be published in: Linguistics). The two papers present a psycholinguistically inspired approach to the syntax of clause-level coordination and coordinate ellipsis. It departs from the assumption that coordinations are structurally similar to so-called appropriateness repairs, an important type of self-repairs in spontaneous speech. Coordinate structures and appropriateness repairs can both be viewed as “update” constructions. Updating is defined as a special sentence production mode that efficiently revises or augments existing sentential structure in response to modifications in the speaker’s communicative intention. This perspective is shown to offer an empirically satisfactory and theoretically parsimonious account of two prominent types of coordinate ellipsis, in particular Forward Conjunction Reduction (FCR) and Gapping (including Long-Distance Gapping and Subgapping). They are analyzed as different manifestations of “incremental updating”: efficient updating of only part of the existing sentential structure. Based on empirical data from Dutch and German, novel treatments are proposed for both types of clausal coordinate ellipsis. Two other forms of coordinate ellipsis, SGF (“Subject Gap in Finite clauses with fronted verb”) and Backward Conjunction Reduction (BCR; also known as Right Node Raising or RNR), are shown to be incompatible with the notion of incremental updating. Alternative theoretical interpretations of these phenomena are proposed. The four types of clausal coordinate ellipsis (SGF, Gapping, FCR and BCR) are argued to originate in four different stages of sentence production: Intending (i.e. preparing the communicative intention), Conceptualization, Grammatical Encoding, and Phonological Encoding, respectively.
  • Kuiper, K., Van Egmond, M.-E., Kempen, G., & Sprenger, S. A. (2007). Slipping on superlemmas: Multiword lexical items in speech production. The Mental Lexicon, 2(3), 313-357.

    Abstract

    Only relatively recently have theories of speech production concerned themselves with the part idioms and other multi-word lexical items (MLIs) play in the processes of speech production. Two theories of speech production which attempt to account for the accessing of idioms in speech production are those of Cutting and Bock (1997) and superlemma theory (Sprenger, 2003; Sprenger, Levelt, & Kempen, 2006). Much of the data supporting theories of speech production comes either from time course experiments or from slips of the tongue (Bock & Levelt, 1994). The latter are of two kinds: experimentally induced (Baars, 1992) or naturally observed (Fromkin, 1980). Cutting and Bock use experimentally induced speech errors while Sprenger et al. use time course experiments. The missing data type that has a bearing on speech production involving MLIs is that of naturally occurring slips. In this study, data taken from naturally observed slips involving English and Dutch MLIs are brought to bear on these theories. The data are taken initially from a corpus of just over 1000 naturally observed English slips involving MLIs (the Tuggy corpus). Our argument proceeds as follows. First we show that slips occur independent of whether or not there are MLIs involved. In other words, speech production proceeds in certain of its aspects as though there were no MLI present. We illustrate these slips from the Tuggy data. Second we investigate the predictions of superlemma theory. Superlemma theory (Sprenger et al., 2006) accounts for the selection of MLIs and how their properties enter processes of speech production. It predicts certain activation patterns dependent on an MLI being selected. Each such pattern might give rise to slips of the tongue. This set of predictions is tested against the Tuggy data. Each of the predicted activation patterns yields a significant number of slips. These findings are therefore compatible with a view of MLIs as single units in so far as their activation by lexical concepts goes. However, the theory also predicts that some slips are likely not to occur. We confirm that such slips are not present in the data. These findings are further corroborated by reference to a second, smaller dataset of slips involving Dutch MLIs (the Kempen corpus). We then use slips involving irreversible binomials to distinguish between the predictions of superlemma theory, which are supported by slips involving irreversible binomials, and the Cutting and Bock model's predictions for slips involving these MLIs, which are not.
  • Kempen, G., Anbeek, G., Desain, P., Konst, L., & De Smedt, K. (1987). Auteursomgevingen: Vijfde-generatie tekstverwerkers. Informatie, 29, 988-993.
  • Kempen, G., Anbeek, G., Desain, P., Konst, L., & De Smedt, K. (1987). Author environments: Fifth generation text processors. In Commission of the European Communities. Directorate-General for Telecommunications, Information Industries, and Innovation (Ed.), Esprit'86: Results and achievements (pp. 365-372). Amsterdam: Elsevier Science Publishers.
  • Kempen, G., & Hoenkamp, E. (1987). An incremental procedural grammar for sentence formulation. Cognitive Science, 11(2), 201-258.

    Abstract

    This paper presents a theory of the syntactic aspects of human sentence production. An important characteristic of unprepared speech is that overt pronunciation of a sentence can be initiated before the speaker has completely worked out the meaning content he or she is going to express in that sentence. Apparently, the speaker is able to build up a syntactically coherent utterance out of a series of syntactic fragments each rendering a new part of the meaning content. This incremental, left-to-right mode of sentence production is the central capability of the proposed Incremental Procedural Grammar (IPG). Certain other properties of spontaneous speech, as derivable from speech errors, hesitations, self-repairs, and language pathology, are accounted for as well. The psychological plausibility thus gained by the grammar appears compatible with a satisfactory level of linguistic plausibility in that sentences receive structural descriptions which are in line with current theories of grammar. More importantly, an explanation for the existence of configurational conditions on transformations and other linguistic rules is proposed. The basic design feature of IPG which gives rise to these psychologically and linguistically desirable properties is the “Procedures + Stack” concept. Sentences are built not by a central constructing agency which overlooks the whole process but by a team of syntactic procedures (modules) which work, in parallel, on small parts of the sentence, have only a limited overview, and whose sole communication channel is a stack. IPG covers object complement constructions, interrogatives, and word order in main and subordinate clauses. It handles unbounded dependencies, cross-serial dependencies and coordination phenomena such as gapping and conjunction reduction. It is also capable of generating self-repairs and elliptical answers to questions. IPG has been implemented as an incremental Dutch sentence generator written in LISP.
  • Kempen, G. (Ed.). (1987). Natural language generation: New results in artificial intelligence, psychology and linguistics. Dordrecht: Nijhoff.
  • Kempen, G. (Ed.). (1987). Natuurlijke taal en kunstmatige intelligentie: Taal tussen mens en machine. Groningen: Wolters-Noordhoff.
  • Kempen, G. (1987). Tekstverwerking: De vijfde generatie. Informatie, 29, 402-406.
  • Pijls, F., Daelemans, W., & Kempen, G. (1987). Artificial intelligence tools for grammar and spelling instruction. Instructional Science, 16(4), 319-336. doi:10.1007/BF00117750.

    Abstract

    In the Netherlands, grammar teaching is an especially important subject in the curriculum of children aged 10-15 for several reasons. However, in spite of all the attention and time invested, the results are poor. This article describes the problems and our attempt to overcome them by developing an intelligent computational instructional environment consisting of: a linguistic expert system, containing a module representing grammar and spelling rules and a number of modules to manipulate these rules; a didactic module; and a student interface with special facilities for grammar and spelling. Three prototypes of the functionality are discussed: BOUWSTEEN and COGO, which are programs for constructing and analyzing Dutch sentences; and TDTDT, a program for the conjugation of Dutch verbs.
  • Pijls, F., & Kempen, G. (1987). Kennistechnologische leermiddelen in het grammatica- en spellingonderwijs. Nederlands Tijdschrift voor de Psychologie, 42, 354-363.
  • De Smedt, K., & Kempen, G. (1987). Incremental sentence production, self-correction, and coordination. In G. Kempen (Ed.), Natural language generation: New results in artificial intelligence, psychology and linguistics (pp. 365-376). Dordrecht: Nijhoff.
  • Van Wijk, C., & Kempen, G. (1987). A dual system for producing self-repairs in spontaneous speech: Evidence from experimentally elicited corrections. Cognitive Psychology, 19, 403-440. doi:10.1016/0010-0285(87)90014-4.

    Abstract

    This paper presents a cognitive theory on the production and shaping of self-repairs during speaking. In an extensive experimental study, a new technique is tried out: artificial elicitation of self-repairs. The data clearly indicate that two mechanisms for computing the shape of self-repairs should be distinguished. One is based on the repair strategy called reformulation, the second one on lemma substitution. W. Levelt’s (1983, Cognition, 14, 41-104) well-formedness rule, which connects self-repairs to coordinate structures, is shown to apply only to reformulations. In case of lemma substitution, a totally different set of rules is at work. The linguistic unit of central importance in reformulations is the major syntactic constituent; in lemma substitutions it is a prosodic unit, the phonological phrase. A parametrization of the model yielded a very satisfactory fit between observed and reconstructed scores.
  • Kempen, G. (1976). De taalgebruiker in de mens: Een uitzicht over de taalpsychologie. Groningen: H.D. Tjeenk Willink.
  • Kempen, G. (1976). Syntactic constructions as retrieval plans. British Journal of Psychology, 67(2), 149-160. doi:10.1111/j.2044-8295.1976.tb01505.x.

    Abstract

    Four probe latency experiments show that the ‘constituent boundary effect’ (transitions between constituents are more difficult than within constituents) is a retrieval and not a storage phenomenon. The experimental logic used is called paraphrastic reproduction: after verbatim memorization of some sentences, subjects were instructed to reproduce them both in their original wording and in the form of sentences that, whilst preserving the original meaning, embodied different syntactic constructions. Syntactic constructions are defined as pairs which consist of a pattern of conceptual information and a syntactic scheme, i.e. a sequence of syntactic word categories and function words. For example, the sequence noun + finite intransitive main verb (‘John runs’) expresses a conceptual actor-action relationship. It is proposed that for each overlearned and simple syntactic construction there exists a retrieval plan which does the following. It searches through the long-term memory information that has been designated as the conceptual content of the utterance(s) to be produced, looking for a token of its conceptual pattern. The retrieved information is then cast into the format of its syntactic scheme. The organization of such plans is held responsible for the constituent boundary effect.
  • Levelt, W. J. M., & Kempen, G. (1976). Taal. In J. A. Michon, E. Eijkman, & L. F. De Klerk (Eds.), Handboek der Psychonomie (pp. 492-523). Deventer: Van Loghum Slaterus.
  • Thomassen, A., & Kempen, G. (1976). Geheugen. In J. A. Michon, E. Eijkman, & L. F. De Klerk (Eds.), Handboek der Psychonomie (pp. 354-387). Deventer: Van Loghum Slaterus.
