Distributions of cognates in Europe as based on Levenshtein distance

Schepens, J., Dijkstra, T., & Grootjen, F. (2012). Distributions of cognates in Europe as based on Levenshtein distance. Bilingualism: Language and Cognition, 15(SI), 157-166. doi:10.1017/S1366728910000623.

Researchers on bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. By using a similarity threshold, large numbers of cognates could be selected that nearly completely included the stimulus materials of experimental studies. The identified numbers of form-similar and identical cognates correlated highly with branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered a new formal model of cross-language orthographic similarity.
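The core measure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the Levenshtein distance is normalized by the length of the longer word and turned into a similarity between 0 and 1 (one common choice; the paper's exact normalization may differ), and that translation pairs are already given, standing in for the translation database. The function names, the threshold of 0.5, and the English-Dutch word pairs are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Dynamic programming, keeping only two rows of the edit-distance matrix.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical spelling.
    Assumption: distance is divided by the length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical cut-off for calling a translation pair form-similar.
THRESHOLD = 0.5


def classify(pairs, threshold=THRESHOLD):
    """Label each (word1, word2) translation pair by orthographic similarity."""
    labels = {}
    for a, b in pairs:
        sim = normalized_similarity(a.lower(), b.lower())
        if sim == 1.0:
            labels[(a, b)] = "identical"
        elif sim >= threshold:
            labels[(a, b)] = "similar"
        else:
            labels[(a, b)] = "dissimilar"
    return labels


if __name__ == "__main__":
    pairs = [("water", "water"), ("tomato", "tomaat"), ("house", "huis")]
    for pair, label in classify(pairs).items():
        print(pair, label)
    # ('water', 'water') identical
    # ('tomato', 'tomaat') similar
    # ('house', 'huis') dissimilar
```

Here "tomato"/"tomaat" differ by two edits over six letters (similarity about 0.67), while "house"/"huis" need three edits over five letters (similarity 0.4) and fall below the hypothetical threshold. Moving the threshold shifts how many translation pairs count as form-similar.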

Publication type: Journal article

Publication date: 2012