Welcome to the Lemmatisation Portal

The LemmaGen project aims to provide a standardized, open source, multilingual platform for lemmatisation. We started this work because no high-quality lemmatiser existed for the Slovene language. We now offer lemmatisers not only for Slovene but also for 11 other European languages, together with a system that can learn lemmatisation rules for new languages from existing wordform-lemma pair examples.
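
To give a feeling for what learning from wordform-lemma pairs can look like, here is a minimal Python sketch. It is not LemmaGen's actual algorithm or API: the function names (suffix_rule, learn_rules, lemmatise) and the toy English pairs are assumptions made purely for illustration. The sketch derives simple suffix-rewrite rules from example pairs and applies the longest matching rule to an unseen word.

```python
from collections import Counter, defaultdict


def suffix_rule(wordform, lemma):
    """Strip the longest common prefix and return (wordform_suffix, lemma_suffix)."""
    i = 0
    while i < len(wordform) and i < len(lemma) and wordform[i] == lemma[i]:
        i += 1
    return wordform[i:], lemma[i:]


def learn_rules(pairs):
    """Keep, for every observed wordform suffix, its most frequent replacement."""
    counts = defaultdict(Counter)
    for wordform, lemma in pairs:
        old, new = suffix_rule(wordform, lemma)
        if old:  # pairs that leave no wordform suffix are skipped in this toy version
            counts[old][new] += 1
    return {old: c.most_common(1)[0][0] for old, c in counts.items()}


def lemmatise(word, rules):
    """Apply the longest matching suffix rule; leave the word unchanged otherwise."""
    for start in range(len(word)):  # start = 0 tries the longest suffix first
        suffix = word[start:]
        if suffix in rules:
            return word[:start] + rules[suffix]
    return word


# Hypothetical training data, for illustration only.
pairs = [
    ("walking", "walk"),
    ("talking", "talk"),
    ("walked", "walk"),
    ("cats", "cat"),
    ("dogs", "dog"),
]
rules = learn_rules(pairs)
print(lemmatise("jumping", rules))
```

Each word is handled on its own, without looking at the surrounding sentence. A real lemmatiser needs far richer rules and exception handling than this toy version.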

One of our main goals is to increase the number of supported languages. For that we hope we can also count on you, the users of these services. Please contact us if you have any data that could be used to build new lemmatisers, both for currently unsupported languages and for those already supported.

The name LemmaGen was originally an abbreviation of “Lemmatiser Generator”. However, it is often read as “Lemma Generator”. These two meanings also illustrate the two main components of this web site that every user should be aware of: one part deals with using prebuilt lemmatisers, while the other concentrates on defining and creating new lemmatisers (e.g. for new languages).

The main characteristics of LemmaGen are:

  • it is free - all the code included in the project is under an open source licence,
  • multilingual support - 12 different languages are currently included,
  • lemmatisation does not rely on the sentence structure of the processed text (it can be applied to each word separately, which is useful, for example, for lemmatising search query words),
  • a wide variety of APIs which enable you to include LemmaGen in your own projects,
  • all sources are downloadable,
  • multiple implementations (C++, C++.Net, Python, and C#.Net),
  • a variety of platforms supported: downloadable content is prebuilt for Windows and Linux, but it can be recompiled for almost any platform,
  • a range of online services which run on most web-enabled devices (they can be invoked even from JavaScript),
  • very efficient implementation (processing speed of millions of words per second),
  • good documentation and support, and
  • scientifically proven (and reported) quality of prebuilt lemmatisation models.

If you are visiting our site for the first time, we recommend reading the more detailed description of this portal on the Overview page.

The LemmaGen team wishes you a pleasant experience using our lemmatisation, through either downloadable content or online services. If you run into any difficulties, you can contact us via email.