LexMan

Acronym

LexMan stands for Lexical Morphological Analyzer.


Brief Description

LexMan is responsible for assigning to each token its part of speech (POS) and any other relevant morphosyntactic features (gender, number, tense, mood, case, degree, etc.), using finite-state transducers.

LexMan uses a very rich, highly granular tagset, featuring 12 categories (namely noun, verb, adjective, pronoun, article, adverb, preposition, conjunction, numeral, interjection, punctuation, and symbol) and 11 fields (namely category (CAT), subcategory (SCT), mood (MOD), tense (TEN), person (PER), number (NUM), gender (GEN), degree (DEG), case (CAS), syntactic features (SYN), and semantic features (SEM)). No category uses all eleven fields.
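As an illustration of the tagset layout only, a tag can be pictured as a record with eleven slots, of which each category fills just a subset. The sketch below is not LexMan's actual tag format: the class name `Tag`, the helper `used_fields`, and the value labels ("common", "plural", "feminine") are all assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class Tag:
    """Hypothetical record with the eleven tagset fields; unused ones stay None."""
    cat: str                    # category (CAT), always present
    sct: Optional[str] = None   # subcategory (SCT)
    mod: Optional[str] = None   # mood (MOD)
    ten: Optional[str] = None   # tense (TEN)
    per: Optional[str] = None   # person (PER)
    num: Optional[str] = None   # number (NUM)
    gen: Optional[str] = None   # gender (GEN)
    deg: Optional[str] = None   # degree (DEG)
    cas: Optional[str] = None   # case (CAS)
    syn: Optional[str] = None   # syntactic features (SYN)
    sem: Optional[str] = None   # semantic features (SEM)

    def used_fields(self) -> Dict[str, str]:
        """Return only the fields this tag actually fills, keyed by field code."""
        return {
            f.name.upper(): getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


# A noun fills only a few slots; the value labels here are illustrative.
noun = Tag(cat="noun", sct="common", num="plural", gen="feminine")
print(noun.used_fields())
```

Keeping every field optional except the category mirrors the statement that no category uses all eleven fields.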

LexMan is used to generate and validate all the inflected forms associated with lexical lemmas, along with the corresponding morphosyntactic information. LexMan also provides a fast and flexible way of maintaining and updating the lexicons.


Architecture

LexMan has four main modules:

  • Word Generator - uses the lemmas and the inflectional paradigms to generate all the forms (words);
  • Transducer Generator - uses the forms and the information about clitics and affixes to generate the transducer;
  • Guesser - proposes tags for words that have not been generated by the Word Generator;
  • Morphological Parser - receives a word and uses the transducer to find all the possible tags of that word. If it finds none, it falls back on the Guesser module.
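
The way the four modules fit together can be sketched roughly as follows. This is a toy illustration under stated assumptions, not LexMan's implementation: a plain dictionary stands in for the finite-state transducer, clitics and affixes are ignored, and the paradigm name, the tag strings, the guessing rule, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical inflectional paradigm: a list of (suffix to append, tag) pairs.
PARADIGMS = {
    "noun-regular": [("", "noun.singular"), ("s", "noun.plural")],
}


def generate_words(lemmas):
    """Word generator: expand each (lemma, paradigm name) into inflected forms."""
    for lemma, paradigm in lemmas:
        for suffix, tag in PARADIGMS[paradigm]:
            yield lemma + suffix, lemma, tag


def build_lookup(forms):
    """Stand-in for the transducer generator: map each form to its analyses."""
    table = defaultdict(list)
    for form, lemma, tag in forms:
        table[form].append((lemma, tag))
    return dict(table)


def guess(word):
    """Guesser: propose a tag from the word ending (a deliberately naive rule)."""
    if word.endswith("s") and len(word) > 1:
        return [(word[:-1], "noun.plural")]
    return [(word, "noun.singular")]


def parse(word, table):
    """Morphological parser: look the word up, fall back on the guesser."""
    return table.get(word) or guess(word)


table = build_lookup(generate_words([("casa", "noun-regular")]))
print(parse("casas", table))  # known form, found in the lookup table
print(parse("mesas", table))  # unknown form, handled by the guesser
```

A real transducer would encode the same form-to-analysis mapping compactly and handle clitics and affixes compositionally; the dictionary only shows the control flow between the modules.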


Module evolution

A new version of LexMan, capable of performing tokenization, is currently being developed by Alexandre Vicente.


Demo

LexMan can be tested here


User's Manual

Though LexMan is not freely available, the user's manual will be available here as soon as possible...


Publications