LexMan

Latest revision as of 02:34, 10 March 2012

Acronym

LexMan stands for Lexical Morphological Analyzer.


Brief Description

LexMan assigns to each token its part-of-speech (POS) tag and any other relevant morphosyntactic features (gender, number, tense, mood, case, degree, etc.), using finite-state transducers (http://en.wikipedia.org/wiki/Finite_state_transducer).

LexMan uses a rich, highly granular tagset, featuring 12 categories (noun, verb, adjective, pronoun, article, adverb, preposition, conjunction, numeral, interjection, punctuation, and symbol) and 11 fields, namely category (CAT), subcategory (SCT), mood (MOD), tense (TEN), person (PER), number (NUM), gender (GEN), degree (DEG), case (CAS), syntactic features (SYN), and semantic features (SEM). No category uses all eleven fields.
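As a rough illustration of the tagset described above, a tag can be pictured as a record with one slot per field, where the slots that do not apply to a category are left empty. This is only a sketch: the Tag class, the used_fields helper, and the field values ("common", "indicative", and so on) are hypothetical and are not taken from LexMan's actual tag encoding.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tag:
    """Hypothetical record with the 11 fields named in the text."""
    CAT: str                    # category (always present)
    SCT: Optional[str] = None   # subcategory
    MOD: Optional[str] = None   # mood
    TEN: Optional[str] = None   # tense
    PER: Optional[str] = None   # person
    NUM: Optional[str] = None   # number
    GEN: Optional[str] = None   # gender
    DEG: Optional[str] = None   # degree
    CAS: Optional[str] = None   # case
    SYN: Optional[str] = None   # syntactic features
    SEM: Optional[str] = None   # semantic features

    def used_fields(self) -> Tuple[str, ...]:
        """Names of the fields that are filled in for this tag."""
        return tuple(f.name for f in fields(self)
                     if getattr(self, f.name) is not None)


# Illustrative values only; LexMan's real value inventory is not shown here.
noun_tag = Tag(CAT="noun", SCT="common", NUM="singular", GEN="feminine")
verb_tag = Tag(CAT="verb", MOD="indicative", TEN="present",
               PER="3", NUM="singular")
```

In this picture a noun leaves mood, tense, and person empty while a verb leaves degree and case empty, which matches the remark that no category fills every field.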

LexMan is used to generate and validate all the inflected forms associated with lexical lemmas, along with the corresponding morphosyntactic information. LexMan also provides a fast and flexible way of maintaining and updating the lexicons.


Architecture

Figure: LexMan architecture (LexManArchitecture.jpg)

LexMan has four main modules:

  • Word generator - uses the lemmas and the inflectional paradigms to generate all the forms (words);
  • Transducer generator - uses the forms and the information about clitics and affixes to generate the transducer;
  • Guesser - proposes tags for words that have not been generated by the Word generator;
  • Morphological parser - receives a word and uses the transducer to find all the possible tags for that word. If it finds none, it falls back on the Guesser module.
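The way the four modules fit together can be sketched in a few lines of Python. This is a toy model under stated assumptions, not LexMan's implementation: a plain dictionary stands in for the finite-state transducer, clitics and affixes are not modelled, and every name here (generate_words, build_lookup, guess, analyze, the paradigm format, and the tag strings) is hypothetical. The words are toy examples chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy paradigm format (an assumption, not LexMan's):
# (ending to strip from the lemma, ending to add, tag)
Paradigm = List[Tuple[str, str, str]]


def generate_words(lemma: str, paradigm: Paradigm) -> List[Tuple[str, str]]:
    """Word generator: expand a lemma into (form, tag) pairs."""
    forms = []
    for strip, add, tag in paradigm:
        stem = lemma[: len(lemma) - len(strip)]
        forms.append((stem + add, tag))
    return forms


def build_lookup(entries: List[Tuple[str, Paradigm]]) -> Dict[str, List[str]]:
    """Stand-in for the transducer generator: map each form to its analyses."""
    table: Dict[str, List[str]] = defaultdict(list)
    for lemma, paradigm in entries:
        for form, tag in generate_words(lemma, paradigm):
            table[form].append(f"{lemma}:{tag}")
    return dict(table)


def guess(word: str) -> List[str]:
    """Guesser: a crude suffix heuristic for words missing from the table."""
    if word.endswith("s"):
        return ["?:noun.plural"]
    return ["?:noun.singular"]


def analyze(word: str, table: Dict[str, List[str]]) -> List[str]:
    """Morphological parser: look the word up, fall back on the guesser."""
    return table.get(word) or guess(word)


NOUN_REGULAR: Paradigm = [("", "", "noun.singular"), ("", "s", "noun.plural")]
VERB_AR: Paradigm = [("ar", "ar", "verb.infinitive"),
                     ("ar", "o", "verb.present.1s")]

table = build_lookup([("gato", NOUN_REGULAR),
                      ("canto", NOUN_REGULAR),
                      ("cantar", VERB_AR)])
```

With this table, analyze("canto", table) returns both a noun and a verb analysis, which is the kind of ambiguity the parser is expected to report, while an unlisted word such as "livros" is handed to the guesser.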


Module evolution

A new version of LexMan, capable of performing tokenization, is currently being developed by Alexandre Vicente.


Demo

LexMan can be tested at http://string.l2f.inesc-id.pt/demo/postagger.pl


User's Manual

Although LexMan is not freely available, the user's manual will be made available here as soon as possible.


Publications