Revision as of 12:05, 5 March 2012 by Njm

The RuDriCo2 module is responsible for word splitting, i.e. resolving contractions (e.g. comigo = com/Prep + eu/Pron); it also applies a considerably large set of disambiguation rules; finally, it identifies many unambiguous compound words.
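
The contraction step can be illustrated with a minimal Python sketch. This is not RuDriCo2's rule language or implementation; the table `CONTRACTIONS` and the function `split_contractions` are hypothetical names introduced here, and only the comigo entry comes from the example above (contigo is added purely for illustration).

```python
# Hypothetical contraction table: surface form -> list of (form, POS tag) pairs.
# Only "comigo" is taken from the text; "contigo" is an illustrative addition.
CONTRACTIONS = {
    "comigo": [("com", "Prep"), ("eu", "Pron")],
    "contigo": [("com", "Prep"), ("tu", "Pron")],
}


def split_contractions(tokens):
    """Replace each known contraction with its component segments."""
    result = []
    for token in tokens:
        segments = CONTRACTIONS.get(token.lower())
        if segments is None:
            # Unknown tokens pass through unchanged and untagged.
            result.append((token, None))
        else:
            result.extend(segments)
    return result


print(split_contractions(["Vem", "comigo"]))
# [('Vem', None), ('com', 'Prep'), ('eu', 'Pron')]
```

A real system would drive this from declarative rules rather than a hard-coded dictionary; the sketch only shows the input/output shape of the segmentation.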

Module evolution

This new version, RuDriCo2, is significantly (10 times) faster than the previous version, uses a more expressive rule language (allowing negation, disjunction, and regular expressions over both the lemma and the surface form), and brings its syntax closer to that of the XIP parser (see below). It also validates the input data and reports error messages and warnings for potential problems.
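
To make the expressiveness claim concrete, here is a rough Python sketch of how negation, disjunction, and regular expressions over the lemma and the surface form can be combined into a single token condition. It does not reproduce the actual RuDriCo2 or XIP rule syntax; the `Token` class, the helper names, and the example condition are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    surface: str  # word as it appears in the text
    lemma: str    # dictionary form
    pos: str      # part-of-speech tag


def surface_matches(pattern):
    """Condition: the whole surface form matches the regular expression."""
    return lambda tok: re.fullmatch(pattern, tok.surface) is not None


def lemma_matches(pattern):
    """Condition: the whole lemma matches the regular expression."""
    return lambda tok: re.fullmatch(pattern, tok.lemma) is not None


def negate(cond):
    """Negation of a condition."""
    return lambda tok: not cond(tok)


def any_of(*conds):
    """Disjunction: at least one condition holds."""
    return lambda tok: any(c(tok) for c in conds)


def all_of(*conds):
    """Conjunction: every condition holds."""
    return lambda tok: all(c(tok) for c in conds)


# Hypothetical condition: the lemma is "ser" or "estar",
# and the surface form does NOT end in "mos".
cond = all_of(
    any_of(lemma_matches("ser"), lemma_matches("estar")),
    negate(surface_matches(r".*mos")),
)

print(cond(Token("somos", "ser", "V")))   # False
print(cond(Token("está", "estar", "V")))  # True
```

Building conditions from small composable predicates keeps each rule declarative, which is the general idea behind a rule language of this kind.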


[1] Cláudio Diniz, Um Conversor baseado em regras de transformação declarativas, MSc thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, Portugal, October 2010

[2] Cláudio Diniz, Nuno Mamede, João D. Pereira, RuDriCo2 - a faster disambiguator and segmentation modifier, in II Simpósio de Informática (INForum 2010), Universidade do Minho, pages 573-584, September 2010