Main Page

Revision as of 22:09, 9 March 2012 by Njm (talk | contribs)


Welcome to the L2F wiki about

STRING — An Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese.

STRING has a modular structure and performs all basic text processing tasks, namely:

  • tokenization and text segmentation,
  • part-of-speech tagging,
  • morphosyntactic disambiguation,
  • shallow parsing (chunking), and
  • deep parsing (dependency extraction).

STRING is organized as a pipeline of modules. The first module receives the input text and tokenizes it, defining the segments that compose the text. LexMan, a morphological tagger, takes this segmentation as input and assigns every possible part-of-speech (POS) tag to each segment. The next module groups the segments into sentences. RuDriCo2, a rule-based morphological disambiguator, is applied next; it can also change the segmentation, for example by joining segments into compound words. MARv3, a stochastic morphological disambiguator, receives the output of RuDriCo2 and selects the best POS tag for each segment. Finally, XIP performs the syntactic analysis.
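To make the order of the stages concrete, the chain described above can be sketched as a toy Python pipeline. This is only an illustration under stated assumptions, not the actual code or API of LexMan, RuDriCo2, MARv3 or XIP: the function names, the tiny lexicon, the tag priors and the compound table are all hypothetical, and the final "parse" step is a placeholder that merely returns (segment, tag) pairs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str
    tags: list = field(default_factory=list)  # candidate POS tags


# Hypothetical toy resources, invented for this sketch only.
LEXICON = {
    "o": ["DET", "PRON"], "gato": ["NOUN"], "dorme": ["VERB"],
    "fim": ["NOUN"], "de": ["PREP"], "semana": ["NOUN"], ".": ["PUNCT"],
}
TAG_PRIOR = {"DET": 0.6, "PRON": 0.4, "NOUN": 0.5, "VERB": 0.5,
             "PREP": 0.5, "PUNCT": 1.0}
COMPOUNDS = {("fim", "de", "semana"): "NOUN"}


def tokenize(text):
    """Stage 1: split the text into word and punctuation segments."""
    return [Segment(t) for t in re.findall(r"\w+|[^\w\s]", text)]


def tag_all(segments):
    """Stage 2 (LexMan's role): attach every possible tag to each segment."""
    for s in segments:
        s.tags = list(LEXICON.get(s.text.lower(), ["UNK"]))
    return segments


def split_sentences(segments):
    """Stage 3: group segments into sentences at final punctuation."""
    sentences, current = [], []
    for s in segments:
        current.append(s)
        if s.text in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def apply_rules(sentence):
    """Stage 4 (RuDriCo2's role): rule-based step that joins compounds."""
    out, i = [], 0
    while i < len(sentence):
        for words, tag in COMPOUNDS.items():
            window = sentence[i:i + len(words)]
            if tuple(s.text.lower() for s in window) == words:
                out.append(Segment(" ".join(s.text for s in window), [tag]))
                i += len(words)
                break
        else:  # no compound starts here: keep the segment unchanged
            out.append(sentence[i])
            i += 1
    return out


def pick_best(sentence):
    """Stage 5 (MARv3's role): keep the tag with the highest toy prior."""
    for s in sentence:
        s.tags = [max(s.tags, key=lambda t: TAG_PRIOR.get(t, 0.0))]
    return sentence


def parse(sentence):
    """Stage 6 (XIP's role): placeholder that returns (text, tag) pairs."""
    return [(s.text, s.tags[0]) for s in sentence]


def run_pipeline(text):
    sentences = split_sentences(tag_all(tokenize(text)))
    return [parse(pick_best(apply_rules(s))) for s in sentences]


if __name__ == "__main__":
    for analysed in run_pipeline("O gato dorme. O fim de semana."):
        print(analysed)
```

In this sketch each stage consumes the output of the previous one, which mirrors the modular design: any single stage could be replaced (for instance, swapping the prior-based choice for a real statistical model) without touching the others.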

STRING also performs:

  • Named Entity Recognition,
  • Information Retrieval,
  • Anaphora Resolution, and
  • other NLP tasks.

A web interface makes STRING available to the community and the general public.