Difference between revisions of "Main Page"

Revision as of 22:09, 9 March 2012

Welcome to the L2F's wiki about

STRING — An Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese.

STRING has a modular structure and performs all basic text processing tasks, namely:

  • tokenization and text segmentation,
  • part-of-speech tagging,
  • morphosyntactic disambiguation,
  • shallow parsing (chunking), and
  • deep parsing (dependency extraction).

STRING is organized as follows. The first module receives the input text and tokenizes it, defining the segments that compose the text. LexMan, a morphological tagger, takes the result of this segmentation as input and associates every possible part-of-speech (POS) tag with each segment. The next module groups the segments into sentences. RuDriCo2 is applied next: it is a rule-based morphological disambiguator that also changes the segmentation of its input, for example by joining segments into compound words. MARv3, a stochastic morphological disambiguator, receives the output of RuDriCo2 and selects the best POS tag for each segment. Finally, XIP performs the syntactic analysis.
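
The module order described above can be sketched as a chain of functions, each consuming the output of the previous one. This is only a toy illustration: the function names, the tiny lexicon, the single disambiguation rule, and the tag scores are all hypothetical and do not reflect the actual interfaces or resources of LexMan, RuDriCo2, MARv3, or XIP.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str
    tags: list = field(default_factory=list)  # candidate POS tags


# Hypothetical toy lexicon and tag scores, for illustration only.
LEXICON = {
    "a": ["DET", "PREP"],
    "casa": ["NOUN", "VERB"],
    "é": ["VERB"],
    "grande": ["ADJ"],
    ".": ["PUNCT"],
}
TAG_SCORE = {"DET": 0.6, "PREP": 0.4, "NOUN": 0.5, "VERB": 0.5, "ADJ": 0.5, "PUNCT": 1.0}


def tokenize(text):
    """Step 1: split the text into segments (periods become separate segments)."""
    return [Segment(t) for t in text.replace(".", " .").split()]


def tag_all(segments):
    """Step 2: attach every possible POS tag to each segment."""
    for seg in segments:
        seg.tags = list(LEXICON.get(seg.text.lower(), ["UNK"]))
    return segments


def split_sentences(segments):
    """Step 3: group segments into sentences, closing one at each period."""
    sentences, current = [], []
    for seg in segments:
        current.append(seg)
        if seg.text == ".":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def rule_disambiguate(sentence):
    """Step 4: one toy rule. After a possible determiner, a NOUN/VERB segment is not a VERB."""
    for prev, cur in zip(sentence, sentence[1:]):
        if "DET" in prev.tags and "NOUN" in cur.tags and "VERB" in cur.tags:
            cur.tags.remove("VERB")
    return sentence


def stochastic_disambiguate(sentence):
    """Step 5: keep only the highest-scoring remaining tag for each segment."""
    for seg in sentence:
        seg.tags = [max(seg.tags, key=lambda t: TAG_SCORE.get(t, 0.0))]
    return sentence


def analyse(sentence):
    """Step 6: stand-in for syntactic analysis; it only returns (text, tag) pairs."""
    return [(seg.text, seg.tags[0]) for seg in sentence]


def run_chain(text):
    """Apply the modules in the order described in the text."""
    sentences = split_sentences(tag_all(tokenize(text)))
    return [analyse(stochastic_disambiguate(rule_disambiguate(s))) for s in sentences]


if __name__ == "__main__":
    print(run_chain("A casa é grande."))
```

In this sketch the rule-based step only narrows the candidate tags, and the stochastic step makes the final choice among whatever candidates remain, which mirrors the division of labour between the two disambiguators described above.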

STRING also performs:

  • Named Entity Recognition,
  • Information Retrieval,
  • Anaphora Resolution, and
  • other NLP tasks.

A web interface makes STRING available to the community and the general public.