Latest revision as of 13:00, 10 January 2024
Welcome to the HLT (http://www.hlt.inesc-id.pt) wiki about
STRING — A Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese.
STRING has a modular structure and performs all basic text processing tasks, namely:
- tokenization and text segmentation,
- part-of-speech tagging,
- morphosyntactic disambiguation,
- shallow parsing (chunking), and
- deep parsing (dependency extraction).
STRING is organized as follows. The first module receives the input text and tokenizes it, defining the segments that compose the text. LexMan, a morphological tagger, takes the result of this segmentation as input and assigns all possible part-of-speech (POS) tags to each segment. The next module groups the segments into sentences. RuDriCo2 is applied next: it is a rule-based morphological disambiguator that can also change the segmentation of its input, for example by joining segments into compound words. MARv4, a stochastic morphological disambiguator, receives the output of RuDriCo2 and selects the best POS tag for each segment. Finally, XIP, the last module in the chain, performs the syntactic analysis.
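The order of the stages described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the STRING implementation: the function names, the `LEXICON`, `COMPOUNDS` and `TAG_PRIOR` tables, the single disambiguation rule and the tag labels are all hypothetical, and a simple tag-frequency lookup stands in for whatever statistical model the real stochastic module uses. The final syntactic analysis stage is not sketched.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str
    tags: list = field(default_factory=list)  # candidate POS tags


# Hypothetical toy resources; the real modules rely on far richer data.
LEXICON = {"a": ["DET", "PREP"], "casa": ["NOUN", "VERB"],
           "é": ["VERB"], "grande": ["ADJ"], ".": ["PUNCT"]}
COMPOUNDS = {("nova", "iorque"): "PROPN"}
TAG_PRIOR = {"DET": 0.6, "PREP": 0.4, "NOUN": 0.7, "VERB": 0.3}


def tokenize(text):
    """Stage 1: split raw text into segments."""
    return [Segment(t) for t in text.replace(".", " . ").split()]


def tag_all(segments):
    """Stage 2 (the morphological tagger's role): attach every candidate tag."""
    for seg in segments:
        seg.tags = list(LEXICON.get(seg.text.lower(), ["NOUN"]))
    return segments


def split_sentences(segments):
    """Stage 3: group segments into sentences at full stops."""
    sentences, current = [], []
    for seg in segments:
        current.append(seg)
        if seg.text == ".":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def apply_rules(sentence):
    """Stage 4 (the rule-based role): join compounds, then prune tags."""
    out, i = [], 0
    while i < len(sentence):
        pair = tuple(s.text.lower() for s in sentence[i:i + 2])
        if pair in COMPOUNDS:
            joined = " ".join(s.text for s in sentence[i:i + 2])
            out.append(Segment(joined, [COMPOUNDS[pair]]))
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    # Toy rule: a word that may be a noun is a noun after a possible determiner.
    for prev, cur in zip(out, out[1:]):
        if "DET" in prev.tags and "NOUN" in cur.tags:
            cur.tags = ["NOUN"]
    return out


def pick_best(sentence):
    """Stage 5 (the stochastic role): keep the most likely remaining tag."""
    for seg in sentence:
        seg.tags = [max(seg.tags, key=lambda t: TAG_PRIOR.get(t, 0.0))]
    return sentence


def run_chain(text):
    """Run the stages in order and return (segment, tag) pairs per sentence."""
    sentences = split_sentences(tag_all(tokenize(text)))
    return [[(s.text, s.tags[0]) for s in pick_best(apply_rules(sent))]
            for sent in sentences]
```

Calling `run_chain("A casa é grande.")` yields one sentence in which each segment carries a single tag. The point of the sketch is the division of labour: ambiguity is introduced early by the tagger, reduced by rules, and only then resolved statistically.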
STRING also performs:
- Named Entity Recognition,
- Information Retrieval,
- Anaphora Resolution, and
- other NLP tasks.
Although the initial modules of the STRING chain can be traced back as far as 2001 (see publications), the current architecture can be dated to 2006, when the XIP parser was integrated into the NLP chain and the corresponding Portuguese grammar was developed.
A web interface (http://string.hlt.inesc-id.pt/demo) makes STRING available to the community and the general public.