Main Page: Difference between revisions

Revision as of 11:58, 10 January 2024

Welcome to the L2F's wiki about

STRING — A Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese.

STRING has a modular structure and performs all basic text processing tasks, namely:

tokenization and text segmentation,
part-of-speech tagging,
morphosyntactic disambiguation,
shallow parsing (chunking), and
deep parsing (dependency extraction).

STRING is organized as follows. The first module receives the text to process and tokenizes it, defining the segments that compose the text. LexMan is a morphological tagger that receives the result of this segmentation as input and associates all possible part-of-speech (POS) tags with each segment. The next module groups the segments into sentences. RuDriCo2 is applied next: it is a rule-based morphological disambiguator that also makes segmentation changes to the input, such as joining segments (compound words). MARv4, a stochastic morphological disambiguator, receives the result of RuDriCo2 and selects the best POS tag for each segment. Finally, the last module to apply is XIP, which is responsible for the syntactic analysis.
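
The module order described above can be illustrated with a small Python sketch. This is not STRING's code or interface: every function name, the toy lexicon, the tag weights, and the single disambiguation rule are assumptions made up for illustration, and the segment-joining step for compound words is left out. The sketch only shows how each stage consumes the output of the previous one.

```python
from dataclasses import dataclass, field

# Toy lexicon and tag weights: illustrative values only, not taken from STRING.
LEXICON = {
    "a": ["DET", "PREP", "PRON"],
    "casa": ["NOUN", "VERB"],
    "é": ["VERB"],
    "grande": ["ADJ"],
    ".": ["PUNCT"],
}
TAG_WEIGHT = {"DET": 0.5, "PREP": 0.3, "PRON": 0.2, "NOUN": 0.5,
              "VERB": 0.6, "ADJ": 0.5, "PUNCT": 1.0}


@dataclass
class Segment:
    form: str
    tags: list = field(default_factory=list)


def tokenize(text):
    """Stage 1: split the text into segments (whitespace plus full stops)."""
    return [Segment(tok) for tok in text.replace(".", " .").split()]


def tag_all(segments):
    """Stage 2 (the role LexMan plays): attach every candidate POS tag."""
    for seg in segments:
        seg.tags = list(LEXICON.get(seg.form.lower(), ["NOUN"]))
    return segments


def split_sentences(segments):
    """Stage 3: group segments into sentences, closing one at each full stop."""
    sentences, current = [], []
    for seg in segments:
        current.append(seg)
        if seg.form == ".":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def apply_rules(sentence):
    """Stage 4 (the role RuDriCo2 plays): prune tags with a hand-written rule.
    Toy rule: after a possible determiner, a possible noun cannot be a verb."""
    for seg, nxt in zip(sentence, sentence[1:]):
        if "DET" in seg.tags and "NOUN" in nxt.tags:
            nxt.tags = [t for t in nxt.tags if t != "VERB"]
    return sentence


def pick_best(sentence):
    """Stage 5 (the role MARv4 plays): keep one tag per segment.
    A weight lookup stands in for a trained statistical model."""
    return [(seg.form, max(seg.tags, key=lambda t: TAG_WEIGHT.get(t, 0.0)))
            for seg in sentence]


def chunk(tagged):
    """Stage 6 (the role XIP plays): stand-in parser that groups DET + NOUN."""
    chunks, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged) and tagged[i][1] == "DET"
                and tagged[i + 1][1] == "NOUN"):
            chunks.append(("NP", [tagged[i][0], tagged[i + 1][0]]))
            i += 2
        else:
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


def run_chain(text):
    """Run the six stages in the order the page describes."""
    sentences = split_sentences(tag_all(tokenize(text)))
    return [chunk(pick_best(apply_rules(s))) for s in sentences]


if __name__ == "__main__":
    print(run_chain("A casa é grande."))
```

In this toy setup the rule stage matters: the weights alone would tag "casa" as a verb, but the rule removes that reading before the weighted choice is made, which is the reason for placing a rule-based disambiguator ahead of a statistical one.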

STRING performs:

Named Entity Recognition,
Information Retrieval,
Anaphora Resolution, and
other NLP tasks.

Though the initial modules of the STRING chain can be traced back as far as 2001 (see publications), the onset of the current architecture can be placed in 2006, with the integration of the XIP parser into the NLP chain and the development of the corresponding Portuguese grammar.

A web interface makes STRING available to the community and the general public.

@@ Line 1: / Line 1: @@
-<strong>MediaWiki has been installed.</strong>
+{{DISPLAYTITLE: STRING}}
+Welcome to the <span class="plainlinks">[http://www.hlt.inesc-id.pt L2F]</span>'s wiki about
+<blockquote>'''''STRING'''''  — A Hybrid Statistical and Rule-Based Natural Language Processing Chain for Portuguese.</blockquote>
+----
-Consult the [https://www.mediawiki.org/wiki/Special:MyLanguage/Help:Contents User's Guide] for information on using the wiki software.
+'''''STRING''''' has a modular structure and performs all basic text processing tasks, namely:
+* tokenization and text segmentation,
+* part-of-speech tagging,
+* morphosyntactic disambiguation,
+* shallow parsing (chunking), and
+* deep parsing (dependency extraction).
-== Getting started ==
-* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Configuration_settings Configuration settings list]
+'''''STRING''''' is organized as follows. The first module receives the text to process and tokenizes it, defining the segments that compose the text. [[LexMan]] is a morphological tagger that receives the result of this segmentation as input and associates all possible part-of-speech (POS) tags with each segment. The next module groups the segments into sentences. [[RuDriCo2]] is applied next: it is a rule-based morphological disambiguator that also makes segmentation changes to the input, such as joining segments (compound words). [[MARv4]], a stochastic morphological disambiguator, receives the result of [[RuDriCo2]] and selects the best POS tag for each segment. Finally, the last module to apply is [[XIP]], which is responsible for the syntactic analysis.
-* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:FAQ MediaWiki FAQ]
-* [https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/ MediaWiki release mailing list]
-* [https://www.mediawiki.org/wiki/Special:MyLanguage/Localisation#Translation_resources Localise MediaWiki for your language]
+'''''STRING''''' performs:
-* [https://www.mediawiki.org/wiki/Special:MyLanguage/Manual:Combating_spam Learn how to combat spam on your wiki]
+* Named Entity Recognition,
+* Information Retrieval,
+* Anaphora Resolution, and
+* other NLP tasks.
+Though the initial modules of the STRING chain can be traced back as far as 2001 (see publications), the onset of the current architecture can be placed in 2006, with the integration of the XIP parser into the NLP chain and the development of the corresponding Portuguese grammar.
+A [http://string.hlt.inesc-id.pt/demo web interface] makes '''''STRING''''' available to the community and the general public.