New pages

  • 15:49, 10 January 2024 Other [140 bytes] Eugenio (Created page with "This page is really empty! Please visit us soon.... <blockquote><blockquote> file:UnderConstruction.png </blockquote></blockquote>")
  • 15:34, 10 January 2024 Grammar [471 bytes] Eugenio (Created page with "==== Description ==== * Local grammars: 3164 rules * Chunker: 344 rules * Dependencies: 1613 rules ==== Local Grammars ==== * LGAbstraction * LGAdvérbios * LGCulture * LGDatum * LGElectronic * LGEvent * LGLocation * LGMeasure * LGNumber * LGOrg * LGPeople * LGProfession * LGPronouns * LGRelatives * LGSports * LGTime ==== Dependencies ==== * Auxiliary * Syntactic * BuildingLocation * BusinessRelations * Family * FixedPhrase * Lifetime * PeopleLocation * Time")
  • 15:31, 10 January 2024 Disambiguation [324 bytes] Eugenio (Created page with "==== Description ==== Decontraction rules: 178 Disambiguation rules: 188 ==== Disambiguation ==== * Disamb * DisambAdjNoun * DisambAdjVerb * DisambAdv * DisambArtPron * DisambDLF * DisambExpandLast * DisambIdiomatic * DisambLast * DisambLemma * DisambPastPartNoun * DisambPrefix * DisambVerb * DisambVerbNoun")
  • 15:27, 10 January 2024 Transfer [1,133 bytes] Eugenio (Created page with "==== Unbabel ==== [http://unbabel.com/ Unbabel] uses the Portuguese Named Entity Recognition modules of STRING for the ''anonymisation'' (or ''de-identification'') and the ''re-identification'' of named entities in the distributed translation process. Anonymisation is required to address privacy issues whenever sensitive data is shared, as in the [http://unbabel.com/ Unbabel] crowdsourcing translation service. ==== OOBIAN ==== Main_Pa...")
  • 14:52, 10 January 2024 Compound Adverbs [4,369 bytes] Eugenio (Created page with "This is a list of adverbs.")
  • 14:24, 10 January 2024 Dictionaries [29,195 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> === Description === STRING relies on large, comprehensive, highly granular lexical resources. Much emphasis is put on building them, under the conviction that the lexicon is key to many NLP tasks and applications. This page, constantly under construction, briefly describes the main resources already available and used by STRING. === LexMan Dictionary === LexMan uses a dictionary of lemmas containing, for the m...")
  • 14:12, 10 January 2024 Corpora [15,142 bytes] Eugenio (Created page with "=== Zero Anaphora Corpus (ZAC) === <div style="float:right;">__TOC__</div> The Zero Anaphora Corpus (ZAC) is a corpus of Brazilian Portuguese texts built to support the construction of an Anaphora Resolution system, which is part of the STRING system. The ZAC corpus is aimed at the resolution of so-called zero anaphora, that is, an anaphora relation where the anaphoric expression (or anaphor) has been zeroed. In the following, we briefly present the main linguistic asp...")
  • 13:52, 10 January 2024 MARv4 [5,752 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> ==== Acronym ==== '''''MARv''''' stands for '''M'''orphosyntactic '''A'''mbiguity '''R'''esol'''v'''er ==== Introduction ==== MARv2's architecture comprises two submodules: a linguistically oriented disambiguation rules module and a probabilistic disambiguation module. The linguistically oriented module is no longer used in the STRING chain because that function is now implemented by the RuDriCo module. MARv2...")
  • 13:48, 10 January 2024 InverseDic [2,936 bytes] Eugenio (Created page with "{{DISPLAYTITLE: Inverse Vocabulary of Contemporary Portuguese (InVoc-PT)}} === Presentation === <div style="float:right;">__TOC__</div> An inverse vocabulary is a particular type of vocabulary in which words are presented in alphabetical order but sorted from the last character to the first. For example, here are some words (non-contiguous in the alphabet) in the order in which they appear in the inverse vocabulary: aba, alba, alga, malga, salga, ala, bala, pala, tala, e...")
  • 13:33, 10 January 2024 RuDriCo2 [5,984 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> ==== Acronym ==== '''''RuDriCo''''' stands for '''''Ru'''''le '''''Dri'''''ven '''''Co'''''nverter ==== Brief Description ==== RuDriCo2's main goal is to adjust the results produced by the LexMan morphological analyzer to the specific needs of each parser. To achieve this, it modifies the segmentation done by LexMan. For example, it might contract expressions provided by the morp...")
  • 13:31, 10 January 2024 LexMan [2,134 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> ==== Acronym ==== '''''LexMan''''' stands for '''Lex'''ical '''M'''orphological '''an'''alyzer ==== Brief Description ==== LexMan is responsible for assigning to each token its part-of-speech (POS) and any other relevant morphosyntactic feature (gender, number, tense, mood, case, degree, etc.), using [http://en.wikipedia.org/wiki/Finite_state_transducer finite state transducers]. LexMan uses very rich, highly granular ta...")
  • 13:24, 10 January 2024 Contact [1,040 bytes] Eugenio (Created page with "If you have any comments, suggestions, questions or ideas, please contact us! We would like to hear from you! We are located in Lisbon, [http://www.l2f.inesc-id.pt/wiki/index.php/Location near the Saldanha area].<br> A general route planner is available [http://www.transporlis.sapo.pt/index.cfm here]. Special options can be found [http://www.l2f.inesc-id.pt/wiki/index.php/Contacts_and_Directions here]. ==== Contacts ==== {| width="400" cellspacing="2" cellpadding="2" |- ! width="16%" valign="TO...")
  • 13:22, 10 January 2024 XIP [20,426 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> ==== Acronym ==== '''''XIP''''' stands for '''''X'''''EROX '''''I'''''ncremental '''''P'''''arsing ==== Introduction ==== XIP is a <span class="plainlinks">[http://www.xrce.xerox.com/Research-Development/Document-Content-Laboratory/Parsing-Semantics/Robust-Parsing XEROX]</span> parser, based on finite-state technology and able to perform several tasks, namely: * adding lexical, syntactic and semantic information; * applying...")
  • 13:15, 10 January 2024 Publications [24,448 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> ====in 2016==== '''[73]''' Francisco Dias [http://www.inesc-id.pt/ficheiros/publicacoes/10593.pdf Multilingual Automated Text Anonymization]. MSc thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, Portugal, June 2016 (bibtex) '''[72]''' Joana Pinto [http://www.inesc-id.pt/ficheiros/publicacoes/10639.pdf Fine-grained POS-tagging: Full disambiguation of verbal morpho-synta...")
  • 13:07, 10 January 2024 Team [40,630 bytes] Eugenio (Created page with "== Coordination == {| width="100%" valign="top" cellpadding="10px" |style="vertical-align: top; text-align: left; width: 35%;" | {{Coordinator |name=[http://www.l2f.inesc-id.pt/wiki/index.php/Nuno_Mamede Nuno Mamede] (Computer Science Coordination) |photo=Nuno.png |cv=Nuno J. Mamede received his undergraduate, MSc and PhD degrees in Electrical and Computer Engineering from the [http://www.ist.utl.pt Instituto Superior Técnico], Lisbon, in 1981, 1985 and 1992, respectively....")
  • 13:03, 10 January 2024 Architecture [8,709 bytes] Eugenio (Created page with "<div style="float:right;">__TOC__</div> '''STRING''' is a '''St'''atistical and '''R'''ule-Based '''N'''atural Lan'''g'''uage Processing Chain for Portuguese developed at <span class="plainlinks">[https://www.hlt.inesc-id.pt/wiki/ HLT]</span> and it consists of several modules, which are represented in the next figure. ==== Tokenizer ==== The first module is responsible for text segmentation, and it divides the text into tokens. Besides...")