InverseDic

Revision as of 17:05, 15 May 2013 by Njm

Inverse Vocabulary of Contemporary Portuguese (InVoc-PT)

Presentation

An inverse vocabulary is a particular type of vocabulary in which words are presented in alphabetical order, but sorted from the last character to the first. For example, here are some words (non-contiguous in the alphabet) in the order in which they appear in the inverse vocabulary: aba, alba, alga, malga, salga, ala, bala, pala, tala, etc.
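The ordering described above can be sketched in a few lines of Python. This is only an illustration, not the software used to build the InVoc-PT: the function name `inverse_sort` is hypothetical, and the comparison uses plain code-point order, which is enough for the unaccented example words but would need a Portuguese-aware collation for accented forms.

```python
def inverse_sort(words):
    """Return the words ordered alphabetically by their reversed spelling."""
    # Sorting on the reversed string compares the last character first,
    # then the second-to-last, and so on.
    # Assumption: plain code-point comparison; accented letters would need
    # a locale-aware collation key instead.
    return sorted(words, key=lambda w: w[::-1])


# The example words from the text, deliberately shuffled.
words = ["tala", "malga", "aba", "pala", "alga", "bala", "salga", "ala", "alba"]
print(inverse_sort(words))
# ['aba', 'alba', 'alga', 'malga', 'salga', 'ala', 'bala', 'pala', 'tala']
```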

It is a vocabulary and not exactly a dictionary, as it does not provide definitions of the words listed, though it usually shows them with their grammatical categories (a.k.a. parts of speech). Sometimes, quantitative information is also presented regarding subsets of endings (e.g. the number of entries ending in the same 2, 3 or 4 characters).
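As a rough illustration of the quantitative information mentioned above, the following sketch counts how many entries share the same final characters. The function name `ending_counts` is hypothetical, and the word list is simply the example list from this page, not actual InVoc-PT data.

```python
from collections import Counter


def ending_counts(words, length):
    """Count how many words share each ending of the given length."""
    # Words shorter than `length` have no ending of that size and are skipped.
    return Counter(w[-length:] for w in words if len(w) >= length)


words = ["aba", "alba", "alga", "malga", "salga", "ala", "bala", "pala", "tala"]

for n in (2, 3):
    print(n, ending_counts(words, n).most_common())
```

For this small list, the two-character endings are "la" (4 words), "ga" (3 words) and "ba" (2 words).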

Due to their formal nature, inverse vocabularies are often by-products of "normal" machine-readable dictionaries, and they are built using software specially designed for that purpose.

Inverse vocabularies are important tools for the study of several linguistic phenomena, in particular the mechanisms and productivity of morphological derivation by suffixation.

As far as we know, only two inverse vocabularies of Portuguese have been published to date. In chronological order:

  • Dicionário inverso da língua portuguesa, by E. M. Wolf et al. (1971): it contains about 12,740 word forms, with POS tags, the gender of nouns, and transitivity information for verbs.
  • Dicionário inverso do Português, by Ernesto d'Andrade (1993): it contains 42,300 word forms, with POS tags.

Note: In 1997, S. Eleutério developed a reverse index based on the dictionary of simple words of the DIGRAMA system (Eleutério et al. 1995); however, this resource was only available within the research laboratory. The vocabulary featured 95,000 entries, with their part of speech and inflectional paradigm. Entries with different parts of speech were not collapsed under the same entry.

The first book is virtually impossible to find today, except in libraries and specialised booksellers. None of these resources, however, was available in digital format, at least to the general public, which made them less practical to use.

It was this gap that the STRING team at L2F/INESC ID Lisboa intended to fill, by making the InVoc-PT available to a broader public through web consultation. Several sources were used to produce the vocabulary. The InVoc-PT contains 150,700 entries, each consisting of the inverted form, its lemma and POS tags; for some words, the plural inflection is also given. Entries with the same lemma but different POS are collapsed into a single entry.
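The collapsing of entries that share a lemma can be pictured with the sketch below. It is not the InVoc-PT implementation: the record layout, the field names (`inverted`, `lemma`, `pos`), the tag labels and the sample words are all assumptions made for illustration, and "inverted form" is taken here to mean the reversed spelling of the lemma.

```python
from collections import defaultdict


def collapse_by_lemma(records):
    """Merge (lemma, pos) records into one entry per lemma, in inverse order."""
    tags = defaultdict(set)
    for lemma, pos in records:
        tags[lemma].add(pos)  # same lemma, different POS -> one entry

    entries = [
        # Hypothetical entry layout: reversed spelling, lemma, sorted POS tags.
        {"inverted": lemma[::-1], "lemma": lemma, "pos": sorted(pos_set)}
        for lemma, pos_set in tags.items()
    ]
    # Order the entries by their reversed spelling, as in an inverse vocabulary.
    entries.sort(key=lambda e: e["inverted"])
    return entries


# Made-up sample records; the tags N, V and ADJ are illustrative labels only.
records = [("jantar", "V"), ("azul", "ADJ"), ("jantar", "N"), ("azul", "N"), ("bala", "N")]

for entry in collapse_by_lemma(records):
    print(entry["inverted"], entry["lemma"], ",".join(entry["pos"]))
```

Keeping the POS tags in a set per lemma means that duplicate records do no harm, and the final sort only has to look at one key per entry.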

References

D'Andrade, E. Dicionário Inverso do Português. Lisbon: Cosmos (1993).

Eleutério, S.; Ranchhod, E.; Freire, H.; Baptista, J. A system of electronic dictionaries of Portuguese. Linguisticae Investigationes 19:1, pp. 57-82 (1995).

Wolf, E. M.; Narumov, B. P.; Vaisbord, A. S.; Kosarik, M. A. Dicionário Inverso da Língua Portuguesa [Обратный словарь португальского языка]. Moscow: Hayka (1971).