Inverse Vocabulary of Contemporary Portuguese (InVoc-PT)


Presentation

An inverse vocabulary is a particular type of vocabulary in which words are presented in alphabetical order, but sorted from the last character to the first. For example, here are some words (not contiguous in the vocabulary) in the order in which they appear in the inverse vocabulary: aba, alba, alga, malga, salga, ala, bala, pala, tala, etc.
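The ordering described above can be illustrated with a minimal Python sketch. It is not the software used to build the InVoc-PT; the function name `inverse_sort` is hypothetical, and the comparison is done on plain Unicode code points, which is an assumption that holds for the unaccented example words but would need a proper Portuguese collation for accented forms.

```python
def inverse_sort(words):
    """Sort words alphabetically by their reversed spelling (last character first)."""
    # Reversing each word turns "compare from the end" into an ordinary sort key.
    return sorted(words, key=lambda w: w[::-1])


# The example words from the text, deliberately shuffled.
words = ["tala", "malga", "aba", "pala", "alga", "ala", "salga", "bala", "alba"]
ordered = inverse_sort(words)
print(ordered)
# → ['aba', 'alba', 'alga', 'malga', 'salga', 'ala', 'bala', 'pala', 'tala']
```

Words sharing an ending (here -ba, -ga, -la) end up next to each other, which is what makes this ordering useful for studying suffixes.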

It is a vocabulary and not exactly a dictionary, as it does not provide definitions of the words listed, though it usually shows them with their grammatical categories (a.k.a. parts of speech). Sometimes, quantitative information is also presented regarding subsets of endings (e.g. the number of entries ending in the same 2, 3 or 4 characters).
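As a rough sketch of how such ending statistics could be computed, the following Python fragment counts how many entries share the same final 2, 3 or 4 characters. The function name `ending_counts` and the sample word list are illustrative assumptions, not part of the InVoc-PT itself.

```python
from collections import Counter


def ending_counts(words, lengths=(2, 3, 4)):
    """Count, for each ending length, how many words share each ending."""
    counts = {n: Counter() for n in lengths}
    for word in words:
        for n in lengths:
            # Words shorter than n characters have no ending of length n.
            if len(word) >= n:
                counts[n][word[-n:]] += 1
    return counts


sample = ["aba", "alba", "alga", "malga", "salga", "ala", "bala", "pala", "tala"]
counts = ending_counts(sample)
print(counts[2].most_common())
```

For this sample, four words end in "la", three in "ga" and two in "ba".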

Due to their formal nature, inverse vocabularies are often by-products of "normal" machine-readable dictionaries, and they are built using software specially designed for that purpose. Inverse vocabularies are important tools for the study of several linguistic phenomena, in particular the mechanisms and productivity of morphological derivation by suffixation.

As far as we know, only two inverse vocabularies of Portuguese have been published to date. In chronological order:

  • Dicionário inverso da língua portuguesa, by E. M. Wolf et al. (1971): it contains about 12,740 word forms, with POS tags, the gender of nouns, and transitivity information for verbs.
  • Dicionário inverso do Português, by Ernesto d'Andrade (1993): it contains 42,300 word forms, with POS tags.

The first book is virtually impossible to find today, except in libraries and specialised booksellers. Neither of these resources, however, is available in digital format, at least to the general public, which makes them less practical to use.

It was this gap that the STRING team at L2F/INESC ID Lisboa intended to fill, by making the InVoc-PT available to a broader public through web consultation. Several sources were used to produce the vocabulary. The InVoc-PT contains 150,700 entries, each consisting of the inverted form, its lemma and POS tags; for some words, the plural inflection is also provided. Entries with the same lemma but different POS are collapsed into a single entry.
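The entry structure just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Entry` class, the `build_entries` function, the tag names ("N", "V", "ADJ") and the sample records are all hypothetical and do not reflect the actual InVoc-PT data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    lemma: str
    pos: List[str] = field(default_factory=list)  # hypothetical POS tag names
    plural: Optional[str] = None                   # only present for some words

    @property
    def inverted(self) -> str:
        """The lemma spelled from last character to first."""
        return self.lemma[::-1]


def build_entries(records):
    """Collapse (lemma, pos, plural) records that share a lemma into one entry."""
    entries = {}
    for lemma, pos, plural in records:
        entry = entries.setdefault(lemma, Entry(lemma))
        if pos not in entry.pos:
            entry.pos.append(pos)
        if plural and entry.plural is None:
            entry.plural = plural
    # Present the collapsed entries in inverse alphabetical order.
    return sorted(entries.values(), key=lambda e: e.inverted)


records = [
    ("jantar", "V", None),
    ("jantar", "N", "jantares"),
    ("azul", "ADJ", "azuis"),
    ("azul", "N", "azuis"),
    ("mar", "N", "mares"),
]
entries = build_entries(records)
for e in entries:
    print(e.inverted, e.lemma, "/".join(e.pos), e.plural or "")
```

Here the two records for "jantar" (verb and noun) become a single entry carrying both tags, and the entries are listed by their inverted forms ("luza", "ram", "ratnaj").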

References

[1] D'Andrade, E. Dicionário Inverso do Português. Lisbon: Cosmos (1993).

[2] Wolf, E.M.; Narumov, B.P.; Vaisbord, A.S.; Kosarik, M.A. Dicionário Inverso da Língua Portuguesa [Обратный словарь португальского языка]. Moscow: Nauka (1971).