Compound Adverbs

Testing List of Examples of the 300 most frequent compound (multi-word) adverbs in BP and EP (*)

This document presents a list of the 300 most frequent compound (multi-word) adverbs that are common to both the Brazilian (BP) and European (EP) varieties of Portuguese. The frequency of these adverbs was first determined from the existing lexicon-grammar of 3,500 compound adverbs [4], based on their occurrence in two corpora: the CETEMPúblico corpus [6] and the Corpus Brasileiro [7]. The goal was to map the distribution of compound adverbs in corpora from each variety, as described in [4] (in preparation). These most frequent expressions were then queried in the Portuguese TenTen 2020 corpus [3], using the Sketch Engine platform [1]. Using the Good Dictionary Examples (GDEX) extraction tool [2], a selection of examples was collated and carefully edited to shorten each sentence as much as possible without changing its overall meaning or the relevant syntactic dependencies involving the adverb. The example sentences were then translated into English using ChatGPT (version 3.5) [5] and manually revised. We intend to soon provide, alongside these examples, the target word that the adverb modifies within each sentence (or, where applicable, the entire sentence). Focus adverbs will also be flagged.
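The selection step described above (keeping only adverbs attested in both varieties and ranking them by frequency) can be sketched as follows. This is only an illustration, not the authors' actual procedure: the function name `top_common_adverbs`, the use of per-million relative frequencies, the ranking by their sum, and the toy frequency values are all assumptions made for the example.

```python
def top_common_adverbs(freq_bp, freq_ep, n=300):
    """Return the n most frequent adverbs attested in both corpora.

    freq_bp / freq_ep map each adverb to a relative frequency
    (assumed here to be occurrences per million words), so that
    corpora of different sizes remain comparable.
    """
    # Keep only adverbs with a non-zero frequency in both varieties.
    common = [
        adv for adv in freq_bp.keys() & freq_ep.keys()
        if freq_bp[adv] > 0 and freq_ep[adv] > 0
    ]
    # Assumption: rank by the sum of the two relative frequencies;
    # ties are broken alphabetically to keep the output deterministic.
    common.sort(key=lambda adv: (-(freq_bp[adv] + freq_ep[adv]), adv))
    return common[:n]


# Toy, invented frequencies (not taken from the corpora cited above).
toy_bp = {"por exemplo": 120.0, "no entanto": 80.0,
          "de repente": 15.0, "de jeito nenhum": 9.0}
toy_ep = {"por exemplo": 110.0, "no entanto": 95.0,
          "de repente": 10.0, "se calhar": 20.0}

top_two = top_common_adverbs(toy_bp, toy_ep, n=2)
print(top_two)
```

In this toy run, adverbs present in only one of the two dictionaries are excluded before ranking, which mirrors the "common to both varieties" criterion.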

To cite this work, please use:

Müller, Izabela, Nuno Mamede, and Jorge Baptista. Hurdles in Parsing Multi-word Adverbs: Examples from Portuguese, Proceedings of the 16th International Conference on Computational Processing of Portuguese (PROPOR 2024), Universidade de Santiago de Compostela, Galiza, Spain, March 12–15, 2024 (to appear).

@inproceedings{Muller-et-al-2024-Hurdles,
  author = {M\"uller, Izabela and Mamede, Nuno and Baptista, Jorge},
  title = {{Hurdles in Parsing Multi-word Adverbs: Examples from Portuguese}},
  booktitle = {Proceedings of the 16$^{th}$ International Conference on Computational Processing of Portuguese (PROPOR 2024)},
  address = {Universidade de Santiago de Compostela, Galiza, Spain, March 12--15, 2024},
  year = {2024}
}

Document

Spreadsheet

(*) Research for this paper has been partially supported by national funds from Fundação para a Ciência e a Tecnologia, under project reference DOI: 10.54499/UIDB/50021/2020. Izabela Müller has also received support from the University of Algarve, through the Language Sciences PhD program. This work is disseminated under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

References

[1] Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell. The Sketch Engine. In Proceedings of the 11th EURALEX International Congress, pages 105–116, 2004.

[2] Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, and Pavel Rychlý. GDEX: Automatically finding good dictionary examples in a corpus. In Proceedings of the 13th EURALEX International Congress, volume 1, pages 425–432. Universitat Pompeu Fabra, Barcelona, 2008.

[3] Adam Kilgarriff, Miloš Jakubíček, Jan Pomikálek, Tony Berber Sardinha, and Pete Whitelock. PtTenTen: A Corpus for Portuguese Lexicography. In Working with Portuguese Corpora, pages 111–130, 2014.

[4] Izabela Müller, Jorge Baptista, and Nuno Mamede. Differentiating Brazilian and European Portuguese Multiword Adverbs. Paper presented to the 39th National Meeting of the Portuguese Linguistics Association (APL), Covilhã, Portugal, October, 2023.

[5] OpenAI. ChatGPT-3.5: Language Models are Few-Shot Learners. 2023. Accessed: 05/01/2024.

[6] Paulo Alexandre Rocha and Diana Santos. CETEMPúblico: Um corpus de grandes dimensões de linguagem jornalística portuguesa. In Maria das Graças Volpe Nunes (ed.), V Encontro para o processamento computacional da língua portuguesa escrita e falada (PROPOR 2000) (Atibaia, SP, 19-22 de Novembro de 2000), São Paulo, Brasil: ICMC/USP, 2000.

[7] Tony Berber Sardinha. Corpus Brasileiro. Informática, 708:0–1, 2010.