Compound Adverbs

Revision as of 07:50, 11 January 2024 by Jorge.Baptista

Testing List of Examples of the 300 most frequent compound (multi-word) adverbs in BP and EP (*)

This document presents a list of the 300 most frequent compound (multi-word) adverbs that are common to both the Brazilian (BP) and European (EP) varieties of Portuguese. The frequency of these adverbs was first determined from the existing lexicon-grammar of 3,500 compound adverbs [4], based on their occurrence in two corpora: the CETEMPúblico corpus [6] and the Corpus Brasileiro [7]. The goal was to map the distribution of compound adverbs in corpora from each variety, as described in [4] (in preparation). The most frequent expressions were then queried in the Portuguese TenTen 2020 corpus [3], using the Sketch Engine platform [1]. Using the Good Dictionary Examples (GDEX) tool [2], a selection of examples was collated and carefully edited to shorten each sentence as much as possible without changing its overall meaning or the relevant syntactic dependencies involving the adverb. The example sentences were then translated into English using ChatGPT (version 3.5) [5] and manually revised.
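The selection step described above (keeping adverbs attested in both varieties and retaining the most frequent ones) could be sketched roughly as follows. This is only an illustration, not the authors' procedure: the function name `top_shared_adverbs`, the toy counts, the corpus sizes, and in particular the choice to rank by the lower of the two per-million frequencies are all assumptions made for the example.

```python
from collections import Counter


def top_shared_adverbs(freq_bp, freq_ep, size_bp, size_ep, n=300):
    """Return the n most frequent adverbs attested in both corpora.

    Assumption for this sketch: adverbs are ranked by the lower of their
    two per-million-token frequencies, so an expression must be frequent
    in both varieties to rank highly. The source does not specify how
    the two frequencies were combined.
    """
    def per_million(count, size):
        # Normalise raw counts so corpora of different sizes are comparable.
        return count * 1_000_000 / size

    # Keep only adverbs that occur at least once in each corpus.
    shared = {a for a in freq_bp if freq_bp[a] > 0 and freq_ep.get(a, 0) > 0}
    scored = {
        adv: min(per_million(freq_bp[adv], size_bp),
                 per_million(freq_ep[adv], size_ep))
        for adv in shared
    }
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scored, key=lambda adv: (-scored[adv], adv))[:n]


# Toy counts and corpus sizes, invented purely for illustration.
bp = Counter({"por exemplo": 900, "de repente": 400, "com certeza": 50})
ep = Counter({"por exemplo": 300, "de repente": 100, "se calhar": 80})

print(top_shared_adverbs(bp, ep, size_bp=1_000_000, size_ep=500_000, n=2))
```

In this toy data, "com certeza" and "se calhar" are each attested in only one corpus, so neither can appear in the shared list regardless of its frequency.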

In the near future, we intend to provide, alongside these examples, the target word that the adverb modifies within each sentence (or, where applicable, the entire sentence). Focus adverbs will also be flagged.

To cite this work, please use:

Müller, Izabela, Nuno Mamede, and Jorge Baptista. Hurdles in Parsing Multi-word Adverbs: Examples from Portuguese, Proceedings of the 16th International Conference on Computational Processing of Portuguese (PROPOR 2024), Universidade de Santiago de Compostela, Galiza, Spain, March 12–15, 2024 (to appear).

@inproceedings{Muller-et-al-2024-Hurdles,
  author = {M\"uller, Izabela and Mamede, Nuno and Baptista, Jorge},
  title = {{Hurdles in Parsing Multi-word Adverbs: Examples from Portuguese}},
  booktitle = {Proceedings of the 16$^{th}$ International Conference on Computational Processing of Portuguese (PROPOR 2024)},
  address = {Universidade de Santiago de Compostela, Galiza, Spain, March 12--15, 2024},
  year = {2024}
}

Document

Spreadsheet

(*) Research for this paper has been partially supported by national funds from Fundação para a Ciência e a Tecnologia, under project reference DOI: 10.54499/UIDB/50021/2020. Izabela Müller has also received support from the University of Algarve, through the Language Sciences PhD program.

This work is disseminated under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License; see https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en

References

[1] Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell. The Sketch Engine. Proceedings of the 11th EURALEX International Congress, pages 105–116, 2004.

[2] Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, and Pavel Rychlý. GDEX: Automatically finding good dictionary examples in a corpus. In Proceedings of the XIII EURALEX International Congress, volume 1, pages 425–432. Universitat Pompeu Fabra, Barcelona, 2008.

[3] Adam Kilgarriff, Miloš Jakubíček, Jan Pomikálek, Tony Berber Sardinha, and Pen Whitelock. PtTenTen: A Corpus for Portuguese Lexicography. Working with Portuguese Corpora, pages 111–130, 2014.

[4] Izabela Müller, Jorge Baptista, and Nuno Mamede. Differentiating Brazilian and European Portuguese Multiword Adverbs. Paper presented to the 39th National Meeting of the Portuguese Linguistics Association (APL), Covilhã, Portugal, October 2023.

[5] OpenAI. ChatGPT-3.5: Language Models are Few-Shot Learners. https://openai.com/blog/chatgpt-3-5/, 2023. Accessed: [05/01/2024].

[6] Paulo Alexandre Rocha and Diana Santos. CETEMPúblico: Um corpus de grandes dimensões de linguagem jornalística portuguesa. In Maria das Graças Volpe Nunes (ed.), V Encontro para o processamento computacional da língua portuguesa escrita e falada (PROPOR 2000) (Atibaia, SP, 19-22 de Novembro de 2000). São Paulo: ICMC/USP, 2000.

[7] Tony Berber Sardinha. Corpus Brasileiro. Informática, 708:0–1, 2010.