Compound Adverbs and File:Pardal2001.txt: Difference between pages

Testing List of Examples of the 300 most frequent compound (multi-word) adverbs in BP and EP (*)


This document presents a list of the 300 most frequent compound (multi-word) adverbs that are common
to both the Brazilian (BP) and European (EP) varieties of the Portuguese language. The frequency of
these adverbs was first determined from the extant lexicon-grammar of 3,500 compound adverbs [4],
considering their occurrence in two corpora: the CETEMPúblico corpus [6] and the Corpus Brasileiro
[7]. The goal was to map the distribution of compound adverbs in corpora from each variety, as described
in [4] (in preparation). Then, the most frequent expressions were queried in the Portuguese TenTen
2020 corpus [3], using the Sketch Engine platform [1]. Next, using the Good Dictionary Examples
(GDEX) [2], a selection of examples was collated and carefully edited to shorten each sentence as much
as possible without changing either the overall meaning or the relevant syntactic dependencies involving the
adverb. The example sentences were then translated into English using ChatGPT [5] (version 3.5) and
manually revised.
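The selection step described above (keeping only adverbs attested in both varieties and ranking them by frequency) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function name `most_frequent_shared`, the toy counts, and the ranking criterion (the sum of the relative frequencies in each corpus) are all assumptions, since the text does not state how the two frequencies were combined.

```python
def most_frequent_shared(freq_bp, freq_ep, top_n=300):
    """Return up to top_n adverbs attested in both corpora.

    freq_bp / freq_ep map each adverb to its raw count in the BP / EP corpus.
    Ranking by the sum of relative frequencies is an assumption made for
    this sketch; the source does not specify the criterion.
    """
    total_bp = sum(freq_bp.values())
    total_ep = sum(freq_ep.values())

    # Keep only adverbs that occur at least once in each corpus.
    shared = [adv for adv, count in freq_bp.items()
              if count > 0 and freq_ep.get(adv, 0) > 0]

    def score(adv):
        # Relative frequencies make corpora of different sizes comparable.
        return freq_bp[adv] / total_bp + freq_ep[adv] / total_ep

    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(shared, key=lambda adv: (-score(adv), adv))[:top_n]


# Hypothetical toy counts, for illustration only (not taken from the corpora).
freq_bp = {"de repente": 50, "às vezes": 120, "a rigor": 10, "de vez": 0}
freq_ep = {"de repente": 40, "às vezes": 100, "por acaso": 30}

print(most_frequent_shared(freq_bp, freq_ep, top_n=2))
```

Normalising by corpus size before summing is one reasonable design choice when the two corpora differ in size; other criteria (for example, the minimum of the two relative frequencies) would fit the description equally well.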
In the near future, we intend to provide, alongside these examples, the target word that the adverb
modifies within each sentence (or, where applicable, the entire sentence). Focus adverbs will
also be flagged.
To cite this work, please use:
Müller, Izabela, Nuno Mamede, and Jorge Baptista. Hurdles in Parsing Multi-word Adverbs: Examples from Portuguese, Proceedings of the 16th International Conference on Computational Processing of Portuguese (PROPOR 2024), Universidade de Santiago de Compostela, Galiza, Spain, March 12–15, 2024 (to appear).
@inproceedings{Muller-et-al-2024-Hurdles, author = {M\"uller, Izabela and Mamede, Nuno and Baptista, Jorge}, title = {{Hurdles in Parsing Multi-word Adverbs: Examples from Portuguese}}, booktitle = {Proceedings of the 16$^{th}$ International Conference
on Computational Processing of Portuguese (PROPOR 2024)}, address = {Universidade de Santiago de Compostela, Galiza, Spain, March 12--15, 2024}, year = {2024}}
[[media:PortugueseCompoundAdverbs.pdf|Document]]
[[media:PortugueseCompoundAdverbs.xlsx|Spreadsheet]]
(*) Research for this paper has been partially supported by national funds from Fundação para a Ciência e a Tecnologia,
under project reference DOI: 10.54499/UIDB/50021/2020. Izabela Müller has also received support from the University of
Algarve, through the Language Sciences PhD program.
This work is disseminated under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0
International License; see https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en
References
[1] Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell. The Sketch Engine. Proceedings of the 11th EURALEX International Congress, pages 105–116, 2004.
[2] Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, and Pavel Rychlý. GDEX: Automatically finding good dictionary examples in a corpus. In Proceedings of the XIII EURALEX International Congress, volume 1, pages 425–432. Universitat Pompeu Fabra Barcelona, 2008.
[3] Adam Kilgarriff, Miloš Jakubíček, Jan Pomikálek, Tony Berber Sardinha, and Pen Whitelock. PtTenTen: A Corpus for Portuguese Lexicography. Working with Portuguese Corpora, pages 111–30, 2014.
[4] Izabela Müller, Jorge Baptista, and Nuno Mamede. Differentiating Brazilian and European Portuguese Multiword Adverbs. Paper presented to the 39th National Meeting of the Portuguese Linguistics Association (APL), Covilhã, Portugal, October 2023.
[5] OpenAI. ChatGPT-3.5: Language Models are Few-Shot Learners. https://openai.com/blog/chatgpt-3-5/, 2023. Accessed: [05/01/2024].
[6] Paulo Alexandre Rocha and Diana Santos. CETEMPúblico: Um corpus de grandes dimensões de linguagem jornalística portuguesa. In Maria das Graças Volpe Nunes (ed.), V Encontro para o processamento computacional da língua portuguesa escrita e falada (PROPOR 2000) (Atibaia, SP, 19-22 de Novembro de 2000). São Paulo: ICMC/USP, 2000.
[7] Tony Berber Sardinha. Corpus Brasileiro. Informática, 708:0–1, 2010.

Latest revision as of 12:40, 11 January 2024