CorPop: a corpus of popular Brazilian Portuguese

This research proposes a corpus of popular Brazilian Portuguese, called CorPop, with texts selected based on the average level of literacy of the country's readers. CorPop’s theoretical and methodological bases are interdisciplinary and fall within the scope of Language Studies and related discip...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
Bilingual corpus made out of PDF documents from the European Medicines Agency, (EMEA), https://www.ema.europa.eu, (February 2020) (EN-PT)

EN-PT bilingual corpus built from PDF documents published by the European Medicines Agency (EMEA), https://www.ema.europa.eu (February 2020).

Resource Type:Corpus
Media Type:Text
Languages:English, Portuguese
ParaCrawl release 7 Portuguese-English

Portuguese-English parallel corpus from release 7 of the ParaCrawl project, specifically "Broader Web-Scale Provision of Parallel Corpora for European Languages". This version is filtered with BiCleaner with a threshold of 0.5. Data was crawled from the web following robots.txt, as is standard practice....

Resource Type:Corpus
Media Type:Text
Languages:English, Portuguese
CINTIL-USuite

CINTIL-USuite is a corpus of Portuguese annotated with lemmas, the Universal Part-of-Speech tagset (UPOS) and Universal feature bundles from the Universal Dependencies framework. It contains around 1 million annotated tokens. It is described in this article: António Branc...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CRPC Discourse Bank v1.0

The CRPC Discourse Bank is labeled for discourse relations (also referred to as rhetorical relations or coherence relations), such as cause and condition, that hold between two spans of text and help ensure the overall cohesion and coherence of the text. The scheme follows the principl...

Resource Type:Corpus
Media Type:Text
Language:Portuguese
CIPM

CIPM is a set of historical, religious, notarial and literary texts in prose and verse, written in Medieval Portuguese. It has around 3.5 million words.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
ArgMine Corpus

A corpus of opinion articles annotated with arguments, following a claim-premise model.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
FEUP news corpus

News articles collected from Portuguese newspapers.

Resource Type:Corpus
Media Type:Text
Language:Portuguese
