Corpus

Level 3

Short Description

A large, structured collection of text used as training or evaluation data.

Friendly Description: A corpus is just a big, organized pile of text that an AI learns from. The word comes from Latin and means "body," as in a body of writing; its usual plural is "corpora." Corpora can be made of books, news articles, websites, scientific papers, or anything else that's useful for the AI to study. In general, the larger and more diverse the corpus, the more topics and writing styles the AI can pick up.

Example: Researchers might gather a corpus of every public Wikipedia article in a given language and use it to teach an AI the basics of vocabulary, grammar, and general knowledge. Wikipedia has been one of the most widely used corpora in modern AI.
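
To make the idea concrete, here is a minimal sketch of a corpus in code. It assumes a tiny hand-written corpus of three sentences (real corpora hold millions of documents) and a deliberately simple, hypothetical `tokenize` helper that just lowercases text and pulls out words. It shows the first thing people often do with a corpus: count the words in it.

```python
from collections import Counter
import re

# A toy corpus: each string stands in for one document (made-up sample text).
corpus = [
    "The cat sat on the mat.",
    "A dog chased the cat.",
    "The mat was red.",
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative only)."""
    return re.findall(r"[a-z']+", text.lower())

# Count how often each word appears across the whole corpus.
word_counts = Counter(token for doc in corpus for token in tokenize(doc))

vocabulary = sorted(word_counts)          # every distinct word, alphabetically
total_tokens = sum(word_counts.values())  # total number of words in the corpus

print(f"{len(corpus)} documents, {total_tokens} tokens, {len(vocabulary)} distinct words")
print(word_counts.most_common(3))
```

Even on this tiny example, the counts reveal which words are most common. The same kind of statistics, gathered over a far larger corpus, is part of what lets an AI pick up vocabulary and grammar.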