Corpus search guidelines

International Corpus of English

Very simple search guide

The list below contains examples of search strings. '*' and '+' are wildcards: '*' denotes zero or more strings and '+' denotes one or more strings. The vertical bar '|' separates alternatives, so that (would|could) means search for either 'would' or 'could'. Note the brackets around the alternatives.
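The bracketed alternatives resemble alternation in ordinary regular expressions. The sketch below is only an analogy written with Python's re module; it is not the search engine behind this interface, and the function name find_alternatives and the sample sentence are made up for illustration. The wildcards are left out on purpose, since their exact behaviour depends on the interface.

```python
import re

# Analogy only: Python regex alternation, not the corpus tool's own engine.
# \b marks a word boundary, so 'should' is not matched by accident.
ALTERNATIVES = re.compile(r"\b(would|could)\b", re.IGNORECASE)


def find_alternatives(text):
    """Return every occurrence of 'would' or 'could' in text, in order."""
    return [match.group(0) for match in ALTERNATIVES.finditer(text)]


sample = "She would go if she could, but she should not."
print(find_alternatives(sample))
```

As in the search string (would|could), leaving out the brackets would change what the bar separates.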

Click on a keyword to see its context. Filter (thin) your search by choosing Country/ies, Region and/or Text type. To see how many matches (hits) your search yields within each subcorpus (country), click Sort.

When you choose Sort to file, a tab-separated file (.tsv) is created. This file can be downloaded and imported into e.g. Excel or Word. To import it into Excel, use the Data/From Text import tool. To get the file into Word, open it in e.g. Notepad or some other text editor, copy and paste the contents into Word, select the whole text (Select all) and choose Insert/Table/Convert Text to Table. Note that Word is not very good at handling huge files.
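If the file is too large for Word or Excel, it can also be read with a short script. The following is a minimal sketch in Python using the standard csv module. The function name read_hits is hypothetical, and the UTF-8 encoding is an assumption; the column layout of the file is not described here, so the rows are returned as plain lists of strings.

```python
import csv


def read_hits(path):
    """Read a tab-separated file into a list of rows (lists of strings).

    Assumption: the file is UTF-8 encoded. Quote handling is switched off
    so that quotation marks in the corpus text are kept as ordinary text.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

For example, len(read_hits("hits.tsv")) would give the number of lines in a downloaded file saved under the made-up name hits.tsv.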

Every time you choose Sort to file, a new file with the same name is created, overwriting any existing file with that name. Files created with this function, Sort to file, are deleted automatically every night.
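Because each new file replaces the previous one, it may be worth keeping a separately named copy of every download. The sketch below shows one way to do this in Python; the function name keep_copy and the timestamp pattern are illustrative choices, not part of the search interface.

```python
import shutil
from datetime import datetime
from pathlib import Path


def keep_copy(downloaded, when=None):
    """Copy a downloaded file next to itself under a timestamped name.

    Returns the path of the copy. `when` defaults to the current time.
    """
    source = Path(downloaded)
    stamp = (when or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # e.g. sort.tsv becomes sort-20240102-030405.tsv
    target = source.with_name(f"{source.stem}-{stamp}{source.suffix}")
    shutil.copy2(source, target)
    return target
```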

Note that in this searchable version of the ICE corpora, ICE India consists of 499 texts, ICE Philippines of 488 texts and ICE USA of written material only (200 texts). None of the corpora are tagged for part of speech or lemmatised.


British National Corpus

Search guide

Below is a list of possible search strings.

Click on a keyword to see its context. Filter (thin) your search by choosing Domain(s) and/or Classification codes before searching.

The BNC is both lemmatised and tagged for part of speech (POS), which means that you can search for POS tags, lemmas, words or a combination of these. Lemmas must be enclosed in square brackets [] and POS tags in angle brackets <>. You may choose a POS tag from the Wordclasses drop-down list or, perhaps better, use a shorthand form such as <aj> to get all adjectives.
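The bracket conventions can be illustrated with a small Python sketch that labels each element of a search string as a lemma, a POS tag or a plain word. This only mirrors the notation described above; it is not the parser used by the BNC interface. The function names classify and describe are hypothetical, and splitting the search string on spaces is an assumption.

```python
def classify(token):
    """Label one search element by its enclosing brackets."""
    if len(token) > 2 and token.startswith("[") and token.endswith("]"):
        return ("lemma", token[1:-1])
    if len(token) > 2 and token.startswith("<") and token.endswith(">"):
        return ("pos", token[1:-1])
    return ("word", token)


def describe(query):
    """Classify every space-separated element of a search string."""
    return [classify(token) for token in query.split()]


print(describe("<aj> [house]"))
```

Here <aj> is read as a POS tag and [house] as a lemma, while an element without brackets is treated as a word form.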

NB! The BNC resource is under development.


Corpus of British Fiction (Free eBooks)

This is the free, open access part of the CBF. It contains texts harvested from the web, most notably from Project Gutenberg and Fadedpage.

Very simple search guide

The list below contains examples of search strings. '*' and '+' are wildcards: '*' denotes zero or more strings and '+' denotes one or more strings. The vertical bar '|' separates alternatives, so that (would|could) means search for either 'would' or 'could'. Note the brackets around the alternatives.

Click on a keyword to see its context. Filter (thin) your search by choosing Decade and/or Sex. If you get more than 1,000 hits, click Sort or Sort to file to extract all hits.

When you choose Sort to file, a tab-separated file (.tsv) is created. This file can be downloaded and imported into e.g. Excel or Word. To import it into Excel, use the Data/From Text import tool. To get the file into Word, open it in e.g. Notepad or some other text editor, copy and paste the contents into Word, select the whole text (Select all) and choose Insert/Table/Convert Text to Table. Note that Word is not very good at handling huge files.

Every time you choose Sort to file, a new file with the same name is created, overwriting any existing file with that name. Files created with this function, Sort to file, are deleted automatically every night.