Parts of Speech Distribution in the BNC-COCA Word Lists
Keywords: BNC-COCA word lists, parts of speech distribution, vocabulary size tests, content validity
The main strength of the BNC-COCA word frequency list is also its major weakness. The frequency-based organisation of the list is a strength because it allows a systematic and unbiased selection of target words for a vocabulary size test. Using frequency as the sole criterion for target word selection, however, is a weakness because the lexicon is heterogeneous and a variety of factors other than frequency affect the difficulty of words. The present paper is an attempt to augment the lists with parts of speech information. The words in the first fourteen baseword lists were tagged for parts of speech and counted. The results revealed that 58% of the words in the lists were nouns, 21% verbs, 18% adjectives and only 3% function words. The 1K level had a different distribution from the other levels due to an uncharacteristically high proportion of function words (19%). It was also found that the relative distribution of the content word categories varied with frequency level. As such, the data did not support the use of a single fixed ratio in size tests for all frequency levels. Item numbers for individual frequency levels were proposed for a 140-item vocabulary size test on the basis of the variable ratios obtained in the present data.
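To illustrate the arithmetic behind turning percentage shares into whole item counts, the following is a minimal sketch only. It uses the overall percentages reported above (58/21/18/3), because the per-level ratios are not given in this abstract. The function name `allocate_items`, the largest-remainder rounding rule and the figure of 10 items per level (140 items divided over 14 levels) are assumptions made for illustration and are not taken from the paper.

```python
def allocate_items(shares, total_items):
    """Split total_items across categories in proportion to shares.

    Uses largest-remainder rounding (an assumed rule, not the paper's):
    each category first gets the whole-number part of its quota, then any
    leftover items go to the categories with the largest remainders.
    """
    total_share = sum(shares.values())
    if total_share <= 0:
        raise ValueError("shares must sum to a positive number")

    counts, remainders = {}, {}
    for category, share in shares.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        counts[category], remainders[category] = divmod(
            share * total_items, total_share
        )

    leftover = total_items - sum(counts.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for category in by_remainder[:leftover]:
        counts[category] += 1
    return counts


# Overall percentages reported in the abstract (all fourteen lists pooled).
OVERALL_SHARES = {"noun": 58, "verb": 21, "adjective": 18, "function": 3}

# Whole 140-item test, and one hypothetical 10-item frequency level.
whole_test = allocate_items(OVERALL_SHARES, 140)
one_level = allocate_items(OVERALL_SHARES, 10)

print(whole_test)
print(one_level)
```

Under these assumptions, applying the pooled shares to a single 10-item level leaves no slot for a function word, which would not suit the 1K level, where function words make up 19% of the list. This is one way to see why level-specific shares, rather than one fixed ratio, would need to be passed to such a function.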
Copyright (c) 2022 Meral Öztürk
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.