Websites for words

It’s easy to see that the way we use language changes over time. Dictionaries are one way to measure this, particularly over the long term. Compare a dictionary from a century ago with a modern one and you can see changes in what words mean, as well as in which words come into use and which fall out of circulation.

But what about shorter time scales? And what about spoken words rather than written language? As it turns out, there are resources for that too. The Spoken British National Corpus is a collection of data on spoken English as used in Britain. There are American equivalents as well, such as the Corpus of Contemporary American English. Since I found the British one first, I’ll start with that.

You can find out some interesting things about how language use changed between, say, the mid-1990s and about 2015 in England. As you’ve probably noticed, “awesome” is used a great deal more nowadays than it used to be. That’s the case in both the US and England, but in the British case “awesome” seems to have replaced the word “marvelous.” “Marvelous” was never as common in American English as it seems to have been in the UK, by the way, possibly because from their perspective we misspell it every time (in the UK it’s “marvellous”).

You can also track things like “the words that have most declined in usage since 1995.” In England these include some words that were never common here in the first place: “ta-ra”, “mucking”, “matey”, “cobbler”, and “draught”. “Boxer” and “playschool” also made the list. Both are used in the US too, but I suspect that here “playschool” is more likely to be taken as a brand name (Playskool). Another word that has declined sharply is “cassette,” which points to something interesting about the corresponding list of words that have increased in usage: most of them have to do with technology. They include “website”, “laptop”, “texted”, “email”, “internet”, and some more brand names: Google, YouTube, iPhone, iPad, and Facebook.

You can also find some facts that would be quite valuable to have at your fingertips, such as “what are the two most common adverbs used in movie titles?” (“up” and “out”, according to the US corpus) and “do we talk more about education than we used to?” (yes, a lot more, and use of the word “university” has tripled over the past two decades, according to the UK project).

It’s not entirely clear how “spoken” English is recorded in these projects. They draw on a vast array of publications, from magazines to books to scripts, but there also seems to be some system for recording what people actually say. I haven’t found the details of how they do that, though. Probably has something to do with the NSA.

In any case, if you want to know whether “smart” is more likely — in fiction — to be used to mean “hurts” as in “ouch, that smarts” or to mean intelligent as in “she’s very smart,” these projects are the best places to go. “Smart” for hurts versus “smart” for bright, by the way, is a tie.

Pylimitics
