Pylimitics

"Simplicity" rearranged


Karen Spärck Jones

Have you ever wondered how search engines work? Beyond just “make a list of all the words on all the websites,” I mean. The technical term for that “list of all the words on all the websites” is the “bag-of-words model.” No, really, that’s what it’s called. And the bag-of-words model doesn’t work very well for searching documents, because the only thing it preserves is how often each word occurs. A search engine based on that model would be excellent for retrieving Edgar Allan Poe’s poem The Bells (which repeats the word “bells” many times), but not so hot at finding Sylvia Plath’s novel The Bell Jar, which hardly mentions bells or jars at all.
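If you'd like to see the idea in miniature, here's a rough sketch in Python. The two "documents" are made-up stand-ins (not real excerpts), and the names `bag_of_words` and `search` are just ones I picked for illustration. It counts words and ranks documents by the raw count of the query word, which is all a pure bag-of-words search can do.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, ignoring word order and case."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical stand-in texts, invented for this example.
docs = {
    "bells_poem": "bells bells bells the tolling of the bells",
    "jar_novel": "a summer in new york and a stifling sense of dread",
}


def search(query, docs):
    """Rank document names by how often the query word occurs in each."""
    counts = {name: bag_of_words(text)[query] for name, text in docs.items()}
    return sorted(counts, key=counts.get, reverse=True)


ranking = search("bells", docs)
```

The poem stand-in wins easily, and the novel stand-in scores zero for “bells” even though a reader would say the word belongs in its title.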

A better approach to searching is to preserve how important a word is to a particular document. One way to do that is to measure how specific the word is to that document: a word that shows up in nearly every document tells you very little, while a word that shows up in only a few tells you a lot. You can measure that statistically using a technique called inverse document frequency. And the technical term for this approach is the “Lego-kit-of-words model.” No, I’m just kidding, it’s actually term frequency-inverse document frequency, or tf-idf. The inverse document frequency idea was introduced in 1972 by Karen Spärck Jones, who was born August 26, 1935 in Yorkshire, England. Her surname is not “Jones,” by the way, it’s “Spärck Jones.”
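Here's a minimal sketch of how a tf-idf score can be computed. Fair warning: there are many variants of the formula, and this one (term count divided by document length, times the natural log of total documents over documents containing the term) is just a common textbook version, not necessarily the exact formulation from 1972. The tiny corpus and the function name `tf_idf` are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each document.

    docs maps a document name to a list of tokens.
    Returns a dict: name -> {term: tf-idf score}.
    """
    n = len(docs)

    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # tf = share of this document taken up by the term;
            # idf = log(total documents / documents containing the term).
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical three-document corpus.
corpus = {
    "a": "the bell rang and the bell rang".split(),
    "b": "the jar sat on the shelf".split(),
    "c": "the cat sat on the mat".split(),
}
scores = tf_idf(corpus)
```

In this toy corpus “the” appears in every document, so its idf is log(3/3), which is zero, and it contributes nothing no matter how often it repeats. “jar” appears in only one document, so it outscores “sat,” which appears in two.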

Spärck Jones attended Girton College, Cambridge, and studied history, not math or computer science (computer science was brand-new in the 1950s and Girton College probably didn’t offer it as a major). But when she was an undergraduate, Spärck Jones joined the Cambridge Language Research Unit (CLRU), an independent organization. Margaret Masterman was the head of the CLRU and convinced Spärck Jones to study computer science, which she did.

She earned a PhD in computer science, but her thesis was criticized as “uninspired and lacking original thought.” That opinion evidently wasn’t shared by everyone, though, because it was later published as a book. Spärck Jones worked on a series of short-term contracts in computer science before joining the Cambridge University Computer Laboratory in 1974, where she stayed for the rest of her career.

Her main research interest was natural language processing, and in addition to inventing the basis for search engines, she worked on speech recognition systems as early as the 1980s. She also championed women’s involvement in computer science, saying “Computing is too important to be left to men.”

Spärck Jones was elected as a Fellow of the British Academy and of several Artificial Intelligence associations (in the second and third waves of “artificial intelligence”). She also won the Lovelace Medal from the British Computer Society (BCS), the Gerard Salton Award and the Allen Newell Award from the Association for Computing Machinery (ACM), as well as the Athena Lecturer Award from the ACM’s women’s group. There’s a Karen Spärck Jones Award and lecture given annually by the BCS for achievement in natural language processing, and in 2017 the University of Huddersfield (in Yorkshire) renamed their School of Computing building in her honor. She didn’t live to see that, though; she died of cancer in 2007.

But think of it: the latest wave of AI is based on statistical language processing, and here we’ve been processing language with statistics the whole time, thanks to Karen Spärck Jones.



About Me

I’m Pete Harbeson, a writer located near Boston, Massachusetts. In addition to writing my own content, I’ve learned to translate for my loquacious and opinionated puppy Chocolate. I shouldn’t be surprised, but she mostly speaks in doggerel.