English has a problem with vowels; there aren’t enough of them. That’s one reason why English spelling is so ridiculous; you’ve got to cope with “long” vowel sounds like the “a” in “fame” or the “i” in “ice”, as well as “short” vowel sounds like the “a” in “father” or the “i” in “trick”. But it gets worse; while each vowel’s “long” sound is consistent, the “short” sounds vary; the “a” in “father” and the “a” in “mad” have different sounds even though they’re both called “short” versions of “a”. But wait, it gets even worse than that. Calling English vowel sounds “short” and “long” makes no sense at all; English doesn’t count the duration of a vowel sound as meaningful, even though there are languages that do.
There is a way to accurately describe the various vowel sounds in English, though. It comes from phonetics, which has to do with how the sounds are produced. Consonants differ from vowels because to make a consonant you put an obstruction in the flow of air: to make a “p” sound, for example, you obstruct the air with your lips, while for “t” you use your tongue. Vowel sounds aren’t obstructed at all; the difference between them is the way you control the size and shape of the space the air flows through. For a “long e” sound, your jaw is more closed, while for a “long o” your jaw is much more open. Another difference is how you position your tongue; sometimes it’s higher (“phone” versus “fawn”) and sometimes it’s further back in your mouth (“bet” versus “boat”). When linguists want to describe language sounds phonetically, they classify them in terms of “high-low” (jaw openness, and with it the height of the tongue) and “front-back” (how far forward or back the tongue sits).
If we used “high-mid-low” and “front-center-back” to describe English vowel sounds it would, for one thing, be a lot more useful than the vague and inaccurate “long-short” terms we use now. That means that instead of simply having to memorize aspects of English — which is really the only way you can learn it — there would be a way to actually describe what’s going on.
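For the programmers in the audience, here is roughly what that kind of description looks like as a lookup table. This is only a sketch: the labels are my own coarse assignments assuming a General American-style accent, and the names `VOWELS`, `describe`, and `same_vowel_class` are made up for illustration.

```python
# Coarse height/backness labels for a few of the example words.
# Assumption: a General American-style accent; other accents may differ.
VOWELS = {
    "feet": ("high", "front"),
    "trick": ("high", "front"),
    "bet": ("mid", "front"),
    "mad": ("low", "front"),
    "father": ("low", "back"),
    "boat": ("mid", "back"),
}


def describe(word):
    """Return the vowel description for a word, e.g. 'low back'."""
    height, backness = VOWELS[word]
    return f"{height} {backness}"


def same_vowel_class(a, b):
    """True if two words share both height and backness labels."""
    return VOWELS[a] == VOWELS[b]


print(describe("father"))
print(describe("mad"))
print(same_vowel_class("father", "mad"))
```

The two “short a” sounds come out with different labels, which is exactly what “long-short” fails to tell you. The scheme is still coarse, though; with only these two dimensions, “feet” and “trick” land in the same cell.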
Of course, what we REALLY need is either diacritical marks to show how a vowel sound is supposed to be pronounced (these are used in many other languages that use the same alphabet) or more vowels in the alphabet. While the alphabet is Yet Another Thing you just have to memorize (it’s in that order because of that song, right?), at least we could eliminate some of the current overlap among different letters that sometimes have the same sounds, as well as the inconsistency among letters and their sounds. I mean, come on, the letter “y” can’t even be consistently classified as a consonant or a vowel.
By the way, just to make sure there’s inconsistency everywhere, it’s not entirely true that English doesn’t account for the duration of vowel sounds. It only happens sometimes (naturally), but, mostly without even consciously noticing, we do alter the duration of vowel sounds to distinguish between some words. The vowel sound in “ice” is shorter than the sound in “eyes”, for example. Same thing with “feet” versus “feed”, and “cap” versus “cab”. English relies on these differences to distinguish between words that also have other details distinguishing them (this is the kind of trivia you might pick up when you’re involved with a computer speech recognition project in the 1990s, by the way) but in some languages the vowel duration alone can change the meaning.
While the vagueness and inconsistency of English make it a particularly difficult environment for artificial speech recognition (it was especially hard back in an era when a *really fast* computer operated at a blistering 33 MHz), it can also be the source of some amusement. What did you just say? Was it “recognize speech” or “wreck a nice beach?”