I recently started work on a program to add nicely formatted pinyin to Chinese text. To test whether it correctly added tone marks to all possible pinyin, I worked through a table of pinyin and was surprised by some of the valid syllables, which I had not come across before.
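
For illustration, here is a minimal sketch of what the tone-marking step might look like. It is not the program described above. It assumes lowercase numbered pinyin as input (for example gei3), with ü written as "u:" or "v", and the name mark_syllable is made up for this example. It follows the usual placement rule: the mark goes on a or e if present, on the o in ou, and otherwise on the last vowel.

```python
# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def mark_syllable(syllable):
    """Convert one numbered syllable, e.g. 'zhei4', to 'zhèi'.

    Hypothetical helper: assumes lowercase input with a trailing tone digit.
    """
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number, so leave the text untouched
    base, tone = syllable[:-1], int(syllable[-1])
    base = base.replace("u:", "ü").replace("v", "ü")
    if tone not in (1, 2, 3, 4):
        return base  # neutral tone (written 5 or 0) carries no mark

    if "a" in base:
        index = base.index("a")
    elif "e" in base:
        index = base.index("e")
    elif "ou" in base:
        index = base.index("o")
    else:
        # Otherwise the mark goes on the last vowel (as in miù or duì).
        positions = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not positions:
            return base
        index = positions[-1]

    marked = TONE_MARKS[base[index]][tone - 1]
    return base[:index] + marked + base[index + 1:]


if __name__ == "__main__":
    for numbered in ["gei3", "zhei4", "nen4", "miu4", "lu:4", "ma5"]:
        print(numbered, "->", mark_syllable(numbered))
```

Working through a full table of syllables is a good way to exercise the last branch, since the a, e and ou cases cover most common words.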

Pinyin ending in -en

Initials: d, t, n, l

There are many words ending in -en, but there are gaps in my table corresponding to ten and len. There was an entry for den, although there was no such word on MDBG. According to cojak.org, 扽 (to move, shake) can be pronounced dèn, but dùn is the normal reading. Similarly, 参 (to participate) can be pronounced dēn (also sān, shēn, cēn or sǎn), but is normally cān, which is how I learnt it. There was also an entry for nen, for which there is a single word on MDBG: 嫩 (nèn; tender or inexperienced).

Initials: z, c, s

Other than the very common word 怎 (zěn; how), there is only one other zen word on MDBG: 谮 (zèn; to slander). As I mentioned above, 参 can be pronounced cēn according to cojak.org, but MDBG lists just three cen words: 岑 (cén; small hill), 涔 (cén; overflow, rainwater) and 嵾 (cēn; uneven). Finally, I had known one sen word, 森 (sēn; forest), and it seems there is only one other: 椮 (sēn; lush growth).

Pinyin ending in -ei

There is a single character pronounced ei: 诶, which means "hey" and can have any tone. According to MDBG, the meanings are: ēi - to call someone; éi - to express surprise; ěi - to express disagreement; and èi - to express agreement. It is debatable to what extent these are really words.

Initials: g, k, h

When I was first learning Chinese, I noticed that while 给 (gěi; to give) is very common, it is the only Chinese word pronounced gei. There is also only one word pronounced kei: 克 (kēi; to scold, beat; more commonly pronounced kè and meaning gram or to restrain). There are two words pronounced hei: 黑 (hēi; black) and 嘿 (hēi; hey), the latter of which I suspect is more modern.

Initials: zh, ch, sh

Like 给, both 这 (zhèi or zhè; this, here) and 谁 (shéi; who) are very common words with unique pronunciations. There is no word pronounced chei.

Initials: z, c, s

There are no words pronounced cei or sei, but there is one pronounced zei: 贼 (zéi; thief, deceitful).

Initials: d, t, n, l

Again, like 给, 得 (děi; must, ought to) is common and unique. Like 这, 哪 (nǎ; which) has an alternative ei-pronunciation: něi. There are two other nei words, both quite common: 内 (nèi; inside) and 馁 (něi; hungry). There are many words pronounced lei but none pronounced tei.


This is by no means a definitive look at pinyin frequencies, and I know I have missed several rare sounds [EDIT: such as miù (谬, meaning to deceive)]. At some point I'd like to get a full set of counts for all the different sounds for all the words in the MDBG dictionary. I suspect that words with the initials b, p, m or f are the most frequent, whilst words starting with d, t, n or l are probably the least frequent.
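
A rough sketch of how such counts could be gathered, assuming the dictionary is available as CC-CEDICT text (the format MDBG distributes, with lines of the form "Traditional Simplified [pin1 yin1] /gloss/"). The function name and the SAMPLE lines are made up for illustration. It counts distinct single characters per toneless syllable, so the numbers depend on how rare readings and surname entries are treated.

```python
import re
from collections import Counter

# Matches the start of a CC-CEDICT entry: traditional, simplified, [pinyin].
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\]")

# Hand-written lines in CC-CEDICT layout, for illustration only.
SAMPLE = [
    "# comment lines start with a hash",
    "給 给 [gei3] /to give/",
    "給 给 [ji3] /to supply/",
    "黑 黑 [hei1] /black/",
    "黑 黑 [Hei1] /abbr. for Heilongjiang/",
    "嘿 嘿 [hei1] /hey/",
    "黑色 黑色 [hei1 se4] /black/",
]


def count_characters_per_sound(lines):
    """Count distinct single characters for each toneless syllable."""
    seen = set()
    for line in lines:
        match = ENTRY.match(line)
        if line.startswith("#") or not match:
            continue
        _traditional, simplified, pinyin = match.groups()
        if len(simplified) != 1:
            continue  # skip multi-character words
        # Lowercase so proper-noun readings merge, then drop the tone digit.
        syllable = pinyin.lower().rstrip("12345")
        seen.add((simplified, syllable))
    return Counter(syllable for _char, syllable in seen)


if __name__ == "__main__":
    for syllable, count in sorted(count_characters_per_sound(SAMPLE).items()):
        print(syllable, count)
```

Grouping the resulting counts by initial would then give a direct check on the guess about b, p, m, f versus d, t, n, l.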

I don't have any explanation for the distribution of sounds. In several cases the rare pinyin are associated with a common word. I wondered if this was to reduce the chance of confusion by making the most common words the most distinctive. However, if this were the case, you would expect the most common verb, 是, to have a rare pronunciation, rather than shì, which I think is the most common.

I wonder if there is some connection with the fact that in most languages, irregular verbs are most likely to be common verbs (to be, to go, to have, etc.); verbs used less often follow simple rules for the past tense and so on, because people are less likely to remember irregularities they rarely come across. Maybe there used to be a different distribution of sounds in Chinese, but the sounds have shifted over time, leaving only common words with the more unusual ones. But that's only a vague hypothesis and I have no evidence for it.

