There are 6763 characters in the GB2312 character set. Of those, 2305 (34%) are simplified, leaving 4458 (66%) traditional (not simplifiable) characters.
The Big5 character set is split into two parts: frequently-used (5401 characters) and less-frequently-used (7652 characters). Of the 5401 frequently-used characters, 1780 (33%) are simplifiable, leaving 3621 (67%) that are not simplifiable.
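The percentages above are simple ratios of the stated counts, rounded to the nearest whole percent. The minimal Python sketch below reproduces them. The helper name `split` is hypothetical.

```python
def split(total: int, simplifiable: int) -> tuple[int, int, int, int]:
    """Return (simplifiable, rest, simplifiable %, rest %) with whole-number percentages."""
    rest = total - simplifiable
    pct_s = round(100 * simplifiable / total)
    pct_r = round(100 * rest / total)
    return simplifiable, rest, pct_s, pct_r

# GB2312: 6763 characters, 2305 of them simplified forms.
print(split(6763, 2305))  # (2305, 4458, 34, 66)
# Frequently-used Big5: 5401 characters, 1780 of them simplifiable.
print(split(5401, 1780))  # (1780, 3621, 33, 67)
```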
(Figure: comparison of GB2312 and common Big5.)
What is the nature of the three groups of characters?
For more discussion, see John Pasden's blog entry: A Character-Counting Challenge.
In the GB2312 set, the top five most common stroke counts are 10, 11, 9, 8, and 12, accounting for 51% of all characters.
In the common Big5 set, the top six most common stroke counts are 11, 12, 10, 13, 15, and 14, accounting for 49% of all characters.
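One way to produce figures like these is to tally characters by total stroke count and then take the most frequent counts. The sketch below assumes a hypothetical `strokes` mapping from character to stroke count. In practice, a source such as the Unicode Unihan database's `kTotalStrokes` field could supply this mapping. The tiny sample is illustrative only and is not the GB2312 or Big5 data.

```python
from collections import Counter

def top_stroke_counts(strokes: dict[str, int], n: int) -> tuple[list[int], float]:
    """Return the n most common stroke counts and the share of characters they cover."""
    tally = Counter(strokes.values())
    top = tally.most_common(n)
    covered = sum(count for _, count in top)
    return [stroke for stroke, _ in top], covered / len(strokes)

# Hypothetical sample: character -> total stroke count.
sample = {"一": 1, "人": 2, "口": 3, "木": 4, "水": 4, "林": 8, "明": 8, "雨": 8}
print(top_stroke_counts(sample, 2))  # ([8, 4], 0.625)
```

Ties between equally frequent stroke counts are returned in first-seen order by `Counter.most_common`, so a real ranking may want an explicit tie-break.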
(Note: considering GB2312.)
Of the 214 Kangxi radicals, only one is unused in GB2312: 鬥. This radical simplifies to another radical, 斗.
The three most common radicals each account for about 5% of the characters: 水 (364 characters), 艸 (338 characters), and 口 (332 characters). Together they cover 1034 characters, about 15% of the set.
The top 16 radicals (7% of the 214) account for 50% of the characters; the remaining 198 radicals (93%) account for the other 50%. The top 16 radicals are:
水 艸 口 木 手 人 金 心 糸 虫 言 肉 土 女 火 竹
The top 86 radicals (40%) account for 90% of the characters. The remaining 128 radicals (60%) account for the other 10%. The bottom 100 radicals account for only 6% of the characters.
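Coverage figures such as "16 radicals cover 50%" come from sorting radicals by how many characters they contain. The counts are then accumulated until a target share is reached. The minimal sketch below assumes a hypothetical `counts` mapping from radical to character count. The toy data is made up, and only the three counts at the end are taken from the text above.

```python
def radicals_needed(counts: dict[str, int], target_percent: int) -> int:
    """Smallest k such that the k most-used radicals cover at least target_percent of characters."""
    total = sum(counts.values())
    running = 0
    for k, count in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if 100 * running >= target_percent * total:
            return k
    return len(counts)

# Hypothetical toy data (radical -> character count), not GB2312 figures.
toy = {"水": 5, "口": 3, "木": 1, "火": 1}
print(radicals_needed(toy, 50), radicals_needed(toy, 90))  # 1 3

# The three largest GB2312 radicals quoted above: 水, 艸, 口.
top_three = 364 + 338 + 332
print(top_three, round(100 * top_three / 6763))  # 1034 15
```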
Comparison of top 20 radicals from GB2312 and Big5:
GB2312: 水 艸 口 木 手 人 金 心 糸 虫 言 肉 土 女 火 竹 辵 石 疒 足
Big5: 水 口 手 木 人 艸 心 言 金 糸 女 肉 土 虫 辵 火 竹 日 玉 疒
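Both top-20 rows above are given in full (GB2312 first, then Big5), so a few list operations can show how the rankings differ. The short Python sketch below uses only those two rows. The variable names are illustrative.

```python
gb2312 = "水 艸 口 木 手 人 金 心 糸 虫 言 肉 土 女 火 竹 辵 石 疒 足".split()
big5 = "水 口 手 木 人 艸 心 言 金 糸 女 肉 土 虫 辵 火 竹 日 玉 疒".split()

# Radicals that appear in only one of the two top-20 lists.
only_gb = [r for r in gb2312 if r not in big5]
only_big5 = [r for r in big5 if r not in gb2312]
print(only_gb, only_big5)  # ['石', '足'] ['日', '玉']

# Rank change for radicals in both lists; positive means ranked higher in Big5.
shifts = {r: gb2312.index(r) - big5.index(r) for r in gb2312 if r in big5}
print(shifts["艸"], shifts["言"])  # -4 3
```

The drop of 艸 in Big5 is consistent with its traditional form being counted under a different shape in many simplified characters. That reading is an interpretation, not something the lists alone prove.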
(Note: considering Mandarin and GB2312.)
Ignoring tones, there are 407 distinct syllables; including tones, there are 1327. If all 407 syllables occurred in all four tones, the total would be 1628 (407 × 4).
There are 185 syllables (45%) that cover all four tones, leaving 222 syllables (55%) that don't.
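Counts like these can be derived from a mapping from each toneless syllable to the set of tones it occurs with. The `tones` mapping below is a hypothetical three-syllable sample, not the GB2312 data, and uses 0 to mark the neutral tone. The arithmetic at the end uses the document's own figures.

```python
# Hypothetical sample: toneless syllable -> tones attested (0 = neutral tone).
tones = {
    "ma": {0, 1, 2, 3, 4},
    "den": {1},
    "fo": {2},
}

FOUR = {1, 2, 3, 4}
base = len(tones)                              # syllables ignoring tones
# Counts the neutral tone too; whether the document's 1327 does is not stated.
toned = sum(len(t) for t in tones.values())
all_four = [s for s, t in tones.items() if FOUR <= t]
print(base, toned, all_four)  # 3 7 ['ma']

# Document figures: 407 syllables, 185 of which cover all four tones.
print(407 * 4, round(100 * 185 / 407), round(100 * 222 / 407))  # 1628 45 55
```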
Tone coverage
There are 335 syllables (82%) that have a first tone; there are 72 (18%) that don't.
There are 265 syllables (65%) that have a second tone; there are 142 (35%) that don't.
Of those that do, 47 (18%) have no third tone.
There are 327 syllables (80%) that have a third tone; there are 80 (20%) that don't.
Of those that do, 109 (33%) have no second tone.
There are 365 syllables (90%) that have a fourth tone; there are 42 (10%) that don't.
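The second-tone and third-tone figures above can be cross-checked against each other. The number of syllables that have both a second and a third tone should come out the same from either direction. The Python sketch below uses only the numbers stated above. The count of syllables with neither tone is derived from them by inclusion-exclusion.

```python
total = 407
second, third = 265, 327            # syllables with a 2nd / 3rd tone
second_not_third = 47               # have a 2nd tone, lack a 3rd
third_not_second = 109              # have a 3rd tone, lack a 2nd

both_a = second - second_not_third  # counted from the 2nd-tone side
both_b = third - third_not_second   # counted from the 3rd-tone side
assert both_a == both_b             # the stated figures are consistent

# Syllables with neither a 2nd nor a 3rd tone, by inclusion-exclusion.
neither = total - (second + third - both_a)
print(both_a, neither)  # 218 33
print(round(100 * 47 / 265), round(100 * 109 / 327))  # 18 33
```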
One tone only
First tone only (5): den diu hei keng seng
Second tone only (6): fo neng nin shei teng zei
Third tone only (5): dei dia gei lia ruan
Fourth tone only (12): ce kuo miu nen nou nüe nun ri run se te zhei
Neutral tone only (2): lo me
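Syllables with exactly one attested tone can be picked out of a mapping from each toneless syllable to its tones. The sketch below uses a hypothetical `tones` mapping with 0 marking the neutral tone. Its single-tone entries are taken from the lists above, and `ma` is included only as a multi-tone contrast.

```python
from collections import defaultdict

# Hypothetical mapping: toneless syllable -> attested tones (0 = neutral tone).
tones = {"den": {1}, "fo": {2}, "dei": {3}, "ce": {4}, "lo": {0}, "ma": {0, 1, 2, 3, 4}}

def single_tone(tones: dict[str, set[int]]) -> dict[int, list[str]]:
    """Group syllables that occur with exactly one tone, keyed by that tone."""
    groups: dict[int, list[str]] = defaultdict(list)
    for syllable, ts in sorted(tones.items()):
        if len(ts) == 1:
            (tone,) = ts  # unpack the only element
            groups[tone].append(syllable)
    return {tone: groups[tone] for tone in sorted(groups)}

print(single_tone(tones))  # {0: ['lo'], 1: ['den'], 2: ['fo'], 3: ['dei'], 4: ['ce']}
```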
(Note: considering Mandarin and GB2312.)
For those intimidated by the 'r' initial: only 140 characters (2%) use it.
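Counting characters by initial requires splitting each pinyin syllable into its initial and final. The digraphs zh, ch, and sh must be checked before single letters, and syllables such as "er" have no initial at all. The minimal sketch below assumes toneless pinyin and a hypothetical `readings` list with one reading per character. It also treats y and w as spelling conventions rather than initials, which is an assumption.

```python
# The 21 standard pinyin initials; digraphs first so "zh" is not read as "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")

def initial(syllable: str) -> str:
    """Return the initial of a toneless pinyin syllable, or "" for a zero initial."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini
    return ""  # e.g. "er", "an"; y/w are treated here as spelling, not initials

# Hypothetical readings, one per character.
readings = ["ri", "ren", "er", "zhei", "shi", "ruan"]
r_count = sum(1 for s in readings if initial(s) == "r")
print(r_count)  # 3

# The document's figure: 140 of 6763 GB2312 characters.
print(round(100 * 140 / 6763))  # 2
```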
(Most-used refers to usage within the set of GB2312 characters, not what is actually spoken or written.)
Copyright © 2007-2014 by Jens-Ingo Farley