Hànzì Analyzer


Simplified vs. Traditional

The GB2312 character set contains 6763 characters. Of those, 2305 (34%) are simplified forms; the remaining 4458 (66%) have no separate simplified form, so they are written the same way in traditional and simplified text.
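The size of the set can be sanity-checked by enumerating the hanzi region of GB2312. This is a minimal sketch, assuming Python's built-in `gb2312` codec follows the standard's layout (hanzi in rows 16 to 87, that is, lead bytes 0xB0 to 0xF7 with trail bytes 0xA1 to 0xFE). The function name is illustrative and not part of the Hànzì Analyzer.

```python
def gb2312_hanzi():
    """Return every character the gb2312 codec decodes in the hanzi rows."""
    chars = []
    for lead in range(0xB0, 0xF8):        # rows 16..87 of the code table
        for trail in range(0xA1, 0xFF):   # 94 cells per row
            try:
                chars.append(bytes([lead, trail]).decode("gb2312"))
            except UnicodeDecodeError:
                pass  # unassigned cell in the table
    return chars


hanzi = gb2312_hanzi()
print(len(hanzi))
```

A codec that implements the standard strictly should report the same total as above. Splitting the result into simplified and non-simplifiable characters needs a separate traditional-to-simplified mapping, which this sketch does not attempt.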


The Big5 character set is split into two parts: frequently used (5401 characters) and less frequently used (7652 characters). Of the 5401 frequently used characters, 1780 (33%) are simplifiable, leaving 3621 (67%) that are not.
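The percentages for both character sets follow directly from the counts quoted above. A short sketch that re-derives them (the variable and function names are made up for this example):

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


gb2312_total, gb2312_simplified = 6763, 2305
big5_common, big5_rare, big5_simplifiable = 5401, 7652, 1780

gb2312_rest = gb2312_total - gb2312_simplified   # not simplifiable
big5_rest = big5_common - big5_simplifiable      # not simplifiable
big5_total = big5_common + big5_rare             # both parts of Big5

print(gb2312_rest, share(gb2312_simplified, gb2312_total), share(gb2312_rest, gb2312_total))
print(big5_rest, share(big5_simplifiable, big5_common), share(big5_rest, big5_common))
print(big5_total)
```

The two sets come out almost identical in proportion: roughly one third of the characters in each have a distinct simplified form.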


Here's a comparison of GB2312 and common Big5:

What is the nature of the three groups of characters?


For more discussion, see John Pasden's blog entry "A Character-Counting Challenge".

Stroke counts


(Note: considering GB2312.)


Of the 214 radicals, one is unused by GB2312: 鬥. This radical simplifies to another radical: 斗.


The three most common radicals each account for about 5% of the characters: 水 (364 characters), 艸 (338 characters), and 口 (332 characters). Together they cover 1034 characters, about 15% of the whole set.
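The individual and combined shares can be recomputed from the three counts and the GB2312 total of 6763. The dictionary below holds only the figures quoted above; the names are illustrative.

```python
GB2312_TOTAL = 6763
top_three = {"水": 364, "艸": 338, "口": 332}

# Share of the character set covered by each radical on its own.
for radical, count in top_three.items():
    print(radical, f"{100 * count / GB2312_TOTAL:.1f}%")

# Share covered by the three radicals together.
combined = sum(top_three.values())
print(combined, f"{100 * combined / GB2312_TOTAL:.1f}%")
```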


The top 16 radicals (7%) account for 50% of the characters. The remaining 198 radicals (93%) account for the other 50% of the characters. The top 16 radicals are:

    水 艸 口 木 手 人 金 心 糸 虫 言 肉 土 女 火 竹


The top 86 radicals (40%) account for 90% of the characters. The remaining 128 radicals (60%) account for the other 10%. The bottom 100 radicals account for only 6% of the characters.
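Figures such as "the top 16 radicals account for 50%" come from sorting the radicals by character count and accumulating until a target share is reached. The following is a sketch of that calculation, not the analyzer's actual code. The function name is an assumption, and the toy data is hypothetical and stands in for the real per-radical counts, which are not listed here.

```python
def radicals_needed(counts, target):
    """Return how many of the most frequent radicals are needed so that
    their characters make up at least `target` (0..1) of the total."""
    total = sum(counts.values())
    running = 0
    for n, count in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += count
        if running >= target * total:
            return n
    return len(counts)


# Hypothetical toy data (NOT the real GB2312 counts): radical label -> characters.
toy = {"A": 50, "B": 30, "C": 15, "D": 5}
print(radicals_needed(toy, 0.5), radicals_needed(toy, 0.9))
```

Run against a full radical-to-count table for GB2312, the same function would be expected to return 16 for a target of 0.5 and 86 for a target of 0.9, matching the figures above.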


Comparison of top 20 radicals from GB2312 and Big5:

Syllables and Tones

(Note: considering Mandarin and GB2312.)


Ignoring tones, there are 407 different syllables. Including tones, there are 1327 syllables. Note that if all 407 syllables covered four tones, the total would be 1628.
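These counts also show how much of the possible syllable-and-tone space is actually used. The sketch below assumes that the 1327 figure counts only the four main tones; the text does not say whether neutral-tone readings are included, so treat the derived ratios as approximate.

```python
syllables = 407   # distinct syllables, tones ignored
toned = 1327      # distinct syllable + tone combinations
tones = 4

slots = syllables * tones   # every syllable in every tone
unused = slots - toned      # combinations with no character in GB2312

print(slots, unused)
print(f"{100 * toned / slots:.1f}% of slots used")
print(f"{toned / syllables:.2f} tones per syllable on average")
```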


There are 185 syllables (45%) that cover all four tones, leaving 222 syllables (55%) that don't.
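Tone coverage of this kind can be computed by grouping readings by their toneless syllable. This is a minimal sketch, assuming readings are given as pinyin with a trailing tone digit (such as "ma3"). The function name is an assumption, and the sample list is there only to exercise the function; it is not the GB2312 reading list.

```python
from collections import defaultdict


def tone_coverage(readings):
    """Map each toneless syllable to the set of tones (1-4) seen for it."""
    tones = defaultdict(set)
    for reading in readings:
        syllable, tone = reading[:-1], int(reading[-1])
        if 1 <= tone <= 4:   # ignore a neutral tone written as 5
            tones[syllable].add(tone)
    return dict(tones)


# Small illustrative sample only.
sample = ["ma1", "ma2", "ma3", "ma4", "gei3", "shui2", "shui3", "shui4"]
coverage = tone_coverage(sample)

full = sorted(s for s, t in coverage.items() if len(t) == 4)
single = sorted(s for s, t in coverage.items() if len(t) == 1)
print(full, single)
```

Counting the entries in `full` over the complete reading list would give the number of syllables that cover all four tones (185 according to the figures above), and `single` corresponds to the one-tone-only group.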


Tone coverage


One tone only

Syllables and Pronunciation

(Note: considering Mandarin and GB2312.)



For those intimidated by the 'r' initial: only 140 characters (2%) use it.
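Counts per initial can be obtained by splitting off the leading consonant of each pinyin syllable. The sketch below uses a simplified list of initials that treats "y" and "w" as initials and expects toneless, lowercase pinyin. Both choices are assumptions made for this example and may differ from how the analyzer classifies syllables.

```python
# Two-letter initials come first so that "zhong" yields "zh", not "z".
INITIALS = ("zh", "ch", "sh",
            "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
            "j", "q", "x", "r", "z", "c", "s", "y", "w")


def initial_of(syllable):
    """Return the initial of a toneless pinyin syllable, or "" if it has none."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial
    return ""


print(initial_of("ren"), initial_of("zhong"), repr(initial_of("an")))

# Share of GB2312 characters with the 'r' initial, from the figures above.
r_share = 100 * 140 / 6763
print(f"{r_share:.1f}%")
```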


("Most-used" refers to frequency within the GB2312 character set, not to how often something is actually spoken or written.)




Copyright © 2007-2014 by Jens-Ingo Farley