2012-02-01

Can we measure the complexity of natural language with an entropy-based compression method? (1)


Many of my friends come from other countries, and we often talk about our mother tongues. The discussion usually turns to which language is difficult, or what unique properties each language has. German has a complex grammar, Japanese has complex characters and a unique counting system, and English has a huge vocabulary. I wonder ``What is the complexity of natural languages?'' and ``Can we measure it?''

Together with my friends, I translated a Japanese text into English and German. Then we applied an entropy-based compression method to each version to see how much information each translated text contains. This might tell us which language is more complex in the sense of entropy. Namely, I want to measure ``If the contents are the same, how much does the information entropy differ depending on the language?''
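To make the idea concrete, here is a minimal Python sketch of two common proxies for the information content of a text. It is an assumption-laden illustration, not necessarily the method used in this experiment: `char_entropy` computes the zeroth-order Shannon entropy of the character distribution, and `compressed_bits` measures the size after zlib (DEFLATE) compression. Both function names and the one-sentence samples are hypothetical and chosen only for illustration.

```python
import math
import zlib
from collections import Counter


def char_entropy(text: str) -> float:
    """Zeroth-order Shannon entropy of the text, in bits per character."""
    n = len(text)
    if n == 0:
        return 0.0
    counts = Counter(text)
    # H = -sum p(c) * log2 p(c) over the distinct characters c
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compressed_bits(text: str) -> int:
    """Size in bits of the UTF-8 encoded text after zlib (DEFLATE) compression at level 9."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    # Hypothetical one-sentence samples; NOT the text used in this experiment.
    samples = {
        "Japanese": "猫が寝ている。",
        "English": "The cat is sleeping.",
        "German": "Die Katze schläft.",
    }
    for lang, text in samples.items():
        h = char_entropy(text)
        print(f"{lang}: {h:.3f} bits/char, "
              f"{h * len(text):.1f} bits total (entropy), "
              f"{compressed_bits(text)} bits (zlib)")
```

For very short inputs like these, zlib's fixed header and checksum dominate the output, so a meaningful comparison needs texts of realistic length.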

I will write a few articles on this topic.
