Skip to main content

Can we measure the complexly of natural language by an entropy based compression method?(1)


Many of my friends came from other countries.  We often talk about our own mother tongues. The discussion goes to which language is difficult or what kind of unique property each language has. German has a complex grammar system, Japanese has complex characters and unique counting system, and English has a huge vocabulary. I wonder ``What is the complexity of natural languages?'' and ``Can we measure them?''

Together with my friends I translated one Japanese text to English and German. Then we apply an entropy based compression method on them to see how much information each translated text has. This might tell which language is complex in a sense of entropy. Namely, I try to measure that ``If the contents are the same, how much information entropy differs depends on a language?''

I will write a few articles regarding with this topic.

Comments

Popular posts from this blog

Why A^{T}A is invertible? (2) Linear Algebra

Why A^{T}A has the inverse Let me explain why A^{T}A has the inverse, if the columns of A are independent. First, if a matrix is n by n, and all the columns are independent, then this is a square full rank matrix. Therefore, there is the inverse. So, the problem is when A is a m by n, rectangle matrix.  Strang's explanation is based on null space. Null space and column space are the fundamental of the linear algebra. This explanation is simple and clear. However, when I was a University student, I did not recall the explanation of the null space in my linear algebra class. Maybe I was careless. I regret that... Explanation based on null space This explanation is based on Strang's book. Column space and null space are the main characters. Let's start with this explanation. Assume  x  where x is in the null space of A .  The matrices ( A^{T} A ) and A share the null space as the following: This means, if x is in the null space of A , x is also in the n...

Gauss's quote for positive, negative, and imaginary number

Recently I watched the following great videos about imaginary numbers by Welch Labs. https://youtu.be/T647CGsuOVU?list=PLiaHhY2iBX9g6KIvZ_703G3KJXapKkNaF I like this article about naming of math by Kalid Azad. https://betterexplained.com/articles/learning-tip-idea-name/ Both articles mentioned about Gauss, who suggested to use other names of positive, negative, and imaginary numbers. Gauss wrote these names are wrong and that is one of the reason people didn't get why negative times negative is positive, or, pure positive imaginary times pure positive imaginary is negative real number. I made a few videos about explaining why -1 * -1 = +1, too. Explanation: why -1 * -1 = +1 by pattern https://youtu.be/uD7JRdAzKP8 Explanation: why -1 * -1 = +1 by climbing a mountain https://youtu.be/uD7JRdAzKP8 But actually Gauss's insight is much powerful. The original is in the Gauß, Werke, Bd. 2, S. 178 . Hätte man +1, -1, √-1) nicht positiv, negative, imaginäre (oder gar um...

No virtual machine on Oracle virtual box and Avira

December 2015, I suddenly cannot run Oracle VM Virtual Box (5.0.10) on Windows 7, my desktop machine. It failed to create a virtual machine, the error message is the following. VirtualBox - Error In supR3HardNtChildWaitFor --------------------------- Timed out after 60001 ms waiting for child request #1 (CloseEvents). (rc=258) where: supR3HardNtChildWaitFor what: 5 Unknown Status 258 (0x102) (258) - Unknown Status 258 (0x102) I relatively less use the virtual machine on this desktop machine. But when I would like to use Linux, then I need to reboot the machine. This is inconvenient. I have another windows 7 notebook, but I don't have this problem. Today I found the solution. https://avira.ideascale.com/a/dtd/Avira-sollte-das-Ausf%C3%BChren-von-VMs-in-Virtualbox-nicht-blocken/160234-26744#idea-tab-comments The combination of Avira's process protection and Virtual Box cause this problem. Avira announced the real solution will be provided at the release of 9th of Feb...