1.3 KMeans fails to learn

4 years ago

https://data-information-meaning.blogspot.com/2020/12/13-kmeans-as-means-to-describe-failed.html

KMeans finds the representation that minimizes the global distortion. Put differently: if I can only communicate a single point, what message (the single point) best represents the data? The average (mean) of all the data. Why? Because the mean is the point that minimizes the distortion of the original data: the total squared distance from the mean to all the data points is the smallest possible.
This is easy to see when the data has a Gaussian distribution, since the center is more populated than the edges.
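The single-point case can be checked numerically. The sketch below is only an illustration under assumed toy data (1-D Gaussian samples with a made-up mean and spread, not the data from this post); `distortion` and `candidates` are hypothetical names introduced here. It measures distortion as the sum of squared distances, for which the mean is the minimizer on any sample, and compares the mean against a few nearby candidate messages.

```python
import random


def distortion(points, center):
    """Sum of squared distances from every point to a single center."""
    return sum((x - center) ** 2 for x in points)


random.seed(0)  # fixed seed so the run is repeatable
# Assumed toy data: 1-D Gaussian samples (mean 5, std 2), not the post's data.
data = [random.gauss(5.0, 2.0) for _ in range(1000)]

mean = sum(data) / len(data)

# Candidate single-point "messages" on either side of the mean.
candidates = [mean - 1.0, mean - 0.1, mean, mean + 0.1, mean + 1.0]
for c in candidates:
    print(f"center={c:.3f}  distortion={distortion(data, c):.1f}")

# The candidate with the lowest distortion should be the mean itself.
best = min(candidates, key=lambda c: distortion(data, c))
```

Moving the center away from the mean in either direction raises the distortion, which is why a one-center KMeans ends up communicating the average.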
...
Here it can be clearly seen that KMeans chooses to communicate the average message, which dilutes the uniqueness of each independent element.
Global minimization of the distortion is a failed strategy: it is not scale sensitive, since it treats the data as if there were a single global scale.
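One way to make the scale argument concrete is to score two candidate groupings with the KMeans objective directly. This is a hedged sketch on assumed toy data, not the post's experiment: two tight groups one unit apart and one broad group far away, all invented for illustration, with hypothetical helper names `sse` and `distortion`. It does not run KMeans itself; it only compares the total within-cluster squared distance of a grouping that respects each group's own scale against one that merges the tight groups and splits the broad one.

```python
def sse(cluster):
    """Sum of squared distances from each point to its cluster mean."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)


def distortion(partition):
    """KMeans objective: total within-cluster squared distance."""
    return sum(sse(c) for c in partition)


# Assumed toy data: two tight groups 1 unit apart, one broad group far away.
tight_a = [0.0 + 0.001 * i for i in range(-10, 11)]   # around 0, width 0.02
tight_b = [1.0 + 0.001 * i for i in range(-10, 11)]   # around 1, width 0.02
broad = [100.0 + 1.0 * i for i in range(-30, 31)]     # around 100, width 60

# Grouping that respects each group's own scale.
natural = [tight_a, tight_b, broad]

# Grouping that merges the tight groups and spends the spare center
# on cutting the broad group in half.
merged = [
    tight_a + tight_b,
    [x for x in broad if x < 100.0],
    [x for x in broad if x >= 100.0],
]

d_natural = distortion(natural)
d_merged = distortion(merged)
print(f"natural grouping distortion: {d_natural:.2f}")
print(f"merged grouping distortion:  {d_merged:.2f}")
```

With three centers, the objective scores the merged grouping lower, so the distinction between the two tight groups is sacrificed to reduce error at the largest scale in the data.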

Another way to say this: I am trying to learn a particular distinction, but I am constrained not to forget the general idea. Or: KMeans attempts to memorize the data, and if it cannot memorize the data in its entirety, it minimizes the loss of memorization, not the loss of learning the data.

Memorization is a failed strategy
