S, Kumari and Jayaram, Balasubramaniam
(2017)
Measuring Concentration of Distances - An Effective and Efficient Empirical Index.
IEEE Transactions on Knowledge and Data Engineering, 29 (2).
pp. 373-386.
ISSN 1041-4347
Text
IEEE Transactions on Knowledge and Data Engineering_29_2_373-386_2017.pdf
- Accepted Version
Abstract
High-dimensional data analysis gives rise to many challenges. One that has gained a lot of attention recently is the concentration of distances (CoD) phenomenon, which is the inability of distance functions to distinguish points well in high dimensions. CoD affects almost every machine learning and data analysis algorithm in high dimensions. In this work, we present a novel, efficient, and effective empirical index that not only illustrates whether a distance function tends to concentrate for a given data set, but also enables us to measure the rate of concentration and allows us to compare different distance functions vis-à-vis their rate of concentration. As opposed to existing empirical indices, the proposed empirical measure uses only the internal characteristics of a given data set and hence is applicable to real data sets, which was hitherto not possible.
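The concentration phenomenon described in the abstract can be illustrated with a minimal sketch. This is not the index proposed in the paper, which the abstract does not specify. It computes the relative contrast, (d_max - d_min) / d_min, a quantity commonly used to show concentration, for Euclidean distances from one query point to a synthetic sample. The uniform data on the unit hypercube, the sample size, the seed, and the function name `relative_contrast` are all assumptions made for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform random points in [0, 1]^dim.

    Hypothetical helper for illustration; not the paper's index.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast between the farthest and nearest point should shrink
    # as the dimension grows, which is the concentration effect.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions the printed contrast falls steadily as the dimension increases, meaning the nearest and farthest points become hard to tell apart. Unlike the measure described in the abstract, this sketch relies on synthetic data generated at several dimensions rather than on the internal characteristics of a single given data set.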