Metric Learning for Clustering in Streaming Large-Scale Data

Jain, P (2015) Metric Learning for Clustering in Streaming Large-Scale Data. Masters thesis, Indian Institute of Technology Hyderabad.

Abstract

Given the enormous amount of data produced each day, it would be immensely useful if we could learn hidden patterns in that data without the need for explicit labels. Clustering is one of the most popular approaches to label-less, or unsupervised, learning, where the goal is to group data points (for example, images, objects, or web articles) into meaningful sub-classes called clusters. Although clustering is a well-studied problem in machine learning, it is unguided in nature and may therefore produce uninteresting patterns or trends. In general, clustering is considered an ill-posed problem, and any type of user input helps guide it towards a useful solution. For specific problems, supervised learning is the conventional alternative, but in the real world it is costly to label data manually, and a supervised approach is then no longer an option.

IITH Creators: