Algorithms for understanding data on a massive scale

Computers can search data, but can they understand it and make discoveries of their own?

Summary

Over the last decade we have witnessed a dramatic increase in the amount of data being generated. The web, network traffic, and scientific, health, financial, and marketing data sets are a few examples of the massive, heterogeneous data now produced in unprecedented volume. This data often contains information of great value: for example, computers could cross-reference genetic and clinical health data in order to design cures for diseases. However, extracting knowledge from terabytes of raw data requires extremely efficient algorithms, posing a great challenge to algorithm designers.

Rationale

Answering this challenge requires: (1) developing computational models that capture the essential aspects of computing over massive data sets, yet are expressive enough to support solutions to important problems, and (2) creating novel algorithmic techniques capable of analyzing and extracting value from such immense inputs.

Theoretical computer science has begun to address these issues on both fronts. Streaming (computing in a single pass over the data), sketching (extracting succinct data representations sufficient for solving a problem), and sub-linear time/space computing are examples of models that have led to the development of a vast array of efficient algorithms. The framework of computational learning theory, especially the notion of amplifying classification accuracy via boosting, has yielded efficient and effective algorithms for a variety of machine learning problems. The notions of approximate and randomized algorithms were introduced to handle problems too difficult to solve exactly and deterministically. Techniques such as random sampling, random projections, Fourier sampling, and locality-sensitive hashing have been developed and applied to a variety of problems of practical importance.
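
As a concrete illustration of sketching, the following is a minimal, self-contained Python toy implementing the count-min sketch: a succinct representation, built in a single pass over the stream, from which approximate frequency counts can be read off. The class name, the default width and depth, and the salted use of Python's built-in hash are illustrative choices of ours, not code from any particular system.

    import random

    class CountMinSketch:
        """One-pass approximate frequency counts in O(width * depth) space."""

        def __init__(self, width=2000, depth=5, seed=0):
            rng = random.Random(seed)
            self.width = width
            self.depth = depth
            # One random salt per row; the rows act as independent hash functions.
            self.salts = [rng.randrange(1 << 30) for _ in range(depth)]
            self.table = [[0] * width for _ in range(depth)]

        def _index(self, row, item):
            return hash((self.salts[row], item)) % self.width

        def add(self, item, count=1):
            for row in range(self.depth):
                self.table[row][self._index(row, item)] += count

        def estimate(self, item):
            # Collisions can only inflate a cell, so every row overestimates;
            # the minimum across rows is the tightest available estimate.
            return min(self.table[row][self._index(row, item)]
                       for row in range(self.depth))

    # Stream the data once, then answer frequency queries from the sketch alone.
    cms = CountMinSketch()
    for token in ["a", "b", "a", "c", "a", "b"]:
        cms.add(token)
    print(cms.estimate("a"))   # 3 here; may overestimate, never underestimates

Increasing the width shrinks the additive error caused by collisions, and increasing the depth shrinks the probability of a bad estimate; crucially, the space used is independent of the length of the stream.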

Many of the aforementioned developments are based on novel mathematical interpretations of data. Insights from geometry, spectral analysis and topology have been a source of inspiration for designing efficient algorithms. It is likely that such interactions hold the potential for even greater impact in the future.
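
To give the geometric flavor a concrete form, here is a similarly minimal toy (the function name and parameter choices are again our own, for exposition) of the random-projection technique mentioned above. By the Johnson-Lindenstrauss lemma, projecting n points onto roughly log(n)/eps^2 random dimensions preserves all pairwise Euclidean distances to within a (1 +/- eps) factor with high probability.

    import numpy as np

    def random_projection(points, k, seed=0):
        """Map n points in R^d down to R^k with a random Gaussian matrix.

        Entries are drawn from N(0, 1/k) so that squared distances are
        preserved in expectation; k on the order of log(n)/eps^2 gives
        (1 +/- eps) distortion with high probability (Johnson-Lindenstrauss).
        """
        rng = np.random.default_rng(seed)
        d = points.shape[1]
        projection = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
        return points @ projection

    # Toy check: a pairwise distance before and after reducing 1000 dims to 50.
    X = np.random.default_rng(1).normal(size=(100, 1000))
    Y = random_projection(X, k=50)
    print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))

The projected distances concentrate around the originals even though the target dimension depends only logarithmically on the number of points, which is what makes the technique useful at massive scale.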

Contributors and Credits

Petros Drineas, Piotr Indyk, Sampath Kannan, James Lee, Luis Rademacher, Madhu Sudan