Wednesday, August 24, 2011
02:40 PM - 03:30 PM
Level: Technical - Intermediate
With the advent of social media, a new venue has become available for expressing one's thoughts and feelings about the world. Corporations are increasingly mining this form of expression to learn what their customers are saying about their products and their company. The problem presents unique challenges: unstructured content requires analytic techniques that associate quantitative metrics with words in order to interpret the sentiment embodied in them, which suggests the use of Not Only SQL (NoSQL) methods.
This presentation covers a case study involving the analysis of blog posts using both MapReduce and the Python Natural Language Toolkit, in combination with SQL analytics in an EMC Greenplum Database, using sparse vectors and k-means clustering algorithms to identify centroids of sentiments in the posts. It is meant to be an introduction to some of the general techniques that can be used to tackle such problems. The attendee can expect to hear how MapReduce and SQL can be used to:
- Access text in blog posts
- Parse the text into word lists
- Create histograms of word frequencies
- Transform blog terms into statistically meaningful measures
- Derive blog clusters around iteratively defined centroids
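The steps listed above can be illustrated with a minimal Python sketch. It is not the presenter's implementation: the talk describes MapReduce, the Python Natural Language Toolkit and SQL analytics in a Greenplum database, whereas this sketch uses only the Python standard library as a stand-in. The sample posts, the function names (`tokenize`, `term_histograms`, `tfidf_vectors`, `kmeans`), the choice of tf-idf as the "statistically meaningful measure", and the naive centroid initialization are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_histograms(posts):
    """Build one word-frequency histogram (Counter) per post."""
    return [Counter(tokenize(post)) for post in posts]


def tfidf_vectors(histograms):
    """Turn raw counts into sparse tf-idf vectors stored as dicts."""
    n_docs = len(histograms)
    # Number of posts in which each term appears at least once.
    doc_freq = Counter(term for hist in histograms for term in hist)
    vectors = []
    for hist in histograms:
        total = sum(hist.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in hist.items()
            if doc_freq[term] < n_docs  # a term in every post carries no signal
        })
    return vectors


def sq_distance(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def kmeans(vectors, k, iterations=10):
    """Plain k-means; centroids start at the first k vectors (an assumption)."""
    centroids = [dict(vec) for vec in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each post joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_distance(vec, centroids[c]))
            for vec in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


# Hypothetical sample posts, invented for this sketch.
posts = [
    "I love this phone, the screen is great",
    "terrible battery, I hate this phone",
    "great screen and I love the camera",
    "I hate the terrible battery life",
]
histograms = term_histograms(posts)
vectors = tfidf_vectors(histograms)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

Storing each vector as a dict keeps only the non-zero terms, which mirrors the sparse-vector idea mentioned in the abstract. In a database or MapReduce setting the histogram step would typically be a grouped count over (post, word) pairs rather than an in-memory `Counter`.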
S. Kartik, Ph.D., is the Global Field CTO for the Data Computing Division at EMC and an EMC Distinguished Engineer. He has been active in Information Technology for the past 15 years, having worked extensively in both academic circles and industry, designing, deploying and managing large infrastructures for Fortune 1000 customers. His experience at EMC has spanned Advanced Business Continuity architecture, enterprise solutions and emerging technologies. His current interests include BI and Analytics, Cloud Computing, Enterprise Virtualization and High Performance Computing.
He holds a Master's and a Doctorate in Physics from Indiana University, with over 75 publications to his name. He received his M.S. in Physics (1984) in the 5-year program at the Indian Institute of Technology in Bombay.