Basic Data Mining Algorithms and their Scalability for Big Data
August 16-21, 2016
Overview: This course introduces the basic ideas of data mining algorithms and of their scalability to Big Data settings. Relatively simple classification and clustering algorithms are presented and analyzed in detail. The basic ideas of Map-Reduce algorithms for asynchronous computing in cloud/Hadoop environments are introduced. The course then presents and analyzes how these simple data mining algorithms need to be redesigned for the Map-Reduce environment.
Objectives: The objectives of this course are to impart knowledge and understanding of the following topics to the participants:
- Details of the decision tree induction algorithms and various issues related to their outcomes and performance.
- Details of association rule mining algorithms and various issues related to their outcomes and performance.
- Sequential and partitional clustering algorithms.
- Basics of the Map-Reduce paradigm for designing algorithms, and application of this paradigm to redesigning the decision tree and association rule induction algorithms.
Every session will be followed by lab and practice assignments (MATLAB and R programming).
Topic List:
- Introduction to Data Mining (4 hours)
  - Lecture (2 hours)
    - Applications of and need for data mining algorithms
    - Scalability issues for data mining tasks
    - Relationship to other fields such as statistics and machine learning
    - Different types of data: relations, graphs, sequences, and text
    - Types and nature of patterns and knowledge to be discovered in data
  - Lab and Practice (2 hours)
    - Practice with representation and processing of data in MATLAB
- Classification Algorithms: Decision Trees (6 hours)
  - Lecture (3 hours)
    - What a decision tree is and how it works
    - Algorithms for inducing decision trees from data
    - Characteristics of decision tree induction algorithms
    - Overfitting and underfitting
    - Evaluating the performance of a decision tree
    - Applications and real-life cases
    - Learning of tree ensembles
    - Induction of decision trees for Big Data: issues of performance and algorithms
  - Lab and Practice (3 hours)
    - Build decision trees from test datasets using MATLAB functions
- Association Analysis (6 hours)
  - Lecture (3 hours)
    - What association rules are
    - Apriori principle for frequent itemset generation
    - Association rule generation
    - Metrics: support, confidence, lift, etc.
  - Lab and Practice (3 hours)
    - MATLAB functions to generate association rules: test and practice
- Basic Clustering Algorithms (5 hours)
  - Lecture (3 hours)
    - Why clustering?
    - Sequential clustering algorithms
    - Partitional clustering algorithms: k-means, bisecting k-means
    - Evaluating the performance of clustering algorithms
  - Exercises and Practice (3 hours)
    - Exercises with clustering algorithms
- Scalability of Algorithms for Big Data (8 hours)
  - Lecture (4 hours)
    - Types of hardware for scaling: scaling up vs. scaling out
    - Hadoop architecture and Map-Reduce algorithms
    - Foundational ideas of Map-Reduce algorithms
    - Simple statistical functions using Map-Reduce formulations
  - Lab and Practice (4 hours)
    - Exercises in designing algorithms using the Map-Reduce paradigm
- Design of Clustering Algorithms for Hadoop (5 hours)
  - Lecture (2.5 hours)
    - K-means algorithm using Map-Reduce
    - Other clustering algorithms using Map-Reduce
  - Lab and Practice (2.5 hours)
    - Exercises using Map-Reduce for clustering
6 hours for evaluation and presentation by participants.
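As an illustration of the topic "simple statistical functions using Map-Reduce formulations", the following is a minimal sketch, not course material. It assumes the data is already split into partitions and uses plain Python in place of a Hadoop cluster; the function names `map_partition`, `combine`, and `mean_and_variance` are hypothetical. Each partition is summarized independently (the map step), and the partial summaries are merged with an associative operation (the reduce step), so the partitions could in principle be processed on different machines.

```python
from functools import reduce

def map_partition(values):
    """Map step: summarize one partition as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(a, b):
    """Reduce step: merge two partial summaries (associative and commutative)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def mean_and_variance(partitions):
    """Compute the mean and population variance from per-partition summaries."""
    n, s, ss = reduce(combine, map(map_partition, partitions), (0, 0.0, 0.0))
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance

# Hypothetical data, split into three partitions as if stored on three nodes.
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0]]
mean, variance = mean_and_variance(partitions)
print(mean, variance)
```

The sum-of-squares formula is chosen here because it needs only one pass and a small fixed-size summary per partition; it can lose precision when the mean is large relative to the spread, so a production implementation might merge running means and squared deviations instead.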
Resource Person:
Prof. Raj K Bhatnagar
Professor of Computer Science
Department of Electrical Engineering and Computing Systems
University of Cincinnati, Cincinnati, OH 45221, USA
email: Raj.Bhatnagar@uc.edu
phone: +1 (513) 556-4932
Course Coordinators:
Dr. Pritee Khanna
Associate Professor, Computer Science and Engineering
PDPM IIITDM Jabalpur
email: pkhanna@iiitdmj.ac.in, priteekh@gmail.com
phone: +91 761 2794222 (O), +91 9425324241 (M)
Dr. Sraban Kumar Mohanty
Assistant Professor, Computer Science and Engineering
PDPM IIITDM Jabalpur
email: sraban@iiitdmj.ac.in, sraban@gmail.com
phone: +91 761 2794224 (O), +91 9425807609 (M)
Contact us:
For course-related queries, kindly write to:
The Course Coordinators
Data Mining Algorithms and their Scalability for Big Data