Accommodation

All selected participants will be provided boarding and lodging on the institute premises at nominal charges (as mentioned in the table). No travelling allowance will be paid by the Academy.

Basic Data Mining Algorithms and their Scalability for Big Data

The last date for online registration has been extended to August 10, 2016.

August 16-21, 2016

Overview: The main purpose of this course is to introduce the basic ideas of data mining algorithms and their scalability for Big Data situations. Relatively simple classification and clustering algorithms are presented and analyzed in detail. The basic ideas of Map-Reduce algorithms for asynchronous computing in cloud/Hadoop environments are introduced. The course then presents and analyzes how these simple data mining algorithms need to be redesigned for the Map-Reduce environment.

Objectives: The objectives of this course are to impart knowledge and understanding of the following topics to the participants:

Details of the decision tree induction algorithms and various issues related to their outcomes and performance.
Details of association rule mining algorithms and various issues related to their outcomes and performance.
Sequential and partitional clustering algorithms.
Basics of the Map-Reduce paradigm for designing algorithms, and the application of this paradigm to redesigning the decision tree and association rule induction algorithms.

Every session will be followed by lab and practice assignments (MATLAB and R programming).

Topic List:

Introduction to Data Mining (Total 4 hours)

Lecture (2 hours)

Applications of, and need for, data mining algorithms
Scalability issues for data mining tasks
Relationship to other fields such as statistics and machine learning
Different types of data: Relations, Graphs, Sequences, and Text
Types and nature of patterns and knowledge to be discovered in data

Lab and Practice (2 hours)

Practice with representation and processing of data in MATLAB

Classification Algorithms: Decision Trees (6 hours)

Lecture (3 hours)

What is a decision tree, and how does it work?
Algorithms for inducing decision trees from data
Characteristics of decision tree induction algorithms
Overfitting and underfitting
Evaluating the performance of a decision tree
Applications and Real life cases
Learning of tree ensembles
Induction of Decision Trees for Big Data: Issues of performance and algorithms

Lab and Practice (3 hours)

Build decision trees from test datasets using MATLAB functions

Association Analysis (6 hours)

Lecture (3 hours)

What are association rules?
Apriori principle for frequent itemset generation
Association Rule Generation
Rule metrics: support, confidence, lift, etc.
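
As a rough illustration of the three metrics named above, the following Python sketch computes support, confidence, and lift for a single rule. It is not course material: the transactions and the helper names (`support`, `confidence`, `lift`) are invented for this example, and the course labs themselves use MATLAB and R.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent: Iterable[str], consequent: Iterable[str],
         transactions: List[FrozenSet[str]]) -> float:
    """Confidence divided by the consequent's baseline support."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


if __name__ == "__main__":
    rule = ({"bread"}, {"milk"})
    print("support   :", support(rule[0] | rule[1], TRANSACTIONS))
    print("confidence:", confidence(*rule, TRANSACTIONS))
    print("lift      :", lift(*rule, TRANSACTIONS))
```

A lift below 1, as for bread -> milk in this toy data, suggests the two items co-occur less often than they would if they were independent.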

Lab and Practice (3 hours)

MATLAB functions to generate association rules: test and practice

Basic Clustering Algorithms (5 hours)

Lecture (3 hours)

Why clustering?
Sequential Clustering Algorithms
Partitional Clustering Algorithms: K-means, bisecting k-means
Evaluating performance of clustering algorithms

Exercises and Practice (3 hours)

Exercises with clustering algorithms

Scalability of Algorithms for Big Data (8 hours)

Lecture (4 hours)

Types of Hardware for scaling: Scaling Up vs. Scaling Out
Hadoop Architecture and Map-Reduce Algorithms
Foundational ideas of Map-Reduce Algorithms
Simple Statistical Functions using Map-Reduce Formulations
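
To give a flavour of what a Map-Reduce formulation of a simple statistical function might look like, here is a minimal single-machine Python sketch of mean and variance. It assumes the data is already split into partitions; the names `map_partition`, `combine`, and `mean_and_variance` are hypothetical, and no Hadoop API is used. Each partition is mapped to a small summary (count, sum, sum of squares), and the summaries are merged with an associative reduce step, so partitions could in principle be processed independently.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Per-partition summary: (count, sum, sum of squares).
Stats = Tuple[int, float, float]


def map_partition(values: Iterable[float]) -> Stats:
    """Map step: summarise one partition without looking at any other."""
    n, s, sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        sq += v * v
    return n, s, sq


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: merging is associative, so summaries can arrive in any order."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def mean_and_variance(partitions: List[List[float]]) -> Tuple[float, float]:
    """Return the mean and population variance over all partitions."""
    n, s, sq = reduce(combine, map(map_partition, partitions), (0, 0.0, 0.0))
    if n == 0:
        raise ValueError("no data")
    mean = s / n
    # The sum-of-squares formula is simple but can lose precision on large values.
    variance = sq / n - mean * mean
    return mean, variance


if __name__ == "__main__":
    print(mean_and_variance([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))
```

The design point is that only the three-number summaries travel between workers, never the raw values.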

Lab and Practice (4 hours)

Exercises in designing algorithms using MapReduce Paradigm

Design of Clustering Algorithms for Hadoop (5 hours)

Lecture (2.5 hours)

K-means algorithm using Map-Reduce
Other clustering algorithms using Map-Reduce
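
One common way to express a single K-means iteration in Map-Reduce terms is sketched below in plain Python. This is an illustrative assumption, not the formulation taught in the course: the function names (`nearest`, `map_phase`, `reduce_phase`, `kmeans_step`) are hypothetical, and everything runs in one process rather than on Hadoop. The map phase emits each point keyed by its nearest centroid; the reduce phase averages the points under each key to produce the updated centroids.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: Sequence[Point],
              centroids: Sequence[Point]) -> List[Tuple[int, Point]]:
    """Map: emit (cluster index, point) for every input point."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_phase(pairs: List[Tuple[int, Point]],
                 centroids: Sequence[Point]) -> List[Point]:
    """Reduce: average the points per key; an empty cluster keeps its old centroid."""
    sums: Dict[int, Point] = {}
    counts: Dict[int, int] = defaultdict(int)
    for idx, p in pairs:
        sums[idx] = tuple(a + b for a, b in zip(sums[idx], p)) if idx in sums else p
        counts[idx] += 1
    return [
        tuple(x / counts[i] for x in sums[i]) if i in sums else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans_step(points: Sequence[Point],
                centroids: Sequence[Point]) -> List[Point]:
    """One full iteration: map, then reduce."""
    return reduce_phase(map_phase(points, centroids), centroids)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_step(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real cluster the iteration would be repeated, with the new centroids broadcast to all mappers, until the centroids stop moving.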

Lab and Practice (2.5 hours)

Exercises using MapReduce for clustering

6 hours for evaluation and presentation by participants.

Resource Person:
Raj K Bhatnagar
Professor of Computer Science
Department of Electrical Engineering and Computing Systems
University of Cincinnati, Cincinnati, OH 45221, USA
Raj.Bhatnagar@uc.edu
+1 (513) 556-4932

Prof. Raj Bhatnagar is Professor of Computer Science at the University of Cincinnati, Ohio, USA. His area of research is data mining and pattern recognition, and he has worked on problems in this area for more than twenty-five years. His research projects have been funded by NSF, the US Air Force, US DARPA, and a number of industrial sponsors. He has supervised graduate students for eleven Ph.D. dissertations and seventy M.S. theses. His recent research projects include the design of mining and analysis algorithms for Big Data situations in biomedical, manufacturing, GIS, and security applications. These problems have involved various types of structured and unstructured data. He has published more than eighty peer-reviewed publications. He has designed and taught graduate-level classes on Data Mining, Big Data Analysis, and Artificial Intelligence. He recently published three papers at the IEEE International Conference on Big Data (October 2015) and delivered a 3.5-hour tutorial on Design of Analytics Algorithms for Big Data at the Big Data Analytics 2015 (BDA2015) conference held in Hyderabad in December 2015.

Course Coordinators:
Dr. Pritee Khanna
Associate Professor, Computer Science and Engineering
PDPM IIITDM Jabalpur
email: pkhanna@iiitdmj.ac.in, priteekh@gmail.com
phone: +91761 2794222 (O), +919425324241 (M)

Dr. Sraban Kumar Mohanty
Assistant Professor, Computer Science and Engineering
PDPM IIITDM Jabalpur
email: sraban@iiitdmj.ac.in, sraban@gmail.com
phone: +91761 2794224 (O), +919425807609 (M)

Contact us:
For course-related queries, kindly write to:
The Course Coordinators
Data Mining Algorithms and their Scalability for Big Data

datamining@iiitdmj.ac.in