Mech DAMP Blog

ME781 - Engineering Data Mining

Instructor

Asim Tewari

Semester

Autumn ’20

Course Difficulty

Moderately challenging for someone who hasn’t done a basic ML course.
For those who have some background in data science or ML (even from online courses), it should be a walk in the park.

Time Commitment Required

About an hour a week, apart from lectures.

Grading Policy and Statistics

Quite lenient: 140 of the 189 students (about 74%) got a grade of 8 (BB) or above.
3 AP, 43 AA, 50 AB, 44 BB in a class of 189.

Attendance Policy

None

Pre-requisites

Knowledge of Python is a must. Basic probability, vector calculus and linear algebra are also informal prerequisites.

Evaluation Scheme

There was one short quiz a week before midsems, four assignments (three coding-based and one theoretical) and one take-home, coding-based endsem. There was also a very detailed course project, to be done in pre-assigned teams of up to 5.
Final weightages were not disclosed.

Topics Covered in the Course

Emerging trends in Artificial Intelligence and Data Science
Data scales and representation, set theory, similarity and dissimilarity measures, central limit theorem, t-distribution and confidence intervals, p-values, hypothesis testing using simple linear regression, least-squares multivariate linear regression, t-statistics and F-statistics, robust regression, singular value decomposition, ridge and lasso regularization, bias-variance tradeoff, resampling and bootstrapping, model selection, principal component analysis and dimensionality reduction, gradient descent, stochastic gradient descent, mini-batch gradient descent
Logistic regression, Bayesian algorithms, linear discriminant analysis, k-nearest neighbours, confusion matrix, recall and precision, tree-based learning algorithms (random forest, bagging and boosting, pruning), cross-entropy metric
Support vector machines, neural networks, dropout, early stopping, convolutional neural nets for image data, padding and pooling, typical CNN architectures (AlexNet, VGG, Inception, ResNet)
Data pre-processing techniques, K-means clustering, hierarchical clustering

Teaching Style

Lectures were conducted initially on Microsoft Teams and later on Google Meet.
Since the class was very large, the lectures were mostly monologues and not very engaging.

Tutorials/Assignments/Projects

Three of the four assignments were extremely coding-heavy and required students to implement various algorithms in Python (using libraries). The other assignment was mathematical, based on the concepts taught in class.
The project was very well-structured, and the professor expected everyone to put a lot of work into technical as well as non-technical aspects.

Feedback on Exams

The only quiz was mathematical and tested basic knowledge of statistics and probability.
The endsem was conducted in an interesting way. The question paper, a list of questions and tasks to perform, was released on Moodle. The professor had created a portal from which each student could download a personalized data set by requesting an OTP. The coding tasks had to be performed and a report submitted online within 24 hours of requesting the OTP. The portal was open for about a week, so students could attempt the endsem whenever they found it convenient.

Motivation for taking this course

An ML course in Mech?! What’s not to like about it :-D
In all seriousness, I was enthused by the opportunity to do a DE in a field that I was genuinely interested in.

Course Importance

Quite good as an introductory ML course. Certainly less technical than its counterparts in the CSE, CSRE, EE and DS departments, while still giving a decent flavour of a wide range of ML concepts.

Looking for an ML crash course before placements or internships? Not a bad course to take.

When to take this course?

5th semester, which is probably the best time to take it. The professor asked all second-year students to de-register because of the large class size.

Going Forward

There are several advanced ML courses in the CSE, EE and CSRE departments. You can also branch out and explore other fields in the larger domain of artificial intelligence, such as NLP, speech recognition, deep learning and reinforcement learning.

References Used

The Elements of Statistical Learning by Hastie, Tibshirani et al. is quite useful.

ME 781 Review By: Aditya Iyengar

03 Jul 2021