Mech DAMP Blog

CS635 - Information Retrieval & Mining for Hypertext & the Web

Instructor

Soumen Chakrabarti

Semester

Autumn ’21

Course Difficulty

The course content is moderate in difficulty. The lectures are sufficient for the evaluations, but the reference texts are highly relevant and worth reading for a deeper understanding of the topics covered in class. Easy weekly mini-quizzes are conducted to incentivise students to keep up with the lectures.

Time Commitment Required

6-8 hours per week

Grading Policy and Statistics

AA 6
AB 8
BB 3
II 1
Total 18

Attendance Policy

None

Pre-requisites

No hard prerequisites. Fundamental knowledge of probability theory is expected. Basic knowledge of graph theory would be helpful but it is not mandatory.

Evaluation Scheme

Marking scheme for the course (the total is capped at 100):

60% for the best 2 out of the 3 exams (with a bonus of 7, 5, 3, or 1 marks for scoring >=90%, >=80%, >=70%, or >=60% respectively in the third exam)
10% for MCQ in final
15% for the safe quizzes (best 6 out of total)
15% for the 2 assignments
any in-class extra credit
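
To make the arithmetic of this scheme concrete, here is a rough Python sketch of how a final total might be computed. It rests on several assumptions of mine that the scheme above does not spell out: "third exam" is read as the exam left out of the best two, the best two exams are weighted equally, every component is given as a percentage between 0 and 100, and the quiz percentage is already the best-6 average. The function and parameter names are hypothetical and only for illustration.

```python
def bonus_from_third_exam(pct):
    """Bonus marks for the exam not counted among the best two (assumed reading)."""
    for threshold, bonus in ((90, 7), (80, 5), (70, 3), (60, 1)):
        if pct >= threshold:
            return bonus
    return 0


def course_total(exam_pcts, mcq_pct, quiz_pct, assignment_pct, extra_credit=0.0):
    """Combine the components into a total out of 100 (capped).

    exam_pcts:      percentages scored in the 3 exams
    mcq_pct:        percentage scored in the MCQ part of the final
    quiz_pct:       percentage over the best 6 quizzes (assumed pre-computed)
    assignment_pct: percentage over the 2 assignments
    extra_credit:   any in-class extra credit, in marks
    """
    if len(exam_pcts) != 3:
        raise ValueError("expected exactly 3 exam percentages")

    ranked = sorted(exam_pcts, reverse=True)
    best_two_avg = sum(ranked[:2]) / 2  # assumption: equal weight for both exams

    total = (
        0.60 * best_two_avg
        + 0.10 * mcq_pct
        + 0.15 * quiz_pct
        + 0.15 * assignment_pct
        + bonus_from_third_exam(ranked[2])
        + extra_credit
    )
    return min(total, 100.0)  # the total is capped at 100
```

For example, under these assumptions a student with exam scores of 80%, 70%, and 90% has the 90% and 80% exams counted towards the 60% component, while the 70% exam earns a bonus of 3 marks.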

Topics Covered in the Course

Text indexing and index compression, Relevance ranking (methods and metrics), Similarity search of documents, Corpus models, Document Labeling and Topic Modelling, Learning to Rank documents subject to a query, Measuring and Modeling the Web, Proximity search in graphs, Web Crawling, Monitoring, and Sampling.

Check out the primary reference text to get an idea of the content that will be covered in this course: https://www.cse.iitb.ac.in/~soumen/mining-the-web/

Tutorials/Assignments/Projects

The coding assignments are moderate in level of difficulty and time commitment required, and they give a nice hands-on experience of the topics being covered in class.

Feedback on Exams

Weekly mini-quizzes are low in difficulty and just require you to be up to date with the content covered in lectures. The quizzes, midsem, and endsem range from moderate to slightly difficult, and require a good grasp of the theory.

Motivation for taking this course

You should take this course if you are highly intrigued by most of the following questions:

How is the enormous amount of data on the web organised?
How is the World Wide Web crawled to discover new documents?
How are search systems (like search engines) able to return highly relevant results almost instantly from billions of documents on the web, based on a very sparse query?
How do search engines like Google continue to improve with time? How have they evolved since their inception?

When to take this course?

I took this course in my 7th semester. For people looking to pursue research in this domain in their final year, the 5th semester would be the ideal time to take this course.

Going Forward

CS728 - Organising Web Information can be taken up in the following semester. Though CS728 can be taken as a standalone course, CS635 and CS728 have been designed to be taken in this order for an ideal coverage of relevant topics in this domain, and to prepare you to undertake research in this field.

References Used

Mining the Web - Discovering Knowledge from Hypertext Data (https://www.cse.iitb.ac.in/~soumen/mining-the-web/)

Review By: Shubham Lohiya

23 Jul 2022
