1. Announcement

  • 31 Oct: Project 1 is now released. Please download the files from Teams → COMP6714-21T3 → General → Files → project.

  • 27 Oct: Assignment 1 is now released. Please see the Assignment section for details.

  • 15 Nov*: The deadline for the assignment is now postponed to 17 Nov, 20:59:59.

  • 15 Nov*: A set of test cases for the project has been uploaded to Teams. Please try them if you want.

  • 22 Sep: If you have recently enrolled in this course (e.g., after Week 1), please check that: 1) you have been added to the COMP6714 group in MS Teams, and 2) you have been enrolled in Piazza. Please contact the LiC by email if you are missing from either of these groups.

  • 15 Sep: The first lecture will start at 1pm on 15 Sep. Please use MS Teams to attend the online lecture. You should have received the invitation by email. It is fine if you cannot attend the lecture; we will put the recording online afterwards. Also, please activate your Piazza account if you have not done so yet, as we will use Piazza as the course forum this semester.

  • 6 Sep: Course web site online. Please note that (1) we will be using Python 3 as the programming language for the course. If you are not familiar with Python or other similar languages (e.g., Ruby), please start learning Python as early as possible. We will also be using Jupyter notebook. (2) You are strongly advised to install the Anaconda distribution: https://www.anaconda.com/products/individual and use it to manage your Python environments. (3) https://cocalc.com offers online Jupyter notebooks if you want to try them out without installing a local copy. (4) You can search for good Python and Jupyter notebook tutorials yourself, or use these ones:

2. Course Forum

3. Written Assignments

All assignments are individual assignments.

| Specification | Topic(s)    | Deadline    |
|---------------|-------------|-------------|
| Assignment 1  | Till week 8 | 15 Nov 2021 |

4. Programming Projects

The project is individual work.

| Specification | Topic(s)                             | Deadline    |
|---------------|--------------------------------------|-------------|
| Project 1     | Group Varint Encoding and Evaluation | 19 Nov 2021 |

5. About the Course

5.1. Detailed Course Introduction

See here.

5.2. Staff

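| Name            | Role               | Email                                                                      |
|-----------------|--------------------|----------------------------------------------------------------------------|
| Dr. Yifang Sun  | Lecturer-in-charge | yifangs AT cse.unsw.edu.au (Only for issues not resolvable over the forum) |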

5.3. Textbook and Reference Books

| Ref     | Role      | Book |
|---------|-----------|------|
| [MRS08] | Textbook  | Christopher D. Manning, Prabhakar Raghavan and Hinrich Schutze, Introduction to Information Retrieval, Cambridge University Press. 2008. |
| [CMS09] | Textbook  | W. Bruce Croft, Donald Metzler, and Trevor Strohman, Search Engines: Information Retrieval in Practice, Pearson. 2009. |
| [JM19]  | Textbook  | Dan Jurafsky, James H. Martin, Speech and Language Processing (3rd ed. draft). 2019. |
| [BCC10] | Reference | Stefan Buettcher, Charles L. A. Clarke, Gordon V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, The MIT Press. 2010. |

5.4. Lecture Time

NOTE: all lectures are online.

| Day | Time      | Location |
|-----|-----------|----------|
| Wed | 1300-1500 | Online   |
| Fri | 1100-1300 | Online   |

5.5. Consultation Time

  • We will mainly use Piazza for collaborative Q&A.

  • No physical consultation will be offered. Instead, we offer online consultation via Zoom:

    • Weekly consultation session by the tutors: time and room to be announced

    • If you want a private online consultation with me, please book an appointment by email with a brief description of your questions (yifangs@cse, with the keyword [COMP6714] in the Subject).

6. Syllabus

| Week | Contents                             | Reading                               | Assignment/Project |
|------|--------------------------------------|---------------------------------------|--------------------|
| 1-a  | Course Info + Introduction           | CMS Chap 1, MRS Chaps 1, 2            |                    |
| 1-b  | Boolean Retrieval                    | CMS Chap 1, MRS Chaps 1, 2            |                    |
| 2-a  | Boolean Retrieval                    | CMS Chap 4, 5, MRS Chaps 2, 3         |                    |
| 2-b  | Preprocessing                        | CMS Chap 3, 4, 5, MRS Chaps 4, 5      |                    |
| 3-a  | Tolerant Retrieval                   | CMS Chap 3, 5, MRS Chaps 4, 5         |                    |
| 3-b  | Spelling Correction                  | CMS Chap 3, 5, MRS Chaps 4, 5         |                    |
| 4-a  | Index Construction                   | CMS Chap 3, 5, MRS Chaps 4, 5         |                    |
| 4-b  | Compression                          | MRS Chaps 5                           |                    |
| 5-a  | Compression                          | MRS Chaps 5                           |                    |
| 5-b  | Compression + Vector Space Model     | MRS Chaps 5                           |                    |
| 7-a  | Vector Space Model                   | MRS Chaps 5, 6, 7 + CMS Chap 4, 5     |                    |
| 7-b  | Query Processing in VSM              | MRS Chaps 5, 6, 7 + CMS Chap 4, 5     |                    |
| 8-a  | Query Processing in VSM + Evaluation | MRS Chaps 5, 6, 7 + CMS Chap 4, 5, 7  |                    |
| 8-b  | Evaluation + Web Characteristics     | MRS Chaps 8, 16                       |                    |
| 9-a  | Web Characteristics + Crawling       | MRS Chaps 16, 17                      |                    |
| 9-b  | Crawling + Link Analysis             | MRS Chaps 17, 21                      |                    |
| 10-a | Link Analysis + Review               | MRS Chaps 21                          |                    |

6.1. Jupyter Notebooks

The easiest way to get the notebooks is to use git. Once you have installed the git command line tool (e.g., on OS X, install Homebrew and then run brew install git):

  • (only once) go to a folder, and type git clone https://github.com/DBWangGroupUNSW/COMP6714.git

  • to sync with our repo, just type git pull inside the cloned folder.
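The two git steps above can also be scripted from Python, the course language. The following is only a minimal sketch, not part of the course materials: the helper names sync_command and sync, and the default folder name COMP6714, are assumptions made for illustration. It picks git clone when the folder has not been cloned yet and git pull otherwise.

```python
import subprocess
from pathlib import Path

# Repository URL taken from the instructions above.
REPO_URL = "https://github.com/DBWangGroupUNSW/COMP6714.git"


def sync_command(dest):
    """Return the git command that brings `dest` up to date with the course repo.

    Hypothetical helper: it only builds the command and does not run it.
    """
    dest = Path(dest)
    if (dest / ".git").is_dir():
        # Already cloned: pull the latest changes inside the existing repo.
        return ["git", "-C", str(dest), "pull"]
    # Not cloned yet: clone the repo into `dest` (needed only once).
    return ["git", "clone", REPO_URL, str(dest)]


def sync(dest="COMP6714"):
    """Run the chosen command; raises CalledProcessError if git fails."""
    subprocess.run(sync_command(dest), check=True)
```

Calling sync() from the folder where you keep your course files would clone the repo on the first run and pull updates on later runs. It assumes git is already installed and on your PATH.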


UNSW (The University of New South Wales) CRICOS Provider Number: 00098G