19 Jun: (1) Ass1 marks released. You can check your mark and comments by running classrun -collect ass1. Please contact the marker (wangj@cse) if you have any questions. (2) A copy of the 05s1 exam paper is available at ~cs9318/share/COMP9318-Final-05s1.ps. (Note: it is not accessible from the Web. You need to log in to CSE and then run 'cp ~cs9318/share/COMP9318-Final-05s1.ps .' to copy it to your current directory.) (3) Pre-exam consultation: 24 & 25 June, 1500--1700, K17-508M.
4 Jun: A new entry has been added to the ass1 FAQ page.
2 Jun: (1) We will give a review lecture followed by Q&A this Thu. (2) Tut4 on Thu too. (3) FAQs for ass1 and proj1 updated.
24 May: Check your email for important announcements regarding q1, proj1, tut3, and tut4.
21 May: (1) q1 marks released. You can check your mark and comments by running classrun -collect q1. Please first contact the marker (juanjuan for q1.1 and jianbin for q1.2) if you have any questions. (2) proj1 spec updated; see the Update History for details.
8 May: (1) Proj1 spec online. (2) Tut2 solution online. (3) There is a typo in ass1 spec (“probably” should be “probability”).
7 May: Ass1 spec online.
5 May: We will have tut2 this week. Note that the THU 1600-1700 and 1700-1800 tutorials will remain in the same place (MorvB LG2) (the ones that were displaced in the last two weeks).
30 Apr: (1) tut1 sol online. (2) there was a problem with the give submission system; it was fixed yesterday and you should have no problem submitting your q1. (3) q1 FAQ online.
23 Apr: proj1 spec Part I is online.
15 Apr: (1) We will have tut1 next week (23 Apr). (2) Quiz 1 spec released.
27 Feb: FYI, there is no COMP9318 lecture during Week 0.
24 Feb: You can now take a sneak peek at what you will be implementing in the programming project here.
18 Feb: Rules regarding emails:
All emails should be sent to the course account cs9318@cse.
Strict spam filtering is enforced due to the high volume of spam. Please make sure you use either a CSE or a UNSW email address when emailing cs9318.
18 Feb: Course web site updated.
We expect that students review the corresponding lecture materials and attempt the tutorial questions before coming to tutorials.
I will send an email and post an announcement on the web page to remind you when a tutorial is scheduled for the following week.
| Tutorial | Week | Contents | Solution |
| tut1 | Week 6 | Data Warehousing and Data Preprocessing | sol |
| tut2 | Week 8 | Clustering | sol |
| tut3 | Week 10 | Classification | sol |
| tut4 | Week 12 | Association Rule Mining | sol |
| Specification | Topic(s) | Deadline |
| Quiz 1 (q1 FAQ) (q1 results) | Data Warehousing and OLAP | 6 May, 2009 |
All assignments are individual assignments.
| Specification | Topic(s) | Deadline | Solution |
| Ass1 (ass1 FAQ) | misc | 4 Jun 2009 | sol |
All projects are individual projects.
| Specification | Topic(s) | Deadline |
| Proj1 (proj1 FAQ) | Clustering Search Engine | 5 Jun 2009 |
It is possible for you to propose your own course project. Please contact the Lecturer-in-Charge to discuss this option.
See here.
| Name | Role | Telephone | Email |
| Dr. Wei Wang | Lecturer-in-charge | 9385 7162 | cs9318@cse |
| Yifei Lu | tutor | 9385 7225 | yifeil@cse |
| Jianbin Qin | tutor | 9385 7205 | jqin@cse |
| Juanjuan Wang | tutor | | wangj@cse |
| Ref | Role | Book |
| [HK00] | Textbook | Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers, August 2000. ISBN: 1-55860-489-8 |
| [WF00] | Reference Book | Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Ian H. Witten, Eibe Frank. Morgan Kaufmann, 2000. ISBN: 1558605525. |
Errata of the textbook [HK00]: here.
| Software | Comment |
| Pentaho Mondrian | OLAP Server |
| Pentaho Kettle | ETL toolkit |
| Weka | Data mining toolkit |
9318 Mondrian Server: http://snare09.cse.unsw.edu.au:8080/mondrian-embedded/index.html
| Day | Time | Location |
| Thu | 1800 -- 2100 | Civil Engineering G1 (K-H20-G1) |
| Day | Time | Location |
| MON | 1600 -- 1700 | K17 507 |
| Week | Contents | Reading | Tut/Quiz/Ass/Proj |
| 1 | Course Introduction + Introduction | Chap 1 | |
| 2 | Data Warehousing and OLAP + BUC (updated 9 Apr; fixed the missing (1,1,2) tuple and added two more slides) | Chap 2 + the BUC paper | |
| 3 | Data Warehousing and OLAP + MDX tutorial + Address Data Cleansing | Chap 2 + 9318 Mondrian Server | |
| 4 | Edit Distance + Approx String Join | Chap 3 + Pentaho Kettle | |
| 5 | Data Pre-processing + Similarity Join | Chap 3 | |
| B | | | |
| 6 | IR Preliminaries + Clustering | Chap 8 | tut1 |
| 7 | Clustering + Hierarchical Clustering | Chap 8 | |
| 8 | Clustering + DBScan + Classification | Chap 8 + 7 | tut2 |
| 9 | Classification + Text Classification (optional) | Chap 8 + 7 | |
| 10 | Association Rule Mining + FP-tree | Chap 6 | tut3 |
| 11 | Association Rule Mining + TiVo + WWW 2008 paper | Chap 6 + Reading material | |
| 12 | Review | | tut4 |
Reading list:
Week 1:
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth: From Data Mining to Knowledge Discovery in Databases. AI Magazine 17(3): 37-54 (1996)
Week 2:
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tab, and Sub Totals. Data Mining and Knowledge Discovery. 1(1): 29-53 (1997) [Note: there is an “error” on P6, though.]
Kevin S. Beyer, Raghu Ramakrishnan: Bottom-Up Computation of Sparse and Iceberg CUBEs. SIGMOD Conference 1999: 359-370 [Note: Try to think about a few efficient ways to compute the data cube for large base tables before reading this paper, which talks about an elegant yet highly efficient way to compute the data cube.]
Week 3:
Surajit Chaudhuri, Umeshwar Dayal: An Overview of Data Warehousing and OLAP Technology. SIGMOD Record 26(1): 65-74 (1997).
How many “Wei Wang” are there? See different attempts at
Week 4:
Luis Gravano, Panagiotis G. Ipeirotis, H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Divesh Srivastava: Approximate String Joins in a Database (Almost) for Free. VLDB 2001: 491-500.
Surajit Chaudhuri, Venkatesh Ganti, Raghav Kaushik: A Primitive Operator for Similarity Joins in Data Cleaning. ICDE 2006: 5.
Week 5:
Chuan Xiao, Wei Wang, Xuemin Lin, Jeffrey Xu Yu. Efficient Similarity Joins for Near Duplicate Detection. WWW 2008.
H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286.
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45 (1997).
Week 6:
Week 7:
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231.
Tim Manns: Importance of Data Mining Analytics in Marketing. 2008.
Week 8:
Week 9:
Chap 13 of Introduction to Information Retrieval
Week 10:
Qiankun Zhao, Sourav S. Bhowmick. Association Rule Mining: A Survey. 2003.
Jiawei Han, Jian Pei, Yiwen Yin: Mining Frequent Patterns without Candidate Generation. SIGMOD 2000: 1-12.
Week 11:
Kamal Ali, Wijnand van Stam: TiVo: making show recommendations using a distributed collaborative filtering architecture. KDD 2004: 394-401.
Atsushi Fujii: Modeling anchor text and classifying queries to enhance web document retrieval. WWW 2008: 337-346.