Spring 2007
Course Information - Topics
Current technology has made available enormous amounts of data. Many
organizations are creating huge databases of business data, such as consumer
data, transaction histories, etc. Scientists and engineers in many fields are
capturing increasingly complex experimental data sets. But why do people
store so much data?
The main objective is to extract (or mine) interesting patterns, associations,
rules, changes, anomalies, and general regularities from the data to improve
the process of decision making.
In this course we will study the tools needed for efficient inference of
these types of knowledge from massive data sets.
Topics covered include:
- data preprocessing,
- data warehousing,
- information filtering,
- mining frequent patterns, associations and correlations,
- stream, time series and sequence data mining,
- spatial, multimedia, web and text mining,
- classification and prediction,
- cluster analysis,
- Bayesian and neural networks,
- classification and regression trees,
- hypotheses evaluation,
- feature extraction,
- dimensionality reduction, singular value decomposition,
- data compression and reconstruction,
- visualization of large data sets,
- fractals in databases, and
- indexing methods that support efficient data mining and queries by
content.
Special emphasis will be given to multimedia, business, scientific,
and medical databases. A tentative outline of the course is also available.
Instructor
- Vasilis Megalooikonomou, email: vasilis AT temple DOT edu
- Office: 314 Wachman Hall, phone: 215-204-5774
- Office Hours: W 3-4pm, Th 11am-12 noon, or by appointment
Class
- Meets: W 4:40-7:10pm, room TL403B (each class has two parts: 4:40-5:50 and 6:00-7:10)
Prerequisites
Basic knowledge of database systems (CIS616/661 or permission of the
instructor), programming experience, and some preliminary background in
statistics, linear algebra, and data structures.
Text
- Required:
Han and Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann
Publishers, Second Edition, 2006.
- Optional:
Christos Faloutsos, Searching Multimedia Databases by Content, Kluwer
Academic Press, 1996.
Tom Mitchell, Machine Learning, McGraw Hill, 1997.
K. J. Cios, W. Pedrycz, R. Swiniarski (eds.), Data Mining Methods for
Knowledge Discovery, Kluwer Academic Press, 1998.
Presentations
Each student will make a 15-20 minute presentation of a research paper to
the rest of the class. Presentations count for 15%
(quality of slides: 7% + presentation: 8%) of the final grade.
There will be a list of papers to choose from.
Please send me an email to sign up for a presentation by Feb. 8.
The presentation slides are due one day before the presentation.
Method of evaluation
- Project: 45%
- Paper presentation: 15%
- Homework and class participation: 15%
- Midterm exam: 10%
- Final exam: 15%
Late policy: The project parts and homework assignments are
due in class on the specified due date. No late submissions will be
accepted. For fairness, this policy will be strictly enforced.
Exams: All aids are allowed (open books, open notes, calculators,
etc.).
Project
- The course load includes a project. Students will have the opportunity
to acquire hands-on experience with practical databases and real-world
data mining problems and to demonstrate their data mining skills in the context of
a focused project under close faculty supervision. There will be a
variety of suggested projects to choose from. A research project
should be selected by Feb. 9. The project proposal is due
Feb. 14. Students are expected to write a final report/paper on their
project and present their work in class (10-15 minute presentation) during
the last week of classes. The projects will be carried out in teams of 2 and
may lead to publications. [Project presentation: 5% + (Report, Code, Testing, Documentation): 40%]
Important Dates
- First class: January 17.
- Last day to drop the course (tuition refund available): January 29.
- Last day to withdraw (no refund): March 26.
- Spring break: March 4-11.
- Last class: April 25.
Honor Code
- All work submitted for credit must be your own.
- You may discuss the homework problems with your classmates,
the teaching assistant, and the instructor. You must acknowledge
the people with whom you discussed your work, and you must write
up your own solutions and code. Any written sources (apart from the text)
used must also be acknowledged; however, you may not consult
any solutions from previous years' assignments, whether they are
student- or faculty-generated.
- Plagiarism will be handled with severe measures.
- Please ask if you have any questions about the Honor Code.
Violations of the honor code will be treated seriously. Please check
the Temple University policy on Plagiarism and Academic Cheating.
Disabilities
I encourage students with disabilities, including "invisible" disabilities
such as chronic diseases and learning disabilities, to discuss with me any
appropriate accommodations that I might make on their behalf. Students must
provide me with a note from the office of Disability Resources and Services,
in 100 Ritter Annex, 215-204-1280, regarding their disability.