Sat Sept 2 11:07:20 EST 2000
A Graduate Seminar in
Knowledge Discovery and Data Mining
Course Information - Topics
Current technology has made available enormous amounts of data. Many
organizations are creating huge databases of business data, such as consumer
data, transaction histories, etc. Scientists and engineers in many fields are
capturing increasingly complex experimental data sets. But why do people
store so much data?
The main objective is to extract (or mine) interesting patterns, associations,
rules, changes, anomalies, and general regularities from the data to improve
the process of decision making.
In this course we will study the tools needed for efficient inference of
these types of knowledge from massive data sets.
Topics covered include:
- association-rule mining, sequence mining, web and text mining,
- data warehousing,
- information filtering,
- classification and clustering analysis,
- Bayesian and neural networks,
- classification and regression trees,
- hypotheses evaluation,
- feature extraction,
- dimensionality reduction, singular value decomposition,
- data compression and reconstruction,
- visualization of large data sets,
- fractals in databases, and
- indexing methods that support efficient data mining and queries by
Special emphasis will be given to multimedia/medical databases.
- Vasilis Megalooikonomou, email: email@example.com
- Office: 314 Wachman Hall, phone: 215-204-5774
- Office Hours: M 3-5pm, Tu 1-2pm, or by appointment
- The course will be run in a research seminar format as a mixture of
lectures, student presentations, discussions, and research projects.
- Meets: Tu 7:25-9:55pm?, room: Tuttleman Learning Center 302
- programming skills in C and/or C++
- basic knowledge of database systems
- basic statistics, graph theory, and linear algebra
The mailing list cis595-F00 has been created for the class and it will be
used to send messages about the presentations, projects, and other
topics. I expect you to read your email regularly. Otherwise you may miss
important
The web page for the course is
http://www.cis.temple.edu/~vasilis/Courses/CIS595. This page
will contain handouts and other information related to CIS 595.
Han and Kamber, Data Mining: Concepts and Techniques, Morgan
Kaufmann Publishers, Aug. 2000.
Christos Faloutsos, Searching Multimedia Databases by Content, Kluwer
Academic Publishers, 1996.
Tom Mitchell, Machine Learning, McGraw Hill, 1997.
K. J. Cios, W. Pedrycz, R. Swiniarski (eds.), Data Mining Methods for
Knowledge Discovery, Kluwer Academic Publishers, 1998.
- Other recommended texts:
A. Silberschatz, H.F. Korth, and S. Sudarshan, Database System Concepts,
3rd edition, McGraw Hill Inc.
W. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical
Recipes in C, Cambridge University Press, 1992.
Michael Stonebraker, Readings in Database Systems.
Presentation
- Each student will present at least one research paper to the rest of
the class. There will be a list of papers to choose from.
Homework, class preparation and participation
- Before each paper presentation, students are expected to read the
assigned paper(s) and prepare a one-page summary (homework) of the
paper(s), along with comments and criticisms. During the presentations,
students are expected to participate in and, better still, lead the
discussions. The summaries and participation will be graded.
Project
- The course load involves a project. Students will have the opportunity
to acquire hands-on experience with practical databases and real-world
data mining problems and to demonstrate their data-mining skills in the
context of a focused project under close faculty supervision. There will
be a variety of suggested projects to choose from. Students are expected to
write a final report/paper on their projects and present their work in
class. The projects will be carried out in teams of 2 and may lead to
Method of evaluation
- Presentation: 25%
- Homework, class preparation and participation: 25%
- Project: 50%