This file contains information about all datasets in the archive.

Name                      Size        URL / source
sunspot.dat               2899x1      http://xweb.nrl.navy.mil/timeseries/multi.diskette
darwin.dat                1400x1      http://www.stat.duke.edu/~mw/ts_data_sets.html
spot_exrates.dat          2567x12     http://www.stat.duke.edu/data-sets/mw/ts_data/all_exrates.html
power_data.dat            35040x1     http://www.win.tue.nl/~vanwijk/clv.pdf
EEG_heart_rate.dat        7200x2      http://reylab.bidmc.harvard.edu/DynaDx/case-study/seizure/menu.html
ERP_data                  31x6400     http://www.cnl.salk.edu/~scott/ica-download-form.html
pgt50_alpha.dat           990x18      http://arep.med.harvard.edu/timewarp/supplement.htm
pgt50_cdc15.dat           990x24      http://arep.med.harvard.edu/timewarp/supplement.htm
shuttle.dat               1000x6      http://www-aig.jpl.nasa.gov/public/mls/time-series/
tickwise.dat              279113x1    http://www.stern.nyu.edu/~aweigend/Time-Series/Data/
water.dat                 2191x3      ftp://ftp.stat.duke.edu/pub/bats/
chaotic.dat               1800x1      http://cns-web.bu.edu/pub/cn550/project98/data/time_series_specification.html
tide.dat                  8746x1      http://lib.stat.cmu.edu/jasadata/percival-m
ocean.dat                 4096x1      http://www.ms.washington.edu/courses/stat530/data/msp-data-4096
steamgen.dat              9600x4      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
cstr.dat                  7500x3      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
winding.dat               2500x7      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
dryer2.dat                867x6       http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
phdata.dat                2001x3      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
evaporator.dat            6305x6      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
powerplant.dat            2400x1      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
glassfurnace.dat          1247x9      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
flutter.dat               1024x2      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
robot_arm.dat             1024x2      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
foetal_ecg.dat            2500x9      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
tongue.dat                50x14       http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
ballbeam.dat              1000x2      http://www.esat.kuleuven.ac.be/~tokka/daisydata.html
wind.dat                  6574x15     http://lib.stat.cmu.edu/datasets/
balloon.dat               2001x2      http://lib.stat.cmu.edu/datasets/
wool.dat                  310x9       http://lib.stat.cmu.edu/datasets/
standardandpoor500.dat    17610x2     http://lib.stat.cmu.edu/datasets/
speech.dat                1020x1      http://lib.stat.cmu.edu/general/tsa/tsa.html
earthquake.dat            4096x1      http://lib.stat.cmu.edu/general/tsa/tsa.html
soiltemp.dat              2304x1      http://lib.stat.cmu.edu/general/tsa/tsa.html
buoy_sensor.dat           13991x4     http://ccs.ucsd.edu/zoo/
                                      http://ccs.ucsd.edu/zoo/camp/current_meters/first_set/
infrasound_beamd.dat      8192x1      http://lib.stat.cmu.edu/general/tsa/tsa.html
network.dat               18000x1
burst.dat                 9382x1
eeg.dat                   512x22
koski_ecg.dat             44002x1     http://www2.cs.utu.fi/staff/antti.koski/abs.html
random_walk.dat           65536x1
synthetic.dat             100001x10   http://kdd.ics.uci.edu/
greatlakes                984x5       {Ivan Popivanow}
leleccum                  4320x1      A test time series that comes free with MATLAB
attas.dat                 1024x2      {Frank Hoppner}
packets                   360000x1    From M Faloutsos
burstin                   50000x1     http://cs.nyu.edu/cs/faculty/shasha/papers/burst.d/burst.html
phone1.txt                1708x8      http://www.teco.edu/tea/datasets/phone1.xls
motorCurrent.dat          420x1500    http://povinelli.eece.mu.edu/
dataset_kalpakis.zip
  ECGdata                             http://www.csee.umbc.edu/~kalpakis/TS-mining/ts-datasets.html
  Income data                         Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta,
  Population data                     "Distance Measures for Effective Clustering of ARIMA Time-Series".
  Temp data                           In the Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01)
Fluid_dynamics.dat        10000x1     http://www.cs.utah.edu/techreports/2003/pdf/UUCS-03-021.pdf
GunX                      100x151     http://www.cs.ucr.edu/~eamonn/
Trace                     200x275     http://www.cs.ucr.edu/~eamonn/
Leaf_all                  442x150
stocks_sigmod2004_paper   108 megs    Huanmei Wu
TWO-PAT dataset           5000x128    http://www.montefiore.ulg.ac.be/~geurts/thesis.html
laser                     1000x1      http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
Physiological_data_B1     17000x3     http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
Physiological_data_B2     17000x3     http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
Astrophysical data        27704x1     http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html
ECG_znorm205.txt          56x206
cam_mouse_***             2x1719
Nasa_valve                {archive}   http://cs.fit.edu/~mmahoney/nasa/
HydroData                             A few short River-Level datasets (Nile, Senegal).
                                      Probably too short for data miners...
gait_time                             Gait in Aging and Disease Database
physiodata


DETAILED NOTES

--------------------------------------------------------------------------------
SUNSPOT: Every month from Jan 1749 to July 1990

SOLAR CYCLES IN SUNSPOT NUMBERS
From: http://xweb.nrl.navy.mil/timeseries/multi.diskette

Periodicities in sunspot numbers are well established, with periods of 11 years
and 27 days. The 11 year cycle is the period at which the Sun's magnetic field
reverses and regenerates itself. The 27 day period is the Sun's rotation
period. Establishing the existence of other real periods in solar observational
data has long been of interest because such clues provide insight into the
mechanisms of solar variability.
--------------------------------------------------------------------------------
DARWIN: Monthly values of the Darwin SLP series, from 1882 to 1998. This series
is a key indicator of climatological patterns and has been used in a range of
studies related to El Nino and the SOI.
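The SUNSPOT size in the index (2899x1) can be cross-checked against the stated date range. The following is a minimal sketch, assuming one value per calendar month with no gaps and rows in chronological order; neither is stated explicitly in these notes, and the function names are illustrative only.

```python
def months_inclusive(start_year, start_month, end_year, end_month):
    """Count calendar months from start to end, including both endpoints."""
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def month_labels(start_year, start_month, count):
    """Yield (year, month) pairs for `count` consecutive months.

    Assumption: the series has exactly one row per month and no missing
    months, so row i corresponds to the i-th month after the start.
    """
    first = start_year * 12 + (start_month - 1)
    for i in range(count):
        year, month_index = divmod(first + i, 12)
        yield year, month_index + 1


if __name__ == "__main__":
    n = months_inclusive(1749, 1, 1990, 7)
    labels = list(month_labels(1749, 1, n))
    print(n, labels[0], labels[-1])
```

Under those assumptions the month count agrees with the 2899 rows listed for sunspot.dat, so `month_labels` can serve as a time axis for the file.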
--------------------------------------------------------------------------------
SPOT_EXRATES: Two files contain the spot prices (foreign currency in dollars)
and the returns for daily exchange rates of the following currencies relative
to the US dollar:

  AUD  Australian Dollar
  BEF  Belgian Franc
  CAD  Canadian Dollar
  FRF  French Franc
  DEM  German Mark
  JPY  Japanese Yen
  NLG  Dutch Guilder
  NZD  New Zealand Dollar
  ESP  Spanish Peseta
  SEK  Swedish Krona
  CHF  Swiss Franc
  GBP  UK Pound

There are 2567 (work-)daily spot prices, and so 2566 daily returns, for each of
these 12 currencies, over a period of about 10 years: 10/9/86 to 8/9/96.
--------------------------------------------------------------------------------
POWER_DATA: Cluster and Calendar-based Visualization of Time Series Data
http://www.win.tue.nl/~vanwijk/clv.pdf

"I appreciate your interest in our paper and time series analysis tool CLVIEW.
Of course we are willing to contribute to your collection of tools.
Please find attached the file with time series data. It is an ascii file
containing the 15 minutes averaged values of power demand for our research
center in the full year 1997. The first value is at 00.15 am, January 1st, 1997.
These data were used for our InfoVis 1999 paper. You can include the file in
your web site.
Regards, Edward van Selow"
--------------------------------------------------------------------------------
EEG_heart_rate.dat: Time is in the first column, heart rate in the second.
At time zero there was an epileptic seizure.
--------------------------------------------------------------------------------
ERP_data: Grand mean ERP data analyzed in:

  Makeig, S., Westerfield, M., Townsend, J., Jung, T-P, Courchesne, E. and
  Sejnowski, T. J., Functionally independent components of early event-related
  potentials in a visual spatial attention task. Philosophical Transactions of
  the Royal Society: Biological Sciences 354:1135-44, 1999.

Notice: These files are copyright 1999, Eric Courchesne, La Jolla CA.
The data were collected by Marissa Westerfield and Jeanne Townsend in the
laboratory of Dr. Eric Courchesne, UCSD. All rights to use of this data other
than for personal educational use are reserved. No other uses are permitted
without the written permission of Dr. Courchesne
(eric@nodulus.extern.ucsd.edu).

Data structure: Grand mean visual evoked responses in 25 task conditions,
                averaging data from 20 subjects.
Data matrix:    31 channels by 256 points by 25 conditions
                [size data = (31,25*256)]
Epoch offset:   -100 ms to 900 ms w.r.t. stimulus onset
Sampling rate:  256 Hz
Units:          uV
--------------------------------------------------------------------------------
pgt50_alpha.dat and pgt50_cdc15.dat

Aach, J and Church, GM (2001) Aligning gene expression time series with time
warping algorithms. Bioinformatics 17:495-508.

These files contain gene expression time series (one gene per row).
The two *.dat files contain just the data; the *.lxs files also contain the
gene names.
--------------------------------------------------------------------------------
shuttle.dat

This dataset has been used in many of the time series papers by Eamonn Keogh
et al.
--------------------------------------------------------------------------------
Tickwise.dat

Original file name was SFR-USD.Tickwise.gz
--------------------------------------------------------------------------------
water.dat

ftp://ftp.stat.duke.edu/pub/bats/

Dozens of time series used in the BATS software and Bayesian time series
analysis and forecasting books are available at the ISDS BATS ftp site
http://www.stat.duke.edu/~mw/ts_data_sets.html
--------------------------------------------------------------------------------
chaotic.dat

The time series f(t) to be predicted is a modification of the official data set
for the Time Series Prediction Competition to be held at Katholieke
Universiteit in Leuven, Belgium, July 8-10, 1998.
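For the ERP_data entry earlier in these notes, the 31x6400 matrix can be cut back into per-condition epochs. The sketch below assumes that each channel row stores the 25 conditions back to back, 256 samples per condition, which is how the notation "size data = (31,25*256)" reads but is not confirmed here. It also assumes the first sample of each epoch sits at -100 ms. The constant and function names are hypothetical.

```python
N_CHANNELS = 31          # from the notes: 31 channels
N_POINTS = 256           # from the notes: 256 points per epoch
N_CONDITIONS = 25        # from the notes: 25 task conditions
SAMPLING_RATE_HZ = 256.0
EPOCH_START_MS = -100.0  # assumption: first sample is at the epoch start


def split_conditions(channel_row):
    """Split one channel (6400 values) into 25 epochs of 256 values each.

    Assumption: conditions are concatenated in blocks, condition 0 first.
    """
    expected = N_POINTS * N_CONDITIONS
    if len(channel_row) != expected:
        raise ValueError(f"expected {expected} values, got {len(channel_row)}")
    return [channel_row[c * N_POINTS:(c + 1) * N_POINTS]
            for c in range(N_CONDITIONS)]


def sample_times_ms():
    """Time of each sample in an epoch, in ms relative to stimulus onset."""
    step = 1000.0 / SAMPLING_RATE_HZ
    return [EPOCH_START_MS + k * step for k in range(N_POINTS)]


if __name__ == "__main__":
    fake_channel = list(range(N_POINTS * N_CONDITIONS))  # placeholder data
    epochs = split_conditions(fake_channel)
    print(len(epochs), len(epochs[0]), sample_times_ms()[:3])
```

If the file turns out to interleave conditions instead of concatenating them, only `split_conditions` needs to change; the time axis is independent of the layout.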
--------------------------------------------------------------------------------
tide.dat

(*) Time series used in "Analysis of Subtidal Coastal Sea Level Fluctuations
    Using Wavelets" by D. B. Percival and H. O. Mofjeld, JASA September 1997,
    Vol. 92, No. 439, 868-880.

(*) The time series is a low-passed version of hourly water level observations
    (subsampled from 6-min data) made by the NOAA tide gauge at Crescent City,
    California (41 deg 44.7 min North, 124 deg 11.0 min West). The water level
    observations were made inside the stilling well of the tide gauge to
    eliminate high-frequency wind waves and swell. After the data were
    demeaned, a Kaiser low-pass filter was used to remove the tides from the
    hourly observations. The series was then subsampled every 0.5 days. While
    the original water levels were recorded in feet, they were converted to
    centimeters during processing. Since these observations were collected and
    processed by US Federal employees supported by Federal funds, the series
    may be used freely without regard to copyright restrictions. Questions
    concerning the series should be sent to Hal Mofjeld
    (mofjeld@pmel.noaa.gov).
--------------------------------------------------------------------------------
ocean_shear.dat

(the sampling interval for this series is 0.1 meters; the starting depth is
350.0 meters; and the last depth is 1037.4 meters)
--------------------------------------------------------------------------------
steamgen.dat

[98-003] Model of a steam generator at Abbott Power Plant in Champaign IL

Contributed by:
  Jairo Espinosa
  ESAT-SISTA KULEUVEN
  Kardinaal Mercierlaan 94
  B-3001 Heverlee Belgium
  jairo.espinosa@esat.kuleuven.ac.be
Description:
  The data comes from a model of a Steam Generator at Abbott Power Plant in
  Champaign IL. The model is described in the paper of Pellegrinetti [1].
Sampling: 3 sec
Number: 9600
Inputs:
  u1: Fuel scaled 0-1
  u2: Air scaled 0-1
  u3: Reference level inches
  u4: Disturbance defined by the load level
Outputs:
  y1: Drum pressure PSI
  y2: Excess Oxygen in exhaust gases %
  y3: Level of water in the drum
  y4: Steam Flow Kg./s
References:
  [1] G. Pellegrinetti and J. Bentsman, Nonlinear Control Oriented Boiler
      Modeling - A Benchmark Problem for Controller Design, IEEE Tran. Control
      Systems Tech. Vol. 4 No. 1 Jan. 1996
  [2] J. Espinosa and J. Vandewalle, Predictive Control Using Fuzzy Models
      Applied to a Steam Generating Unit, Submitted to FLINS 98 3rd.
      International Workshop on Fuzzy Logic Systems and Intelligent
      Technologies for Nuclear Science and Industry
Properties:
  To make open loop identification possible, the water level was stabilized
  by applying to the water flow input a feedforward action proportional to
  the steam flow with value 0.0403 and a PI action with values Kp=0.258
  Ti=1.1026e-4 the reference of this controller
Columns:
  Column 1: output drum pressure
  Column 2: output excess oxygen
  Column 3: output water level
  Column 4: output steam flow
--------------------------------------------------------------------------------
cstr.dat

Continuous stirred tank reactor

Contributed by:
  Jairo ESPINOSA
  ESAT-SISTA KULEUVEN
  Kardinaal Mercierlaan 94
  B-3001 Heverlee Belgium
  espinosa@esat.kuleuven.ac.be
Description:
  The process is a model of a Continuous Stirring Tank Reactor, where the
  reaction is exothermic and the concentration is controlled by regulating
  the coolant flow.
Sampling: 0.1 min
Number: 7500
Inputs:
  q: Coolant Flow l/min
Outputs:
  Ca: Concentration mol/l
  T: Temperature Kelvin degrees
References:
  J.D. Morningred, B.E. Paden, D.E. Seborg and D.A. Mellichamp, "An adaptive
  nonlinear predictive controller", in Proc. of the A.C.C. vol. 2 1990
  pp. 1614-1619
  G. Lightbody and G.W. Irwin, Nonlinear Control Structures Based on Embedded
  Neural System Models, IEEE Tran. on Neural Networks Vol. 8 No. 3 pp. 553-567
  J. Espinosa and J. Vandewalle, Predictive Control Using Fuzzy Models,
  Submitted to the 3rd. On-Line World Conference on Soft Computing in
  Engineering Design and Manufacturing.
Properties:
Columns:
  Column 1: time-steps
  Column 2: input q
  Column 3: output Ca
  Column 4: output T
--------------------------------------------------------------------------------
winding.dat

Data from a test setup of an industrial winding process

Contributed by:
  Favoreel
  KULeuven
  Departement Electrotechniek ESAT/SISTA
  Kardinaal Mercierlaan 94
  B-3001 Leuven Belgium
  wouter.favoreel@esat.kuleuven.ac.be
Description:
  The process is a test setup of an industrial winding process. The main part
  of the plant is composed of a plastic web that is unwound from the first
  reel (unwinding reel), goes over the traction reel and is finally rewound
  on the rewinding reel. Reels 1 and 3 are each coupled with a DC motor that
  is controlled with input setpoint currents I1* and I3*. The angular speed
  of each reel (S1, S2 and S3) and the tensions in the web between reel 1 and
  2 (T1) and between reel 2 and 3 (T3) are measured by dynamo tachometers and
  tension meters.
  We are grateful to Thierry Bastogne of the Universite Henri Poincare,
  Nancy, who provided us with these data.
Sampling: 0.1 Sec
Number: 2500
Inputs:
  u1: The angular speed of reel 1 (S1)
  u2: The angular speed of reel 2 (S2)
  u3: The angular speed of reel 3 (S3)
  u4: The setpoint current at motor 1 (I1*)
  u5: The setpoint current at motor 2 (I3*)
Outputs:
  y1: Tension in the web between reel 1 and 2 (T1)
  y2: Tension in the web between reel 2 and 3 (T3)
References:
  - Bastogne T., Identification des systemes multivariables par les methodes
    des sous-espaces. Application a un systeme d'entrainement de bande.
    PhD thesis. These de doctorat de l'Universite Henri Poincare, Nancy 1.
  - Bastogne T., Noura H., Richard A., Hittinger J.M., Application of subspace
    methods to the identification of a winding process. In: Proc. of the 4th
    European Control Conference, Vol. 5, Brussels.
Properties:
Columns:
  Column 1: input u1
  Column 2: input u2
  Column 3: input u3
  Column 4: input u4
  Column 5: input u5
  Column 6: output y1
  Column 7: output y2
--------------------------------------------------------------------------------
dryer.dat

Data from an industrial dryer (supplied by Cambridge Control Ltd)

This file describes the data in the dryer.dat file.
1. Contributed by:
     Jan Maciejowski
     Cambridge University, Engineering Department
     Trumpington Street, Cambridge
     CB2 1PZ, England.
     jmm@eng.cam.ac.uk
2. Process/Description:
     Data from an industrial dryer (by Cambridge Control Ltd)
3. Sampling time:
     10 sec
4. Number of samples:
     867 samples
5. Inputs:
     a. fuel flow rate
     b. hot gas exhaust fan speed
     c. rate of flow of raw material
6. Outputs:
     a. dry bulb temperature
     b. wet bulb temperature
     c. moisture content of raw material
7. References:
     a. Maciejowski J.M., Parameter estimation of multivariable systems using
        balanced realizations, in: Bittanti, S. (ed), Identification,
        Adaptation, and Learning, Springer (NATO ASI Series), 1996.
     b. Chou C.T., Maciejowski J.M., System Identification Using Balanced
        Parametrizations, IEEE Transactions on Automatic Control, vol. 42,
        no. 7, July 1997, pp. 956-974.
8. Known properties/peculiarities:
--------------------------------------------------------------------------------
phdata

Simulation data of a pH neutralization process in a stirring tank

Contributed by:
  Jairo Espinosa
  K.U.Leuven ESAT-SISTA
  K.Mercierlaan 94
  B3001 Heverlee
  Jairo.Espinosa@esat.kuleuven.ac.be
Description:
  Simulation data of a pH neutralization process in a constant volume
  stirring tank.
  Volume of the tank                          1100 liters
  Concentration of the acid solution (HAC)    0.0032 Mol/l
  Concentration of the base solution (NaOH)   0.05 Mol/l
Sampling: 10 sec
Number: 2001
Inputs:
  u1: Acid solution flow in liters
  u2: Base solution flow in liters
Outputs:
  y: pH of the solution in the tank
References:
  T.J. McAvoy, E. Hsu and S. Lowenthal, Dynamics of pH in controlled stirred
  tank reactor, Ind. Eng. Chem. Process Des. Develop. 11 (1972) 71-78
Properties:
  Highly non-linear system.
Columns:
  Column 1: input u1
  Column 2: input u2
  Column 3: output y
--------------------------------------------------------------------------------
evaporator

Data from an industrial evaporator

Contributed by:
  Favoreel
  KULeuven
  Departement Electrotechniek ESAT/SISTA
  Kardinaal Mercierlaan 94
  B-3001 Leuven Belgium
  wouter.favoreel@esat.kuleuven.ac.be
Description:
  A four-stage evaporator to reduce the water content of a product, for
  example milk. The three inputs are feed flow, vapor flow to the first
  evaporator stage and cooling water flow. The three outputs are the dry
  matter content, the flow and the temperature of the outcoming product.
Sampling:
Number: 6305
Inputs:
  u1: feed flow to the first evaporator stage
  u2: vapor flow to the first evaporator stage
  u3: cooling water flow
Outputs:
  y1: dry matter content
  y2: flow of the outcoming product
  y3: temperature of the outcoming product
References:
  - Zhu Y., Van Overschee P., De Moor B., Ljung L., Comparison of three
    classes of identification methods. Proc. of SYSID '94, Vol. 1, 4-6 July,
    Copenhagen, Denmark, pp. 175-180, 1994.
Properties:
Columns:
  Column 1: input u1
  Column 2: input u2
  Column 3: input u3
  Column 4: output y1
  Column 5: output y2
  Column 6: output y3
--------------------------------------------------------------------------------
powerplant.dat

NB, the data was originally 200 by 12, I concatenated it to 2400x1 (ek)

This file describes the data in the powerplant.dat file.
1. Contributed by:
     Peter Van Overschee
     K.U.Leuven - ESAT - SISTA
     K. Mercierlaan 94
     3001 Heverlee
     Peter.Vanoverschee@esat.kuleuven.ac.be
2. Process/Description:
     data of a power plant (Pont-sur-Sambre (France)) of 120 MW
3. Sampling time:
     1228.8 sec
4. Number of samples:
     200 samples
5. Inputs:
     1. gas flow
     2. turbine valves opening
     3. super heater spray flow
     4. gas dampers
     5. air flow
6. Outputs:
     1. steam pressure
     2. main steam temperature
     3. reheat steam temperature
7. References:
     a. R.P. Guidorzi, P. Rossi, Identification of a power plant from normal
        operating records. Automatic Control Theory and Applications
        (Canada), Vol. 2, pp. 63-67, Sept. 1974.
     b. Moonen M., De Moor B., Vandenberghe L., Vandewalle J., On- and
        off-line identification of linear state-space models, International
        Journal of Control, Vol. 49, Jan. 1989, pp. 219-232
--------------------------------------------------------------------------------
glassfurnace

This file describes the data in the glassfurnace.dat file.
1. Contributed by:
     Peter Van Overschee
     K.U.Leuven - ESAT - SISTA
     K. Mercierlaan 94
     3001 Heverlee
     Peter.Vanoverschee@esat.kuleuven.ac.be
2. Process/Description:
     Data of a glassfurnace (Philips)
3. Sampling time:
4. Number of samples:
     1247 samples
5. Inputs:
     a. heating input
     b. cooling input
     c. heating input
6. Outputs:
     a. 6 outputs from temperature sensors in a cross section of the furnace
7. References:
     a. Van Overschee P., De Moor B., N4SID: Subspace Algorithms for the
        Identification of Combined Deterministic-Stochastic Systems,
        Automatica, Special Issue on Statistical Signal Processing and
        Control, Vol. 30, No. 1, 1994, pp. 75-93
     b. Van Overschee P., "Subspace identification: Theory, Implementation,
        Application", Ph.D. Thesis, K.U.Leuven, February 1995.
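Several of the entries above (winding, phdata, evaporator) list their columns explicitly as inputs followed by outputs. A minimal sketch of separating the two groups follows, assuming each file is whitespace-separated ASCII with one sample per row; the notes do not state the file format, so treat that as an assumption. The `LAYOUTS` table and function names are illustrative, and only the datasets whose "Columns:" list is given in full are included.

```python
# Number of input and output columns, copied from the "Columns:" lists above.
# Inputs come first, then outputs.
LAYOUTS = {
    "winding": {"inputs": 5, "outputs": 2},
    "phdata": {"inputs": 2, "outputs": 1},
    "evaporator": {"inputs": 3, "outputs": 3},
}


def parse_rows(text):
    """Parse whitespace-separated numeric text into a list of float rows.

    Assumption: one sample per line, blank lines ignored.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append([float(f) for f in fields])
    return rows


def split_inputs_outputs(rows, name):
    """Return (inputs, outputs) for the dataset called `name`."""
    layout = LAYOUTS[name]
    n_in, n_out = layout["inputs"], layout["outputs"]
    inputs, outputs = [], []
    for row in rows:
        if len(row) != n_in + n_out:
            raise ValueError(
                f"{name}: expected {n_in + n_out} columns, got {len(row)}")
        inputs.append(row[:n_in])
        outputs.append(row[n_in:])
    return inputs, outputs


if __name__ == "__main__":
    sample = "1.0 2.0 7.5\n3.0 4.0 8.5\n"  # placeholder values, not real data
    u, y = split_inputs_outputs(parse_rows(sample), "phdata")
    print(u, y)
```

Files with a leading time-step column (cstr, foetal_ecg) or with an undocumented column order (glassfurnace, powerplant) are deliberately left out of `LAYOUTS`.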
--------------------------------------------------------------------------------
flutter

Contributed by:
  Favoreel
  KULeuven
  Departement Electrotechniek ESAT/SISTA
  Kardinaal Mercierlaan 94
  B-3001 Leuven Belgium
  wouter.favoreel@esat.kuleuven.ac.be
Description:
  Wing flutter data. Due to industrial secrecy agreements we are not allowed
  to reveal more details. It is important to know that the input is highly
  colored.
Sampling:
Number: 1024
Inputs:
  u:
Outputs:
  y:
References:
  Feron E., Brenner M., Paduano J. and Turevskiy A., "Time-frequency analysis
  for transfer function estimation and application to flutter clearance", in
  AIAA J. on Guidance, Control & Dynamics, vol. 21, no. 3, pp. 375-382,
  May-June, 1998.
Properties:
Columns:
  Column 1: input u
  Column 2: output y
--------------------------------------------------------------------------------
robot_arm

Contributed by:
  Favoreel
  KULeuven
  Departement Electrotechniek ESAT/SISTA
  Kardinaal Mercierlaan 94
  B-3001 Leuven Belgium
  wouter.favoreel@esat.kuleuven.ac.be
Description:
  Data from a flexible robot arm. The arm is installed on an electrical
  motor. We have modeled the transfer function from the measured reaction
  torque of the structure on the ground to the acceleration of the flexible
  arm. The applied input is a periodic sine sweep.
Sampling:
Number: 1024
Inputs:
  u: reaction torque of the structure
Outputs:
  y: acceleration of the flexible arm
References:
  We are grateful to Hendrik Van Brussel and Jan Swevers of the laboratory of
  Production Manufacturing and Automation of the Katholieke Universiteit
  Leuven, who provided us with these data, which were obtained in the
  framework of the Belgian Programme on Interuniversity Attraction Poles
  (IUAP-nr.50) initiated by the Belgian State - Prime Minister's Office -
  Science Policy Programming.
Properties:
Columns:
  Column 1: input u
  Column 2: output y
--------------------------------------------------------------------------------
foetal_ecg

Cutaneous potential recordings of a pregnant woman

Contributed by:
  Lieven De Lathauwer
  lieven.delathauwer@esat.kuleuven.ac.be
Description:
  cutaneous potential recordings of a pregnant woman (8 channels)
Sampling: 5 sec
Number: 2500 x 8
Inputs:
Outputs:
  1-5: abdominal
  6,7,8: thoracic
References:
  Dirk Callaerts, "Signal Separation Methods based on Singular Value
  Decomposition and their Application to the Real-Time Extraction of the
  Fetal Electrocardiogram from Cutaneous Recordings", Ph.D. Thesis,
  K.U.Leuven - E.E. Dept., Dec. 1989.
  L. De Lathauwer, B. De Moor, J. Vandewalle, "Fetal Electrocardiogram
  Extraction by Source Subspace Separation", Proc. IEEE SP / ATHOS Workshop
  on HOS, June 12-14, 1995, Girona, Spain, pp. 134-138.
  Jean-Francois Cardoso, "Multidimensional independent component analysis",
  Proc. ICASSP '98. Seattle, 1998
    Available on the net: ftp://sig.enst.fr/pub/jfc/Papers/icassp98.ps
    More details: http://sig.enst.fr/~cardoso/RRicassp98.html
    Homepage author: http://sig.enst.fr/~cardoso/stuff.html
Properties:
Columns:
  Column 1: time-steps
  Column 2-9: observations
--------------------------------------------------------------------------------
tongue

Contributed by:
  De Lathauwer Lieven
  K.U.Leuven, E.E. Dept.- ESAT
  K. Mercierlaan 94
  B-3001 Leuven (Heverlee) Belgium
  Lieven.DeLathauwer@esat.kuleuven.ac.be
Description:
  The dataset is a real-valued (5x10x13)-array; basically, it was obtained as
  follows. High-quality audio recordings and cine-fluorograms were made of
  five English-speaking test persons while saying sentences of the form
  "Say h(vowel)d again" (substitution: "heed, hid, hayed, head, had, hod,
  hawed, hoed, hood, who'd"). For each of these 10 vowels an acoustic
  reference moment was defined and the 5 corresponding frames in the film
  (one per speaker) were located. Next, speaker-dependent reference grids,
  taking into account the anatomy of each test person, were defined and
  superimposed on the remaining x-ray images. The grids consisted of 13
  equidistant lines, in the region from epiglottis to tongue tip, more or
  less perpendicular to the midline of the vocal tract. The array entries now
  consist of the distance along the grid lines between the surface of the
  tongue and the harder upper surface of the vocal tract. The values are
  given in centimeters and have been measured to the nearest 0.5 mm. For a
  more extensive description of the experiment, we refer to [1].
Sampling:
Number:
Inputs:
Outputs:
  x: speakers
  y: vowels
  z: positions
References:
  [1] R. Harshman, P. Ladefoged, L. Goldstein, "Factor Analysis of Tongue
      Shapes", J. Acoust. Soc. Am., Vol. 62, No. 3, Sept. 1977, pp. 693-707.
  [2] L. De Lathauwer, Signal Processing based on Multilinear Algebra, Ph.D.
      Thesis, K.U.Leuven, E.E. Dept., Sept. 1997.
Properties:
Columns:
  Column 1: (speaker number - 1) x 10 + vowel number
  Columns 2-14: displacement values
--------------------------------------------------------------------------------
ballbeam

Data of the ball-and-beam setup in SISTA

This file describes the data in the ballbeam.dat file.
1. Contributed by:
     Peter Van Overschee
     K.U.Leuven - ESAT - SISTA
     K. Mercierlaan 94
     3001 Heverlee
     Peter.Vanoverschee@esat.kuleuven.ac.be
2. Process/Description:
     Data of the ball and beam practicum at ESAT-SISTA.
3. Sampling time:
     0.1 sec.
4. Number of samples:
     1000 samples
5. Inputs:
     a. angle of the beam
6. Outputs:
     a. position of the ball
7. References:
     a. Van Overschee P., "Subspace identification: Theory, Implementation,
        Application", Ph.D. Thesis, K.U.Leuven, February 1995, pp. 200-206
8. Known properties/peculiarities
--------------------------------------------------------------------------------
wind

Daily average wind speeds for 1961-1978 at 12 synoptic meteorological stations
in the Republic of Ireland (Haslett and Raftery 1989). These data were analyzed
in detail in the following article:

  Haslett, J. and Raftery, A. E. (1989). Space-time Modelling with Long-memory
  Dependence: Assessing Ireland's Wind Power Resource (with Discussion).
  Applied Statistics 38, 1-50.

Each line corresponds to one day of data in the following format:
year, month, day, average wind speed at each of the stations in the order given
in Fig. 4 of Haslett and Raftery:

  RPT, VAL, ROS, KIL, SHA, BIR, DUB, CLA, MUL, CLO, BEL, MAL

Fortran format: ( i2, 2i3, 12f6.2)

The data are in knots, not in m/s.

Permission granted for unlimited distribution.
Please report all anomalies to fraley@stat.washington.edu

Be aware that the dataset is 532494 bytes long (that's over half a Megabyte).
Please be sure you want the data before you request it.
--------------------------------------------------------------------------------
balloon

The data consist of 2001 observations taken from a balloon about 30 kilometres
above the surface of the earth. In the section of the flight shown here the
balloon increases in height. As radiation increases with height there is a
non-decreasing trend in the data. The outliers are caused by the fact that the
balloon slowly rotates, causing the ropes from which the measuring instrument
is suspended to cut off the direct radiation from the sun.

The first column contains the raw data, the second column the residuals after
the removal of a non-decreasing trend.

Reference: Davies, L. and Gather, U. (1993), "The Identification of Multiple
Outliers" (discussion paper), to appear in JASA.
Mailing address:
  Laurie Davies
  Universitaet-Gesamthochschule Essen
  Fachbereich 6 Mathematik
  Universitaetsstrasse 3
  D-4300 Essen 1
  Germany
--------------------------------------------------------------------------------
wool

DATA-SETS FROM DIGGLE, P.J. (1990). TIME SERIES: A BIOSTATISTICAL INTRODUCTION.
Oxford University Press.
--------------------------------------------------------------------------------
standardandpoor500

Standard and Poor's 500 Index closing values from 1926 to 1993.
First column contains the date (yymmdd), second column contains the value.
These data are used in: E. Ley (1996): "On the Peculiar Distribution of the
U.S. Stock Indices;" forthcoming in The American Statistician.
--------------------------------------------------------------------------------
speech
--------------------------------------------------------------------------------
earthquake 4096x1 http://lib.stat.cmu.edu/general/tsa/tsa.html
--------------------------------------------------------------------------------
soiltemp 2304x1 http://lib.stat.cmu.edu/general/tsa/tsa.html
--------------------------------------------------------------------------------
buoy_sensor.dat

  start_time    npts   samp_interv  depth   lat     lon
  yymmddhhmmss  #_pts  seconds      meters  deg     deg
  920404080000  13991  1800.00      14.00   34.502  120.718

  north  east  temp   salinity
  m_s    m_s   deg_C  Psu

These are the ascii timeseries files for the CAMP current meter moorings,
first deployed in April 1992.

Filename convention.

  camp_pr_.014
   ^   ^ ^  ^
   |   | |  |_____ Depth of Instrument in meters
   |   | |
   |   | |________ Flag: "_" indicates full series, otherwise part "a" or "b"
   |   |
   |   |__________ Deployment Set.
   |                  pr = Primary Mooring
   |                  s1 = Secondary Mooring, 1st Deployment
   |                  s2 = Secondary Mooring, 2nd Deployment
   |                  s3 = Secondary Mooring, 3rd Deployment
   |
   |______________ Project name: camp = California Monitoring Program

----
Listing of files and instrument type used:
  SACM   = Smart Acoustic Current Meter
  GOMKII = General Oceanics Mark II Current Meter
----
  camp_pr_.014  sacm
  camp_pr_.054  gomkII
  camp_pr_.126  gomkII/sacm - This is a concatenation of next two files
  camp_pra.126  gomkII - 1st Half
  camp_prb.126  sacm - 2nd Half
  camp_s1_.015  gomkII
  camp_s1_.054  gomkII
  camp_s1_.126  gomkII
  camp_s2_.014  gomkII
  camp_s2_.087  gomkII
  camp_s3_.014  gomkII
  camp_s3_.054  gomkII
--------------------------------------------------------------------------------
infrasound_beamd

Infrasound from explosions. 3 channels and the beam - 2048 points each in one
column.
--------------------------------------------------------------------------------
network, from M Faloutsos

> The dataset is packet Round Trip Time (RTT) delay. Packets were sent from
> UCR to CMU and the values describe the RTT delay of each of the packets. If
> there are zeros in the dataset this means that some packets were lost. For
> the specific dataset, I think that the sending rate was 20msec and the
> packet size 400 bytes (8kbs total sending rate).
--------------------------------------------------------------------------------
burst
--------------------------------------------------------------------------------
eeg

21 lead eeg, first column is just time axis. Japanese healthy adult female.
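The network entry above says that a zero marks a lost packet, so zeros should be set aside before computing delay statistics. A minimal sketch follows, assuming the series has been read into a plain list of numbers; the units of the RTT values are not stated in these notes, and `loss_summary` is a hypothetical helper name.

```python
def loss_summary(rtts):
    """Summarize an RTT series in which 0 marks a lost packet.

    Returns (lost_count, loss_rate, mean_rtt_of_received). mean_rtt is None
    when no packet was received; loss_rate is 0.0 for an empty series.
    """
    lost = sum(1 for r in rtts if r == 0)
    received = [r for r in rtts if r != 0]
    loss_rate = lost / len(rtts) if rtts else 0.0
    mean_rtt = sum(received) / len(received) if received else None
    return lost, loss_rate, mean_rtt


if __name__ == "__main__":
    sample = [10.0, 0, 30.0, 0]  # placeholder values, not real data
    print(loss_summary(sample))
```

Averaging over the raw series without this step would bias the mean delay downward, since every lost packet would count as a zero-delay round trip.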
----------------------------------------------------------------------------------------------- koski_ecg.dat http://www2.cs.utu.fi/staff/antti.koski/abs.html This dataset contains all 5 signals at the above url concatenated together See On Structural Recognition and Analysis Methods Applied to ECG Signals Antti Koski (Research Reports R-97-1, ISBN 951-29-0885-9, ISSN 1235-6727) Computer Science, University of Turku Lemminkäisenkatu 14 A, 20520 Turku FINLAND ----------------------------------------------------------------------------------------------- random_walk classic random walk data x(j) = x(j-1) + randn; See Keogh, E,. Chakrabarti, K,. Pazzani, M. & Mehrotra (2000) Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems. Park, S., Lee, D., & Chu, W. (1999). Fast retrieval of similar subsequences in long sequence databases. In 3 rd IEEE Knowledge and Data Engineering Exchange Workshop. Wang, C. & Wang, S. (2000). International Conference on Scientific and Statistical Database Management. Yi, B,K., & Faloutsos, C.(2000). Fast time sequence indexing for arbitrary Lp norms. Proceedings of the 26 st International Conference on Very Large Databases, Cairo, Egypt. ----------------------------------------------------------------------------------------------- greatlakes 984x5 {Ivan Popivanow} in order from left to right, they are erie,huron,ontario,stclair,superior ------------------------------------------------------------------------------------------------ leleccum 4320x1 A test time series that come free with matlab --------------------------------------------------------------------------------------------------- attas.dat 1024x2 {Frank Hoppner} Taken from a device outside the hull of an airplane (ATTAS, experimental aircraft of German Aerospace Center). 
See "Learning Dependencies in Multivariate Time Series" by Frank Hoppner
-----------------------------------------------------------------------------------------------
packets 360000x1
From M Faloutsos
-----------------------------------------------------------------------------------------------
burstin 50000x1 http://cs.nyu.edu/cs/faculty/shasha/papers/burst.d/burst.html
Yunyue Zhu
Courant Institute of Mathematical Sciences
Department of Computer Science
New York University
yunyue@cs.nyu.edu
http://cs.nyu.edu/yunyue/index.html
-----------------------------------------------------------------------------------------------
phone1.txt
Data from a European project, "Technology for Enabling Awareness". Sensors attached to a phone.
Columns, from left to right: X Acc, Y Acc, Light1, Light2, Touch, Microp1, Microp2, Temp
The title of the data is "Picking up phone and laying it down"
__________________________________________________________________________________________________________
The motorCurrent.dat file contains 21 classes of signals with 20 examples of each signal class, for a total of 420 signals. The descriptions corresponding to each class are as follows:
0 - healthy
1 - 1 broken bar
2 - 2 broken bars
3 - 3 broken bars
4 - 4 broken bars
5 - 5 broken bars
6 - 6 broken bars
7 - 7 broken bars
8 - 8 broken bars
9 - 9 broken bars
10 - 10 broken bars
11 - 1 broken connector
12 - 2 broken connectors
13 - 3 broken connectors
14 - 4 broken connectors
15 - 5 broken connectors
16 - 6 broken connectors
17 - 7 broken connectors
18 - 8 broken connectors
19 - 9 broken connectors
20 - 10 broken connectors
The signals were generated using a Time Stepping Coupled Finite Element - State Space simulation. Each signal is 1500 samples of the A phase current in amps. The sampling rate was 33.3 kHz. The details of the simulation method are found in N. A. O. Demerdash and J. F.
Bangura, "Characterization of induction motors in adjustable-speed drives using a time-stepping coupled finite element state-space method including experimental validation," IEEE Transactions On Industry Applications, vol. 35, pp. 790-802, July/Aug. 1999.
Signal classification results using this dataset and slight variants of this dataset have been reported in the following publications:
Richard J. Povinelli, Michael T. Johnson, Andrew C. Lindgren, Jinjin Ye. (in press) "Time Series Classification using Gaussian Mixture Models of Reconstructed Phase Spaces," IEEE Transactions on Knowledge and Data Engineering.
John F. Bangura, Richard J. Povinelli, Nabeel A.O. Demerdash, Ronald H. Brown. (2003). "Diagnostics of Eccentricities and Bar/End-Ring Connector Breakages in Polyphase Induction Motors through a Combination of Time-Series Data Mining and Time-Stepping Coupled FE-State Space Techniques," IEEE Transactions On Industry Applications, vol. 39, no. 4, July/August, 1005-1013.
Richard J. Povinelli, John F. Bangura, Nabeel A.O. Demerdash, Ronald H. Brown. (2002). "Diagnostics of Bar and End-Ring Connector Breakage Faults in Polyphase Induction Motors Through a Novel Dual Track of Time-Series Data Mining and Time-Stepping Coupled FE-State Space Modeling," IEEE Transactions on Energy Conversion, vol. 17, no. 1, March 2002, 39-46.
Richard J. Povinelli, Michael T. Johnson, Nabeel A.O. Demerdash, John F. Bangura. (2002). "A Comparison of Phase Space Reconstruction and Spectral Coherence Approaches for Diagnostics of Bar and End-Ring Connector Breakage and Eccentricity Faults in Polyphase Induction Motors using Motor Design Particulars," IEEE Industry Applications Society Annual Meetings, 1541-1547.
Richard J. Povinelli, John F. Bangura, Nabeel A.O. Demerdash, Ronald H. Brown. (2001). "Diagnostics of Faults in Induction Motor ASDs Using Time-Stepping Coupled Finite Element State-Space and Time Series Data Mining Techniques," Third Naval Symposium on Electric Machines (EM2000).
If you publish results using this dataset, we would greatly appreciate learning of your publication. You may contact us at richard.povinelli@marquette.edu.
-----------------------------------------------------------------------------------------------
dataset_kalpakis.zip
See Konstantinos Kalpakis, Dhiral Gada, and Vasundhara Puttagunta, "Distance Measures for Effective Clustering of ARIMA Time-Series". In the Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM'01), San Jose, CA, November 29-December 2, 2001, pp. 273-280.
ECG
Temperature
Population
Per Capita Income
-----------------------------------------------------------------------------------------------
Fluid_dynamics.dat
Brijesh Garabadu, Cindi Thompson, Gary Lindstrom, Joe Klewicki (2003). Fast and Accurate NN Approach for Multi-Event Annotation of Time Series. UUCS-03-021.
-----------------------------------------------------------------------------------------------
GunX 100*151 http://www.cs.ucr.edu/~eamonn/
See paper by Chotirat Ann Ratanamahatana and Eamonn Keogh in SDM04
GUN / POINT
This 2-class dataset comes from the video surveillance domain. The dataset has two classes, each containing 100 instances. All instances were created using one female actor and one male actor in a single session. The two classes are:
· Gun-Draw: The actors have their hands by their sides. They draw a replica gun from a hip-mounted holster, point it at a target for approximately one second, then return the gun to the holster, and their hands to their sides.
· Point: The actors have their hands by their sides.
They point with their index fingers to a target for approximately one second, and then return their hands to their sides.
For both classes, we tracked the centroid of the right hand in both the X- and Y-axes; however, this dataset contains only the X-axis, for simplicity. Each instance has the same length of 150 data points (plus the class label), and is z-normalized (mean = 0, std = 1).
[Figure: examples from the dataset; (left) GUN, (right) POINT.]
Classification Error Rates (%):
Euclidean: 5.50%
DTW with 10% warping window size: 4.50%
DTW with the best (3.25%) uniform warping window size: 1.00%
-----------------------------------------------------------------------------------------------
TRACE
See paper by Chotirat Ann Ratanamahatana and Eamonn Keogh in SDM04
This 4-class dataset is a subset of the Transient Classification Benchmark (trace project). It is a synthetic dataset designed to simulate instrumentation failures in a nuclear power plant, created by Davide Roverso. The full dataset consists of 16 classes, 50 instances in each class. Each instance has 4 features.
The TRACE subset only uses the second feature of class 2 and , and the third feature of class 3 and 7. Hence, this dataset contains 200 instances, 50 for each class. All instances are linearly interpolated to have the same length of 275 data points, and are z-normalized.
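The two preprocessing steps named above (linear interpolation to a common length, then z-normalization) can be sketched as follows. This is an illustration only, not the authors' code: the function names are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which variant was used.

```python
from statistics import mean, pstdev

def resample_linear(series, new_len):
    """Linearly interpolate `series` onto `new_len` evenly spaced points."""
    n = len(series)
    if n == 1 or new_len == 1:
        return [float(series[0])] * new_len
    out = []
    for i in range(new_len):
        pos = i * (n - 1) / (new_len - 1)  # fractional index into the original
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out

def z_normalize(series):
    """Shift to mean 0 and scale to standard deviation 1."""
    mu = mean(series)
    sigma = pstdev(series)  # assumption: population std (divide by N)
    if sigma == 0:
        return [0.0] * len(series)  # constant series: nothing to scale
    return [(v - mu) / sigma for v in series]

# Made-up input of arbitrary length, brought to 275 points as in TRACE.
prepared = z_normalize(resample_linear([0.0, 1.0, 4.0, 9.0], 275))
```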
Classification Error Rates (%):
Euclidean: 11.00%
DTW with 10% warping window size: 0.00%
DTW with the best (3.375%) uniform warping window size: 0.00%
-----------------------------------------------------------------------------------------------
There are 442 instances in total (442 rows) and 151 columns (150 data points for each time series; the first column is the class label, 1-6).
Class1: 1-66
Class2: 67-150
Class3: 151-225
Class4: 226-322
Class5: 323-404
Class6: 405-442
Created by Chotirat Ann Ratanamahatana and Eamonn Keogh from images by Thomas G. Dietterich and Ashit Gandhi
-----------------------------------------------------------------------------------------------
1. There are 30 files, one file per stock. The file name format is symbol_1.txt. For example, msft_1.txt contains 1-minute information for stock symbol 'msft' (Microsoft Corp).
2. For each stock, the open, high, low, close prices for each minute, from Nov. 11th, 2002 to Sep. 12th, 2003, are accumulated.
3. For each record of a file, the format is: date, time, open, high, low, close, volume
4. For example, for one record:
102-10-11, 12:11,54.3100,54.3500,54.3000,54.3200,1692
where date is 102-10-11, which means November 11th, 2002.
Year: 102 means 2002, while 103 means 2003.
Month: 0 means January, ...., 11 means December.
Time is 12:11, which means 11 minutes past 12 o'clock.
Open price: 54.3100
High price: 54.3500
Low price: 54.3000
Close price: 54.3200
Volume is 1692, which means 1692 shares of stock were traded during that minute.
-----------------------------------------------------------------------------------------------
; TWO-PAT dataset - 5000 cases - 128 time steps - 4 classes
;
; For more information about this dataset, see:
; P.Geurts.
Contributions to decision tree induction: bias/variance tradeoff and time series classification.
; PhD thesis, Department of Electrical Engineering, University of Liege, Belgium, May 2002.
; http://www.montefiore.ulg.ac.be/~geurts/thesis.html
; class labels:
1: down-down (1306 cases)
2: up-down (1248 cases)
3: down-up (1245 cases)
4: up-up (1201 cases)
-----------------------------------------------------------------------------------------------
Laser.txt
This is a univariate time record of a single observed quantity, measured in a physics laboratory experiment.
Full Description:
The data were contributed by Udo Huebner, Phys.-Techn. Bundesanstalt, Braunschweig, Germany, and were collected primarily by N. B. Abraham and C. O. Weiss. These data were recorded from a Far-Infrared-Laser in a chaotic state; here is the description from Dr. Huebner:
The measurements were made on an 81.5-micron 14NH3 cw (FIR) laser, pumped optically by the P(13) line of an N2O laser via the vibrational aQ(8,7) NH3 transition. The basic laser setup can be found in Ref. 1. The intensity data was recorded by a LeCroy oscilloscope. No further processing happened. The experimental signal to noise ratio was about 300, which means slightly under the half bit uncertainty of the analog to digital conversion. The data is a cross-cut through periodic to chaotic intensity pulsations of the laser. Chaotic pulsations more or less follow the theoretical Lorenz model (see References) of a two level system.
The data was analyzed. References are e.g.:
1. U. Huebner, N. B. Abraham, and C. O. Weiss: ``Dimensions and entropies of chaotic intensity pulsations in a single-mode far-infrared NH3 laser.'' Phys. Rev. A 40, p. 6354 (1989)
2. U. Huebner, W. Klische, N. B. Abraham, and C. O. Weiss: ``On problems encountered with dimension calculations.'' Measures of Complexity and Chaos; Ed. by N. B. Abraham et.
al., Plenum Press, New York 1989, p. 133
3. U. Huebner, W. Klische, N. B. Abraham, and C. O. Weiss: ``Comparison of Lorenz-like laser behavior with the Lorenz model.'' Coherence and Quantum Optics VI; Ed. by J. Eberly et. al., Plenum Press, New York 1989, p. 517
-----------------------------------------------------------------------------------------------
Physiological_data_B1
Physiological_data_B2
Original Description:
This is a multivariate data set recorded from a patient in the sleep laboratory of the Beth Israel Hospital in Boston, Massachusetts (data submitted by David Rigney and Ary Goldberger). The file has been split into two sequential parts, Physiological_data_B1 and Physiological_data_B2; the lines in the files are spaced by 0.5 seconds. The first column is the heart rate, the second is the chest volume (respiration force), and the third is the blood oxygen concentration (measured by ear oximetry).
The heart rate was determined by measuring the time between the QRS complexes in the electrocardiogram, taking the inverse, and then converting this to an evenly sampled record by interpolation. There were no premature beats - sudden changes in the heart rate are not artifacts.
The respiration and blood oxygen data are given in uncalibrated A/D bits; these two sensors slowly drift with time (and are therefore occasionally rescaled by a technician) and can be detached by the motion of the patient, hence their calibration is not constant over the data set. They were converted from 250 Hz to 2 Hz data by averaging over a 0.08 second window at the times of the heart rate samples. Between roughly 4 hours 30 minutes and 4 hours 34 minutes from the start of the file the sensors were disconnected.
The following table gives the times and stages of sleep , as determined by a neurologist looking at the EEG (W = awake, 1 and 2 = waking/sleep stages, R = REM sleep): 2:00: W, 2:30: 1, 3:30: W, 9:30: 1, 10:00: W, 11:00: 1, 12:00: W, 15:30: 1, 16:00: 2, 36:30: 1, 38:30: W, 39:30: 1, 42:30: 2, 44:00: 1, 44:30: 2, 45:00: W, 46:00: 1, 47:00: W, 47:30: 2, 48:30: 1, 50:00: 2, 50:30: 1, 51:00: 2, 51:30: 1, 52:00: 2, 52:30: W, 53:00: 1, 53:30: W, 55:00: 1, 56:00: 2, 1:21:30: W, 1:22:30: 1, 1:25:00: W, 1:30:00: 1, 1:30:30: W, 1:31:00: 1, 1:31:30: W, 1:34:00: 1, 1:35:00: W, 1:38:30: 1, 1:39:00: W, 1:40:00: 1, 1:40:30: W, 1:42:00: 1, 1:42:30: 2, 1:44:00: 1, 1:50:30: 2, 2:04:30: R, 2:21:00: W, 2:22:00: 1, 2:22:30: W, 2:25:00: 1, 2:43:30: W, 2:47:30: 1, 2:48:30: W, 2:50:00: 1, 2:57:30: W, 2:58:30: 1, 2:59:00: W, 3:00:00: 1, 3:00:30: W, 3:01:00: 1, 3:05:00: W, 3:17:30: 1, 3:18:00: 2, 3:21:00: W, 3:21:30: 1, 3:22:00: W, 3:43:00: 1, 4:11:00: W, 4:11:30: 1, 4:12:00: W, 4:25:00: 1, 4:27:00: W, 4:27:30: 1, 4:28:00: W, 4:43:30: 1, 4:44:00: 2, 4:44:30: 1, 4:45:00: 2, 4:47:00: 1, 4:47:30: 2, 4:48:30: 1, 4:49:00: 2, 4:49:30: 1, 4:50:00: 2, 4:52:00: 1, 4:52:30: 2, 4:54:00: 1, 4:54:30: 2, 4:57:30: 1, 4:58:00: 2 This patient shows sleep apnea (periods during which he takes a few quick breaths and then stops breathing for up to 45 seconds). Sleep apnea is medically important because it leads to sleep deprivation and occasionally death. There are three primary research questions associated with this data set: 1. Can part of the temporal variation in the heart rate be explained by a low-dimensional mechanism, or is it due to noise or external inputs? 2. How do the evolution of the heart rate, the respiration rate, and the blood oxygen concentration affect each other? (a correlation between breathing and the heart rate, called sinus arrhythmia, is almost always observed). 3. Can the episodes of sleep apnea (stoppage of breathing) be predicted from the preceding data? 
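Because the lines of the physiological files are spaced 0.5 seconds apart, a time stamp from the sleep-stage table can be mapped to a row offset. The sketch below rests on assumptions that should be checked against the data: two-field stamps are read as mm:ss and three-field stamps as h:mm:ss, row 0 is taken to be time 0, and B1 and B2 are treated as already concatenated. The function names and the short table excerpt are for illustration only.

```python
def elapsed_seconds(stamp):
    """Parse 'mm:ss' or 'h:mm:ss' into seconds from the start of the record."""
    parts = [int(p) for p in stamp.split(":")]
    if len(parts) == 2:
        parts = [0] + parts  # no hours field present
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds

def row_index(stamp, sample_spacing=0.5):
    """Zero-based row offset, assuming row 0 corresponds to time 0."""
    return int(round(elapsed_seconds(stamp) / sample_spacing))

def stage_at(t_seconds, table):
    """Return the stage in effect at t_seconds, or None before the first entry.

    `table` is a list of (stamp, stage) pairs in chronological order.
    """
    current = None
    for stamp, stage in table:
        if elapsed_seconds(stamp) > t_seconds:
            break
        current = stage
    return current

# First three entries of the table above.
excerpt = [("2:00", "W"), ("2:30", "1"), ("3:30", "W")]
print(row_index("2:04:30"), stage_at(150, excerpt))
```

Under the same assumptions, the interval during which the sensors were disconnected (roughly 4:30:00 to 4:34:00) would span row_index("4:30:00") to row_index("4:34:00").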
-----------------------------------------------------------------------------------------------
Astrophysical data 27704*1
Original Description: (27704 points)
This is a set of measurements of the light curve (time variation of the intensity) of the variable white dwarf star PG1159-035 during March 1989. It was recorded by the Whole Earth Telescope (a coordinated group of telescopes distributed around the earth that permits the continuous observation of an astronomical object) and submitted by James Dixson and Don Winget of the Department of Astronomy and the McDonald Observatory of the University of Texas at Austin. The telescope is described in an article in The Astrophysical Journal (361), p. 309-317 (1990), and the measurements on PG1159-035 will be described in an article scheduled for the September 1 issue of the Astrophysical Journal.
The observations were made of PG1159-035 and a nonvariable comparison star. A polynomial was fit to the light curve of the comparison star, and then this polynomial was used to normalize the PG1159-035 signal to remove changes due to varying extinction (light absorption) and differing telescope properties.
The samples in the files are all integrations spaced at 10 second intervals. The number of points and starting times of the parts are
part  points  start time
----  ------  ----------
 1,    618,   521048.7
 2,   1256,   526881.9
 3,   1222,   539951.9
 4,    980,   550941.5
 5,    550,   559402.8
 6,   1554,   566422.8
 7,   1937,   585517.5
 8,   2496,   613164.2
 9,   1941,   633834.8
10,   1472,   647065.1
11,   2605,   671536.7
12,   1549,   699206.4
13,   2568,   707915.6
14,   2602,   731247.2
15,    673,   764048.0
16,   1512,   774058.0
17,   1669,   794053.6
where the times are in seconds from the beginning of the observational run.
The intensity variations of the star arise from the excited modes, which are spherical harmonics. For a given mode Y_(klm), for each l value there will be 2l+1 m modes.
For a fixed star the m modes have the same frequency; rotation of the star and magnetic fields split this degeneracy.
-----------------------------------------------------------------------------------------------
ECG_znorm205.txt
This is a two-class problem (normal & abnormal); the first column is the class label. This is not an interesting dataset, since Euclidean distance gets 100%.
S. Kim, P. Smyth, and S. Luther (2004). Modeling waveform shapes with random effects segmental hidden Markov models. Technical Report UCI-ICS 04-05, March 2004. (A shorter version will appear in Proceedings of the 20th International Conference on Uncertainty in AI, July 2004.)
-----------------------------------------------------------------------------------------------
cam_mouse_x
cam_mouse_y
The classic camera mouse datasets (broken into x and y vectors).
cam_mouse_x_znorm
cam_mouse_y_znorm
Both are normalized to the same length of 1719 data points (the longest time series).
-----------------------------------------------------------------------------------------------
Nasa_valve
There is a readme file with details in the zip file. Thanks to Matt Mahoney and Philip Chan.
S. Salvador, P. Chan, J. Brodie, Learning States and Rules for Time Series Anomaly Detection, Florida Tech TR-CS-2003-05, 2003.
M. Mahoney, Instantaneous Compression for Anomaly Detection in NASA Valve Solenoid Current Traces, 2003.
-----------------------------------------------------------------------------------------------
HydroData
A few short River-Level datasets (Nile, Senegal). Probably too short for data miners...
-----------------------------------------------------------------------------------------------
gait_interval http://www.physionet.org/physiobank/database/gaitdb/
Walking stride interval time series from 15 subjects. From left to right: 5 healthy young adults (23 - 29 years old), 5 healthy old adults (71 - 77 years old), and 5 older adults (60 - 77 years old) with Parkinson's disease.
-----------------------------------------------------------------------------------------------
physiodata
The data was provided by:
Dr. J. Rittweger
Institute for Physiology
Free University of Berlin
Arnimallee 22
14195 Berlin
Germany
1. EEG from position O1, sampling rate 100 Hz; 15 files -> npo1??
2. band-pass filtered EOG (0.05-1Hz), sampling rate 10 Hz; 15 files -> npog??.bp
3. respiration (thorax extension), sampling rate 10 Hz; 15 files -> nprs??
4. manual segmentation by a medical expert, sampling rate 10 Hz; 15 files -> npsegm??
   meaning:
    4 = eyes open, awake;
    3 = eyes closed, awake;
    2 = sleep stage I;
    1 = sleep stage II;
   -1 = no assessment;
   -2 = artifact in EOG;
(?? means )
See Kohlmorgen, J., Müller, K.-R., Rittweger, J., Pawelzik, K. (2000), Identification of Nonstationary Dynamics in Physiological Recordings, Biological Cybernetics 83 (1), 73-84, Springer Berlin Heidelberg.
You can download the paper here: http://www.first.gmd.de/persons/Kohlmorgen.Jens/publications.html
___________________________________________________________
Dr. Jens Kohlmorgen
GMD - FIRST          e-mail: jek@first.gmd.de
Kekuléstraße 7       phone: +49 30 6392-1875
D-12489 Berlin       fax: +49 30 6392-1805
Germany              http://www.first.gmd.de/persons/Kohlmorgen.Jens.html
___________________________________________________________
Institute for Computer Architecture and Software Technology
of the German National Research Center for Computer Science
___________________________________________________________