datasets for the course book
Dekking, F.M., Kraaikamp, C., Lopuhaa, H.P., Meester, L.E., A Modern Introduction to Probability and Statistics. Second Edition. Springer 2007
ISBN: 978-1-85233-896-1
M4.1. Let Z be the number of times a 6 appeared in five independent throws of a die.
Describe the probability distribution of Z by
(1) plotting the probability mass function p_Z and
(2) plotting the cumulative distribution function F_Z of Z.
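A minimal sketch of the values to plot, written in Python rather than Matlab purely for illustration. It assumes Z has a binomial distribution with n = 5 and p = 1/6 (five independent throws, each showing a 6 with probability 1/6); the function names are illustrative.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(Z = k) for k = 0..n] for a Bin(n, p) random variable."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def cdf_from_pmf(pmf):
    """Return the running sums [P(Z <= k) for k = 0..n]."""
    total, cdf = 0.0, []
    for mass in pmf:
        total += mass
        cdf.append(total)
    return cdf

pmf = binomial_pmf(5, 1 / 6)
cdf = cdf_from_pmf(pmf)
for k, (p_k, F_k) in enumerate(zip(pmf, cdf)):
    print(f"k={k}  p_Z(k)={p_k:.4f}  F_Z(k)={F_k:.4f}")
```

The mass function would be drawn as vertical bars at k = 0, ..., 5, and the distribution function as a step function that jumps at those same points.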
M4.2.
(a) Plot the probability mass function and the cumulative distribution function of a binomial distribution for a few different values of the parameter p.
How do their shapes change as a function of p?
(b) Plot the probability mass function and the cumulative distribution function of a geometric distribution for a few different values of the parameter p.
How do their shapes change as a function of p?
You do not need to submit the Matlab files for this assignment, just a report in MSWord showing a few figures and your conclusions.
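For part (b), a short Python sketch (illustrative only, not Matlab) of the values behind the plots. It assumes the geometric distribution is parametrized on k = 1, 2, ... with P(X = k) = (1 - p)^(k-1) p; the values of p and the cutoff k_max are arbitrary choices.

```python
def geometric_pmf(p, k_max):
    """Return [P(X = k) for k = 1..k_max] with P(X = k) = (1 - p)**(k - 1) * p."""
    return [(1 - p) ** (k - 1) * p for k in range(1, k_max + 1)]

def geometric_cdf(p, k_max):
    """Return [P(X <= k) for k = 1..k_max], using F(k) = 1 - (1 - p)**k."""
    return [1 - (1 - p) ** k for k in range(1, k_max + 1)]

for p in (0.2, 0.5, 0.8):  # arbitrary example values of the parameter
    pmf = geometric_pmf(p, 10)
    cdf = geometric_cdf(p, 10)
    print(f"p={p}: P(X=1)={pmf[0]:.3f}, P(X<=5)={cdf[4]:.3f}")
```

Plotting the lists for each p on common axes makes it easy to compare how quickly the mass function decays and how quickly the distribution function approaches 1.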
M6.1.
Generate 10,000 samples of an exponentially distributed random variable by simulation.
The random variable has mean 10, that is, rate parameter λ = 1/10.
Plot the empirical distribution function of the samples by counting, for each value x, the fraction of samples less than or equal to x.
Also plot the distribution function of the exponential random variable from its formula.
As a third figure, plot both functions together in one figure.
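One possible approach is inverse-transform sampling, sketched here in Python for illustration rather than in Matlab. The seed, the function names, and the evaluation points are assumptions; the sketch prints a few values of both distribution functions instead of plotting them.

```python
import math
import random

def sample_exponential(mean, n, rng):
    """Inverse transform: if U is uniform on [0, 1), -mean * ln(1 - U) is Exp(1/mean)."""
    return [-mean * math.log(1.0 - rng.random()) for _ in range(n)]

def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)

def exponential_cdf(mean, x):
    """F(x) = 1 - exp(-x / mean) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-x / mean) if x >= 0 else 0.0

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = sample_exponential(10.0, 10_000, rng)
for x in (5, 10, 20, 40):
    print(f"x={x}: empirical={empirical_cdf(samples, x):.3f}, "
          f"exact={exponential_cdf(10.0, x):.3f}")
```

Evaluating both functions on a fine grid of x values gives the two curves for the combined figure.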
M6.2 With reference to Chapter 6.4, generate a figure that plots the average waiting times at the well for pump capacities of 1 and 4 for n=1:20.
Generate a second figure with graphs for work in system for pump capacities of 1 and 4 for time t=0:50.
Submit a Matlab function or a script that generates the two figures.
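A rough Python sketch of the waiting-time part only, for illustration and not as the requested Matlab submission. The model details are assumptions that should be checked against Section 6.4: a single first-come-first-served pump, exponential times between arrivals with rate 0.5, service requirements uniform on (2, 5), and service time equal to requirement divided by capacity. The function and parameter names are hypothetical. Waiting times follow the standard recursion W_{n+1} = max(0, W_n + S_n - T_{n+1}).

```python
import random

def waiting_times(n_customers, capacity, seed=0,
                  arrival_rate=0.5, req_low=2.0, req_high=5.0):
    """Waiting times of successive customers in a FIFO single-server queue.

    ASSUMED model (verify against Section 6.4): Exp(arrival_rate) gaps
    between arrivals, U(req_low, req_high) service requirements.
    """
    rng = random.Random(seed)
    waits, wait = [], 0.0  # the first customer finds an empty system
    for _ in range(n_customers):
        waits.append(wait)
        service = rng.uniform(req_low, req_high) / capacity
        gap = rng.expovariate(arrival_rate)  # time until the next arrival
        wait = max(0.0, wait + service - gap)  # Lindley recursion
    return waits

def running_average(values):
    """Return the averages of the first 1, 2, ..., len(values) entries."""
    total, averages = 0.0, []
    for i, v in enumerate(values, start=1):
        total += v
        averages.append(total / i)
    return averages

for capacity in (1, 4):
    avg = running_average(waiting_times(20, capacity))
    print(f"capacity {capacity}: average wait over 20 customers = {avg[-1]:.2f}")
```

The sketch does not cover the work-in-system figure, and whether the averages should come from a single run or from many runs is left to the reader to confirm against the section.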
M7.1 Approximate the expectation and the variance of a random variable using simulations.
We are able to approximate the parameters of a random variable through simulations.
For example, by sampling n values of a uniform variable, we can get the approximation of the expectation by averaging all the samples.
Similarly, we can get an approximation to the variance.
Your task in this lab assignment is to investigate how the accuracy of the approximation depends on the number of samples.
As an example, consider a random variable with a normal distribution. Choose the parameters yourself. Then generate sample sets of size 10, 100, 1,000, and 10,000.
For each sample set, compute its corresponding approximations to the expectation and the variance.
Plot one figure for the expectation and one for the variance to show how the number of samples affects the accuracy of the approximation.
(Note that you also need to plot the theoretical true values in the same figures.)
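The computation behind the two figures might look like the following Python sketch (illustrative only, not Matlab). The parameters MU and SIGMA and the seed are arbitrary assumptions; the variance estimate uses the usual n - 1 normalization.

```python
import random
import statistics

MU, SIGMA = 2.0, 3.0  # arbitrary example parameters of the normal distribution

def estimates(n, rng):
    """Return (sample mean, sample variance) of n draws from N(MU, SIGMA^2)."""
    samples = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.fmean(samples), statistics.variance(samples)

rng = random.Random(1)  # fixed seed so the run is reproducible
results = {n: estimates(n, rng) for n in (10, 100, 1_000, 10_000)}
for n, (mean_hat, var_hat) in results.items():
    print(f"n={n:>6}: mean={mean_hat:.3f} (true {MU}), "
          f"variance={var_hat:.3f} (true {SIGMA**2})")
```

Plotting the estimates against n on a logarithmic horizontal axis, with horizontal lines at the true values, shows the estimates settling as n grows.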
M10.1 We have a real dataset aaup.data, which records statistics concerning US universities and colleges.
Each column records a certain kind of property of each school.
At this point, the meaning of each number may be entirely unclear.
However, we are only interested in the correlation between data in different columns.
In this experiment, you are going to determine which columns are positively correlated, negatively correlated, or uncorrelated.
To achieve this, first, you need to choose several columns, then compute their covariance and correlation coefficients.
You should pick at least 3 pairs of columns for this experiment.
You only need to submit a report (MSWord or pdf) containing Matlab figures and your explanation.
In your report, show your results and the scatterplots (like in 10.15). Be sure to list which columns you have considered.
To read the data into Matlab, use the dlmread() function. First, change the current directory to the one containing aaup.data.
Then type "m = dlmread('aaup.data', ' ')". Now all the data is in the matrix m, and you can access any column as needed.
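The covariance and correlation computations can be sketched as follows, in Python for illustration rather than Matlab. The three columns are made-up toy data, not values from aaup.data, and the function names are illustrative. The sketch divides by n - 1, the usual sample-covariance convention.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long columns (n - 1 normalization)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up toy columns, NOT taken from aaup.data.
col_a = [1.0, 2.0, 3.0, 4.0, 5.0]
col_b = [2.1, 3.9, 6.2, 8.0, 9.8]  # increases with col_a
col_c = [5.0, 4.1, 2.9, 2.2, 0.8]  # decreases as col_a increases

print(f"corr(a, b) = {correlation(col_a, col_b):.3f}")
print(f"corr(a, c) = {correlation(col_a, col_c):.3f}")
```

A coefficient near +1 or -1 indicates a strong positive or negative linear relationship, while a value near 0 suggests the columns are uncorrelated.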
M15 is composed of 3 problems listed below.
Submit one report in MSWord or pdf showing the solutions to the 3 problems.
M15.1 Compute histograms of the Old Faithful data (oldfaithful.txt) with 3 different bin widths.
Which bin width is the most suitable in your opinion?
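A small Python sketch (illustrative, not Matlab) of how bin heights could be computed for a given bin width. The data list is made up and is not the Old Faithful dataset; heights are scaled as count / (n * bin width) so that the total area is 1, and the helper name is hypothetical.

```python
def histogram(data, bin_width, start=None):
    """Return (left_edges, heights); heights are scaled so the total area is 1."""
    if start is None:
        start = min(data)
    n_bins = int((max(data) - start) // bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        index = min(int((x - start) // bin_width), n_bins - 1)
        counts[index] += 1
    left_edges = [start + i * bin_width for i in range(n_bins)]
    heights = [c / (len(data) * bin_width) for c in counts]
    return left_edges, heights

# Made-up toy values, NOT the Old Faithful data.
data = [1.8, 2.0, 2.2, 3.5, 4.0, 4.1, 4.3, 4.5, 4.6, 4.8]
for width in (0.25, 0.5, 1.0):
    edges, heights = histogram(data, width)
    print(f"width={width}: {len(edges)} bins, max height={max(heights):.2f}")
```

Narrow bins tend to give a jagged picture and wide bins can hide structure, which is the trade-off the question asks about.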
M15.2 Compute and plot kernel density estimates of the Old Faithful data with Epanechnikov kernel and with normal kernel.
For both plots determine reasonable values of the bandwidth parameters.
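A kernel density estimate with bandwidth h has the form f(t) = (1 / (n h)) * sum over i of K((t - x_i) / h). The Python sketch below (illustrative, not Matlab) evaluates it for both kernels on made-up data, not the Old Faithful dataset; the bandwidth 0.5 is an arbitrary assumption.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def normal_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data, h, kernel, t):
    """Kernel density estimate at the point t with bandwidth h."""
    return sum(kernel((t - x) / h) for x in data) / (len(data) * h)

# Made-up toy values, NOT the Old Faithful data.
data = [1.8, 2.0, 2.2, 3.5, 4.0, 4.1, 4.3, 4.5, 4.6, 4.8]
for t in (2.0, 3.0, 4.5):
    print(f"t={t}: epanechnikov={kde(data, 0.5, epanechnikov, t):.3f}, "
          f"normal={kde(data, 0.5, normal_kernel, t):.3f}")
```

Evaluating kde on a fine grid of t values for several bandwidths shows how a small h produces a spiky estimate and a large h an oversmoothed one.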
M15.3 Exercise 15.4 on p. 227. The dataset is software.txt. Additionally, plot the histogram.
M16 is composed of 3 problems listed below.
M16.1 Exercise 16.1 on p. 240.
The dataset is software.txt.
M16.2 Exercise 16.2 on p. 240.
The dataset is oldfaithful.txt.
M16.3 We have a dataset of the daily average temperature in Philadelphia since Jan. 1st, 1995.
The first column is the month, the second is the day, the third is the year, and the fourth is the temperature in Fahrenheit.
In this assignment, we are going to practice all the numerical summaries we learned in Chapter 16.
1. Compute each year's average daily temperature and the corresponding standard deviation. Draw a figure to represent the yearly average daily temperature.
2. On one figure, draw box plots for each year. The year is on the x-axis. What can you tell from this figure?
3. Select your favorite month and analyze its temperature in different years.
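The per-year summaries in steps 1 and 2 can be sketched as below, in Python for illustration rather than Matlab. The rows are made-up (month, day, year, temperature) tuples, not the Philadelphia data, and the function names are hypothetical. Quartile conventions vary, so the quartiles from statistics.quantiles may differ slightly from those defined in Chapter 16.

```python
import statistics
from collections import defaultdict

def group_by_year(rows):
    """Map each year to the list of its temperatures."""
    by_year = defaultdict(list)
    for _month, _day, year, temp in rows:
        by_year[year].append(temp)
    return dict(by_year)

def yearly_summary(rows):
    """Mean, standard deviation, and five-number summary for each year."""
    summary = {}
    for year, temps in sorted(group_by_year(rows).items()):
        q1, median, q3 = statistics.quantiles(temps, n=4)
        summary[year] = {
            "mean": statistics.fmean(temps),
            "std": statistics.stdev(temps),
            "min": min(temps), "q1": q1, "median": median,
            "q3": q3, "max": max(temps),
        }
    return summary

# Made-up toy rows, NOT the Philadelphia data.
rows = [(1, 1, 1995, 30.0), (4, 1, 1995, 50.0), (7, 1, 1995, 78.0),
        (10, 1, 1995, 58.0), (1, 1, 1996, 28.0), (4, 1, 1996, 52.0),
        (7, 1, 1996, 80.0), (10, 1, 1996, 60.0)]
summary = yearly_summary(rows)
for year, stats in summary.items():
    print(year, {name: round(value, 2) for name, value in stats.items()})
```

The five numbers per year are what a box plot displays, and filtering the rows by month before grouping gives the per-month analysis in step 3.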