Datasets for the course book:
Dekking, F.M., Kraaikamp, C., Lopuhaa, H.P., Meester, L.E., A Modern Introduction to Probability and Statistics. Second Edition. Springer 2007
ISBN: 978-1-85233-896-1
M3.1. Implement a Matlab function that computes P(B_n) for a given n,
where P(B_n) is the probability of no coincident birthdays in a group of n arbitrarily chosen people (see p. 28 of the book).
Create a file in Matlab with the name "Prob_no_coincident_birthdays.m" that defines a function with input n and output P(B_n).
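A minimal sketch of such a function is given below. It assumes 365 equally likely birthdays (no leap years), so that P(B_n) is the product of the factors (1 - i/365) for i = 0, ..., n-1; check this against the derivation in the book before relying on it.

```matlab
function p = Prob_no_coincident_birthdays(n)
% PROB_NO_COINCIDENT_BIRTHDAYS  Probability that n people all have
% different birthdays.
% Assumption: 365 equally likely birthdays, independent people.
    % Person i+1 must avoid the i birthdays already taken.
    p = prod(1 - (0:n-1) / 365);
end
```

For example, Prob_no_coincident_birthdays(23) is slightly below 0.5, and the function returns 0 for n = 366 because one factor of the product is zero.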
M4.1. Let Z be the number of times a 6 appeared in five independent throws of a die.
Describe the probability distribution of Z by
(1) plotting the probability mass function p_Z and
(2) plotting the cumulative distribution function F_Z of Z.
Hint: compare to Ex. 4.1.
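One possible starting point is sketched below. It assumes that Z follows a binomial distribution with n = 5 trials and success probability p = 1/6, and it uses only base Matlab functions (no toolbox); the variable names are illustrative.

```matlab
n = 5;          % number of throws
p = 1/6;        % probability of a 6 in one throw
k = 0:n;        % possible values of Z

% Binomial probability mass function evaluated at k = 0, ..., n
pmf = arrayfun(@(j) nchoosek(n, j) * p^j * (1 - p)^(n - j), k);
cdf = cumsum(pmf);   % F_Z at the jump points

figure;
stem(k, pmf, 'filled');
xlabel('k'); ylabel('p_Z(k)'); title('Probability mass function of Z');

figure;
% Pad with one point on each side so the step function starts at 0 and ends at 1
stairs([-1 k n+1], [0 cdf 1]);
ylim([0 1.05]);
xlabel('a'); ylabel('F_Z(a)'); title('Distribution function of Z');
```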
M4.2.
(a) Plot the probability mass function and the cumulative distribution function of a binomial distribution for a few different values of the parameter p.
How do their shapes change as a function of p?
(b) Plot the probability mass function and the cumulative distribution function of a geometric distribution for a few different values of the parameter p.
How do their shapes change as a function of p?
You do not need to submit the Matlab files for this assignment; a report in MS Word showing a few figures and your conclusions is sufficient.
M6.1.
Generate 10,000 samples of a random variable with an exponential distribution using a simulation method.
The random variable is exponentially distributed with a mean of 10.
Plot the distribution function of the samples by counting, for each value, the number of samples that do not exceed it.
Also plot the distribution function of the exponential random variable using its closed-form expression.
As the third figure, plot both functions together in one figure.
Submit a Matlab function or a script that generates the three figures.
Your function or script must contain comments that explain your steps.
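The script below is a sketch of one way to approach this. It assumes that the intended simulation method is the inverse-transform method: if U is uniform on (0,1), then X = -mu*log(1-U) has distribution function F(x) = 1 - exp(-x/mu). The variable names are illustrative.

```matlab
N  = 10000;     % number of samples
mu = 10;        % mean of the exponential distribution (rate = 1/mu)

% Inverse-transform sampling: solve u = 1 - exp(-x/mu) for x
u = rand(N, 1);
x = -mu * log(1 - u);

% Empirical distribution function: after sorting, the fraction of
% samples less than or equal to xs(i) is i/N
xs   = sort(x);
Femp = (1:N)' / N;

% Theoretical distribution function on a grid
t     = linspace(0, max(xs), 500);
Ftheo = 1 - exp(-t / mu);

figure; stairs(xs, Femp);
xlabel('x'); ylabel('F(x)'); title('Empirical distribution function');

figure; plot(t, Ftheo);
xlabel('x'); ylabel('F(x)'); title('Exponential distribution function');

figure; stairs(xs, Femp); hold on; plot(t, Ftheo, '--'); hold off;
xlabel('x'); ylabel('F(x)');
legend('empirical', 'theoretical', 'Location', 'southeast');
```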
M6.2 With reference to Chapter 6.4, generate a figure that plots the average waiting times at the well for pump capacities of 1 and 5 for n=1:50.
Generate two figures with graphs for work in system for pump capacities of 1 and 5 for time t=1:100.
Submit a Matlab function or a script that generates the three figures.
M7.1 Approximate the expectation and the variance of a random variable using simulations.
We are able to approximate the parameters of a random variable through simulations.
For example, by sampling n values of a uniform variable, we can approximate the expectation by averaging all the samples. Similarly, we can obtain an approximation to the variance.
Your work in this lab assignment is to investigate the relationship between the approximation ratio and the number of samples.
Two types of random variables are considered: Par() and Exp(). Choose the parameters yourself. Then generate sample sets of size 10, 100, 1,000, and 10,000.
For each sample set, compute the corresponding approximations to the expectation and the variance.
Plot a figure to show the effect of the number of samples on the approximation ratio. (Note that you also need to plot the theoretical values in the same figure.)
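A sketch for the Exp() case follows; the Par() case can be handled along the same lines. The parameter value lambda = 0.5 is an arbitrary illustrative choice, and the samples are drawn by inverse-transform sampling. For an Exp(lambda) variable the theoretical expectation is 1/lambda and the theoretical variance is 1/lambda^2.

```matlab
lambda = 0.5;                      % illustrative parameter choice
sizes  = [10 100 1000 10000];      % numbers of samples

m = zeros(size(sizes));            % approximations of the expectation
v = zeros(size(sizes));            % approximations of the variance

for i = 1:numel(sizes)
    x    = -log(1 - rand(sizes(i), 1)) / lambda;   % Exp(lambda) samples
    m(i) = mean(x);
    v(i) = var(x);
end

% Theoretical values, repeated so they plot as horizontal lines
mTheo = ones(size(sizes)) / lambda;
vTheo = ones(size(sizes)) / lambda^2;

figure;
semilogx(sizes, m, 'o-', sizes, v, 's-', sizes, mTheo, '--', sizes, vTheo, ':');
xlabel('number of samples');
legend('sample mean', 'sample variance', 'E[X]', 'Var(X)');
```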
M10.1 We have a real dataset aaup.data, which records statistics concerning US universities and colleges.
Each column records a certain kind of property of each school.
At first, the meaning of each number may seem mysterious.
However, we are only interested in the correlation between data in different columns.
In this experiment, you are going to determine which columns are positively correlated, negatively correlated, or uncorrelated.
To achieve this, first, you need to choose several columns, then compute their covariance and correlation coefficients.
You should pick at least 3 pairs of columns for this experiment.
In your report, show your results and the scatterplots (like in 10.15). Be sure to list which columns you have considered.
To read the data into Matlab, use the dlmread() function. First, change the current directory to the one where aaup.data resides.
Then type "m = dlmread('aaup.data', ' ')". Now all the data is in the matrix m, and you can access any column you like.
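The computation for one pair of columns could look like the sketch below. To keep it runnable without the data file, the matrix m here is a synthetic stand-in (not values from aaup.data), and the column indices i and j are arbitrary; with the real data, m would come from the dlmread call above.

```matlab
% Synthetic stand-in for the matrix read from aaup.data
rng(1);
m = randn(200, 3);
m(:, 2) = 2 * m(:, 1) + 0.5 * randn(200, 1);   % column 2 built to correlate with column 1

i = 1; j = 2;                                  % the pair of columns to compare

C = cov(m(:, i), m(:, j));                     % 2-by-2 covariance matrix
R = corrcoef(m(:, i), m(:, j));                % 2-by-2 correlation matrix
fprintf('columns %d and %d: covariance %.3f, correlation %.3f\n', ...
        i, j, C(1, 2), R(1, 2));

figure;
scatter(m(:, i), m(:, j), '.');
xlabel(sprintf('column %d', i)); ylabel(sprintf('column %d', j));
```

The off-diagonal entries C(1,2) and R(1,2) are the covariance and the correlation coefficient of the two columns.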
M15 is composed of 4 problems listed below.
Submit one report showing the solutions to the 4 problems.
M15.1 Compute histograms of the Old Faithful data (oldfaithful.txt) with 3 different bin widths.
Which bin width is the most suitable in your opinion?
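The sketch below shows one way to compare bin widths side by side. The vector x is a synthetic stand-in (not the Old Faithful values) and the three widths are arbitrary; in the assignment, x would be read from oldfaithful.txt. It assumes a Matlab release that provides histogram and histcounts.

```matlab
% Synthetic bimodal stand-in for the data vector
rng(3);
x = [120 + 20 * randn(100, 1); 260 + 30 * randn(150, 1)];

widths = [10 30 90];               % illustrative bin widths

figure;
for i = 1:numel(widths)
    subplot(1, numel(widths), i);
    % Normalize to a density so the panels are comparable
    histogram(x, 'BinWidth', widths(i), 'Normalization', 'pdf');
    title(sprintf('bin width %g', widths(i)));
end

% Bin counts without plotting, e.g. for a table in the report
counts = histcounts(x, 'BinWidth', widths(1));
```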
M15.2 Exercise 15.2 on p. 226. The dataset is challenger23.txt. Additionally, plot the histogram.
M15.3 Compute and plot kernel density estimates of the Old Faithful data with the Epanechnikov kernel and with the normal kernel.
For both plots determine reasonable values of the bandwidth parameters.
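A kernel density estimate can be computed directly from its definition, f(t) = 1/(n*h) * sum over i of K((t - x_i)/h), without any toolbox. The sketch below does this for both kernels; the data vector x is a synthetic stand-in (not the Old Faithful values) and the bandwidth h is an arbitrary starting value to be tuned.

```matlab
% Synthetic bimodal stand-in for the data vector
rng(2);
x = [2 + 0.3 * randn(100, 1); 4.3 + 0.4 * randn(150, 1)];

h = 0.3;                                        % illustrative bandwidth
t = linspace(min(x) - 1, max(x) + 1, 400);      % evaluation grid (row vector)

% Kernels
epan  = @(u) 0.75 * (1 - u.^2) .* (abs(u) <= 1);   % Epanechnikov
gauss = @(u) exp(-u.^2 / 2) / sqrt(2 * pi);        % normal

% U(i, j) = (t(j) - x(i)) / h, one row per data point
U = bsxfun(@minus, t, x) / h;

fE = sum(epan(U), 1)  / (numel(x) * h);
fN = sum(gauss(U), 1) / (numel(x) * h);

figure; plot(t, fE); title('Epanechnikov kernel'); xlabel('t');
figure; plot(t, fN); title('Normal kernel');       xlabel('t');
```

Repeating the computation for several values of h shows how the estimate moves from too wiggly (small h) to oversmoothed (large h).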
M15.4 Exercise 15.4 on p. 227. The dataset is software.txt. Additionally, plot the histogram.
M16 is composed of 3 problems listed below.
Submit one report showing the solutions to the 3 problems.
M16.1 Exercise 16.1 on p. 240. The dataset is software.txt.
M16.2 Exercise 16.2 on p. 240. The dataset is oldfaithful.txt.
M16.3 We have a dataset of the daily average temperature in Philadelphia since Jan. 1st 1995.
The first column is the month, the second column is the day, the third column is the year, and the fourth column is the actual temperature in Fahrenheit.
In this assignment, we are going to practice all the numerical summaries we learned in Chapter 16.
1. Compute each year's average daily temperature and the corresponding standard deviation. Draw a figure to represent the yearly average daily temperature.
2. On one figure, draw box plots for each year, with the year on the x-axis. What can you tell from this figure?
3. Select your favorite month and analyze its temperature in different years.
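The per-year summaries in item 1 can be sketched as follows, using grouping by year with unique and accumarray. The matrix T is a tiny made-up stand-in with the column layout described above (month, day, year, temperature), not values from the Philadelphia dataset.

```matlab
% Made-up stand-in: columns are month, day, year, temperature (F)
T = [ 1 1 1995 30;
      1 2 1995 34;
      7 1 1995 80;
      1 1 1996 28;
      7 1 1996 84;
      7 2 1996 86 ];

% idx(k) is the position of row k's year in the sorted list of years
[years, ~, idx] = unique(T(:, 3));

avgT = accumarray(idx, T(:, 4), [], @mean);   % yearly average temperature
stdT = accumarray(idx, T(:, 4), [], @std);    % yearly standard deviation

figure;
plot(years, avgT, 'o-');
xlabel('year'); ylabel('average daily temperature (F)');
```

For item 2, if the Statistics and Machine Learning Toolbox is available, boxplot(T(:, 4), T(:, 3)) draws one box per year. For item 3, the same grouping can be applied to the rows selected by T(:, 1) == month.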