Math3033 - Matlab Assignments

Fall 2009

Dr. Longin Jan Latecki        TA: SOLOMON A JONES    Grader: YAXIONG ZHAO

 

datasets for the course book

Dekking, F.M., Kraaikamp, C., Lopuhaa, H.P., Meester, L.E., A Modern Introduction to Probability and Statistics. Second Edition. Springer 2007

ISBN: 978-1-85233-896-1

 

M3.1. Implement a Matlab function that computes P(B_n) for a given n,

where P(B_n) is the probability of no coincident birthdays in a group of n arbitrarily chosen people, p. 28 in the book.

Create a file in Matlab with the name "Prob_no_coincident_birthdays.m" that defines a function with input n and output P(B_n).
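As a starting point, the file could look like the sketch below. It assumes 365 equally likely birthdays and the product form P(B_n) = (1 - 1/365)(1 - 2/365)...(1 - (n-1)/365); check this against the formula on p. 28 before relying on it.

```matlab
function p = Prob_no_coincident_birthdays(n)
% Probability that n people all have different birthdays.
% Assumption: 365 equally likely birthdays (no leap years).
% The factor for k = 0 equals 1, so it does not change the product.
p = prod(1 - (0:n-1) / 365);
end
```

For example, Prob_no_coincident_birthdays(23) evaluates to roughly 0.49.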

 

M4.1. Let Z be the number of times a 6 appeared in five independent throws of a die.

Describe the probability distribution of Z by

(1) plotting the probability mass function p_Z and

(2) plotting the cumulative distribution function F_Z of Z.

Hint: compare to Ex. 4.1.
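One way to get started, assuming a fair die so that Z has a Bin(5, 1/6) distribution, is sketched below. The variable names are illustrative only.

```matlab
% Sketch: pmf and cdf of Z ~ Bin(n, p), assuming a fair die (p = 1/6).
n = 5;
p = 1/6;
k = 0:n;

% p_Z(k) = nchoosek(n, k) * p^k * (1-p)^(n-k)
binom = arrayfun(@(j) nchoosek(n, j), k);
pmf = binom .* p.^k .* (1 - p).^(n - k);
cdf = cumsum(pmf);

figure;
stem(k, pmf);
xlabel('a'); ylabel('p_Z(a)');

% F_Z is a step function: 0 to the left of 0 and 1 from n onward.
figure;
stairs([-1 k n+1], [0 cdf 1]);
xlabel('a'); ylabel('F_Z(a)');
```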

 

M4.2. Plot the probability mass function and the cumulative distribution function of a binomial distribution for a few different values of the parameter p.

How do their shapes change as a function of p?

 

M4.3. Plot the probability mass function and the cumulative distribution function of a geometric distribution for a few different values of the parameter p.

How do their shapes change as a function of p?
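A possible sketch for the geometric case follows. It assumes the parametrization p(k) = (1-p)^(k-1) p for k = 1, 2, ..., so that F(k) = 1 - (1-p)^k; the values of p and the range of k are arbitrary choices for illustration.

```matlab
% Sketch: geometric pmf and cdf for a few values of p.
k = 1:20;
ps = [0.1 0.3 0.6];               % illustrative parameter values
pmf = zeros(numel(ps), numel(k));
cdf = zeros(numel(ps), numel(k));
for i = 1:numel(ps)
    pmf(i, :) = (1 - ps(i)).^(k - 1) * ps(i);
    cdf(i, :) = 1 - (1 - ps(i)).^k;
end
labels = arrayfun(@(q) sprintf('p = %.1f', q), ps, 'UniformOutput', false);

figure;
plot(k, pmf', 'o-');              % one curve per value of p
xlabel('k'); ylabel('p(k)'); legend(labels);

figure;
plot(k, cdf', 'o-');
xlabel('k'); ylabel('F(k)'); legend(labels, 'Location', 'southeast');
```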

 

M6.1.

Generate 10,000 samples of a random variable with an exponential distribution using a simulation method.

Use an exponential random variable with mean 10, that is, Exp(λ) with λ = 1/10.

Plot the empirical distribution function of the samples by counting, for each value x, the number of samples less than or equal to x.

Also plot the distribution function of the exponential random variable using its formula.

Compare the results and draw some useful conclusions.

Write a short report explaining how you derived the sampling method. Also include a short discussion of the experimental results.
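The sketch below shows one common approach, inverse transform sampling, under the assumption that this is the intended simulation method: since F(x) = 1 - exp(-λx), solving u = F(x) gives x = -ln(1 - u)/λ for u uniform on (0, 1). Variable names are illustrative.

```matlab
% Sketch: exponential samples by inverse transform, mean 10 (lambda = 1/10).
lambda = 1/10;
n = 10000;
u = rand(n, 1);                   % uniform samples on (0, 1)
x = -log(1 - u) / lambda;         % inverse of F(x) = 1 - exp(-lambda*x)

% Empirical distribution function: fraction of samples <= xs(i).
xs = sort(x);
Fn = (1:n)' / n;

% Theoretical distribution function on a grid.
t = linspace(0, max(xs), 200);
F = 1 - exp(-lambda * t);

figure;
stairs(xs, Fn);
hold on;
plot(t, F, 'r--');
hold off;
xlabel('x'); ylabel('F(x)');
legend('empirical', 'theoretical', 'Location', 'southeast');
```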

 

M6.2 With reference to Chapter 6.4, generate a figure that plots the average waiting times at the well for pump capacities of 1 and 5 for n=1:50.

Generate two figures with graphs for work in system for pump capacities of 1 and 5 for time t=1:100.

Submit a Matlab function or a script that generates the three figures.

 

M7.1 Approximate the expectation and the variance of a random variable using simulations.

We are able to approximate the parameters of a random variable through simulations.

For example, by sampling n values of a uniform variable, we can approximate the expectation by averaging all the samples. Similarly, we can obtain an approximation to the variance.

Your task in this lab assignment is to investigate the relationship between the approximation ratio and the number of samples.

Consider two types of random variables: Par(α) and Exp(λ). Choose the parameters yourself. Then generate 10, 100, 1,000, and 10,000 samples of each.

For each sample set, compute its corresponding approximations to the expectation and the variance.

Plot a figure to show the effect of different numbers of samples on the approximation ratio. (Note that you also need to plot the theoretical values in the same figure.)
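A minimal sketch for the exponential case is given below. It assumes λ = 0.5 (an arbitrary choice) and generates the samples by inverse transform; for Exp(λ) the theoretical values are E[X] = 1/λ and Var(X) = 1/λ^2. The Pareto case would follow the same pattern with its own sampler and theoretical values.

```matlab
% Sketch: sample mean and variance of Exp(lambda) for growing sample sizes.
lambda = 0.5;                     % illustrative parameter choice
ns = [10 100 1000 10000];
m = zeros(size(ns));
v = zeros(size(ns));
for i = 1:numel(ns)
    x = -log(1 - rand(ns(i), 1)) / lambda;   % inverse transform sampling
    m(i) = mean(x);
    v(i) = var(x);
end

figure;
semilogx(ns, m, 'o-', ns, v, 's-');
hold on;
semilogx(ns, ones(size(ns)) / lambda, 'k--', ...
         ns, ones(size(ns)) / lambda^2, 'k:');
hold off;
xlabel('number of samples');
legend('sample mean', 'sample variance', 'E[X] = 1/\lambda', 'Var(X) = 1/\lambda^2');
```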

 

M10.1 We have a real dataset aaup.data, which records statistics concerning US universities and colleges.

Each column records a certain kind of property of each school.

At this point, the meaning of each number may be a mystery to you.

However, we are only interested in the correlation between data in different columns.

In this experiment, you are going to determine which columns are positively correlated, negatively correlated, or uncorrelated.

To do this, first choose several columns, then compute their covariances and correlation coefficients.

You should pick at least 3 pairs of columns for this experiment.

In your report, show your results and the scatterplots (like in 10.15). Be sure to list which columns you have considered.

To read the data into Matlab, use the dlmread() function. First, change the current directory to the one where aaup.data resides.

Then type "m = dlmread('aaup.data', ' ')". Now all the data are in the matrix m. You can then access any column you like.
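Once m is loaded, the covariance and correlation of a pair of columns could be computed along the lines of the sketch below. To keep the sketch runnable on its own, m is replaced by a small synthetic matrix, and the column indices 1 and 2 are arbitrary; neither reflects the actual contents of aaup.data.

```matlab
% Sketch: covariance, correlation, and scatterplot for one pair of columns.
% Synthetic stand-in for m = dlmread('aaup.data', ' ').
t = (1:200)';
m = [t, 2*t + 5*sin(t), -t + 3*cos(t)];

a = m(:, 1);                      % first column of the pair (arbitrary choice)
b = m(:, 2);                      % second column of the pair (arbitrary choice)

C = cov([a b]);                   % 2-by-2 covariance matrix
R = corrcoef([a b]);              % 2-by-2 correlation matrix
cov_ab = C(1, 2);
rho_ab = R(1, 2);
fprintf('Cov = %.3f, rho = %.3f\n', cov_ab, rho_ab);

figure;
scatter(a, b);
xlabel('column 1'); ylabel('column 2');
```

A correlation coefficient near +1 or -1 indicates a strong positive or negative linear relationship, and a value near 0 suggests the columns are uncorrelated.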

 

M15.1 Compute histograms of the Old Faithful data (oldfaithful.txt) with 5 different bin widths. Which bin width is the most suitable in your opinion?
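A sketch of how several bin widths could be compared in one figure is given below. The data vector x is a synthetic two-cluster stand-in so that the sketch runs without the file, and the bin widths are arbitrary; with the real data, x would be read from oldfaithful.txt and the widths chosen to match its scale. Bar heights are scaled so that the total area is 1.

```matlab
% Sketch: histograms of the same data with different bin widths.
% Synthetic stand-in data; replace with the contents of oldfaithful.txt.
x = [2 + 0.3 * randn(100, 1); 4.5 + 0.4 * randn(170, 1)];
n = numel(x);
widths = [0.1 0.2 0.5 1 2];       % illustrative bin widths

figure;
for i = 1:numel(widths)
    b = widths(i);
    lo = floor(min(x) / b) * b;   % left end of the first bin
    hi = ceil(max(x) / b) * b;    % right end of the last bin
    centers = lo + b/2 : b : hi;
    counts = hist(x, centers);    % counts per bin
    height = counts / (n * b);    % area under the histogram is 1
    subplot(numel(widths), 1, i);
    bar(centers, height, 1);
    title(sprintf('bin width %.1f', b));
end
```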

M15.2 Exercise 15.2 on p. 226. The dataset is challenger23.txt

M15.3 Compute and plot kernel density estimates of the Old Faithful data with the Epanechnikov kernel and with the normal kernel.
For both plots, determine reasonable values of the bandwidth parameter.
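The estimate f(t) = (1/(n h)) Σ K((t - x_i)/h) can be computed directly, as in the sketch below. It uses the Epanechnikov kernel K(u) = (3/4)(1 - u^2) for |u| <= 1 and the standard normal density as the normal kernel. The data x are a synthetic stand-in and the bandwidth h is an arbitrary value, not a recommendation.

```matlab
% Sketch: kernel density estimates with two kernels.
% Synthetic stand-in data; replace with the contents of oldfaithful.txt.
x = [2 + 0.3 * randn(100, 1); 4.5 + 0.4 * randn(170, 1)];
n = numel(x);
h = 0.3;                          % illustrative bandwidth

t = linspace(min(x) - 3*h, max(x) + 3*h, 400);
U = bsxfun(@minus, t, x) / h;     % n-by-400 matrix of (t - x_i)/h

Kep = 0.75 * (1 - U.^2) .* (abs(U) <= 1);    % Epanechnikov kernel
Kno = exp(-U.^2 / 2) / sqrt(2 * pi);         % normal kernel

f_ep = sum(Kep, 1) / (n * h);
f_no = sum(Kno, 1) / (n * h);

figure;
plot(t, f_ep, '-', t, f_no, '--');
xlabel('t'); ylabel('density estimate');
legend('Epanechnikov', 'normal');
```

Rerunning with several values of h shows how the estimate moves between too rough and too smooth.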

M15.4 Exercise 15.4 on p. 227. The dataset is software.txt

 

M16.1 Exercise 16.1 on p. 240. The dataset is software.txt

M16.2 Exercise 16.2 on p. 240. The dataset is oldfaithful.txt

M16.3 We have a dataset of the daily average temperature in Philadelphia since Jan. 1st, 1995.
The first column is the month, the second is the day, the third column is the year, and the fourth column is the actual temperature in Fahrenheit.
In this assignment, we are going to practice all the numerical summaries we learned in Chapter 16.
1. Compute each year's average daily temperature and the corresponding standard deviation. Draw a figure to represent the yearly average daily temperature.
2. On one figure, draw box plots for each year. The year is on x-axis. What can you tell from this figure?
3. Select your favorite month and analyze its temperature in different years.
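For step 1, the grouping by year could be done as in the sketch below. The matrix T is a made-up toy example with the four columns described above (month, day, year, temperature), not the real data. For step 2, boxplot(T(:, 4), T(:, 3)) is one option, assuming the Statistics Toolbox is available.

```matlab
% Sketch: yearly mean and standard deviation of the daily temperature.
% Toy stand-in rows: month, day, year, temperature (Fahrenheit).
T = [1 1 1995 30;
     1 2 1995 34;
     7 1 1995 80;
     1 1 1996 28;
     7 1 1996 82;
     7 2 1996 84];

years = unique(T(:, 3));
avg = zeros(size(years));
sd = zeros(size(years));
for i = 1:numel(years)
    temps = T(T(:, 3) == years(i), 4);   % temperatures of one year
    avg(i) = mean(temps);
    sd(i) = std(temps);
end

figure;
errorbar(years, avg, sd, 'o-');
xlabel('year'); ylabel('average daily temperature (F)');
```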