TA's note on the birthday problem (section 3.2 of Dekking's book)
We'll see that it isn't difficult at all to find the probability that no two students in a class of size n share a birthday (here n is about 25-30). The probability of the complement of this event (i.e., that at least two people here have the same birthday) may surprise you. We'll give some intuition later as to why this is less surprising than it may at first seem.
A way of generating random birthdays
Let's ask MATLAB to pick 23 birthdays at random. We'll imagine the days of the year are numbered 1 through 365 (no Feb 29). For example, Feb 8, the day your next assignment is due, is represented by the number 39, since it is the 39th day of the year.
days = 365;
num_of_students = 23;
birth_dates = ceil(rand(1, num_of_students) * days)
% pressing the F1 key with the cursor over 'ceil' or 'rand' pops up a
% little help window telling you more about these functions
birth_dates =
  Columns 1 through 5
     138    44   279   191   247
  Columns 6 through 10
     102   236   345   170    78
  Columns 11 through 15
     265    19   191   161   119
  Columns 16 through 20
     196   129    62   363    89
  Columns 21 through 23
     198   201    46
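Did this particular class get a shared birthday? Here is a minimal sketch for checking, assuming we simply type the 23 numbers printed above into a vector by hand (the names sample, num_distinct and repeated_days are made up for this illustration):

```matlab
% the 23 birthdays printed above, copied by hand
sample = [138 44 279 191 247 102 236 345 170 78 265 19 ...
          191 161 119 196 129 62 363 89 198 201 46];

% unique() drops repeated values, so fewer distinct days than students
% means at least two students share a birthday
num_distinct = numel(unique(sample));
has_coincidence = num_distinct < numel(sample);

% after sorting, equal neighbours are exactly the repeated days
sorted_sample = sort(sample);
repeated_days = unique(sorted_sample(diff(sorted_sample) == 0));
```

For this sample day 191 shows up twice, so even this single random draw of 23 birthdays contains a coincidence.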
A little simulation
Imagine a world in which CIS 2033 is taught all over the place, in 10 000 places in fact (if that sounds like a lot, allow for different semesters). Furthermore, imagine that for each of these CIS 2033 classes we find 23 students brave enough to take it. Let's see in what fraction of these 10 000 classes at least two of the 23 students share a birthday; in all the other classes the 23 birthdays fall on different days. What do you think we'll get? (Remember we mentioned this frequentist approach to defining probability? If not, it's OK.)
format rat
reps = 10000; % number of repetitions (number of times CIS 2033 is taught)
rec = NaN(1, reps); % for each repetition we'll keep a record (see loop below)
days = 365;
num_of_students = 23;
for k = 1:reps
    birth_dates = ceil(rand(1, num_of_students)*days);
    rec(k) = length(unique(birth_dates)); % number of distinct birthdays in class k
end
% how often (out of 10000) do we get to see coincidences in the birthdays
% of the 23 students? freq gives the answer:
freq = sum(rec < num_of_students)/reps
freq = 2483/5000
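The same experiment can also be run without a loop. The following is only a sketch of one way to do it, not the version used in class: it assumes randi as a replacement for the ceil(rand(...)) trick, fixes the seed with rng(0) (an arbitrary choice, just to make the run repeatable), and uses the made-up names has_coincidence and freq_vectorized.

```matlab
rng(0);                 % arbitrary seed so the run can be repeated
days = 365;
num_of_students = 23;
reps = 10000;

% one column per class, one row per student
birth_dates = randi(days, num_of_students, reps);

% sort each column; a shared birthday then appears as a zero difference
% between neighbouring rows
sorted_dates = sort(birth_dates, 1);
has_coincidence = any(diff(sorted_dates, 1, 1) == 0, 1);

% fraction of classes with at least one shared birthday
freq_vectorized = mean(has_coincidence)
```

Since the birthdays are random, the result changes a little with the seed, but it should land close to one half, just like freq above.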
So how could we compute B_n of page 28 of the book?
Let's see if we can get MATLAB to plot results similar to those you see in Chapter 3, Fig. 3.1. Look at page 28 of the book. First, how does one compute B_n? Let's start small with B_5: what's the probability that the birthdays of 5 randomly chosen students all differ?
num_of_students = 5;
days = 365;
ones_over_num_of_days = ones(1, num_of_students - 1) * 1/days % a vector of 1/365s
numers = (num_of_students - 1: -1: 1) % vector of numerators - see last equation on page 28 of Dekking
% note: the vector B_n below holds the individual factors 1 - k/365;
% the probability we are after is the product of these factors
B_n = ones(1, num_of_students - 1) - numers.*ones_over_num_of_days
prob_student_birthdays_differ = prod(B_n)
% Try varying num_of_students. Do the results make sense to you?
ones_over_num_of_days =
     1/365     1/365     1/365     1/365
numers =
     4     3     2     1
B_n =
   361/365   362/365   363/365   364/365
prob_student_birthdays_differ =
   968/995
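The computation above is the product (1 - 1/365)(1 - 2/365)...(1 - (n-1)/365), so it can be squeezed into a single line. The sketch below wraps it in an anonymous function; the name prob_all_differ is made up for this note and is not from the book.

```matlab
% probability that n birthdays, drawn uniformly from 'days' days, all differ
prob_all_differ = @(n, days) prod(1 - (1:n-1) / days);

p5  = prob_all_differ(5, 365);    % about 0.9729
p23 = prob_all_differ(23, 365);   % about 0.4927

% probability of at least one shared birthday among 23 students
p23_coincidence = 1 - p23;        % about 0.5073
```

Two things to notice: the 968/995 displayed above is the rational approximation that format rat shows for p5, and p23_coincidence is the exact value that the simulated freq was estimating.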
Can I see a picture?
Yes, we'd like one too; some people say a picture is worth a lot of words. Let's compute a bunch of B_n's and plot the results:
students = 1:100;
days = 365;
prob_B_n = ones(1, length(students)); % preallocate; B_1 = 1 (one student cannot clash with anyone)
for k = 2:length(students) % k = 1 is already covered by the preallocation above
    ones_over_num_of_days = ones(1, k - 1) * 1/days;
    numers = (k - 1: -1: 1);
    prob_B_n(k) = prod(ones(1, k - 1) - numers.*ones_over_num_of_days);
end
plot(students, prob_B_n, 'b.')
xlabel('number of students n')
ylabel('probability that all n birthdays differ')
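Each B_n is just B_(n-1) times one more factor, so the whole curve can also be built in one go with cumprod. This is a sketch under that observation; the names B_all and first_n are made up here.

```matlab
days = 365;
max_students = 100;

% B_all(n) is the probability that n birthdays all differ:
% start from B_1 = 1 and keep multiplying by (1 - k/days), k = 1, 2, ...
B_all = cumprod([1, 1 - (1:max_students - 1) / days]);

% smallest class size for which a shared birthday is more likely than not
first_n = find(1 - B_all > 0.5, 1);
```

Here first_n comes out as 23, which is why the simulation earlier used classes of 23 students.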