Simulation Project Document

The material in this web page is not in your textbook and it will be represented in the upcoming Exam.

This document will continue to be updated as issues arise. At this time, the requirements for part 1 and the research component are final, but additional hints may be added. Parts 2 and 3 I'm still working on. SWN 10/22

Added notes to the end of the theory section and finalized part 2. SWN 10/29

Simulation Project

We use simulation when there is no direct mathematical solution readily available for a problem. One important class of simulation problems is dynamic systems with random disruptions. Another class is discrete event systems, in which only the random events are important. In any case, we perform simulation in order to gain an understanding of the likely behavior of a system subject to random influences. We may be interested in statistical properties like the average cost of repairing an automobile when a random part fails or we may wish to investigate how likely a random error is to cause a spacecraft to lose control of its trajectory.

Simulations of the kind we will study depend on a source of random numbers to model the random events, inputs, or errors. Most computers lack any source of truly random numbers so algorithms known as pseudo-random number generators are used. In practice they work quite well for most applications.

When we assume that a value is random we (implicitly or explicitly) assume a distribution for that value. We will limit our study to two distributions, uniform and normal (or Gaussian). Uniform random numbers model situations in which any value in some range is (assumed to be) equally likely. Normal random numbers model errors of the kind typically encountered in measurements. The distances of individual projectiles from the center of a target is likely to be normally distributed, for instance.

The built-in function rand() provides uniform random numbers in C++. I've stolen and adapted a function I call frand() for generating standard normal random numbers; you can steal it from me here.

Uniform values over a range [a,b] can be generated by linearly transforming the result of rand(): a + rand() * (b-a)/double(RAND_MAX). Normally distributed values are characterized by mean (average) and standard deviation (how spread-out the values are). A similar transform, mean + frand()*std_dev, gives appropriately distributed results.

The heart of a simulation program is a model of the system being studied. You have to design a set of variables that keep track of all the important features of the system. For example, a projectile might have position, orientation, and velocity variables. If your model does not involve the orientation (say, you assume a spherical projectile) then you would not need to keep track of orientation.

You then write code to model events and/or the passage of time in your system. Each event or time unit causes the state of your model to evolve.

When the simulation performs to your satisfaction you instrument your model by adding code and variables to keep tract of whatever it is you want to measure. For instance, if you are modeling accuracy of a drill press, you need to keep tract of the positions of the holes it drills, but you do not need to store every variable that determined those positions.

You then must run the simulations many times using a loop in the driver program. The statistical properties of the instrumentation variables over many simulated runs provide an estimate of the statistical properties of corresponding variables in the real system.

Note: The Microsoft random number generator you are using gives exactly the same sequence of numbers each time you run your program. Executing the following line of code once at the beginning of main() should make each run give different numbers:


Note: You can compute the standard deviation of a sequence of numbers without storing them in an array. The formula

sqrt(n sum x^2 + (sum x)^2 )/n

is equivalent to the Standard Deviation formula given in the textbook. We went through the algebra in class.

Note: Several people have asked for clarification on the idea of a simulation `run'. When you simulate a process from start to finish using random numbers, that can be thought of as a run. Since the result is at least somewhat random, you want to simulate that process more than once and compute statistical measures each time you run the overall simulation program. Which meaning the word run has should be clear from context. If it isn't, ask.

Note: 1,000,000 runs if probably a bit too much. Work up from 100 simulated lifes until your program takes about 30 seconds to run. If your results haven't stabilized by then, something's wrong.

Part 1: A simple dynamic system

A blind man lives in a 21st floor condominium in downtown Tulsa. He has a 15x15 foot room with a 5 foot window in the center of the North wall. Each day, for exercise, he enters the room, closes the door, and finds his pogo stick where he left it the previous day. He then jumps on the pogo stick until he hits a wall, at which time he stops for the day and leaves the stick there. On each jump he moves exactly one foot North, South, East or West, with each direction equally likely.

If our athletic blind man ever hits the window he will fall through it and plunge to his death. Concerned about this, he as come to your company for $100,000 worth of life insurance. In order to determine the premium (price) of the insurance, you decide to simulate his exercise regime to see how long he is likely to live.

Write a program to simulate the exercise regime described above. Document any additional assumptions you need to make. Instrument the simulation to output the life expectancy of your customer (in days) as well as the standard deviation of the simulated lifetimes and likelihood of dying on any given day.

There will be three more parts; by the time we are done, this will be a program of respectable size. The code you write for part one will be a component of the complete project, so do yourself a favor and design it cleanly.

This part should be completed in the lab as soon as possible. With the possible exception of the standard deviation you should be able to get this running this Thursday.

Your program should prompt for the number of times to run the simulation. Run the program several times each with 10, 1000, and 1,000,000 iterations. Note the degree to which the answers agree or disagree among the runs. How many iterations are needed to get a reasonably stable answer?

Submit your working program and convergence analysis to Parijat for Lab credit. Keep a copy of your solution to Part 1 to submit to me as part of your project package.

Research Component

The relationship between simulation statistics and the statistical properties of the real system is an interesting one. Mathematically it is connected to the idea of Monte Carlo integration, but it requires an understanding of probability theory to make the connection. I don't recommend that you go there.

For some aspect of M.C. Integration and/or simulation, find at least two published papers and write a 3-5 page (with illustrations and bibliography) report. This will be the theory paper to accompany your program on this project.

Essentially any coherent report on anything connected to random number in computer programming is acceptable. Examples include:

These are just examples. I do not expect extensive research or laborious writing for this. I want you to begin with concept of simulation (or Monte Carlo integration) and find a couple of sources that interest you. Then write a simple report that conveys your interest in the topic and what you learned from those sources. Two pages of text, a couple of illustrative diagrams, and a 2-4 entry bibliography are all I expect here.

The research component is due with your final submission package for this project.

Part 2: Financial Simulation

The system we simulated in part one has a simple analytical solution if we change the rules slightly. You should have found that the likelihood of our customer dying on any given day is about 1/40 or 0.025. If you have him begin each day in a random location you will find that probability is 1/12. This is because exactly 1/12 of his wall is window. One must be careful reasoning in this way, since the dynamics of an apparently simple system can cause strange statistical effects. On the other hand, when a slight modification makes it possible to predict the result mathematically, it is a very useful test of the rest of your program!

Our finance department predicts that they will earn about 7.5% per year interest on invested funds in the months to come. Since your customer probably won't last long enough for that forecast to change we will assume the actual rate earned varies randomly from week to week around that value.

Assume the interest is earned every day based on the weekly rate. You will need to model the beginning and ending of weeks: document any assumptions you make. The rate is 7.5% plus a standard normal random number (mean zero and s.d. = 1%) per year. Assuming 365 days/year the daily rate is current_rate/100/365.

If the policy is written (sold), the customer will pay the premium amount at the beginning of each week and your finance department will immediately invest that money. On the day the customer dies, the company pays his estate $100,000. The difference between the accumulated premiums with interest earned and the benefit paid out is the profit (or loss) to your company for that run.

This version of your program should accept both an iteration count and a proposed premium (price) for the insurance policy. It should output the expected financial gain or loss to the company (and its standard deviation) along with the life expectancy data as collected in Part 1.

Part 3

Suppose that insurance regulations require that unique policies like this one be priced to generate a certain expected return.

Assume that we are allowed to charge a premium that is expected to generate 15% expected profit on the face value of the policy.

Convert your program from Part 2 into a function that takes the inputs as parameters rather than prompting for them. Choose a root-finding algorithm suitable for this problem and use it to identify a fair premium for this proposed insurance policy.

Note: We will have a discussion in class about specializing an algorithm to this problem.