Skip to content

Ultra Large Scale Monte-Carlo Simulation

For our Concurrency class (which is more of a High Performance Computing class) Liam Kiemele and I are implementing a Monte-Carlo simulation that can deal with an insane amount of iterations (relative to the computational complexity for each iterations) as well as dealing with huge amounts of input data to build empirical distribution functions. Since this is a course project where everyone needs to get exposure to programming concurrent/parallelizable applications, and nicely so this application has two parts that need to be parallelized, but let me get back to that.

First of all, why would we choose to look into Monte-Carlo simulations? Through the course of the class we had several guest speakers, some talking about their experiences on research in concurrency/high performance computing, other brought their problems to us in the search for help. Neptune Canada was among those looking for help. They are working on a system for near-field Tsunami detection which requires Monte-Carlo simulation to estimate the likelihood and height of a possible Tsunami. Bottom line, we might see someone actually use our program!

Some of their problems arise from their plan to potentially use millions (or even billions) of input variables. This means two things, (1) they will need more simulation runs and (2) a way to create massive amounts of random input that is based on the distribution of for each individual input variable. Liam is currently taking care of (1) and I will be implementing (2). If you are interested in more details you can visit our project on github.

Let me ramble on a bit about the challenges with the part of generating random numbers. Generating large amount of numbers from a well known distribution such as a Gaussian distribution is easy since the random function can be presented in a very compact way. The issue is, that most of the distributions that underly all variables is not known and the only a massive number of observations for each variable is available. Therefore we will need to build an empirical distribution functions for each input variable, which includes holding as much data for each variable in memory making it necessary to distribute over a set of nodes to allow efficient generation of random numbers.

I will later talk more about the random number generation strategy we choose for our Monte-Carlo simulation as well as how we structured the whole system.

No comments yet

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: