Using PostgreSQL databases with WestGrid

03/01/2013
During my PhD I used WestGrid clusters for some of my computations, and I often needed to interact with a database. It took me a while to get the whole thing working, so I thought I would share the script with you. One nice feature is that it tries to find an open port to run the database on; you never know whether something else is already running on a given port of the node you were assigned.
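
The script itself is not reproduced in this post, so here is only a minimal sketch of the port-finding idea, written in Python and not taken from the original script. It asks the operating system for an unused TCP port by binding to port 0. The function name find_free_port is made up for illustration.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is currently unused on this node."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel pick any free port for us.
        sock.bind((host, 0))
        # getsockname() returns (address, port) for an IPv4 socket.
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The returned number could then be handed to the database server, for example through the -p option of postgres. Another process can still grab the port between this check and the server start, so retrying on failure is a sensible addition.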

Have fun with it.

What is Your Favorite Data Format?

09/10/2012
I recently started working on a paper for the MSR data track and began wondering which data format people would prefer.

Personally, I use these five types of data representation the most:

  • XML
  • PostgreSQL database tables
  • MySQL database tables
  • CSV
  • R-workspace

I attached a poll so that you can vote. Of course I am aware that the list is not exhaustive, so if you feel strongly about a favorite data format that you work with whenever possible, let me know in the comments section.
On a related note, if you feel that such a question has no general answer but depends heavily on the data you are dealing with, let’s stay practical and consider the data I want to submit to the MSR data track.

I plan on submitting call graphs of a Java program, one created for every single commit, with the methods that were changed in that commit marked in the call graph.

Or would you want all of the formats, at least in the form of one concrete data source bundled with scripts that convert it into the others?

Interacting with RTC using OSLC, Python, and requests lib

30/08/2012
I recently needed to store all comments from all work items of a project I was hosting on jazz.net/hub, the predecessor of hub.jazz.net. I had also always wanted to write something similar to git for interacting with Jazz (I started working on it on github (project link)).

I first tried to emulate the interaction as shown in their github integrator, which is overly complicated. Then I stumbled across requests, a library by Kenneth Reitz for doing HTTP requests.
Using that lib, the whole authentication and interaction becomes very easy:
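
The original snippet is not part of this post, so the following is only a rough sketch of what such an interaction could look like, not the code I actually used. It assumes a Jazz server with form-based login at j_security_check, a response header that reports login problems, and a work item resource path of the usual RTC shape. The base URL, the header name, the path and both function names are assumptions for illustration. The session argument is meant to be a requests.Session.

```python
from urllib.parse import urljoin

# Assumed name of the header the server uses to report login problems.
AUTH_HEADER = "X-com-ibm-team-repository-web-auth-msg"


def login(session, base_url, user, password):
    """Form-based login; True unless the server reports an auth problem.

    base_url must end with a slash, e.g. "https://example.invalid/ccm/".
    """
    # Touch a protected resource first so the server hands out its cookies.
    session.get(urljoin(base_url, "authenticated/identity"))
    response = session.post(
        urljoin(base_url, "j_security_check"),
        data={"j_username": user, "j_password": password},
    )
    return response.headers.get(AUTH_HEADER) not in ("authfailed", "authrequired")


def fetch_work_item(session, base_url, item_id):
    """Fetch one work item as JSON through the (assumed) OSLC resource path."""
    url = urljoin(
        base_url, f"resource/itemName/com.ibm.team.workitem.WorkItem/{item_id}"
    )
    response = session.get(
        url, headers={"Accept": "application/json", "OSLC-Core-Version": "2.0"}
    )
    response.raise_for_status()
    return response.json()


# Usage with the requests library (needs network access and real credentials):
#   import requests
#   with requests.Session() as session:
#       if login(session, "https://example.invalid/ccm/", "me", "secret"):
#           print(fetch_work_item(session, "https://example.invalid/ccm/", 42))
```

The session object keeps the cookies from the login, which is why every later call can simply reuse it.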

Have fun.

Book: God is not Great – How Religion Poisons Everything

04/04/2012
In God is not Great (associate link) the late Christopher Hitchens embarks on a quest to show the other side of religion, and in a way makes more concrete what Jared Diamond talked about in Guns, Germs, and Steel (associate link): that religion is often used to justify the rule of certain people. Hitchens goes to great lengths to show how phony religions are and to what lengths they go to validate their existence.

Besides the countless examples of discrimination against people whose ancestors are not from the regions where a religion originated, and all the animals that are considered either dirty or holy without due cause, I was very intrigued by the story about Mother Teresa. If you want to know how Kodak's advances with respect to a roll of film made Mother Teresa holy, just google it or read the book; it is plain hilarious and frightening at the same time.

Interestingly enough, Christopher Hitchens holds Dr. Martin Luther King Jr. in high regard, not for his involvement with the church but for his courage to go against racial segregation, which is nothing more than a remnant of the church-approved (and actually in the bible demanded) slavery.

If you are up to it and a bit sceptical about religion, I recommend reading the book; it contains some pretty fascinating stories.

Book: Yes – 50 Scientifically Proven Ways to Be Persuasive

28/03/2012
I had the pleasure of reading Robert Cialdini’s previous book Influence: The Psychology of Persuasion (associate link), where he talks about what makes people more willing to let you have your way. In Yes!: 50 Scientifically Proven Ways to Be Persuasive, Rob gives concrete tips on how to be more persuasive. Just a small disclaimer: the tips are not a sure way of persuading others, but they increase the odds that people will do what you want. Here are my two favorite tips:

  • KISS: Keep it simple, sweetie, and people will trust you more because they have the feeling they understand what you are saying.
  • Bless people with a real smile: People with positive feelings toward you are more likely to listen, but don’t try to trick them with a fake Pan American smile.

What did you find works for you to persuade others?

Ultra Large Scale Monte-Carlo Simulation

20/03/2012
For our Concurrency class (which is more of a High Performance Computing class) Liam Kiemele and I are implementing a Monte-Carlo simulation that can deal with an insane number of iterations (relative to the computational complexity of each iteration) as well as with huge amounts of input data used to build empirical distribution functions. This is a course project in which everyone needs to get exposure to programming concurrent/parallelizable applications, and conveniently this application has two parts that need to be parallelized, but let me get back to that.

First of all, why would we choose to look into Monte-Carlo simulations? Over the course of the class we had several guest speakers; some talked about their research experience in concurrency/high performance computing, others brought their problems to us in search of help. Neptune Canada was among those looking for help. They are working on a system for near-field tsunami detection, which requires Monte-Carlo simulation to estimate the likelihood and height of a possible tsunami. Bottom line, we might see someone actually use our program!

Some of their problems arise from their plan to potentially use millions (or even billions) of input variables. This means two things: (1) they will need more simulation runs and (2) they will need a way to create massive amounts of random input based on the distribution of each individual input variable. Liam is currently taking care of (1) and I will be implementing (2). If you are interested in more details you can visit our project on github.

Let me ramble on a bit about the challenges of generating the random numbers. Generating a large amount of numbers from a well-known distribution such as a Gaussian is easy, since the random function can be represented in a very compact way. The issue is that the distributions underlying most of the variables are not known; only a massive number of observations is available for each variable. Therefore we will need to build an empirical distribution function for each input variable. That means holding as much of the data for each variable in memory as possible, which makes it necessary to distribute the data over a set of nodes to allow efficient generation of random numbers.
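
To make the idea of an empirical distribution function concrete, here is a small single-node sketch in Python. It is not the project code and leaves out the distribution over nodes entirely. The class name EmpiricalDistribution and its methods are made up for illustration, and the sampling strategy shown (inverse transform sampling over the sorted observations) is just one simple option, not necessarily the one we will end up with.

```python
import bisect
import random


class EmpiricalDistribution:
    """Empirical distribution built from raw observations of one variable."""

    def __init__(self, observations):
        if not observations:
            raise ValueError("at least one observation is required")
        # Sorting once makes both the CDF lookup and sampling cheap later.
        self._sorted = sorted(observations)

    def cdf(self, x):
        """Fraction of observations that are less than or equal to x."""
        return bisect.bisect_right(self._sorted, x) / len(self._sorted)

    def sample(self, rng):
        """Draw one value by inverse transform sampling.

        A uniform number u in [0, 1) is mapped to the observation that
        sits at the matching position in the sorted data.
        """
        u = rng.random()
        # min() guards against floating point rounding up to len(...).
        index = min(int(u * len(self._sorted)), len(self._sorted) - 1)
        return self._sorted[index]


if __name__ == "__main__":
    dist = EmpiricalDistribution([3.0, 1.0, 2.0, 2.0])
    rng = random.Random(0)
    print(dist.cdf(2.0))
    print([dist.sample(rng) for _ in range(5)])
```

Every observation is equally likely to be drawn, so values that occur more often in the data also show up more often in the generated input. The memory cost is one stored value per observation, which is exactly why the data has to be spread over several nodes once the number of variables grows.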

Later I will talk more about the random number generation strategy we chose for our Monte-Carlo simulation, as well as how we structured the whole system.

Why you should let your Grad Students review for you

13/03/2012
I recently reviewed a couple of papers for my supervisor, and I must say it is always valuable to go through this process as a grad student for … reasons:

  • What is publishable? I know that as a grad student it is very important to publish for various reasons, but I still find myself trying to figure out what constitutes a unit of publishable work. Looking at conference publications gives me only one side of the picture, but through reviewing I get a better understanding of what is good enough and what is not.
  • Getting exposure to one aspect of what a faculty member does. I guess every grad student contemplates a career in academia at some point, and reviewing is one aspect of it.

Evil Tip
I get very excited when I review papers that cite my work, and I am pretty sure others feel the same. Therefore, to increase your chances of getting into a conference, cite not only the PC’s work but especially their students’ work.

Edit
Chris Corley (@excsc) pointed out that being acknowledged as a co-reviewer is also useful for every grad student’s CV. And I totally agree!