
Using PostgreSQL databases with WestGrid

During my PhD I used WestGrid clusters for some of my computations, and often I needed to interact with a database. It took me a while to get the whole thing working, so I thought I would share the script with you. One nice feature is that it tries to find an open port to run the database on: you never know whether something else is already running on that port on the node you were assigned.

#PBS -S /bin/bash
##PBS -l procs=1
#PBS -l nodes=1:ppn=1
#PBS -l walltime=48:00:00
#PBS -m bae
#PBS -l mem=6gb
#PBS -l file=5gb
# you or a wrapper script should fill in the details (all values below are placeholders)
DB_NAME=yyy                      # db created
DB_USER=zzz                      # database role to create
PSQL_BIN=/path/to/postgres/bin   # directory containing initdb, postgres, psql, pg_ctl, ...
LOCAL_DB_DIR=$TMPDIR/pgdata      # node-local data directory
DB_PORT=5432                     # first port to try
DUMP_DEST=$HOME                  # where the dump ends up
# database set-up
# init db
mkdir -p "$LOCAL_DB_DIR"
"$PSQL_BIN"/initdb --encoding=UTF8 --locale=en_US.UTF8 -D "$LOCAL_DB_DIR"
# start db
# and try to find a usable port: increment DB_PORT as long as something is already LISTENing on it
while netstat -an | awk '/^tcp/ && $6 == "LISTEN" {print $4}' | grep -q "[:.]$DB_PORT\$"; do
  DB_PORT=$(( DB_PORT + 1 ))
done
# note: checkpoint_segments was removed in PostgreSQL 9.5 (replaced by max_wal_size)
"$PSQL_BIN"/postgres -D "$LOCAL_DB_DIR" --checkpoint_completion_target=0.9 --checkpoint_segments=256 --checkpoint_timeout=300 --autovacuum=TRUE --fsync=FALSE -p "$DB_PORT" &
# wait until postgres is started (psql -l fails until the server accepts connections)
until "$PSQL_BIN"/psql -p "$DB_PORT" -l > /dev/null 2>&1; do
  sleep 1
done
"$PSQL_BIN"/createuser -p "$DB_PORT" -s -d -r -l "$DB_USER"
"$PSQL_BIN"/createdb -p "$DB_PORT" "$DB_NAME"
# you should do your stuff here
# $DB_PORT is the port the database runs on
# after the run, clean up
# dump db
"$PSQL_BIN"/pg_dump -p "$DB_PORT" --no-owner --file="$DB_NAME.dump" "$DB_NAME"
# stop db
"$PSQL_BIN"/pg_ctl -D "$LOCAL_DB_DIR" stop
# copy db to home
mv -f "$DB_NAME.dump" "$DUMP_DEST"
# cleanup
rm -rf "$TMPDIR"
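If your job is driven by a Python script rather than bash, the two waiting games in the script (finding an unused port and waiting until the server accepts connections) can be sketched with the standard library alone. This is a minimal sketch under my own assumptions, not part of the original job script: the names find_free_port and wait_until_listening are hypothetical, a port counts as free if it can be bound on 127.0.0.1 (a swapped-in check instead of parsing netstat output), and "ready" only means a plain TCP connection succeeds, which is weaker than psql actually logging in.

```python
import socket
import time


def find_free_port(start: int, max_tries: int = 100) -> int:
    """Return the first port >= start that can be bound on localhost (hypothetical helper)."""
    for port in range(start, min(start + max_tries, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # something already uses this port, try the next one
            return port  # the test socket is closed when the with-block exits
    raise RuntimeError(f"no free port found starting at {start}")


def wait_until_listening(port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to localhost:port succeeds; False if timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # server not up yet, like the `sleep 1` loop in the script
    return False
```

Calling find_free_port(5432) starts at PostgreSQL's default port and counts upward just like the DB_PORT loop. Like the script, it has a small race: another process can take the port between the check and the moment postgres binds it.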


Have fun with it.

What is Your Favorite Data Format?

I recently started working on a paper for the MSR data track and began wondering which data format people would prefer.

Personally, these are the five types of data representation I use the most:

  • XML
  • PostgreSQL database tables
  • MySQL database tables
  • CSV
  • R-workspace

I attached a poll so that you can vote. Of course I am aware that the list is not exhaustive, so if you feel strongly about a favorite data format that you work with whenever possible, let me know in the comments section.
On a related note, if you feel that such a question has no general answer because it depends heavily on the data you are dealing with, let's stay practical and consider the data I want to submit to the MSR data track.

I plan on submitting call graphs of a Java program, created for every single commit, with the methods that were changed marked in each call graph.

Or would you prefer all of them, or at least one concrete data source bundled with scripts to convert it into the other formats?
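To make the conversion-script idea concrete, here is a minimal sketch that turns per-commit call-graph data from CSV into XML using only the Python standard library. Everything about the layout is a hypothetical choice for illustration, not the format of the actual data set: an edge file with commit, caller, callee columns, a second file with commit, method columns listing the changed methods, the XML element names, and the function name callgraphs_to_xml.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def callgraphs_to_xml(edges_csv: str, changed_csv: str) -> str:
    """Convert per-commit call-graph edges plus changed methods into one XML document.

    Hypothetical input layout:
      edges_csv:   columns commit,caller,callee (one row per call edge)
      changed_csv: columns commit,method (one row per changed method)
    """
    edges = defaultdict(list)  # commit -> [(caller, callee), ...] in file order
    for row in csv.DictReader(io.StringIO(edges_csv)):
        edges[row["commit"]].append((row["caller"], row["callee"]))

    changed = defaultdict(set)  # commit -> {method, ...}
    for row in csv.DictReader(io.StringIO(changed_csv)):
        changed[row["commit"]].add(row["method"])

    root = ET.Element("callgraphs")
    for commit, calls in edges.items():
        graph = ET.SubElement(root, "callgraph", commit=commit)
        # every method that appears in an edge becomes a node, flagged if it changed
        for method in sorted({m for call in calls for m in call}):
            flag = "true" if method in changed[commit] else "false"
            ET.SubElement(graph, "method", name=method, changed=flag)
        for caller, callee in calls:
            ET.SubElement(graph, "call", caller=caller, callee=callee)
    return ET.tostring(root, encoding="unicode")
```

Note that a changed method with no calls in its commit's graph is silently dropped here; a real converter would have to decide how to represent such methods.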

Interacting with RTC using OSLC, Python, and requests lib

I recently needed to store all comments from all work items of a project I was hosting on the predecessor of, and I had always wanted to write something similar to git for interacting with Jazz (I started working on it on GitHub (project link)).

I first tried to emulate the interaction shown in their GitHub integrator, which is overly complicated. Then I stumbled across requests, a library by Kenneth Reitz for making HTTP requests.
With that library, the authentication and the rest of the interaction become very easy:

import requests

user = 'your jazz username'
pw = 'your jazz password'
host = 'https://jazzhost:port/something'  # application context root (the one serving /rootservices)
resource_uri = 'https://jazzhost:port/some_uri.json'

# a Session keeps the cookies (e.g. JSESSIONID) across all requests
session = requests.Session()
session.verify = False  # skip TLS certificate checks, as verify=False did in each call (self-signed certs)
# 1. request the identity page to obtain a session cookie
session.get(host + '/authenticated/identity', headers={'Accept': 'application/xml'})
# 2. form-based login; redirects are followed by default
session.post(host + '/j_security_check', data={'j_username': user, 'j_password': pw})
# 3. fetch the resource with the now authenticated session
r = session.get(resource_uri)
data = r.json()  # the JSON response as nested dictionaries/lists


Have fun.

Book: God is not Great – How Religion Poisons Everything

In God Is Not Great (associate link), the late Christopher Hitchens sets out to show the other side of religion and, in a way, makes concrete what Jared Diamond argued in Guns, Germs, and Steel (associate link): that religion is often used to justify the rule of certain people. Hitchens goes to great lengths to show how phony religions are and how far they go to validate their existence.

Besides the countless examples of discrimination against people whose ancestors are not from the region where the religion originated, and all the animals that are considered either dirty or holy without due cause, I was very intrigued by the story about Mother Teresa. If you want to know how an advance by Kodak in roll film helped make Mother Teresa holy, just google it or read the book; it is plain hilarious and frightening at the same time.

Interestingly enough, Christopher Hitchens holds Dr. Martin Luther King Jr. in high regard, not for his involvement with the church but for his courage to stand up against segregation, which is nothing more than a remnant of church-approved (and in the Bible actually demanded) slavery.

If you are up for it and a bit sceptical about religion, I recommend reading the book; it contains some pretty fascinating stories.

Book: Yes – 50 Scientifically Proven Ways to Be Persuasive

I had the pleasure of reading Robert Cialdini's previous book Influence: The Psychology of Persuasion (associate link), where he talks about what makes people more willing to let you have your way. In Yes!: 50 Scientifically Proven Ways to Be Persuasive, Cialdini gives concrete tips on how to be more persuasive. A small disclaimer: the tips are not a sure way of persuading others, but they increase the odds that people will do what you want. Here are my two favorite tips:

  • KISS: Keep it simple, sweetie, and people will trust you more because they feel they understand what you are saying.
  • Bless people with a real smile: people with positive feelings toward you are more likely to listen, but don't try to trick them with a fake Pan Am smile.

What did you find works for you to persuade others?

Ultra Large Scale Monte-Carlo Simulation

For our Concurrency class (which is more of a High Performance Computing class), Liam Kiemele and I are implementing a Monte-Carlo simulation that can deal with an enormous number of iterations (relative to the computational cost of each iteration) as well as with huge amounts of input data for building empirical distribution functions. This is a course project in which everyone needs exposure to programming concurrent/parallel applications, and conveniently this application has two parts that need to be parallelized, but I will get back to that.

First of all, why would we choose to look into Monte-Carlo simulations? Over the course of the class we had several guest speakers; some talked about their research experience in concurrency/high performance computing, others brought their problems to us looking for help. Neptune Canada was among those looking for help. They are working on a system for near-field tsunami detection, which requires Monte-Carlo simulation to estimate the likelihood and height of a possible tsunami. Bottom line, we might see someone actually use our program!

Some of their problems arise from their plan to potentially use millions (or even billions) of input variables. This means two things: (1) they will need more simulation runs, and (2) they need a way to create massive amounts of random input based on the distribution of each individual input variable. Liam is currently taking care of (1) and I will be implementing (2). If you are interested in more details, you can visit our project on GitHub.
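Part (1) parallelizes naturally: simulation runs are independent, so the iterations can be split into chunks that each use their own random number generator, and the per-chunk results are merged at the end. The following is a minimal sequential sketch of that idea, not our actual implementation: run_chunk and monte_carlo_probability are hypothetical names, the toy event stands in for a real simulation model, and seeding chunk i with base_seed + i is a simplification (a generator designed for independent parallel streams would be the safer choice). Each run_chunk call could be handed to a separate process or node.

```python
import random
from typing import Callable

Event = Callable[[random.Random], bool]


def run_chunk(seed: int, iterations: int, event: Event) -> int:
    """Run one chunk of independent iterations and count how often the event occurs."""
    rng = random.Random(seed)  # each chunk owns its own generator
    return sum(1 for _ in range(iterations) if event(rng))


def monte_carlo_probability(event: Event, total_iterations: int,
                            n_chunks: int, base_seed: int = 0) -> float:
    """Estimate P(event) by splitting total_iterations into n_chunks independent chunks."""
    base, extra = divmod(total_iterations, n_chunks)
    sizes = [base + (1 if i < extra else 0) for i in range(n_chunks)]
    # The chunks do not depend on each other, so this loop is what would be parallelized.
    hits = sum(run_chunk(base_seed + i, size, event) for i, size in enumerate(sizes))
    return hits / total_iterations


# Toy model: probability that the sum of two U[0, 1) inputs exceeds 1.5 (exact value 0.125)
estimate = monte_carlo_probability(lambda rng: rng.random() + rng.random() > 1.5,
                                   200_000, n_chunks=8)
```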

Let me ramble on a bit about the challenges of generating the random numbers. Generating large amounts of numbers from a well-known distribution such as a Gaussian is easy, since the distribution can be described very compactly (for a Gaussian, just its mean and standard deviation). The issue is that for most variables the underlying distribution is not known; only a massive number of observations of each variable is available. Therefore we need to build an empirical distribution function for each input variable, which means holding as much of each variable's data in memory as possible, making it necessary to distribute the data over a set of nodes to allow efficient random number generation.
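A common way to draw from an empirical distribution function is inverse transform sampling: draw u uniformly from [0, 1) and return the observation at quantile u of the sorted data. The sketch below shows this for a single variable held in memory; the class name EmpiricalDistribution and the optional linear interpolation between neighboring observations are my illustrative assumptions, not the strategy we picked for the project, and the sketch ignores how the data would be distributed over nodes.

```python
import bisect
import random
from typing import Iterable


class EmpiricalDistribution:
    """Empirical distribution of one input variable, built from raw observations."""

    def __init__(self, observations: Iterable[float]):
        self.xs = sorted(observations)
        if not self.xs:
            raise ValueError("need at least one observation")

    def cdf(self, x: float) -> float:
        """F_n(x): fraction of observations <= x."""
        return bisect.bisect_right(self.xs, x) / len(self.xs)

    def sample(self, rng: random.Random) -> float:
        """Inverse transform sampling: return F_n^-1(u) for u ~ U[0, 1).

        For the step-shaped empirical CDF this picks each stored observation
        with probability 1/n, so it only ever returns observed values.
        """
        u = rng.random()
        return self.xs[int(u * len(self.xs))]

    def sample_interpolated(self, rng: random.Random) -> float:
        """Variant: interpolate linearly between neighboring order statistics."""
        n = len(self.xs)
        if n == 1:
            return self.xs[0]
        pos = rng.random() * (n - 1)  # position in [0, n - 1)
        i = int(pos)                  # 0 <= i <= n - 2, so i + 1 is a valid index
        frac = pos - i
        return self.xs[i] + frac * (self.xs[i + 1] - self.xs[i])
```

The plain variant reproduces the data exactly but can never produce a value between observations; the interpolated variant smooths the distribution at the cost of assuming values between neighboring observations are plausible.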

In a later post I will talk more about the random number generation strategy we picked for our Monte-Carlo simulation, as well as how we structured the whole system.

Why you should let your Grad Students review for you

I recently reviewed a couple of papers for my supervisor, and I must say it is always valuable to go through this process as a grad student, for … reasons:

  • What is publishable? I know that as a grad student it is very important to publish, for various reasons, but I still find myself trying to figure out what constitutes a unit of publishable work. Looking at conference publications gives me only one side of the picture; through reviewing I get a better understanding of what is good enough and what is not.
  • Exposure to what a faculty member does. I guess every grad student contemplates a career in academia at some point, and reviewing is one aspect of that job.

Evil Tip
I get very excited when I review papers that cite my work, and I am pretty sure others feel the same. Therefore, to increase your chances of getting into a conference, cite not only the PC members' work but especially their students' work.

Chris Corley (@excsc) pointed out that being acknowledged as a co-reviewer is also useful for every grad student's CV. And I totally agree!

Book: How to Break Software

Finally I got around to reading a book that is actually related to my area of study, computer science. Software testing is not my main area of expertise, but it never hurts to freshen up on practical skills such as testing.

How to Break Software is a very practical guide to going all-out Godzilla on software and breaking it in every way possible. In his book, James Whittaker gives very practical advice on how to systematically and methodically dissect the inner workings of software.

One very good piece of advice Whittaker gives is to carefully plan your testing goals and how much effort to expend before calling the software tested enough, and then to work rigorously towards those goals.

So next time you are testing a piece of software, set clear, attainable goals and then test the crap out of it.

Is an iPhone enough for a CS conference?

Short answer: no. Short justification: the battery does not last long enough. But let me start with some of the typical things I do at a CS conference such as ICSE or CSCW and then go over the pros and cons of using an iPhone. Here is the list, in no particular order:

  • Attend talks/keynotes.
  • Attend conference breaks.
  • Give talks/presentations.
  • Tweet.
  • Take pictures.
  • Check stuff on the web.
  • E-Mail.
  • Take notes.
  • Exchange contact information.

Attend talks/keynotes

Yes, I still go to conferences to actually attend talks, not just for the location and the opportunity to take relatively cheap vacations. And attending talks often means walking from room to room to get the most out of the talks that are relevant to me and my research.

Believe me, the power outlets are always positioned so that you will inconvenience half the room when you want to leave a session: think front-most on either side of the room. So not relying on anything that needs a table or a power outlet is a win. An additional benefit of the iPhone is that it fits in your pocket, unlike an iPad or a netbook (I tried not to mention the MacBook Air on purpose, darn), which occupies a hand or has to fit into a backpack, both of which make it harder to maneuver around other attendees.

Attend conference breaks

We all get hungry at some point, and for the small hunger and thirst in between (in the morning often dominated by the need for coffee) we have those awesome breaks with refreshments. But at conferences with several hundred attendees this can get very crammed, and the less you carry in your hands and the smaller your profile (no backpack beats having one), the better. You don't run into people, you don't drop your coffee, and you can also grab something to eat with your coffee.

Give talks/presentations

At this year's CSCW I was fortunate to give a talk; for more information see my last post. I considered using my iPhone to give my presentation but decided against it for two reasons: (1) I don't have the latest version of Apple Keynote and thus would have had to fall back to a PDF presentation without animations, and (2), most importantly, there is no remote you can use to control your presentation, which limits your maneuverability (yes, I know you can use another iOS device with the Keynote Remote app, but believe me, conference wireless at academic CS conferences is something you really shouldn't rely upon).

Tweet

Often some of my folks are left at home, such as fellow students who couldn't find the funding to attend, but they would still like to know what is going on at the conference, in particular at the talks. By far the easiest thing is to tweet bits and pieces as they happen (it is also a good way to pay attention to a talk). But depending on your fellow audience members and the typing speed you can reach as an untrained iPhone typist, you may be reduced to retweeting.

Take pictures

… for my blog, like here. I never bothered to invest my meager savings in a high-end camera, and at the low end the iPhone actually produces some decent pictures, especially if you know a little bit about photo post-processing.

Check stuff on the web

Ever felt the need to check a presenter's claim on the internet? Well, I certainly have, and the iPhone is easy to pull out at a moment's notice to check something, whether you are sitting in a talk or standing. Try quickly checking something by getting your netbook out of your backpack when there is nowhere to put it down to type.

E-Mail

E-mail is the easiest thing. My pants are always on vibrate: sometimes annoying, but always up to date. Sending works as well, plus the more tedious typing helps you keep your responses brief.

Take notes

Note taking on the iPhone is an art, but once you get the hang of the keyboard, it is a very handy way to write down your thoughts.

Exchange contact information

Bump is cool but not available to everyone; e-mail on the iPhone is awesome. But nothing has yet surpassed the business card.


Bottom line: I enjoyed running around with only my iPhone and my charger; both fit neatly in my pockets, and I was more agile jumping from session to session. Plus I did not feel as tempted to surf the web and listened to the talks more instead (which, sadly, made me realize even more how bad many of them really are). If you are not afraid, give it a try (just don't forget to take your wall charger with you).

(btw I did my iPhone “only” test run at CSCW 2012)

Let me know how your smartphone-only conference went!

CSCW Summary

Last week I was in Seattle-Bellevue for CSCW (Conference on Computer-Supported Cooperative Work). If you had the pleasure of following me on Twitter, you might have seen my updates and retweets of events tagged with #cscw2012. But before the main track (or rather the six parallel main tracks) started, I had the pleasure of attending the Workshop on the Future of Collaborative Software Engineering.

Workshop on Future of Collaborative Software Engineering

The workshop started with two rounds of poster presentations, where everybody had a chance to give a quick one-minute-madness introduction to his/her poster to attract people to discuss it at greater "length" (well, it was maybe 10 minutes) with others who were interested in the topic. I think that was a great way to get the creative juices flowing and to get a nice overview of what others had done.

The second part was somewhat disappointing. We had a set of speakers who were presenting either their own paper or a summary of the papers assigned to their session. The speakers were good and the topics interesting, but the talks took up the rest of the workshop.

I would have loved to see us sit down and actually spend half of the workshop discussing and synthesizing the gathered insights and ideas into a set of future trends that we think are most likely to occur. At the end of the workshop I felt somewhat left hanging, without having derived a conclusion about the future outlook.

CSCW main conference

The main conference was interesting, but sadly it offered only a small number of sessions relevant to me, and most of them were in parallel. But this gave me the opportunity to explore other tracks, and although they are not directly (or even indirectly) related to my research area, some remarkable things are going on there. One talk that particularly caught my eye was about eye tracking: the research group can track which line of code on the screen participants are looking at, super cool for program comprehension studies.

I also had the chance to contribute a talk to the main track; here are my slides on. Note that I am currently working on webinizing the slides so that they actually make sense on their own. In the meantime you can get the paper from

The talk went well: I stayed on time, got some major laughs, and got some good questions. The only downside was that it was the first talk after lunch on the first day. Which brings me to the following: never organize a conference with lunch on your own. Especially on the first day, people had trouble finding a place close by that served food fast enough and lost track of time; as a result many people came in late to the session, and people kept trickling in almost until the end of my talk.

Oh well, the next day I had fun at the town-hall meeting calling bullshit on a discussion that seems to happen at just about every conference. With the new review process the acceptance rate got much higher, raising concerns about the worth of a CSCW publication, and as always all the "senior" profs stated that it is an insufficient measure and we shouldn't worry about it; but somehow those same profs sit on the hiring and funding committees, strange. When I asked "who, then, is using this measure if everyone agrees that it is inappropriate?", I got the lame excuse that we only use it because other research fields rely on journals and don't think conference publications have any worth. Oh well, I guess after decades of dealing with that issue no one ever bothered to change the process to first submit to a CSCW journal and then invite the accepted papers to the conference, and be done with this pathetic crusade to convince others that our system is equivalent to the more traditional journal-based systems.

Let's move on to some more fun things. On the same day as the town-hall meeting we had the conference banquet. It was at an arcade with a bowling alley, sweet. All games were free; it was just plain amazing. Bowling with friends, check; racing friends in a Ferrari simulator, check; racing friends in Need for Speed, check; Dance Dance Revolution, check; and air hockey, check. Man, it was just plain cool; that event alone made the conference so worth it. Sorry, I didn't have the chance to take pictures, and given how dark it was I don't think my iPhone would have produced anything usable.

Another cool thing at CSCW was DoTastic, a social to-do app. You can assign tasks to yourself and to others. When you assign tasks to others, you can give them points for completing them, or you can chip in points to get your friends going on other tasks as well. During CSCW, Microsoft ran a competition in which whoever gathered the most points won a Kinect, and look who won (picture to the right).