Saturday, March 29, 2008

Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications


Note that all of these standard generators have been shown to have serious defects:
Standard Perl rand
C-library rand()
Matlab’s rand
Mathematica’s SWB generator

George Marsaglia's KISS algorithm is given the thumbs up. Microsoft .NET's Random class isn't mentioned, almost certainly because it isn't used much in academia; in any case, as I've already discussed here, that algorithm cannot be re-initialised quickly, and KISS is almost certainly going to be faster anyway, though not quite as fast as XOR-Shift.
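To show why KISS is both cheap to run and cheap to reseed, here is a minimal Python sketch in the style of the widely circulated 32-bit KISS: a linear congruential generator, a 13/17/5 XOR-Shift and a multiply-with-carry generator, summed modulo 2^32. The constants and default seeds follow the commonly published version as I remember it, so treat them as assumptions and check them against the paper before relying on the output. A bare XOR-Shift step is included for comparison. Python is only for readability; in C each step is a handful of integer operations.

```python
MASK32 = 0xFFFFFFFF  # keep all arithmetic within unsigned 32-bit range


def xorshift32(state: int) -> int:
    """One step of a 32-bit XOR-Shift generator (shift triple 13, 17, 5).

    The state must be non-zero: a zero state stays zero forever.
    """
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state


class Kiss32:
    """Sketch of a 32-bit KISS generator: LCG + XOR-Shift + multiply-with-carry.

    Constants and default seeds are assumed from the commonly published version.
    """

    def __init__(self, x=123456789, y=362436000, z=521288629, c=7654321):
        self.reseed(x, y, z, c)

    def reseed(self, x, y, z, c):
        # Re-initialisation is just four assignments, hence cheap.
        x, y, z, c = x & MASK32, y & MASK32, z & MASK32, c & MASK32
        if y == 0:
            raise ValueError("XOR-Shift seed y must be non-zero")
        if z == 0 and c == 0:
            raise ValueError("multiply-with-carry seeds z and c must not both be zero")
        self.x, self.y, self.z, self.c = x, y, z, c

    def next_u32(self) -> int:
        # Linear congruential component.
        self.x = (69069 * self.x + 12345) & MASK32
        # XOR-Shift component.
        self.y = xorshift32(self.y)
        # Multiply-with-carry component: the high 32 bits become the new carry.
        t = 698769069 * self.z + self.c
        self.c = t >> 32
        self.z = t & MASK32
        # Combine the three streams modulo 2**32.
        return (self.x + self.y + self.z) & MASK32

    def next_float(self) -> float:
        """Uniform double in [0, 1) built from one 32-bit output."""
        return self.next_u32() / 4294967296.0
```

Typical use is `rng = Kiss32()` followed by `rng.next_float()`; calling `rng.reseed(...)` with the same four seeds replays the same stream, which is the fast re-initialisation that the .NET class lacks.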

Fractional Brownian Motion

[Figures: Brownian Noise; Fractional Brownian Noise; Brownian Motion; Fractional Brownian Motion]

Each plot shows 8000 samples. As the number of samples increases (as we zoom out), the Brownian noise plot looks increasingly like a solid bar with straight edges, whereas the Fractional Brownian Motion (FBM) plot exhibits self-affinity, meaning it looks statistically the same at any magnification, provided the vertical axis is rescaled by a different factor from the horizontal one (and barring magnifications below a lower threshold).

FBM data was generated using the Hosking method with a Hurst parameter (H) of 0.7. For source code and an excellent thesis on FBM generation see:

The Hosking method is an exact method with time complexity O(N^2). This is because FBM exhibits long-range dependence, so each new sample is conditioned on all previous samples. Faster approximate methods exist, and they are largely the subject of the linked thesis.
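To make the O(N^2) cost concrete, here is a minimal Python sketch of the Hosking approach, written as a Durbin-Levinson recursion on the autocovariance of fractional Gaussian noise. It is not the hosking.c port mentioned below but my own rewrite, assuming unit time steps and unit variance, with Python's `random.gauss` standing in for ranlib. Each new sample needs a pass over every earlier sample, hence the quadratic cost. Summing the noise gives the FBM path; with H = 0.5 the noise is uncorrelated and the running sum is ordinary Brownian motion (a random walk), which is the contrast shown in the plots above.

```python
import math
import random


def fgn_autocov(k, hurst):
    """Autocovariance at lag k of unit-variance fractional Gaussian noise."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def hosking_fgn(n, hurst, rng=None):
    """Exact fractional Gaussian noise via the Hosking (Durbin-Levinson) recursion.

    Cost is O(n^2): sample i is conditioned on all i previous samples.
    """
    if not 0.0 < hurst < 1.0:
        raise ValueError("Hurst parameter must lie in (0, 1)")
    if n <= 0:
        return []
    if rng is None:
        rng = random.Random()
    gamma = [fgn_autocov(k, hurst) for k in range(n)]
    x = [rng.gauss(0.0, 1.0)]  # the first sample is simply N(0, 1)
    phi = []  # phi[m] holds the (m+1)-th prediction coefficient
    v = 1.0   # conditional variance of the next sample given the past
    for i in range(1, n):
        # Reflection coefficient for step i.
        acc = gamma[i] - sum(phi[m] * gamma[i - 1 - m] for m in range(i - 1))
        phi_ii = acc / v
        # Durbin-Levinson update of the prediction coefficients.
        phi = [phi[m] - phi_ii * phi[i - 2 - m] for m in range(i - 1)] + [phi_ii]
        v *= 1.0 - phi_ii * phi_ii
        # Conditional mean given all previous samples, plus fresh Gaussian noise.
        mean = sum(phi[m] * x[i - 1 - m] for m in range(i))
        x.append(mean + math.sqrt(v) * rng.gauss(0.0, 1.0))
    return x


def fbm_path(n, hurst, rng=None):
    """FBM at unit time steps: the running sum of fractional Gaussian noise."""
    path, total = [], 0.0
    for step in hosking_fgn(n, hurst, rng):
        total += step
        path.append(total)
    return path
```

`fbm_path(8000, 0.7)` matches the size of the plots above, although in pure Python it takes a while, which rather makes the point about O(N^2).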

The hosking.c file was ported to C#, and ranlib was replaced with this project, which contains a NormalDistribution class:

Thursday, March 27, 2008

World Community Grid Research + Climate Rant

Nice to see some research papers using data from WCG projects:

HPF2 Progress and News

I've always been a bit skeptical about the quality of the research behind some of these projects. E.g. a seemingly knowledgeable poster on Slashdot claimed the 'Help Conquer Cancer' project were using a grossly inefficient and sloppy technique. I was sufficiently convinced to switch off that particular project.

Despite my doubts, I still feel overall that this sort of research (protein folding and interaction prediction) is a better use of CPU resources than SETI or climate change modelling. Having said that, I like the idea that SETI is running, but maybe the balance is wrong.

As for climate change, well I have many issues here. But ultimately I think we need to just look at the big picture (at the political level) and say, y'know we don't and can't know what is happening and to what degree CO2 and other emissions are responsible. We should (I would suggest) be attempting to minimize our modification of the atmosphere and thus minimize the probability that we invoke some change with detrimental consequences.

I seriously doubt that progress will be made on CO2 emissions, and having spent the last few weeks reading about fractional Brownian motion, I'm also beginning to realise that the data we have is a very small set of samples from which very little can be derived mathematically. What we do know is that billions of tonnes of CO2 will have *an* effect, and anything that rocks the boat is probably going to be bad for something or someone, given that so much life on this planet, human or otherwise, teeters on a knife edge.

Sunday, March 23, 2008

UK House Prices as Multiple of Average Earnings

OK, I keep going on about this metric, so I had a dig around the internet for the raw data, and here, then, is the finished product.

Surprisingly, then, the multiple is currently approx. 7.38, wow. Also, I hadn't realised that the graph for the UK has yet to start falling. The data here goes up to the end of 2007, at which time the price was still rising. This graph looks pretty 'toppy' to my eyes, though.

A similar, but not directly comparable, graph GMO drew for the US in January shows median home price over median family income. That multiple has a mean of 2.8, peaked at 3.9 in 2005 and had fallen to 3.7 by the end of 2007. So the fall has been going on for some time across the pond.

Note, however, that the national US picture doesn't do justice to the extent of the bubble in specific regions, or specifically what Paul Krugman refers to as zoned areas (where land isn't in inexhaustible supply) in this video...

Along with the TED spread this is one of those graphs that will take pride of place as part of my 'economic radar'.

Earnings data going back to 1952 was found here:

Unfortunately the numbers were rebased to 1913=100, so I then had to dig out some figures in actual (nominal, unadjusted) pounds from the ONS, e.g. from here.

And the house price data going back to 1952 came from this lovely resource at the Nationwide:

Idle Thoughts on Correlations With House Prices

I have a few ideas I want to write down with the intention of filling out details in later posts.

1) Residential Property Prices as a Multiple of Income.

Jeremy Grantham, Chairman at GMO, uses this metric often in his 'Letters to the Investment Committee'. In the UK the mean residential property price is currently about 6x mean salary. The long-term multiple is around 3.5, and plotted on a graph this metric screams bubble and has been doing so for the last few years.

The thinking behind this metric is simple - that the value of a house is strongly related to (and ultimately bound by) how much people can afford. That said, multiples greater than the number of years typically worked can't be completely ruled out if we allow mortgages to be handed down through generations. OK, but the only reason to do that is if the holder thinks that house prices are going to substantially increase in value over time, over and above inflation and personal earnings growth. Since we can always build new houses (where there is land available) that doesn't seem to hold - house prices will always be related to replacement cost.

A casual observation here is that the multiple of six is close to the reported multiple of 5x salary used by some mortgage lenders up until the sub-prime crisis kicked off. Typically a lender will calculate the maximum amount they are willing to lend using a multiple of 2.5 to 3. The gradual shift up to 5 was symptomatic of the disconnection going on in the debt markets between perceived or calculated risk and actual risk.

Now that salary multiple is a pretty crude calculation anyway. At the very least they could subtract tax and some basic cost of living from the gross salary and then assign a new multiple to what remains. In fact a glance over some mortgage web sites shows what they actually do is use different multiples depending on a small set of salary ranges. Better but still pretty crude considering this is a fudge to avoid doing some basic algebra here guys. No doubt more accurate calculations are performed in the 'back office'. Well maybe.
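The 'basic algebra' in question is essentially the present value of an annuity. Below is a minimal Python sketch of that alternative under loudly hypothetical assumptions: a single flat tax rate (real UK income tax is banded), a fixed annual cost of living, a fixed share of what remains going on repayments, and a repayment mortgage at a constant interest rate. None of these figures come from any lender, and the function and parameter names are mine. The salary multiple then falls out of the calculation instead of being fixed in advance.

```python
def max_loan(monthly_payment, annual_rate, years):
    """Largest repayment-mortgage principal a fixed monthly payment can service.

    Present value of an annuity: P * (1 - (1 + r)^-n) / r, with r the monthly rate.
    """
    n = years * 12
    r = annual_rate / 12.0
    if r == 0:
        return monthly_payment * n
    return monthly_payment * (1.0 - (1.0 + r) ** -n) / r


def monthly_payment_for(principal, annual_rate, years):
    """Inverse of max_loan: the level payment that repays `principal` over the term."""
    n = years * 12
    r = annual_rate / 12.0
    if r == 0:
        return principal / n
    return principal * r / (1.0 - (1.0 + r) ** -n)


def affordable_multiple(gross_salary, tax_rate, living_costs, repayment_share,
                        annual_rate, years=25):
    """Implied salary multiple from disposable income (all inputs hypothetical).

    gross_salary, living_costs: annual amounts in pounds.
    tax_rate: a single flat rate, e.g. 0.25 (a deliberate simplification).
    repayment_share: fraction of post-tax, post-living-cost income spent on the mortgage.
    """
    disposable = gross_salary * (1.0 - tax_rate) - living_costs
    if disposable <= 0:
        return 0.0
    payment = repayment_share * disposable / 12.0
    return max_loan(payment, annual_rate, years) / gross_salary
```

With made-up inputs, `affordable_multiple(30000, 0.25, 10000, 0.6, 0.06)` comes out at roughly 3.2, and the multiple drops as the interest rate rises, something a fixed salary multiple ignores entirely.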

2) Employment Rates.

I need to check this out in more detail, but I've read that another correlation exists between employment rate and property prices. That's easy to see: the economy slows, people lose their jobs and with them the ability to pay a mortgage, or at least there are fewer new buyers around. Perhaps a stronger correlation would be found with total employment level rather than employment rate, since we are currently (in the UK) taking in many overseas workers, all of whom add to demand for existing housing stock. You could argue that the construction industry expands along with the working population and demand, sure, but there is some lag there, I would say. It also raises the possibility of recent immigrants swiftly departing the UK in an economic downturn, reducing demand for accommodation fairly sharply.

3) GDP per head.

This is related to the above comments about employment rate versus employment level. Last week The Economist discussed how growth in GDP per head often differs sharply from growth in overall GDP. E.g. Japan's GDP per head is actually growing strongly thanks to a shrinking population, whereas the US is already in a recession on the per-head scale because of a population rising through immigration.
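The gap between the two measures is simple arithmetic: GDP per head is GDP divided by population, so its growth rate is (1 + GDP growth) / (1 + population growth) - 1, which is roughly GDP growth minus population growth. A tiny Python sketch follows; the example rates are made up for illustration and are not actual Japanese or US figures.

```python
def per_head_growth(gdp_growth, population_growth):
    """Growth in GDP per head given growth in total GDP and in population.

    Inputs and output are fractions, e.g. 0.02 for 2%.
    """
    return (1.0 + gdp_growth) / (1.0 + population_growth) - 1.0


# Hypothetical: weak total growth but a shrinking population (about +1.5% per head).
shrinking = per_head_growth(0.010, -0.005)
# Hypothetical: modest total growth outpaced by population growth (about -0.5% per head).
growing = per_head_growth(0.005, 0.010)
```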

Tuesday, March 18, 2008

Debt Not Correlated With Economic Growth: News at 11

More wisdom from one of my favourite economic commentators...

Interview with Jeremy Grantham, Chief Investment Strategist, GMO

So if debt doesn't create growth, then why the heck have we been allowing debt to grow so substantially? Where is the intellectual rigour in the decision either to actively engage in issuing more debt (banks) or to sit by and allow it to happen (governments)?

The UK government is one of the most indebted as measured in % of GDP despite record economic activity, bumper tax income and super low unemployment rates. They have inflated the public sector's slice of GDP and (assuming a recession is nigh) will have to either (A) take on much more debt to pay all of those public workers in the next few years or (B) make massive public sector cuts. They have absolutely no cushion for this potential recession, as far as I'm concerned they have been grossly irresponsible. No doubt they will say that no-one could have predicted the coming recession, well, umm actually, dunno if you'd noticed chaps but they kinda happen every few years.

3684 Days to The Robot Uprising

From the 'very cool but ever-so-slightly unnerving' department...

Something about the leg shape and the buzzing noise from the (I assume) petrol engine just made me think: giant mutant wingless fly. Probability that this will feature in a nightmare tonight: ummm, about 95%.

Sunday, March 2, 2008

Selective Acquisition of Knowledge

Experiment. Take a 'basic' piece of knowledge, oh I don't know, let's say the fact that the Moon revolves around the Earth. Where did this knowledge come from? And how widely dispersed/propagated is it through, umm let's say the population of France?

Who wants to be a millionaire clip

To summarise, the guy is asked "Qu'est-ce qui gravite autour de la Terre?" (What revolves around the Earth?), with the options the Moon, the Sun, Mars or Venus. He answers the Sun after an audience poll yielding 56% for the Sun, 42% for the Moon and 2% for Mars.

Those numbers need qualifying and cleaning up a little. First up, at least 58% of the audience didn't actually know the correct answer, and probably more: some who did not know may have just picked an answer at random, some landing on the right one and some on a wrong one. Note, however, that the quizmaster says not to feel obliged to answer. On the other hand, there is pressure in situations like this not to look stupid, so people may just press a button anyway.
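To put a rough number on the random-guessing point, assume (and it is only an assumption) that a fraction k of the audience knew the answer and everyone else picked uniformly at random from the four options. The Moon's share is then k + (1 - k)/4, which can be solved for k; plugging in the 42% gives roughly 23% who genuinely knew. The heavy 56% vote for the Sun shows the non-knowers were anything but uniform, so this only illustrates the direction of the correction rather than giving a real estimate. The function name is hypothetical.

```python
def share_who_knew(correct_share, n_options=4):
    """Estimate the fraction who genuinely knew the answer.

    Assumes everyone who didn't know guessed uniformly among n_options, so
    correct_share = k + (1 - k) / n_options. Results are clamped to [0, 1].
    """
    chance = 1.0 / n_options
    k = (correct_share - chance) / (1.0 - chance)
    return max(0.0, min(1.0, k))


estimate = share_who_knew(0.42)  # about 0.227 under the uniform-guessing assumption
```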

I also just want to address the more pedantic readers out there in internetland: it is of course more correct to say that the Moon and Earth revolve around their common center of gravity (barycenter), but that center lies within the body of the Earth, so it's not an unreasonable question to ask in a general-purpose quiz. OK? Certainly this guy wasn't struggling to work out what is meant by 'revolve'. Does that word have a different meaning in real space-time than in Euclidean space? Hmm.

So anyway, at least 58% of that audience were not in possession of this particular piece of knowledge. It doesn't appear to be a core, commonly known fact in French society (and most likely others) in the way that, say, 1+1=2 is (this is an assumption). And yet there is this large, shimmering, fairly prominent silk crescent in the sky. It wouldn't be unreasonable to have assumed that most people would have looked at it at some time, wondered what it was and made enquiries. Consider this versus some obscure fact deep in a specialist research paper: why would most people know about that? What hook or lead would set them on a path of enquiry to it? The path to the fact is more tenuous and arduous, and therefore we could expect most people to be oblivious to it.

So what is it that determines which pieces of knowledge become widely known, given that the Moon is misunderstood despite being so prominent? Most probably it comes down to the fact that a given human's primary purpose in life, whether they know it or not, is to propagate his/her genetic makeup into the future. Learning all of this pesky knowledge stuff is just a necessary cost of achieving that goal, and the knowledge that assists most on that quest is that related to society: its structure, who has what power and skills (rather than acquiring all of the skills yourself), who to trust, and of course evaluating potential mates - in the traditional sense of the word :)

We are bombarded with sensory input conveying vast quantities of information from which we could infer and deduce vast quantities of knowledge if we had the time and perseverance, but doing so would be sub-optimal with regard to the task of propagating our genetic sequences into the future. We are programmed to notice some features and to think about some aspects of the physical world more so than others.