From Correlation to Causation through stories and math

Correlation and causation are two concepts that people often mix up. I must admit that I have been guilty of this myself, and it is unlikely that I will ever entirely grow out of it, as it is wired deeply into our psychology. Let me use this article to briefly explain what correlation and causation mean, share some interesting stories that have emerged from people misunderstanding these concepts, and describe an algorithm that attempts to find causal relationships using correlation information. Here is a story that I heard a professor of mine, Prof. Dr. Ernst-Jan Camiel Wit, tell us during a lecture. A school was involved in a study on providing free mid-day meals to students, who could choose whether or not to subscribe to the programme. At the end of the study, both the students who subscribed and those who did not were tested for different health indicators. It was observed that the students who chose to have meals from the programme had poorer health

Extracting Stock Price Data Trend from Google Search to Train LSTM Network

I am not very good at designing my own neural networks. I have attempted to create a few in the past. Some worked out fine for proving certain points that I wanted to make, but whenever I tried to build something to participate in a competition, things did not go very well. In this post, I do not intend to create an LSTM-based neural network to predict stock prices. Instead, I simply use Google as a tool to extract stock trends from its graphs.

To get started, search for "Amazon stock price" on Google. You should see a pretty nice graph. Right-click on the graph, click Inspect, and read through the Elements tab in the Developer Tools. You will see that the graph is rendered using Scalable Vector Graphics (SVG). This is an XML-based vector image format, so the data required to draw the graphic is available in a format that we can read. I also observed that the required SVG image has the class name uch-psvg and that there is only one element with that class name.

Let us start by looking at the data inside the SVG image. Identical data is stored in the first two path tags inside the SVG, and this data represents the trend. To train an LSTM, you don't necessarily need data with the right numbers; you just need data with the right trend. Let us extract this data into variables named xValues and yValues.

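// Run this in the browser console on the Google search results page.
var svg = document.getElementsByClassName('uch-psvg')[0];
// The first two path tags hold identical data; read the second one.
var pathStr = svg.getElementsByTagName('path')[1].outerHTML;
var valueStr = pathStr.split('d="M ')[1].split('"')[0];
var valueStrSplit = valueStr.split(" L ");
var xValues = [];
var yValues = [];
for (var i = 0; i < valueStrSplit.length; i++) {
    var xy = valueStrSplit[i].split(" ");
    xValues.push(parseFloat(xy[0]));
    // Negate y because the SVG y-axis points downward.
    yValues.push(-1 * parseFloat(xy[1]));
}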

It can be observed that the xValues are just equidistant values, so from a trend-analysis perspective they do not provide much information. Let us ignore them.
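If you want to check the equidistance claim outside the browser, here is a minimal sketch. It assumes the path data has the simple "M x y L x y ..." shape described above. The helper names parsePathData and isEquidistant are my own for illustration, and the sample string is made up rather than copied from a real chart.

```javascript
// Hypothetical helper: parse an SVG path "d" string of the form
// "M x0 y0 L x1 y1 L x2 y2 ..." into separate coordinate arrays.
function parsePathData(d) {
  const xValues = [];
  const yValues = [];
  // Drop the leading "M", then split the remaining string on "L" commands.
  const points = d.trim().replace(/^M\s*/, "").split(/\s*L\s*/);
  for (const point of points) {
    const [x, y] = point.trim().split(/[\s,]+/).map(parseFloat);
    // Skip anything that did not parse into two numbers.
    if (Number.isFinite(x) && Number.isFinite(y)) {
      xValues.push(x);
      yValues.push(y);
    }
  }
  return { xValues, yValues };
}

// Returns true when consecutive values differ by the same step (within a tolerance).
function isEquidistant(values, tolerance = 1e-6) {
  if (values.length < 3) return true;
  const step = values[1] - values[0];
  for (let i = 2; i < values.length; i++) {
    if (Math.abs(values[i] - values[i - 1] - step) > tolerance) return false;
  }
  return true;
}

// Made-up sample data, not taken from a real chart.
const sample = parsePathData("M 0 120.5 L 10 118.2 L 20 121.7 L 30 115.0");
console.log(isEquidistant(sample.xValues)); // true
```

If the check ever returns false for a real chart, the x values carry timing information and should probably not be thrown away.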

The reason for multiplying yValues by -1 is that, when a browser renders an SVG, the origin of the coordinate system is at the top-left corner and the positive Y direction points downward, so the values in the SVG are laid out accordingly. We are only interested in getting the trend right, so to flip it we simply negate each value.
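Since only the trend matters, one optional extra step is to rescale the flipped values to the range 0 to 1, so that trends taken from different charts are comparable. This is my own suggestion, not something the extraction requires, and normalizeTrend is a hypothetical helper name. A minimal sketch, taking the raw (unflipped) SVG y coordinates as input:

```javascript
// Hypothetical helper: flip SVG y coordinates and rescale them to [0, 1].
function normalizeTrend(svgYValues) {
  // Negate first, because the SVG y-axis points downward.
  const flipped = svgYValues.map((y) => -y);
  const min = Math.min(...flipped);
  const max = Math.max(...flipped);
  const range = max - min;
  // A flat line has no range; map it to all zeros to avoid dividing by zero.
  return flipped.map((y) => (range === 0 ? 0 : (y - min) / range));
}

// Made-up raw SVG y coordinates: the smallest one is the highest point on screen.
console.log(normalizeTrend([100, 50, 75])); // values: 0, 1, 0.5
```

The point drawn highest on the screen (y = 50) ends up as 1 and the lowest (y = 100) as 0, which matches how the graph looks to the eye.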

You can use this method to collect trends from many different stock prices, combine them into a large training dataset, and train an LSTM network on it.
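One common way to turn such trends into training examples for a sequence model is to slide a fixed-size window over each series and use the value that follows the window as the target. The sketch below assumes that setup. The helper name makeWindows, the window size of 3, and the sample trends are all made up for illustration, and the training step itself is left out.

```javascript
// Hypothetical helper: turn one trend series into (input window, next value) pairs.
function makeWindows(series, windowSize) {
  const inputs = [];
  const targets = [];
  // Stop while there is still one value left after the window to use as the target.
  for (let start = 0; start + windowSize < series.length; start++) {
    inputs.push(series.slice(start, start + windowSize));
    targets.push(series[start + windowSize]);
  }
  return { inputs, targets };
}

// Made-up trends standing in for series extracted from different stock charts.
const trends = [
  [0.1, 0.2, 0.3, 0.4, 0.5],
  [0.9, 0.7, 0.8, 0.6],
];

// Pool the windows from every trend into one dataset.
const dataset = { inputs: [], targets: [] };
for (const trend of trends) {
  const { inputs, targets } = makeWindows(trend, 3);
  dataset.inputs.push(...inputs);
  dataset.targets.push(...targets);
}
console.log(dataset.inputs.length); // 3
```

Windows are built per series and only then pooled, so no window ever straddles two different stocks.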

