Extending the Trade War (App) – Further Explorations with Google Trends and the Market


This is a continuation of the original article describing the app:
https://kerpanic.wordpress.com/2018/12/20/trading-the-trade-war-sentiment-based-trading-using-google-trends/
If you have not read the previous article, I highly recommend reading it first, as it serves as an introduction to the web app at https://tradewargoogletrends.herokuapp.com/

New Features and Parameters

Since the original web app was released, some requests and suggestions have come in, and new world events have occurred, such as the increase in Federal Reserve interest rates and several Trump antics. Here are all the updates to the web app.

  • Using Google Trends Topics instead of Search Terms: Topics in Google Trends are more all-encompassing, as they cover all the search terms related to the topic you choose. Hence, to cover more ground, I switched from fetching search terms to fetching topics. A detailed difference between search terms and topics is listed here
  • Other Keywords: In the previous version, the keyword was limited to “Trade War”. The new keywords include “recession”, “financial crisis”, “debt”, “federal reserve system” and also something to satisfy these two best friends (look for it!). These terms are somewhat related to the trade war, and allow us to expand the analysis of how Google Trends sentiment affects market prices. It is interesting to see how these terms correlate with the ups and downs of individual stocks, especially now that we are in a volatile period. I personally like “financial crisis” or “recession” as a “chicken-little” warning indicator – try it out!
  • Wider Range for Parameters: In order to deal with the behaviour of different keywords, a wider percentage range for “Upper Sell Threshold” and “Lower Buy Threshold” has been included.
  • More Tickers : There are more tickers now of popular stocks, including the S&P500 and Dow indexes. I’ve also added the tickers of those stocks I have a personal interest in.
  • Lighter red/blue line to indicate triggers on partial data: Google Trends values measure the proportion of searches for a particular keyword relative to all other searches on Google. This means that when Google Trends marks the latest data point as partial, we cannot reliably call it a buy or sell trigger, even if it crosses a threshold. We can only say it is “headed that way”, and we can confirm it only once the week being measured has ended. A lighter blue/red line indicates such a situation.
  • Date Picker for Date From Which to Start Calculations: This states the point in time at which we start the buy/sell calculations. It will come in useful if you want to test the Google Trend strategy from a different point in history. 
  • Addressing Heroku App Startup Times: On the free tier of Heroku, the app seems to go to sleep automatically after 30 minutes of idling, after which the first access incurs a 10-second startup time. Pinging it at regular intervals with https://uptimerobot.com keeps it alive and maintains the responsiveness of the app. I am really cheap.
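The partial-data rule from the list above can be sketched as follows. This is only an illustration, not the app's actual code: the function name `classify_trigger` and the `is_partial` flag are hypothetical, and the flag is assumed to be supplied alongside each weekly data point.

```python
def classify_trigger(signal, is_partial):
    """Label a trigger as confirmed or provisional (hypothetical helper).

    signal: "buy", "sell" or "hold" for the week in question.
    is_partial: assumed flag, True when the week's trend data is incomplete.
    """
    if signal == "hold":
        return "none"
    # A trigger on partial data is only "headed that way" (lighter line).
    return f"provisional {signal}" if is_partial else f"confirmed {signal}"


# Made-up example weeks: (signal, is_partial)
weeks = [("sell", False), ("hold", False), ("buy", True)]
labels = [classify_trigger(signal, partial) for signal, partial in weeks]
print(labels)
```

A provisional label would be drawn as the lighter line and only promoted to a confirmed trigger once the week has ended.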
[Figure: aapl_recession_trend. Caption: Oh, if only I had sold …]

Thoughts

The more I mess around with the web app, the more I believe in its ability to draw from the wisdom of the crowd. That said, not all the keywords presented seem effective, and I leave it to your own judgement to see which ones are. Personally I think “trade war”, “financial crisis” and “recession” seem to correlate best with trends in the market, but YMMV. Perhaps a reason for this is that these are terms most people can relate to and search for.

Essentially, the Google searches represent the sentiment of the people, albeit aggregated weekly. At the very least, I will look at what these graphs say before I make any buys or sells, especially in these turbulent conditions, to gauge how all of us are thinking as a whole.

One drawback, though, is that Google Trends data comes in on a weekly basis, which may be too slow for a market as volatile as the current one. Perhaps a further step could be to get wiki search trends (these are available daily, I think), or to analyse Trump’s incessant tweets to further confirm the positive/negative sentiment at that point in time. The possibilities are endless, and I am positive that the professional quantitative analysts out there are doing more than this.

All that said, if you mess around with the app enough, you’ll find that buy-and-hold is still a pretty solid strategy!

Trading the Trade War – Sentiment-Based Trading using Google Trends

You can find the web application, made using Dash here : https://tradewargoogletrends.herokuapp.com/

Prospect Theory and Loss Aversion

Prospect Theory is a theory in cognitive psychology that describes the way people choose between probabilistic alternatives that involve risk, where the probabilities of outcomes are uncertain. The theory states that people make decisions based on the potential value of losses and gains rather than the final outcome, and that people evaluate these losses and gains using some heuristics. A layman’s way to think of Prospect Theory is as an analysis of decision-making under risk.

One easy illustration of loss aversion in prospect theory is when we are faced with two choices:

  1. Get $50 straightaway
  2. Flip a coin – get $100 if heads, nothing if tails.

Most people go for choice (1), because we can’t bear the thought of getting nothing should we choose (2) and the coin lands on tails. We are loss-avoiding creatures, even though mathematically (1) and (2) have the same expected value of $50.
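To make the "mathematically the same" part concrete, here is a small sketch that computes the expected payoff of both choices, assuming a fair coin. The helper name `expected_value` is just for illustration.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Expected payoff of a gamble given (probability, payoff) pairs."""
    return sum(probability * payoff for probability, payoff in outcomes)


# Choice (1): $50 for certain.
sure_thing = [(Fraction(1), 50)]
# Choice (2): fair coin flip, $100 on heads, nothing on tails.
coin_flip = [(Fraction(1, 2), 100), (Fraction(1, 2), 0)]

ev_sure = expected_value(sure_thing)
ev_flip = expected_value(coin_flip)
print(ev_sure, ev_flip)
```

Both come out to $50, so the preference for (1) comes from how we feel about the risk, not from the average payoff.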

Relating Prospect Theory to Google Trends

Dr. Tobias Preis from Warwick Business School suggested in 2013 that Google Trends could be used to predict stock movements in his paper “Quantifying Trading Behavior in Financial Markets Using Google Trends”. 

Here is a link to his presentation. A simple trading strategy proposed by him is as follows:

Why does this strategy work? According to prospect theory, we tend to search more when bad news happens. We over-react to bad news (searching frantically and worrying) and under-react to routine news like reliable growth in a company. So, when something bad happens, a lot of people search for it and cause an upward spike in Google Trends, and that is generally the time to sell. Conversely, when the data point in Google Trends drops, fewer folks are searching for the topic and hence no bad sentiment is evident. Arguably, these “calmer” times are a better time to buy.

Of course, for the individual investor, there are a number of issues with the trading strategy above, even if the paper claims it to be profitable.

  1. The number of trading transactions will be too large if we make a transaction every week. If we are charged $20 per transaction, the transaction costs will balloon and prove too much for the individual investor.
  2. Individual investors do not have the advantage of speed – by the time they get to buy or sell, the effect may already be priced in.

Hence, for the individual investor, we can only make use of this knowledge in the broad sense, perhaps as a danger alert indicator for bad times ahead. In this article, we will investigate how we can make use of this knowledge and how it may be applied to the individual retail investor’s trading activities.

Defining the Terms for the Experiment

We will use some popular stocks like AAPL and GOOG to run the experiment. Before we begin, we have to define some terms that we will use throughout the article as well as the web application, as follows:

  • Upper Sell Threshold: This is the percentage increase in the Google Trends value from one data point to the next beyond which we sell. The default in the program is 45%.
  • Lower Buy Threshold: This is the percentage decrease in the Google Trends value from one data point to the next beyond which we buy. The default in the program is -40%.
  • Keyword: This is the keyword to be entered into Google Trends. The default keyword we use here is “Trade War” as a topic in Google Trends.
  • Shares to Buy: This is the number of shares to buy for each transaction.
  • Shares to Sell: This is the number of shares to sell for each transaction.
  • Initial Money: We also define in the program the initial amount of money that we have to do the transactions.
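As a rough illustration of how the two thresholds above turn a trend series into buy/sell signals, here is a minimal sketch. It is not the app's code: the function name `trend_signals`, the sample trend values and the handling of a zero previous value are all assumptions.

```python
def trend_signals(trend, upper_sell=45.0, lower_buy=-40.0):
    """Return "buy", "sell" or "hold" for each data point after the first.

    trend: weekly Google Trends values (made-up numbers in the example).
    upper_sell / lower_buy: percentage-change thresholds (article defaults).
    """
    signals = []
    for previous, current in zip(trend, trend[1:]):
        if previous == 0:
            # Percentage change is undefined; holding here is an assumption.
            signals.append("hold")
            continue
        change = (current - previous) / previous * 100.0
        if change > upper_sell:
            signals.append("sell")   # searches spiked: bad news, sell
        elif change < lower_buy:
            signals.append("buy")    # searches dropped: calmer times, buy
        else:
            signals.append("hold")
    return signals


weekly_trend = [20, 30, 28, 15, 16, 40]
signals = trend_signals(weekly_trend)
print(signals)
```

With these made-up values, the jumps of +50% and +150% produce sells, and the drop of about -46% produces a buy.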

Rules of the Game

There are some other important rules to consider in our experiment as well:

  • Benchmarking against Buy-and-Hold: Naturally, we benchmark this trading strategy against buy-and-hold, where we buy the stock at the first buying opportunity (the same first buying opportunity as in the Google Trends strategy) and hold it until the end of the experimental period. We then check whether the strategy has earned more or less money than buy-and-hold.
  • Transactions are counted and transaction costs are accounted for: The cost of each transaction is set at $20, and is multiplied by the number of transactions that took place as a result of the strategy. This adds realism with the individual retail investor in mind.
  • If Not Enough Funds to Buy: If we do not have “cash” on hand to buy more shares, we will buy the maximum number of shares that we can afford.
  • If Not Enough Shares to Sell: If we do not have the stipulated number of shares to sell, we will sell whatever remaining shares we have. In this sense, if we have not bought anything yet, we will not be able to sell anything, even if it is a “sell” line.
  • Period of Testing: The experiment is set over the Trade War timeline, from early 2017 to the current date. The web app will be kept alive and running throughout the trade war.
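The rules above can be sketched as a tiny simulation. This is a simplified illustration under stated assumptions, not the app's implementation: the function names, the sample prices and signals are made up, and the buy-and-hold benchmark is assumed to spend as much of the initial money as possible at the first buy signal (the article does not say how many shares it buys).

```python
def backtest(prices, signals, initial_money=10000.0, shares_to_buy=10,
             shares_to_sell=10, cost_per_trade=20.0):
    """Run the strategy; signals[i] is executed at prices[i]."""
    cash, shares, trades = initial_money, 0, 0
    for price, signal in zip(prices, signals):
        if signal == "buy":
            # Not enough funds: buy the maximum number we can afford.
            quantity = min(shares_to_buy, int(cash // price))
            if quantity > 0:
                cash -= quantity * price
                shares += quantity
                trades += 1
        elif signal == "sell":
            # Not enough shares: sell whatever we have left (maybe nothing).
            quantity = min(shares_to_sell, shares)
            if quantity > 0:
                cash += quantity * price
                shares -= quantity
                trades += 1
    # Value remaining shares at the last price, then deduct $20 per trade.
    final_value = cash + shares * prices[-1] - trades * cost_per_trade
    return final_value, trades


def buy_and_hold(prices, signals, initial_money=10000.0, cost_per_trade=20.0):
    """Benchmark: buy at the first buy signal and hold to the end."""
    for price, signal in zip(prices, signals):
        if signal == "buy":
            quantity = int(initial_money // price)  # all-in is an assumption
            leftover = initial_money - quantity * price
            return leftover + quantity * prices[-1] - cost_per_trade
    return initial_money  # no buying opportunity ever appeared


prices = [100.0, 90.0, 120.0, 110.0]
signals = ["sell", "buy", "sell", "hold"]
strategy_value, n_trades = backtest(prices, signals, initial_money=1000.0)
benchmark_value = buy_and_hold(prices, signals, initial_money=1000.0)
print(strategy_value, n_trades, benchmark_value)
```

In this toy run the first "sell" does nothing because no shares are held yet, and the strategy ends slightly ahead of the benchmark after paying for two transactions.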

Conclusion


Running the experiment, we are able to “devise” a strategy that out-performs the buy-and-hold strategy. This suggests that there is some truth to the claim that Google Trends is capable of aiding investment strategies.

In the web app, the red dotted lines are “sell” actions and the blue dotted lines are “buy” actions. You can choose your own parameters to run the experiment to your liking.

Disclaimer: Do not blame me for any loss of money should you decide to follow this strategy. This is a purely academic endeavour that explores the link between prospect theory, trends and the stock market.

That said, there is an emotional hurdle to overcome if we are to abide fully by the strategy, for example when the algorithm tells me to sell when I will clearly lose money. Greed is also something I had to overcome, when the algorithm told me to sell while the trend was clearly going up.

I will keep the web application alive on Heroku so that it can serve as a continuous point of reference for this article. It can be a little slow on first access as it is a free app hosting tier and this is the default behaviour (I’m cheap) – the app has to start itself up. For me personally, this app is useful as a “chicken-little” early warning signal. It’ll be interesting to see how the future plays out!

Let me know at madstrum@gmail.com what you think! I would be interested in ideas and suggestions for the web app. Thanks!

Lastly, I would like to thank Mr. Eric Tham from the NUS-ISS Sentiment Mining course for introducing us to this as well as other finance related topics. 

You can find Part 2 of this article here, where I describe some extensions I made to the app, and some findings. 


FIFA World Cup 2018 (Part 2) – Quarters Predictions


Previous article here : https://kerpanic.wordpress.com/2018/06/27/world-cup-2018-the-final-16-predictions/

Data obtained from here : https://www.kaggle.com/agostontorok/soccer-world-cup-2018-winner/data

All code here : https://github.com/lppier/fifa18_final16

Enhancements

  • Matches that resulted in a draw are no longer considered, as there are no draws in the knockout stage from the final 16 onwards.
  • One-hot encoding is used to remove the possibility that the model might treat the team number as a ranked value.
  • Added latest results from the new matches played so far into the data!
  • Added an ensemble voting classifier aggregating results between kNN, Adaboost and Neural Networks
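To show what the one-hot encoding enhancement does, here is a minimal pure-Python sketch. It is not the notebook's code: the helper `one_hot`, the four-team list and the feature layout (first team's vector followed by the second team's) are assumptions for illustration.

```python
def one_hot(team, teams):
    """Return a 0/1 vector with a single 1 at the team's position."""
    return [1 if team == candidate else 0 for candidate in teams]


# A fixed, sorted team list keeps the column order stable.
teams = sorted({"France", "Argentina", "Brazil", "Mexico"})

# Assumed layout: first team's vector, then the second team's vector.
match_features = one_hot("France", teams) + one_hot("Argentina", teams)
print(teams)
print(match_features)
```

Unlike a single integer team number, no team ends up "larger" than another here, so the model cannot read a ranking into the encoding.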

Metrics from kNN Classifier Model

area under curve: 0.8912393162393163
accuracy: 0.8893333333333333
precision: [ 0.84711779  0.93732194]
recall: [ 0.93888889  0.84358974]
fscore: [ 0.89064559  0.8879892 ]
kNN Quarters Prediction
1 means a win for the 1st country.
France vs Argentina : 0
Uruguay vs Portugal : 0
Spain vs Russia : 1
Croatia vs Denmark : 1
Brazil vs Mexico : 1
Belgium vs Japan : 1
Sweden vs Switzerland : 0
Colombia vs England : 0

Metrics from Adaboost Model

area under curve: 0.8912393162393163
accuracy: 0.8893333333333333
precision: [ 0.84711779  0.93732194]
recall: [ 0.93888889  0.84358974]
fscore: [ 0.89064559  0.8879892 ]
Adaptive Boosting Prediction
1 means a win for the 1st country.
France vs Argentina : 0
Uruguay vs Portugal : 0
Spain vs Russia : 1
Croatia vs Denmark : 0
Brazil vs Mexico : 0
Belgium vs Japan : 1
Sweden vs Switzerland : 0
Colombia vs England : 1

Metrics from Neural Networks Model

area under curve: 0.8912393162393163
accuracy: 0.8893333333333333
precision: [ 0.84711779  0.93732194]
recall: [ 0.93888889  0.84358974]
fscore: [ 0.89064559  0.8879892 ]

Neural Networks Quarters Prediction
1 means a win for the 1st country.
France vs Argentina : 0
Uruguay vs Portugal : 0
Spain vs Russia : 1
Croatia vs Denmark : 0
Brazil vs Mexico : 1
Belgium vs Japan : 1
Sweden vs Switzerland : 0
Colombia vs England : 0

Voting Ensemble Results

Basically, among the three classifiers, majority wins. It’s interesting to note that Adaptive Boosting actually predicted that Brazil will lose, but the majority votes Brazil will win.

A similar situation occurs in Croatia vs Denmark, where the majority votes for Denmark to win.

Ensemble Prediction (Voting scheme, followed by averaging among the 3 models)
1 means a win for the 1st country using voting scheme.
France vs Argentina : 0 Probability 0.7537506238216767
Uruguay vs Portugal : 0 Probability 0.5634269455951803
Spain vs Russia : 1 Probability 0.8396692244968408
Croatia vs Denmark : 0 Probability 0.4291791223437185 <- Probability here suggests different result from voting!
Brazil vs Mexico : 1 Probability 0.8330163908699761
Belgium vs Japan : 1 Probability 0.8396997204649211
Sweden vs Switzerland : 0 Probability 0.6988071986353249
Colombia vs England : 0 Probability 0.5637759883599832
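The voting step behind the listing above can be reproduced from the three per-model predictions listed earlier. This sketch covers only the majority vote; the averaged probabilities are not recomputed because the per-model probabilities are not given in the article. The helper name `majority_vote` is hypothetical.

```python
from collections import Counter

matches = [
    "France vs Argentina", "Uruguay vs Portugal", "Spain vs Russia",
    "Croatia vs Denmark", "Brazil vs Mexico", "Belgium vs Japan",
    "Sweden vs Switzerland", "Colombia vs England",
]
# Predictions copied from the three model outputs (1 = first country wins).
knn = [0, 0, 1, 1, 1, 1, 0, 0]
adaboost = [0, 0, 1, 0, 0, 1, 0, 1]
neural_net = [0, 0, 1, 0, 1, 1, 0, 0]


def majority_vote(*votes):
    """Return the most common label; with 3 binary voters there is no tie."""
    return Counter(votes).most_common(1)[0][0]


ensemble = [majority_vote(k, a, n) for k, a, n in zip(knn, adaboost, neural_net)]
for match, winner in zip(matches, ensemble):
    print(f"{match} : {winner}")
```

The result matches the voted labels in the ensemble listing, including Brazil vs Mexico and Croatia vs Denmark, where one model is outvoted by the other two.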

It’ll be fun to see how it goes!

World Cup 2018 – Predictions (Part 1)

All code can be found here : https://github.com/lppier/fifa18_final16


I thought that it would be interesting to build a prediction model to predict the results of the clashes between the final 16 FIFA World Cup 2018 teams.

Data obtained from here : https://www.kaggle.com/agostontorok/soccer-world-cup-2018-winner/data

Main python notebook here : https://github.com/lppier/fifa18_final16/blob/master/fifa2018.ipynb

Two datasets were used :

Dataset 1 : A FIFA World Ranking database of all the countries that play soccer competitively.

Dataset 2 : Results of all international soccer matches since 1872.

Data Preparation

  • Considering that line-up changes do affect the odds, I elected to take in data only from the last World Cup onwards.
  • Only the 2018 rankings were used; earlier rankings were not considered.
  • Only World Cup match data was used (qualifiers or otherwise). During model exploration, it was found that using only World Cup data yielded better accuracies and overall metrics.
  • Data Balancing was done to push up the accuracies of the model.
  • Data was split into 80% training data, 20% test data. The 20% test data was not used to test until the end, after using 5-fold cross-validation to validate the model quality using only the training data.
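The split described in the last bullet can be sketched with plain index lists. This is an illustration only, not the notebook's code: the function names, the seed and the 100-row example are assumptions, and in practice a library such as scikit-learn would normally handle this.

```python
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction as the final test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold(train_idx, k=5))
print(len(train_idx), len(test_idx), len(folds))
```

The cross-validation folds are drawn only from the training indices, so the held-out 20% stays untouched until the final evaluation.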

Model

  • I used Orange to do a quick run of all the Scikit-Learn classifier algorithms, of which Logistic Regression, Random Forest and Naive Bayes emerged as the best classifiers for this particular problem.
  • A random forest model was eventually chosen, and hyper-parameters tweaking was done on it.

Predicted Outcome : Germany Wins the Germany – South Korea match with 77% accuracy.

 

More to come soon after the final 16 are in…