World Cup 2018 – Predictions (Part 1)

All code can be found here :


I thought it would be interesting to build a model to predict the results of the matches between the final 16 teams of the FIFA World Cup 2018.

Data obtained from here :

Main python notebook here :

Two datasets were used :

Dataset 1 : A FIFA World Ranking database of all the countries that play soccer competitively.

Dataset 2 : Results of all international soccer matches since 1872.

Data Preparation

  • Because line-up changes affect the odds, I elected to use data only from the last World Cup onwards.
  • Only the 2018 rankings were used; earlier rankings were not considered.
  • Only World Cup match data was used (qualifiers or otherwise); during model exploration, restricting the data to World Cup matches yielded better accuracy and overall metrics.
  • The data was balanced to improve the model's accuracy.
  • The data was split into 80% training data and 20% test data. The test data was held out until the end; model quality was first validated with 5-fold cross-validation on the training data only.
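The split, balancing and cross-validation steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the notebook's code: the post does not say which balancing method was used or in which order the steps ran, so random oversampling of the training set (after the split, so that duplicated rows cannot leak into the test set) is an assumption, and the helper names and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_fraction of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def oversample(rows, label_of, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n_rows)."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, val
        start += size

# Hypothetical toy data: (ranking difference, outcome) pairs, not real match results.
matches = (
    [(d, "win") for d in range(60)]
    + [(d, "draw") for d in range(15)]
    + [(d, "loss") for d in range(25)]
)

# 80/20 split first; the test set is not touched again until the final evaluation.
train, test = split_train_test(matches, test_fraction=0.2, seed=42)

# Balance only the training data, then build the 5 cross-validation folds on it.
train_balanced = oversample(train, label_of=lambda row: row[1], seed=42)
folds = list(k_fold_indices(len(train_balanced), k=5))
class_counts = Counter(label for _, label in train_balanced)
```

After this runs, `class_counts` holds the same count for every outcome class, and each row of `train_balanced` appears in exactly one validation fold.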


  • I used Orange for a quick run of the Scikit-Learn classifier algorithms, of which Logistic Regression, Random Forest and Naive Bayes emerged as the best classifiers for this problem.
  • A random forest model was eventually chosen, and its hyper-parameters were tuned.
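As a rough sketch of what tuning a random forest with 5-fold cross-validation can look like in Scikit-Learn: the features, labels and parameter grid below are assumptions made up for illustration (synthetic data standing in for ranking-based features), not the values used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): two ranking-style features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Hold out 20% as the final test set; tuning only ever sees the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid: every combination is scored with 5-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy of the refitted best model on the held-out test set.
test_accuracy = search.score(X_test, y_test)

# Predicted probability of class 1 for a single match (first test row).
win_probability = search.predict_proba(X_test[:1])[0, 1]
```

Note that `test_accuracy` describes how often the model is right across the whole test set, whereas `win_probability` is the model's estimate for one specific match; the two numbers answer different questions.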

Predicted Outcome : Germany wins the Germany – South Korea match, with 77% accuracy.


More to come soon after the final 16 are in…
