24/05/2019

Introduction

Our Members - Avyav Vatsa, Shu Wei Ng, Zachary Loh, Zihao Zhang

Our Project

  • Digging into the drivers of tennis match outcomes with efficient, standardised machine learning methods.
  • Improving predictive accuracy on match outcomes through manual model adjustments.

Our Methods - GBM, Random Forest, XGBoost (our focus)

Methodology

Random Forest:

  • Tuning parameters (see the sketch below)
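
A minimal tuning sketch in R, assuming (hypothetically) that the training data sits in a data frame `train` with a factor outcome column `result`: tuneRF searches over mtry by out-of-bag error, and the final forest is refit with more trees.

```r
library(randomForest)

# Hypothetical objects: data frame `train` with factor outcome `result`.
set.seed(1)
predictors <- setdiff(names(train), "result")

# Search over mtry using the out-of-bag (OOB) error.
tuned <- tuneRF(
  x = train[, predictors],
  y = train$result,
  ntreeTry   = 500,   # trees grown per candidate mtry
  stepFactor = 1.5,   # multiplicative step between mtry values
  improve    = 0.01,  # minimum relative OOB improvement to continue
  trace      = TRUE
)
best_mtry <- tuned[which.min(tuned[, "OOBError"]), "mtry"]

# Refit with the selected mtry and a larger forest.
rf_fit <- randomForest(result ~ ., data = train,
                       ntree = 1000, mtry = best_mtry,
                       importance = TRUE)
print(rf_fit)  # OOB error rate and confusion matrix
```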

XGBoost:

  • Tuning parameters
  • Feature importance
  • Removing insignificant variables (sketched below)
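
A sketch of that pipeline, assuming (hypothetically) a numeric feature matrix `X` and integer class labels `y` coded 0..k-1: cross-validate with xgb.cv, train, rank variables with xgb.importance, and drop low-gain columns.

```r
library(xgboost)

# Hypothetical objects: numeric matrix `X`, integer labels `y` in 0..k-1.
dtrain <- xgb.DMatrix(data = X, label = y)
params <- list(
  objective   = "multi:softmax",
  num_class   = length(unique(y)),
  eval_metric = "merror",   # multiclass error, as in the Results log below
  max_depth   = 6,
  eta         = 0.3
)

# Cross-validate to pick the number of boosting rounds.
cv <- xgb.cv(params = params, data = dtrain,
             nrounds = 100, nfold = 5, verbose = TRUE)

fit <- xgb.train(params = params, data = dtrain, nrounds = 100)

# Rank variables by gain and keep only the informative ones.
imp  <- xgb.importance(model = fit)
keep <- imp$Feature[imp$Gain > 0.01]   # illustrative cutoff
dtrain_reduced <- xgb.DMatrix(data = X[, keep], label = y)
```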

Feature engineering (a sketch of each step follows):

  • Variable creation
  • Variable transformation
  • Variable elimination
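
The column names below (player1_rank, player2_rank, player1_points) are hypothetical stand-ins for the actual dataset's fields; the sketch only illustrates the three steps.

```r
# Hypothetical columns; substitute the dataset's real names.

# Variable creation: encode the ranking gap between the two players.
train$rank_diff <- train$player1_rank - train$player2_rank

# Variable transformation: compress a right-skewed count with log1p.
train$log_points <- log1p(train$player1_points)

# Variable elimination: drop raw columns now captured by rank_diff.
train$player1_rank <- NULL
train$player2_rank <- NULL
```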

Model selection:

  • Unsupervised vs supervised learning
  • Random Forest vs XGBoost (a hold-out comparison is sketched below)
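
One way to compare the two supervised candidates is accuracy on a simple hold-out split. Everything below (the `train` data frame, the `result` outcome) is a hypothetical sketch, not our exact code.

```r
library(randomForest)
library(xgboost)

# Hypothetical data frame `train` with factor outcome `result`.
set.seed(1)
idx <- sample(nrow(train), 0.8 * nrow(train))
tr  <- train[idx, ]
te  <- train[-idx, ]

# Random Forest accuracy on the hold-out set.
rf <- randomForest(result ~ ., data = tr, ntree = 500)
acc_rf <- mean(predict(rf, te) == te$result)

# XGBoost on the same split (labels recoded to 0..k-1).
X_tr <- model.matrix(result ~ . - 1, tr)
X_te <- model.matrix(result ~ . - 1, te)
bst <- xgboost(data = X_tr, label = as.integer(tr$result) - 1,
               nrounds = 100, objective = "multi:softmax",
               num_class = nlevels(tr$result), verbose = 0)
acc_xgb <- mean(predict(bst, X_te) == as.integer(te$result) - 1)

c(random_forest = acc_rf, xgboost = acc_xgb)
```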

Results

The cross-validation log below reports multiclass error (merror) as mean+standard deviation across folds. Training error falls to zero by round 100, while test error plateaus near 0.094 after roughly 50 rounds.

## [1]  train-merror:0.136682+0.001801  test-merror:0.181408+0.004704 
## [11] train-merror:0.071784+0.001515  test-merror:0.132951+0.003047 
## [21] train-merror:0.033551+0.000854  test-merror:0.105379+0.002326 
## [31] train-merror:0.020485+0.001238  test-merror:0.099569+0.003852 
## [41] train-merror:0.010843+0.001389  test-merror:0.096188+0.003782 
## [51] train-merror:0.005115+0.000757  test-merror:0.094160+0.004347 
## [61] train-merror:0.002072+0.000278  test-merror:0.094435+0.004267 
## [71] train-merror:0.000645+0.000173  test-merror:0.094310+0.003885 
## [81] train-merror:0.000269+0.000090  test-merror:0.093809+0.004355 
## [91] train-merror:0.000025+0.000012  test-merror:0.094110+0.004356 
## [100]    train-merror:0.000000+0.000000  test-merror:0.093684+0.004459
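
To read off the best round from such a log, one can query the evaluation_log that xgb.cv returns (a sketch, assuming `cv` is the object from the tuning sketch above):

```r
# Row with the lowest mean test merror (here ~0.094, in the later rounds).
best <- cv$evaluation_log[which.min(cv$evaluation_log$test_merror_mean), ]
print(best)
```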

Conclusion

  • [Random Forest] - Baseline - 0.90715
  • [Random Forest] - Trees adjusted - 0.90730
  • [XGBoost] - Default settings - 0.91859
  • [XGBoost] - Max-depth adjusted
  • [XGBoost] - Variable weights manipulated

In exploring our dataset, we tried different models, created several new variables, manipulated variable weights, and restructured our code. Although this improved on the default settings, pushing the model further proved difficult, and we hope to build on these results in future study.