This paper describes an artificially intelligent model for predicting the outcomes of particular sporting contests. It is intended as a case study in using neural networks to predict probabilistic events in a sporting scenario. The model used is a multi-layer perceptron, and it is presented with a number of features representing the performance of the various sporting teams. The system performed well, successfully predicting the winner in over 70% of matches. The results also include a study of which features are most useful, the outcomes of "live" testing so far this year and, for interest's sake, the amount of money this system would have won had it been placing bets at a sports betting outlet during the season.
This paper explores the utility of neural networks, in particular
multi-layer perceptrons, for predicting the outcome of sporting contests given
only basic information. This is a less traditional application area for neural
networks and somewhat of a novelty; however, the basic principles of machine
learning still apply. Additionally, attaching to predictions an indication
of how certain the predictor is, and rewarding such predictions properly, are
important issues in many fields.
The data used in this work was taken from a number of different seasons
of the National Rugby League (NRL) competition [8], with detailed
results presented for the 2002 season. For the three previous seasons
(1999-2001) the competition structure consisted of fourteen teams, each playing
exactly one game per week, meaning seven matches were contested each week.
Twenty-six weeks of competition were played in the regular season,
giving a total of one hundred and eighty-two matches.
For the 2002 season there were fifteen teams, with fourteen usually
playing in any given week (one team having a "bye"). This season was
used for "live" testing, with one hundred and eighty-nine predictions made in
total.
The data contains noise in that there are details influencing the contest
beyond those captured in the feature set. Firstly, there is
what is often referred to as the individual "form" of the players; however, it is believed
that in most cases the overall skill level of the team will transcend the bad
form of individuals. Secondly, the
skill level of a team can be affected by the unavailability of "star"
players due to injury, suspension or representative duties (players being
called away to play for their state or country). In the most extreme case,
one team was left without eleven of its thirteen regular players due to representative
duties, and this is not reflected in the feature set used (see
Section 3.1 for more information about the
features).
There was a conscious effort made to ensure that there was no subjectivity in
the feature set. For example, it would be desirable to include some measure
of "star-availability" in the feature set, but it is almost impossible to
accurately quantify the value of an individual player to their team.
The paper is organised as follows: Section 2 gives a
background on the neural network engine used to make the predictions;
Section 3 describes the raw data used and the feature
extraction process; Section 4 details the
experiments conducted and the results of this work, including comparisons to
"expert" tippers and the potential success at sports betting
outlets; a breakdown of the utility of different features is presented in
Section 5; future work is presented in
Section 6 and Section 7 contains
the concluding remarks.
Modelling the Feature Space
One of the key features of neural networks (NNs), and the main reason for using them to model the feature space, is their ability to learn the relationship between inputs and outputs upon presentation of examples [2]. It is only necessary to provide a set of sample data (also known as training data or a training set) to the network; learning (or training) algorithms such as back-propagation then adjust the network to better model the problem domain.
There are many types of neural networks, and one of the most popular models is the multi-layer perceptron (MLP). MLPs associate a weight with each of the input features (see Section 3.1 for a discussion of the features in this domain) according to that feature's importance in the model (see Figure 1). This type of network topology is eminently suited to the well-defined domain discussed in this paper because several features exist and a weighting must be associated with each according to its contribution to the solution. These weights can be set to specific initial values (possibly to facilitate an intentional bias) or simply assigned randomly. The learning algorithm then adjusts the weights to minimize the error between the target output (the desired output provided in the learning examples) and the observed output (the output as calculated by the MLP).
Figure 1: A simple neural network topology, with features, weights and a
single output unit.
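As a concrete illustration, the following is a minimal NumPy sketch of the forward pass of such a network. The 19-10-1 layer sizes match those reported in Section 4; the helper names and the random initialisation scale are illustrative assumptions rather than details of the original system.

import numpy as np

def sigmoid(x):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(features, w_hidden, w_out):
    """Forward pass: weighted sums through one hidden layer, then a
    single sigmoid output unit."""
    hidden = sigmoid(w_hidden @ features)  # hidden-layer activations
    return sigmoid(w_out @ hidden)         # scalar output in (0, 1)

# Weights may simply be assigned randomly before training, as noted above.
rng = np.random.default_rng(0)
w_hidden = rng.normal(scale=0.1, size=(10, 19))  # 19 input features -> 10 hidden units
w_out = rng.normal(scale=0.1, size=10)           # 10 hidden units -> 1 output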
There are a number of learning algorithms which can be used to optimize the weights in MLPs, such as back-propagation, conjugate gradient descent and Levenberg-Marquardt [2]. The two learning algorithms concentrated on in this work were back-propagation and the conjugate-gradient method, both classical algorithms which are effective, relatively simple and well understood. Some argue that other learning algorithms often perform faster [3], but as this study deals with only a very small number of features with no requirement for real-time operation, the above-mentioned advantages outweigh any perceived increase in speed.
Back-propagation and the conjugate-gradient method work by iteratively training the network using the presented training data. On each iteration (or epoch), the entire training set is presented to the network, one case at a time. In order to update the weights, a cost function is determined in terms of the weights and its derivative (or gradient) with respect to each weight is estimated [10]. Weights are updated following the direction of steepest descent of the cost function. A common cost function, and the one used in this work, is the root mean squared (or RMS) error. During experimentation with the two methods, back-propagation was a little slower to learn than the conjugate-gradient approach, but both methods resulted in almost identical error rates. Back-propagation was slightly more accurate, making one more correct prediction than did conjugate-gradient. Further discussion of MLPs and the two learning algorithms can be found in most neural network texts, for example [11] and [2].
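Continuing the sketch above (and reusing its sigmoid and forward helpers), one epoch of plain back-propagation under these assumptions might look as follows; the learning rate and the per-case weight updates are illustrative choices, not details taken from the original system.

def train_epoch(X, y, w_hidden, w_out, lr=0.05):
    """One epoch: present every training case once and adjust the
    weights along the negative gradient of the squared error."""
    for features, target in zip(X, y):
        hidden = sigmoid(w_hidden @ features)
        out = sigmoid(w_out @ hidden)
        # Error gradients propagated back through the sigmoid units.
        delta_out = (out - target) * out * (1.0 - out)
        delta_hidden = delta_out * w_out * hidden * (1.0 - hidden)
        w_out -= lr * delta_out * hidden                  # update in place
        w_hidden -= lr * np.outer(delta_hidden, features)
    # Report the RMS error over the training set after the epoch.
    preds = np.array([forward(f, w_hidden, w_out) for f in X])
    return np.sqrt(np.mean((preds - y) ** 2))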
The raw data for this work was originally provided by TAB Limited in the form of a series of spreadsheets (one per season). These spreadsheets contained the names of the teams competing in each game, the final score for each team and the dollar amount paid by TAB Limited for successfully picking the winner of the game. There are twenty-six weeks of competition in each of the seasons. From this raw data it is possible to determine weekly statistics (features) for each team, including their current success rate, their recent performance, the points they have scored in the competition to date (to rate their offensive capabilities) and the points scored against them (to rate their defensive capabilities), as well as several other indicative features.
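A sketch of how such statistics might be accumulated from the raw results follows; since the spreadsheets themselves are not reproduced here, the record layout below is an assumed simplification.

from collections import defaultdict

def accumulate_stats(results):
    """Build running totals per team from raw match results, given as
    (week, home_team, away_team, home_score, away_score) records."""
    stats = defaultdict(lambda: {"played": 0, "wins": 0, "for": 0, "against": 0})
    for week, home, away, home_score, away_score in sorted(results):
        for team, scored, conceded in ((home, home_score, away_score),
                                       (away, away_score, home_score)):
            s = stats[team]
            s["played"] += 1
            s["wins"] += int(scored > conceded)  # current success rate input
            s["for"] += scored                   # offensive rating input
            s["against"] += conceded             # defensive rating input
    return stats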
A number of features were extracted from the raw data for use in the automatic tipping process. A conscious effort was made to ensure that no subjective features were included in the feature set. For example, the inclusion of a "star-availability" measure would be desirable, as obviously the unavailability of leading players due to suspension, injury or representative duties can greatly affect the performance of the team as a whole. The problem with this, however, is twofold: first, it requires human intervention to monitor the domain and determine availability, which goes against the concept of an "automatic" tipper; second, it requires some subjectivity on the part of the overseer. For example, it is impossible to say with any certainty that player A is more valuable to team X than player B is to team Y, and different individuals would subjectively select different values for this feature.
Features obtained were based solely on details such as the scoreline, recent performance and position on the "league ladder" relative to other teams. This removes the need for any human involvement in the generation of features, and with it any bias or subjectivity on the part of that human. A set of such features was obtained for each team for each week of competition.
If a team did not play in a given week (this is relevant only to the 2002 season) the feature values from the previous week were brought forward to the current week. Additionally, when required, the feature values were averaged over the number of matches played. This was done so that a meaningful comparison may still be made between two teams which have played a different number of previous matches.
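Building on the running totals above, per-match averaging might be implemented as follows; the three features shown are an illustrative subset of the nineteen actually used, and a bye week is handled simply by leaving a team's totals untouched, so the previous week's values carry forward.

def feature_row(stats, team):
    """Return per-match averages for a team so that teams with
    different numbers of completed matches remain comparable."""
    s = stats[team]
    games = max(s["played"], 1)  # avoid division by zero before round one
    return [s["wins"] / games,      # current success rate
            s["for"] / games,       # average points scored
            s["against"] / games]   # average points conceded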
The experiments conducted involved firstly extracting the features described in Section 3.1. The next phase was to construct the model. As mentioned in Section 2, the choice was made to use a multi-layer perceptron (MLP) to model the features and back-propagation (which proved slightly more effective than the conjugate-gradient method) to facilitate learning. Specifically, a three-layer MLP was used, with nineteen input units (one for each feature), ten hidden units and a single output unit. The output unit is normalised to a value between zero and one inclusive.
The feature set values were calculated for each team for each week of competition. The MLP was trained using all examples from previous weeks (except, obviously, for the first week, as there was no previous week). Predictions were made for the current week by using the MLP to calculate an output value for each team based on that team's feature set. An output value close to one for a particular team indicated a high level of confidence that the team was going to win its upcoming match, and an output value close to zero indicated a low confidence level.
The output values for each team competing in each game were calculated and the team which had the highest output value (that is, the highest confidence that the team would be victorious) was taken as the predicted winner (or tip) for that match. Success rates calculated include the hit rate, or the proportion of tips for which the predicted winner of a game matched the actual winner.
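Putting the pieces together, this sketch shows the week-by-week scheme just described: retrain on all earlier weeks, tip the team with the higher output value, and measure the hit rate. The data layout and epoch count are illustrative assumptions.

def hit_rate(weeks, w_hidden, w_out):
    """Walk-forward evaluation over a season. `weeks` is a list of
    rounds; each round is a list of ((home_feats, away_feats), home_won)
    entries. For round k, train on all per-team examples from earlier
    rounds, then tip the team with the higher output value."""
    hits = tips = 0
    for k in range(1, len(weeks)):
        X, y = [], []
        for (fh, fa), home_won in (m for wk in weeks[:k] for m in wk):
            X += [fh, fa]                       # one training example per team
            y += [float(home_won), float(not home_won)]
        X, y = np.array(X), np.array(y)
        for _ in range(200):                    # epoch count is illustrative
            train_epoch(X, y, w_hidden, w_out)  # updates weights in place
        for (fh, fa), home_won in weeks[k]:
            tip_home = forward(fh, w_hidden, w_out) >= forward(fa, w_hidden, w_out)
            hits += int(tip_home == home_won)
            tips += 1
    return hits / tips                          # proportion of correct tips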
The results are split into two separate parts: the first part shows the performance of the system tested (as described above) on a database of results from the three previous seasons; the second part involves an examination of the system performance in the "live" testing scenario conducted during the 2002 season (details of which are given below).
It is difficult to measure the success of the system as there are few benchmarks for the use of neural networks in this domain. Systems with a similar structure have been used to perform other predictive tasks, such as the use of neural networks to predict prices on the stock market [4, 14, 9, 7, 6]. These systems do not typically perform with outstanding success and are rarely more effective than a naive human investor (largely because stock market prices are affected by a very large number of variables, many of which are not quantifiable).
A small number of systems exist which attempt to apply neural networks to the sporting arena, with varying levels of success. Australian Rules football has been the subject of the most significant effort in computerised sports prediction, by a number of groups [12]. The most thorough study of applying neural networks to sports prediction is described in [1], which reports a best-case result of 58% using data from the 1999 National Rugby League season.
Regular (human) winners of tipping competitions (where each participant attempts to pick a winner for every game on a week-by-week basis) typically have success rates of between 65% and 70%. Given this, and the success rates of other computerised sports prediction systems, the optimistic aim of the approach described in this paper was a success rate of over 70%.
Testing on data from previous seasons yielded quite promising results. The approach was tested on the 1999 season (for the purpose of comparison with other reported results) and exceeded expectations with a success rate of 76.2%. Testing on the 2000 season resulted in exceptional performance, at just under an 83.0% success rate (although it should be noted that the 2000 season was particularly "well behaved", in that the home team won on the vast majority of occasions). The 2001 season resulted in an overall success rate of 71.4%, with results obtained from other human-based competitions suggesting this was a more "difficult" season to predict. The results for the 2002 season are discussed below.
In 2002, the automatic tipper underwent public "live" testing under the name of "McCabe's Artificially Intelligent Tipper" (MAIT), which received some interest from the media. The live testing environment involves predictions for each week being made at the end of the previous week of competition (that is, several days in advance). These predictions are posted on an internet site [5] and printed in the Townsville Bulletin, a local newspaper. A summary of the results for 2002 appears in Table 1.
Table 1: The various performance characteristics for the tipping system using data from the 2002 National Rugby League season.
It has already been shown that the automatic tipper successfully modelled elements of sporting contests, at least to the point where it was able to outperform most human tippers. It did this by finding a balance of weights to associate with the different features, some clearly less important than others.
In further experimentation, the importance of each of the individual features was considered. Figure 2 is a graphical presentation of the performance of the top-performing features for the 2002 season and includes the results for the MAIT system. As can be seen, MAIT clearly outperformed all of the simple heuristics.
There are a number of directions of possible future work in this area. The natural progression with future work is to assess the effectiveness of the automatic tipper on different seasons in the National Rugby League competition, as well as different leagues (such as the English Rugby League) and indeed different sports. One of the difficulties in training the system on different leagues, seasons and sports is the lack of availability of detailed training data. Without the assistance of other organisations it is often difficult to obtain records from previous seasons.
From a financial point of view, it would be advantageous to examine more advanced betting strategies to maximize return and minimize the number of bets lost. Additionally, a richer feature set should be developed to better capture the environment of the sporting contest, for example accounting for player availability, performance in different weather conditions and so on.
Despite the "novelty" value of this work, there is still theoretical interest in the modelling of features in a noisy environment and the use of machine learning techniques to predict probabilistic events. One of the attractions of sport in general is that there are so many factors which can affect the outcome and that, on any given day, either side can win a sporting contest. This same fact is what makes the prediction process so difficult, and why so much time and money is spent by individuals trying to predict winners in various "tipping competitions" and at gambling outlets.
This paper described an attempt to generalize and model the behaviour of teams in sporting contests, using the National Rugby League competition as a specific example. Results were reported for various seasons which compared favourably against other human "expert" rugby league tippers.
Results could have been further improved by taking into account details such as player availability (the unavailability of "star" players due to representative duties, injury or suspension is not considered by the neural network). Nevertheless, the multi-layer perceptron managed to adapt very quickly and performed well despite the limited information and the outside influences not included in the feature set.