IPL 2020 has begun in the UAE this time, with 8 teams and Dubai, Sharjah and Abu Dhabi as the host venues; all three grounds serve as home grounds for every team. Predicting the result of an IPL match depends on many factors, and with so many parameters involved, it is no less than rocket science.
We used Logistic Regression to predict the winner of every match, including the playoffs and the Final. The features used were the Home Team, the Away Team, the Ground, each team's win percentage from 2008 to 2019, and the toss winner of each match.
Now it's time to go through the coding part of our Machine Learning model.
First, we import the libraries required for our algorithm.
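The post does not list the libraries, so the imports below are an assumption: the later `pd.get_dummies` call points to pandas, and the train/test split plus a `fit` method suggest scikit-learn. A minimal sketch of that stack:

```python
# Assumed stack: pandas for data handling, NumPy for numerics,
# scikit-learn for the model and the train/test split.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
```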
After data cleaning, our team stats look something like this (2 tables, one below the other). The figure against each team shows the number of matches it played as the Home Team in the first table and as the Away Team in the second table. For example, CSK played 93 matches at home grounds and 78 matches as the Away Team (not at a home ground) from the 2008 season to the 2019 season.
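Counts like those in the two tables can be produced with `value_counts` on the team columns. This is a sketch on a few made-up rows (not the real 2008 to 2019 data), assuming the cleaned data has `Home Team` and `Away Team` columns as used in the encoding step later:

```python
import pandas as pd

# Toy rows for illustration only; these are not the real IPL figures.
matches = pd.DataFrame({
    "Home Team": ["CSK", "MI", "CSK", "RCB", "MI"],
    "Away Team": ["MI", "CSK", "RCB", "CSK", "RCB"],
})

# Matches played by each team as the Home Team (first table)...
home_counts = matches["Home Team"].value_counts()
# ...and as the Away Team (second table).
away_counts = matches["Away Team"].value_counts()

print(home_counts)
print(away_counts)
```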
Next, we compute each team's win percentage against every opponent since the inception of the IPL, using the historical data.
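One way to compute head-to-head win percentages is to view every match from both sides' perspectives and average a won/lost flag per (team, opponent) pair. This is a sketch on toy rows; the `Winner` column and the helper `head_to_head_win_pct` are hypothetical names, not taken from the original code, and ties or no-result matches are not handled:

```python
import pandas as pd

# Toy match log (hypothetical rows, not real IPL results).
matches = pd.DataFrame({
    "Home Team": ["CSK", "MI", "CSK", "RCB", "MI"],
    "Away Team": ["MI", "CSK", "RCB", "CSK", "RCB"],
    "Winner":    ["CSK", "MI", "CSK", "CSK", "RCB"],
})

def head_to_head_win_pct(df):
    """Hypothetical helper: win percentage of each team against each opponent."""
    # Every match seen from the home side's perspective...
    home = pd.DataFrame({"Team": df["Home Team"], "Opponent": df["Away Team"],
                         "Won": df["Winner"] == df["Home Team"]})
    # ...and from the away side's perspective.
    away = pd.DataFrame({"Team": df["Away Team"], "Opponent": df["Home Team"],
                         "Won": df["Winner"] == df["Away Team"]})
    both = pd.concat([home, away], ignore_index=True)
    # The mean of the boolean "Won" flag is the fraction of matches won.
    return (both.groupby(["Team", "Opponent"])["Won"].mean() * 100).rename("Win %")

pct = head_to_head_win_pct(matches)
print(pct)
```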
Then we have to encode the team names as numerical values so that the data is compatible with Logistic Regression.
We use the One-Hot Encoding technique to do this. If you are new to this technique, please see my earlier post, which explains One-Hot Encoding in detail.
# One-hot encode the team columns into 0/1 indicator columns
final = pd.get_dummies(df, prefix=['Home Team', 'Away Team'], columns=['Home Team', 'Away Team'])
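To see what the encoding produces, here is the same `get_dummies` call on a toy three-match frame. `dtype=int` is an addition not in the original line: it forces 0/1 integers, since recent pandas versions return boolean columns by default.

```python
import pandas as pd

# Toy frame with only the two team columns.
df = pd.DataFrame({
    "Home Team": ["CSK", "MI", "RCB"],
    "Away Team": ["MI", "RCB", "CSK"],
})

# Each team name becomes its own indicator column, e.g. "Home Team_CSK".
final = pd.get_dummies(df, prefix=["Home Team", "Away Team"],
                       columns=["Home Team", "Away Team"], dtype=int)
print(final)
```

Each row now has exactly one 1 among the `Home Team_*` columns and one among the `Away Team_*` columns.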
After this, counting the number of matches won by each team in a separate column, 'Result', gives us the numbers below.
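The post does not show how 'Result' is filled. Assuming, hypothetically, that it holds the name of the winning team for each match, the per-team win counts come from a single `value_counts` call. Toy rows only:

```python
import pandas as pd

# Toy data; assumes (hypothetically) 'Result' holds the winning team's name.
matches = pd.DataFrame({
    "Home Team": ["CSK", "MI", "CSK", "RCB", "MI"],
    "Away Team": ["MI", "CSK", "RCB", "CSK", "RCB"],
    "Result":    ["CSK", "MI", "CSK", "CSK", "RCB"],
})

# Number of matches won by each team.
wins = matches["Result"].value_counts()
print(wins)
```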
We split the data into training and testing sets and then perform Logistic Regression by calling the model's fit method. We get an accuracy of around 50% (not bad).
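A sketch of the split-fit-score step with scikit-learn, on randomly generated matches rather than IPL data, so the printed accuracy says nothing about the real model. The feature names `Toss Won By Home` and `Home Win %` and the binary target (1 if the home team won) are hypothetical stand-ins for the features listed at the start of the post:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
teams = ["CSK", "MI", "RCB", "KKR"]

# Synthetic matches (random, NOT real IPL data) just to exercise the pipeline.
home = rng.choice(teams, n)
away = np.array([rng.choice([t for t in teams if t != h]) for h in home])
df = pd.DataFrame({
    "Home Team": home,
    "Away Team": away,
    "Toss Won By Home": rng.integers(0, 2, n),  # hypothetical: 1 if home side won the toss
    "Home Win %": rng.uniform(0, 100, n),       # hypothetical head-to-head win percentage
})
# Hypothetical target: 1 if the home team won, else 0 (loosely tied to win %).
y = (df["Home Win %"] + rng.normal(0, 20, n) > 50).astype(int)

# One-hot encode the teams, keeping the numeric features as they are.
X = pd.get_dummies(df, prefix=["Home Team", "Away Team"],
                   columns=["Home Team", "Away Team"], dtype=int)

# Hold out 20% of the matches for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of test matches predicted correctly
print(f"Test accuracy: {accuracy:.2f}")
```

`model.score` returns mean accuracy on the held-out matches; `random_state` just makes the split reproducible.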
The full code is posted in the GitHub repository.