- SEED_DIFF: the difference in seeding between two teams in a matchup.
- ELO_DIFF: the difference in Elo ratings between two teams in a matchup.
- B_DIFF: the difference in Bart Torvik rankings between two teams in a matchup.
- t1_WINNER: a binary (0 or 1) feature indicating whether “team 1” won the game. This is the target for our ML model training.
Using these features, we’re going to train models to make game predictions. We will then evaluate those models against a “baseline” model that simply picks the team with the better seed in every game, to see how much the trained models improve on it.
Seeds of success: Establishing the baseline
To establish a baseline of success, we will start by evaluating how many games we’d pick correctly from 2019 to 2024 if we always picked the team with the better seed. As in the feature creation steps of the previous article, we will accomplish this by creating a new Dataiku formula in a Prepare step.
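For readers who want to see the baseline logic outside of Dataiku, here is a minimal Python sketch of the same idea. It is not the Dataiku formula itself, and it rests on a few assumptions that the article does not confirm: that SEED_DIFF is computed as team 1's seed minus team 2's seed (so a negative value means team 1 has the better seed), and that games between equally seeded teams are skipped because the baseline has no pick for them. The helper names `baseline_pick` and `baseline_accuracy` and the sample rows are hypothetical and are not real tournament results.

```python
def baseline_pick(seed_diff):
    """Return 1 if the baseline picks team 1, 0 if it picks team 2.

    Assumes seed_diff = team 1 seed - team 2 seed (an assumption, not
    confirmed by the article). A lower seed number is the better seed.
    Returns None when the seeds are equal and the baseline has no pick.
    """
    if seed_diff < 0:
        return 1
    if seed_diff > 0:
        return 0
    return None


def baseline_accuracy(games):
    """Share of games the better-seed rule picks correctly.

    Games with equal seeds are skipped (an assumed tie rule).
    """
    correct = 0
    decided = 0
    for game in games:
        pick = baseline_pick(game["SEED_DIFF"])
        if pick is None:
            continue  # no better seed, so the baseline makes no pick
        decided += 1
        if pick == game["t1_WINNER"]:
            correct += 1
    return correct / decided if decided else 0.0


# Hypothetical rows for illustration only, not real tournament results.
sample_games = [
    {"SEED_DIFF": -15, "t1_WINNER": 1},  # team 1 better seed, team 1 wins
    {"SEED_DIFF": 7, "t1_WINNER": 1},    # team 2 better seed, team 1 wins (upset)
    {"SEED_DIFF": -3, "t1_WINNER": 0},   # team 1 better seed, team 2 wins (upset)
    {"SEED_DIFF": 0, "t1_WINNER": 1},    # equal seeds, skipped
    {"SEED_DIFF": 4, "t1_WINNER": 0},    # team 2 better seed, team 2 wins
]

print(baseline_accuracy(sample_games))
```

If the dataset defines SEED_DIFF in the opposite direction, the two comparisons in `baseline_pick` would need to be swapped.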











