How Accurate Is Our IVF Prediction Model?

August 26, 2020 Heather Holland

How accurate is our IVF prediction model? How many times more accurate is it compared to an age-based prediction?

Accuracy and quality of prediction (also called predictive power) may seem like one and the same in many people’s minds, but in fact, they are two different measures of how well a prediction model performs.

Example: Let’s say an IVF prediction test predicts that the probability of success is 40% for a particular patient. Take 100 such patients, each given a personalized success rate of 40%. Since no one can have 40% of a baby, each patient either has a baby or doesn’t. If this IVF prediction test has 100% accuracy (and 0% error), then 40 of these 100 patients will have a baby.

How is this different from predictive power? If the conventional age group method also predicted that each of these 100 women had a 40% chance of success, then the IVF prediction test would have 100% accuracy but a PLORA (posterior log of odds ratio compared to age; see part 3) of 0, meaning it adds no predictive power beyond what age alone provides.
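The distinction can be illustrated with a short Python sketch. It is not the PLORA calculation itself, which this post does not spell out. It uses a plain log-likelihood difference as a simplified stand-in for "value added over age", and the function name `log_likelihood` and the 40-out-of-100 outcomes are assumptions made for illustration.

```python
import math

def log_likelihood(predicted, outcomes):
    """Sum of log-probabilities the predictions assign to what actually happened."""
    return sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for p, y in zip(predicted, outcomes)
    )

# Hypothetical group: 100 patients, 40 of whom have a baby.
outcomes = [1] * 40 + [0] * 60

# Both the prediction test and the age-based method say 40% for everyone.
model_pred = [0.40] * 100
age_pred = [0.40] * 100

# Accuracy view: predicted rate versus observed rate for the group.
observed_rate = sum(outcomes) / len(outcomes)
calibration_error = 0.40 - observed_rate

# Predictive-power view (simplified stand-in, not the actual PLORA formula):
# how much better the model explains the outcomes than age alone does.
gain_over_age = log_likelihood(model_pred, outcomes) - log_likelihood(age_pred, outcomes)

print(calibration_error)  # 0.0
print(gain_over_age)      # 0.0
```

Both numbers are zero here: the prediction matches the observed rate, yet it tells us nothing that the age-based estimate did not already say.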

In reality, even our large, multi-center data sets would not contain 100 women with a 40% success rate, another 100 women with a 39% success rate, another 100 women with a 38% success rate, and so on.

If we were to personalize the accuracy as well as the predicted probability itself, the test data set (see the previous blog post for what a “test data set” is) would need to have about 100 patients for each predicted probability percentage point.

That means if you want to provide predictions to patients with success rates ranging from 10% to 60%, you would need 5,000 test cases (i.e. 50 percentage points × 100 IVF cycles) that are evenly spread out to represent 10% to 60% success rates.
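The arithmetic above can be written out as a small Python sketch. The function name `required_test_cases` is hypothetical, and it follows the post in counting the range from 10% to 60% as 50 percentage points.

```python
def required_test_cases(low_pct, high_pct, patients_per_point=100):
    """Test cases needed to have patients_per_point patients at each percentage point."""
    points = high_pct - low_pct  # 10% to 60% is counted as 50 points, as in the text
    return points * patients_per_point

print(required_test_cases(10, 60))  # 5000
```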

And that’s just the test set, which does not even include the training data set. (Even if we were to use a data set comprising 10,000 cases, it’s unlikely that there is an equal distribution of patients for each predicted probability percentage point.)

Therefore, even in personalized medicine, measuring the accuracy of this personalized prediction requires the grouping of patients.

The grouping of patients allows us to measure accuracy with a data set much smaller than 10,000 cases.

Example: Let’s say we have 1,000 patients in the IVF prediction test data set. We divide these patients into five groups, each representing roughly a fifth (about 20%) of the whole data set. The top group may comprise the 200 patients with the highest range of predicted probabilities; the next group comprises the 200 patients with the next highest range of predicted probabilities, and so on. (The actual groups may have slightly more or fewer than 200 patients.)

The prediction error of each group is the difference between the average predicted probability of that group (i.e. the expected probability) and the actual success rate of that group (i.e. the observed probability).
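As a rough illustration of the grouping described above, here is a minimal Python sketch. It is not Univfy's implementation: the function name `group_prediction_errors`, the simulated patients, and the choice to report the error as expected minus observed are all assumptions made for this example.

```python
import random
from statistics import mean

def group_prediction_errors(predicted, outcomes, n_groups=5):
    """Rank patients by predicted probability, split them into n_groups
    of roughly equal size, and compare expected vs. observed success."""
    # Highest predicted probabilities first, so group 0 is the "top" group.
    pairs = sorted(zip(predicted, outcomes), key=lambda pair: pair[0], reverse=True)
    n = len(pairs)  # assumes n >= n_groups so that no group is empty
    results = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        expected = mean(p for p, _ in chunk)  # average predicted probability
        observed = mean(y for _, y in chunk)  # share of patients who had a baby
        results.append({
            "size": len(chunk),
            "expected": expected,
            "observed": observed,
            "error": expected - observed,
        })
    return results

# Simulated test set (made-up data): 1,000 patients with predicted
# probabilities between 10% and 60%, and outcomes drawn from them.
rng = random.Random(0)
predicted = [rng.uniform(0.10, 0.60) for _ in range(1000)]
outcomes = [1 if rng.random() < p else 0 for p in predicted]

for group in group_prediction_errors(predicted, outcomes):
    print(group["size"], round(group["expected"], 3),
          round(group["observed"], 3), round(group["error"], 3))
```

With 1,000 patients and five groups, each group holds 200 patients, so the observed rate in each group rests on enough cases to be compared meaningfully with the average prediction.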

About the Author:

Mylene Yao, M.D. | Co-Founder and CEO of Univfy®

Dr. Mylene Yao is a board-certified OB/GYN with more than 20 years of experience in clinical and reproductive medicine research. Prior to founding Univfy, she was on the faculty at Stanford University, where she led NIH-funded fertility and embryo genetics research and developed The Univfy AI Platform with the academic founding team.