In previous posts, I discussed criteria essential to an IVF prediction test, such as validation, predictive power, and accuracy/prediction error. Today, I’d like to discuss what makes a test useful to the clinician. Usefulness for clinicians and patients, or clinical utility, means that the IVF prediction test can identify patients with a certain prognosis, and this information would not be available without the IVF prediction test. Clinical utility can benefit patients directly by providing prognostic information that would otherwise not be available, or it can benefit patients indirectly by giving providers special insight to support more effective and less risky optimization of clinical protocols.
We can think of clinical utility in terms of whether a new IVF prediction test reclassifies the patients to have a different probability of IVF success, compared to estimates of their success without using the new prediction test. Of course, predicting a different probability of success is not valuable unless the prediction itself is proven to be more accurate. In addition, being able to distinguish patients based on their percentile ranking makes a prediction test clinically useful. I will discuss the potential use of percentile ranking in another post. Here, I will illustrate the concept of reclassification with research findings that we recently reported.
Close to a decade of research culminated last month in the publication of research findings validating the usefulness of two of our prediction models: Choi et al., Fertil Steril 2013 (PreIVF-D) and Choi et al., presented at the Society for Gynecologic Investigations 60th Annual Scientific Meeting, Orlando, FL, 2013 (PredictIVF-D). (The PreIVF-D model is the basis for the Univfy PreIVF test, and the PredictIVF-D model is the basis for the Univfy PredictIVF test.)
The PreIVF-D model, which predicts the probability of live birth in the first IVF cycle, was tested retrospectively to determine its clinical utility. We wanted to create a model that could be used by the vast majority of clinics, so we did not want it to depend on very specific patient populations, payment systems, or clinical protocols. Therefore, the study analyzed data from three university-affiliated outpatient IVF clinics in three different countries to ensure that a diverse sample was represented.
Data from over 13,000 first IVF cycles was used to build a primary model, which was then “trained” to create the PreIVF Diversity (PreIVF-D) model by applying a statistical method called boosted trees to data from over 1,000 first IVF cycles. The resulting PreIVF-D model was independently validated using another set of more than 1,000 first IVF cycles.
The live birth probability predicted by the PreIVF-D model was compared to the probability based on maternal age alone, as age is the prevailing method or primary predictor used in most reports in the infertility medical literature and on websites. In 86% of cases studied, PreIVF-D generated significantly different probabilities of success than those based on age, and in more than half of cases the live birth probability was higher than predicted by age. Over 40% of the patients had a personalized predicted success rate of greater than 45%, whereas the age-control model could not differentiate these patients from others. Most importantly, PreIVF-D had more than a 1,000-fold greater predictive power than the age-control model (likelihood scale), which represented a 36% improved log-likelihood in the prediction.
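For readers who like to see the arithmetic, here is a minimal sketch of how a comparison like this can be set up. It is not the code or the exact definitions used in our studies: the function names, the toy numbers, and the 5-percentage-point `min_shift` cutoff for counting a patient as reclassified are all assumptions made for illustration. It computes the log-likelihood of each model's predicted probabilities against the observed outcomes, the gain of one model over the other, and the share of patients whose predicted probability differs from the age-based estimate.

```python
import math


def log_likelihood(probs, outcomes):
    """Bernoulli log-likelihood of observed outcomes (1 = live birth, 0 = no live birth)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, outcomes)
    )


def compare_models(model_probs, control_probs, outcomes, min_shift=0.05):
    """Compare a prediction model against a control model on the same patients.

    min_shift is a hypothetical cutoff: a patient counts as reclassified when
    the two predicted probabilities differ by at least this much.
    """
    ll_model = log_likelihood(model_probs, outcomes)
    ll_control = log_likelihood(control_probs, outcomes)
    gain = ll_model - ll_control
    reclassified = sum(
        abs(m - c) >= min_shift for m, c in zip(model_probs, control_probs)
    )
    return {
        "log_likelihood_gain": gain,
        # The same gain expressed as a ratio of likelihoods.
        "likelihood_ratio": math.exp(gain),
        # One possible way to express the gain relative to the control model.
        "relative_improvement": gain / abs(ll_control),
        "fraction_reclassified": reclassified / len(outcomes),
    }


# Toy data, invented for illustration only (not study data).
outcomes = [1, 0, 1, 0]
control = [0.3, 0.3, 0.3, 0.3]   # one age-based probability for everyone
model = [0.6, 0.1, 0.5, 0.32]    # personalized probabilities
result = compare_models(model, control, outcomes)
```

A positive `log_likelihood_gain` means the personalized probabilities fit the observed outcomes better than the single age-based probability did, which is the sense in which a different prediction is only valuable if it is also a more accurate one.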
In another research project focusing on chances of IVF success after one or more prior IVF treatments, PredictIVF Diversity (PredictIVF-D) showed similarly improved live birth predictive power compared to age-based prediction. Boosted tree methods were used to analyze live birth outcomes from over 20,000 IVF cycles to create the PredictIVF-D model. This preliminary model was then “trained” using seven years of data and validated using a separate data set derived from more than 1,000 IVF cycles. In 73% of cases, PredictIVF-D live birth probabilities were different from those generated by the age-based control model. PredictIVF-D showed a 75% improved predictive power and increased the predictive dynamic range from four discrete age-based probabilities to a continuous probability spectrum from 7% to 60%.
This greater predictive power over a wider probability spectrum means that both patients who are more likely and those who are less likely to conceive from IVF can be identified. When IVF procedures are better targeted to patients, all patients benefit: both those who can get pregnant sooner using IVF and those who can choose other paths to parenthood sooner, knowing that IVF is highly unlikely to work for them. Validation of these IVF prediction models is a major milestone in our efforts to provide new tools for better patient-specific counseling and thus better outcomes for patients.