OBJECTIVE:
Knee osteoarthritis (OA) is among the leading contributors to global disability. Despite its high prevalence, there is currently no cure for the disease. Furthermore, the available diagnostic approaches have large precision errors and low sensitivity. New biomarkers are therefore needed to identify early knee OA accurately.

METHOD:
We developed a machine-learning analytics pipeline to identify small models (those with few variables) that predict the 30-month incidence of knee OA, assessed with multiple clinical and structural OA outcome measures, in overweight middle-aged women without knee OA at baseline. The data comprised clinical variables, food and pain questionnaires, biochemical markers (BM), and imaging-based information.

RESULTS:
All models showed high performance (AUC > 0.7) while using only a few variables. We identified both the importance of each variable within the models and its direction. Finally, we compared the performance of two of the models with that of state-of-the-art approaches available in the literature.
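
For readers unfamiliar with the performance threshold quoted above, the following is a minimal sketch of how an AUC can be computed from predicted risk scores, using the rank-based (Mann-Whitney) formulation. The function name `auc` and the labels and scores are hypothetical illustrations, not the study's data or the authors' implementation.

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    incident case receives a higher score than a randomly chosen non-case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = developed knee OA at follow-up, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1]  # made-up predicted risks
print(f"AUC = {auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so a value above 0.7 means the model ranks an incident case above a non-case in more than 70% of such pairs.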

CONCLUSIONS:
We showed the potential of applying machine learning to generate predictive models of knee OA incidence. Imaging-based information was found to be particularly important in the proposed models. Furthermore, our analysis confirmed the relevance of known BM for knee OA. Overall, we propose five highly predictive small models that could potentially be adopted for early prediction of knee OA.