Taking time seriously when evaluating predictions in Binary-Time-Series-Cross-Section-Data

with Gokhan Ciflikli, Sigrid Weber and Nils W. Metternich

Abstract. Efforts to predict civil war onset, its duration, and subsequent peace have dramatically increased. Nonetheless, by standard classification metrics the discipline seems to have made little progress. Although particular cross-validation strategies and machine learning tools promise some remedy and increase accuracy rates substantively, predictions over time remain challenging. In this research note we provide evidence that the predictive performance of conflict models is plagued by temporal residual error. We demonstrate that standard classification metrics for binary outcome data are prone to underestimating model performance in a Binary-Time-Series-Cross-Section context when temporal prediction error is high. We approach this problem as a Modifiable Temporal Unit Problem and propose to evaluate the predictive performance of this type of model in differently sized temporal windows. While retaining the ability of models to leverage disaggregated data for prediction, we provide a parsimonious aggregation approach that allows researchers to evaluate the time frame in which predictive models perform best. We demonstrate this procedure in Monte Carlo experiments and with existing empirical studies.
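
To make the idea of evaluating predictions in differently sized temporal windows concrete, here is a minimal sketch in Python. It is not the procedure from the paper. It assumes hard 0/1 predictions, non-overlapping windows, an "any event in the window" aggregation rule, and F1 as the classification metric. The function names `aggregate_windows` and `f1_by_window` are hypothetical, and the toy data are invented purely for illustration.

```python
import numpy as np


def aggregate_windows(y, window):
    """Collapse a (units, periods) 0/1 array into non-overlapping windows.

    Assumed rule: a window is coded 1 if any period inside it is 1.
    Trailing periods that do not fill a complete window are dropped.
    """
    y = np.asarray(y)
    units, periods = y.shape
    usable = periods - periods % window
    return y[:, :usable].reshape(units, usable // window, window).max(axis=2)


def f1_by_window(y_true, y_pred, windows):
    """Return {window size: F1 score} after aggregating both arrays."""
    scores = {}
    for w in windows:
        t = aggregate_windows(y_true, w)
        p = aggregate_windows(y_pred, w)
        tp = int(np.sum((t == 1) & (p == 1)))
        fp = int(np.sum((t == 0) & (p == 1)))
        fn = int(np.sum((t == 1) & (p == 0)))
        denom = 2 * tp + fp + fn
        scores[w] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (invented): one unit, eight periods, two events.
# Each prediction is exactly one period too early.
y_true = np.array([[0, 1, 0, 0, 0, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 1, 0, 0, 0]])

scores = f1_by_window(y_true, y_pred, windows=[1, 2, 4])
print(scores)
```

In this toy case every prediction misses at the original resolution, so the F1 score at window size 1 is zero, while the same predictions score perfectly once periods are paired. Because events are rare, accuracy is driven by the non-event periods and hides most of this loss, which is why the sketch reports F1 instead. Note that the result depends on where window boundaries fall, which is itself part of the modifiable-unit issue the abstract raises.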

Read the latest draft here.