Abstract
The prognosis of diffuse large B-cell lymphoma (DLBCL) is heterogeneous. Therefore, we aimed to identify predictive biomarkers. First, artificial intelligence was applied to a discovery series of gene expression data from 414 patients (GSE10846). A dimension reduction algorithm was designed to correlate gene expression with overall survival and other clinicopathological variables; it combined Multilayer Perceptron (MLP) and Radial Basis Function (RBF) artificial neural networks, gene-set enrichment analysis (GSEA), Cox regression, and other machine learning and predictive analytics models [C5.0 algorithm, logistic regression, Bayesian network, discriminant analysis, random trees, tree-AS, Chi-squared Automatic Interaction Detection (CHAID) tree, QUEST, classification and regression (C&R) tree, and neural net]. From an initial 54,613 gene probes, a set of 488 genes and a final set of 16 genes were defined. Second, two identified markers of the immune checkpoint, PD-L1 (CD274) and IKAROS (IKZF4), were validated in an independent series from Tokai University, and their immunohistochemical expression was quantified using machine-learning-based Weka segmentation. High PD-L1 expression was associated with poor overall and progression-free survival, non-GCB phenotype, Epstein–Barr virus infection (EBER+), high RGS1 expression, and several clinicopathological variables, such as high IPI and absence of clinical response. Conversely, high expression of IKAROS was associated with good overall and progression-free survival, GCB phenotype, and a positive clinical response to treatment. Finally, the set of 16 genes (PAF1, USP28, SORT1, MAP7D3, FITM2, CENPO, PRCC, ALDH6A1, CSNK2A1, TOR1AIP1, NUP98, UBE2H, UBXN7, SLC44A2, NR2C2AP and LETM1), in combination with PD-L1, IKAROS, BCL2, MYC, CD163 and TNFAIP8, predicted the survival outcome of DLBCL with an overall accuracy of 82.1%. In conclusion, building predictive models of DLBCL is a feasible analytical strategy.