JOURNAL
Applied Sciences

ARTICLE

TITLE

Sample Reduction Strategies for Protein Secondary Structure Prediction

Sema Atasever

Zafer Aydin

Hasan Erbay and Mostafa Sabzekar

Abstract

Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM), has been shown to provide state-of-the-art accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on the CB513 benchmark for reducing the number of samples in the train set of the SVM. The first method randomly selects a fraction of data samples from the train set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the train set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the data samples with a hierarchical clustering algorithm and replaces the train set samples with the nearest neighbors of the cluster centers in order to improve the training time. The number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the train set by 26% without reducing the prediction accuracy. Among the clustering techniques, Ward's method provided the best accuracy on test data.
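
The two sample reduction ideas described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the function names (stratified_subsample, ward_clusters, cluster_reduce), the toy data, and the choice to pick nearest neighbors from the whole train set are assumptions made for illustration, and the Ward merge is a naive version suitable only for very small inputs.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Keep roughly `fraction` of the indices of each class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        # Keep at least one sample per class so no class disappears.
        n_keep = max(1, round(fraction * len(indices)))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_clusters(X, n_clusters):
    """Naive agglomerative clustering with Ward's merge cost (assumed variant)."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([X[i] for i in clusters[a]])
                cb = centroid([X[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster variance if a and b are merged.
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters


def cluster_reduce(X, n_clusters, n_neighbors):
    """Replace the train set by the nearest neighbors of each cluster center."""
    kept = set()
    for members in ward_clusters(X, n_clusters):
        center = centroid([X[i] for i in members])
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], center))
        kept.update(nearest[:n_neighbors])
    return sorted(kept)


if __name__ == "__main__":
    # Toy secondary-structure labels: helix (H), strand (E), coil (C).
    labels = ["H"] * 6 + ["E"] * 4 + ["C"] * 10
    print(stratified_subsample(labels, 0.5))
    # Toy 2-D feature vectors forming two well-separated groups.
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(cluster_reduce(X, n_clusters=2, n_neighbors=1))
```

In this sketch the kept indices would then be used to select the rows on which the SVM is trained. For realistic dataset sizes, library routines such as scikit-learn's train_test_split with its stratify argument and AgglomerativeClustering with linkage="ward" would likely be a more practical choice than the naive loops above.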

Keywords

protein secondary structure prediction - support vector machine - bayesian network - stratified sampling - hierarchical clustering

ISSUE

Volume: 9 Issue: 20 (2019)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app9204429
