
Journal of Computer Science & Systems Biology


Volume 2, Issue 2 (2009)

Research Article

Combination of Ant Colony Optimization and Bayesian Classification for Feature Selection in a Bioinformatics Dataset

Mehdi Hosseinzadeh Aghdam, Jafar Tanha, Ahmad Reza Naghsh-Nilchi and Mohammad Ehsan Basiri

Feature selection is widely used as the first stage of a classification task to reduce the dimensionality of the problem, decrease noise, improve speed and relieve memory constraints by eliminating irrelevant or redundant features. One approach to feature selection employs population-based optimization algorithms such as particle swarm optimization (PSO) and ant colony optimization (ACO). The ant colony optimization algorithm is inspired by observations of real ants searching for the shortest paths to food sources. Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem of protein datasets, one that increases the complexity of classification models, is their large number of features. This paper uses ACO to select features for a Bayesian classification method. The naive Bayesian classifier is a straightforward and frequently used method for supervised learning. It is based on probability theory and provides a flexible way of dealing with any number of features or classes. The paper then compares the performance of the proposed ACO algorithm with that of a standard binary particle swarm optimization algorithm on the task of selecting features on the Postsynaptic dataset. The criteria used for this comparison are maximizing predictive accuracy and finding the smallest subset of features. Simulation results on the Postsynaptic dataset show that the proposed method reduces the feature set effectively and obtains higher classification accuracy than other feature selection methods.
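The idea in the abstract above can be sketched as follows. This is a much-simplified binary ACO variant, not the authors' algorithm: each ant includes a feature with probability equal to that feature's pheromone level, a subset is scored by the leave-one-out accuracy of a categorical naive Bayes classifier minus a size penalty, and pheromone is reinforced on the best subset found so far. The toy dataset, the names `loo_accuracy` and `aco_select`, and all parameter values (`rho`, `penalty`, ant and iteration counts) are illustrative assumptions.

```python
import math
import random
from collections import Counter

# Toy data (hypothetical): feature 0 equals the label, features 1-3 carry no signal.
ROWS = [(0, 0, 0, 0), (0, 1, 0, 1), (0, 0, 1, 1), (0, 1, 1, 0),
        (1, 0, 0, 0), (1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 0)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a naive Bayes classifier restricted to `subset`."""
    classes = sorted(set(labels))
    correct = 0
    for i, (row, true_label) in enumerate(zip(rows, labels)):
        train = [(r, y) for j, (r, y) in enumerate(zip(rows, labels)) if j != i]
        class_counts = Counter(y for _, y in train)
        best_class, best_score = None, -math.inf
        for c in classes:
            n_c = class_counts[c]
            # Smoothed log prior for class c.
            score = math.log((n_c + 1) / (len(train) + len(classes)))
            for f in subset:
                match = sum(1 for r, y in train if y == c and r[f] == row[f])
                # Laplace smoothing; the "+ 2" assumes binary features.
                score += math.log((match + 1) / (n_c + 2))
            if score > best_score:
                best_class, best_score = c, score
        correct += best_class == true_label
    return correct / len(rows)


def aco_select(rows, labels, n_ants=10, n_iter=30, rho=0.2, penalty=0.1, seed=0):
    """Simplified ACO-style search for a small, accurate feature subset."""
    rng = random.Random(seed)
    n_feat = len(rows[0])
    tau = [0.5] * n_feat  # pheromone, used directly as inclusion probability
    best_subset, best_fit = frozenset(), -math.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            subset = frozenset(f for f in range(n_feat) if rng.random() < tau[f])
            # Fitness rewards accuracy and penalizes large subsets.
            fit = loo_accuracy(rows, labels, subset) - penalty * len(subset) / n_feat
            if fit > best_fit:
                best_subset, best_fit = subset, fit
        # Evaporate, then deposit on features of the best subset; clamp to [0.1, 0.9].
        tau = [min(0.9, max(0.1, (1 - rho) * t + rho * (f in best_subset)))
               for f, t in enumerate(tau)]
    return best_subset, best_fit
```

Calling `aco_select(ROWS, LABELS)` returns the best subset found and its fitness. The size penalty is what pushes the search toward the smallest subset among equally accurate ones, mirroring the two criteria named in the abstract.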

Research Article

Does an Improved Understanding of the Nature and Structure of the Physiological Systems Lead to a Better Understanding of the Therapeutic Scope of Complementary & Conventional Medicine?

Ewing G.W and Ewing E.N

Colour perception is associated with the function of the Autonomic Nervous System, which in turn is linked to the function of the physiological systems, the organs, and cellular and molecular biochemistry. Stress is also linked to the function of the Autonomic Nervous System and influences the stability of the physiological systems. It affects the levels of proteins and their reactive substrates, which release biophotons of characteristic colour and intensity that subsequently influence visual perception. There is therefore a definable relationship between the neurosensory pathways, the Autonomic Nervous System and all aspects of the body’s function. The consequence is a new generation of medical technologies which regulate the natural physiological mechanisms responsible for health and wellbeing. The understanding that there are physiological systems regulated by the Autonomic Nervous System can be used to provide a viable explanation for the function of many medical techniques, whether of conventional or complementary origin.

Research Article

L1 Least Square for Cancer Diagnosis using Gene Expression Data

Xiyi Hang and Fang-Xiang Wu

The performance of most methods for cancer diagnosis using gene expression data depends greatly on careful model selection. Least squares classification requires no model selection. However, a major drawback prevents its successful application to microarray data classification: lack of robustness to outliers. In this paper we cast linear regression as a constrained l1-norm minimization problem to greatly reduce its sensitivity to outliers, hence the name l1 least square. The numerical experiment shows that l1 least square can match the best performance achieved by support vector machines (SVMs) with careful model selection.
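The abstract does not spell out the constrained formulation, so the following only illustrates why an l1 criterion is less sensitive to outliers than ordinary least squares. It fits a straight line by least absolute deviations using iteratively reweighted least squares (IRLS), a swapped-in textbook technique and not the authors' solver. The function names and the toy data are assumptions.

```python
def fit_line_weighted(xs, ys, ws):
    """Weighted least-squares line fit; returns (slope, intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_line_l1(xs, ys, n_iter=100, eps=1e-8):
    """Approximate least-absolute-deviations line fit via IRLS."""
    ws = [1.0] * len(xs)  # the first pass is ordinary least squares
    for _ in range(n_iter):
        slope, intercept = fit_line_weighted(xs, ys, ws)
        # Weight 1/|residual| turns the squared loss into an absolute loss.
        ws = [1.0 / max(abs(y - (slope * x + intercept)), eps)
              for x, y in zip(xs, ys)]
    return fit_line_weighted(xs, ys, ws)


# Toy data (hypothetical): points on y = 2x + 1 with one gross outlier at x = 9.
XS = list(range(11))
YS = [2 * x + 1 for x in XS]
YS[9] += 60

ols_slope, ols_intercept = fit_line_weighted(XS, YS, [1.0] * len(XS))
l1_slope, l1_intercept = fit_line_l1(XS, YS)
```

On this toy data the ordinary least-squares slope is pulled well away from 2 by the single outlier, while the l1 fit stays close to the line through the remaining points.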

Research Article

Matrix Frequency Analysis of Oryza Sativa (japonica cultivar-group) Complete Genomes

K. Manikandakumar, S. Muthu Kumaran and R. Srikumar

Genome sequence information is essential for understanding the function of extensive arrangements of genes. Combining all sequence information in a precise database makes sequence similarity searches efficient. Complete genome analysis is an essential step in characterizing a genome; here it relies on calculating the matrix frequency of sequence residues and on Chaos Game Representation (CGR) analysis. In this study, we select rice as the specimen for complete genome analysis. Rice is one of the most essential cereal crops, providing food for more than half of the world’s population. Oryza sativa (japonica cultivar-group) is an important cereal and model monocot. We have generated a matrix frequency for genetic code analysis, which helps in the study of complete genome residues. Here we report the duplet and triplet (codon) frequencies for genetic code analysis of O. sativa chromosomes. We illustrate a new method of Chaos Game Representation, which produces objects possessing self-similar structure. According to our findings, the average matrix frequency of stop codons is similar to the matrix frequency of the start codon. This average appears to be similar across the complete genome sequences of all Oryza sativa (japonica cultivar-group) chromosomes.
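The two analyses named in the abstract above can be sketched as follows, under stated assumptions. Duplet and triplet frequencies are counted over overlapping windows (the abstract does not say whether windows overlap), and the Chaos Game Representation uses one common corner assignment (A, C, G, T at the corners of the unit square), which may differ from the new CGR method the authors describe. The function names are hypothetical.

```python
from collections import Counter

BASES = set("ACGT")

# One common CGR corner convention (an assumption here).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= BASES  # skip windows containing N or other symbols
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cgr_points(seq):
    """Chaos Game Representation: move halfway toward the corner of each base."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ignore ambiguous symbols
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For example, `kmer_frequencies("ATGATG", 3)` gives ATG a frequency of 0.5, and `kmer_frequencies(seq, 2)` gives the duplet table. Plotting `cgr_points` for a long sequence produces the self-similar picture in which each sub-square corresponds to one k-mer, so the density of points in a sub-square reflects that k-mer's frequency.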

Research Article

Structural Modeling, Evolution and Ligand Interaction of KMP11 Protein of Different Leishmania Strains

Ganesh Chandra Sahoo, Mukta Rani, Manas Ranjan Dikhit, Waquar Akhtar Ansari and Pradeep Das

The kinetoplastid-specific KMP11 protein was first described in Leishmania donovani, where it is associated with the lipophosphoglycan molecule and localized mainly around the flagellum and flagellar pocket. This protein is well conserved among kinetoplastids and plays an analogous role in all the flagellates, irrespective of their pathogenicity in humans. Structural elucidation of this important protein may provide the information required to target KMP11 in the search for valid drug candidates. Atomic-resolution models of the KMP11 protein of six different Leishmania strains were built from their amino acid sequences by homology modeling. The modeled proteins were stereochemically validated with PROCHECK and Profiles-3D scores. Ligand-protein interactions of the KMP11 models were studied with several anti-leishmanial drugs, i.e. miltefosine, sitamaquine, pentamidine, amphotericin B, SAG (sodium antimony gluconate), leishmanial peptide, paromomycin and vinblastine, and with an anticancer compound, sulforaphane. Glutamic acid (E) and lysine (K) of KMP11 are the key amino acids in ligand-receptor interaction. From the structural and docking analyses, it is hypothesized that KMP11 of a specific Leishmania strain interacts with a specific anti-leishmanial drug candidate; for example, miltefosine interacts only with KMP11 of L. braziliensis and not with KMP11 of any other Leishmania strain. The highest docking score was found for pentamidine. The anticarcinogenic compound sulforaphane showed comparable docking scores and H-bonds with the KMP11 proteins of the six Leishmania strains.

Research Article

Predicting Type 1 Diabetes Candidate Genes using Human Protein-Protein Interaction Networks

Shouguo Gao and Xujing Wang

Background: Proteins that interact directly with each other tend to have similar functions and to be involved in the same cellular processes. Mutations in the genes that code for them often lead to the same family of disease phenotypes. Efforts have been made to prioritize positional candidate genes for complex diseases by utilizing protein-protein interaction (PPI) information, but such an approach is often considered too general to be practically useful for specific diseases. Results: In this study we investigate the efficacy of this approach in type 1 diabetes (T1D). We compiled 266 known disease genes and 983 positional candidate genes from the 18 established linkage loci of T1D from T1Dbase (http://t1dbase.org). We found that the PPI network of known T1D genes has topological features distinct from others, with a significantly higher number of interactions among themselves even after adjusting for their high network degrees (p<1e-5). We then define those positional candidates that are first-degree PPI neighbours of the 266 known disease genes to be new candidate disease genes. This leads to a list of 68 genes for further study. Cross-validation using the known disease genes as a benchmark reveals that the enrichment is ~17.1-fold over random selection and ~4-fold better than using the linkage information alone. We find that the new candidates are cited in T1D-related publications significantly (p<1e-7) more often than random genes, even after excluding co-citation with the known disease genes, and that they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. These findings provide indirect validation of the newly predicted candidates. Conclusion: Our study demonstrates the potential of PPI information in prioritizing positional candidate genes for T1D.
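The selection rule in the abstract above (keep positional candidates that are first-degree PPI neighbours of known disease genes) can be sketched as follows. The gene identifiers, the edge list and the function names are hypothetical, and `fold_enrichment` is a generic ratio of hit rates, which may not match the authors' cross-validation procedure exactly.

```python
def first_degree_candidates(edges, known_genes, positional_candidates):
    """Positional candidates that interact directly with at least one known gene."""
    known = set(known_genes)
    neighbours = set()
    for a, b in edges:  # undirected interactions
        if a in known:
            neighbours.add(b)
        if b in known:
            neighbours.add(a)
    # Keep only positional candidates that are not already known disease genes.
    return (neighbours & set(positional_candidates)) - known


def fold_enrichment(hits_selected, n_selected, hits_background, n_background):
    """Ratio of the hit rate in the selected set to the hit rate in the background."""
    return (hits_selected / n_selected) / (hits_background / n_background)


# Toy network with made-up identifiers: K* are known genes, C* are candidates.
EDGES = [("K1", "C1"), ("C2", "K2"), ("C3", "X1"), ("K1", "K2")]
KNOWN = {"K1", "K2"}
CANDIDATES = {"C1", "C2", "C3"}

selected = first_degree_candidates(EDGES, KNOWN, CANDIDATES)
```

Here `selected` contains C1 and C2, while C3 is dropped because its only partner is not a known disease gene. In a cross-validation setting, one would hide a known gene, treat it as a candidate, and check whether the rule recovers it.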

Research Article

OMICS Techniques and Identification of Pathogen Virulence Genes: Application to the Analysis of Respiratory Pathogens

Sergio Hernández, Antonio Gómez, Juan Cedano and Enrique Querol

The advent of genomics should have facilitated the identification of microbial virulence factors, a key objective for vaccine design, especially for live attenuated vaccines. It is generally assumed that when a bacterial pathogen infects the host it expresses a set of genes, a number of which are virulence factors. However, although several Omics methods have been applied to identify virulence genes, e.g. DNA microarrays, In Vivo Expression Technology (IVET), Signature-Tagged Mutagenesis (STM) and Differential Fluorescence Induction (DFI), the results so far are quite meager. Many of the genes identified by these techniques are related to cellular stress, basal metabolism and similar functions, which cannot be directly involved in virulence, or at least cannot be considered useful candidates for deletion when designing a vaccine. A number of the genes disclosed by these methodologies are annotated as hypothetical or unknown proteins. As these ORFs may hide some true virulence factors, we have selected all of these hypothetical proteins from several respiratory pathogens and predicted their biological functions by a careful, in-depth analysis of each one. Although some of the re-annotations match functions that can be related to microbial virulence, we conclude that the identification of virulence factors remains elusive.
