Quantitative structure–activity relationship models (QSAR models) are regression or classification models used in the chemical and biological sciences and engineering. Like other regression models, QSAR regression models relate a set of 'predictor' variables (X) to the potency of the response variable (Y), while classification QSAR models relate the predictor variables to a categorical value of the response variable.
In QSAR modeling, the predictors consist of physico-chemical properties or theoretical molecular descriptors of chemicals; the QSAR response variable could be a biological activity of the chemicals. QSAR models first summarize a supposed relationship between chemical structures and biological activity in a data set of chemicals.[5][6]
As an example, biological activity can be expressed quantitatively as the concentration of a substance required to give a certain biological response. Additionally, when physicochemical properties or structures are expressed by numbers, one can find a mathematical relationship, or quantitative structure-activity relationship, between the two. The mathematical expression, if carefully validated,[7][8][9] can then be used to predict the modeled response of other chemical structures.[10][11]
A QSAR has the form of a mathematical model:
- Activity = f(physicochemical properties and/or structural properties) + error
The error includes model error (bias) and observational variability, that is, the variability in observations even on a correct model.
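As a minimal sketch of the form above, f can be taken to be a straight line in a single descriptor and fitted by ordinary least squares. The descriptor values and activities here are made-up illustration data, and the helper name `fit_line` is hypothetical; the residuals play the role of the error term.

```python
# Minimal sketch: fit Activity = slope * descriptor + intercept by least squares.
# The descriptor values and activities below are made-up illustration data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

descriptor = [0.5, 1.0, 1.5, 2.0, 2.5]   # hypothetical physicochemical property
activity = [1.1, 1.9, 3.2, 3.8, 5.1]     # hypothetical measured response

slope, intercept = fit_line(descriptor, activity)

# The residuals are the "error" term: what the fitted f does not explain.
residuals = [y - (slope * x + intercept) for x, y in zip(descriptor, activity)]
```

Real QSAR models usually involve many descriptors and more flexible functions, but the split into a fitted part and a residual error is the same.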
Essential steps in QSAR studies
The principal steps of QSAR/QSPR studies are (i) selection of the data set and extraction of structural/empirical descriptors, (ii) variable selection, (iii) model construction, and (iv) validation and evaluation.[5]
SAR and the SAR paradox
The basic assumption for all molecule-based hypotheses is that similar molecules have similar activities. This principle is also called the structure–activity relationship (SAR). The underlying problem is therefore how to define a small difference on a molecular level, since each kind of activity (e.g. reactivity, biotransformation ability, solubility, target activity) might depend on a different kind of difference. Good examples were given in the bioisosterism reviews by Patani/LaVoie[12] and Brown.[13]
In general, one is more interested in finding strong trends. Hypotheses are usually built from a finite amount of chemical data. The induction principle should therefore be respected, to avoid overfitted hypotheses and the overfitted, useless interpretations of structural/molecular data that follow from them.
The SAR paradox refers to the fact that not all similar molecules have similar activities.
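The text does not say how similarity is measured. One widely used choice, assumed here purely for illustration, is the Tanimoto (Jaccard) coefficient on sets of structural features. The feature names below are hypothetical labels, not the output of any real fingerprinting tool.

```python
# Sketch: Tanimoto (Jaccard) similarity between two molecules, each described
# by a set of structural features. Feature names are hypothetical labels.

def tanimoto(features_a, features_b):
    """Shared features divided by all features present in either molecule."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(union)

mol_a = {"aromatic_ring", "carboxylic_acid", "ester"}
mol_b = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}

similarity = tanimoto(mol_a, mol_b)  # 2 shared features out of 4 in total
```

A pair with a high similarity score but very different measured activities would be an instance of the SAR paradox under this particular similarity measure; a different feature set could rank the same pair quite differently.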
Types
Fragment based (group contribution)
The partition coefficient (a measurement of differential solubility and itself a component of QSAR predictions) can be predicted either by atomic methods (known as 'XLogP' or 'ALogP') or by chemical fragment methods (known as 'CLogP' and other variations). It has been shown that the logP of a compound can be determined by the sum of its fragments; fragment-based methods are generally accepted as better predictors than atomic-based methods.[14] Fragmentary values have been determined statistically, based on empirical data for known logP values. This method gives mixed results and is generally not trusted to have an accuracy better than ±0.1 units.[15]
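The additive idea behind fragment methods can be sketched as a lookup table of per-fragment contributions that are summed over the fragments present in a molecule. The contribution values below are placeholders chosen for easy arithmetic, not published CLogP parameters, and real fragment methods typically add correction terms as well.

```python
# Sketch of a group-contribution estimate: logP ~ sum of fragment values.
# The contribution values are hypothetical placeholders, NOT fitted parameters.

FRAGMENT_LOGP = {
    "CH3": 0.5,
    "CH2": 0.5,
    "OH": -1.0,
}

def estimate_logp(fragment_counts):
    """Sum each fragment's contribution times the number of occurrences."""
    return sum(FRAGMENT_LOGP[name] * count
               for name, count in fragment_counts.items())

# A hypothetical molecule made of one CH3, three CH2 and one OH fragment.
example_logp = estimate_logp({"CH3": 1, "CH2": 3, "OH": 1})
```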
Group or fragment-based QSAR is also known as GQSAR.[16] GQSAR allows flexibility to study various molecular fragments of interest in relation to the variation in biological response. The molecular fragments could be substituents at various substitution sites in a congeneric set of molecules, or could be defined by pre-defined chemical rules in the case of non-congeneric sets. GQSAR also considers cross-term fragment descriptors, which can help identify key fragment interactions that determine variation in activity.[16] Lead discovery using fragnomics is an emerging paradigm. In this context, FB-QSAR proves to be a promising strategy for fragment library design and for fragment-to-lead identification.[17]
An advanced approach to fragment- or group-based QSAR, based on the concept of pharmacophore similarity, has also been developed.[18] This method, pharmacophore-similarity-based QSAR (PS-QSAR), uses topological pharmacophoric descriptors to develop QSAR models. The resulting activity predictions may help assess the contribution of certain pharmacophore features, encoded by the respective fragments, toward activity improvement and/or detrimental effects.[18]
3D-QSAR
The acronym 3D-QSAR or 3-D QSAR refers to the application of force field calculations requiring three-dimensional structures of a given set of small molecules with known activities (training set). The training set needs to be superimposed (aligned) by either experimental data (e.g. based on ligand-protein crystallography) or molecule superimposition software. It uses computed potentials, e.g. the Lennard-Jones potential, rather than experimental constants and is concerned with the overall molecule rather than a single substituent. The first 3-D QSAR was named Comparative Molecular Field Analysis (CoMFA) by Cramer et al. It examined the steric fields (shape of the molecule) and the electrostatic fields[19] which were correlated by means of partial least squares regression (PLS).
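To make the idea of a computed steric field concrete, the sketch below evaluates a Lennard-Jones 12-6 potential, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], between a probe placed at each grid point and every atom of a molecule. The coordinates, the single shared ε and σ, and the function names are assumptions for illustration; this is not the CoMFA implementation, which also involves details such as probe choice and energy cutoffs.

```python
import math

# Sketch: a steric "field" as the summed Lennard-Jones 12-6 energy between a
# probe at each grid point and all atoms. Coordinates, epsilon and sigma are
# hypothetical; real 3D-QSAR tools use atom-type-specific parameters.

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """V(r) = 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def steric_field(atoms, grid_points, epsilon=1.0, sigma=1.0):
    """One energy value per grid point (grid points must not sit on an atom)."""
    return [
        sum(lennard_jones(math.dist(point, atom), epsilon, sigma)
            for atom in atoms)
        for point in grid_points
    ]

atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]   # hypothetical atom positions
grid = [(0.0, 2.0, 0.0), (3.0, 3.0, 3.0)]    # hypothetical grid points
field = steric_field(atoms, grid)
```

The field values of all aligned training molecules, one column per grid point, would then form the descriptor matrix handed to the regression step.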
The resulting data space is then usually reduced by a subsequent feature extraction step (see also dimensionality reduction). The learning method that follows can be any machine learning method, e.g. support vector machines.[20] An alternative approach uses multiple-instance learning by encoding molecules as sets of data instances, each of which represents a possible molecular conformation. A label or response is assigned to each set corresponding to the activity of the molecule, which is assumed to be determined by at least one instance in the set (i.e. some conformation of the molecule).[21]
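The multiple-instance assumption, that a molecule is active if at least one of its conformations is, can be sketched by scoring each conformation and taking the maximum over the set. The per-conformation scores and the 0.5 threshold are hypothetical; in practice the scores would come from a trained instance-level model.

```python
# Multiple-instance sketch: a molecule is a "bag" of conformations, and the
# bag is called active if at least one conformation scores above a threshold.
# Scores and the 0.5 threshold are hypothetical illustration values.

def bag_score(conformation_scores):
    """Score of the bag = best score among its conformations (instances)."""
    return max(conformation_scores)

def bag_is_active(conformation_scores, threshold=0.5):
    """True if at least one conformation reaches the threshold."""
    return bag_score(conformation_scores) >= threshold

molecule_a = [0.10, 0.72, 0.30]   # one conformation scores high
molecule_b = [0.05, 0.20, 0.15]   # no conformation scores high
```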
Since June 18, 2011, the Comparative Molecular Field Analysis (CoMFA) patent no longer places any restriction on the use of GRID and partial least-squares (PLS) technologies.[citation needed]
Chemical descriptor based
In this approach, descriptors quantifying various electronic, geometric, or steric properties of a molecule are computed and used to develop a QSAR.[22] This approach differs from the fragment (or group contribution) approach in that the descriptors are computed for the system as a whole rather than from the properties of individual fragments. It differs from the 3D-QSAR approach in that the descriptors are computed from scalar quantities (e.g., energies, geometric parameters) rather than from 3D fields.
An example of this approach is the QSARs developed for olefin polymerization by half-sandwich compounds.[23][24]
Modeling
The literature often shows that chemists prefer partial least squares (PLS) methods,[citation needed] since these apply feature extraction and induction in one step.
Data mining approach
Computer SAR models typically calculate a relatively large number of features. Because these features are hard to interpret structurally, the preprocessing steps face a feature selection problem (i.e., which structural features should be interpreted to determine the structure-activity relationship). Feature selection can be accomplished by visual inspection (qualitative selection by a human), by data mining, or by molecule mining.
A typical data-mining-based prediction uses, for example, support vector machines, decision trees, or artificial neural networks to induce a predictive learning model.
Molecule mining approaches, a special case of structured data mining approaches, apply a similarity-matrix-based prediction or an automatic fragmentation scheme into molecular substructures. There are also approaches that use maximum common subgraph searches or graph kernels.[25][26]
QSAR protocol
Matched molecular pair analysis
Typically, QSAR models derived from nonlinear machine learning are seen as a 'black box' that fails to guide medicinal chemists. The relatively new concept of matched molecular pair analysis (MMPA),[27] or prediction-driven MMPA, is coupled with a QSAR model in order to identify activity cliffs.[28]
Evaluation of the quality of QSAR models
QSAR modeling produces predictive models derived from application of statistical tools correlating biological activity (including desirable therapeutic effect and undesirable side effects) or physico-chemical properties in QSPR models of chemicals (drugs/toxicants/environmental pollutants) with descriptors representative of molecular structure or properties. QSARs are being applied in many disciplines, for example: risk assessment, toxicity prediction, and regulatory decisions[29] in addition to drug discovery and lead optimization.[30] Obtaining a good quality QSAR model depends on many factors, such as the quality of input data, the choice of descriptors and statistical methods for modeling and for validation. Any QSAR modeling should ultimately lead to statistically robust and predictive models capable of making accurate and reliable predictions of the modeled response of new compounds.
For validation of QSAR models, usually various strategies are adopted:[31]
- internal validation or cross-validation (strictly speaking, cross-validation measures model robustness under removal of data: the more robust a model is (higher q2), the less the removal of data perturbs the original model);
- external validation by splitting the available data set into training set for model development and prediction set for model predictivity check;
- blind external validation by application of the model to new external data; and
- data randomization or Y-scrambling for verifying the absence of chance correlation between the response and the modeling descriptors.
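Two of the strategies above, leave-one-out cross-validation and Y-scrambling, can be sketched for a one-descriptor linear model. Here q2 is computed as 1 − PRESS/SS_tot, which is one common definition and an assumption of this sketch; the data are made up, and `fit_line` and `q2_loo` are hypothetical helper names.

```python
import random

# Sketch: leave-one-out q2 and a Y-scrambling check for a one-descriptor
# linear model. Data are made up; q2 = 1 - PRESS / SS_tot is one common choice.

def fit_line(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def q2_loo(xs, ys):
    """Refit with each compound left out once and score its prediction."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]          # hypothetical descriptor
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]      # hypothetical response

q2_real = q2_loo(xs, ys)

# Y-scrambling: shuffle the responses so any real relationship is destroyed,
# then repeat the validation. A sound model should score clearly worse.
rng = random.Random(0)
scrambled_q2 = []
for _ in range(20):
    shuffled = ys[:]
    rng.shuffle(shuffled)
    scrambled_q2.append(q2_loo(xs, shuffled))
mean_scrambled_q2 = sum(scrambled_q2) / len(scrambled_q2)
```

If the scrambled models scored about as well as the real one, the apparent fit would more likely reflect chance correlation than a genuine structure-activity relationship.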
The success of any QSAR model depends on the accuracy of the input data, the selection of appropriate descriptors and statistical tools, and, most importantly, validation of the developed model. Validation is the process by which the reliability and relevance of a procedure are established for a specific purpose; for QSAR models, validation must mainly address the robustness, predictive performance, and applicability domain (AD) of the models.[7][8][9][32]
Some validation methodologies can be problematic. For example, leave-one-out cross-validation generally leads to an overestimation of predictive capacity. Even with external validation, it is difficult to determine whether the selection of training and test sets was manipulated to maximize the predictive capacity of the model being published.
Different aspects of validation of QSAR models that need attention include methods of selection of training set compounds,[33] setting training set size[34] and impact of variable selection[35] for training set models for determining the quality of prediction. Development of novel validation parameters for judging quality of QSAR models is also important.[9][36][37]
Application
Chemical
One of the first historical QSAR applications was to predict boiling points.[38]
It is well known, for instance, that within a particular family of chemical compounds, especially in organic chemistry, there are strong correlations between structure and observed properties. A simple example is the relationship between the number of carbons in alkanes and their boiling points. There is a clear trend of increasing boiling point with an increasing number of carbons, and this serves as a means for predicting the boiling points of higher alkanes.
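The alkane trend can be sketched by fitting a straight line to the boiling points of n-pentane through n-octane and extrapolating to n-nonane. The boiling points are approximate literature values rounded to the nearest degree Celsius, and the straight-line form is an assumption made for simplicity.

```python
# Sketch: predict the boiling point of n-nonane from C5-C8 n-alkanes with a
# straight-line fit. Boiling points are approximate values rounded to 1 deg C.

carbons = [5, 6, 7, 8]
boiling_point_c = [36.0, 69.0, 98.0, 126.0]   # pentane, hexane, heptane, octane

n = len(carbons)
mean_x = sum(carbons) / n
mean_y = sum(boiling_point_c) / n
sxx = sum((x - mean_x) ** 2 for x in carbons)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(carbons, boiling_point_c))
slope = sxy / sxx                      # degrees C gained per added carbon
intercept = mean_y - slope * mean_x

predicted_c9 = slope * 9 + intercept   # extrapolation to n-nonane
```

The fit gives roughly 157 °C, a few degrees above the measured value of about 151 °C, because the increment per added carbon shrinks as the chain grows. This illustrates why extrapolating beyond the training data deserves caution.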
Other applications that remain of great interest are the Hammett equation, the Taft equation, and pKa prediction methods.[39]
Biological
The biological activity of molecules is usually measured in assays to establish the level of inhibition of particular signal transduction or metabolic pathways. Drug discovery often involves the use of QSAR to identify chemical structures that could have good inhibitory effects on specific targets and have low toxicity (non-specific activity). Of special interest is the prediction of partition coefficient log P, which is an important measure used in identifying 'druglikeness' according to Lipinski's Rule of Five.
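Lipinski's Rule of Five is commonly stated as four thresholds: molecular weight not above 500, log P not above 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors, with at most one violation tolerated. A minimal check, assuming that formulation and using hypothetical property values, might look like this:

```python
# Sketch of a Rule-of-Five check. Thresholds follow the commonly quoted form
# of the rule; the example property values are hypothetical, not measured.

def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Count how many of the four criteria are violated."""
    checks = [
        mol_weight > 500,
        log_p > 5,
        h_donors > 5,
        h_acceptors > 10,
    ]
    return sum(checks)

def is_druglike(mol_weight, log_p, h_donors, h_acceptors, max_violations=1):
    """Druglike under the rule if at most `max_violations` criteria fail."""
    count = lipinski_violations(mol_weight, log_p, h_donors, h_acceptors)
    return count <= max_violations

small_candidate = dict(mol_weight=350.0, log_p=2.5, h_donors=2, h_acceptors=5)
large_candidate = dict(mol_weight=650.0, log_p=6.2, h_donors=2, h_acceptors=5)
```

Because log P is one of the four inputs, the quality of a log P prediction feeds directly into this kind of druglikeness filter.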
While many quantitative structure-activity relationship analyses involve the interactions of a family of molecules with an enzyme or receptor binding site, QSAR can also be used to study the interactions between the structural domains of proteins. Protein-protein interactions can be quantitatively analyzed for structural variations resulting from site-directed mutagenesis.[40]
Part of the task of the machine learning method is to reduce the risk of a SAR paradox, especially given that only a finite amount of data is available (see also MVUE). In general, all QSAR problems can be divided into coding[41] and learning.[42]
Applications
(Q)SAR models have been used for risk management. QSARs are suggested by regulatory authorities; in the European Union, for example, they are suggested by the REACH regulation (Registration, Evaluation, Authorisation and Restriction of Chemicals).
The chemical descriptor space whose convex hull is generated by a particular training set of chemicals is called the training set's applicability domain. Prediction of properties of novel chemicals that are located outside the applicability domain uses extrapolation, and so is less reliable (on average) than prediction within the applicability domain. The assessment of the reliability of QSAR predictions remains a research topic.
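A full convex-hull test needs computational geometry, so the sketch below swaps in a simpler and looser approximation: the axis-aligned bounding box of the training descriptors. Every point inside the convex hull is inside the box, but not the other way round, so passing this check is necessary but not sufficient for lying in the applicability domain as defined above. The descriptor values are hypothetical.

```python
# Sketch: bounding-box approximation of the applicability domain. This is a
# simplification, NOT the convex hull: the box always contains the hull.

def descriptor_ranges(training_set):
    """(min, max) of each descriptor column across the training compounds."""
    return [(min(column), max(column)) for column in zip(*training_set)]

def in_bounding_box(query, ranges):
    """True if every descriptor of the query lies within the training range."""
    return all(low <= value <= high
               for value, (low, high) in zip(query, ranges))

# Hypothetical training compounds: (descriptor 1, descriptor 2)
training = [(1.0, 10.0), (2.0, 30.0), (3.5, 20.0)]
ranges = descriptor_ranges(training)

inside = in_bounding_box((2.5, 15.0), ranges)    # interpolation region
outside = in_bounding_box((5.0, 15.0), ranges)   # descriptor 1 out of range
```

A query that fails even this coarse test is clearly being extrapolated, and its prediction should be treated with corresponding caution.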
The QSAR equations can be used to predict biological activities of newer molecules before their synthesis.
Examples of machine learning tools for QSAR modeling include:[43]
S.No. | Name | Algorithms | External link |
---|---|---|---|
1. | R | RF, SVM, Naïve Bayes, and ANN | 'R: The R Project for Statistical Computing'. |
2. | libSVM | SVM | 'LIBSVM -- A Library for Support Vector Machines'. |
3. | Orange | RF, SVM, and Naïve Bayes | 'Orange Data Mining'. |
4. | RapidMiner | SVM, RF, Naïve Bayes, DT, ANN, and k-NN | 'RapidMiner \| #1 Open Source Predictive Analytics Platform'. |
5. | Weka | RF, SVM, and Naïve Bayes | 'Weka 3 - Data Mining with Open Source Machine Learning Software in Java'. |
6. | KNIME | DT, Naïve Bayes, and SVM | 'KNIME \| Open for Innovation'. |
7. | AZOrange[44] | RT, SVM, ANN, and RF | 'AZCompTox/AZOrange: AstraZeneca add-ons to Orange'. GitHub. 2018-09-19. |
8. | Tanagra | SVM, RF, Naïve Bayes, and DT | 'TANAGRA - A free DATA MINING software for teaching and research'. |
9. | ELKI | k-NN | 'ELKI Data Mining Framework'. |
10. | MALLET | | 'MALLET homepage'. |
11. | MOA | | 'MOA Massive Online Analysis \| Real Time Analytics for Data Streams'. |
12. | DeepChem | Logistic Regression, Naïve Bayes, RF, ANN, and others | 'DeepChem'. deepchem.io. Retrieved 20 October 2017. |
See also
- Computer-assisted drug design (CADD)
- QSAR & Combinatorial Science – Scientific journal
- Chemicalize.org: List of predicted structure based properties
References
- Roy K, Kar S, Das RN (2015). 'Chapter 1.2: What is QSAR? Definitions and Formulism'. A primer on QSAR/QSPR modeling: Fundamental Concepts. New York: Springer-Verlag Inc. pp. 2–6. ISBN 978-3-319-17281-1.
- Ghasemi, Pérez-Sánchez; Mehri, Pérez-Garrido (2018). 'Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks'. Drug Discovery Today. 23 (10): 1784–1790. doi:10.1016/j.drudis.2018.06.016. PMID 29936244.
- Nantasenamat C, Isarankura-Na-Ayudhya C, Naenna T, Prachayasittikul V (2009). 'A practical overview of quantitative structure-activity relationship'. Excli J. 8: 74–88. doi:10.17877/DE290R-690.
- Nantasenamat C, Isarankura-Na-Ayudhya C, Prachayasittikul V (Jul 2010). 'Advances in computational methods to predict the biological activity of compounds'. Expert Opinion on Drug Discovery. 5 (7): 633–54. doi:10.1517/17460441.2010.492827. PMID 22823204.
- Yousefinejad S, Hemmateenejad B (2015). 'Chemometrics tools in QSAR/QSPR studies: A historical perspective'. Chemometrics and Intelligent Laboratory Systems. 149, Part B: 177–204. doi:10.1016/j.chemolab.2015.06.016.
- Ghasemi, Pérez-Sánchez; Mehri, fassihi (2016). 'The Role of Different Sampling Methods in Improving Biological Activity Prediction Using Deep Belief Network'. Journal of Computational Chemistry. 38 (10): 1–8. doi:10.1002/jcc.24671. PMID 27862046.
- Tropsha A, Gramatica P, Gombar VJ (2003). 'The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models'. QSAR & Comb. Sci. 22: 69–77. doi:10.1002/qsar.200390007.
- Gramatica P (2007). 'Principles of QSAR models validation: internal and external'. QSAR & Comb. Sci. 26 (5): 694–701. doi:10.1002/qsar.200610151.
- Chirico N, Gramatica P (Aug 2012). 'Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection'. Journal of Chemical Information and Modeling. 52 (8): 2044–58. doi:10.1021/ci300084j. PMID 22721530.
- Ghasemi, Pérez-Sánchez; Mehri, fassihi (2018). 'Deep neural network in QSAR studies using deep belief network'. Applied Soft Computing. 62: 251–259. doi:10.1016/j.asoc.2017.09.040.
- Tropsha, Alexander (2010). 'Best Practices for QSAR Model Development, Validation, and Exploitation'. Molecular Informatics. 29 (6–7): 476–488. doi:10.1002/minf.201000061. ISSN 1868-1743. PMID 27463326.
- Patani GA, LaVoie EJ (Dec 1996). 'Bioisosterism: A Rational Approach in Drug Design'. Chemical Reviews. 96 (8): 3147–3176. doi:10.1021/cr950066q. PMID 11848856.
- Brown N (2012). Bioisosteres in Medicinal Chemistry. Weinheim: Wiley-VCH. ISBN 978-3-527-33015-7.
- Thompson SJ, Hattotuwagama CK, Holliday JD, Flower DR (2006). 'On the hydrophobicity of peptides: Comparing empirical predictions of peptide log P values'. Bioinformation. 1 (7): 237–41. doi:10.6026/97320630001237. PMC 1891704. PMID 17597897.
- Wildman SA, Crippen GM (1999). 'Prediction of physicochemical parameters by atomic contributions'. J. Chem. Inf. Comput. Sci. 39 (5): 868–873. doi:10.1021/ci990307l.
- Ajmani S, Jadhav K, Kulkarni SA. 'Group-Based QSAR (G-QSAR)'.
- Manoharan P, Vijayan RS, Ghoshal N (Oct 2010). 'Rationalizing fragment based drug discovery for BACE1: insights from FB-QSAR, FB-QSSR, multi objective (MO-QSPR) and MIF studies'. Journal of Computer-Aided Molecular Design. 24 (10): 843–64. Bibcode:2010JCAMD.24.843M. doi:10.1007/s10822-010-9378-9. PMID 20740315.
- Prasanth Kumar S, Jasrai YT, Pandya HA, Rawal RM (November 2013). 'Pharmacophore-similarity-based QSAR (PS-QSAR) for group-specific biological activity predictions'. Journal of Biomolecular Structure & Dynamics. 33 (1): 56–69. doi:10.1080/07391102.2013.849618. PMID 24266725.
- Leach AR (2001). Molecular modelling: principles and applications. Englewood Cliffs, N.J: Prentice Hall. ISBN 978-0-582-38210-7.
- Vert JP, Schölkopf B, Tsuda K (2004). Kernel methods in computational biology. Cambridge, Mass: MIT Press. ISBN 978-0-262-19509-6.
- Dietterich TG, Lathrop RH, Lozano-Pérez T (1997). 'Solving the multiple instance problem with axis-parallel rectangles'. Artificial Intelligence. 89 (1–2): 31–71. doi:10.1016/S0004-3702(96)00034-3.
- Caruthers JM, Lauterbach JA, Thomson KT, Venkatasubramanian V, Snively CM, Bhan A, Katare S, Oskarsdottir G (2003). 'Catalyst design: knowledge extraction from high-throughput experimentation'. J. Catal. 216 (1–2): 3776–3777. doi:10.1016/S0021-9517(02)00036-2.
- Manz TA, Phomphrai K, Medvedev G, Krishnamurthy BB, Sharma S, Haq J, Novstrup KA, Thomson KT, Delgass WN, Caruthers JM, Abu-Omar MM (Apr 2007). 'Structure-activity correlation in titanium single-site olefin polymerization catalysts containing mixed cyclopentadienyl/aryloxide ligation'. Journal of the American Chemical Society. 129 (13): 3776–7. doi:10.1021/ja0640849. PMID 17348648.
- Manz TA, Caruthers JM, Sharma S, Phomphrai K, Thomson KT, Delgass WN, Abu-Omar MM (2012). 'Structure–Activity Correlation for Relative Chain Initiation to Propagation Rates in Single-Site Olefin Polymerization Catalysis'. Organometallics. 31 (2): 602–618. doi:10.1021/om200884x.
- Gusfield D (1997). Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-58519-4.
- Helma C (2005). Predictive toxicology. Washington, DC: Taylor & Francis. ISBN 978-0-8247-2397-2.
- Dossetter AG, Griffen EJ, Leach AG (2013). 'Matched molecular pair analysis in drug discovery'. Drug Discovery Today. 18 (15–16): 724–31. doi:10.1016/j.drudis.2013.03.003. PMID 23557664.
- Sushko Y, Novotarskyi S, Körner R, Vogt J, Abdelaziz A, Tetko IV (2014). 'Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process'. Journal of Cheminformatics. 6 (1): 48. doi:10.1186/s13321-014-0048-0. PMC 4272757. PMID 25544551.
- Tong W, Hong H, Xie Q, Shi L, Fang H, Perkins R (April 2005). 'Assessing QSAR Limitations – A Regulatory Perspective'. Current Computer-Aided Drug Design. 1 (2): 195–205. doi:10.2174/1573409053585663.
- Dearden JC (2003). 'In silico prediction of drug toxicity'. Journal of Computer-Aided Molecular Design. 17 (2–4): 119–27. Bibcode:2003JCAMD.17.119D. doi:10.1023/A:1025361621494. PMID 13677480.
- Wold S, Eriksson L (1995). 'Statistical validation of QSAR results'. In Waterbeemd, Han van de (ed.). Chemometric methods in molecular design. Weinheim: VCH. pp. 309–318. ISBN 978-3-527-30044-0.
- Roy K (Dec 2007). 'On some aspects of validation of predictive quantitative structure-activity relationship models'. Expert Opinion on Drug Discovery. 2 (12): 1567–77. doi:10.1517/17460441.2.12.1567. PMID 23488901.
- Leonard JT, Roy K (2006). 'On selection of training and test sets for the development of predictive QSAR models'. QSAR & Combinatorial Science. 25 (3): 235–251. doi:10.1002/qsar.200510161.
- Roy PP, Leonard JT, Roy K (2008). 'Exploring the impact of size of training sets for the development of predictive QSAR models'. Chemometrics and Intelligent Laboratory Systems. 90 (1): 31–42. doi:10.1016/j.chemolab.2007.07.004.
- Put R, Vander Heyden Y (Oct 2007). 'Review on modelling aspects in reversed-phase liquid chromatographic quantitative structure-retention relationships'. Analytica Chimica Acta. 602 (2): 164–72. doi:10.1016/j.aca.2007.09.014. PMID 17933600.
- Pratim Roy P, Paul S, Mitra I, Roy K (2009). 'On two novel parameters for validation of predictive QSAR models'. Molecules. 14 (5): 1660–701. doi:10.3390/molecules14051660. PMC 6254296. PMID 19471190.
- Chirico N, Gramatica P (Sep 2011). 'Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient'. Journal of Chemical Information and Modeling. 51 (9): 2320–35. doi:10.1021/ci200211n. PMID 21800825.
- Rouvray DH, Bonchev D (1991). Chemical graph theory: introduction and fundamentals. Tunbridge Wells, Kent, England: Abacus Press. ISBN 978-0-85626-454-2.
- Fraczkiewicz, R (2013). 'In Silico Prediction of Ionization'. In Reedijk, J (ed.). Reference Module in Chemistry, Molecular Sciences and Chemical Engineering. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering [Online]. vol. 5. Amsterdam, The Netherlands: Elsevier. doi:10.1016/B978-0-12-409547-2.02610-X. ISBN 9780124095472.
- Freyhult EK, Andersson K, Gustafsson MG (Apr 2003). 'Structural modeling extends QSAR analysis of antibody-lysozyme interactions to 3D-QSAR'. Biophysical Journal. 84 (4): 2264–72. Bibcode:2003BpJ..84.2264F. doi:10.1016/S0006-3495(03)75032-2. PMC 1302793. PMID 12668435.
- Timmerman H, Todeschini R, Consonni V, Mannhold R, Kubinyi H (2002). Handbook of Molecular Descriptors. Weinheim: Wiley-VCH. ISBN 978-3-527-29913-3.
- Duda RO, Hart PW, Stork DG (2001). Pattern classification. Chichester: John Wiley & Sons. ISBN 978-0-471-05669-0.
- Lavecchia A (Mar 2015). 'Machine-learning approaches in drug discovery: methods and applications'. Drug Discovery Today. 20 (3): 318–31. doi:10.1016/j.drudis.2014.10.012. PMID 25448759.
- Stålring JC, Carlsson LA, Almeida P, Boyer S (2011). 'AZOrange - High performance open source machine learning for QSAR modeling in a graphical programming environment'. Journal of Cheminformatics. 3: 28. doi:10.1186/1758-2946-3-28. PMC 3158423. PMID 21798025.
Further reading
- Selassie CD (2003). 'History of Quantitative Structure-Activity Relationships' (PDF). In Abraham DJ (ed.). Burger's medicinal Chemistry and Drug Discovery. 1 (6th ed.). New York: Wiley. pp. 1–48. ISBN 978-0-471-27401-8.
- Shityakov S, Puskás I, Roewer N, Förster C, Broscheit J (2014). 'Three-dimensional quantitative structure-activity relationship and docking studies in a series of anthocyanin derivatives as cytochrome P450 3A4 inhibitors'. Advances and Applications in Bioinformatics and Chemistry. 7: 11–21. doi:10.2147/AABC.S56478. PMC3970920. PMID24741320.
- Roy K, Kar S, Das RN (2015) Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment, Academic Press, NY, 2015, https://www.elsevier.com/books/understanding-the-basics-of-qsar-for-applications-in-pharmaceutical-sciences-and-risk-assessment/roy/978-0-12-801505-6
- Roy K (2015), Quantitative Structure-Activity Relationships in Drug Design, Predictive Toxicology, and Risk Assessment, IGI Global, PA, http://www.igi-global.com/book/quantitative-structure-activity-relationships-drug/120080
- Roy K (2017), Advances in QSAR Modeling. Applications in Pharmaceutical, Chemical, Food, Agricultural and Environmental Sciences. Springer, http://www.springer.com/in/book/9783319568492
- Dearden JC (2016) The History and Development of Quantitative Structure-Activity Relationships (QSARs). International Journal of Quantitative Structure-Property Relationships (IJQSPR) 1(1), 1-44, http://dx.doi.org/10.4018/IJQSPR.2016010101
- Roy K, Kar S, Das RN, A Primer on QSAR/QSPR Modeling: Fundamental Concepts (SpringerBriefs in Molecular Science), Springer, 2015, http://www.springer.com/gp/book/9783319172804
- Roy K, Kar S, Importance of Applicability Domain of QSAR Models. In: Quantitative Structure-Activity Relationships in Drug Design, Predictive Toxicology, and Risk Assessment (Roy K, Ed), IGI Global, 2015, 180-211, http://dx.doi.org/10.4018/978-1-4666-8136-1.ch005
External links
- 'The Cheminformatics and QSAR Society'. Retrieved 2009-05-11.
- 'The 3D QSAR Server'. Retrieved 2011-06-18.
- Verma, Rajeshwar P.; Hansch, Corwin (2007). 'Nature Protocols: Development of QSAR models using C-QSAR program'. Protocol Exchange. doi:10.1038/nprot.2007.125. Archived from the original on 2007-05-01. Retrieved 2009-05-11.
A regression program that has dual databases of over 21,000 QSAR models
- 'QSAR World'. Archived from the original on 2009-04-25. Retrieved 2009-05-11.
A comprehensive web resource for QSAR modelers
- Chemoinformatics Tools, Drug Theoretics and Cheminformatics Laboratory
Retrieved from 'https://en.wikipedia.org/w/index.php?title=Quantitative_structure–activity_relationship&oldid=899338084'
Published online 2016 Aug 17. doi: 10.1186/s12900-016-0063-7
PMID: 27534744
Associated Data
The raw output data will not be published, but will be made available upon request.
Abstract
Background
The Plasmodium falciparum M18 aspartyl aminopeptidase (PfM18AAP) is the only aspartyl aminopeptidase found in the genome of P. falciparum and is essential for its survival. The PfM18AAP enzyme performs various functions in the parasite and the erythrocytic host, such as hemoglobin digestion, erythrocyte invasion, parasite growth, and parasite escape from the host cell. It is a valid target for the development of antimalarial drugs. In the present work, we employed 3D QSAR modeling, pharmacophore modeling, and molecular docking to identify novel potent inhibitors that bind to M18AAP of P. falciparum.
Results
Among all QSAR models, the PLSR model showed the highest correlation coefficient (r2 = 88 %) and a predictive correlation coefficient (pred_r2) of 0.6101 for the external test set. Pharmacophore modeling identified DHRR (one hydrogen donor, one hydrophobic group, and two aromatic rings) as an essential feature of PfM18AAP inhibitors. The combined approach of 3D QSAR, pharmacophore modeling, and structure-based molecular docking yielded 10 novel PfM18AAP inhibitors from the ChEMBL antimalarial library, 2 novel inhibitors from each of the quinine, chloroquine, and 8-aminoquinoline derivative sets, and 10 novel inhibitors from the WHO antimalarial drugs. Additionally, high-throughput virtual screening identified the top 10 compounds as antimalarial leads, with G-scores of -12.50 to -10.45 kcal/mol, compared with G-scores of -7.80 to -4.70 for the control compounds, which are known antimalarial M18AAP inhibitors (AID 743024). This result indicates that these novel compounds have a better predicted binding affinity for PfM18AAP than the controls.
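The abstract does not define pred_r2. A definition often used for external test sets, assumed here, is pred_r2 = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ_train)², where the sums run over the test compounds and ȳ_train is the mean activity of the training set. A minimal sketch with made-up numbers:

```python
# Sketch of an external-test-set predictive r2, under the ASSUMED definition
# pred_r2 = 1 - sum((y - y_hat)^2) / sum((y - mean_train)^2). Data are made up.

def pred_r2(y_test, y_pred, y_train_mean):
    """Compare test-set prediction errors with errors of the training mean."""
    ss_res = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    ss_ref = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - ss_res / ss_ref

perfect = pred_r2([1.0, 3.0], [1.0, 3.0], y_train_mean=2.0)    # exact predictions
baseline = pred_r2([1.0, 3.0], [2.0, 2.0], y_train_mean=2.0)   # predicts the mean
```

Under this definition a value of 1 means perfect test-set predictions and 0 means the model does no better than always predicting the training-set mean.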
Conclusion
The 3D QSAR models of PfM18AAP inhibitors provided useful information about the structural characteristics of inhibitors that contribute to inhibitory potency. Interestingly, in this study we infer that the derivatives of quinine, chloroquine, and 8-aminoquinoline, for which no specific target has been identified to date, might exert their antimalarial effect by interacting with PfM18AAP.
Electronic supplementary material
The online version of this article (doi:10.1186/s12900-016-0063-7) contains supplementary material, which is available to authorized users.
Keywords: Plasmodium falciparum, M18 aspartyl aminopeptidase, 3D QSAR, PLSR, PCR, kNN-MFA, Molecular docking, HTVS, Pharmacophore modeling
Background
Malaria, a mosquito-borne disease, kills roughly 627,000 people every year, mostly infants in Africa. It affects about 198 million patients annually (World Health Organization, 2013, http://www.who.int/malaria/media/en/). It is caused by parasites of the genus Plasmodium. Among them, P. falciparum is the most commonly encountered and the deadliest []. Although there are many drugs to treat the disease, the increasing incidence of resistance to antimalarial drugs is a growing concern. In recent years, several cases of resistance to artemisinin drugs have been detected across the globe []. This underscores the need to discover resilient drugs to combat malaria in the future. To this end, several molecular drug targets have been identified for developing new drug candidates. An important drug target is M18 aspartyl aminopeptidase (M18AAP), which is expressed in the cytoplasm of P. falciparum from a single copy of the PfM18AAP gene. M18AAP interacts with the human erythrocyte membrane protein spectrin and other proteins at the onset of the erythrocytic life cycle, and it is essential for the survival of the parasite in blood cells. It has been reported that malaria parasites with a mutated M18AAP enzyme are not able to survive, which indicates that the enzyme plays a critical role in the survival of P. falciparum and could serve as an important molecular target for developing therapeutic agents to control malaria infection []. Virtual screening methods such as QSAR, pharmacophore modeling, and molecular docking have proved to be valuable tools for the rapid discovery of novel drug candidates, e.g., the discovery of inhibitors of O-acetyl-L-serine sulfhydrylase of Entamoeba histolytica, of acetylcholinesterase inhibitors, and of the antagonists acetophenazine, fluphenazine, and periciazine against the human androgen receptor [–].
In drug development, the study of quantitative structure-activity relationships (QSAR) plays an important role in analyzing the properties of drugs. A QSAR is a mathematical model that relates chemical descriptors of compounds to a quantity expressing a specific biological or chemical activity []. The molecular descriptors of the compounds are calculated and used to derive the QSAR model []. In the present study, a dataset of known bioactive compounds was used to build 3D QSAR models using partial least squares regression (PLSR) [9], principal component regression (PCR) [10, ], and k-nearest neighbor molecular field analysis (kNN-MFA) methods []. Pharmacophore mapping was then performed to identify the binding modes and structural features of the ligands, followed by molecular docking. The generated models provide a valuable reference that could be applied to the design of pharmaceuticals with improved antimalarial activity. Finally, virtual screening of antimalarial compounds from the ChEMBL bioassay data and other datasets was carried out to identify novel potential inhibitors that could outperform the known inhibitors of PfM18AAP.
Methods
Dataset of experimental PfM18AAP inhibitors
A dataset of 32 compounds known to inhibit PfM18AAP was extracted from the National Center for Biotechnology Information PubChem BioAssay (AID 743024) (https://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=743024). A second, high-throughput-screened dataset of 3502 known bioactive inhibitors of PfM18AAP was extracted from AID 1822 and used for docking studies against PfM18AAP (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=1822). A library of 153,873 compounds was obtained from the ChEMBL antimalarial database and used to search for novel inhibitors of the PfM18AAP metalloproteinase [https://www.ebi.ac.uk/chembl/]. Additionally, 27 antimalarial drugs described by WHO, 32 quinine (QN) analogues (AID 660170), 24 chloroquine (CQ) analogues (AID 404780), and 17 8-aminoquinoline (8-AmQN) analogues (AID 554037) were extracted for molecular docking, 3D QSAR modeling, and pharmacophore similarity search. 2D structures were converted to 3D structures using Corina v2.64 [13] and Open Babel [].
Molecular descriptors
The molecular descriptors were calculated with VLifeMDS version 4.3 using Gasteiger-Marsili charges [15, 16]. The PfM18AAP inhibitors, along with their pIC50 activity values, were given as input for the force field calculation. The steric and electrostatic interaction energies were computed using a methyl probe of charge +1.
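For readers less familiar with the activity scale used throughout, pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so larger values mean more potent inhibitors. The following is a minimal sketch of the conversion; the function names are hypothetical and are not part of VLifeMDS.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 (= -log10 of the IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Inverse conversion; returns the IC50 in mol/L."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # An IC50 of 1 micromolar sits at pIC50 = 6 on this scale.
    print(ic50_to_pic50(1e-6))
    # The most active compound reported later has pIC50 = 6.72.
    print(pic50_to_ic50(6.72))
```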
Development of 3D QSAR models
The biological activity (pIC50) of the inhibitors was taken as the dependent variable and the descriptors as independent variables. The compounds were manually divided into a training set (60 %) and a test set (40 %). Unicolumn statistics were calculated to validate the training and test sets. The 3D QSAR models were built using PLSR, PCR, and kNN-MFA, with the stepwise forward-backward method for variable selection [17].
3D QSAR Model validation
Internal validation
To perform internal validation (leave-one-out cross-validation), one compound is removed from the training set and its biological activity is predicted by a model built on the remaining compounds. This step is repeated until the activity of every compound in the training set has been predicted once. The cross-validated coefficient q2 is calculated using Eq. (1):
$$q^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - y_{\mathrm{mean}})^2} \tag{1}$$
where y_i and ŷ_i are the actual and predicted activities of the ith molecule in the training set, respectively, and y_mean is the average activity of all the molecules in the training set [18, ].
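The leave-one-out procedure behind Eq. (1) can be sketched as follows. This is an illustration only: a one-descriptor least-squares line stands in for the PLSR, PCR, and kNN-MFA models of the study, the function names are hypothetical, and the toy numbers are invented for the example.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (stand-in for the real QSAR model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx


def loo_q2(x, y):
    """Leave-one-out cross-validated q2 as in Eq. (1)."""
    y_mean = mean(y)  # average activity of the whole training set
    press = 0.0       # sum of squared leave-one-out prediction errors
    for i in range(len(y)):
        # Refit the model without compound i, then predict compound i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a * x[i] + b)) ** 2
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Invented descriptor values and activities, for illustration only.
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    activity = [5.1, 5.3, 5.6, 5.9, 6.3, 6.7]
    print(f"q2 = {loo_q2(descriptor, activity):.3f}")
```

Because the held-out compound never contributes to the fit that predicts it, q2 is normally lower than the fitted r2 and can even become negative for a poor model.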
External validation
External validation is carried out by calculating the predictive correlation coefficient (pred_r2) using Eq. (2):
$$\mathrm{pred\_r^2} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - y_{\mathrm{mean}})^2} \tag{2}$$
where y_i and ŷ_i are the actual and predicted activities of the ith molecule in the test set, respectively, and y_mean is the average activity of all the molecules in the training set.
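A minimal sketch of Eq. (2) is given below. The point worth noticing is that the sums run over the test set while the reference mean comes from the training set. The function name and the toy values are hypothetical.

```python
from statistics import mean


def pred_r2(y_test, y_test_pred, y_train):
    """External predictivity as in Eq. (2).

    Sums run over test-set molecules; the mean is the training-set mean.
    """
    y_train_mean = mean(y_train)
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_test, y_test_pred))
    ss_reference = sum((y - y_train_mean) ** 2 for y in y_test)
    return 1.0 - ss_residual / ss_reference


if __name__ == "__main__":
    # Invented activities, for illustration only.
    train_activity = [5.0, 6.0, 7.0]
    test_activity = [5.5, 6.5]
    test_predicted = [5.6, 6.3]
    print(f"pred_r2 = {pred_r2(test_activity, test_predicted, train_activity):.3f}")
```

A model that only ever predicts the training-set mean scores 0 on this measure, and a model that does worse than that scores below 0.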
A Z-score is calculated using Eq. (3):
$$Z = \frac{h - \mu}{\sigma} \tag{3}$$
where h is the q2 value calculated for the actual dataset, μ is the average q2, and σ is the standard deviation of q2, both calculated for models built on different random datasets [].
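The randomization test behind Eq. (3) can be sketched as below: the activities are shuffled against the descriptors, a score is computed for each shuffled dataset, and the real score is compared with that distribution. The helper names are hypothetical, the scoring function is supplied by the caller, and using the sample standard deviation is an assumption (the source does not say which estimator the software uses).

```python
import random
from statistics import mean, stdev


def z_score(h, random_scores):
    """Eq. (3): distance of the real score h from the randomized scores."""
    mu = mean(random_scores)
    sigma = stdev(random_scores)  # sample standard deviation (assumption)
    return (h - mu) / sigma


def randomized_scores(score_fn, x, y, n_rounds=50, seed=0):
    """Score models built on activities shuffled against the descriptors.

    score_fn(x, y) is any callable returning a quality measure such as q2.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)  # break the structure-activity link
        scores.append(score_fn(x, y_shuffled))
    return scores


if __name__ == "__main__":
    # Invented q2 values from randomized datasets, for illustration only.
    random_q2 = [0.0, 0.1, -0.1, 0.2, -0.2]
    print(f"Z = {z_score(0.6, random_q2):.2f}")
```

A large positive Z indicates that the score of the real model is unlikely to arise from chance correlation.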
The F-test value (Fisher statistic) indicates the statistical significance of the model and gives an idea of its chances of failure; a value greater than 30 is considered good. In addition, q2_se is the standard deviation (standard error) of the cross-validated predictions and r2_se is the corresponding value for the fitted model; both are measures of the absolute quality of a model.
Pharmacophore modeling
The pharmacophore model was built using the Phase module of Schrödinger Maestro [21]. The same set of PfM18AAP inhibitors was prepared with the LigPrep module, which produces high-quality, all-atom 3D structures with correct chirality. Pharmacophore hypotheses were generated by systematically varying the number of sites and the number of matching active compounds, and the selected features were used to build a series of hypotheses with the "find common pharmacophore" option in Phase. The common pharmacophore hypotheses were analyzed using the survival score to obtain the best alignment of the active ligands, with a maximum overall root mean square deviation (RMSD) of 2 Å as the distance tolerance. Several pharmacophore hypotheses were generated along with their respective sets of aligned conformations. All hypotheses were scored for active survival, inactive survival, site, vector, volume, number of matches, selectivity, energy, active, and inactive terms. The survival score obtained by each hypothesis measures the quality of alignment for that hypothesis [22].
Docking and scoring
Molecular docking
To understand the nature of the interaction of the inhibitors described above [23] with PfM18AAP, molecular docking was performed against PfM18AAP using GOLD v5.2 (Genetic Optimization for Ligand Docking) [24] and the GLIDE module of Schrödinger [21]. The crystal structure of PfM18AAP (PDB ID 4EME) was obtained from the Protein Data Bank (www.rcsb.org/pdb/explore/explore.do?structureId=4eme). Since PfM18AAP requires cofactors for enzymatic activity, Zn was retained during the docking analysis []. In GOLD docking, the 10 best docked complexes were ranked by their GOLD fitness score. In GLIDE docking, the top 10 compounds were selected based on G-score. The binding affinity of each docked complex was calculated using X-Score v1.2.1 []. Protein-ligand interactions were analyzed using PyMOL version 1.1r (www.pymol.org/) and LigPlot+ v1.4.5 [].
Screening of PfM18AAP inhibitors
In this work, high-throughput virtual screening (HTVS) was performed with the Glide module of the Schrödinger software suite [21]. The ligand libraries were first prepared by adding hydrogens and generating conformations with the LigPrep module. LigPrep generated tautomers with the OPLS2005 force field, yielding a total of 411,766 output structures. A grid was then generated on the protein active site. HTVS was first run for every ligand library, and the top 1000 ranked compounds from each library were subjected to Extra-Precision (XP) screening. In both cases, the structures were flexibly docked on the protein structure. Non-planar conformations were penalized. Structures with more than 200 atoms or more than 35 rotatable bonds were not docked. The van der Waals radius scaling factor was set to 0.8 and the partial charge cutoff to 0.15. From these 1000 compounds, the top 10 from each library were extracted as target-bound complexes. These complexes were re-scored, and their binding affinity was calculated using the X-Score software.
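The screening funnel described above (size filter, HTVS ranking, XP ranking, top 10) can be sketched as follows. This is not the Glide API: the `Ligand` record, its fields, and the function names are hypothetical, and the docking scores are assumed to have been computed elsewhere and stored on each record.

```python
from dataclasses import dataclass

MAX_ATOMS = 200            # structures with more atoms were not docked
MAX_ROTATABLE_BONDS = 35   # structures with more rotatable bonds were not docked


@dataclass
class Ligand:
    name: str
    n_atoms: int
    n_rotatable_bonds: int
    htvs_score: float  # kcal/mol; more negative means better
    xp_score: float    # kcal/mol; in practice computed only for HTVS hits


def is_dockable(ligand: Ligand) -> bool:
    """Apply the size limits stated in the Methods."""
    return (ligand.n_atoms <= MAX_ATOMS
            and ligand.n_rotatable_bonds <= MAX_ROTATABLE_BONDS)


def screen_library(ligands, n_htvs=1000, n_final=10):
    """Keep the best n_htvs ligands by HTVS score, then the best n_final by XP."""
    dockable = [lig for lig in ligands if is_dockable(lig)]
    htvs_hits = sorted(dockable, key=lambda lig: lig.htvs_score)[:n_htvs]
    return sorted(htvs_hits, key=lambda lig: lig.xp_score)[:n_final]


if __name__ == "__main__":
    # Invented ligands, for illustration only.
    library = [
        Ligand("lig_a", 50, 5, -7.0, -8.0),
        Ligand("lig_b", 250, 5, -12.0, -12.0),  # too many atoms
        Ligand("lig_c", 60, 40, -11.0, -11.0),  # too flexible
        Ligand("lig_d", 40, 3, -6.0, -9.0),
    ]
    for hit in screen_library(library, n_htvs=2, n_final=1):
        print(hit.name, hit.xp_score)
```

Running the cheap HTVS stage on everything and the expensive XP stage only on the survivors is what makes a library of this size tractable.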
Results
3D QSAR modeling using PLSR Method
The dataset of known PfM18AAP inhibitors (AID 743024) was used for the unicolumn statistics analysis, which showed that the training and test sets were suitable for 3D QSAR model development. The test set is interpolative, i.e., derived within the min-max range of the training set. The unicolumn statistics are shown in Table 1. The PLSR model showed that the descriptors S_356, S_660, E_996, and S_270, which represent steric (S) and electrostatic (E) field interaction energies, are important for inhibiting PfM18AAP. The statistical parameters calculated for the PLSR 3D QSAR model are shown in Table 2. The number suffixed to each descriptor represents its position on the 3D spatial grid.
Table 1
DataSet | Column Name | Average | Maximum | Minimum | Standard Deviation | Sum |
---|---|---|---|---|---|---|
Training | pIC50 | 5.6527 | 6.7200 | 5.1020 | 0.4450 | 90.4430 |
Test | pIC50 | 5.6559 | 6.3400 | 4.9200 | 0.4849 | 62.2146 |
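The unicolumn check can be reproduced directly from the values printed in Table 1. The sketch below is illustrative; the dictionary layout and function names are hypothetical, and the only criteria applied are the min-max (interpolation) conditions mentioned in the text.

```python
# Values transcribed from Table 1 (pIC50 column).
TABLE_1 = {
    "training": {"average": 5.6527, "maximum": 6.7200, "minimum": 5.1020,
                 "std": 0.4450, "sum": 90.4430},
    "test": {"average": 5.6559, "maximum": 6.3400, "minimum": 4.9200,
             "std": 0.4849, "sum": 62.2146},
}


def implied_count(stats):
    """Number of compounds implied by sum / average."""
    return round(stats["sum"] / stats["average"])


def range_checks(train, test):
    """Is the test set inside the min-max range of the training set?"""
    return {
        "test_max_within_training": test["maximum"] <= train["maximum"],
        "test_min_within_training": test["minimum"] >= train["minimum"],
    }


if __name__ == "__main__":
    for name, stats in TABLE_1.items():
        print(name, "compounds:", implied_count(stats))
    print(range_checks(TABLE_1["training"], TABLE_1["test"]))
```

With the numbers as printed, the sums and averages imply 16 training and 11 test compounds, and the test maximum (6.34) lies inside the training range. The printed test minimum (4.92) is slightly below the training minimum (5.102), so the lower bound may be worth rechecking against the original data.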
Table 2
The statistical parameters for PLSR, PCR and 3D-QSAR models
Dependent variable | ZScore r2 | ZScore q2 | BestRand r2 | BestRand q2 | Z-Score Pred r2 | Best-Rand Pred r2 |
---|---|---|---|---|---|---|
PLSR pIC50 | 5.96671 | 2.43240 | 0.46222 | -0.23735 | 1.64037 | 0.44031 |
PCR pIC50 | 5.11408 | 2.20918 | 0.43798 | 0.09365 | 1.39477 | 0.21574 |
Equation 4 represents the PLSR 3D QSAR model:
pIC50 = -0.0270 (S_356) + 0.0182 (S_660) - 0.0905 (E_996) - 0.0125 (S_270) + 6.1966    (4)
3D QSAR modeling using PCR
A 3D QSAR model was also developed on the same dataset of molecules by the PCR method; the statistical parameters calculated for it are shown in Table 2. The number suffixed to each descriptor represents its position on the 3D spatial grid. This model indicated that the same descriptors are significant for biological activity.
Equation 5 represents PCR 3D QSAR model:
pIC50 = -0.0321 (S_356) + 0.0147 (S_660) - 0.0886 (E_996) - 0.0092 (S_270) + 6.3423    (5)
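Equations 4 and 5 are plain linear models, so applying them to a new molecule is a weighted sum of its four field descriptors plus an intercept. The sketch below encodes the published coefficients; the function name and the descriptor values in the example are hypothetical, since real values would come from the field calculation in VLifeMDS.

```python
# Coefficients transcribed from Equations 4 (PLSR) and 5 (PCR).
PLSR = {"S_356": -0.0270, "S_660": 0.0182, "E_996": -0.0905,
        "S_270": -0.0125, "intercept": 6.1966}
PCR = {"S_356": -0.0321, "S_660": 0.0147, "E_996": -0.0886,
       "S_270": -0.0092, "intercept": 6.3423}


def predict_pic50(model, descriptors):
    """Evaluate a linear 3D QSAR equation for one molecule."""
    total = model["intercept"]
    for name, coefficient in model.items():
        if name != "intercept":
            total += coefficient * descriptors[name]
    return total


if __name__ == "__main__":
    # Invented field energies at the four lattice points, for illustration only.
    molecule = {"S_356": -2.0, "S_660": 1.5, "E_996": -1.0, "S_270": 0.5}
    print(f"PLSR: {predict_pic50(PLSR, molecule):.3f}")
    print(f"PCR:  {predict_pic50(PCR, molecule):.3f}")
```

Both models share the same descriptors and coefficient signs, so they rank molecules similarly; the negative coefficient on E_996 means a more negative electrostatic energy at that lattice point raises the predicted pIC50.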
3D QSAR Modeling using (kNN-MFA)
The kNN-MFA model showed that the contributing descriptors were E_862 (1.0026, 1.1562), S_629 (-0.4639, -0.1045), and S_287 (-0.3372, -0.2663), which indicated that the degree of the amino group influences potency. The range at lattice point E_862 (1.0026, 1.1562) is positive, which means that substitution with more electron density could yield more active molecules.
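In a k-nearest-neighbor model, the activity of a molecule is predicted from the activities of the k training molecules closest to it in the selected descriptor space. The sketch below illustrates that idea only; Euclidean distance and an unweighted mean are assumptions (the software may weight neighbors by distance), and the function name and data are hypothetical.

```python
import math


def knn_predict(query, training, k=2):
    """Predict activity as the mean activity of the k nearest training molecules.

    training is a list of (descriptor_vector, activity) pairs.
    """
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(activity for _, activity in neighbours) / len(neighbours)


if __name__ == "__main__":
    # Invented (E_862, S_629, S_287) values and activities, for illustration only.
    training_set = [
        ((1.05, -0.30, -0.30), 6.2),
        ((1.10, -0.20, -0.28), 6.4),
        ((0.20, 0.50, 0.10), 5.2),
    ]
    print(knn_predict((1.08, -0.25, -0.29), training_set, k=2))
```

Unlike the linear equations above, such a model has no coefficients; the reported descriptor ranges describe where in field space the active neighbors sit.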
Pharmacophore-based screening of PfM18AAP inhibitors
Using Phase, ten hypotheses (pharmacophore models) were generated with the four features DHRR: one hydrogen bond donor (D), one hydrophobic group (H), and two aromatic rings (R). These features were common to all 15 compounds of the assay. The common pharmacophore hypothesis is shown in Fig. 1. The final hypothesis, DHRR.31, was selected based on the survival score and pharmacophore-based QSAR; it showed the best alignment of the active set, with a site score of 0.79, a vector score of 0.949, and a volume score of 0.527. The top five models are shown in Table 3.
Diagram showing pharmacophore alignments of known PfM18AAP inhibitors (AID 743024)
Table 3
Statistical values of the top 5 pharmacophore hypotheses
ID | Survival | Survival inactive | Site | Vector | Volume | Selectivity | Matches | Energy | Activity | Inactive |
---|---|---|---|---|---|---|---|---|---|---|
DHRR.31 | 11.068 | 9.052 | 0.79 | 0.949 | 0.527 | 1.466 | 14 | 0.001 | 6.34 | 2.016 |
DHRR.27 | 11.068 | 9.052 | 0.79 | 0.949 | 0.527 | 1.466 | 14 | 0.001 | 6.34 | 2.016 |
DHRR.6 | 10.941 | 8.863 | 0.78 | 0.943 | 0.548 | 1.471 | 14 | 0.551 | 6.22 | 2.079 |
DHRR.15 | 10.941 | 8.863 | 0.78 | 0.943 | 0.548 | 1.471 | 14 | 0.551 | 6.22 | 2.079 |
DHRR.26 | 10.892 | 8.81 | 0.79 | 0.944 | 0.535 | 1.47 | 14 | 0 | 6.15 | 2.082 |
Molecular docking
The same data set used for QSAR and pharmacophore modeling was subjected to molecular docking analysis. The top 10 compounds showed GOLD fitness scores from 60.62 to 39.81, predicted binding energies from -6.43 to -7.38 kcal/mol (calculated using X-Score), and G-scores from -7.80 to -4.70 kcal/mol (Table 4). The LigPlot+ analysis showed that Ser116 and His87 form hydrogen bonds with the docked ligands. Since PfM18AAP requires a cofactor for enzymatic activity, docking was performed with the cofactor bound to its specific amino acids. A docked complex is depicted in Fig. 2. These results suggest that novel PfM18AAP inhibitors could be designed by taking the docking results into account, leading to new potent drugs against malaria.
Table 4
Top scoring compounds screened using the selected pharmacophore hypothesis
Compound ID | G-Score (kcal/mol) | Align Score | Vector Score | Volume Score | Fitness | Predicted activity (pIC50) |
---|---|---|---|---|---|---|
CHEMBL588000 | -10.33 | 1.4702 | 0.0537 | 0.3833 | 0.2119 | 5.72 |
CHEMBL587141 | -10.12 | 0.8484 | 0.8644 | 0.4971 | 1.6545 | 5.83 |
CHEMBL529157 | -9.81 | 1.7562 | 0.3816 | 0.3651 | 0.2833 | 5.85 |
CHEMBL528484 | -9.79 | 1.5091 | 0.6425 | 0.3672 | 0.7521 | 5.86 |
CHEMBL532976 | -9.52 | 1.2208 | 0.7452 | 0.2344 | 0.9623 | 6.07 |
CHEMBL2414638 | -9.41 | 0.4596 | 0.9888 | 0.3135 | 1.9194 | 5.97 |
CHEMBL601831 | -9.37 | 1.0285 | 0.6596 | 0.29897 | 1.1014 | 5.85 |
CHEMBL390368 | -9.24 | 1.0146 | 0.9530 | 0.3788 | 1.4863 | 5.89 |
CHEMBL591216 | -8.72 | 0.5189 | 0.6304 | 0.3387 | 1.5367 | 5.84 |
CHEMBL465847 | -8.08 | 0.6220 | 0.7935 | 0.3477 | 1.6228 | 5.87 |
Docked Complex of PfM18AAP with known ligand 4-[(7-chloroquinolin-4-yl) amino]-2-(diethylaminomethyl) phenol
Molecular docking analysis was also performed on another dataset of known PfM18AAP inhibitors (AID 1822; 3502 molecules from PubChem BioAssay). The top 10 compounds showed G-scores from -7.72 to -6.52 kcal/mol, indicating that these compounds (Table 5) might bind to PfM18AAP with good binding affinity. Further, the predicted binding affinity calculated using X-Score for the best compounds was between -9.54 and -6.51 kcal/mol (Table 5).
Table 5
Prediction of pIC50 Value of current antimalarial drugs described in the WHO
Compound ID | Generic Name | G-Score (kcal/mol) | Align Score | Vector Score | Volume Score | Fitness | Predicted activity (pIC50) |
---|---|---|---|---|---|---|---|
CHEMBL76 | Chloroquine | -3.80 | 0.1086 | 0.9996 | 0.5197 | 2.4288 | 6.208 |
CHEMBL1535 | Hydroxychloroquine | -4.53 | 0.1995 | 0.9973 | 0.3341 | 2.1652 | 6.207 |
CHEMBL303933 | Piperaquine | -5.30 | 0.2720 | 0.9781 | 0.3049 | 2.0563 | 6.19 |
CHEMBL506 | Primaquine | -5.86 | 0.4463 | 0.8889 | 0.4954 | 2.0124 | 6.192 |
CHEMBL2104009 | Amquinate | -5.41 | 0.6142 | 0.9499 | 0.355 | 1.7931 | 6.205 |
CHEMBL416956 | Mefloquine | -5.28 | 0.5712 | 0.7183 | 0.3248 | 1.5672 | 6.20 |
CHEMBL682 | Amodiaquine | -4.48 | 0.6480 | 0.7838 | 0.2687 | 1.5126 | 6.385 |
CHEMBL36 | Pyrimethamine | -5.07 | 0.9521 | 0.9257 | 0.3390 | 1.4712 | 6.207 |
CHEMBL339049 | Tebuquine | -4.55 | 0.6093 | 0.7422 | 0.2286 | 1.4630 | 6.264 |
CHEMBL35228 | Pyronaridine | -5.68 | 0.6975 | 0.8185 | 0.2050 | 1.4422 | 6.198 |
HTVS based screening of PfM18AAP inhibitors
The ChEMBL antimalarial dataset (153,873 compounds) was subjected to molecular docking. The top 10 compounds after docking, ranked by G-score, are shown in Table 6. The Glide scores of these compounds range from -12.50 to -10.45 kcal/mol, indicating that they have good binding affinity for the PfM18AAP enzyme. Figure 3 shows the docked complex of ligand CHEMBL1506682 (2-(3,4-Dihydroxyphenyl)-5,7-dihydroxy-4-oxo-4H-chromen-3-yl hexopyranosid-uronic acid), which has the best G-score (-12.50 kcal/mol), in the active site of the receptor. As a further in silico validation, the predicted binding affinity of the best pose of each compound was calculated using the X-Score program and was found to be between -8.28 and -6.89 kcal/mol (Table 6).
Table 6
Top-scoring QN, CQ, and 8-AmQN analogues screened using the selected pharmacophore hypothesis
IUPAC Name | G-Score (kcal/mol) | Align Score | Vector Score | Volume Score | Fitness | Predicted activity (pIC50) |
---|---|---|---|---|---|---|
(9S)-Cinchonan-9-ol | -4.18 | 0.8099 | 0.5259 | 0.4368 | 1.2878 | 5.521 |
(9S)-6′-Methoxycinchonan-9-ol | -5.47 | 1.0187 | 0.7020 | 0.2737 | 1.1268 | 5.85 |
N-(7-Chloro-4-quinolinyl)-N’-ethyl-1,4-butanediamine | -3.84 | 0.1101 | 0.9993 | 0.5 | 2.4075 | 5.98 |
Chloroquine: N4-(7-chloro-4-quinolinyl)-N1,N1-diethyl-1,4-pentanediamine | -3.52 | 0.1053 | 0.9983 | 0.4755 | 2.3861 | 5.86 |
Primaquine: N4-(6-Methoxy-8-quinolinyl)-1,4-pentanediamine | -5.14 | 0.5014 | 0.9017 | 0.4005 | 1.8844 | 5.75 |
N4-{2,6-Dimethoxy-4-methyl-5-[3-(trifluoromethyl)phenoxy]-8-quinolinyl}-1,4-pentanediamine | -5.32 | 0.5274 | 0.9118 | 0.2755 | 1.7478 | 5.67 |
Ligplot diagram and docked Complex of PfM18AAP with ligand ChEMBL Database Compound [2-(3,4-Dihydroxyphenyl)-5,7-dihydroxy-4-oxo-4H-chromen-3-yl hexopyranosiduronic acid]
Discussion
The best model was selected by comparing the fitness plots (Fig. 4) and the radar plots for the training and test sets (Fig. 5a, b). The fitness plots show the observed versus predicted activities of the data set. The radar plots show the actual (red) and predicted (blue) activities for the training and test sets separately. For the training set, good overlap of the two lines indicates a good r2 value, while for the test set good overlap indicates a high pred_r2 value. The PLSR contribution plot (Fig. 6) shows the contribution of the descriptors that are important for inhibitory activity. In the PLSR and PCR models, the negative coefficient of the electrostatic field descriptor indicates that negative electrostatic potential is required to increase antimalarial activity, so more electronegative groups are preferred at that position. In contrast, the positive value in the kNN-MFA model shows that groups imparting positive electrostatic potential are favorable for antimalarial activity, so less electronegative groups are preferred in that region. Similarly, negative values of steric descriptors indicate that negative steric potential is favorable for activity, so less lipophilic or less bulky substituents should be considered in that region, whereas positive values of steric descriptors indicate that positive steric potential is favorable for antimalarial activity, as in the case of 4-[2-(quinolin-4-ylamino)ethyl]benzene-1,2-diol, so bulkier groups are preferred in that region. A comparison of the statistical parameters of PLSR, PCR, and kNN-MFA is given in Additional file 1, and the predicted pIC50 values are given in Additional file 2.
Scatter plots showing the correlation between actual versus predicted activities for training and test set molecules by using 3D QSAR model- PLSR, PCR, and kNN-MFA
Radar plots showing the actual and predicted activities for a Training set b Test set molecules by using 3D QSAR PLSR model
Plot of the percentage contribution of each descriptor in developed 3D QSAR PLSR model explaining variation in the activity
In the present work, we screened the ChEMBL antimalarial library for antimalarial compounds based on the pharmacophore hypothesis DHRR.31, which yielded 29,671 compounds. These compounds were subjected to Glide docking against PfM18AAP. The top 10 compounds were selected based on fitness and G-score; their predicted activities are shown in Table 7. We also screened the 27 WHO antimalarial drugs, which yielded 14 molecules (Table 8). Moreover, 17 8-aminoquinoline analogues, 24 CQ analogues, and 32 QN analogues were screened, yielding 17, 19, and 22 PfM18AAP inhibitors, respectively (Table 8). The top 2 compounds from each analogue series were selected based on fitness and G-score; their predicted activities are shown in Table 9. The study found that the current WHO antimalarial compound CHEMBL682 (amodiaquine) has the highest predicted pIC50 value (6.38); it is also present in the known PfM18AAP dataset with a pIC50 value of 6.72.
Table 7
Molecular Docking Results for known inhibitors (AID743024) against PfM18AAP
IUPAC Name | Gold Score | G-Score (kcal/mol) | X-Score (kcal/mol) | H Bond | No. of Hydrophobic Interaction | No. of NB Interactions | pIC50 Value |
---|---|---|---|---|---|---|---|
4-[(7-chloroquinolin-4-yl)amino]-2-(diethylamino methyl)phenol | 36.57 | -5.35 | -8.09 | Ser116 | 13 | 33 | 6.72 |
7-chloro-N-[2-(3,4-dimethoxyphenyl)ethyl]quinolin-4-amine | 35.17 | -5.40 | -7.48 | - | 11 | 60 | 6.18 |
N-[2-(3,4-dimethoxyphenyl)ethyl]-6-ethoxyquinolin-4-amine | 33.65 | -6.43 | -7.08 | His342 | 9 | 72 | 5.85 |
N-[2-(3,4-dimethoxyphenyl)ethyl]isoquinolin-4-amine | 33.45 | -4.97 | -7.17 | - | 11 | 59 | 5.34 |
4-[2-[(7-chloroquinolin-4-yl)amino]ethyl]benzene-1,2-diol | 32.56 | -7.80 | -7.38 | Ser414 | 12 | 70 | 6.2 |
3-[2-(quinolin-4-ylamino)ethyl]benzene-1,2-diol | 32.41 | -5.67 | -7.36 | Glu284 Ser414 | 10 | 77 | 5.56 |
N-[2-(2-bromo-4,5-dimethoxyphenyl)ethyl]quinolin-4-amine | 32.31 | -4.85 | -7.35 | - | 11 | 61 | 6.34 |
1-benzyl-N-[2-(3,4-dimethoxyphenyl)ethyl]piperidin-4-amine | 32.11 | -4.70 | -7.10 | Ser116 | 11 | 61 | 5.16 |
4-[2-(quinolin-4-ylamino)ethyl]benzene-1,2-diol | 31.89 | -5.25 | -7.19 | Glu284 Ser414 | 10 | 62 | 5.4 |
4-[3-(acridin-9-ylamino)propyl]benzene-1,2-diol | 30.58 | -5.65 | -7.63 | His87 Asp89 | 6 | 46 | 5.43 |
H Bond: hydrogen bond; NB: non-bonded
Table 8
Molecular Docking Results for known inhibitors (AID1822) against PfM18AAP
S. No. | Chemical Substance ID | G-Score (kcal/mol) | X-Score (kcal/mol) | HBond | No. of Hydro-phobic Interactions | No. of NB Interactions | % Inhibition |
---|---|---|---|---|---|---|---|
C1 | 49644635 | -7.72 | -8.42 | Gly509 | 8 | 68 | 32.65 |
C2 | 24707924 | -7.71 | -9.54 | Ser116, Asp325, Met436, Lys463 | 8 | 76 | 75.26 |
C3 | 26665815 | -7.48 | -6.51 | Ser116, Cys508 | 4 | 32 | 31.93 |
C4 | 50086555 | -7.36 | -7.66 | Ser116, Glu380, His438, Ser510 | 7 | 35 | 55.6 |
C5 | 49647140 | -7.143 | -7.04 | Ser116, Met436, His438, Lys463 | 7 | 29 | 55.21 |
C6 | 47195345 | -7.11 | -8.14 | His438, Asp325, Glu380, His87, His535 | 7 | 37 | 28.24 |
C7 | 49644096 | -7.07 | -7.37 | Asp325, Glu380, Ser510, His 535 | 4 | 39 | 37.43 |
C8 | 24779308 | -6.88 | -7.29 | His 87,Asp325, Glu380,His535 | 6 | 38 | 37.29 |
C9 | 17504161 | -6.57 | -7.92 | Ser116, Asp435, Met436, Lys463 | 9 | 32 | 53.68 |
C10 | 11532952 | -6.52 | -7.57 | His438 | 9 | 36 | 36.43 |
Table 9
Top scoring 10 potential inhibitors from CHEMBL antimalarial Library against PfM18AAP
We analyzed the types of interactions of each top-ranked compound among the known inhibitors (AID 1822) docked against PfM18AAP; 2D plots of the ligand-protein complexes were generated using LigPlot+. The numbers of hydrogen-bonded, lipophilic, and non-bonded interactions were counted and tabulated in Table 5. Overall, compounds C1 to C10 formed at least 1 (C1 and C10), mostly 4 (C3, C4, C7, C8, and C9), and at most 5 (C6) hydrogen bonds. The total number of lipophilic interactions per compound varies between 4 (C3 and C7) and 9 (C9 and C10). The total number of non-bonded interactions per compound varies from 29 (C5) to 76 (C2). These observations suggest that compounds C3, C4, C6, C7, C8, and C9 have better specificity because they form more hydrogen bonds, and that compounds C1, C2, C9, and C10 have good binding affinity owing to a high number of hydrophobic contacts. Compound C1 showed the best Glide score, -7.72 kcal/mol; analysis of its docking pose shows one hydrogen bond, with Gly509. The next most favorable interaction is shown by C2, with a G-score of -7.71 kcal/mol, four hydrogen bonds with the active site residues Ser116, Asp325, Met436, and Lys463, 76 non-bonded interactions, eight hydrophobic interactions, and 75.26 % inhibition. Compound C6 showed the highest number of hydrogen bonds, five (His438, Asp325, Glu380, His87, and His535). Asp325 and Ser116 were found to be the most conserved residues, present in 6 and 5 of the 10 compounds, respectively. Hence, based on the docking analysis of known PfM18AAP inhibitors, we conclude that these compounds have a better affinity for the PfM18AAP enzyme and are therefore novel potential candidates for developing drugs against malaria.
We also analyzed the interactions of the top-ranked inhibitors from the ChEMBL antimalarial library with PfM18AAP (Table 6). The highest X-Score, -11.6 kcal/mol, was obtained with ligand CHEMBL1506682, which forms three hydrogen bonds (Ser116, Glu381, and Met436) with amino acid residues of the protein. The total number of lipophilic interactions per compound varies between 4 (CHEMBL511171) and 9 (CHEMBL602830 and CHEMBL429). This observation suggests that CHEMBL1506682 has better specificity and CHEMBL602830 has good binding affinity. Ser510 and Glu380 were found to be the most conserved residues, each present in 5 of the 10 compounds. Hence, based on the comparison between the known bioactive antimalarial M18AAP inhibitors (as control) and the top ten novel ChEMBL compounds, we conclude that these compounds could bind to PfM18AAP with better affinity and are therefore potential candidates for developing drugs against malaria.
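Residue conservation of the kind discussed above can be tallied mechanically from the hydrogen-bond column of the AID 1822 docking table (compounds C1 to C10). The sketch below transcribes that column as printed; the dictionary and function names are hypothetical.

```python
from collections import Counter

# Hydrogen-bonded residues per compound, transcribed from the AID 1822 docking table.
H_BONDS = {
    "C1": ["Gly509"],
    "C2": ["Ser116", "Asp325", "Met436", "Lys463"],
    "C3": ["Ser116", "Cys508"],
    "C4": ["Ser116", "Glu380", "His438", "Ser510"],
    "C5": ["Ser116", "Met436", "His438", "Lys463"],
    "C6": ["His438", "Asp325", "Glu380", "His87", "His535"],
    "C7": ["Asp325", "Glu380", "Ser510", "His535"],
    "C8": ["His87", "Asp325", "Glu380", "His535"],
    "C9": ["Ser116", "Asp435", "Met436", "Lys463"],
    "C10": ["His438"],
}


def residue_frequency(h_bonds):
    """Count in how many compounds each residue forms a hydrogen bond."""
    counts = Counter()
    for residues in h_bonds.values():
        counts.update(set(residues))  # count each residue once per compound
    return counts


if __name__ == "__main__":
    for residue, n in residue_frequency(H_BONDS).most_common():
        print(f"{residue}: {n} of {len(H_BONDS)} compounds")
```

The tally depends entirely on the table as transcribed, so where a count produced this way differs from a figure quoted in the text, the table entries and the underlying LigPlot+ output are worth rechecking against each other.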
Conclusions
The present study aimed to generate predictive 3D QSAR models capable of revealing the structural requirements for antimalarial inhibitors of PfM18AAP. Comparison of the statistical parameters of the three models suggests that the PLSR model is the best, owing to its better internal validation (q2 = 0.6128) and external validation (pred_r2 = 0.6101). Model 3 (kNN-MFA) also had good internal validation (q2 = 0.7641), but its external validation was poor (pred_r2 = 0.0366). Both the PLSR and PCR models therefore show potential predictive ability, as determined on the external test set. Thus, 3D QSAR modeling provided a better understanding of the structural requirements of antimalarial compounds, which could help in designing potent PfM18AAP inhibitors. Pharmacophore mapping was also applied to identify the binding modes and structural features of the ligands that are important for the biological activity of the inhibitors. The pharmacophore modeling showed that hypothesis DHRR.31 was the best pharmacophore model for determining PfM18AAP inhibitory activity. The results suggest that the proposed DHRR.31 model can be used to identify new M18AAP inhibitors in large 3D databases of molecules and to design drugs rationally against P. falciparum. Further, HTVS using Glide identified several potent PfM18AAP inhibitors in the ChEMBL antimalarial data set of 153,873 compounds. These novel compounds, which have an excellent binding affinity for PfM18AAP, are better candidates for future drug design. Finally, the 3D QSAR model was applied to different data sets to prioritize PfM18AAP inhibitors and predict new inhibitors.
Thus, our study advocates the combined use of 3D QSAR, pharmacophore modeling, and molecular docking to search for novel potential inhibitors unique to PfM18AAP, which is an essential and validated drug target involved in various enzymatic functions such as hemoglobin digestion, erythrocyte invasion, and parasite growth in the host cell.
Acknowledgment
We would like to thank Dr. Andrew M Lynn, School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India, for providing invaluable suggestions.
Funding
This work was funded by University Performance Excellence-II funds (University Grants Commission), with infrastructure support from the Department of Biotechnology through the Centre of Excellence in Bioinformatics and financial support from the PURSE funds of the Department of Science and Technology, Govt. of India.
Availability of data and materials
The raw output data will not be published, but will be made available upon request.
Authors’ contributions
All authors participated in the design of the study. MK and NT performed the comparative analysis of 3D QSAR model, pharmacophore model, and molecular docking. MK, SC, SN and NT wrote the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Ethics approval and consent to participate
Not applicable.
Abbreviations
3D QSAR | 3-dimensional quantitative structure activity relationship |
PfM18AAP | Plasmodium falciparum M18 Aspartyl Aminopeptidase |
PLSR | Partial least square regression |
PCR | Principal component regression |
kNN-MFA | k-nearest neighbor-molecular field analysis |
QN | Quinine |
CQ | Chloroquine |
8-AmQN | 8-aminoquinoline |
HTVS | High Throughput Virtual Screening |
P. falciparum | Plasmodium falciparum |
Additional files
Additional file 1:(16K, docx)
The statistical parameters of 3D QSAR models of known bioactive Inhibitors (AID 743024) dataset of PfM18AAP using PLSR, PCR and kNN-MFA methods. (DOCX 16 kb)
Additional file 2:(18K, docx)Comparison between different 3D QSAR models using PLS, PCR and KNN methods for predicting pIC50 values of train set and test set of known bioactive Inhibitors (AID 743024) of PfM18AAP. (DOCX 18 kb)
Contributor Information
Madhulata Kumari, Email: moc.liamg@427ardnahcm.
Subhash Chandra, Email: ni.oc.oohay@unjcs.
Neeraj Tiwari, Email: moc.oohay@oma_nramuk.
Naidu Subbarao, Email: ni.ca.unj.liam@oarsn.
References
1. Newton CR, Krishna S. Severe falciparum malaria in children: current understanding of pathophysiology and supportive treatment. Pharmacol Ther. 1998;79(1):1–53. doi: 10.1016/S0163-7258(98)00008-4.
2. Basco LK, Le Bras J. In vitro activity of artemisinin derivatives against African isolates and clones of Plasmodium falciparum. Am J Trop Med Hyg. 1993;49(3):301–307.
3. Lauterbach SB, Coetzer TL. The M18 aspartyl aminopeptidase of Plasmodium falciparum binds to human erythrocyte spectrin in vitro. Malar J. 2008;7:161. doi: 10.1186/1475-2875-7-161.
4. Nagpal I, Raj I, Subbarao N, Gourinath S. Virtual screening, identification and in vitro testing of novel inhibitors of O-acetyl-L-serine sulfhydrylase of Entamoeba histolytica. PLoS One. 2012;7(2):e30305. doi: 10.1371/journal.pone.0030305.
5. Mizutani MY, Itai A. Efficient method for high-throughput virtual screening based on flexible docking: discovery of novel acetylcholinesterase inhibitors. J Med Chem. 2004;47(20):4818–28. doi: 10.1021/jm030605g.
6. Bisson WH, Cheltsov AV, Bruey-Sedano N, Lin B, Chen J, Goldberger N, May LT, Christopoulos A, Dalton JT, Sexton PM, Zhang XK, Abagyan R. Discovery of antiandrogen activity of nonsteroidal scaffolds of marketed drugs. Proc Natl Acad Sci U S A. 2007;104(29):11927–32. doi: 10.1073/pnas.0609752104.
7. Esposito EX, Hopfinger AJ, Madura JD. Methods for applying the quantitative structure-activity relationship paradigm. Methods Mol Biol. 2004;275:131–214. doi: 10.1385/1-59259-802-1:131.
8. Xue L, Bajorath J. Molecular descriptors in chemoinformatics, computational combinatorial chemistry, and virtual screening. Comb Chem High Throughput Screen. 2000;3(5):363–372. doi: 10.2174/1386207003331454.
9. Wold S, Ruhe A, Wold H, Dunn WJ. The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses. SIAM J Sci Stat Comput. 1984;5(3):735–743. doi: 10.1137/0905052.
10. Jolliffe IT. A note on the use of principal components in regression. Appl Stat. 1982;31:300–303. doi: 10.2307/2348005.
11. Malashenko Iu R, Romanovskaia VA, Sokolov IG, Kryshtab TP, Liudvichenko ES. Theoretical evaluation of necessity of carbon dioxide assimilation by microorganisms during growth on various substrates. Ukr Biokhim Zh (1978). 1980;52(2):159–163.
12. Ajmani S, Jadhav K, Kulkarni SA. Three-dimensional QSAR using the k-nearest neighbor method and its interpretation. J Chem Inf Model. 2006;46(1):24–31. doi: 10.1021/ci0501286.
13. Molecular Networks GmbH Computerchemie Erlangen, Germany, 1996.
14. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminform. 2011;3:33. doi: 10.1186/1758-2946-3-33.
15. VLifeMDS. Molecular Design Suite. Pune: VLife Sciences Technologies Pvt Ltd 4. 2010.
16. Gasteiger J, Marsili M. Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges. Tetrahedron. 1980;36:3219–3228. doi: 10.1016/0040-4020(80)80168-2.
17. Derksen S, Keselman H. Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. Brit J Math Stat Psy. 1992;45:265–282. doi: 10.1111/j.2044-8317.1992.tb00992.x.
18. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence. Lawrence Erlbaum Associates Ltd; 1995. p. 1137–1145.
19. Schuurmann G, Ebert RU, Chen J, Wang B, Kuhne R. External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean. J Chem Inf Model. 2008;48(11):2140–2145. doi: 10.1021/ci800253u.
20. Rucker C, Rucker G, Meringer M. y-Randomization and its variants in QSPR/QSAR. J Chem Inf Model. 2007;47(6):2345–2357. doi: 10.1021/ci700157b.
21. Maestro, Version 9.1. Schrödinger LLC, New York, NY; 2008.
22. Kumar V, Kumar S, Rani P. Pharmacophore modeling and 3D-QSAR studies on flavonoids as α-glucosidase inhibitors. Der Pharma Chemica. 2010;2:324–335.
23. Schoenen FJ, Weiner WS, Baillargeon P, Brown CL, Chase P, Ferguson J, Fernandez-Vega V, Ghosh P, Hodder P, Krise JP, et al. Inhibitors of the Plasmodium falciparum M18 Aspartyl Aminopeptidase. 2013.
24. Cole JC, Nissink JWM, Taylor R. Protein ligand docking and virtual screening with GOLD. 2005.
25. Sivaraman KK, Oellig CA, Huynh K, Atkinson SC, Poreba M, Perugini MA, Trenholme KR, Gardiner DL, Salvesen G, Drag M, et al. X-ray crystal structure and specificity of the Plasmodium falciparum malaria aminopeptidase PfM18AAP. J Mol Biol. 422(4):495–507.
26. Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and validation of a genetic algorithm for flexible docking. J Mol Biol. 1997;267(3):727–748. doi: 10.1006/jmbi.1996.0897.
27. Wang R, Lai L, Wang S. Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J Comput Aided Mol Des. 2002;16(1):11–26. doi: 10.1023/A:1016357811882.
Articles from BMC Structural Biology are provided here courtesy of BioMed Central