QSAR



5       Theory: QSAR+ descriptors

A descriptor is a molecular property that QSAR+ can calculate. QSAR+ provides a wide variety of descriptors that you can use in determining new QSAR relationships. This chapter provides information about the following functional families of descriptors available in QSAR+:

Fragment constants descriptors
Conformational descriptors
Electronic descriptors
Receptor descriptors
Quantum mechanical descriptors
Graph-theoretic descriptors
Topological descriptors (available only if you purchase the C2·Descriptors+ module)
Information-content descriptors (available only if you purchase the C2·Descriptors+ module)
Molecular shape analysis (MSA) descriptors
Spatial descriptors (some of which are available only if you purchase the C2·Descriptors+ module)
Structural descriptors
Thermodynamic descriptors
pKa descriptors (ACD Labs)
ADME descriptors

In addition, this chapter provides information on the following 3D-QSAR descriptors:

Molecular field analysis (MFA) descriptors
Receptor surface analysis (RSA) descriptors

For detailed information about how to use descriptors, see Chapter 8, Working with descriptors; Chapter 9, Working with fragment constants; Chapter 10, Performing molecular field analysis; and Chapter 11, Performing molecular shape analysis.


Descriptors in the combi-chem documentation

Additional information about descriptors (and other information) can be found in the combi-chem documentation.

In Start-up and Configuration:

Daylight setup
Oracle setup
MDL ISIS setup

In 4d. Descriptors for library analysis:

Using molecular diversity descriptors
Statistical analyses and data-mining techniques

In Theory:

Graph-theoretic descriptors
Information-theoretic descriptors
Descriptors based on projections of the molecular surface (shadow indices)
Descriptors based on partial charges mapped on surface area
2D and 3D fingerprints metrics

In 4e. Data analysis and library visualization:

Visualization of compounds in descriptor space
Principal component analysis
Factor analysis
Cluster analysis
Multidimensional scaling


Fragment constants descriptors

Fragment constant descriptors are constants that relate the effect of substituents on a "reaction center" from one type of process to another. The basic idea is that similar changes in structure are likely to produce similar changes in reactivity, ionization, or binding. There are different constants corresponding to different effects. These are typically used to parameterize the Hammett (or Hammett-like) equation for some series of analogs. A comprehensive introduction is found in Hansch and Leo (1995). An example is:

Eq. 20            $\log(k_x / k_h) = \rho\sigma$

where $k_x$ and $k_h$ are reaction rate constants for the substituents x and h, respectively; $\sigma$ is an electronic constant determined by an ionization constant; and $\rho$ is fit to the set of analogs being studied. Often, multiple terms corresponding to different properties (electronic, steric, etc.) at different R-group positions are used. In this way, measurements of ionization constants can be used to predict rate constants once a scaling factor ($\rho$) is determined. In this example, $\rho$ measures the importance of electronic effects for the rate constant.
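As a rough illustration of how such a relation is used, the sketch below assumes the single-term Hammett form log10(kx/kh) = rho * sigma. The function names `hammett_rate_ratio` and `fit_rho` are hypothetical and are not part of QSAR+; the base-10 logarithm is also an assumption of this sketch.

```python
def hammett_rate_ratio(rho: float, sigma: float) -> float:
    """Return the predicted ratio k_x / k_h from log10(k_x / k_h) = rho * sigma."""
    return 10.0 ** (rho * sigma)


def fit_rho(sigmas, log_ratios):
    """Least-squares slope through the origin for log10(k_x / k_h) versus sigma.

    sigmas      -- substituent constants for the analog series
    log_ratios  -- measured log10(k_x / k_h) values, in the same order
    """
    numerator = sum(s * y for s, y in zip(sigmas, log_ratios))
    denominator = sum(s * s for s in sigmas)
    if denominator == 0:
        raise ValueError("at least one nonzero sigma value is required")
    return numerator / denominator
```

Once rho has been fitted on a training series, `hammett_rate_ratio` gives the predicted rate ratio for a new substituent whose sigma is known.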

The default database currently contains the following types of constants. These come from Table VI-1 of Hansch (1979), except for the sterimol constants, which are calculated.

Sm, Sp

Electronic effect sigma meta and sigma para, respectively. Positive values correspond to electron withdrawal, negative values to electron release. Sigma is generally not appropriate for ortho substituents because of steric interaction with the reaction center.

F, R

Decompositions of sigma para constant into an inductive (polar) part (F) and a resonance part (R) for the case when the substituent is conjugated with the reaction center, producing through-resonance effects.

pi

Hydrophobic character. Pi for substituent X is given by the difference of its logP from the logP for hydrogen.

HA

Hydrogen-bond acceptor.

HB

Hydrogen-bond donor.

MR

Molar refractivity as given by:

Eq. 21            $MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$

where n is the refractive index, MW is the molecular weight, and d is the compound density.
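A minimal sketch of this calculation, assuming the Lorentz-Lorenz form MR = (n^2 - 1)/(n^2 + 2) * MW/d. The function name `molar_refractivity` is hypothetical, and the water values are approximate illustrative inputs rather than entries from the fragment database.

```python
def molar_refractivity(n: float, mw: float, d: float) -> float:
    """Molar refractivity from refractive index n, molecular weight mw, density d.

    With mw in g/mol and d in g/cm^3 the result is in cm^3/mol.
    """
    if d <= 0:
        raise ValueError("density must be positive")
    return (n * n - 1.0) / (n * n + 2.0) * mw / d


# Approximate inputs for liquid water near room temperature (illustrative only).
mr_water = molar_refractivity(n=1.333, mw=18.015, d=0.997)
```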

Sterimol-L

Steric length parameter, measured along the substitution point bond axis.

Sterimol-B1 through B4

Steric distances perpendicular to the bond axis. These define a bounding box for the substituent and are numbered in ascending size order.

Sterimol-B5

The overall maximum steric distance perpendicular to the bond axis.


Conformational descriptors

This table lists the conformational descriptors available in QSAR+:

Symbol Description
Energy   The energy of the currently selected conformation in the study table.  
LowEne   The energy of the most stable conformation in the set of conformations belonging to each model.  
EPenalty   The difference between Energy and LowEne.  


Electronic descriptors

This table lists the electronic descriptors available in QSAR+:

Symbol Description
Charge   Sum of partial charges.  
Fcharge   Sum of formal charges.  
Apol   Sum of atomic polarizabilities.  
Dipole   Dipole moment.  
HOMO   Highest occupied molecular orbital.  
LUMO   Lowest unoccupied molecular orbital.  
Sr   Superdelocalizability.  

Sum of atomic polarizabilities (Apol)

The sum of atomic polarizabilities (Apol) descriptor computes the sum of the atomic polarizabilities. The polarizabilities are calculated from the A coefficients used for molecular mechanics calculations:

Eq. 22            

For more information, see Marsali and Gasteiger (1980); Hopfinger (1973).

Dipole moment (Dipole)

The dipole moment descriptor is a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. Both the magnitude and the components (X, Y, Z) of the dipole moment are calculated. It is estimated by utilizing partial atomic charges and atomic coordinates. Partial atomic charges are computed using the charge setup option in the QSAR control panel offering CHARMm charging rules, Gasteiger, CNDO2, and Del Re methods. The descriptor is reported in debye units.

Dipole properties have been correlated to long-range ligand-receptor recognition and subsequent binding.

For more information, see Bottcher (1952); Del Re (1963); Gasteiger (1980); Hopfinger (1973); Marsali (1980).

Highest occupied molecular orbital energy (HOMO)

The HOMO descriptor adds the energy (in electronvolts) of the HOMO for each model, calculated by the CNDO/2 method, to the study table.

HOMO (highest occupied molecular orbital) is the highest energy level in the molecule that contains electrons. It is crucially important in governing molecular reactivity and properties. When a molecule acts as a Lewis base (an electron-pair donor) in bond formation, the electrons are supplied from the molecule's HOMO. How readily this occurs is reflected in the energy of the HOMO. Molecules with high HOMOs are more able to donate their electrons and are hence relatively reactive compared to molecules with low-lying HOMOs; thus the HOMO descriptor should measure the nucleophilicity of a molecule.

For more information, see Fischer (1969); Pople (1970; 1967; 1965; 1966); Sichel (1968); Wiberg (1968).

Lowest unoccupied molecular orbital energy (LUMO)

The LUMO descriptor adds the energy (in electronvolts) of the LUMO for each model, calculated by the CNDO/2 method, to the study table.

LUMO (lowest unoccupied molecular orbital) is the lowest energy level in the molecule that contains no electrons. It is important in governing molecular reactivity and properties.

When a molecule acts as a Lewis acid (an electron-pair acceptor) in bond formation, incoming electron pairs are received in its LUMO. Molecules with low-lying LUMOs are more able to accept electrons than those with high LUMOs; thus the LUMO descriptor should measure the electrophilicity of a molecule.

For more information, see Pople (1970; 1967; 1965; 1966); Fischer and Kollmar (1969); Sichel (1968); Wiberg (1968).

Superdelocalizability (Sr)

Superdelocalizability is an index of reactivity in aromatic hydrocarbons (AH), proposed by Fukui:

Eq. 23            $S_r = 2 \sum_{j=1}^{m} \frac{c_{jr}^2}{e_j}$

$S_r$ = superdelocalizability at position r
$e_j$ = bonding energy coefficient in jth MO (eigenvalue)
$c_{jr}$ = molecular orbital coefficient at position r in the jth MO
m = index of the HOMO

The index is based on the idea that early interaction of the molecular orbitals of two reactants may be regarded as a mutual perturbation, so that the relative energies of the two orbitals change together and maintain a similar degree of overlap as the reactants approach one another.

Therefore, considering the interaction of the MOs of the separated reactants gives us at least an estimate of the slope of the reaction coordinate. From this we make the additional assumptions that (1) this is a prediction of the height of the transition-state barrier at position r, and (2) the greatest interaction will occur at the site of largest orbital density, that is, largest $c^2$.

The concept of delocalizability is introduced by the $e_j$ term. For low-lying levels, this energy is large and positive. We may interpret this as meaning the electrons are tightly held, that is, not very delocalizable. For the upper occupied states (especially HOMO), $e_j$ is much smaller, that is, the electrons in the higher-energy orbitals are less tightly bound, which means they are relatively delocalizable. Therefore the upper energy levels will dominate the superdelocalizability term. Consequently, summing $S_r$ over all atomic positions of a molecule gives a metric of electrophilicity, which may be used to predict relative reactivity in a series of molecules.


Receptor descriptors

Quantitative values, such as the interaction energy calculated in Receptor for a generated receptor model, are available to use in QSAR+. By using Receptor data to develop a QSAR model, you can evaluate the goodness of fit between a candidate structure and a postulated pseudo-receptor.

When you have generated a receptor model and have aligned the models you want to study, you can proceed to build a QSAR using data from the receptor-structure iterations.

This table lists the receptor descriptors available in QSAR+:

Symbol Description
IntraEnergy   Molecular internal energy inside receptor.  
InterEleEnergy   Nonbond electrostatic energy between molecule and receptor.  
InterVDWEnergy   Nonbond van der Waals energy between molecule and receptor.  
InterEnergy   Total nonbond energy between molecule and receptor.  
MinIntraEnergy   Molecular internal energy minimized without receptor.  
StrainEnergy   Molecular strain energy within receptor.  

IntraEnergy

The internal energy of a molecule as it sits in and is constrained by the receptor model you generated.

InterEnergy

The interaction energy of the molecule with the receptor. It is the sum of the van der Waals and electrostatic interactions. The more negative the value, the greater the interaction between the molecule and the receptor.

InterEleEnergy

The electrostatic interaction energy of the molecule with the receptor.

InterVDWEnergy

The van der Waals interaction energy of the molecule with the receptor.

MinIntraEnergy

The internal energy of a molecule as it sits in the receptor site without being subject to receptor model constraints. This value should always be less than or equal to IntraEnergy.

StrainEnergy

The difference in internal energy between the molecule minimized within the receptor model (IntraEnergy) and the molecule minimized without the receptor model (MinIntraEnergy).


Quantum mechanical descriptors

Quantitative values calculated in the MOPAC application of the QUANTUM 1 card deck are available for use in QSAR+. These descriptors are the same as those of the same name described elsewhere in this chapter, except that the MOPAC descriptors are calculated using a semi-empirical method that is likely to generate more accurate values. For information on MOPAC, see the Cerius2 Quantum Mechanics -- Chemistry.

The following table lists the MOPAC descriptors available in QSAR+:

Symbol Description
LUMO_MOPAC   Lowest unoccupied molecular orbital energy.  
DIPOL_MOPAC   Dipole moment.  
HOMO_MOPAC   Highest occupied molecular orbital energy.  
Hf_MOPAC   Heat of formation.  


Graph-theoretic descriptors

All the graph-theoretic descriptors included here ultimately base their calculations on representation of molecular structures as graphs, where atoms are represented by vertices and covalent chemical bonds by edges.

These descriptors fall into two categories:

Topological descriptors
Information-content descriptors

All these descriptors perform their evaluations on hydrogen-suppressed graphs, that is, there are no vertices corresponding to hydrogens and no edges corresponding to bonds connecting hydrogens to other atoms.

Please refer to Terms on page 95 for explanations of graph-theoretic terms and symbols used in the descriptor definitions below.

Note

Multiple bonds, if any, are treated as single edges in all descriptor definitions unless specifically mentioned otherwise.  


Topological descriptors

Topological indices are 2D descriptors based on graph theory concepts (Kier and Hall 1976, 1986; Katritzky and Gordeeva 1993). These indices have been widely used in QSPR and in QSAR studies. They help to differentiate the molecules according mostly to their size, degree of branching, flexibility, and overall shape.

Wiener index (W)

The Wiener index is the sum of the numbers of chemical bonds separating all pairs of heavy atoms in the molecule. In graph-theoretical terms, it is the sum of the lengths of the minimal paths between all pairs of vertices representing heavy atoms. This is equal to half the sum of all D-matrix entries (Wiener 1947, Müller et al. 1987):

Eq. 24            $W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} D_{ij}$
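The definition above can be sketched in a few lines, assuming a connected, hydrogen-suppressed graph given as a vertex count and an edge list. The names `distance_matrix` and `wiener_index` are hypothetical helpers for illustration, not QSAR+ functions; the D-matrix is built by breadth-first search with every edge counted as length 1.

```python
from collections import deque


def distance_matrix(n_vertices, edges):
    """Topological distance matrix (shortest path lengths) of a connected graph."""
    adjacency = [[] for _ in range(n_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    matrix = []
    for start in range(n_vertices):
        row = [None] * n_vertices
        row[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if row[w] is None:
                    row[w] = row[u] + 1
                    queue.append(w)
        if any(d is None for d in row):
            raise ValueError("graph must be connected")
        matrix.append(row)
    return matrix


def wiener_index(n_vertices, edges):
    """Half the sum of all distance-matrix entries."""
    return sum(sum(row) for row in distance_matrix(n_vertices, edges)) // 2


n_butane = (4, [(0, 1), (1, 2), (2, 3)])      # C-C-C-C
isobutane = (4, [(0, 1), (0, 2), (0, 3)])     # central carbon with three neighbors
```

For these two skeletons the sketch gives 10 for n-butane and 9 for isobutane, reflecting the more compact branched isomer.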

Zagreb index (Zagreb)

The Zagreb index is defined as the sum of the squares of vertex valencies (Bonchev 1983):

Eq. 25            $\mathrm{Zagreb} = \sum_{i} \delta_i^2$

where $\delta_i$ is the valency of vertex i.
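A short sketch of the sum of squared vertex valencies, assuming the hydrogen-suppressed graph is supplied as a vertex count and an edge list. The name `zagreb_index` is hypothetical.

```python
def zagreb_index(n_vertices, edges):
    """Sum of the squares of the vertex valencies of a graph."""
    valency = [0] * n_vertices
    for u, v in edges:
        valency[u] += 1
        valency[v] += 1
    return sum(d * d for d in valency)


n_butane_edges = [(0, 1), (1, 2), (2, 3)]     # valencies 1, 2, 2, 1
isobutane_edges = [(0, 1), (0, 2), (0, 3)]    # valencies 3, 1, 1, 1
```

The branched isomer scores higher (12 versus 10), which is why this index is often read as a branching measure.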

Hosoya index (Z)

Let M be the number of edges in the graph. For any integer k, define p(k) to be the number of ways of choosing k non-adjacent edges from the graph. Note that p(k) is zero for k > [M/2], since there is no set of k non-adjacent edges in a graph of M edges if k > [M/2].

The Hosoya index is the sum of all (nonzero) p(k):

Eq. 26            $Z = \sum_{k=0}^{[M/2]} p(k)$

with the convention that p(0) = 1 by definition.

It is a moderately easy exercise in graph theory to prove that the formula above can also be given in terms of the following recursion (implemented in C2·Diversity). Let G be the graph, of which the Hosoya index Z(G) is to be calculated. Remove an edge from G and denote the resulting graph by H. Again, remove the same edge from G, this time removing all the edges adjacent to it as well. Denote the resulting graph K. Then the following is always true:

Eq. 27            $Z(G) = Z(H) + Z(K)$

The recursion simplifies the given graph until one or both of H and K are empty graphs, in which case the index is defined as:

Eq. 28            $Z(\emptyset) = 1$

There exists a handy shortcut for graphs that consist of disjoint subgraphs (see an example calculation of Z(benzene) below) -- if G consists of disjoint subgraphs H and K, then:

Eq. 29            $Z(G) = Z(H) \cdot Z(K)$

Example

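Calculate the Hosoya index of benzene. The hydrogen-suppressed graph representing benzene is a hexagon, written here as $C_6$; $P_n$ denotes a linear path of n vertices, and isolated vertices contribute a factor of 1. Begin the recursion by removing, say, the right-hand vertical edge (which gives $P_6$), and then that edge together with the edges connected to it (which gives $P_4$):

$Z(C_6) = Z(P_6) + Z(P_4)$

Continue with the first term in a similar manner: remove another vertical edge from it (the middle edge of $P_6$):

$Z(P_6) = Z(P_3 \cup P_3) + Z(P_2 \cup P_2)$

Both terms on the right-hand side are disjoint graphs, each consisting of two identical subgraphs. Thus:

$Z(P_6) = Z(P_3)^2 + Z(P_2)^2$

The calculation is almost complete. For a graph G consisting of one edge, the corresponding H and K graphs are both empty, therefore:

$Z(P_2) = 1 + 1 = 2$

Also:

$Z(P_3) = Z(P_2) + 1 = 3$ and $Z(P_4) = Z(P_3) + Z(P_2) = 5$

The Hosoya index of benzene is thus:

$Z(C_6) = 3^2 + 2^2 + 5 = 18$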

The index displayed in the study table is the natural logarithm of Z, to handle the rapid growth of the index with molecule size (Hosoya 1971, Rouvray 1987).
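The edge-removal recursion described above can be sketched as follows. This is an illustrative implementation under the stated definitions, not the C2·Diversity code; the name `hosoya_index` is hypothetical, and the graph is assumed to be given as a list of edges between integer vertex labels.

```python
import math


def hosoya_index(edges):
    """Hosoya index via Z(G) = Z(H) + Z(K), with Z = 1 for a graph without edges."""
    edges = [tuple(e) for e in edges]
    if not edges:
        return 1  # only the empty selection remains: p(0) = 1
    (u, v), rest = edges[0], edges[1:]
    # H: the graph with the chosen edge removed.
    # K: the graph with the chosen edge and all edges adjacent to it removed.
    k_edges = [e for e in rest if u not in e and v not in e]
    return hosoya_index(rest) + hosoya_index(k_edges)


benzene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
z_benzene = hosoya_index(benzene)
ln_z_benzene = math.log(z_benzene)  # the study table reports the natural logarithm
```

The plain recursion grows exponentially with the number of edges, so a practical implementation would probably also use the product rule for disjoint subgraphs; that refinement is left out here for brevity.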

Kier & Hall molecular connectivity index ($\chi$)

This index, refined by Kier and Hall (1976), is a series of numbers designated by "order" and "subgraph type."

There are four subgraph types: Path, Cluster, Path/Cluster, and Chain. These types emphasize different aspects of atom connectivity within a molecule: the amount of branching, the ring structures present, and flexibility. Here we refer to these subgraph types as P, C, PC, and CH, respectively. They are defined as follows:

Definition

Given a connected subgraph G:

(i) If G contains a cycle it is of type CH (chain).

Otherwise:

(ii) If all vertex valencies of G (valencies with respect to G, not the entire graph) are either greater than 2 or equal to 1, G is of type C (cluster).

Otherwise:

(iii) If all vertex valencies (as in (ii)) are either equal to 2 or 1, G is of type P (path).

Otherwise:

(iv) G is of type PC (Path/Cluster). That means the valencies greater than 2, equal to 2, and equal to 1, are all present.

"Order" refers to the number of edges in a subgraph. The allowable orders are 0, 1,..., M (M - the number of edges in the entire graph)

Notes:

(a) Subgraphs of order 0 are assigned the class P (path).
(b) Subgraphs of order 1 and 2 are necessarily of type P only.
(c) Subgraphs of order 3 can be of type P, C, or CH only.

The molecular connectivity index of order n corresponding to subgraph type s is denoted by $^n\chi_s$.

Given an order n and a subgraph type s, one considers all connected subgraphs of type s consisting of n edges. For each vertex $v_i$ in a subgraph, its valence $\delta_i$ (with respect to the entire graph) is calculated, and the partial index $^nP$ corresponding to the given subgraph is found according to:

Eq. 30            $^nP = \prod_{i} (\delta_i)^{-1/2}$,

(the product runs over all vertices of the subgraph).

Finally, the partial indices are summed over all connected subgraphs of the requested type s (Kier and Hall 1976, 1985):

Eq. 31            $^n\chi_s = \sum_{\text{subgraphs of type } s} {}^nP$

Example of the molecular connectivity index

If we calculate molecular connectivity indices for methane and the fluorinated methanes, the following results are obtained in the study table. There is one row for each molecule as usual, and some columns for each type of subgraph. In the Topological Descriptors control panel, subgraph orders from 0 to 3 are specified as the default, so we see CHI-0 through CHI-3 columns. Had the range been 0 to 4, we would have seen a CHI-4 column as well.

Molecule   CHI-0   CHI-1   CHI-2   CHI-3_P   CHI-3_C   CHI-3_CH
CH4   0   0   0   0   0   0  
CH3F   2   1   0   0   0   0  
CH2F2   2.707   1.414   0.707   0   0   0  
CHF3   3.577   1.732   1.732   0   0.577   0  
CF4   4.5   2   3   0   2   0  

Order zero chi indices, CHI-0

Let us consider the order-zero indices first, in the first column (CHI-0), which represent the simplest subdivision or subgraph: the set of vertices. The number of subgraphs of order zero is therefore equal to the number of skeletal atoms or vertices. Each vertex has a property $\delta$, which is the number of its electrons in sigma bonds to skeletal neighbors.

Eq. 32            $\delta_i = \sigma_i - h_i$

Where:

$\sigma_i$ = number of electrons in sigma bonds to all neighbors.
$h_i$ = number of H atoms bonded to atom i.

The zeroth-order subgraph connectivity weight assigned to each vertex is:

Eq. 33            $c_i = (\delta_i)^{-1/2}$

The order-zero index is the sum of all vertex weights in the graph, that is, over all atoms in the skeleton.

Eq. 34            $^0\chi = \sum_{i} (\delta_i)^{-1/2}$

Methane

Thus for methane, there is only one skeletal atom, C. It has four of its electrons in bonds and is bonded to four H atoms, and therefore has $\delta$ = 0 (that is, 4 - 4), and is assigned a $^0\chi$ index of 0.

Fluoromethane

Atom   h   σ   δ   c
C   3   4   1   1  
F   0   1   1   1  
Order-zero index for fluoromethane:   2  

Difluoromethane

Atom   h   σ   δ   c
C   2   4   2   0.707  
F   0   1   1   1  
F   0   1   1   1  
Order-zero index for difluoromethane:   2.707  

The zeroth-order index holds little structural information. Only the presence of the nearest neighbors of each atom is captured. In the series methane through tetrafluoromethane, we see an increase in CHI-0, which reflects the increasing size of the molecular skeleton.

Order one chi index

Order-one indices are based on the graph edges, that is, the bonds that connect the skeletal atoms. We replace the atom $\delta$ value with the product of the $\delta$ values of the vertices or atoms that form the edge or bond. Thus, the weight of the edge between vertices i and j is:

Eq. 35            $c_{ij} = (\delta_i \delta_j)^{-1/2}$

and as before, we sum all the weights to obtain the first-order index.

Eq. 36            $^1\chi = \sum_{\text{edges}} (\delta_i \delta_j)^{-1/2}$

Methane

Leaving out hydrogens, the molecular graph of methane is a single point. It has no edges and therefore has first-order index = 0.

Fluoromethane

Fluoromethane has one edge, representing the C-F bond.

Edge   δi   δj   Weight, c
C-F   1   1   (1 × 1)^(-1/2) = 1  
First-order index for fluoromethane:   1  

Difluoromethane

Difluoromethane has two edges.

Edge   δi   δj   Weight, c
C-F   2   1   (2 × 1)^(-1/2) = 0.707  
C-F   2   1   (2 × 1)^(-1/2) = 0.707  
First-order index for difluoromethane:   1.414  

First-order indices contain more structural information than zeroth-order indices. The first-order index encodes the number of edges (bonds) in the molecular graph. Hence CHI-1 increases throughout the series methane through tetrafluoromethane. Beyond this, the immediate bonding environment of an atom is captured in the edge weights: the weight of the carbon atom becomes smaller as it becomes more substituted. This reduces the rate of increase of CHI-1 compared to CHI-0 over the same series.

In an acyclic compound containing A atoms, the number of skeletal bonds is:

Eq. 37            $P = A - 1$

where P is called the number of paths of length 1. A "path of length 1" is a bond.

In a cyclic compound with R rings:

Eq. 38            $P = A - 1 + R$

Thus the number of first-order weights encodes the number of rings.

Second-order chi indices

For order two, we consider pairs of connected edges (bonds) in the molecular graph. Since methane has no bonds and fluoromethane has only one bond, CHI-2 for methane and fluoromethane is zero. Difluoromethane has one path of length 2:

Eq. 39            F-C-F

This is computed in a manner analogous to the lower-order indices, as a product of reciprocal square roots:

Eq. 40            $c_{ijk} = (\delta_i \delta_j \delta_k)^{-1/2}$

Thus for difluoromethane:

Path   Weight
F-C-F   (1 × 2 × 1)^(-1/2) = 0.707  

Trifluoromethane has three paths of length 2:

Atom   h   σ   δ
C   1   4   3  
F   0   1   1  
F   0   1   1  
F   0   1   1  

Path   Weight, c
F-C-F   (1 × 3 × 1)^(-1/2) = 0.577  
F-C-F   (1 × 3 × 1)^(-1/2) = 0.577  
F-C-F   (1 × 3 × 1)^(-1/2) = 0.577  
Second-order index for trifluoromethane:   1.732  

Third-order chi indices

None of the compounds have any paths of length 3, which would require three edges (that is, three bonds) to be connected in sequence, so the CHI-3_P values for the series are all zero. On the other hand, 1,2-difluoroethane, had we included it, would have a CHI-3_P index of 0.5.

However, there is another kind of third-order subgraph called a cluster, which involves four skeletal atoms in a trigonal relationship. In this example, this structural motif appears only in trifluoromethane and tetrafluoromethane.

Atom   h   σ   δ
C   1   4   3  
F   0   1   1  
F   0   1   1  
F   0   1   1  

Cluster   Weight, c
CF3   (1 × 3 × 1 × 1)^(-1/2) = 0.577  
Third-order cluster index for trifluoromethane:   0.577  

The smallest possible ring is three-membered and, if there were any three-membered rings in our set, they would be captured by the CHI-3_CH index (CH for "chain", meaning "ring").

Fourth-order chi indices

Similarly, none of our compounds have any paths of length 4, which would require four connected edges, hence all values for the CHI-4_P index are zero. Tetrafluoromethane contains a fourth-order cluster, however:

Atom   h   σ   δ
C   0   4   4  
F   0   1   1  
F   0   1   1  
F   0   1   1  
F   0   1   1  

Cluster   Weight, c
CF4   (4 × 1 × 1 × 1 × 1)^(-1/2) = 0.5  
Fourth-order cluster index for tetrafluoromethane:   0.5  

The higher-order indices are additive (because they are sums of weighting terms) and constitutive (because the size of the weights depends on atomic values), representing the entire molecular graph.
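The fluoromethane values discussed above can be reproduced with a short sketch. It assumes a hydrogen-suppressed graph containing only single bonds, so that the vertex valence is simply the number of skeletal neighbors, and it follows the convention that an isolated atom such as the methane carbon contributes zero. The names `chi_indices` and `star` are hypothetical helpers, not QSAR+ functions, and only orders 0 to 3 for path and cluster subgraphs are covered.

```python
from itertools import combinations
from math import prod, sqrt


def chi_indices(n_vertices, edges):
    """Simple connectivity indices (orders 0-3, path and cluster types)."""
    adj = [set() for _ in range(n_vertices)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    delta = [len(neighbors) for neighbors in adj]

    def weight(vertices):
        # Reciprocal square root of the product of the vertex valences.
        return 1.0 / sqrt(prod(delta[v] for v in vertices))

    chi0 = sum(weight([v]) for v in range(n_vertices) if delta[v] > 0)
    chi1 = sum(weight([u, v]) for u, v in edges)
    # Paths of length 2: two edges sharing a central vertex c.
    chi2 = sum(weight([a, c, b])
               for c in range(n_vertices)
               for a, b in combinations(sorted(adj[c]), 2))
    # Paths of length 3: a-b-c-d around each central edge b-c, with a != d.
    chi3_p = sum(weight([a, b, c, d])
                 for b, c in edges
                 for a in adj[b] - {c}
                 for d in adj[c] - {b}
                 if a != d)
    # Third-order clusters: three edges meeting at one vertex c.
    chi3_c = sum(weight([c, x, y, z])
                 for c in range(n_vertices)
                 for x, y, z in combinations(sorted(adj[c]), 3))
    return {"CHI-0": chi0, "CHI-1": chi1, "CHI-2": chi2,
            "CHI-3_P": chi3_p, "CHI-3_C": chi3_c}


def star(n_leaves):
    """Carbon (vertex 0) bonded to n_leaves fluorines: CH4 ... CF4 skeletons."""
    return n_leaves + 1, [(0, i) for i in range(1, n_leaves + 1)]


chf3 = chi_indices(*star(3))
difluoroethane = chi_indices(4, [(0, 1), (1, 2), (2, 3)])  # F-C-C-F
```

Calling `chi_indices(*star(k))` for k = 0 to 4 walks through the methane to tetrafluoromethane series in the same order as the study-table rows.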

Kier & Hall valence-modified connectivity index ($\chi^v$)

This index is a refinement of the molecular connectivity index (see Kier & Hall molecular connectivity index ($\chi$) for definitions) where a vertex subgraph valence $\delta$ is enhanced to $\delta^v$ to take into account the electron configuration of the atom represented by the vertex:

Eq. 41            $\delta^v = \frac{Z^v - h}{Z - Z^v - 1}$

where $Z^v$ is the number of valence electrons in the atom, Z is its atomic number, and h is the number of hydrogens bound to it. This formula is designed to reproduce the unmodified molecular connectivity index for saturated hydrocarbons, for which $\delta^v = \delta$. However, $\delta^v$ distinguishes between multiple and single bonds. The denominator introduces further distinction between element rows due to the presence of the atomic number Z (Kier and Hall 1976, 1985).
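As an illustration, the valence-modified vertex value can be evaluated directly, assuming it takes the usual Kier and Hall form (Zv - h) / (Z - Zv - 1). The function name `valence_delta` is hypothetical.

```python
def valence_delta(z_valence: int, z: int, h: int) -> float:
    """Valence-modified vertex value (Zv - h) / (Z - Zv - 1).

    z_valence -- number of valence electrons of the atom
    z         -- atomic number
    h         -- number of hydrogens bound to the atom
    """
    denominator = z - z_valence - 1
    if denominator <= 0:
        raise ValueError("formula is undefined for this atom (Z - Zv - 1 <= 0)")
    return (z_valence - h) / denominator


methyl_carbon = valence_delta(z_valence=4, z=6, h=3)   # CH3 group
hydroxyl_oxygen = valence_delta(z_valence=6, z=8, h=1) # OH group
fluorine = valence_delta(z_valence=7, z=9, h=0)
chlorine = valence_delta(z_valence=7, z=17, h=0)
```

The methyl carbon keeps the value 1 that it has in the unmodified index, while fluorine and chlorine, which would both be 1 in the simple scheme, are now distinguished from carbon and from each other.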

Kier & Hall subgraph count index (SC)

This is the number of subgraphs of a given type and order (Kier and Hall 1976). (See Kier & Hall molecular connectivity index ($\chi$) for definitions.)

Example of the Kier & Hall subgraph count index

Figure 2 . Subgraphs of isopentane

Zeroth-order indices

SC-0

Refers to the number of zero-order subgraphs in the molecular graph. The number of subgraphs of order zero is simply the number of skeletal atoms or vertices in the molecular graph.

First-order indices

SC-1

The number of first-order subgraphs in the molecular graph, which is the number of edges that connect the vertices of the molecular graph. In other words, it is the number of bonds in the molecule.

Second-order index

SC-2

The number of second-order subgraphs in the molecular graph, which is the number of pairs of connected edges. In other words, it is the number of paths of length 2.

Third-order indices

There are three types of third-order subgraph: Path, Cluster and Ring.

SC-3_P

The number of third-order subgraphs in the molecular graph: the number of paths of length 3.

SC-3_C

Counts the number of clusters.

SC-3_CH

Counts the number of rings or chains.

Kier's shape indices ($^n\kappa$, n = 1, 2, 3)

These indices compare the molecule graph with "minimal" and "maximal" graphs, where the meaning of "minimal" and "maximal" depends on the order n. This is intended to capture different aspects of the molecular shape.

Order 1:

The descriptor $^1\kappa$ encodes the count of atoms and the presence of cycles relative to the minimal and maximal graphs. For N vertices, the maximal graph includes edges between all vertex pairs. For the minimal graph a linear path of N - 1 edges connecting the vertices is taken.

The shape index of order 1 is then defined as:

Eq. 42            $^1\kappa = \frac{2\,{}^1P_{max}\,{}^1P_{min}}{({}^1P)^2}$

where $^1P$ is the number of edges in the graph (edges are paths of length 1, hence the superscript 1), $^1P_{max}$ is the number of edges in the maximal graph, namely N(N - 1)/2, and $^1P_{min}$ is the number of edges in the minimal graph, namely N - 1.

By inserting the formulas for $^1P_{max}$ and $^1P_{min}$, one obtains the implemented formula:

Eq. 43            $^1\kappa = \frac{N (N - 1)^2}{({}^1P)^2}$

Order 2:

The descriptor $^2\kappa$ encodes the branching. P, $P_{min}$, and $P_{max}$ now denote the number of paths of length 2 in the corresponding graphs. The maximal graph is taken to be the star graph in which all atoms are adjacent to a common atom. Thus, $P_{max}$ = (N - 1)(N - 2)/2. The linear graph is again taken as the minimal graph, so $P_{min}$ = N - 2. Eq. 42 above thus yields:

Eq. 44            $^2\kappa = \frac{(N - 1)(N - 2)^2}{({}^2P)^2}$

Order 3:

For order 3, the counts of paths of length 3 are considered, and the maximal graph chosen is a twin-star (Kier 1990) with $P_{max}$ = (N - 1)(N - 3)/4 for N odd and $P_{max}$ = $(N - 2)^2$/4 for N even. The minimal graph is again the linear one with $P_{min}$ = N - 3.

Eq. 42 is adjusted by another factor of 2, in the words of the index designer, "to bring the values into rough equivalence with the other kappa values" (Kier 1990, Hall and Kier 1991):

Eq. 45            $^3\kappa = \frac{(N - 1)(N - 3)^2}{({}^3P)^2}$ for N odd; $^3\kappa = \frac{(N - 3)(N - 2)^2}{({}^3P)^2}$ for N even
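The three shape indices can be sketched directly from the maximal and minimal path counts given above. The function name `kappa_indices` is hypothetical, the path counts are assumed to be supplied by the caller, and returning `None` when a path count is zero is a choice made for this sketch rather than documented program behavior.

```python
def kappa_indices(n, p1, p2, p3):
    """Kier shape indices of orders 1-3 for a graph with n vertices.

    p1, p2, p3 -- numbers of paths of length 1, 2, and 3 in the molecular graph.
    Each index is factor * Pmax * Pmin / P**2 (factor 2 for orders 1 and 2,
    factor 4 for order 3).
    """
    def kappa(factor, p_max, p_min, p):
        return factor * p_max * p_min / p ** 2 if p else None

    k1 = kappa(2, n * (n - 1) / 2, n - 1, p1)          # complete graph vs. linear path
    k2 = kappa(2, (n - 1) * (n - 2) / 2, n - 2, p2)    # star graph vs. linear path
    p3_max = (n - 1) * (n - 3) / 4 if n % 2 else (n - 2) ** 2 / 4  # twin star
    k3 = kappa(4, p3_max, n - 3, p3)
    return k1, k2, k3


# Isopentane skeleton: 5 vertices, 4 edges, 4 paths of length 2, 2 paths of length 3.
isopentane = kappa_indices(5, 4, 4, 2)
# n-Butane skeleton: 4 vertices, 3 edges, 2 paths of length 2, 1 path of length 3.
n_butane = kappa_indices(4, 3, 2, 1)
```

For a linear chain every path count equals its minimal value, so the order-1 index reduces to N and the order-2 index to N - 1, which is a quick sanity check on any implementation.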

Kier's alpha-modified shape indices ($^n\kappa_\alpha$, n = 1, 2, 3)

These indices are refinements of the shape index (see previous section) that take into consideration the contribution covalent radii and hybridization states make to the shape of the molecule. The indices $^n\kappa_\alpha$ are defined by Eq. 43 - 45, with the atom count N replaced by the modified atom count N + $\alpha$. The modifier $\alpha$ is defined as:

Eq. 46            $\alpha = \sum_{i} \left( \frac{r_i}{r_{Csp^3}} - 1 \right)$

where the summation is over all heavy atoms of the molecule. Here, $r_i$ is the radius of the ith heavy atom and $r_{Csp^3}$ is the radius of the sp3 carbon (taken to be 0.77 Å in this implementation). In this calculation the following atoms are considered to be heavy: C, N, O, F, P, Cl, Br, and I (Kier 1990, Hall and Kier 1991).

Molecular flexibility index ($\Phi$)

This is a descriptor based on structural properties that restrict a molecule from being "infinitely flexible", the model for which is an endless chain of C(sp3) atoms. The structural features considered as preventing a molecule from attaining infinite flexibility are: (a) fewer atoms, (b) the presence of rings, (c) branching, and (d) the presence of atoms with covalent radii smaller than those of C(sp3). These features are encoded in the index as follows:

Eq. 47            $\Phi = \frac{{}^1\kappa_\alpha \cdot {}^2\kappa_\alpha}{N}$

where N = number of vertices (Hall and Kier 1991).

Balaban indices (JX and JY)

This is a highly discriminating descriptor, whose values do not substantially increase with molecule size and the number of rings present (Balaban 1982, Balaban and Ivanciuc 1989). Its evaluation begins with the D-matrix modified as follows:

Having constructed the modified D-matrix, the row sums are calculated:

Eq. 48            $s_i = \sum_{j=1}^{N} D_{ij}$

where N is the number of vertices and i = 1, ... , N.

At this stage the contributions based on heteroatom electronegativities and heteroatom covalent radii are included by modifying the si values. The modifiers are two-parameter approximations of electronegativities and covalent radii relative to those of carbon. The exact formulas used in the index calculations are:

Eq. 49            $X = 0.4196 - 0.0078\, Z_i + 0.1567\, G_i$

Eq. 50            $Y = 1.1191 + 0.0160\, Z_i - 0.0537\, G_i$

where $Z_i$ is the atomic number and $G_i$ is the (short) periodic table group number. These modifiers are used only with nonmetals: B, C, N, O, F, Si, P, S, Cl, As, Se, Br, Te, and I. For other heteroatoms the values are set at X = Y = 1.

Given the values of X and/or Y for each vertex, the numbers si are adjusted as follows:

$s^a_i = X s_i$ (for the index JX)
$s^a_i = Y s_i$ (for the index JY)

and the result is inserted in the final formula for the index:

Eq. 51            $J = \frac{M}{M - N + 2} \sum_{(i,j)} (s^a_i\, s^a_j)^{-1/2}$

where J equals either JX or JY, depending on the modifier type used, M is the number of edges, N is the number of vertices, and the sum is over all pairs (i, j) with adjacent vertices $v_i$ and $v_j$.

Note

The denominator M - N + 2 is really "number of cycles plus 1" (by the Euler formula) and serves as a normalization against the number of rings present in the molecule.  
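The overall shape of the calculation can be sketched for the simplest case. This sketch deliberately leaves out both refinements described above: it uses the plain topological distance matrix with every edge counted as length 1, and it sets X = Y = 1 for every vertex, so it corresponds to an all-carbon, single-bonded skeleton only. The function name `balaban_j` is hypothetical.

```python
from collections import deque
from math import sqrt


def balaban_j(n_vertices, edges):
    """Unmodified Balaban-type index: M / (M - N + 2) * sum over edges of (s_i * s_j)^(-1/2)."""
    adj = [[] for _ in range(n_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Row sums of the distance matrix, one breadth-first search per vertex.
    row_sums = []
    for start in range(n_vertices):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != n_vertices:
            raise ValueError("graph must be connected")
        row_sums.append(sum(dist.values()))

    m = len(edges)
    edge_sum = sum(1.0 / sqrt(row_sums[u] * row_sums[v]) for u, v in edges)
    return m / (m - n_vertices + 2) * edge_sum


j_butane = balaban_j(4, [(0, 1), (1, 2), (2, 3)])
j_six_ring = balaban_j(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
```

To extend the sketch toward JX or JY, each row sum would be multiplied by the vertex's X or Y modifier before the edge sum is formed.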


Information-content descriptors

In this approach, molecules are viewed as structures that can be partitioned into subsets of elements that are in some sense equivalent. The notion of equivalence depends on the particular descriptor. Consider a partition of a set of N elements into k subsets, the ith of which consists of $N_i$ elements:

equivalence class: 1 2 ... k
number of elements in each: $N_1$ $N_2$ ... $N_k$
$N_1 + N_2 + ... + N_k = N$

Given a partition P as above, we use the notation:

$P = N (N_1, N_2, ... , N_k)$.

A probability distribution can be associated with the partition:

$p_i = N_i / N$,

the probability for a randomly chosen element to belong to class i. This degree of uncertainty can also be expressed by the entropy:

$H_i = -\mathrm{lb}\, p_i$ (lb is the base-2 logarithm).

The mean entropy of such a probability distribution is then:

Eq. 52            $H = -\sum_{i=1}^{k} p_i\, \mathrm{lb}\, p_i$

which, according to Shannon's statistical information theory (Bonchev 1983, and references therein), can be viewed as a measure of the mean quantity of information contained in each structure element (in bits per element).

The partition P, the probabilities pi and the mean quantity of information H form the pattern of calculation for all the information-theoretic descriptors.

Information of atomic composition index (IAC-mean, IAC-total)

The atoms in the molecule are partitioned into equivalence classes corresponding to their atomic numbers. The partition then yields the descriptor IAC-Mean as the mean quantity of information H as defined above.

The descriptor IAC-Total is defined as N × IAC-Mean, where N is the number of atoms in the molecule.
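The calculation pattern (partition, probabilities, mean information) can be sketched as follows. The names `mean_information` and `iac` are hypothetical, and the example assumes that hydrogens are counted among the atoms; whether the program includes them is not stated here.

```python
from collections import Counter
from math import log2


def mean_information(class_sizes):
    """Mean information content, in bits per element, of a partition N(N1, ..., Nk)."""
    n = sum(class_sizes)
    return sum(-(size / n) * log2(size / n) for size in class_sizes if size > 0)


def iac(atomic_numbers):
    """Return (IAC-Mean, IAC-Total) for a molecule given as a list of atomic numbers."""
    sizes = list(Counter(atomic_numbers).values())  # one class per element
    mean = mean_information(sizes)
    return mean, len(atomic_numbers) * mean


# Fluoromethane, CH3F, with hydrogens included: partition 5 (1, 3, 1).
iac_mean, iac_total = iac([6, 1, 1, 1, 9])
```

A molecule made of a single element gives zero information content, and the value grows as the atoms are spread more evenly over more elements.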

Information indices based on the A-matrix

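The two information indices in this category are:

Vertex adjacency/equality
Vertex adjacency/magnitude (V_ADJ_mag)

These indices (and several others below) are based on partitioning elements of the A-matrix according to two basic modes:

Equality: matrix elements with equal values belong to the same equivalence class.
Magnitude: each matrix element aij is treated as an equivalence class of aij elements.

The example below should make this clear.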

Vertex adjacency/equality

The A-matrix consists of zeros and ones, so the partitioning consists of two classes:

P = N² (2M, N² - 2M)

with M equal to the number of edges (thus 2M equals the number of ones in the A-matrix) and N equal to the number of vertices (N² - 2M is the number of zeros in the A-matrix).

Therefore:

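Eq. 53            $H = -\frac{2M}{N^2}\,\mathrm{lb}\,\frac{2M}{N^2} - \frac{N^2 - 2M}{N^2}\,\mathrm{lb}\,\frac{N^2 - 2M}{N^2}$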

Vertex adjacency/magnitude

Each matrix element aij is now treated as an equivalence class of aij elements. In this case, each equivalence class consists of either one or zero elements, so the partition is (discarding the classes of zero elements):

P = 2M( 1, 1, ... , 1 ) (2M ones)

The index V_ADJ_mag is thus rather simple:

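Eq. 54            $\mathrm{V\_ADJ\_mag} = -\sum_{i=1}^{2M} \frac{1}{2M}\,\mathrm{lb}\,\frac{1}{2M} = \mathrm{lb}(2M) = 1 + \mathrm{lb}\,M$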
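The two partitions of the A-matrix can be checked numerically. The sketch below builds the adjacency matrix of the hydrogen-suppressed graph of n-butane (a path of four vertices) and applies the mean-information formula to each partition. All function names are hypothetical, and the sketch assumes that each index is the mean information H of its partition.

```python
import math

def mean_information(class_sizes):
    """H = -sum(p_i * lb(p_i)) for a partition given by its class sizes."""
    total = sum(class_sizes)
    return -sum((n / total) * math.log2(n / total) for n in class_sizes if n > 0)

def adjacency_matrix(n_vertices, edges):
    """Symmetric 0/1 matrix for an undirected graph."""
    a = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a

def vertex_adjacency_indices(a):
    elements = [x for row in a for x in row]
    ones = sum(elements)               # 2M
    zeros = len(elements) - ones       # N^2 - 2M
    # Equality mode: two classes, the ones and the zeros.
    equality = mean_information([ones, zeros])
    # Magnitude mode: each nonzero element a_ij is a class of a_ij elements.
    magnitude = mean_information([x for x in elements if x > 0])
    return equality, magnitude

butane = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
adj_equ, adj_mag = vertex_adjacency_indices(butane)
```

For this graph N = 4 and M = 3, so the equality partition is 16 (6, 10) and the magnitude partition consists of six one-element classes, giving lb(6) = 1 + lb(3).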

Information indices based on the D-matrix

Two types of indices are based on this matrix:

These descriptors are defined in exactly the same manner as the vertex adjacency indices, except that the distance matrix is used instead of the adjacency matrix.

Information indices based on the E-matrix and the ED-matrix

The indices based on these matrices are:

These are the descriptors based on the edge adjacency and the edge distance matrices, in exact analogy with those given in the section Information indices based on the A-matrix.

Multigraph information content indices (IC, BIC, CIC, SIC)

To each vertex v, an unordered sequence of ordered pairs is assigned:
{ (m1, n1), (m2, n2), ... , (mk, nk) }, called a coordinate, such that:

k = the valence of the vertex (there is one ordered pair (mj, nj) per each neighboring vertex, vj), and for every j = 1, ..., k:

Having assigned the coordinates to vertices, the partition of vertices is constructed in the usual way, where two vertices are considered equivalent if their coordinates are the same (as unordered k-tuples, i.e., the repetitions of ordered pairs are not ignored, as they would be if we treated the k-tuples purely as sets).

The index corresponding directly to this partition is the index IC ("Information Content").

The following indices are normalizations of IC:

The CIC ("Complementary Information Content") measures the deviation of IC from its maximum possible value corresponding to the partition into classes containing one element each:

ICmax = -N × (1/N) × lb(1/N) = lb(N)

and thus the CIC index is defined as:

CIC = lb(N) - IC

(Sarkar et al. 1978, Bonchev et al. 1981, Bonchev 1983, Katritzky et al. 1993.)

Terms


Molecular shape analysis (MSA) descriptors

This table lists the MSA descriptors available in QSAR+:

Symbol Description
DIFFV   Difference volume.  
Fo   Common overlap volume (ratio).  
NCOSV   Non-common overlap steric volume.  
ShapeRMS   Rms to shape reference.  
COSV   Common overlap steric volume.  
SRVol   Volume of shape reference compound.  

Common overlap steric volume (COSV)

The common volume between each individual molecule and the molecule selected as the reference compound. This is a measure of how similar in steric shape the analogs are to the shape reference.

Difference volume (DIFFV)

The difference between the volume of the individual molecule and the volume of the shape reference compound.

Common overlap volume ratio (Fo)

The common overlap steric volume descriptor divided by the volume of the individual molecule.

Non-common overlap steric volume (NCOSV)

The volume of the individual molecule minus the common overlap steric volume.

Rms to shape reference (ShapeRMS)

Root mean square (rms) deviation between the individual molecule and the shape reference compound.

Volume of shape reference (SRVol)

The volume of the shape reference compound.
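Once the three volumes are known, the volume-based MSA descriptors reduce to simple arithmetic. The sketch below shows the relationships, taking NCOSV as the molecule's volume minus COSV (an assumption read from the descriptor's name). The function name and the numerical volumes are hypothetical and chosen only for illustration; ShapeRMS is omitted because it requires a superposition of the two structures.

```python
def msa_volume_descriptors(mol_volume, ref_volume, cosv):
    """Volume-based MSA descriptors from three input volumes.

    mol_volume: volume of the individual molecule
    ref_volume: volume of the shape reference compound
    cosv:       common overlap steric volume of the two
    """
    if mol_volume <= 0 or ref_volume <= 0:
        raise ValueError("volumes must be positive")
    if not 0.0 <= cosv <= min(mol_volume, ref_volume):
        raise ValueError("overlap cannot exceed either volume")
    return {
        "COSV": cosv,
        "DIFFV": mol_volume - ref_volume,   # difference volume
        "Fo": cosv / mol_volume,            # common overlap volume ratio
        "NCOSV": mol_volume - cosv,         # non-common overlap steric volume
        "SRVol": ref_volume,
    }

# Hypothetical volumes, all in the same units.
example = msa_volume_descriptors(mol_volume=250.0, ref_volume=200.0, cosv=180.0)
```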


Spatial descriptors

This table lists the spatial descriptors available in QSAR+:

Symbol Definition
RadOfGyration   Radius of gyration.  
Jurs descriptors   Jurs charged partial surface area descriptors.  
Shadow indices   Surface area projections.  
Area   Molecular surface area.  
Density   Density.  
PMI   Principal moment of inertia.  
Vm   Molecular volume.  

Shadow indices

This set of geometric descriptors helps to characterize the shape of the molecules. The descriptors are calculated by projecting the molecular surface on three mutually perpendicular planes, XY, YZ, and XZ (Rohrbaugh and Jurs 1987). These descriptors depend not only on conformation but also on the orientation of the molecule. To calculate them, the molecules are first rotated to align the principal moments of inertia with the X, Y, and Z axes.

Figure 3

A total of 10 descriptors are calculated in this set:

1.   Area of the molecular shadow in the XY plane (Sxy).

2.   Area of the molecular shadow in the YZ plane (Syz).

3.   Area of the molecular shadow in the XZ plane (Sxz).

4.   Fraction of area of molecular shadow in the XY plane over area of enclosing rectangle (Sxy,f).

5.   Fraction of area of molecular shadow in the YZ plane over area of enclosing rectangle (Syz,f).

6.   Fraction of area of molecular shadow in the XZ plane over area of enclosing rectangle (Sxz,f).

7.   Length of molecule in the X dimension (Lx).

8.   Length of molecule in the Y dimension (Ly).

9.   Length of molecule in the Z dimension (Lz).

10.   Ratio of largest to smallest dimension ().
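The projection idea can be sketched with a simple grid count. The Python below is not the QSAR+ algorithm; it assumes that atoms are represented as van der Waals spheres, that the molecule has already been aligned to its principal axes, and that the length in each dimension is the extent of the shadow's bounding box. The function name `shadow_xy` and the grid step are hypothetical. Only the XY plane is shown; the other two planes follow by permuting the coordinates.

```python
import math

def shadow_xy(atoms, step=0.05):
    """Approximate the XY shadow of a set of spheres.

    atoms: list of (x, y, z, radius), already aligned to the principal axes.
    Returns (area, fraction of enclosing rectangle, length in X, length in Y).
    """
    x_min = min(x - r for x, y, z, r in atoms)
    x_max = max(x + r for x, y, z, r in atoms)
    y_min = min(y - r for x, y, z, r in atoms)
    y_max = max(y + r for x, y, z, r in atoms)
    nx = math.ceil((x_max - x_min) / step)
    ny = math.ceil((y_max - y_min) / step)
    covered = 0
    for ix in range(nx):
        cx = x_min + (ix + 0.5) * step          # cell-center x
        for iy in range(ny):
            cy = y_min + (iy + 0.5) * step      # cell-center y
            # A cell is in the shadow if its center lies under any sphere.
            if any((cx - x) ** 2 + (cy - y) ** 2 <= r * r for x, y, z, r in atoms):
                covered += 1
    area = covered * step * step
    rectangle = (x_max - x_min) * (y_max - y_min)
    return area, area / rectangle, x_max - x_min, y_max - y_min

# A single sphere of radius 1 casts a circular shadow.
s_xy, s_xy_f, l_x, l_y = shadow_xy([(0.0, 0.0, 0.0, 1.0)])
```

For the single-sphere test case the shadow area approaches pi and the fraction of the enclosing square approaches pi/4, which gives a quick sanity check on the grid resolution.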

Jurs descriptors based on partial charges mapped on surface area

This set of descriptors (Stanton and Jurs 1990) combines shape and electronic information to characterize molecules. The descriptors are calculated by mapping atomic partial charges on solvent-accessible surface areas of individual atoms. A total of 30 different descriptors are included in the set:

1.   Partial positive surface area: sum of the solvent-accessible surface areas of all positively charged atoms (PPSA-1).

2.   Partial negative surface area: sum of the solvent-accessible surface areas of all negatively charged atoms (PNSA-1).

3.   Total charge weighted positive surface area: partial positive solvent-accessible surface area multiplied by the total positive charge (PPSA-2).

4.   Total charge weighted negative surface area: partial negative solvent-accessible surface area multiplied by the total negative charge (PNSA-2).

5.   Atomic charge weighted positive surface area: sum of the product of solvent-accessible surface area X partial charge for all positively charged atoms (PPSA-3).

6.   Atomic charge weighted negative surface area: sum of the product of solvent-accessible surface area X partial charge for all negatively charged atoms (PNSA-3).

7.   Difference in charged partial surface areas: partial positive solvent-accessible surface area minus partial negative solvent-accessible surface area (DPSA-1).

8.   Difference in total charge weighted surface areas: total charge weighted positive solvent-accessible surface area minus total charge weighted negative solvent-accessible surface area (DPSA-2).

9.   Difference in atomic charge weighted surface areas: atomic charge weighted positive solvent-accessible surface area minus atomic charge weighted negative solvent-accessible surface area (DPSA-3).

10-15.   Fractional charged partial surface areas: set of six descriptors obtained by dividing descriptors 1 to 6 by the total molecular solvent-accessible surface area (FPSA-1, FPSA-2, FPSA-3, FNSA-1, FNSA-2, FNSA-3).

16-21.   Surface-weighted charged partial surface areas: set of six descriptors obtained by multiplying descriptors 1 to 6 by the total molecular solvent-accessible surface area and dividing by 1000 (WPSA-1, WPSA-2, WPSA-3, WNSA-1, WNSA-2, WNSA-3).

22.   Relative positive charge: charge of most positive atom divided by the total positive charge (RPCG).

23.   Relative negative charge: charge of most negative atom divided by the total negative charge (RNCG).

24.   Relative positive charge surface area: solvent-accessible surface area of the most positive atom divided by descriptor 22 (RPCS).

25.   Relative negative charge surface area: solvent-accessible surface area of most negative atom divided by descriptor 23 (RNCS).

26.   Total hydrophobic surface area: sum of solvent-accessible surface areas of atoms with absolute value of partial charges less than 0.2 (TASA).

27.   Total polar surface area: sum of solvent-accessible surface areas of atoms with absolute value of partial charges greater than or equal to 0.2 (TPSA).

28.   Relative hydrophobic surface area: total hydrophobic surface area divided by the total molecular solvent-accessible surface area (RASA).

29.   Relative polar surface area: total polar surface area divided by the total molecular solvent-accessible surface area (RPSA).

30.   Total molecular solvent-accessible surface area (SASA).
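The definitions in the list above translate directly into code once per-atom partial charges and solvent-accessible surface areas are available. The following Python sketch follows the list literally. It assumes that negative-charge quantities keep their sign (so PNSA-2 and PNSA-3 come out negative) and that the molecule has at least one positively and one negatively charged atom; neither convention is confirmed by this chapter. The function name and the input values are hypothetical.

```python
def jurs_descriptors(charges, areas, polar_cutoff=0.2):
    """Charged partial surface area descriptors from per-atom data.

    charges: atomic partial charges
    areas:   per-atom solvent-accessible surface areas, same atom order
    """
    atoms = list(zip(charges, areas))
    pos = [(q, a) for q, a in atoms if q > 0]
    neg = [(q, a) for q, a in atoms if q < 0]
    sasa = sum(areas)
    q_pos = sum(q for q, _ in pos)   # total positive charge
    q_neg = sum(q for q, _ in neg)   # total negative charge (signed)

    d = {"SASA": sasa}
    d["PPSA-1"] = sum(a for _, a in pos)
    d["PNSA-1"] = sum(a for _, a in neg)
    d["PPSA-2"] = d["PPSA-1"] * q_pos
    d["PNSA-2"] = d["PNSA-1"] * q_neg
    d["PPSA-3"] = sum(q * a for q, a in pos)
    d["PNSA-3"] = sum(q * a for q, a in neg)
    for k in ("1", "2", "3"):
        p, n = d["PPSA-" + k], d["PNSA-" + k]
        d["DPSA-" + k] = p - n
        d["FPSA-" + k], d["FNSA-" + k] = p / sasa, n / sasa
        d["WPSA-" + k], d["WNSA-" + k] = p * sasa / 1000.0, n * sasa / 1000.0

    q_max, a_max = max(pos)   # most positive atom
    q_min, a_min = min(neg)   # most negative atom
    d["RPCG"] = q_max / q_pos
    d["RNCG"] = q_min / q_neg
    d["RPCS"] = a_max / d["RPCG"]
    d["RNCS"] = a_min / d["RNCG"]

    d["TASA"] = sum(a for q, a in atoms if abs(q) < polar_cutoff)
    d["TPSA"] = sum(a for q, a in atoms if abs(q) >= polar_cutoff)
    d["RASA"] = d["TASA"] / sasa
    d["RPSA"] = d["TPSA"] / sasa
    return d

# Hypothetical four-atom example.
jurs = jurs_descriptors([0.3, -0.4, 0.1, 0.0], [10.0, 20.0, 30.0, 40.0])
```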

Molecular surface area (Area)

The molecular surface area descriptor is a 3D spatial descriptor that describes the van der Waals area of a molecule. The molecular surface area determines the extent to which a molecule exposes itself to the external environment. This descriptor is related to binding, transport, and solubility.

Radius of gyration

The radius of gyration is calculated using the following equation:

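Eq. 55            $\mathrm{RadOfGyration} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i^2 + y_i^2 + z_i^2\right)}$

where N is the number of atoms and xi, yi, zi are the coordinates of atom i relative to the center of mass.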
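A minimal Python sketch of this calculation follows. It assumes the common form in which the squared distances of the atoms from the center of mass are averaged without mass weighting and the square root is taken; the masses are used only to locate the center of mass. The function name is hypothetical.

```python
import math

def radius_of_gyration(coords, masses):
    """Radius of gyration from atomic coordinates and masses.

    coords: list of (x, y, z) tuples; masses: atomic masses, same order.
    """
    total_mass = sum(masses)
    # Center of mass, one component at a time.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
           for k in range(3)]
    # Sum of squared distances from the center of mass.
    squared = sum(sum((c[k] - com[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(squared / len(coords))

# Two equal masses placed 2 units apart: each lies 1 unit from the center.
rog = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0])
```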

Density (Density)

A 3D spatial descriptor that is defined as the ratio of molecular weight to molecular volume. It has the units of g ml⁻¹. The density reflects the types of atoms and how tightly they are packed in a molecule. Density can be related to transport and melt behavior.

Principal moment of inertia (PMI)

Calculates the principal moments of inertia about the principal axes of a molecule according to the following rules:

Eq. 56            

For more information about this descriptor, see Hill (1960).

Molecular volume (Vm)

A 3D spatial descriptor that defines the molecular volume inside the contact surface. The molecular volume is calculated as a function of conformation. Molecular volume is related to binding and transport.


Structural descriptors

This table lists the structural descriptors available in QSAR+:

Symbol Description
Chiral   Number of chiral centers (R or S) in a molecule.  
MW   Molecular weight.  
Rotlbonds   Number of rotatable bonds.  
Hbond acceptor   Number of hydrogen-bond acceptors.  
Hbond donor   Number of hydrogen-bond donors.  

Number of rotatable bonds (Rotlbonds)

Counts the number of bonds in the current molecule having rotations that are considered to be meaningful for molecular mechanics. All terminal H atoms are ignored (for example, methyl groups are not considered rotatable).


Thermodynamic descriptors

This table lists the thermodynamic descriptors available in QSAR+:

Symbol Description
AlogP   Log of the partition coefficient.  
AlogP98   Log of the partition coefficient, atom-type value.  
Fh2o   Desolvation free energy for water.  
Foct   Desolvation free energy for octanol.  
Hf   Heat of formation.  
MolRef   Molar refractivity.  

AlogP, AlogP98, and molar refractivity (MolRef)

LogP (the octanol/water partition coefficient) and molar refractivity are molecular descriptors that can be used to relate chemical structure to observed chemical behavior. LogP is related to the hydrophobic character of the molecule. The molecular refractivity index of a substituent is a combined measure of its size and polarizability.

The QSAR+ descriptor AlogP and molar refractivity are calculated using the method described by Ghose and Crippen (1989). In this atom-based approach, each atom of the molecule is assigned to a particular class, and each class makes an additive contribution to the total value of logP and molar refractivity.

For more information, see Leffler and Grunwald (1963).

AlogP98 descriptor

The AlogP98 descriptor is an implementation of the atom-type-based AlogP method using the latest published set of parameters (Ghose et al. 1998).

Desolvation free energy for water (FH2O) and octanol (Foct)

Foct and FH2O are physicochemical properties associated with linear free energy (LFE) models of a molecule. These properties have proven useful as molecular descriptors in structure-activity analyses. All LFE computations are based solely on the connectivity of the atoms in a molecule; they are not conformationally dependent.

Foct is the 1-octanol desolvation free energy and FH2O is the aqueous desolvation free energy, both derived from a hydration shell model developed by Hopfinger. Foct and FH2O are in kcal mol⁻¹.

GROUP FH2O FOCT
1. CH3 (Methyl-ali)   0.800   -0.160  
2. O (Hydroxyl-ali)   -5.820   -3.790  
3. CH2 (Methylene-ali)   0.200   -0.520  
4. CH (Methine-ali)   -0.240   -0.560  
5. C (t-butyl-ali)   -0.720   -0.920  
6. N (Nitro)   11.910   11.840  
7. C= (Vinyl)   -0.330   -0.970  
8. F (ali-single)   0.800   1.430  
9. Cl (ali-single)   -0.940   -1.020  
10. Cl (ali-multi)   0.490   0.190  
11. Br (ali)   -1.140   -1.510  
12. F (ali-multi)   -3.060   -2.230  
13. F (aro)   0.800   0.260  
14. NC=O (Peptide)   -3.580   0.370  
15. N (Amide)   -2.930   0.020  
16. H (Amide)   -3.030   -3.490  
17. H (Vinyl)   0.290   0.290  
18. O (Ether-ali)   -3.970   -1.810  
19. CH (aro)   -0.170   -0.640  
20. C (aro)   -0.900   -1.130  
21. O (Hydroxyl-aro)   -5.450   -4.980  
22. NO2 (aro)   11.920   12.030  
23. O (Ether-aro)   -3.750   -3.160  
24. C (aro-fuse)   -0.650   -0.650  
25. N (Amide-aro)   -2.680   -1.300  
26. H (Amide-aro)   -3.150   -3.220  
27. CH3 (aro)   0.820   -0.140  
28. CH2 (aro)   0.260   -0.460  
29. CH (aro)   -0.300   -0.620  
30. C (t-butyl-aro)   -1.030   -1.230  
31. Cl (aro)   0.750   -0.510  
32. C=O (ali)   -0.650   1.670  
33. CO2- (ali)   -15.470   -8.410  
34. NO2 (ali)   11.920   13.200  
35. C6H3 (aro)   -3.200   -5.150  
36. C6H4 (aro)   -2.480   -4.780  
37. C6H5 (aro)   -1.750   -4.320  
38. C6H6 (aro)   -1.020   -3.830  
39. CO2- (Carboxyl-aro)   -12.510   -6.890  
40. S (ali)   0.400   1.100  
41. SH (ali)   -0.850   -0.850  
42. S (aro)   -0.260   -0.410  
43. SH (aro)   -0.490   -1.330  
44. SO (ali)   -2.840   0.040  
45. SO2 (ali)   -5.700   -2.750  
46. SO3 (ali)   -8.280   -2.170  
47. SO4 (ali)   -11.600   -2.800  
48. SO (aro)   -1.950   0.840  
49. SO2 (aro)   -3.110   -0.570  
50. SO3 (aro)   -5.540   -3.720  
51. -C:::CH (ali)   -1.030   -2.020  
52. Naphthyl   -2.610   -6.500  
53. Cyclohexyl   0.760   -3.200  
54. I (ali)   -0.420   -1.220  
55. CF3 (ali)   -1.920   -2.950  
56. CF3 (aro)   -1.050   -2.470  
57. CH=O (ali)   -2.490   -3.390  
58. CH=O (aro)   -1.850   -1.280  
59. COOH (ali)   -7.310   -5.800  
60. COOH (aro)   -5.450   -5.400  
61. NH2 (ali)   -2.500   -0.400  
62. NH2 (aro)   -1.880   -0.520  
63. -C...N (ali)   -4.150   -2.420  
64. -C...N (aro)   -2.550   -2.090  
65. -N= (Pyridine)   -1.820   -0.330  
66. CCl3 (ali)   -2.150   -4.580  
67. CCl3 (aro)   -1.490   -5.410  
68. C= (aro-C in ring)   -0.650   0.150  
69. O=C-NH (aro-CN in ring)   -6.610   -3.890  
70. -N= (aro-ring)   -2.680   -1.160  
71. -NH- (aro-N in ring)   5.780   4.820  
72. -NO- (aro-N in ring)   -8.170   -3.460  
73. N (aro-triv N in ring)   -2.680   -1.970  
74. -O- (aro-ring)   -3.970   -3.860  
75. -S- (aro-ring)   -0.390   -0.880  
76. -S=O (aro-div S in ring)   -1.650   1.170  
77. C=O (aro)   -0.490   1.630  
78. -C:::CH (aro)   -1.020   -1.820  
79. I (aro)   -0.290   -2.190  

QSAR+ calculates FH2O and Foct for each molecule by searching the molecule for recognizable substituent groups and their bonding patterns and then summing the substituent-constant contributions of every group that is present in the molecule.

For more information, see Hopfinger (1973; 1980) and Pearlman (1980).
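The summation step can be illustrated with a few constants copied from the table above. The sketch does not perform group perception, which is the difficult part of the real calculation; instead the caller supplies the group counts. The decomposition of ethanol into one aliphatic methyl, one aliphatic methylene, and one aliphatic hydroxyl group is an assumption made for illustration, and the function name is hypothetical.

```python
# (FH2O, Foct) in kcal/mol, copied from table rows 1, 2, 3, and 37.
GROUP_CONSTANTS = {
    "CH3 (Methyl-ali)": (0.800, -0.160),
    "O (Hydroxyl-ali)": (-5.820, -3.790),
    "CH2 (Methylene-ali)": (0.200, -0.520),
    "C6H5 (aro)": (-1.750, -4.320),
}

def desolvation_free_energies(group_counts):
    """Sum group contributions; group_counts maps group name to count."""
    fh2o = sum(GROUP_CONSTANTS[g][0] * n for g, n in group_counts.items())
    foct = sum(GROUP_CONSTANTS[g][1] * n for g, n in group_counts.items())
    return fh2o, foct

# Assumed decomposition of ethanol (not produced by QSAR+ itself).
ethanol_groups = {
    "CH3 (Methyl-ali)": 1,
    "CH2 (Methylene-ali)": 1,
    "O (Hydroxyl-ali)": 1,
}
fh2o, foct = desolvation_free_energies(ethanol_groups)
```

Under that assumed decomposition the sums are 0.800 + 0.200 - 5.820 for FH2O and -0.160 - 0.520 - 3.790 for Foct. The values that QSAR+ reports may differ if it recognizes the groups differently.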

Heat of formation (Hf)

The enthalpy for forming a molecule from its constituent atoms, a measure of the relative thermal stability of a molecule. This descriptor is calculated using the MNDO semi-empirical molecular orbital method of Dewar. MNDO is the most rigorous quantum-chemical technique available in QSAR+ and has a wide range of applicability in conformational analysis, intermolecular modeling, and chemical reaction modeling. The atom limit of MNDO is 300 atoms or 300 atomic orbitals (whichever is less) per molecule. The atoms treated by MNDO are: H, B, C, N, O, F, Al, Si, P, S, and Cl.

For more information, see Dewar and Thiele (1977a; 1977b).


pKa descriptors (ACD Labs)

pKa values are calculated and the results displayed in the study table according to user-defined rules.

The pKa program, available separately from Advanced Chemistry Development (ACD), is needed for use of this descriptor. You can contact ACD through their website at www.acdlabs.com.


Molecular field analysis (MFA) descriptors

Molecular field analysis (MFA) evaluates the energy between a probe and a molecular model at a series of points defined by a rectangular or spherical grid. These energies may be added to the study table to form new columns headed according to the probe type. The new columns may be used as independent X variables in the generation of QSARs. For more information about working with MFA descriptors, see Chapter 10, Performing molecular field analysis.


Receptor surface analysis (RSA) descriptors

For a theoretical description of receptor surface models, please see Cerius2 Hypothesis and Receptor Models, which touches briefly on functionality in the Receptor module called receptor surface analysis (RSA).

If you have used Receptor, you may already be familiar with the idea of using the energy of interaction between a drug model and a receptor surface model to calculate a QSAR. For an example of this, run the demonstration log file Cerius2-Resources/EXAMPLES/demos/DDW_receptordemo2.log. The energies of interaction between the receptor surface model and each molecular model are added to the study table as new columns, which you can use for generating QSARs. These energies may be added to the study table with the Receptor_energies descriptor.

An additional descriptor, Receptor_RSA, allows you to add the energy of interaction between each point on the receptor surface and each model to the study table and use these surface point energies to calculate a QSAR. Instead of one total number that is the sum of the interactions evaluated between each point on the surface and each molecular model, leading to one extra column in the study table, you now have available the energies at each surface point.

Depending on the size of the drug molecules, this is potentially a great number of surface points. Filtering methods are available to reduce the input to the study table, based on the variance of the energies at any point, correlation of the energies with activity data, or simply adding every nth point.
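The three filtering ideas mentioned above can be sketched as follows. This is an illustration of the concepts only, not the filtering code used by QSAR+; the function names, the threshold parameters, and the example energies are all hypothetical.

```python
import math

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0   # a constant column cannot correlate with anything
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def filter_surface_points(energies, activities=None, min_variance=None,
                          min_abs_r=None, every_nth=None):
    """Return indices of the surface points to keep.

    energies[p] is the list of interaction energies at surface point p,
    one value per molecule; activities holds one value per molecule.
    """
    keep = list(range(len(energies)))
    if every_nth is not None:
        keep = keep[::every_nth]
    if min_variance is not None:
        keep = [p for p in keep if variance(energies[p]) >= min_variance]
    if min_abs_r is not None:
        keep = [p for p in keep
                if abs(pearson_r(energies[p], activities)) >= min_abs_r]
    return keep

# Hypothetical data: four surface points, four molecules.
energies = [
    [1.0, 1.0, 1.0, 1.0],   # constant: carries no information
    [0.0, 1.0, 2.0, 3.0],   # varies and tracks activity
    [3.0, 0.0, 3.0, 0.0],   # varies but tracks activity poorly
    [2.0, 2.1, 1.9, 2.0],   # nearly constant
]
activities = [1.0, 2.0, 3.0, 4.0]
by_variance = filter_surface_points(energies, min_variance=1.0)
by_both = filter_surface_points(energies, activities, min_variance=1.0, min_abs_r=0.9)
by_stride = filter_surface_points(energies, every_nth=2)
```

Each retained index would correspond to one column added to the study table.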

The technique resembles CoMFA but, instead of a rectangular grid, the points considered are taken from the receptor surface. Therefore they are probably more chemically relevant than a rectangular grid, because they exist on a surface that is shaped like a molecule, and even better, a surface constructed from a subset of active molecules.

After adding the receptor surface point energies to the study table, you may calculate a QSAR using the receptor surface energies and biological activities. Early tests indicate that if the genetic function algorithm (GFA) method is used, nonlinear terms must be included.


ADME descriptors

Intestinal Absorption Model

Approximately 200 well-absorbed molecules, of which 181 were drugs or drug-like, were used to develop a pattern recognition model for predicting passive intestinal absorption. The model was developed using robust outlier detection methods to identify and remove actively transported molecules. The descriptor space was chosen to be AlogP98 and the van der Waals polar surface area (PSA) for nitrogen and oxygen atoms and their attached hydrogen atoms. The multivariate distance (T²) from the center of PSA-AlogP98 space was computed, and cutoffs were used to classify test set molecules. Fast PSA (FPSA) was used to rapidly and accurately approximate static PSA (R² = 0.996, RMSE = 5.5 Å²). Results demonstrated that 91.5% of highly Caco-2 permeable molecules are classified as well absorbed or moderately absorbed.

BBB Penetration Model Features

Two models were developed:

1.   A robust regression (least median of squares) to predict logBB values, based on over 120 compounds.

2.   A BBB confidence ellipse (derived from over 800 compounds classified as CNS therapeutics) after robust outlier detection.

The regression model has an R² = 0.889 and RMSE = 0.31. Molecules outside the BBB confidence ellipse are predicted to have extremely poor logBB, so no numerical prediction is given via regression. The model has excellent performance separating the BBB penetrant compounds from the non-BBB penetrant compounds in the PDR dataset.

Water Solubility Model Features

The experimental solubility values of 784 compounds were used to develop the model using genetic partial least squares regression. The model has an R² = 0.84 and a test set RMSE = 0.87. The model predicts the aqueous solubility of a compound at 25 °C and reports the predicted solubility and its solubility ranking relative to the solubilities of other drug molecules. The model has an RMSE = 1.0 for a large, diverse test set of 1615 compounds, including molecules from the PDR and the CMC.



Last updated June 13, 2001 at 03:27PM Pacific Daylight Time.
Copyright © 2001, Accelrys. All rights reserved.