MTK++ Latest version: 0.2.0

Functional Group Recognition

The algorithm used is in close agreement with that published by Ullman, (J. ACM 1976, 23, 31-42) and Willett, Wilson, and Reddaway, (J. Chem. Inf. Comput. Sci. 1991, 31, 225-233).

To functionalize a molecule involves searching it for chemical substructures.

Substructures searching is known as the subgraph isomorphism problem of graph theory and belongs to the class of NP-complete computational problems.

Due to the NP-complete nature of substructure searching usually a screen is carried out to eliminate subgraphs that cannot be contained in the molecule.

My Motivations to Write this Code:

Brute Force Subgraph Isomorphism

Generating the adjacency matrices A and B for the fragment and the molecule containing $P_A$ and $P_B$ atoms respectively.

An exhautive search involves generating $ P_B!/[P_A!(P_B-P_A)!] $ combinations of $P_A$.

Ullmann first noticed that using a depth-first backtracking search dramatically increases efficiency.

Using a labeled graph and a non-binary connection table increases algorithm speed.

Ullman Subgraph Isomorghism

The following example was taken from Molecular Modelling, Principles and Applications 2nd Edition by Andrew R. Leach.

\[A = \left(\begin{array}{cccccc} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right) \]

\[ A(AM)^T = \left(\begin{array}{cccccc} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right) \left(\begin{array}{cccc} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right) = \left(\begin{array}{cccc} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}\right) = F \]

This depth-first backtracking algorithm uses a General match matrix, M that contains all the possible equivalences between atoms from A and B. The elements of this matrix, $m_{ij}(1 \leq i \leq Pa; 1 \leq j \leq Pb) $ are such that:

\[ \def\tempa{if the $i$th atom of $A$ can be mapped to the $j$th atom of $B$} \def\tempb{otherwise} m_{ij} = \left\{\begin{array}{lp} 1 & \tempa \\ 0 & \tempb \end{array}\right \]

"The Ullmann heuristic states that if a fragment atom $a_i$ has a neighbor $x$, and a molecule atom $b_j$ can be mapped to $a_i$ , then there must exist a neighbor of $b_j$, $y$, that can be mapped to $x$":

\[ m_{ij} = \forall \quad x(1 ...P_A)[(a_{ix} = 1) \quad \Rightarrow \quad \exists \quad y(1 ... P_B)(m_{xy}b_{jy} = 1)] \]

if at any state during the search an atom $i$ in $A$ such that $m_{ij} = 0$ for all atoms in $B$ then a mismatch is identified.

\[ mismatch = \exists \quad i(1 ...P_A)[(m_{ij} = 0 \quad \forall \quad j(1 ... P_B)] \]

     int   Pa;        // Number of atoms in the query pattern
     int   Pb;        // Number of atoms in the database structure
     array A[Pa,Pa];  // adjacency matrix of A
     array B[Pb,Pb];  // adjacency matrix of B
     array M[Pa, Pb]; // general match matrix
     bool isomorphism = false;

     void ullman(int d, array M) {
       array M1 = M;
       bool mismatch;

       for (all unique mappings of atom d) {
         choose new unique mapping for query atom d

         update M accordingly

         refine(M, mismatch);

         if (!mismatch) {
           if (d == Pa) {
             isomorphism = true;
             store M;
           }
           else {
             ullman(d+1, M);
           }
         }
         else {
           M = M1;
         }
       }
     }

     void refine(array M, bool mismatch) {
       mismatch = false;
       bool change = false;
       while (!change || mismatch) {
         for (int i = 0; i < Pa; i++) {
           for (int j = 0; j < Pb; j++) {
             if (M[i][j]) {
               assign mij;
               change = change or (!mij);
             }
           }
         }
         assign mismatch;
       }
     }
     int main() {
       ullman(1, M);
     }

The Functionalize Algorithm

Generated on Fri Dec 23 2011 09:28:54 for MTK++ by Doxygen 1.7.5.1