2.18. Fragment Specification

Atoms in a calculation can be grouped into fragments, which serve multiple purposes. Fragment definitions can be used to assign Basis Sets and ECPs, organize output in the population analysis section, and enable features such as fragment-constrained optimization and Rigid Body Optimization. They are also used in local energy decomposition and multi-level calculations.

2.18.1. Fragments defined on Input File

There are three ways to assign atoms to fragments using the input file. The first method is to assign a specific atom to a specific fragment by placing (n) directly after the atomic symbol in the coordinates section.

*xyz -2 2
 Cu(1)  0.00  0.00  0.00
 Cl(2)  2.25  0.00  0.00
 Cl(2) -2.25  0.00  0.00
 Cl(2)  0.00  2.25  0.00
 Cl(2)  0.00 -2.25  0.00
*

In this example the fragment feature is used to divide the molecule into a “metal” and a “ligand” fragment and consequently the program will print the metal and ligand charges and populations.

----------------------------------------------
CARTESIAN COORDINATES OF FRAGMENTS (ANGSTROEM)
----------------------------------------------

 FRAGMENT 1
  Cu    0.000000    0.000000    0.000000

 FRAGMENT 2
  Cl    2.250000    0.000000    0.000000
  Cl   -2.250000    0.000000    0.000000
  Cl    0.000000    2.250000    0.000000
  Cl    0.000000   -2.250000    0.000000

...

----------------------------------------------
MULLIKEN FRAGMENT CHARGES AND SPIN POPULATIONS
----------------------------------------------
 Fragment   0 :   0.752589     0.842580
 Fragment   1 :  -2.752589     0.157420
Sum of fragment charges         :   -2.0000000
Sum of fragment spin populations:    1.0000000

...

--------------------------------------------
LOEWDIN FRAGMENT CHARGES AND SPIN POPULATONS
--------------------------------------------
 Fragment   0 :   0.222028     0.851552
 Fragment   1 :  -2.222028     0.148448

Alternatively, the %coords block can be used for fragment definitions in the same way—by placing (n) directly after the atomic symbol.

%coords
 CTyp   xyz  # the type of coordinates xyz or internal
 Charge -2   # the total charge of the molecule
 Mult   2    # the multiplicity = 2S+1
 coords
    Cu(1)  0.00  0.00  0.00
    Cl(2)  2.25  0.00  0.00
    Cl(2) -2.25  0.00  0.00
    Cl(2)  0.00  2.25  0.00
    Cl(2)  0.00 -2.25  0.00
 end
end

Important

  • In cases where all atoms are explicitly assigned to fragments, the fragment numbering must start at 1 and use consecutive integers. Non-consecutive or incorrect numbering may lead to errors.

  • If any atom is left unassigned (or explicitly assigned to fragment 0), it will be automatically assigned to a fragment using the procedure described in the Automatic Fragmentation section. In such cases, where only a subset of atoms is manually assigned to fragments, it is not necessary to use consecutive fragment numbers. However, the highest fragment number specified in the input must be less than the total number of fragments generated by the combination of manual and automatic procedures. If this condition is not met, ORCA will automatically reorder all fragment numbers in ascending order, starting from 1.
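The numbering rule for fully assigned inputs can be checked before a job is submitted. The following Python sketch is not part of ORCA: `split_label` and `numbering_is_consecutive` are hypothetical helper names, and the script only mirrors the rule as stated above (atoms without a suffix are treated like fragment 0).

```python
import re

# Element symbol with an optional fragment number, e.g. "Cu(1)" or "Cl".
LABEL = re.compile(r"^([A-Za-z]{1,2})(?:\((\d+)\))?$")

def split_label(label):
    """Return (element, fragment); fragment is 0 when no (n) suffix is given."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized atom label: {label!r}")
    element, fragment = match.groups()
    return element, (int(fragment) if fragment else 0)

def numbering_is_consecutive(fragments):
    """True if every atom is assigned and the fragment IDs are exactly 1..N."""
    used = set(fragments)
    return 0 not in used and used == set(range(1, len(used) + 1))

labels = ["Cu(1)", "Cl(2)", "Cl(2)", "Cl(2)", "Cl(2)"]
fragments = [split_label(label)[1] for label in labels]
print(fragments, numbering_is_consecutive(fragments))
```

A list such as `[1, 3, 3]` fails the check, which corresponds to the non-consecutive numbering that the first bullet warns about.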

Finally, a third way to define fragments consists of using a Definition inside the %frag block. In this scheme, the fragment number comes first, followed by a list of atoms (enumerated starting from 0) enclosed in curly brackets {} and finishing with end. Consecutive atoms can also be specified using the notation initial_atom:final_atom.

*xyz -2 2
 Cu  0.00  0.00  0.00
 Cl  2.25  0.00  0.00
 Cl -2.25  0.00  0.00
 Cl  0.00  2.25  0.00
 Cl  0.00 -2.25  0.00
*

%frag
 Definition
  1 {0} end    # atom 0 for fragment 1
  2 {1:4} end  # atoms 1 to 4 for fragment 2
 end
end

*xyz -2 2
 Cl  2.25  0.00  0.00
 Cl -2.25  0.00  0.00
 Cu  0.00  0.00  0.00
 Cl  0.00  2.25  0.00
 Cl  0.00 -2.25  0.00
*

%frag
 Definition
  1 {2} end        # atom 2 for fragment 1
  2 {0:1 3:4} end  # atoms 0, 1, 3, and 4 for fragment 2
 end
end

Note

  • With the last option (Definition) the %frag block has to be written after the coordinate section.

  • %frag Definition also works with coordinates that are defined via an external file.
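The atom-list notation used inside the curly brackets can be illustrated with a short Python sketch. This is not ORCA code; `expand_atom_list` is a hypothetical helper that assumes whitespace-separated tokens and inclusive `initial_atom:final_atom` ranges, as in the examples above.

```python
def expand_atom_list(spec):
    """Expand e.g. "{0:1 3:4}" into [0, 1, 3, 4] (zero-based atom indices)."""
    atoms = []
    for token in spec.strip().strip("{}").split():
        if ":" in token:
            first, last = (int(part) for part in token.split(":"))
            atoms.extend(range(first, last + 1))  # both ends are included
        else:
            atoms.append(int(token))
    return atoms

print(expand_atom_list("{0}"))
print(expand_atom_list("{1:4}"))
print(expand_atom_list("{0:1 3:4}"))
```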

2.18.2. Automatic Fragmentation

Starting with ORCA 6.1, a set of automatic fragmentation algorithms has been introduced to recognize and group atoms into fragments automatically.

The automatic fragmentation procedure is triggered by including a %frag block in the input file or when only a subset of atoms has been manually assigned to fragments.

Automatic fragmentation is performed using a set of procedures that can be selected by the user within the %frag block. Each procedure attempts to identify new fragments among atoms that have not yet been assigned. For example, consider the case below, where the first 11 atoms belong to a propane molecule and the last three belong to a water molecule. The Water procedure in FragProc identifies the last three atoms as a water molecule and assigns them to the first fragment, while the FunctionalGroups procedure detects the CH3 and CH2 groups of propane and assigns them to fragments 2 to 4.

%frag
   FragProc Water, FunctionalGroups
end

* xyz 0 1
 C     0.000000     1.270000    -0.260000
 C     0.000000    -0.000000     0.580000
 C     0.000000    -1.270000    -0.260000
 H    -0.890000     1.320000    -0.910000
 H     0.890000     1.320000    -0.910000
 H     0.000000     2.180000     0.370000
 H     0.880000    -0.000000     1.250000
 H    -0.880000    -0.000000     1.250000
 H     0.890000    -1.320000    -0.910000
 H     0.000000    -2.180000     0.370000
 H    -0.880000    -1.300000    -0.910000
 H    -0.920000     0.850000    -2.430000
 O    -1.690000     0.830000    -3.000000
 H    -1.640000     1.630000    -3.510000
*

  ----------------------------------------------
  CARTESIAN COORDINATES OF FRAGMENTS (ANGSTROEM)
  ----------------------------------------------
  
   FRAGMENT 1
    H    -0.920000    0.850000   -2.430000
    O    -1.690000    0.830000   -3.000000
    H    -1.640000    1.630000   -3.510000
  
   FRAGMENT 2
    C     0.000000    1.270000   -0.260000
    H    -0.890000    1.320000   -0.910000
    H     0.890000    1.320000   -0.910000
    H     0.000000    2.180000    0.370000
  
   FRAGMENT 3
    C     0.000000   -1.270000   -0.260000
    H     0.890000   -1.320000   -0.910000
    H     0.000000   -2.180000    0.370000
    H    -0.880000   -1.300000   -0.910000
  
   FRAGMENT 4
    C     0.000000   -0.000000    0.580000
    H     0.880000   -0.000000    1.250000
    H    -0.880000   -0.000000    1.250000

This automatic fragmentation yields the same result as the manual fragment definition shown below, without the need to inspect the geometry and assign fragments manually.

%frag
 Definition
 1 { 11:13} end  # water
 2 { 0 3:5} end  # CH3
 3 { 2 8:10} end # CH3
 4 { 1 6:7} end  # CH2
 end
end
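For larger systems, a manual Definition block like the one above is tedious to type. The Python sketch below generates such a block from a mapping of fragment numbers to zero-based atom indices. The helper names `compress` and `frag_definition` are hypothetical, and the layout simply imitates the examples in this section.

```python
def compress(indices):
    """Turn atom indices into tokens such as "0" or "3:5" (inclusive ranges)."""
    tokens, start, prev = [], None, None
    for index in sorted(indices):
        if start is None:
            start = prev = index
        elif index == prev + 1:
            prev = index
        else:
            tokens.append(str(start) if start == prev else f"{start}:{prev}")
            start = prev = index
    if start is not None:
        tokens.append(str(start) if start == prev else f"{start}:{prev}")
    return " ".join(tokens)

def frag_definition(fragments):
    """Build a %frag Definition block from {fragment number: atom indices}."""
    lines = ["%frag", " Definition"]
    for number in sorted(fragments):
        lines.append(f"  {number} {{{compress(fragments[number])}}} end")
    lines += [" end", "end"]
    return "\n".join(lines)

# Water, CH3, CH3 and CH2 of the propane/water example.
fragments = {1: [11, 12, 13], 2: [0, 3, 4, 5], 3: [2, 8, 9, 10], 4: [1, 6, 7]}
print(frag_definition(fragments))
```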

Note

  • Any fragment defined in the input file takes precedence over automatic assignments.

  • A FragProc list may contain up to 10 procedures. The available procedures are listed in Table 2.60 and Table 2.61.

  • Constrained Fragments do not automatically enable Automatic Fragmentation when some atoms are not assigned to fragments. However, automatic fragmentation can be activated by including a %frag block in the input file.

ORCA provides over 30 FragProc methods, which can be combined in a list to generate fragments for various purposes. Below are explanations and examples of the different procedures.

2.18.2.1. Automatic Fragmentation: Connectivity

FragProc Connectivity groups atoms that are connected by chemical bonds, estimated based on atomic radii. In the example below, the first nine atoms belong to a dimethyl ether molecule, which is automatically detected and assigned to one fragment, while the last three atoms—belonging to a water molecule—are assigned to a second fragment.

%frag
  PrintLevel 3
  FragProc Connectivity
end

*xyz 0 1
 O     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 C     1.300000     0.000000    -0.460000
 H    -0.500000     0.870000     1.740000
 H    -0.500000    -0.870000     1.740000
 H     1.000000     0.000000     1.740000
 H     1.300000     0.000000    -1.530000
 H     1.800000     0.870000    -0.100000
 H     1.800000    -0.870000    -0.100000
 H    -1.840000     0.000000    -0.650000
 O    -2.440000     0.000000     0.080000
 H    -3.300000     0.000000    -0.300000
*

In this case, the output of the automatic fragmentation tool indicates that two fragments were assigned by the Connectivity procedure.

------------------------------------------
 Assigned Fragment: 1 
------------------------------------------
  Name: Connectivity:0 Method: Connectivity 
  Natoms: 9 Charge: 0 Mult: 1 

 0 O     0.000000    0.000000    0.000000 
 1 C     0.000000    0.000000    1.380000 
 2 C     1.300000    0.000000   -0.460000 
 3 H    -0.500000    0.870000    1.740000 
 4 H    -0.500000   -0.870000    1.740000 
 5 H     1.000000    0.000000    1.740000 
 6 H     1.300000    0.000000   -1.530000 
 7 H     1.800000    0.870000   -0.100000 
 8 H     1.800000   -0.870000   -0.100000 
------------------------------------------

------------------------------------------
 Assigned Fragment: 2 
------------------------------------------
  Name: Connectivity:1 Method: Connectivity 
  Natoms: 3 Charge: 0 Mult: 1 

 9 H    -1.840000    0.000000   -0.650000 
 10 O    -2.440000    0.000000    0.080000 
 11 H    -3.300000    0.000000   -0.300000 
------------------------------------------

Each fragmentation procedure applies only to atoms that have not already been assigned to a fragment. Consequently, connectivity-based fragmentation will ignore any bonds to atoms that have been assigned in the input file or by a previous FragProc.

For example, if the oxygen atom of dimethyl ether in the previous example is manually assigned to fragment 1 in the input file, this manual assignment takes precedence over FragProc Connectivity. As a result, four fragments are created: one fragment containing the manually assigned oxygen atom, two CH3 groups, and one water molecule.

%frag
  PrintLevel 3
  FragProc Connectivity
end


*xyz 0 1
 O(1)  0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 C     1.300000     0.000000    -0.460000
 H    -0.500000     0.870000     1.740000
 H    -0.500000    -0.870000     1.740000
 H     1.000000     0.000000     1.740000
 H     1.300000     0.000000    -1.530000
 H     1.800000     0.870000    -0.100000
 H     1.800000    -0.870000    -0.100000
 H    -1.840000     0.000000    -0.650000
 O    -2.440000     0.000000     0.080000
 H    -3.300000     0.000000    -0.300000
*

The output of the automatic fragmentation tool indicates that the first fragment was assigned by the Orca_input procedure, while the remaining three fragments were assigned by the Connectivity procedure.

===================================================
Tfragmentator: Fragmenting by Orca_input
===================================================

------------------------------------------
 Assigned Fragment: 1
------------------------------------------
  Name: User-defined:0 Method: Orca_input
  Natoms: 1 Charge: 0 Mult: 1 InputId: 1

 0 O     0.000000    0.000000    0.000000
------------------------------------------

===================================================
Tfragmentator: Fragmenting by Connectivity
===================================================

------------------------------------------
 Assigned Fragment: 2 
------------------------------------------ 
  Name: Connectivity:1 Method: Connectivity
  Natoms: 4 Charge: 0 Mult: 1 

 1 C     0.000000    0.000000    1.380000 
 3 H    -0.500000    0.870000    1.740000 
 4 H    -0.500000   -0.870000    1.740000 
 5 H     1.000000    0.000000    1.740000 
------------------------------------------

------------------------------------------
 Assigned Fragment: 3 
------------------------------------------
  Name: Connectivity:2 Method: Connectivity 
  Natoms: 4 Charge: 0 Mult: 1 

 2 C     1.300000    0.000000   -0.460000 
 6 H     1.300000    0.000000   -1.530000 
 7 H     1.800000    0.870000   -0.100000 
 8 H     1.800000   -0.870000   -0.100000
------------------------------------------

------------------------------------------
 Assigned Fragment: 4
------------------------------------------
  Name: Connectivity:3 Method: Connectivity
  Natoms: 3 Charge: 0 Mult: 1

 9 H    -1.840000    0.000000   -0.650000
 10 O    -2.440000    0.000000    0.080000
 11 H    -3.300000    0.000000   -0.300000
------------------------------------------
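The behaviour shown in the two examples above can be reproduced with a simple distance criterion. The Python sketch below is only an illustration and not ORCA's implementation: the covalent radii, the tolerance factor `SCALE`, and the function names are assumptions made for this example.

```python
import math

# Covalent radii in Angstrom (rounded literature values); assumed here,
# not taken from ORCA.
RADII = {"H": 0.31, "C": 0.76, "O": 0.66}
SCALE = 1.3  # assumed tolerance factor applied to the sum of radii

def bonded(a, b):
    """Two atoms count as bonded if closer than SCALE * (r_a + r_b)."""
    return math.dist(a[1:], b[1:]) < SCALE * (RADII[a[0]] + RADII[b[0]])

def connectivity_fragments(atoms, preassigned=()):
    """Group atoms that are not preassigned into connected components."""
    free = [i for i in range(len(atoms)) if i not in preassigned]
    seen, fragments = set(), []
    for start in free:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in free:  # bonds to preassigned atoms are never followed
                if j not in seen and bonded(atoms[i], atoms[j]):
                    seen.add(j)
                    stack.append(j)
        fragments.append(sorted(component))
    return fragments

atoms = [
    ("O", 0.00, 0.00, 0.00), ("C", 0.00, 0.00, 1.38), ("C", 1.30, 0.00, -0.46),
    ("H", -0.50, 0.87, 1.74), ("H", -0.50, -0.87, 1.74), ("H", 1.00, 0.00, 1.74),
    ("H", 1.30, 0.00, -1.53), ("H", 1.80, 0.87, -0.10), ("H", 1.80, -0.87, -0.10),
    ("H", -1.84, 0.00, -0.65), ("O", -2.44, 0.00, 0.08), ("H", -3.30, 0.00, -0.30),
]
print(connectivity_fragments(atoms))                   # ether and water
print(connectivity_fragments(atoms, preassigned={0}))  # oxygen assigned by hand
```

With the ether oxygen excluded, the two methyl groups are no longer linked, so the sketch returns them as separate components together with the water molecule.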

2.18.2.2. Automatic Fragmentation: Atomic and NotAssigned

FragProc Atomic and FragProc NotAssigned are termination procedures. FragProc Atomic assigns each previously unassigned atom to its own individual fragment, while FragProc NotAssigned assigns all remaining unassigned atoms to a single fragment.

Similar to other FragProc methods, the output of the automatic fragmentation tool indicates that the Atomic or Not_assigned procedure has been used to generate the corresponding fragments.

%frag
  PrintLevel 3
  FragProc Atomic
end

*xyz 0 1
 O(1)  0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 C     1.300000     0.000000    -0.460000
 H    -0.500000     0.870000     1.740000
 H    -0.500000    -0.870000     1.740000
 H     1.000000     0.000000     1.740000
 H     1.300000     0.000000    -1.530000
 H     1.800000     0.870000    -0.100000
 H     1.800000    -0.870000    -0.100000
 H    -1.840000     0.000000    -0.650000
 O    -2.440000     0.000000     0.080000
 H    -3.300000     0.000000    -0.300000
*

===================================================
Tfragmentator: Fragmenting by Orca_input
===================================================

------------------------------------------
 Assigned Fragment: 1
------------------------------------------
  Name: User-defined:0 Method: Orca_input
  Natoms: 1 Charge: 0 Mult: 1 InputId: 1

 0 O     0.000000    0.000000    0.000000
------------------------------------------

===================================================
Tfragmentator: Fragmenting by Atomic
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 2
------------------------------------------
  Name: C  Method: Atomic
  Natoms: 1 Charge: 0 Mult: 1

 1 C     0.000000    0.000000    1.380000
------------------------------------------

------------------------------------------
 Match: 2, Assigned Fragment: 3
------------------------------------------
  Name: C  Method: Atomic
  Natoms: 1 Charge: 0 Mult: 1

 2 C     1.300000    0.000000   -0.460000
------------------------------------------

...

------------------------------------------
 Match: 11, Assigned Fragment: 12
------------------------------------------
  Name: H  Method: Atomic
  Natoms: 1 Charge: 0 Mult: 1

 11 H    -3.300000    0.000000   -0.300000
------------------------------------------

%frag
  PrintLevel 3
  FragProc NotAssigned
end


*xyz 0 1
 O(1)  0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 C     1.300000     0.000000    -0.460000
 H    -0.500000     0.870000     1.740000
 H    -0.500000    -0.870000     1.740000
 H     1.000000     0.000000     1.740000
 H     1.300000     0.000000    -1.530000
 H     1.800000     0.870000    -0.100000
 H     1.800000    -0.870000    -0.100000
 H    -1.840000     0.000000    -0.650000
 O    -2.440000     0.000000     0.080000
 H    -3.300000     0.000000    -0.300000
*

===================================================
Tfragmentator: Fragmenting by Orca_input
===================================================

------------------------------------------
 Assigned Fragment: 1
------------------------------------------ 
  Name: User-defined:0 Method: Orca_input 
  Natoms: 1 Charge: 0 Mult: 1 InputId: 1

 0 O     0.000000    0.000000    0.000000
------------------------------------------

===================================================
Tfragmentator: Setting not assigned atoms to a fragment
===================================================

------------------------------------------
 Assigned Fragment: 1 
------------------------------------------
  Name: Not Assigned Method: Not_assigned
  Natoms: 11 Charge: 0 Mult: 1 

 1 C     0.000000    0.000000    1.380000 
 2 C     1.300000    0.000000   -0.460000
 3 H    -0.500000    0.870000    1.740000
 4 H    -0.500000   -0.870000    1.740000
 5 H     1.000000    0.000000    1.740000 
 6 H     1.300000    0.000000   -1.530000  
 7 H     1.800000    0.870000   -0.100000
 8 H     1.800000   -0.870000   -0.100000
 9 H    -1.840000    0.000000   -0.650000  
 10 O    -2.440000    0.000000    0.080000 
 11 H    -3.300000    0.000000   -0.300000
------------------------------------------
 

Note

  • FragProc NotAssigned is always applied at the end of all FragProc procedures to ensure that no atoms remain without a fragment assignment.
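The difference between the two termination procedures can be summarized in a few lines of Python. This is a sketch under stated assumptions, not ORCA code: `terminate` is a hypothetical function, and the fragment numbers it produces are illustrative only.

```python
def terminate(n_atoms, assigned, mode):
    """Complete a partial assignment {atom index: fragment number}.

    mode "Atomic" gives every leftover atom its own fragment;
    mode "NotAssigned" collects all leftover atoms in one fragment.
    """
    if mode not in ("Atomic", "NotAssigned"):
        raise ValueError(f"unknown mode: {mode}")
    result = dict(assigned)
    next_id = max(result.values(), default=0) + 1
    for atom in range(n_atoms):
        if atom in result:
            continue
        result[atom] = next_id
        if mode == "Atomic":
            next_id += 1  # the next leftover atom opens a new fragment
    return result

# Atom 0 was assigned by hand, as in the O(1) examples above.
print(terminate(12, {0: 1}, "Atomic"))
print(terminate(12, {0: 1}, "NotAssigned"))
```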

2.18.2.3. Automatic Fragmentation: Internal Libraries

ORCA includes a series of internal libraries containing definitions of many common molecular structures. In all cases, structure recognition is performed using a VF2 subgraph isomorphism algorithm applied to molecular graphs constructed from Cartesian coordinates.

The available fragmentation procedures that make use of internal libraries are listed in Table 2.60.

Table 2.60 Simple input keywords for Fragment detection

Keyword           Description
----------------  ------------------------------------------------------------
FunctionalGroups  Contains a list of the most common organic functional groups.
Aminoacids        Contains a list of all amino acids, including all common protonation states, but excluding zwitterionic forms.
AABackbone        Contains fragment definitions for amino acid backbone detection.
Backbone          Performs AABackbone followed by merging all fragments into a single protein backbone fragment.
SeqBackbone       Similar to AABackbone, but peptide bonds are assigned as separate fragments.
AASidechains      Contains a list of all amino acid side chains.
AASCFinegrained   Contains a detailed list of organic functional groups within amino acid side chains.
NABackbone        Contains fragment definitions for DNA/RNA backbones; the phosphate group is assigned to the 3′ position.
SEQNABackbone     Contains fragment definitions for DNA/RNA backbones; the phosphate group is assigned to the 5′ position.
NABBFinegrained   Same as NABackbone, but further splits the phosphate group.
Nucleoticacid     Contains a list of all nucleic acids.
NASidechains      Contains a list of all nucleic acid side chains.
Solvents          Contains definitions for common solvents: 1-octanol, n-hexane, cyclohexane, toluene, chlorobenzene, tetrahydrofuran, benzene, N,N-dimethylformamide, pyridine, dimethyl sulfoxide, acetone, ethanol, acetonitrile, methanol, chloroform, carbon tetrachloride, dichloromethane, ammonia, and water.
Water             Contains the definition of the water molecule; faster than Solvents when only water molecules need to be fragmented.

Similar to other fragmentation procedures, listing multiple libraries in FragProc applies the procedures consecutively. Using the first example input from the Automatic Fragmentation section, and including PrintLevel 3 in the %frag block to increase verbosity, the output indicates the assignment of fragments by the Water and Functional_groups procedures.

===================================================
Tfragmentator: Fragmenting by Water
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 1
------------------------------------------
  Name:  WATER Method: Water
  Natoms: 3 Charge: 0 Mult: 1

 11 H    -0.920000    0.850000   -2.430000
 12 O    -1.690000    0.830000   -3.000000
 13 H    -1.640000    1.630000   -3.510000
------------------------------------------

===================================================
Tfragmentator: Fragmenting by Functional_groups
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 2
------------------------------------------
  Name:  CH3 Method: Functional_groups
  Natoms: 4 Charge: 0 Mult: 1

 0 C     0.000000    1.270000   -0.260000
 3 H    -0.890000    1.320000   -0.910000
 4 H     0.890000    1.320000   -0.910000
 5 H     0.000000    2.180000    0.370000
------------------------------------------

------------------------------------------
 Match: 2, Assigned Fragment: 3
------------------------------------------
  Name:  CH3 Method: Functional_groups
  Natoms: 4 Charge: 0 Mult: 1

 2 C     0.000000   -1.270000   -0.260000
 8 H     0.890000   -1.320000   -0.910000
 9 H     0.000000   -2.180000    0.370000
 10 H    -0.880000   -1.300000   -0.910000
------------------------------------------

------------------------------------------
 Match: 1, Assigned Fragment: 4
------------------------------------------
  Name:  CH2 Method: Functional_groups
  Natoms: 3 Charge: 0 Mult: 1

 1 C     0.000000   -0.000000    0.580000
 6 H     0.880000   -0.000000    1.250000
 7 H    -0.880000   -0.000000    1.250000
------------------------------------------

2.18.2.4. Automatic Fragmentation: External Libraries

The automated fragmentator also allows users to supply .xyz files containing geometries that should be recognized as fragments. The files are specified via the XZYFRAGLIB variable in the %frag block. The FragProc Extlib procedure automatically converts each provided geometry into a molecular graph and then applies a VF2 subgraph isomorphism algorithm, just as is done with the internal libraries.

The following example uses definitions of CH3O and CH3 fragments from the file Mylib.xyz to identify and generate fragments within the dimethyl ether geometry provided below:

%frag
  PrintLevel 3
  FragProc Extlib
  XZYFRAGLIB "Mylib.xyz"
end

*xyz 0 1
 O     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 C     1.300000     0.000000    -0.460000
 H    -0.500000     0.870000     1.740000
 H    -0.500000    -0.870000     1.740000
 H     1.000000     0.000000     1.740000
 H     1.300000     0.000000    -1.530000
 H     1.800000     0.870000    -0.100000
 H     1.800000    -0.870000    -0.100000
*

Here, Mylib.xyz contains definitions for the methoxy and methyl groups:

    5
CHARGE 0 MULT 1 NAME CH3O
 O     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 H     1.008807     0.000000     1.736663
 H    -0.328435     0.953845     1.736667
 H    -0.794950    -0.621083     1.736667
    4
CHARGE 0 MULT 1 NAME CH3
 C     0.000000     0.000000     0.000000
 H     0.000000     0.000000     1.070000
 H     1.008807     0.000000    -0.356663
 H    -0.328435    -0.953845    -0.356667

The result is the assignment of atoms 0, 1, 3, 4, and 5 to a CH3O fragment, with the remaining atoms assigned to a CH3 fragment by the Ext_lib procedure.

===================================================
Tfragmentator: Fragmenting by Ext_lib
===================================================
****
**** There are 2 Ref structures found in file Mylib.xyz
****

------------------------------------------
 Match: 1, Assigned Fragment: 1
------------------------------------------
  Name:  CH3O Method: Ext_lib
  Natoms: 5 Charge: 0 Mult: 1

 0 O     0.000000    0.000000    0.000000
 1 C     0.000000    0.000000    1.380000
 3 H    -0.500000    0.870000    1.740000
 4 H    -0.500000   -0.870000    1.740000
 5 H     1.000000    0.000000    1.740000
------------------------------------------

------------------------------------------
 Match: 1, Assigned Fragment: 2
------------------------------------------
  Name:  CH3 Method: Ext_lib
  Natoms: 4 Charge: 0 Mult: 1

 2 C     1.300000    0.000000   -0.460000
 6 H     1.300000    0.000000   -1.530000
 7 H     1.800000    0.870000   -0.100000
 8 H     1.800000   -0.870000   -0.100000
------------------------------------------

Note

  • XZYFRAGLIB allows the inclusion of up to 10 files. Each file may contain multiple fragment definitions; however, each definition must consist of a single, connected molecule—unconnected structures within a single definition are not allowed.

  • The format of Mylib.xyz follows the standard XYZ file structure, but optionally supports three identifiers: CHARGE, MULT, and NAME, with NAME expected to appear last. These identifiers are printed when a fragment is recognized, though they are not currently used in any calculations.

  • Subsequent modifications to fragments by other methods (see Fusebyatoms and Extend below) do not affect the fragment’s CHARGE or MULT values in the identifiers.
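The library file format described in the notes above can be read with a few lines of Python, for instance to check a library before running a job. The sketch below is not ORCA's parser. It assumes case-insensitive identifiers (both NAME and Name appear in this section) and defaults of charge 0 and multiplicity 1 when an identifier is missing; `parse_fragment_library` is a hypothetical name.

```python
def parse_fragment_library(text):
    """Parse concatenated XYZ blocks into a list of fragment dictionaries."""
    lines = text.splitlines()
    entries, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])
        tokens = lines[i + 1].split()
        entry = {"charge": 0, "mult": 1, "name": ""}  # assumed defaults
        k = 0
        while k < len(tokens):
            key = tokens[k].upper()
            if key == "NAME":  # NAME comes last; keep the rest of the line
                entry["name"] = " ".join(tokens[k + 1:])
                break
            if key in ("CHARGE", "MULT") and k + 1 < len(tokens):
                entry[key.lower()] = int(tokens[k + 1])
                k += 2
            else:
                k += 1
        entry["atoms"] = [
            (sym, float(x), float(y), float(z))
            for sym, x, y, z in (ln.split() for ln in lines[i + 2 : i + 2 + n_atoms])
        ]
        entries.append(entry)
        i += 2 + n_atoms
    return entries

MYLIB = """\
    5
CHARGE 0 MULT 1 NAME CH3O
 O     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.380000
 H     1.008807     0.000000     1.736663
 H    -0.328435     0.953845     1.736667
 H    -0.794950    -0.621083     1.736667
    4
CHARGE 0 MULT 1 NAME CH3
 C     0.000000     0.000000     0.000000
 H     0.000000     0.000000     1.070000
 H     1.008807     0.000000    -0.356663
 H    -0.328435    -0.953845    -0.356667
"""
library = parse_fragment_library(MYLIB)
for entry in library:
    print(entry["name"], entry["charge"], entry["mult"], len(entry["atoms"]))
```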

Important

  • ORCA’s VF2 subgraph isomorphism algorithm is based solely on atomic connectivity; it does not consider stereochemistry or bond orders.

  • Fragment assignment by FragProc Extlib follows the order of the files listed in XZYFRAGLIB and the order of fragment definitions within each file. Since an atom is excluded from subsequent matching once it has been assigned to a fragment, the order in which fragments are defined in the library is critically important. For example, in the previous case, if CH3 is defined before CH3O in Mylib.xyz, both CH3 groups in the input file will be matched first. This prevents CH3O from being recognized later, even if it is present in the system.

  • Atom assignment to fragments follows the order in which atoms appear in the geometry. In the previous example, the CH3O fragment could be assigned using either the C(1)-O or C(2)-O bond. However, since C(1) appears first in the geometry, it is selected for the CH3O fragment.
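The order dependence described in the last two points can be demonstrated with a small Python sketch. It replaces VF2 with a plain backtracking search over element labels and bonds, so it is a stand-in for illustration and not the algorithm used by ORCA. The bond lists are written by hand, and all function names are hypothetical.

```python
def find_match(p_elems, p_bonds, elems, adj, free):
    """Return target indices matching the pattern atoms in order, or None."""
    p_adj = {i: set() for i in range(len(p_elems))}
    for a, b in p_bonds:
        p_adj[a].add(b)
        p_adj[b].add(a)

    def extend(mapping):
        k = len(mapping)
        if k == len(p_elems):
            return mapping
        for t in sorted(free):  # atoms are tried in geometry order
            if t in mapping or elems[t] != p_elems[k]:
                continue
            # every pattern bond to an already mapped atom must exist in the target
            if all(mapping[p] in adj[t] for p in p_adj[k] if p < k):
                found = extend(mapping + [t])
                if found is not None:
                    return found
        return None

    return extend([])

def fragment_with_library(library, elems, bonds):
    """Apply the patterns in library order; matched atoms are removed from the pool."""
    adj = {i: set() for i in range(len(elems))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    free, fragments = set(range(len(elems))), []
    for name, p_elems, p_bonds in library:
        while (match := find_match(p_elems, p_bonds, elems, adj, free)) is not None:
            fragments.append((name, sorted(match)))
            free -= set(match)
    return fragments, sorted(free)

# Dimethyl ether from the example above (atom order as in the input).
elems = ["O", "C", "C", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5), (2, 6), (2, 7), (2, 8)]
CH3O = ("CH3O", ["O", "C", "H", "H", "H"], [(0, 1), (1, 2), (1, 3), (1, 4)])
CH3 = ("CH3", ["C", "H", "H", "H"], [(0, 1), (0, 2), (0, 3)])

print(fragment_with_library([CH3O, CH3], elems, bonds))
print(fragment_with_library([CH3, CH3O], elems, bonds))
```

With CH3O listed first, the sketch selects atoms 0, 1, 3, 4, and 5 for the methoxy group, because C(1) precedes C(2) in the geometry. With CH3 listed first, both methyl groups are consumed and the oxygen atom remains unmatched.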

2.18.2.5. Automatic Fragmentation: Extend

Many fragments in the internal libraries represent incomplete molecular structures. Therefore, in some cases, it is necessary to add hydrogen atoms or, in certain situations, a hydroxyl group to complete the system. FragProc Extend addresses this by identifying oxygen atoms bonded to carbon atoms in previously assigned fragments, as well as hydrogen atoms bonded to carbon, nitrogen, or oxygen. It then extends the corresponding fragments to include these atoms.

Consider the example below, which consists of a zwitterionic glycine molecule fragmented using an external library that defines a C–C–N fragment. In this case, FragProc Extend can be applied to add the missing oxygen and hydrogen atoms, thereby completing the molecular structure.

%frag
  PrintLevel 3
  FragProc Extlib, Extend
  XZYFRAGLIB "Mylib.xyz"
end

*xyz 0 1
 N     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.460000
 C     1.403962     0.000000     2.015868
 O     1.837767     0.962613     2.627089
 O     2.161436    -0.947142     1.883370
 H     0.423874    -0.764689    -0.525339
 H    -0.538292    -0.884247     1.831954
 H    -0.538292     0.884247     1.831954
 H    -0.970478     0.016940    -0.313504
 H     0.499909     0.831989    -0.313504
*

Here, Mylib.xyz contains a single C–C–N definition:

    3
Name CCN
 C     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.500000
 N     1.376503     0.000000    -0.486661

The result is an initial assignment of the C–C–N fragment, followed by an extension of the fragment to include two oxygen atoms and five hydrogen atoms.

------------------------------------------
 Match: 1, Assigned Fragment: 1 
------------------------------------------
  Name:  CCN Method: Ext_lib 
  Natoms: 3 Charge: 0 Mult: 1 

 0 N     0.000000    0.000000    0.000000 
 1 C     0.000000    0.000000    1.460000 
 2 C     1.403962    0.000000    2.015868 
------------------------------------------

===================================================
Tfragmentator: Extending Fragments
===================================================
 Extending Fragment 0 with O  (3)
 Extending Fragment 0 with O  (4)
 Extending Fragment 0 with H  (6)
 Extending Fragment 0 with H  (7)
 Extending Fragment 0 with H  (5)
 Extending Fragment 0 with H  (8)
 Extending Fragment 0 with H  (9) 

Important

  • FragProc Extend attempts to extend any previously assigned fragment.

  • FragProc Extend does not modify the Charge and Mult identifiers of fragments.
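The extension rule can be written down as a short Python sketch. It is an interpretation of the description above and not ORCA's code: the two-pass order (oxygen first, then hydrogen, so that hydroxyl hydrogens are picked up) is an assumption, the sketch ignores atoms that might already belong to other fragments, and the bond list of glycine is entered by hand.

```python
def extend_fragment(fragment, elements, bonds):
    """Return the fragment extended by O atoms on C and H atoms on C, N, O."""
    neighbours = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    extended = set(fragment)
    # Pass 1: oxygen atoms bonded to a carbon atom of the fragment.
    for atom in sorted(extended):
        if elements[atom] == "C":
            extended |= {n for n in neighbours[atom] if elements[n] == "O"}
    # Pass 2: hydrogen atoms bonded to C, N or O, including oxygens just added.
    for atom in sorted(extended):
        if elements[atom] in ("C", "N", "O"):
            extended |= {n for n in neighbours[atom] if elements[n] == "H"}
    return sorted(extended)

# Zwitterionic glycine, atom order as in the input above.
elements = ["N", "C", "C", "O", "O", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (2, 3), (2, 4), (0, 5), (0, 8), (0, 9), (1, 6), (1, 7)]
print(extend_fragment([0, 1, 2], elements, bonds))
```

Starting from the three C–C–N atoms, the sketch adds the same two oxygen and five hydrogen atoms that are listed in the ORCA output above.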

2.18.2.6. Automatic Fragmentation: Fusebyatoms

When multiple fragmentation procedures are used in combination, it may be necessary to merge two previously assigned fragments into one. This can be achieved using the FragProc Fusebyatoms procedure, which identifies atom pairs that should belong to the same fragment, as specified in FuseAtomPairs.

In the example below, the objective is to fragment a propane molecule into a CH3 group and a CH3–CH2 fragment. One way to achieve this is by first applying FragProc FunctionalGroups, which fragments the molecule into two CH3 groups and one CH2 group. The CH2 group can then be merged with one of the CH3 groups. This is done by specifying the carbon atoms of the CH2 and one CH3 group in the FuseAtomPairs directive: FuseAtomPairs {0 1} end.

%frag
        printlevel 3
        FragProc FunctionalGroups, Fusebyatoms
        FuseAtomPairs {0 1} end
end

*xyz 0 1
 C     0.000000     1.270000    -0.260000
 C     0.000000    -0.000000     0.580000
 C     0.000000    -1.270000    -0.260000
 H    -0.890000     1.320000    -0.910000
 H     0.890000     1.320000    -0.910000
 H     0.000000     2.180000     0.370000
 H     0.880000    -0.000000     1.250000
 H    -0.880000    -0.000000     1.250000
 H     0.890000    -1.320000    -0.910000
 H     0.000000    -2.180000     0.370000
 H    -0.880000    -1.300000    -0.910000
*

The output from the fragmentator first reports the initial fragmentation performed by the Functional_groups library, followed by a message indicating the subsequent fusion of fragments as specified by the FuseAtomPairs directive.

===================================================
Tfragmentator: Fragmenting by Functional_groups
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 1
------------------------------------------
  Name:  CH3 Method: Functional_groups
  Natoms: 4 Charge: 0 Mult: 1

 0 C     0.000000    1.270000   -0.260000
 3 H    -0.890000    1.320000   -0.910000
 4 H     0.890000    1.320000   -0.910000
 5 H     0.000000    2.180000    0.370000
------------------------------------------

------------------------------------------
 Match: 2, Assigned Fragment: 2
------------------------------------------
  Name:  CH3 Method: Functional_groups
  Natoms: 4 Charge: 0 Mult: 1

 2 C     0.000000   -1.270000   -0.260000
 8 H     0.890000   -1.320000   -0.910000
 9 H     0.000000   -2.180000    0.370000
 10 H    -0.880000   -1.300000   -0.910000
------------------------------------------

------------------------------------------
 Match: 1, Assigned Fragment: 3
------------------------------------------
  Name:  CH2 Method: Functional_groups
  Natoms: 3 Charge: 0 Mult: 1

 1 C     0.000000   -0.000000    0.580000
 6 H     0.880000   -0.000000    1.250000
 7 H    -0.880000   -0.000000    1.250000
------------------------------------------

===================================================
Tfragmentator: Fusing Fragments
===================================================
Fusing Fragment 0 (atom 0) with Fragment 2 (atom 1)
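The effect of FuseAtomPairs can be pictured as merging the two fragments that contain the atoms of each pair. The Python sketch below illustrates this idea only. The function name is hypothetical, and keeping the lower fragment number after a merge is an assumption, not documented ORCA behaviour.

```python
def fuse_by_atoms(fragments, pairs):
    """Merge fragments that contain the two atoms of each pair.

    fragments maps a fragment number to a set of atom indices.
    """
    fused = {number: set(atoms) for number, atoms in fragments.items()}
    for a, b in pairs:
        owner_a = next(n for n, atoms in fused.items() if a in atoms)
        owner_b = next(n for n, atoms in fused.items() if b in atoms)
        if owner_a != owner_b:
            keep, drop = min(owner_a, owner_b), max(owner_a, owner_b)
            fused[keep] |= fused.pop(drop)  # assumed: the lower number survives
    return fused

# Propane after FunctionalGroups: two CH3 groups and one CH2 group.
propane = {1: {0, 3, 4, 5}, 2: {2, 8, 9, 10}, 3: {1, 6, 7}}
print(fuse_by_atoms(propane, [(0, 1)]))
```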

2.18.2.7. Automatic Fragmentation: Delete and advanced fragmentation workflows

Almost all fragmentation schemes have an associated delete procedure, which removes the fragments generated by that specific scheme. This enables the construction of advanced fragmentation workflows, where a procedure can temporarily “protect” certain atoms from being fragmented by subsequent methods. These protected fragments can later be deleted, allowing the atoms to be reprocessed using a different fragmentation approach.

Consider the system below, which consists of both a phenylalanine molecule and a benzene molecule. The goal is to fragment the phenylalanine into its backbone, one CH2 group, and the phenyl ring, while fragmenting the benzene into six individual CH fragments.

Combining three fragmentation procedures (FragProc AABackbone, Extend, AASCFinegrained) yields the desired fragmentation of phenylalanine: AABackbone and Extend identify the amino acid backbone, while AASCFinegrained separates the CH2 group and the phenyl ring into distinct fragments. The benzene molecule, in turn, can be fragmented into six individual CH fragments using FragProc Extlib, which relies on an external library defining the CH fragment. However, these two schemes are incompatible when applied to the whole system. If FragProc AASCFinegrained is applied to the entire system, it fragments not only the phenylalanine side chain but also breaks the benzene ring into a phenyl group and a hydrogen atom. Conversely, if FragProc Extlib is applied first to benzene, it may inadvertently fragment the phenylalanine residue as well, leading to undesired structural splits.

A viable approach is to first identify the amino acid using FragProc Aminoacids. This procedure assigns all atoms that belong to amino acid fragments, thereby protecting them from reassignment by subsequent fragmentation steps. The benzene molecule can then be fragmented using an external library via FragProc Extlib. Finally, the phenylalanine fragment is removed using FragProc DelAminoacids, which frees its atoms to be re-fragmented by AABackbone, Extend, and AASCFinegrained. All these operations are executed by simply concatenating the procedures in a single FragProc line, as in the following input:

%frag
  PrintLevel 3
  FragProc Aminoacids, Extlib, DelAminoacids, AABackbone, Extend, AASCFinegrained
  XZYFRAGLIB "Mylib.xyz"
  STOREFRAGS true
end

*xyz 0 1
 O     1.300780     3.093320     1.571260
 C     0.407700     2.826000     0.770390
 O    -0.101890     3.606620    -0.030470
 C    -0.115350     1.396690     0.770390
 N    -1.564350     1.396690     0.770390
 H    -1.900780     1.872730     1.595200
 H     0.242580     0.884560     1.663540
 H    -1.901490     0.444630     0.770390
 H    -1.901016     1.873069    -0.054118
 C     0.436090     0.696200    -0.461750
 H     1.512450     0.863150    -0.502770
 H    -0.018950     1.150700    -1.341790
 C     0.178410    -0.790680    -0.515340
 C    -0.895760    -1.288470    -1.262580
 C    -1.134680    -2.667040    -1.312270
 C    -0.299410    -3.547820    -0.614730
 C     0.774760    -3.050040     0.132500
 C     1.013680    -1.671470     0.182200
 H     1.842330    -1.287460     0.758630
 H     1.419110    -3.729500     0.670600
 H    -0.483720    -4.611290    -0.653070
 H    -1.963330    -3.051050    -1.888700
 H    -1.540110    -0.609010    -1.800680
 C     1.181325    -3.732146    -2.620584
 C     2.016592    -4.612923    -1.923046
 C     3.090772    -4.115131    -1.175824
 C     3.329684    -2.736563    -1.126139
 C     2.494417    -1.855785    -1.823677
 C     1.420237    -2.353577    -2.570900
 H     0.770518    -1.668458    -3.113485
 H     0.422252    -4.288369    -3.129823
 H     1.830753    -5.685253    -1.961693
 H     3.740491    -4.800250    -0.633239
 H     4.165243    -2.349352    -0.544907
 H     2.680256    -0.783455    -1.785030
*

Where “Mylib.xyz” in this case is:

2
NAME CH
C 0.00 0.00 0.00
H 1.08 0.00 0.00
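
A library entry like the one above can also be written by a short script. The following Python sketch assumes the layout shown in this example (atom count, a NAME line, then one line per atom). The function name format_fragment_library is hypothetical, and how several entries are combined in one file is not covered here.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # element symbol and Cartesian coordinates


def format_fragment_library(name: str, atoms: List[Atom]) -> str:
    """Format one reference fragment in the layout of the Mylib.xyz example."""
    lines = [str(len(atoms)), f"NAME {name}"]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol} {x:.2f} {y:.2f} {z:.2f}")
    return "\n".join(lines) + "\n"


# Reproduce the CH entry used in this example.
ch_entry = format_fragment_library("CH", [("C", 0.0, 0.0, 0.0), ("H", 1.08, 0.0, 0.0)])

# Writing the file is left commented out so the sketch has no side effects:
# with open("Mylib.xyz", "w") as handle:
#     handle.write(ch_entry)
```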

The complete fragmentation sequence follows the same structure as in previous examples. In this case, the message Deleted 1 fragments is printed after the fragment assignment performed by Ext_lib, indicating that a fragment previously defined by FragProc Aminoacids has been successfully removed.

===================================================
Tfragmentator: Fragmenting by Amino_Acids
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 1
------------------------------------------
  Name:  CPHE Method: Amino_Acids
  Natoms: 21 Charge: -1 Mult: 1

 0 O     1.300780    3.093320    1.571260
 1 C     0.407700    2.826000    0.770390
 2 O    -0.101890    3.606620   -0.030470
 3 C    -0.115350    1.396690    0.770390
 4 N    -1.564350    1.396690    0.770390
 5 H    -1.900780    1.872730    1.595200
 6 H     0.242580    0.884560    1.663540
 9 C     0.436090    0.696200   -0.461750
 10 H     1.512450    0.863150   -0.502770
 11 H    -0.018950    1.150700   -1.341790
 12 C     0.178410   -0.790680   -0.515340
 13 C    -0.895760   -1.288470   -1.262580
 14 C    -1.134680   -2.667040   -1.312270
 15 C    -0.299410   -3.547820   -0.614730
 16 C     0.774760   -3.050040    0.132500
 17 C     1.013680   -1.671470    0.182200
 18 H     1.842330   -1.287460    0.758630
 19 H     1.419110   -3.729500    0.670600
 20 H    -0.483720   -4.611290   -0.653070
 21 H    -1.963330   -3.051050   -1.888700
 22 H    -1.540110   -0.609010   -1.800680
------------------------------------------

===================================================
Tfragmentator: Fragmenting by Ext_lib
===================================================
****
**** There are 1 Ref structures found in file Mylib.xyz
****

------------------------------------------
 Match: 1, Assigned Fragment: 2
------------------------------------------
  Name:  CH Method: Ext_lib
  Natoms: 2 Charge: 0 Mult: 1

 23 C     1.181325   -3.732146   -2.620584
 30 H     0.422252   -4.288369   -3.129823
------------------------------------------

------------------------------------------
 Match: 2, Assigned Fragment: 3
------------------------------------------
  Name:  CH Method: Ext_lib
  Natoms: 2 Charge: 0 Mult: 1

 24 C     2.016592   -4.612923   -1.923046
 31 H     1.830753   -5.685253   -1.961693
------------------------------------------

...

------------------------------------------
 Match: 6, Assigned Fragment: 7
------------------------------------------
  Name:  CH Method: Ext_lib
  Natoms: 2 Charge: 0 Mult: 1

 28 C     1.420237   -2.353577   -2.570900
 29 H     0.770518   -1.668458   -3.113485
------------------------------------------

===================================================
Tfragmentator: Deleting Fragments (Amino_Acids)
===================================================
 Deleted 1 fragments
===================================================
Tfragmentator: Fragmenting by Backbone
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 7
------------------------------------------
  Name: CO-NH3-CH Method: AA_Backbone
  Natoms: 8 Charge: 0 Mult: 0

 0 O     1.300780    3.093320    1.571260
 1 C     0.407700    2.826000    0.770390
 3 C    -0.115350    1.396690    0.770390
 4 N    -1.564350    1.396690    0.770390
 5 H    -1.900780    1.872730    1.595200
 6 H     0.242580    0.884560    1.663540
 7 H    -1.901490    0.444630    0.770390
 8 H    -1.901016    1.873069   -0.054118
------------------------------------------
===================================================
Tfragmentator: Extending Fragments
===================================================
 Extending Fragment 6 with O  (2)

===================================================
Tfragmentator: Fragmenting by AA_SideChains_FG
===================================================

------------------------------------------
 Match: 1, Assigned Fragment: 8
------------------------------------------
  Name: Ph Method: AA_SideChains_FG
  Natoms: 11 Charge: 0 Mult: 1

 12 C     0.178410   -0.790680   -0.515340
 13 C    -0.895760   -1.288470   -1.262580
 14 C    -1.134680   -2.667040   -1.312270
 15 C    -0.299410   -3.547820   -0.614730
 16 C     0.774760   -3.050040    0.132500
 17 C     1.013680   -1.671470    0.182200
 18 H     1.842330   -1.287460    0.758630
 19 H     1.419110   -3.729500    0.670600
 20 H    -0.483720   -4.611290   -0.653070
 21 H    -1.963330   -3.051050   -1.888700
 22 H    -1.540110   -0.609010   -1.800680
------------------------------------------

------------------------------------------
 Match: 1, Assigned Fragment: 9
------------------------------------------
  Name: CH2 Method: AA_SideChains_FG
  Natoms: 3 Charge: 0 Mult: 1

 9 C     0.436090    0.696200   -0.461750
 10 H     1.512450    0.863150   -0.502770
 11 H    -0.018950    1.150700   -1.341790
------------------------------------------
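
For longer workflows it can be convenient to collect the printed assignments programmatically. The Python sketch below is one possible way to do this. It assumes the output layout shown in the listings of this section (PrintLevel 3), which may differ in other versions. The function name parse_fragment_log is hypothetical.

```python
import re
from typing import Dict, List

MATCH_RE = re.compile(r"Match:\s*(\d+),\s*Assigned Fragment:\s*(\d+)")
NAME_RE = re.compile(r"Name:\s*(\S+)\s+Method:\s*(\S+)")
# Atom lines: index, element symbol, three Cartesian coordinates.
ATOM_RE = re.compile(
    r"^\s*(\d+)\s+([A-Za-z]{1,2})\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$"
)


def parse_fragment_log(text: str) -> List[Dict]:
    """Collect every printed fragment assignment, in the order it appears."""
    fragments: List[Dict] = []
    current = None
    for line in text.splitlines():
        m = MATCH_RE.search(line)
        if m:
            current = {"match": int(m.group(1)), "fragment": int(m.group(2)),
                       "name": None, "method": None, "atoms": []}
            fragments.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first "Match:" line
        m = NAME_RE.search(line)
        if m:
            current["name"], current["method"] = m.group(1), m.group(2)
            continue
        m = ATOM_RE.match(line)
        if m:
            current["atoms"].append(int(m.group(1)))
    return fragments


sample = """\
------------------------------------------
 Match: 1, Assigned Fragment: 2
------------------------------------------
  Name:  CH Method: Ext_lib
  Natoms: 2 Charge: 0 Mult: 1

 23 C     1.181325   -3.732146   -2.620584
 30 H     0.422252   -4.288369   -3.129823
------------------------------------------

------------------------------------------
 Match: 1, Assigned Fragment: 9
------------------------------------------
  Name: CH2 Method: AA_SideChains_FG
  Natoms: 3 Charge: 0 Mult: 1

 9 C     0.436090    0.696200   -0.461750
 10 H     1.512450    0.863150   -0.502770
 11 H    -0.018950    1.150700   -1.341790
------------------------------------------
"""
parsed = parse_fragment_log(sample)
```

The sketch records assignments exactly as printed. It does not track the delete steps, so a fragment that is later removed (such as the one assigned by Amino_Acids in this example) would still appear in the result.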

Table 2.61 lists the corresponding delete procedure associated with each fragmentation method.

Table 2.61 Simple input keywords for Fragment detection and their corresponding deletion procedure

Fragment detection Keyword    Fragment deletion Keyword
--------------------------    -------------------------
Extlib                        DELExtlib
Connectivity                  DELConnectivity
Atomic                        DELAtomic
FunctionalGroups              DELFunctionalGroups
NotAssigned
Backbone                      DELBackbone
SeqBackbone                   DELSeqBackbone
AABackbone                    DELAABackbone
Aminoacids                    DELAminoacids
AASideChains                  DELAASideChains
AASCFineGrained               DELAASCFineGrained
NABackbone                    DELNABackbone
NABBFineGrained               DELNABBFineGrained
SEQNABackbone                 DELSEQNABackbone
NucleoticAcid                 DELNucleoticAcid
NASideChains                  DELNASideChains
Solvents                      DELSolvents
Water                         DELWater
Extend
FuseByAtoms
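
The pairing in Table 2.61 follows a regular pattern: the deletion keyword is the detection keyword prefixed with DEL. The Python sketch below encodes this for the keywords that have a deletion counterpart and assembles a FragProc line for the protect-then-delete workflow described above. The helper names are hypothetical. Treating keywords as case-insensitive is an assumption, suggested only by the mixed spellings in this section (DelAminoacids and DELAminoacids).

```python
from typing import List

# Detection keywords from Table 2.61 that are listed with a deletion counterpart.
DELETABLE = (
    "Extlib", "Connectivity", "Atomic", "FunctionalGroups", "Backbone",
    "SeqBackbone", "AABackbone", "Aminoacids", "AASideChains", "AASCFineGrained",
    "NABackbone", "NABBFineGrained", "SEQNABackbone", "NucleoticAcid",
    "NASideChains", "Solvents", "Water",
)
# Listed in Table 2.61 without a deletion keyword.
NO_DELETE = ("NotAssigned", "Extend", "FuseByAtoms")


def delete_keyword(detection: str) -> str:
    """Return the deletion keyword paired with a detection keyword."""
    lookup = {k.lower(): k for k in DELETABLE}  # assumption: case-insensitive
    key = detection.lower()
    if key not in lookup:
        raise ValueError(f"no deletion keyword listed for {detection!r}")
    return "DEL" + lookup[key]


def protect_workflow(protect: str, between: List[str], after: List[str]) -> str:
    """Build a FragProc line: protect, run `between`, delete, run `after`."""
    steps = [protect, *between, delete_keyword(protect), *after]
    return "FragProc " + ", ".join(steps)


line = protect_workflow(
    "Aminoacids", ["Extlib"], ["AABackbone", "Extend", "AASCFinegrained"]
)
```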

2.18.3. Options available in the %frag input block

Table 2.62 contains a list of the options available in the %frag input block.

Table 2.62 List of options in the %frag input block

Option            Type                           Default               Description
----------------  -----------------------------  --------------------  -----------
Printlevel        Integer                        1                     Verbose output control for automated fragmentation.
STOREFRAGS        Boolean                        False                 Stores assigned fragments in a .fragments.xyz file.
DoInterFragBonds  Boolean                        False                 Automatically detects bonds between fragments for CoVaLED analysis.
XZYFRAGLIB        String                         None                  Filenames used in FragProc Extlib.
FragProc          See Table 2.60 and Table 2.61  ExtLib, Connectivity  Fragmentation procedures to be applied automatically.
Usetopology       Boolean                        False                 Generate main geometry graph based on .prms file.
TopolFile         String                         ""                    Topology file name to be used when Usetopology is True.
PrintInputFlags   Boolean                        True                  Writes a %frag block equivalent to the current calculation fragments.
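
When inputs are generated by a script, the options in Table 2.62 can be rendered from a dictionary. The Python sketch below is a minimal illustration with a hypothetical helper name (render_frag_block). It follows the syntax of the %frag example earlier in this section: booleans as true/false, quoted strings, and a comma-separated FragProc list. That the same quoting applies to other string options such as TopolFile is an assumption.

```python
from typing import Dict, List, Union

OptionValue = Union[bool, int, str, List[str]]


def render_frag_block(options: Dict[str, OptionValue]) -> str:
    """Render a %frag block from option names and values."""
    lines = ["%frag"]
    for key, value in options.items():
        if isinstance(value, bool):            # test bool before int
            text = "true" if value else "false"
        elif isinstance(value, int):
            text = str(value)
        elif isinstance(value, (list, tuple)):
            text = ", ".join(value)            # e.g. the FragProc procedure list
        else:
            text = f'"{value}"'                # strings quoted as in the example
        lines.append(f"  {key} {text}")
    lines.append("end")
    return "\n".join(lines)


block = render_frag_block({
    "PrintLevel": 3,
    "FragProc": ["Aminoacids", "Extlib", "DelAminoacids",
                 "AABackbone", "Extend", "AASCFinegrained"],
    "XZYFRAGLIB": "Mylib.xyz",
    "STOREFRAGS": True,
})
```

With these values the rendered block matches the %frag block of the phenylalanine and benzene example.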