2.8. Resolution-of-the-Identity (RI)¶
A very efficient and well-established way to speed up DFT calculations are resolution-of-the-idendity (RI) techniques also know as density-fitting.[111, 112, 113, 114, 115, 116, 117] Various variants to address the Coulomb part and Hartree-Fock exchange parts (cf. Hartree-Fock and Hybrid DFT). Available options available in ORCA include RI-J, Split-RI-J, RIJONX, RI-JK, and RIJCOSX.
Some Notes on RI in ORCA
Any RI approximation requires the choice of a sufficiently large auxiliary basis set.
Split-RI-J using the
def2/J
auxiliary basis is the default for non-hybrid DFT.RIJCOSX using the
def2/J
auxiliary basis is the default for hybrid DFT.When scalar relativistic Hamiltonians are used with all-electron basis sets, then
SARC/J
can be chosen as ageneral-purpose auxiliary basis set. Other choices are documented in the basis sets section.The usage of RI is generally recommended as the introduced error is very small. The speedup further enables the usage of larger basis sets for large systems which will improve the results significantly.
The default usage of RI approximations can be disabled by the
!NORI
keyword (not recommended!).If you do not want to depend on the RI approximation, a reasonable yet inconvenient approach is to converge a RI-J calculation and then take the resulting orbitals as initial guess for a calculation with exact Coulomb term. This should converge within a few cycles and the total execution time should still be lower than just converging the calculation directly with exact Coulomb treatment.
2.8.1. RI-J¶
Note
This is the default for non-hybrid DFT! Can be turned off by using !NORI
.
A very useful approximation that greatly speeds up DFT calculations unless the molecule gets very large is the so called “RI-approximation”
RI stands for “Resolution of the identity”. In short, charge distributions arising from products of basis functions are approximated by a linear combination of auxiliary basis functions.
There are a variety of different possibilities to determine the expansion coefficients \(c_{k}^{ij}\). A while ago, Almlöf and coworkers [118] have shown that for the approximation of electron repulsion integrals, the best choice is to minimize the residual repulsion.
Note
The basic theory behind the RI method has been known for a long time and since at least the late sixties, methods similar to the RI approximation have been used — mainly in the context of “approximate ab initio methods” such as LEDO, PDDO, and MADO, but also in density functional theory in the mid and late seventies by Baerends, Dunlap, and others [111, 112, 113, 114]
Define:
and
Determining the coefficients that minimize \(T_{ij}\) leads to
where:
Thus, an ordinary two-electron integral becomes
and the total Coulomb energy becomes
where \(\mathrm{\mathbf{P} }\) is the total density matrix.
In a similar way, the Coulomb contribution to the Kohn-Sham matrix is calculated. There are substantial advantages from this approximation: the quantities to be stored are the matrix \(\mathrm{\mathbf{V} }^{-1}\) — which depends only on two indices — and the three-index auxiliary integrals \(t_{r}^{ij}\). This leads to a tremendous reduction of storage requirements compared to a four-index list of repulsion integrals. In addition, the two- and three-index electron repulsion integrals are easier to compute than the four-index integrals, leading to further reductions in processing time. Furthermore, the Coulomb energy and the Kohn-Sham matrix contributions can be quickly assembled by simple vector/matrix operations, leading to large time savings. This arises because each auxiliary basis function \(\eta_{k} \left({ \vec{{r} }} \right)\) appears in the expansion of many charge distributions \(\phi_{i} \left({\vec{{r} }} \right)\phi_{j} \left({ \vec{{r} }} \right)\). Unfortunately, a similar strategy is less easily applied (or, at least, with less benefit) to the Hartree-Fock exchange term.
If the auxiliary basis set \(\left\{\eta \right\}\) is large enough, the approximation is also highly accurate. Since any DFT procedure already has a certain, sometimes sizable, error from the noise in the numerical integration of the XC part, it might be argued that a similarly large error in the Coulomb part is perfectly acceptable without affecting the overall accuracy of the calculation much. Furthermore, the errors introduced by the RI method are usually much smaller than the errors in the calculation due to basis set incompleteness. It is therefore recommended to use the RI procedure for pure DFs. However, one should probably not directly mix absolute total energies obtained from RI and non-RI calculations as the error in the total energy accumulates and will rise with increased molecular size, while the errors in the relative energies will tend to cancel.
There are several choices for auxiliary basis sets described in the next section, which depend on the choice of the primary GTO basis set used to expand the molecular orbitals [1]. The RI procedure requires the inversion of the auxiliary basis metric \(\mathbf{V}\), which is often the most expensive matrix operation in the calculation, due to the O(N\(^{3}\)) scaling and the large size of the auxiliary basis. However, in ORCA this is done via an efficient Cholesky decomposition, which is only performed once during the startup phase of a single point caclulation. Hence, this step is usually of no concern in practice.
In ORCA, the RI approximation is toggled by the input
%method
RI on # do use the RI-J approximation
off # do not use the RI-J approximation
end
Note
If you use RI, you must specify an auxiliary basis set (in the
%basis
section or using the appropriate simple keyword) or use
the !AutoAux
simple keyword.
2.8.2. Split-RI-J¶
There is an improved version of the RI-algorithm that has been implemented since ORCA 2.2.09. This Split-RI-J algorithm yields the same Coulomb energy as the standard RI-algorithm, but is significantly faster if the basis set contains many high angular momentum functions (d-, f-, g-functions). For small basis sets, there is virtually no difference between the two algorithms, except that Split-RI-J uses more memory than standard RI. However, calculations with ca. 2000 basis functions only need about an extra 13 MB for Split-RI-J, which is a trivial requirement on present-day hardware.
The Split-RI-J algorithm is invoked with the !Split-RI-J
simple keyword.
Split-RI-J is presently only available for SCF and gradient
calculations.
Note
The Split-RI-J algorithm is the default if RI is turned on via
!RI
. If you do not want to use Split-RI-J, please also use the keyword!NoSplit-RI-J
2.8.3. RIJONX¶
Alternatively, the RI method can be used for the
Coulomb term and the standard treatment for the exchange term. This
method is called RIJONX since the exchange term should tend towards
linear scaling for large molecules. RIJONX can be invoked via the !RIJONX
keyword for
Hartree-Fock and hybrid DFT calculations.
! HF RIJONX
The requirements for the auxiliary basis are the same as for the normal RI-J method.
2.8.4. RIJCOSX¶
Frustrated by the large difference in execution times between pure and hybrid functionals, we have been motivated to study approximations to the Hartree-Fock exchange term. The method that we have finally come up with is called the “chain of spheres” COSX approximation and may be thought of as a variant of the pseudo-spectral philosophy. Essentially, in performing two electron integrals, the first integration is done numerically on a grid and the second (involving the Coulomb singularity) is done analytically. For algorithmic and theoretical details see Refs. [119] and [120]. Upon combining this treatment with the Split-RI-J method for the Coulomb term (thus, a Coulomb fitting basis is needed!), we have designed the RIJCOSX approximation that can be used to accelerate Hartree-Fock and hybrid DFT calculations. Note that this introduces another grid on top of the DFT integration grid which is usually significantly smaller.
Note
Since ORCA 5, RIJCOSX is the default option for hybrid DFT (can be
turned off by using !NOCOSX
). However, it is by default NOT turned on
for HF.
In particular for large and accurate basis sets, the speedups obtained in this way are very large - we have observed up to a factor of sixty! The procedure is essentially linear scaling such that large and accurate calculations become possible with high efficiency. The RIJCOSX approximation is basically available throughout the program. The default errors are on the order of 0.05 \(\pm\) 0.1 kcal mol\(^{-1}\) or less in the total energies as well as in energy differences and can be made smaller with larger than the default grids or by running the final SCF cycle without this approximation. The impact on bond distances is a fraction of a pm, angles are better than a few tenth of a degree and soft dihedral angles are good to about 1 degree. To the limited extent to which it has been tested, vibrational frequencies are roughly good to 0.1 wavenumbers with the default settings.
The aim of this approximation is to efficiently compute the elements of exchange-type matrices as described in refs. [119, 120]:
where \(\mathrm{\mathbf{P} }\) is some kind of density-type matrix (not necessarily symmetric) and the two-electron integrals are defined over the basis set \(\{\varphi \}\) by:
The approximation pursued here can be written as follows:
Here, the index \(g\) refers to grid points \(\mathrm{\mathbf{r} }_{g}\) and:
where \(w_{g}\) denotes the grid weights. Thus, the first integration is carried out numerically and the second one analytically. Note that this destroys the Hermitian character of the two-electron integrals.
Equation (2.12) is perhaps best evaluated in three steps:
As such, the equations are very similar to the pseudo-spectral method extensively developed and discussed by Friesner and co-workers since the mid 1980s and commercially available in the Jaguar quantum chemistry package. The main difference at this point is that instead of \(X_{\kappa g}\) there appears a least-square fitting operator \(Q_{\kappa g}\) in Friesner’s formulation. Note that an analogue of the fitting procedure has also been implemented in ORCA and — in contrast to Friesner’s pseudo-spectral method — does not need specially optimized grids. The basic idea is to remove the grid errors within the basis set by “fitting” the numerical overlap to the analytical one. Due to its nature, overlap fitting is supposed to work better with larger basis sets.
2.8.4.1. Basic Usage¶
The RIJCOSX approximation can be invoked by the !RIJCOSX
keyword:
! HF RIJCOSX
By default, RIJCOSX will make use of the def2/J
auxiliary basis set.
Note that as the requirements for the SCF and correlation fitting bases are
quite different an correlation auxiliary basis set has to be defined additionally.
This is e.g. the case for RI-MP2 or double-hybrid DFT.
! RI-MP2 def2-TZVPP def2/J def2-TZVPP/C RIJCOSX
Note
This is the default for hybrid DFT! The COSX part can be turned off by using the !NOCOSX
keyword.
2.8.4.2. RIJCOSX Gradient¶
Given the exchange matrix, the exchange energy is given by (a sum over spin cases is left out here for simplicity):
Previous to ORCA6, the gradient of the COSX contribution to the energy was taken as an approximation:
with
as published in the original implementation paper [119].
Starting from ORCA6, this was updated to the full derivative of the energy component, including the derivative of all terms: grid weight derivatives, derivative of the SFitting matrices and all derivatives of \(F\) and \(G\) [2]. The gradient is thus now more accurate and less noisy. In case one wants to revert to the previous approximated version (not recommended), just set:
%method
cosxgradtype grad_n
useqgradfit false
end
2.8.4.3. COSX Numerical Grids¶
For expert users, the grid parameters for the exchange grids can be even more finely controlled:
%method
IntAccX Acc1, Acc2, Acc3
GridX Ang1, Ang2, Ang3
end
There are three grids involved: the smallest grid (Acc1, Ang1) that is
used for the initial SCF iterations, the medium grid (Acc2, Ang2) that
is used until the end of the SCF and the largest grid (Acc3, Ang3) that
is used for the final energy and the gradient evaluations. UseFinalGridX
can be used to turn this last grid on or off, though changing this is not
generally recommended. More details about the grid constructions can be
found in Numerical Integration.
2.8.4.4. SFitting Parameters¶
To modify the overlap fitting parameters, the following input can be specified:
%method
UseSFitting false # Same as NoSFitting in the simple input
# (Default is true)
UseQGradFit false # Uses the SCF fitting matrix for gradient calculations
# (Default is true)
end
Note that overlap fitting also works for HF and MP2 gradients without
specifying any additional keyword. The UseQGradFit
parameter uses
the same fitting matrix for the gradients as for the energy calculation and
is the default behavior since ORCA6.
2.8.4.5. Partial Contraction Scheme¶
Since ORCA 5.0, generally-contracted basis sets can be handled efficiently by using an intermediate partially contracted (pc) atomic-orbital basis for the exchange-matrix computation without affecting the results [120]. Depending on the basis set and element type, computational speedups by many orders of magnitude are possible. For testing or benchmark purposes, the K matrix computation can be done in the original basis by using the flag
%method
COSX_PartialContraction false # No intermediate basis for generally contracted shells
# (Default is true)
end
2.8.4.6. Restoring Full Symmetry¶
The semi-numerical integration scheme in the default COSX algorithm breaks the permutational symmetry of the two-electron integrals. We have observed that this flaw is often the cause of convergence problems for iterative algorithms, in particular for multi-reference theories [121]. An input option is available since ORCA 6.0 to preserve the full eight-fold permutational symmetry of the two-electron integrals:
%method
COSX_IntSymmetry Full # Fully symmetrized integrals
Standard # Original COSX algorithm
end
The full-symmetrization algorithm often improves the numerical stability and is enabled by default for TRAH-CASSCF and CASSCF linear-response calculations. However, the full-symmetrization algorithm may come with additional costs that depend on the number of \(\mathbf{F}\) intermediates. The number of \(\mathbf{F}\) intermediates depends on the symmetry of the density matrix (symmetric (S), anti-symmetric (A), and non-symmetric (N)) and whether overlap fitting is employed, as summarized in the table below.
S fitting |
P symmetry (S, A, N) |
Number of \(\mathbf{F}\) intermediates |
---|---|---|
No |
S / A |
1 |
No |
N |
2 |
Yes |
S / A |
2 |
Yes |
N |
4 |
Note that for symmetric (S) and anti-symmetric (A) densities, we symmetrize and anti-symmetrize, respectively, exchange matrices at the end, which reduces the number of \(\mathbf{F}\) intermediates by a factor of 2. The actual computational costs usually do not increase linearly with the number of \(\mathbf{F}\) intermediates since we compute the costly analytic integrals (\(A_{\nu\tau}(\mathbf{r}_g)\)) only once and then contract them with the additional \(\mathbf{F}\) intermediates. From our experience the overhead of the full-symmetrization algorithm is roughly between 1.2 and 1.5 times that of the original algorithm.
2.8.4.7. COSX Grid and Convergence Issues¶
Symptoms of convergence issues: Erratic convergence behavior, often starting from the first SCF step or possibly setting in at a later stage, which produce crazy energy values with “megahartree” jumps. If overlap fitting is on, the following error message may also be encountered: “Error in Cholesky inversion of numerical overlap”.
Convergence issues may arise when the chosen grid has difficulties in representing the basis set. This is the “grid equivalent” of a linear dependence problem, as discussed in Linear Dependence. It should also be mentioned that the grid-related problem discussed here often goes hand in hand with basis set linear dependence, although not necessarily. The most straightforward way of dealing with this is to increase the size of the integration grid. This, however, is not always desirable or possible.
One way to avoid the Cholesky inversion issue is to turn overlap fitting
off (!NoSFitting
). However, this means that the numerical problems are
still present, but are ignored. Due to the fact that overlap
fitting operates with the numerical overlap and its inverse, it is more
sensitive to linear dependence issues, so turning off the fitting
procedure may lead to convergence. This may be a pragmatic — but by no
means clean — solution, since it relies on the assumption that the
numerical errors are small.
On the other hand, overlap fitting also gives a similar tool to deal with linear dependence issues as the one discussed for basis sets. The eigenvalues of the numerical overlap can be similarly inspected and small values screened out. There is unfortunately no universal way to determine this screening parameter, but see Linear Dependence for typical values.
The parameters influencing the method used for inversion and obtaining the fitting matrix are:
%method
SFitInvertType Cholesky # Cholesky inversion (default)
Cholesky_Q # Cholesky + full Q matrix
Diag # Inversion via diagonalization
Diag_Q # Diag + full Q matrix
SInvThresh 1e-8 # Inversion threshold for Diag and Diag_Q (default 1e-8)
end
By default, the inversion procedure proceeds through Cholesky
decomposition as it is the fastest option. Ideally, the overlap matrix is
non-singular if the basis set is not linearly dependent. For
singular matrices, the Cholesky procedure will fail. It should be noted
at this point that the numerical overlap can become linearly dependent
even if the overlap of basis functions is not, and so a separate
parameter will be needed to take care of grid-related issues. To achieve
this, a diagonalization procedure (Diag
) can be used instead of
Cholesky with the corresponding parameter to screen out eigenvectors
belonging to eigenvalues below a threshold (SInvThresh
). For both
Cholesky and diagonalization procedures, a “full Q” approach is also
available (Cholesky_Q
and Diag_Q
), which corresponds to the use of a
more accurate (untruncated) fitting matrix.
2.8.5. RI-JK¶
An alternative algorithm for accelerating the HF exchange in hybrid DFT or HF calculations is RI-JK developed by Kendall and Früchtl[115]. It will use the RI approximation for both Coulomb and exchange and allows for a fair approximation to the Hartree-Fock exchange matrix. It can be difficult to make this approximation highly accurate. It is, however, usefully fast compared to direct SCF if the molecule is “dense” enough. There are special auxiliary basis sets for this purpose (see section Basis Sets). RI-JK is implemented in ORCA for SCF single point energies but not for gradients. The speedups for small molecules are better than for RIJCOSX, for medium sized molecules (e.g. \((\text{gly})_{4}\)) similar, and for larger molecules RI-JK is less efficient than RIJCOSX. A detailed comparison can be found in Section 2.8.6. The errors of RI-JK are usually below 1 mEh and the error is very smooth (smoother than for RIJCOSX). Hence, for small calculations with large basis sets, RI-JK is a good idea, for large calculations on large molecules RIJCOSX is better.
Note
RI-JK requires a larger auxiliary basis set. For the Karlsruhe basis set, the universal
def2/JK
anddef2/JKsmall
basis sets are available. They are large and accurate.For UHF RI-JK is roughly twice as expensive as for RHF. This is not true for RIJCOSX.
RI-JK is available for conventional and direct runs and also for ANO bases. There the conventional mode is recommended.
2.8.5.1. Basic Usage¶
The RI-JK approximation can be invoked by the !RI-JK
keyword:
! RHF def2/JK RI-JK
2.8.6. Speed Comparison of RIJCOSX and RI-JK¶
A comparison of the RIJCOSX and RI-JK methods is shown in Table 2.43.
def2-SVP |
def2-TZVP(-df) |
def2-TZVPP |
def2-QZVPP |
||
---|---|---|---|---|---|
\(\text{(gly)}_{2}\) |
Default |
105 |
319 |
2574 |
27856 |
RI-JK |
44 |
71 |
326 |
3072 |
|
RIJCOSX |
70 |
122 |
527 |
3659 |
|
\(\text{(gly)}_{4}\) |
Default |
609 |
1917 |
13965 |
161047 |
RI-JK |
333 |
678 |
2746 |
30398 |
|
RIJCOSX |
281 |
569 |
2414 |
15383 |
|
\(\text{(gly)}_{8}\) |
Default |
3317 |
12505 |
82774 |
|
RI-JK |
3431 |
5452 |
16586 |
117795 |
|
RIJCOSX |
1156 |
2219 |
8558 |
56505 |
It is obvious from the data that for small molecules the RI-JK approximation is the most efficient choice. For \(\text{(gly)}_{4}\) this is already no longer obvious. For up to the def2-TZVPP basis set, RI-JK and RIJCOSX are almost identical and for def2-QZVPP RIJCOSX is already a factor of two faster than RI-JK. For large molecules like \(\text{(gly)}_{8}\) with small basis sets RI-JK is not a big improvement but for large basis set it still beats the normal 4-index calculation. RIJCOSX on the other hand is consistently faster. It leads to speedups of around 10 for def2-TZVPP and up to 50-60 for def2-QZVPP. Here it outperforms RI-JK by, again, a factor of two.
2.8.7. Keywords¶
Keyword |
Description |
---|---|
|
Activates RI-J (uses Split-RI-J by default) |
|
Deactivates RI treaments |
|
Activates Split-RI-J |
|
Activates Split-RI-J |
|
Activates RIJONX |
|
Activates RIJCOSX |
|
Deactivates RIJCOSX, will activate RIJONX instead |
|
Activates RI-JK |
|
Deactivates SFitting |
Keyword |
Options |
Description |
---|---|---|
|
|
Activates/deactivates RI (default: Split-RI-J) |
|
|
Control exchange grid accuracy parameters (cf. Numerical Integration) |
|
|
Controls exchange grid parameters (cf. Numerical Integration) |
|
|
Activates SFitting (default: true) |
|
|
Uses Cholesky inversion (default) |
|
Uses Cholesky + full Q matrix |
|
|
Activates inversion via diagonalization |
|
|
Activates diagonalization + full Q matrix |
|
|
|
Inversion threshold for Diag and Diag_Q (default: 1e-8) |
|
|
Activates SCF fitting matrix for gradient calculations (default: true) |
|
|
Activates intermediate basis for generally contracted shells (default: true) |
|
|
Use fully symmetrized integrals |
|
Use original COSX algorithm |