View on GitHub

Fock Matrix Diagonalization¶

This tutorial demonstrates how to extract the Fock Matrix from an ORCA calculation using the ORCA python interface (OPI). After extraction the Fock matrix is diagonalized and Koopmans' theorem is used to estimate the ionization energy.

In this notebook we will:

Import the Required Dependencies
Define a Working Directory and Prepare Structures
Download and Prepare Molecular Structure from PubChem
Run ORCA Calculations
Extract Matrices and Perform Fock Matrix Diagonalization
Visualize the HOMO

Step 1: Import Dependencies¶

We begin by importing all required Python modules for this tutorial. These include:

File and directory handling: for managing input/output files.
JSON parsing: for reading ORCA-generated GBW JSON files.
Numerical libraries: for matrix operations, diagonalization, and eigenvalue problems.
PubChem data retrieval: for downloading molecular structures directly from PubChem using REST APIs.
RDKit: for molecule handling, conversion, and generating 3D geometries.
ORCA Python Interface (OPI): for setting up, running, and parsing ORCA quantum chemistry calculations.
py3Dmol: for interactive 3D visualization of molecular structures directly in the notebook.

Note: We additionally import modules for visualization/plotting like py3Dmol. For this, it might be necessary to install py3Dmol into your OPI venv (e.g., by activating the .venv and using uv pip install py3Dmol).

# > Import pathlib for directory handling
from pathlib import Path
import shutil

# > Import library for parsing and handling JSON data
import json

# > Import libraries for data handling and numerical operations
import numpy as np
from scipy.linalg import eigh

# > Import libraries for retrieving molecular data from PubChem
import requests
from rdkit import Chem
from rdkit.Chem import AllChem

# > OPI imports for performing ORCA calculations and reading output
from opi.core import Calculator
from opi.input.structures.structure import Structure
from opi.input.simple_keywords import BasisSet, Scf
from opi.input.simple_keywords.method import Method

# > Import py3Dmol for 3D molecular visualization
import py3Dmol

Step 2: Define Working Directory¶

All actual calculations will be performed in a subfolder RUN.

# > Calculation is performed in `RUN`
working_dir = Path("RUN")
# > The `working_dir`is automatically (re-)created
shutil.rmtree(working_dir, ignore_errors=True)
working_dir.mkdir()
# > Conversion factor for atomic units to eV
unit_conversion = 27.211386

Step 3: Download and Prepare Molecular Structure from PubChem¶

In this step, we retrieve the molecular structure of the target molecule directly from PubChem using its REST API.

# > Download molecule data from PubChem
mol_name = "Benzene"
cid_url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{mol_name}/cids/JSON"
cid_response = requests.get(cid_url)
cid_response.raise_for_status()
cid = cid_response.json()['IdentifierList']['CID'][0]

sdf_url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/record/SDF/?record_type=3d"
sdf_response = requests.get(sdf_url)
sdf_response.raise_for_status()
sdf_data = sdf_response.text

# > Parse SDF and generate 3D conformer with RDKit
mol = Chem.MolFromMolBlock(sdf_data)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, AllChem.ETKDG())
conf = mol.GetConformer()

# > Convert to XYZ format
num_atoms = mol.GetNumAtoms()
lines = [str(num_atoms), f"{mol_name} from PubChem"]
for atom in mol.GetAtoms():
    pos = conf.GetAtomPosition(atom.GetIdx())
    symbol = atom.GetSymbol()
    lines.append(f"{symbol} {pos.x:.6f} {pos.y:.6f} {pos.z:.6f}")

xyz_data = "\n".join(lines)

# > Visualize the molecular structure using py3Dmol
view = py3Dmol.view(width=500, height=500)
view.addModel(xyz_data, "xyz")
view.setStyle({'stick': {'radius': 0.1}, 'sphere': {'scale': 0.2}})
view.setBackgroundColor('white')
view.zoomTo()
view.show()

# Save the XYZ structure to a file
with open(working_dir / "struc.xyz","w") as f:
    f.write(xyz_data)

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

Step 4: ORCA Calculations¶

Now, we perform a HF/STO-3G calculation. We set up a config_dict and use it to request integrals.

# > Set up path for the structure
xyz_file = working_dir / "struc.xyz"

# > Define GBW JSON configuration
config_dict = {
    "MOCoefficients": True,
    "1elPropertyIntegrals": ["dipole"],
    "1elIntegrals": ["H", "S"],
    "FockMatrix": ["F", "J", "K"],
    "JSONFormats": ["json"]
}

# > Create a Calculator object for ORCA input generation and execution
calc = Calculator(basename="benzene", working_dir=working_dir)

# > Load the molecular structure from XYZ file
structure = Structure.from_xyz(xyz_file)
calc.structure = structure
# > Set the charge
calc.structure.charge = 0
# > Set the multiplicity 
# > Note that only 1 is allowed here
calc.structure.multiplicity = 1

# > Add calculation keywords
calc.input.add_simple_keywords(
            Method.HF,
            BasisSet.STO_3G,
            Scf.TIGHTSCF
)

# > Write the ORCA input file
calc.write_input()
# > Run the ORCA calculation
print("Running ORCA calculation ...", end="")
calc.run()
print("   Done")

# > Get output and use it to create the gbw json output with config
output = calc.get_output()
status = output.terminated_normally()
if status: 
    # > Use config dict to generate integral data
    output.create_gbw_json(force=True,config=config_dict)
    output.parse()
else:
    raise RuntimeError("ORCA did not terminate normally.")

molecule_gbw = output.results_gbw.molecule

Running ORCA calculation ...   Done

Step 5: Extract Matrices and Perform Fock Matrix Diagonalization¶

# Extract integrals from the JSON file:
h = np.array(molecule_gbw.h_matrix)
f = np.array(molecule_gbw.f_matrix[0])
s = np.array(molecule_gbw.s_matrix)

# Build the full AO Fock matrix by adding core Hamiltonian and Fock correction
fao = h + f
# Perform generalized eigenvalue problem
eigenvalues, _ = eigh(fao, s)
# Convert orbital energies from Hartree to eV
eigenvalues_ev = eigenvalues * unit_conversion

# > ID of the highest occupied molecular orbital (HOMO)
homo_id = int(output.get_nelectrons()[0] / 2) - 1

# Estimate ionization energy using Koopmans' theorem
print("Ionization energy (eV):", - eigenvalues_ev[homo_id])

Ionization energy (eV): 7.601747747143446

Step 6: Visualize the HOMO¶

Now we visualize the HOMO (highest occupied orbital):

# > Obtain cube file of HOMO with small resolution 
# > to keep the size of this notebook small
cube_output = output.plot_mo(homo_id,resolution=30)
cube_data = cube_output.cube

view = py3Dmol.view(width=500, height=500)
view.addModel(xyz_data, "xyz")
view.setStyle({'stick': {'radius': 0.1}, 'sphere': {'scale': 0.2}})
view.addVolumetricData(cube_data, "cube", {"isoval": 0.005, "color": "blue", "opacity": 0.8})
view.addVolumetricData(cube_data, "cube", {"isoval": -0.005, "color": "red", "opacity": 0.8})
view.zoomTo()
view.show()

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

It can be seen that the HOMO consists of the \(\pi\)-system where the electron would be removed in an ionization process.

Summary¶

In this notebook we demonstrated how to utilize the ORCA Python Interface (OPI) to obtain Integrals from an ORCA calculation. We used this functionality to obtain the Fock matrix of a benzene molecule, diagonalized it with eigh, and used Koopmans' theorem to estimate the ionization energy. The corresponding HOMO was visualized with py3Dmol.