9.2. Input Formats
Project/package name: dpa-adapt
Python import: dpa_adapt
Main CLI: dpa-adapt
Optional short alias: dpaad
Display name: DPA-ADAPT — Atomistic DPA Adaptation for Property Tasks
dpa-adapt data convert and the Python dpa_adapt.convert() helper auto-detect the input type and route it to the correct pipeline: SMILES table → RDKit 3D conformer generation, structure files → dpdata (auto-detect or explicit --fmt).
9.2.1. SMILES Tables (CSV)
Trigger: file extension .csv and a SMILES column. By default, the converter looks for a column named SMILES or smiles; use --smiles-col for other column names such as smi or mol. Alternatively, pass --fmt smiles to select this pipeline explicitly.
| Parameter | Default | Description |
|---|---|---|
| --smiles-col | | Column name for SMILES strings |
| --property-col | | Input table column to read target values from; also used as the output label name |
| --train-ratio | | Fraction of samples used for the training set |
| | - | Directory of pre-generated |
| | | Filename template under |
| --split-seed | | Random seed for train/valid splitting |
| --conformer-seed | | Random seed for RDKit 3D conformer generation |
```bash
# Auto-detected via SMILES column
dpa-adapt data convert --input molecules.csv --output ./npy \
    --property-col homo
# Short alias
dpaad data convert --input molecules.csv --output ./npy \
    --property-col homo

# Explicit fmt + custom column names
dpa-adapt data convert --input data.csv --output ./npy --fmt smiles \
    --smiles-col smi --property-col GAP --train-ratio 0.85 \
    --split-seed 42 --conformer-seed 43
# Short alias
dpaad data convert --input data.csv --output ./npy --fmt smiles \
    --smiles-col smi --property-col GAP --train-ratio 0.85 \
    --split-seed 42 --conformer-seed 43
```
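To make the roles of --train-ratio and --split-seed concrete, here is a minimal sketch of a seeded train/valid split. It is not taken from the dpa-adapt source: the function name `split_train_valid`, the shuffle-then-cut strategy, and the rounding rule are all assumptions chosen for illustration, and the real converter may split differently.

```python
import random


def split_train_valid(n_samples, train_ratio, split_seed):
    """Return (train_idx, valid_idx) from a seeded shuffle of range(n_samples).

    Hypothetical helper: the actual dpa-adapt splitting logic may differ.
    """
    if not 0.0 < train_ratio <= 1.0:
        raise ValueError("train_ratio must be in (0, 1]")
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the split independent of global RNG state.
    random.Random(split_seed).shuffle(indices)
    # Assumption: the training-set size is rounded to the nearest integer.
    n_train = round(n_samples * train_ratio)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, valid_idx = split_train_valid(20, train_ratio=0.85, split_seed=42)
print(len(train_idx), len(valid_idx))  # 17 3
```

Reusing the same seed reproduces the same partition, which is the practical reason to record --split-seed alongside a converted dataset.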
9.2.2. Structure Files via dpdata
Trigger: any input not routed to the SMILES pipeline. That is, --fmt is set to something other than smiles, or --fmt is omitted and the input is not a CSV with a recognized SMILES column. The converter then calls dpdata, either with format auto-detection or with the explicit --fmt value.
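The routing rules from both triggers can be summarized in a short sketch. This is an illustration of the documented behavior, not the dpa-adapt implementation: `choose_pipeline` is a hypothetical name, and the sketch assumes that a custom column name replaces the default SMILES/smiles lookup, which the text above does not confirm.

```python
import csv
import tempfile
from pathlib import Path

DEFAULT_SMILES_COLUMNS = ("SMILES", "smiles")


def choose_pipeline(path, fmt=None, smiles_col=None):
    """Return "smiles" or "dpdata" for an input path (hypothetical helper)."""
    if fmt is not None:
        # An explicit format always wins over auto-detection.
        return "smiles" if fmt == "smiles" else "dpdata"
    path = Path(path)
    if path.suffix.lower() != ".csv":
        return "dpdata"
    # Only the header row is needed to look for a SMILES column.
    with path.open(newline="") as handle:
        header = next(csv.reader(handle), [])
    # Assumption: a custom column name replaces the default names.
    wanted = (smiles_col,) if smiles_col else DEFAULT_SMILES_COLUMNS
    return "smiles" if any(name in header for name in wanted) else "dpdata"


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "molecules.csv"
    table.write_text("smiles,homo\nCCO,1.0\n")  # made-up example row
    routes = {
        "csv_with_smiles": choose_pipeline(table),
        "custom_col_missing": choose_pipeline(table, smiles_col="smi"),
        "explicit_fmt": choose_pipeline("traj.xyz", fmt="xyz"),
        "non_csv": choose_pipeline("POSCAR"),
    }
print(routes)
```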
9.2.2.1. Common Formats
| Format (--fmt) | Typical file(s) | Notes |
|---|---|---|
| xyz | *.xyz | Plain XYZ |
| | POSCAR | VASP input/final structure |
| vasp/outcar | OUTCAR | VASP output (energies, forces, stress) |
| | | VASP XML output |
| | VASP structure string | VASP structure from a string |
| | | ABACUS input structure |
| | SCF output | ABACUS SCF calculation |
| | MD output | ABACUS molecular dynamics |
| | Relax output | ABACUS relaxation |
| | CP2K MD output | CP2K AIMD output file |
| | CP2K SCF output | CP2K single-point output |
| | | DeePMD-kit raw format |
| | | DeePMD-kit compressed/npy format |
| | mixed | DeePMD-kit mixed npy format |
| | | DeePMD-kit HDF5 format |
| | | LAMMPS dump trajectory |
| | | LAMMPS data file |
| | CP trajectory | Quantum ESPRESSO Car-Parrinello MD |
| | PWscf output | Quantum ESPRESSO PWscf |
| | Siesta output | SIESTA SCF output |
| | Siesta MD output | SIESTA AIMD output |
| | | Gaussian log file |
| | | Gaussian formatted checkpoint |
| | Gaussian MD output | Gaussian MD trajectory |
| | | Gaussian input file |
| | Amber MD output | Amber MD trajectory |
| | | GROMACS coordinate file |
| | | PWmat output / movement / MLMD |
| | | PWmat final/input structure |
| | FHI-aims output/MD | FHI-aims calculation or MD trajectory |
| | FHI-aims SCF output | FHI-aims SCF |
| | Psi4 output | Psi4 calculation output |
| | Psi4 input | Psi4 input file |
| | ORCA output | ORCA single-point output |
| | SQM output | SQM output |
| | SQM input | SQM input |
| | OpenMX MD output | OpenMX MD trajectory |
| | n2p2 output | n2p2/NNPack output |
| | DFTB+ output | DFTB+ detailed.xml |
| | | MDL Molfile |
| | | MDL SDFile |
| | Any ASE format | ASE structure (single frame) |
| | Any ASE trajectory | ASE trajectory (multi-frame) |
| | pymatgen objects | pymatgen Structure |
| | pymatgen objects | pymatgen Molecule |
| | pymatgen objects | pymatgen ComputedStructureEntry |
| | LMDB dir | DeePMD-kit LMDB format |
| | List-format dir | List of system directories |
| | 3Dmol format | 3Dmol.js format |
You can omit --fmt and let dpdata infer the input format from the file name or content. For example, files named POSCAR, OUTCAR, or *.xyz are often recognized automatically. Use --fmt when the file name is ambiguous or auto-detection fails.
9.2.2.2. Single file
```bash
# Format inferred by dpdata from the file name
dpa-adapt data convert --input POSCAR --output ./npy
dpaad data convert --input POSCAR --output ./npy

# Explicit VASP OUTCAR format
dpa-adapt data convert --input OUTCAR --output ./npy --fmt vasp/outcar
dpaad data convert --input OUTCAR --output ./npy --fmt vasp/outcar

# Explicit XYZ format
dpa-adapt data convert --input traj.xyz --output ./npy --fmt xyz
dpaad data convert --input traj.xyz --output ./npy --fmt xyz
```
9.2.2.3. Glob patterns
When --input contains wildcards (*, ?, [), conversion uses mirrored batch output:
1 or more matches → each matched file is converted into an output directory that mirrors its path relative to the non-wildcard prefix.
0 matches → FileNotFoundError.
A manifest.json is written into the output root, recording converted and skipped files.
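The mirroring rule can be sketched as a pure path computation. The helper names below (`has_wildcard`, `non_wildcard_prefix`, `mirrored_output_dir`) are hypothetical and the sketch assumes POSIX-style paths; it only illustrates how an output directory could be derived from the pattern and a matched file, not how dpa-adapt does it internally.

```python
from pathlib import PurePosixPath

WILDCARD_CHARS = "*?["


def has_wildcard(text):
    """True if the text contains any glob wildcard character."""
    return any(ch in text for ch in WILDCARD_CHARS)


def non_wildcard_prefix(pattern):
    """Leading path components of the pattern that contain no wildcard."""
    prefix = []
    for part in PurePosixPath(pattern).parts:
        if has_wildcard(part):
            break
        prefix.append(part)
    return PurePosixPath(*prefix)


def mirrored_output_dir(pattern, matched_file, output_root):
    """Output directory for one matched file, mirroring its relative path."""
    relative = PurePosixPath(matched_file).relative_to(non_wildcard_prefix(pattern))
    return PurePosixPath(output_root) / relative


print(mirrored_output_dir("calcs/**/OUTCAR", "calcs/run1/OUTCAR", "./npy_root"))
# npy_root/run1/OUTCAR
```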
```bash
# Glob output mirrors the input tree under ./npy_root
dpa-adapt data convert --input "calcs/**/OUTCAR" --output ./npy_root --fmt vasp/outcar
dpaad data convert --input "calcs/**/OUTCAR" --output ./npy_root --fmt vasp/outcar
```
For example, calcs/run1/OUTCAR is written to npy_root/run1/OUTCAR/. With --strict, the run fails at the first conversion error. Without it, files that fail to convert are skipped and logged in the manifest.
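The difference between strict and non-strict batch handling can be illustrated with a small sketch. Everything here is hypothetical: `convert_batch`, the stand-in `fake_convert`, and the manifest keys `converted` and `skipped` are assumptions for illustration, and the real manifest.json layout is not specified in this section.

```python
import json


def convert_batch(files, convert_one, strict=False):
    """Convert each file and collect a manifest-like record of outcomes."""
    if not files:
        # Mirrors the documented behavior for a pattern with zero matches.
        raise FileNotFoundError("no input files matched the pattern")
    manifest = {"converted": [], "skipped": []}
    for name in files:
        try:
            convert_one(name)
        except Exception as err:
            if strict:
                raise  # strict mode: the first failure aborts the batch
            manifest["skipped"].append({"file": name, "error": str(err)})
        else:
            manifest["converted"].append(name)
    return manifest


def fake_convert(name):
    # Stand-in converter that fails for one made-up input.
    if "broken" in name:
        raise ValueError("unreadable file")


result = convert_batch(["calcs/run1/OUTCAR", "calcs/broken/OUTCAR"], fake_convert)
print(json.dumps(result, indent=2))
```

Calling the same function with `strict=True` re-raises the `ValueError` from the second file instead of recording it.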