InertialAR: Autoregressive 3D Molecule Generation with Inertial Frames
arXiv 2025

  • 1 The Chinese University of Hong Kong
  • 2 Alibaba DAMO Academy
  • 3 Shanghai Artificial Intelligence Laboratory
  • 4 University of Ottawa

Unconditional Generation

[Figures: unconditionally generated molecules (mol 4, mol 14, mol 16)]

Conditional Generation

Conditional generation targeting molecules that contain a benzene ring and an amide functional group.

[Figures: conditionally generated molecules (mol 7, mol 13, mol 14)]

Model Architecture

[Figure: model architecture]

Abstract

Transformer-based autoregressive models have emerged as a unifying paradigm across modalities such as text and images, but their extension to 3D molecule generation remains underexplored. The gap stems from two fundamental challenges: (1) tokenizing molecules into a canonical 1D sequence of tokens that is invariant to both SE(3) transformations and atom index permutations, and (2) designing an architecture capable of modeling hybrid atom-based tokens that couple discrete atom types with continuous 3D coordinates. To address these challenges, we introduce InertialAR. InertialAR devises a canonical tokenization that aligns molecules to their inertial frames and reorders atoms to ensure SE(3) and permutation invariance. It also equips the attention mechanism with geometric awareness via geometric rotary positional encoding (GeoRoPE), and it uses a hierarchical autoregressive paradigm to predict the next atom-based token: the atom type first, then its 3D coordinates via a diffusion loss. Experimentally, InertialAR achieves state-of-the-art performance on 7 of the 10 evaluation metrics for unconditional molecule generation across QM9, GEOM-Drugs, and B3LYP. Moreover, it significantly outperforms strong baselines in controllable generation for targeted chemical functionality, attaining state-of-the-art results across all 5 metrics.
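
The idea of aligning a molecule to its inertial frame and then reordering its atoms can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`inertial_frame_align`, `canonicalize`), the small `ATOMIC_MASS` table, the third-moment rule used to fix axis signs, and the lexicographic atom ordering are all assumptions chosen for illustration. The abstract does not specify how InertialAR resolves these details.

```python
import numpy as np

# Standard atomic masses (u) for a few elements, keyed by atomic number.
ATOMIC_MASS = {1: 1.008, 6: 12.011, 7: 14.007, 8: 15.999, 9: 18.998}


def inertial_frame_align(coords, masses):
    """Express coordinates in the principal axes of inertia (illustrative)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)

    # Translation invariance: move the center of mass to the origin.
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    x = coords - com

    # Inertia tensor: I = sum_i m_i (|r_i|^2 * Id - r_i r_i^T).
    r2 = (x ** 2).sum(axis=1)
    outer = masses[:, None, None] * x[:, :, None] * x[:, None, :]
    inertia = (masses * r2).sum() * np.eye(3) - outer.sum(axis=0)

    # Principal axes are the eigenvectors (columns), eigenvalues ascending.
    _, axes = np.linalg.eigh(inertia)
    axes = axes.copy()

    # Eigenvector signs are arbitrary. ASSUMPTION: fix the first two axes so
    # the mass-weighted third moment along each is non-negative.
    skew = (masses[:, None] * (x @ axes) ** 3).sum(axis=0)
    for k in (0, 1):
        if skew[k] < 0:
            axes[:, k] *= -1.0

    # The third axis follows from right-handedness, so the map is a proper
    # rotation and mirror-image molecules stay distinct.
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])
    return x @ axes


def canonicalize(atomic_numbers, coords):
    """Return (atom types, coordinates) in a canonical frame and order."""
    z = np.asarray(atomic_numbers)
    masses = np.array([ATOMIC_MASS[int(a)] for a in z])
    aligned = inertial_frame_align(coords, masses)

    # ASSUMPTION: order atoms lexicographically by (x, y, z) in the inertial
    # frame; np.lexsort treats the last key as the primary sort key.
    order = np.lexsort((aligned[:, 2], aligned[:, 1], aligned[:, 0]))
    return z[order], aligned[order]
```

Calling `canonicalize(z, xyz)` on a molecule and on a rotated, translated, and index-permuted copy should give the same sequence up to floating-point error, provided the principal moments are distinct and the third moments are nonzero. Symmetric molecules violate those conditions and would need additional tie-breaking, which this sketch does not attempt.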

Unconditional Generation Results

[Figure: unconditional generation results]

Conditional Generation: CFG Tuning Visualization

[Figure: CFG tuning visualization]

Conditional Generation Results

[Figure: conditional generation results]

Citation