
API reference

Below is the auto-generated documentation for the public modules. If you only need to pick an entry point, see the hand-curated comparison on the Backends reference page.

Top-level

qqa

Quasi-Quantum Annealing (QQA) for combinatorial and spin-glass optimization.

Reference

Y. Ichikawa, Y. Arai. "Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling." ICLR 2025. https://openreview.net/forum?id=9EfBeXaXf0 (arXiv:2409.02135)

Typical usage::

import networkx as nx
import qqa

qqa.fix_seed(0)
g = nx.random_regular_graph(d=3, n=50, seed=0)
problem = qqa.MaximumIndependentSet(g, penalty=2)
result = qqa.anneal(problem, sol_size=100, num_epochs=1500)
print(result.best_obj, result.runtime)

Spin-glass example::

problem = qqa.SherringtonKirkpatrick(N=100, seed=0)
result = qqa.anneal(problem, sol_size=200, num_epochs=2000)
print("E_0 per spin:", result.best_obj / 100)

AnnealResult dataclass

Result returned by :func:anneal.

Attributes

  • best_sol — Tensor of the best discrete solution(s) found during annealing. Shape depends on the problem: (sol_size, N) for single-instance, or (num_instance, max_node) for batched-instance problems.
  • best_obj — Best objective value observed. float for single-instance problems, numpy.ndarray of shape (num_instance,) for batched-instance.
  • runtime — Wall-clock time of the annealing loop in seconds.
  • history — Dict of per-epoch metrics (loss_mean, penalty_mean, diversity, bg). Empty if record_history=False.
  • callbacks — List of callback instances that were active. Useful for retrieving e.g. TrajectoryTracker.values.

score = field(default_factory=dict) class-attribute instance-attribute

Human-readable problem-specific score produced by :py:meth:COProblem.score_summary.

  • Single-instance: standard dict {label, value, unit, feasible, extra} with scalar fields.
  • Batched-instance (problem.num_instance > 1): same keys, but value and feasible are np.ndarray of length num_instance, and extra carries arrays plus a feasible_count tally. score is empty for batched problems whose class did not override :py:meth:COProblem.score_summary.

polished_sol = None class-attribute instance-attribute

1-flip-locally-optimal version of :attr:best_sol, populated when :func:anneal is called with polish=True (the default) on a QUBO problem. None for non-QUBO problems or when polishing is disabled. For QUBO problems best_sol / best_obj / score are replaced by the polished result whenever it is strictly better, so callers that just read best_sol automatically benefit from the polish.
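To make the "1-flip-locally-optimal" idea concrete, here is a minimal pure-Python sketch of a greedy one-flip polish on a QUBO objective x^T Q x. It is illustrative only — the library's actual :func:`qqa.polish.greedy_one_flip` operates on tensors and is certainly more efficient:

```python
# Hedged sketch of a greedy 1-flip polish on a QUBO objective x^T Q x
# (illustrative only -- not the library's qqa.polish.greedy_one_flip).

def qubo_energy(Q, x):
    """E = x^T Q x for a dense Q given as a list of lists."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def greedy_one_flip(Q, x):
    """Flip single bits while any flip strictly lowers the energy."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            candidate = list(x)
            candidate[i] = 1 - candidate[i]
            if qubo_energy(Q, candidate) < qubo_energy(Q, x):
                x = candidate
                improved = True
    return x

# MIS QUBO on the 3-path 0-1-2: diag(-1), penalty 2 on each edge.
Q = [[-1, 2, 0], [0, -1, 2], [0, 0, -1]]
polished = greedy_one_flip(Q, [1, 1, 1])   # infeasible all-ones start
# polished == [1, 0, 1]: the maximum independent set {0, 2}, energy -2
```

The termination condition (no single flip strictly improves) is exactly the 1-flip local optimality that the attribute above guarantees.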

anneal(problem, *, sol_size=100, learning_rate=1.0, temp=0.0, schedule=None, min_bg=None, max_bg=None, curve_rate=2, div_param=0.0, num_epochs=10000, check_interval=1000, device='cpu', callbacks=(), record_history=True, verbose=True, mixed_precision='fp32', initial_state=None, polish=True)

Run Quasi-Quantum Annealing on problem.

Parameters

  • problem — Any :class:~qqa.problems.COProblem subclass. Must expose loss_fn(x) and a relaxation attribute.
  • sol_size — Number of parallel candidate solutions (batch size).
  • learning_rate — AdamW learning rate for the relaxed variable.
  • temp — Langevin noise temperature. If 0, no noise is added.
  • schedule — Callable (epoch, num_epochs) -> bg. If None, a :class:LinearBGSchedule is built from min_bg/max_bg.
  • min_bg, max_bg — Convenience overrides for the default linear schedule.
  • curve_rate — Exponent of the QQA penalty (must be even for the convex regime).
  • div_param — Weight of the diversity term. Set to 0 to disable.
  • num_epochs — Number of gradient steps.
  • check_interval — How often to print progress logs.
  • device — torch device.
  • callbacks — Additional callbacks. A :class:HistoryRecorder is prepended when record_history=True.
  • record_history — If True, loss/penalty/diversity/bg are recorded per epoch.
  • verbose — If True, print periodic progress.
  • mixed_precision — If "bf16" and device is CUDA, the forward pass and loss evaluation run inside torch.amp.autocast with bfloat16 — typically a 1.3x-2x speedup on Ampere/Hopper/Blackwell with negligible accuracy loss for QUBO/spin objectives. The relaxed variable, gradients, and AdamW state stay in float32 for numerical stability. Defaults to "fp32", so behaviour is bit-for-bit identical to the legacy loop. On CPU this option is silently downgraded to "fp32".
  • initial_state — Optional warm-start bitstring. Shape (N,) (broadcast to every chain with light Gaussian jitter so the population still diversifies) or (sol_size, N) (used verbatim). Particularly effective on near-bipartite MaxCut, where :func:qqa.warmstart.bfs_2color already gives a 0.96+ ApR seed. Ignored for batched-instance problems.
  • polish — If True (default), run :func:qqa.polish.greedy_one_flip on the winning bitstring at the end of training and replace best_sol / best_obj / score with the polished result whenever it is strictly better. The polish costs O(N · #flips) and is silently skipped for problems without a Q_mat (Spin / Categorical / batched-instance), so it is safe to leave on globally.

fix_seed(seed)

Seed Python/Numpy/Torch (CPU + CUDA) for deterministic runs.

.. note:: This call flips torch.backends.cudnn.deterministic to True for the whole process, which disables some CuDNN auto-tuning and may slow non-QQA code sharing the same interpreter. Call it once near the start of your script (or test session) and not inside hot loops.

generate_graph(n, d=None, p=None, graph_type='reg', random_seed=0)

Generate a random graph.

Parameters

graph_type : {"reg", "prob", "erdos"}
  • reg — n-node, d-regular graph
  • prob — fast G(n, p)
  • erdos — classic Erdős-Rényi G(n, p)
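Both the "prob" and "erdos" modes sample the G(n, p) model, in which each of the C(n, 2) possible edges appears independently with probability p. A minimal pure-Python sketch (illustrative only — `gnp_graph` is a hypothetical name, not the library's generate_graph):

```python
import random

def gnp_graph(n, p, seed=0):
    """Sample Erdős-Rényi G(n, p) as an edge list: each of the C(n, 2)
    candidate edges is kept independently with probability p.
    (Sketch only -- qqa.generate_graph returns a NetworkX graph.)"""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

edges = gnp_graph(10, 0.3, seed=0)
```

The "fast" variant in real libraries (e.g. NetworkX's fast_gnp_random_graph) skips geometrically-distributed runs of absent edges instead of testing each pair, which matters for sparse graphs.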

Problems

qqa.problems.base

Abstract problem base classes.

Every problem class in QQA exposes:

  • loss_fn(x) — the (continuous or discrete) objective, vectorised over the leading batch dimension that qqa.anneal uses for the parallel population.
  • relaxation — a :class:~qqa.relaxation.Relaxation instance describing how the variable is represented during annealing.

Binary QUBO problems return losses of shape (B,) for a single graph, or (B, I) for batched-instance variants. Categorical and spin problems return losses of shape (B,).

COProblem

Bases: ABC

Abstract base class for any combinatorial optimisation problem.

score_summary(x_disc)

Problem-specific, human-readable breakdown of a discrete solution.

The default implementation evaluates :meth:loss_fn and reports the raw loss. Concrete subclasses should override to return a dict with label / value / unit / feasible / extra so the dashboard can display e.g. "IS size: 22" instead of "loss: -22".

QUBOProblem

Bases: COProblem

Abstract base for QUBO problems that expose a Q matrix.

normalize_graph(graph)

Return a graph whose nodes are 0, 1, ..., N-1.

Many QUBO constructors use node labels directly as matrix/tensor indices (Q[u, v] = ...), so a graph whose nodes are {10, 20, 30} (or strings, or a subset of range(N)) silently produces a wrong QUBO or raises an IndexError. This helper returns graph unchanged when the labels are already a contiguous 0..N-1 range and returns a relabelled copy otherwise. It does not mutate the input.
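The relabelling itself is a one-line dictionary mapping; a minimal pure-Python sketch of what a helper like normalize_graph must do before Q[u, v] indexing is safe (the function name `normalize_labels` is hypothetical):

```python
def normalize_labels(nodes, edges):
    """Map arbitrary hashable node labels onto the contiguous range 0..N-1
    so they can be used directly as matrix indices.  Returns a new edge
    list; the inputs are not mutated (matching normalize_graph's contract)."""
    index = {label: i for i, label in enumerate(nodes)}
    return [(index[u], index[v]) for u, v in edges]

# Labels {10, 20, 30} would index far outside a 3x3 Q matrix:
edges = normalize_labels([10, 20, 30], [(10, 20), (20, 30)])
# edges == [(0, 1), (1, 2)]
```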

qqa.problems.qubo

Binary QUBO problems: MIS, MaxClique, MaxCut.

All classes compute loss = x^T Q x on the continuous relaxation x \in [0, 1]^N supplied by :class:~qqa.relaxation.BinaryRelaxation (or its batched variant). Minimising the loss is equivalent to solving the corresponding combinatorial problem.

The *Instance variants pack a list of graphs of (possibly different) sizes into a single (num_instance, max_node, max_node) Q tensor so the solver can attack all of them in one qqa.anneal call. Each instance carries a pad_mask that the loss / score_summary multiplies in to keep padded positions semantically inert — the optimiser may put anything in x[i, n_i:] because the mask zeroes its contribution to both the loss and the reported objective.

MaximumIndependentSet

Bases: QUBOProblem

MIS as a QUBO: diag(-1) with penalty on each edge.

The loss x^T Q x is -|S| + penalty * (#violated edges), so when all constraints are satisfied, -loss equals the independent-set size.
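The Q matrix described above is easy to write down explicitly; a pure-Python sketch (dense, upper-triangular convention — the library's tensor construction may differ):

```python
def mis_qubo(n, edges, penalty=2):
    """MIS QUBO as described above: Q[i][i] = -1 rewards selecting a node,
    Q[u][v] = penalty punishes each selected edge (upper triangle)."""
    Q = [[0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -1
    for u, v in edges:
        Q[min(u, v)][max(u, v)] = penalty
    return Q

def qubo_loss(Q, x):
    return sum(Q[i][j] * x[i] * x[j] for i in range(len(x)) for j in range(len(x)))

# 4-cycle 0-1-2-3-0: {0, 2} is independent, so -loss equals the IS size 2.
Q = mis_qubo(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
# qubo_loss(Q, [1, 0, 1, 0]) == -2
```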

MaximumIndependentSetInstance

Bases: COProblem

Batched-instance MIS, padded to max_node and masked in the loss.

Parameters

  • nx_graph_list — Heterogeneous list of NetworkX graphs (any sizes n_i <= max_node).
  • max_node — Padding width. If None (recommended), uses max(g.number_of_nodes() for g in nx_graph_list).
  • penalty — Edge-violation penalty. Either a scalar (broadcast to all instances) or a per-instance sequence of length I.
  • device — Torch device for the dense Q_tensor and pad_mask.

Loss

loss[b, i] = (m_i ⊙ x_{b,i})^T Q_i (m_i ⊙ x_{b,i}) where m_i is the per-instance pad mask. The mask makes padded positions strictly inert: anything the optimiser writes into x[:, i, n_i:] is squashed to zero before the einsum.

score_summary(x_disc)

Per-instance IS sizes & feasibility for best_sol of shape (I, N).

value and feasible are 1-D np.ndarray of length num_instance; extra carries plain-Python lists / ints so the whole dict is json.dumps-able (the bench runner relies on this).

MaxClique

Bases: QUBOProblem

Max clique as a QUBO: diag(-1) with penalty on non-edges.

MaxCliqueInstance

Bases: COProblem

Batched-instance Max Clique with per-instance pad mask.

MaxCut

Bases: QUBOProblem

Weighted Max-Cut QUBO (minimising x^T Q x).

MaxCutInstance

Bases: COProblem

Batched-instance Max-Cut with per-instance pad mask.

qqa.problems.categorical

Categorical (one-hot) problems: balanced graph partitioning and coloring.

BalancedGraphPartition

Bases: COProblem

Balanced K-partitioning of a graph.

Minimises the edge cut plus a soft balance penalty (so each partition contains roughly N/K nodes).

cut_ratio(x)

Edge-cut ratio (|E| - intra-class edges) / |E|.

balanceness(x)

Balance score in [0, 1] (higher is better).

Coloring

Bases: COProblem

K-coloring: counts same-colour adjacent pairs (0 iff proper).

qqa.problems.spin

Physics-flavoured spin problems for QQA.

All classes in this module use :class:~qqa.relaxation.SpinRelaxation, so the x tensor fed in during annealing lives in [0, 1] while problem.loss_fn sees the transformed spin s = 2x - 1 \in [-1, +1] (and exactly \pm 1 after rounding).

Energies follow physics conventions (lower is better):

  • Ising 1D: E = -sum_<i,j> J_{ij} s_i s_j - h sum_i s_i
  • Edwards-Anderson / SK / Hopfield: E = -0.5 s^T J s with symmetric J and diag(J) = 0 (so the full sum equals -sum_<i,j> J_{ij} s_i s_j).
  • Binary perceptron: a smooth surrogate for the number of mis-classified teacher-student patterns.

SpinProblem

Bases: COProblem

Base class for \pm 1 spin problems.

Subclasses must populate self.num_spins and attach a :class:SpinRelaxation. Most subclasses also build a symmetric coupling matrix self.J and rely on :meth:quadratic_energy.

quadratic_energy(s)

Compute E = -0.5 s^T J s - h . s for a batch of spin configs.

s has shape (B, N); returns a 1D tensor of shape (B,).

Ising1D

Bases: SpinProblem

One-dimensional Ising chain with nearest-neighbour coupling J.

Energy: E = -J sum_i s_i s_{i+1} - h sum_i s_i.

For J > 0 and h = 0 with periodic boundaries, the ground state is all spins aligned with energy -J * N.
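The chain energy and its ferromagnetic ground state can be checked in a few lines of pure Python (a sketch mirroring the convention above, not the library's vectorised loss):

```python
def ising1d_energy(s, J=1.0, h=0.0, periodic=True):
    """E = -J * sum_i s_i s_{i+1} - h * sum_i s_i for a 1D chain of ±1 spins."""
    N = len(s)
    bonds = range(N) if periodic else range(N - 1)
    return -J * sum(s[i] * s[(i + 1) % N] for i in bonds) - h * sum(s)

# All spins aligned: N bonds each contribute -J, so E = -J * N (periodic, h = 0).
assert ising1d_energy([1] * 8) == -8.0
```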

Parameters:

  • N (int, required) — Number of spins.
  • J (float, default 1.0) — Uniform nearest-neighbour coupling strength.
  • h (float, default 0.0) — Uniform external field.
  • periodic (bool, default True) — Whether to close the chain (s_N = s_0).

EdwardsAnderson

Bases: SpinProblem

Edwards-Anderson spin-glass on a hyper-cubic lattice.

Only nearest-neighbour bonds are coupled, with J_{ij} \sim N(0, \sigma^2) drawn once at construction time. The energy is E = -0.5 s^T J s with symmetric J.

Parameters:

  • L (int, required) — Lattice side length (N = L ** dim spins).
  • dim (int, default 3) — Spatial dimension (2 or 3). Default 3 matches the classical 3D-EA benchmark.
  • seed (int, default 0) — RNG seed for the couplings.
  • periodic (bool, default True) — Whether to use periodic boundary conditions.
  • sigma (float, default 1.0) — Standard deviation of the Gaussian couplings.

from_couplings_txt(path, N, *, L=None, dim=3, periodic=True, device='cpu') classmethod

Load an EA instance from a text file of i j J_ij rows.

Compatible with the couplings_L{L}_R1_seed{seed}.txt format produced by related projects: rows of i j J_ij with 0-based indices. No metadata is assumed; N must be provided. Pass L explicitly when N != L**dim (the cubic-root fallback below only yields a correct L for cubic lattices).

SherringtonKirkpatrick

Bases: SpinProblem

Sherrington-Kirkpatrick mean-field spin glass.

All-to-all couplings with J_{ij} \sim N(0, 1/N) for i \ne j and J_{ii} = 0. Energy: E = -0.5 s^T J s.

The standard normalisation J_{ij} \sim N(0, 1/N) makes the typical ground-state energy density e_0 = E_0 / N converge to \approx -0.7632 (Parisi).

PSpinGlass

Bases: SpinProblem

Dense p-spin Sherrington-Kirkpatrick model.

Energy:

.. math::

E = -\sum_{i_1 < i_2 < \dots < i_p} J_{i_1 i_2 \dots i_p}
    s_{i_1} s_{i_2} \dots s_{i_p}

with i.i.d. Gaussian couplings drawn from J \sim \mathcal{N}(0,\, p!/(2 N^{p-1})) so that the typical ground-state energy density is intensive (Crisanti & Sommers, 1992 <https://doi.org/10.1051/jphys:0199200530100128300>_).

For p = 2 this reduces to the classical Sherrington-Kirkpatrick model (use :class:SherringtonKirkpatrick for the symmetric J matrix form). For p \ge 3 the model exhibits a discontinuous 1RSB freezing transition and is a canonical hard benchmark for annealing solvers — small instances already form rugged landscapes with exponentially many metastable states.

Parameters:

  • N (int, required) — Number of spins.
  • p (int, default 3) — Interaction order (p >= 2).
  • seed (int, default 0) — RNG seed for the couplings.

RandomFieldIsing

Bases: SpinProblem

Random-Field Ising Model on a hyper-cubic lattice.

Energy:

.. math::

E = -J \sum_{\langle i, j \rangle} s_i s_j - \sum_i h_i s_i,
\qquad h_i \sim \mathcal{N}(0,\, \sigma_h^2)

The couplings are uniform ferromagnetic (J > 0) and the disorder sits in the local fields. This is one of the cleanest models with quenched randomness in equilibrium statistical physics: in 3D with Gaussian fields it has a well-studied finite-temperature phase transition, and the zero-temperature ground-state problem at finite \sigma_h / J is a classical optimisation benchmark (max-flow polynomial-time exact solvers exist — useful as a sanity check for annealing solvers).

Parameters:

  • L (int, required) — Lattice side length (N = L ** dim spins).
  • dim (int, default 2) — Spatial dimension.
  • J (float, default 1.0) — Uniform ferromagnetic coupling.
  • h_std (float, default 1.0) — Standard deviation of the Gaussian random field.
  • seed (int, default 0) — RNG seed for the fields.
  • periodic (bool, default True) — Periodic boundary conditions.

BinaryPerceptron

Bases: SpinProblem

Teacher-student binary perceptron in the storage formulation.

Patterns \xi^\mu \in \{-1, +1\}^N are drawn uniformly, and a teacher s^* \in \{-1, +1\}^N generates labels \sigma^\mu = sign(\frac{1}{\sqrt N} \xi^\mu \cdot s^*).

The learning loss is the number of patterns on which the student s disagrees with the teacher. During annealing this is replaced by a smooth surrogate \sum_\mu \sigma(-k z^\mu) where z^\mu = \sigma^\mu \frac{1}{\sqrt N} \xi^\mu \cdot s and \sigma is the logistic sigmoid, so gradients are available. After rounding, :meth:error_count returns the exact number of errors.

Parameters:

  • N (int, required) — Input dimension.
  • alpha (float, default 0.5) — Loading M / N (so M = round(alpha * N) patterns).
  • seed (int, default 0) — RNG seed for patterns and teacher.
  • sharpness (float, default 10.0) — Sigmoid steepness k. Larger k approaches the step loss but is harder to optimise.

error_count(s)

Exact number of mis-classified patterns for each config in s.

A pattern is counted as an error when the pre-activation z is strictly negative; exact ties (z == 0) occur only for the rare case where the teacher inner product vanishes and are treated as correct.
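A pure-Python sketch of the exact count and its smooth surrogate, using the definitions above (names like `preactivations` are illustrative, not the library's internals):

```python
import math
import random

random.seed(0)
N, M = 64, 32
teacher = [random.choice([-1, 1]) for _ in range(N)]
patterns = [[random.choice([-1, 1]) for _ in range(N)] for _ in range(M)]

def preactivations(student):
    """z^mu = sigma^mu * (xi^mu . s) / sqrt(N), with teacher labels sigma^mu."""
    zs = []
    for xi in patterns:
        t = sum(a * b for a, b in zip(xi, teacher)) / math.sqrt(N)
        label = 1 if t >= 0 else -1        # ties treated as +1
        zs.append(label * sum(a * b for a, b in zip(xi, student)) / math.sqrt(N))
    return zs

def error_count(student):
    # Strictly negative z counts as an error; ties (z == 0) count as correct.
    return sum(1 for z in preactivations(student) if z < 0)

def smooth_surrogate(student, k=10.0):
    # Differentiable stand-in: sum_mu sigmoid(-k * z^mu).
    return sum(1.0 / (1.0 + math.exp(k * z)) for z in preactivations(student))

# The teacher itself classifies every pattern correctly.
assert error_count(teacher) == 0
```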

HopfieldMemory

Bases: SpinProblem

Hopfield associative-memory model with Hebbian couplings.

Given P patterns \xi^\mu \in \{-1, +1\}^N, the couplings are J_{ij} = (1/N) \sum_\mu \xi^\mu_i \xi^\mu_j for i \ne j and J_{ii} = 0. Energy: E = -0.5 s^T J s.

At a stored pattern s = \xi^\mu and low loading \alpha = P/N, E \approx -N/2.
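For a single stored pattern the E ≈ -N/2 claim is exact up to the excluded diagonal: J_ij = ξ_i ξ_j / N gives E(ξ) = -(N - 1)/2. A pure-Python check (sketch, not the library's tensor construction):

```python
import random

random.seed(0)
N = 100
xi = [random.choice([-1, 1]) for _ in range(N)]

# Hebbian couplings for P = 1: J_ij = xi_i * xi_j / N, zero diagonal.
J = [[(xi[i] * xi[j]) / N if i != j else 0.0 for j in range(N)] for i in range(N)]

def energy(s):
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(N) for j in range(N))

# At the stored pattern every off-diagonal term is 1/N, so
# E = -0.5 * (N^2 - N) / N = -(N - 1) / 2, i.e. ~ -N/2 for large N.
E = energy(xi)
```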

Parameters:

  • N (int, required) — Number of spins.
  • patterns (int | ndarray | Tensor | Sequence[Sequence[int]], default 1) — Either an integer P (sample P random ±1 patterns) or a pre-computed tensor/array of shape (P, N).
  • seed (int, default 0) — RNG seed used when patterns is an integer.

overlap(s)

Normalised overlap with every stored pattern.

Returns a tensor of shape (B, P) with m^\mu_b = (1/N) \sum_i \xi^\mu_i s_{b,i}.

qqa.problems.extras

Extra problem catalog for QQA.

Eight classic discrete-optimization problems that plug into the unified qqa.anneal loop alongside the graph / spin / categorical problems already shipped.

Design

  • Every class exposes loss_fn(x) -> (B,) that is minimised by QQA (so e.g. Knapsack returns the negative value of the packed items).
  • Every class exposes score_summary(x_disc) -> dict returning a human-readable breakdown of the best solution so the dashboard can print e.g. "packed value: 138 / 150, feasible: True".
  • Variable sizing attributes are chosen to match the relaxation that each problem consumes:
    • :class:BinaryRelaxation expects num_nodes.
    • :class:SpinRelaxation expects num_spins.
    • :class:CategoricalRelaxation expects num_node + num_category.

NumberPartitioning

Bases: COProblem

Classic number partitioning of N positive integers.

Given a_1, ..., a_N the goal is to split them into two subsets whose sums are as close as possible, i.e. minimise

.. math:: \Bigl(\sum_i a_i s_i\Bigr)^2, \quad s_i \in \{-1, +1\}.

Uses :class:SpinRelaxation internally so loss_fn receives :math:s \in [-1, 1]^{B\times N} directly.
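The objective is a one-liner on discrete spins; a pure-Python sketch of the loss (illustrative only — the library's loss_fn works on batched tensors):

```python
def partition_loss(a, s):
    """(sum_i a_i s_i)^2 -- zero iff the two subset sums are equal."""
    return sum(ai * si for ai, si in zip(a, s)) ** 2

a = [4, 5, 6, 7, 8]       # total 30, so a perfect split sums to 15 per side
s = [1, 1, 1, -1, -1]     # {4, 5, 6} vs {7, 8}
# partition_loss(a, s) == 0 for this perfect partition
```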

Knapsack

Bases: COProblem

0/1 knapsack.

Maximise :math:\sum_i v_i x_i subject to :math:\sum_i w_i x_i \le C. We minimise

.. math:: -\sum_i v_i x_i + \lambda\,\bigl[\max(0, \sum_i w_i x_i - C)\bigr]^2

so solutions that respect the capacity dominate.

VertexCover

Bases: COProblem

Minimum vertex cover on an undirected graph.

Select a minimum-size vertex subset that touches every edge; we use the QUBO form

.. math:: H = \sum_i x_i + \lambda \sum_{(u,v)\in E} (1 - x_u)(1 - x_v)

GraphBisection

Bases: COProblem

Balanced graph bisection.

Partition vertices into two equal-size sets minimising the cut:

.. math:: H = \sum_{(u,v)\in E} (x_u - x_v)^2 + \lambda \bigl(\sum_i x_i - N/2 \bigr)^2

MinimumDominatingSet

Bases: COProblem

Minimum dominating set on an undirected graph.

Select a minimum-size vertex subset :math:S such that every vertex is either in :math:S or adjacent to some vertex in :math:S. We minimise

.. math:: H(x) = \sum_v x_v + \lambda \sum_v (1 - x_v)\,\prod_{u \in N(v)} (1 - x_u),

where :math:x \in \{0, 1\}^N. The product factor equals 1 exactly when v is unselected and none of its neighbours are selected (i.e. v is uncovered), so the penalty term counts uncovered vertices.

During annealing we use the same form on the continuous relaxation :math:x \in [0, 1]^N. Because :math:(1 - x_u) \in [0, 1], the product is bounded in :math:[0, 1] and matches the discrete indicator at the corners. To keep the cost :math:O(B \cdot |E|) per forward pass (no per-vertex Python loop) we evaluate the log-product on the edges:

.. math:: \prod_{u \in N(v)} (1 - x_u) = \exp\!\left( \sum_{u \in N(v)} \log(1 - x_u + \varepsilon) \right).
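The identity is just log/exp inversion, with a small epsilon guarding log(0) when some x_u = 1; a quick numerical check:

```python
import math

# Product of (1 - x_u) factors vs. exp of summed logs (epsilon guards log(0)).
x = [0.1, 0.5, 0.9]
eps = 1e-12

direct = 1.0
for xu in x:
    direct *= (1 - xu)

via_logs = math.exp(sum(math.log(1 - xu + eps) for xu in x))
# direct and via_logs agree to floating-point precision.
```

In the vectorised loss this lets the per-vertex product be computed with one scatter-add over the edge list instead of a Python loop over neighbourhoods.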

SATClause dataclass

Signed literals of a 3-SAT clause. lits[k] = (var_idx, sign).

MaxSAT3

Bases: COProblem

Random 3-SAT / MaxSAT.

Minimise the number of unsatisfied 3-CNF clauses. A clause is represented by three signed literals. Let :math:L_i(x) equal :math:x_i when the literal is positive and :math:1 - x_i otherwise. The clause is violated iff :math:(1-L_1)(1-L_2)(1-L_3) = 1, which we sum across all clauses as the (exact) loss.

TSP

Bases: COProblem

Symmetric travelling salesperson — solved with the penalty method.

Variable encoding follows Lucas (2014) "Ising formulations of many NP problems": x ∈ {0, 1}^{N×N} with x[t, i] = 1 iff city i occupies tour position t. Three additive terms make every constraint visible inside loss_fn:

  • distance — :math:\sum_t \sum_{i \neq j} d_{ij}\, x_{t,i}\, x_{t+1,j} (cyclic, so position N-1 wraps back to 0).
  • row penalty — :math:\lambda_r \sum_t (\sum_i x_{t,i} - 1)^2 so every position holds exactly one city.
  • column penalty — :math:\lambda_c \sum_i (\sum_t x_{t,i} - 1)^2 so every city appears exactly once.

Unlike the previous CategoricalRelaxation-based formulation (where the row constraint was enforced by softmax inside the relaxation, hiding it from the loss), this version uses :class:BinaryRelaxation so QQA sees both penalties as gradients and the user can tune their weights from the Streamlit sidebar — a textbook penalty-method setup.

QAP

Bases: COProblem

Quadratic assignment problem (random flow / distance matrices).

Minimise :math:\sum_{i,j} F_{ij} D_{\pi(i)\pi(j)} where π assigns facilities to locations. Encoded as a CategoricalRelaxation with x[:, i, k] = 1 iff facility i goes to location k.

NQueens

Bases: COProblem

Place N non-attacking queens on an N×N board.

Each row is a simplex (handled by :class:CategoricalRelaxation) so row constraints are free; we penalise column-, diagonal- and anti-diagonal conflicts via pair-counts.

qqa.problems.user

User-defined problem helper.

Drop-in wrapper that lets a user plug an arbitrary differentiable loss into :func:qqa.anneal without having to subclass :class:COProblem::

import torch, qqa

J = torch.randn(50, 50); J = (J + J.T) / 2; J.fill_diagonal_(0)
problem = qqa.UserProblem(
    num_vars=50,
    variable_kind="spin",
    loss_fn=lambda s: -0.5 * torch.einsum("bi,ij,bj->b", s, J, s),
)
result = qqa.anneal(problem, sol_size=128, num_epochs=1000)

Three variable kinds are supported:

  • "binary" — x \in [0, 1] (rounded to {0, 1}).
  • "spin" — s = 2 \, \text{clip}(x, 0, 1) - 1 \in [-1, +1] and projected to \{-1, +1\}.
  • "categorical" — one-hot simplex x \in \Delta^K per variable.

For categorical kinds the num_vars argument sets the number of categorical variables and num_category must also be passed.

Loss functions must accept a tensor whose leading axis is the parallel batch B = sol_size and return a tensor of shape (B,).

UserProblem

Bases: COProblem

Wrap an arbitrary loss function as a :class:COProblem.

Parameters:

  • num_vars (int, required) — Number of discrete variables (N).
  • loss_fn (Callable[[Tensor], Tensor], required) — Callable mapping a batched tensor to a (B,) loss tensor. For "binary" / "spin" the input has shape (B, N). For "categorical" the input has shape (B, N, K).
  • variable_kind (VariableKind, default 'binary') — "binary", "spin", or "categorical".
  • num_category (int | None, default None) — Required when variable_kind == "categorical".
  • relaxation (Relaxation | None, default None) — Advanced: pass a custom :class:Relaxation instance to override the default chosen from variable_kind.
  • name (str, default 'user-problem') — Optional display name used by the GUI/CLI.
  • device (str | device, default 'cpu') — Torch device hint (only used by QQA's allocation path; the loss itself is device-agnostic as long as any constants it captures live on the right device).

user_problem_from_source(source, num_vars, variable_kind='binary', num_category=None, name='inline', device='cpu', extra_globals=None)

Build a :class:UserProblem by exec-ing a Python snippet.

The snippet must define a callable named loss_fn (single argument, the batched configuration tensor, returning a (B,) loss tensor). The namespace has torch, np (numpy), and any extra_globals entries pre-loaded, and the defined loss_fn is closed over that namespace.

.. warning:: This executes arbitrary Python code. Only use with trusted input (e.g. the local GUI or CLI on your own machine).

load_problem_from_file(path)

Load a user-provided problem from a Python file.

The file must either define a top-level variable problem that is a :class:COProblem instance, or a callable make_problem() / build() that returns one.

Relaxations

qqa.relaxation

Relaxation strategies for QQA.

A Relaxation defines how a combinatorial variable is represented as a continuous tensor during annealing. It encapsulates:

  • initialization of the relaxed variable,
  • the transformation fed into problem.loss_fn (forward),
  • the discrete projection used to evaluate the true objective (project),
  • the quasi-quantum penalty function,
  • the diversity term across the parallel batch,
  • an in-place Langevin-style perturbation.

All relaxations operate on a leading batch dimension of size sol_size.

Relaxation

Bases: Protocol

Protocol that any relaxation strategy must satisfy.

BinaryRelaxation

Relaxation for binary variables x in [0, 1].

Used for QUBO problems (MIS, MaxClique, MaxCut) on either a single graph (shape (sol_size, N)) or a batch of graphs via an instance problem (shape (sol_size, I, N)).

SpinRelaxation

Bases: BinaryRelaxation

Relaxation for ising-style spin variables s \in \{-1, +1\}.

Internally the latent representation x lives in [0, 1] (same as :class:BinaryRelaxation), but :meth:forward maps it to the spin s = 2 \, \text{clip}(x, 0, 1) - 1 so that problem.loss_fn can safely work on real-valued spins in [-1, +1]. The discrete projection thresholds at 0.5.

Because spin problems typically couple variables quadratically without a convex QUBO structure, AdamW steps can push the latent x outside [0, 1]; we clip before the forward so the effective spin stays in [-1, +1], and :meth:perturb_ always clamps x back even when temp == 0.

BinaryInstanceRelaxation

Bases: BinaryRelaxation

Binary relaxation for batched instance problems.

Expects the problem to expose num_instance and max_node.

CategoricalRelaxation

Relaxation for one-hot categorical variables.

Variable tensor shape: (sol_size, N, K). The forward pass normalises across the category axis and project returns one-hot tensors.

penalty_from_forward(x, x_fwd, curve_rate)

Optimised path: reuse the already-normalised tensor from forward.

Used by :func:qqa.anneal to avoid the simplex normalisation re-running every epoch (the legacy penalty calls forward internally, which doubled the cost on the hot path).

Schedules

qqa.schedule

Annealing schedules for the penalty coefficient bg.

A schedule is any callable schedule(epoch, num_epochs) -> float. The default linear schedule matches the original QQA paper.

LinearBGSchedule dataclass

Linear schedule bg(t) = min_bg + (max_bg - min_bg) * t / T.

When min_bg < 0 and max_bg > 0 the landscape transitions from "convex, half-integer minima" (the quasi-quantum regime) to the discrete regime where binary corners are favoured.

.. note:: The annealing loop calls this with epoch ∈ {0, ..., T-1}, so the final value is min_bg + (T-1)/T * (max_bg - min_bg) rather than exactly max_bg — the schedule approaches max_bg as T → ∞. For short runs (small num_epochs) the last bg can therefore be visibly smaller than max_bg. Increase num_epochs if you need to reach max_bg (matching the published QQA curves), or supply a custom callable for a different endpoint convention.
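The endpoint convention in the note is easy to verify numerically (a sketch of the same formula, not the dataclass itself):

```python
def linear_bg(epoch, num_epochs, min_bg=-1.0, max_bg=1.0):
    """bg(t) = min_bg + (max_bg - min_bg) * t / T, matching the formula above."""
    return min_bg + (max_bg - min_bg) * epoch / num_epochs

# The loop calls epochs 0..T-1, so the final bg falls short of max_bg
# by exactly (max_bg - min_bg) / T:
T = 10
final = linear_bg(T - 1, T)   # 0.8 for the (-1, 1) defaults, not 1.0
gap = 1.0 - final             # (1 - (-1)) / 10 == 0.2
```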

Callbacks

qqa.callbacks

Callbacks for the QQA annealing loop.

Callbacks receive a CallbackState snapshot at the end of every epoch and can record metrics, adjust hyper-parameters, or track auxiliary objectives.

CallbackState dataclass

Mutable context passed to callbacks at each epoch.

The annealing loop writes fields here. Callbacks may read any field and may write to extras or mutate hyperparams (e.g. div_param).

Callback

Base class. Override on_epoch_end (and optionally other hooks).

HistoryRecorder

Bases: Callback

Record loss / penalty / diversity statistics per epoch.

Performance notes

The recorder is on the hot path of every annealing step. Naively calling tensor.item() for every metric forces a CUDA device->host sync at each epoch and dominates the wall-clock when the kernels themselves are cheap (small problems / GPU). To avoid that:

  • Per-epoch scalars are kept as GPU 0-dim tensors and appended to a Python list. There is no .item() call inside :meth:on_epoch_end.
  • :meth:on_train_end performs a single torch.stack(...).cpu() per metric, which costs one sync regardless of num_epochs.
  • stride skips intermediate epochs entirely (the last epoch is always recorded so that final-state observers see a non-empty history).

The exposed self.history dict is still a dict[str, list[float]] (or list[list[float]] for best_obj of batched-instance problems), so existing code that reads recorder.history["loss_mean"][-1] is unchanged.

AutoDivTuner

Bases: Callback

Adaptively tune div_param to target a desired diversity ratio.

At each epoch: ratio = diversity / (sol_size * N). The controller nudges div_param by lr * (ratio - target) and clips to [0, 1].

PopulationTracker

Bases: Callback

Snapshot the parallel population for post-hoc parallel-search visualisation.

Records, every stride epochs:

  • loss — the (sol_size,) per-replica loss.
  • x — optionally, the continuous variables (heavier but lets you reconstruct PCA trajectories or per-variable heatmaps).

Attributes:

  • epochs (list[int]) — list of recorded epochs.
  • loss (list[Any]) — list of (sol_size,) numpy arrays.
  • x (list[Any]) — list of (sol_size, ...) numpy arrays when record_x=True; otherwise empty.

TrajectoryTracker

Bases: Callback

Track a secondary problem's objective per epoch.

Useful for e.g. monitoring the "true" MIS size while optimising a penalised QUBO formulation.

Visualization

qqa.visualization

Visualisation helpers for :class:~qqa.annealing.AnnealResult.

Every plotting function accepts a backend argument:

  • "matplotlib" (default) — static figures, no optional deps.
  • "plotly" — interactive figures, requires pip install qqa[plotly].

If Plotly is not installed and backend="plotly" is requested, the functions automatically fall back to matplotlib with a warning.

Plot catalog:

  • :func:plot_history — loss / penalty / diversity dynamics.
  • :func:plot_best_trajectory — best objective value over epochs.
  • :func:plot_schedule — the annealing schedule bg(epoch).
  • :func:plot_run_comparison — overlay multiple runs.
  • :func:plot_parallel_coordinates — hyper-parameter sweep view.
  • :func:plot_solution_heatmap — spins / bits of the best solution.
  • :func:plot_population_evolution — parallel-population loss heat-map.
  • :func:plot_population_embedding — PCA trajectory of the population.

plot_history(result, title='QQA dynamics', backend='matplotlib', show=True)

Plot mean loss, mean penalty and diversity across epochs.

Returns the backend-native figure object ((fig, axes) for matplotlib, go.Figure for plotly).

plot_best_trajectory(result, title='Best objective per epoch', backend='matplotlib', show=True)

Plot best_obj vs epoch (monotonically non-increasing).

plot_schedule(schedule, num_epochs, title='Annealing schedule', backend='matplotlib', show=True)

Visualise the bg annealing schedule over num_epochs.

plot_run_comparison(results, labels=None, title='Run comparison', backend='matplotlib', show=True)

Overlay best_obj trajectories from multiple runs.

plot_parallel_coordinates(sweep_df, objective='best_obj', title='Hyperparameter sweep', backend='plotly', show=True)

Parallel-coordinates plot of a hyper-parameter sweep.

sweep_df is expected to be a pandas.DataFrame (or any object with a to_dict(orient="list") method) whose columns are the hyper-parameters plus one objective column.

The Plotly backend produces a coloured interactive figure (recommended). Matplotlib falls back to a simple scatter-matrix-like rendering.

plot_solution_heatmap(result, problem=None, title='Best solution', backend='matplotlib', show=True)

Render the best discrete solution as a 1D/2D heatmap.

For lattice spin problems (EdwardsAnderson with dim == 2) the solution is reshaped to (L, L) automatically.

plot_population_evolution(tracker, title='Parallel population loss', backend='plotly', show=True)

Render the parallel population's loss landscape across epochs.

tracker must be a :class:~qqa.callbacks.PopulationTracker instance that captured snapshots during the run. Each column of the resulting heat-map is one snapshot epoch, each row is one replica in the sol_size population, and colour encodes loss. Rows are sorted once, by final-epoch loss, to keep the panel readable.

The best trajectory is overlaid as a thin white curve.
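The row ordering described above amounts to sorting the population once by its last-snapshot loss. A dependency-free sketch with a toy loss matrix of shape (sol_size, num_snapshots):

```python
# Toy per-replica loss history: rows = replicas, columns = snapshot epochs.
loss = [
    [5.0, 3.0, 2.5],   # replica 0
    [4.0, 4.1, 0.9],   # replica 1 (best at the final snapshot)
    [6.0, 2.0, 1.7],   # replica 2
]

# Sort rows once by final-epoch loss so the heat-map reads top-to-bottom
# from best to worst replica.
order = sorted(range(len(loss)), key=lambda r: loss[r][-1])
sorted_rows = [loss[r] for r in order]
# order == [1, 2, 0]
```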

plot_population_embedding(tracker, title='Population PCA trajectory', backend='plotly', show=True)

2D PCA of the parallel population's continuous variables over time.

Projects every snapshot of tracker.x (shape (sol_size, N, ...)) onto the 2 principal components computed from the concatenation of all snapshots, then draws the resulting trajectory as a scatter coloured by epoch with replica paths drawn as light-grey lines.

Requires :class:~qqa.callbacks.PopulationTracker to have been run with record_x=True.

Optional PyG backends

The functions below require the pignn extra (pip install "qqa[pignn]").

qqa.pignn

Optional PyTorch Geometric backend: CRA-PI-GNN and CPRA trainers.

This subpackage provides self-contained re-implementations of two GNN-based unsupervised-learning combinatorial-optimization solvers:

  • CRA-PI-GNN — Y. Ichikawa, "Controlling Continuous Relaxation for Combinatorial Optimization," NeurIPS 2024 (https://openreview.net/forum?id=ykACV1IhjD). Single-replica continuous-relaxation annealing on a 2-layer GCN.

  • CPRA — Y. Ichikawa & H. Iwashita, "Continuous Parallel Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems," TMLR 2025 (https://openreview.net/forum?id=ix33zd5zCw). A multi-head extension of CRA-PI-GNN that returns R diverse solutions in one training run, supporting both penalty- and variation-diversification.

Both reference releases use DGL. Because DGL's prebuilt wheels do not yet target NVIDIA Blackwell (sm_100) and lag the latest PyTorch / CUDA combos, we ship ports written in PyTorch Geometric so QQA users on modern GPUs can compare against either method without juggling DGL.

Why this lives here, not in the main qqa namespace

  • PyG and its transitive deps are heavy. We do not want import qqa to pay for them when the user only needs qqa.anneal.
  • The trainers are backend alternatives to qqa.anneal, not building blocks of it. Keeping them isolated also makes the README's "QQA vs. CRA-PI-GNN vs. CPRA" comparison story clear.
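The usual pattern behind such an optional subpackage is a guarded import that turns a missing extra into an actionable error. This is a generic sketch of that pattern, not the package's actual import machinery:

```python
import importlib

def require_extra(module_name, extra):
    """Import module_name, or fail with a hint naming the pip extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f'install it with: pip install "qqa[{extra}]"'
        ) from exc
```

For example, a hypothetical `require_extra("torch_geometric", "pignn")` would either return the module or raise a pointed ImportError, so plain `import qqa` never pays for the heavy dependency.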

Quickstart

::

pip install "qqa[pignn]"

CRA-PI-GNN (single solution per run)::

import networkx as nx
import qqa
from qqa.pignn import train_cra_pi_gnn

qqa.fix_seed(0)
g = nx.random_regular_graph(d=3, n=200, seed=0)
problem = qqa.MaximumIndependentSet(g, penalty=2)
result = train_cra_pi_gnn(problem, num_epochs=4000)
print(result.score)

CPRA (R diverse solutions per run, e.g. one per penalty level)::

from qqa.pignn import train_cpra_pi_gnn

penalties = [1.5, 2.0, 2.5, 3.0]
replicas = [qqa.MaximumIndependentSet(g, penalty=p) for p in penalties]
result = train_cpra_pi_gnn(
    problem,
    num_replicas=len(replicas),
    replica_problems=replicas,
    num_epochs=4000,
)
for record in result.score["extra"]["replicas"]:
    print(record["score"])

Both functions return a :class:qqa.AnnealResult, so every downstream helper that consumes AnnealResult (visualisation, CLI scoring) keeps working.

See also

  • CRA reference (DGL): https://github.com/Yuma-Ichikawa/CRA4CO
  • CPRA reference (DGL): https://github.com/Yuma-Ichikawa/CPRA4CO

GCNNet

Bases: Module

Two-layer GCN with a learnable node embedding (PI-GNN style).

Parameters

  • num_nodes: Number of nodes in the graph (also the embedding table size).
  • in_feats: Width of the input embedding. Defaults to floor(sqrt(N)) to match the reference implementation.
  • hidden_dim: Hidden width between the two GCN layers. Defaults to in_feats.
  • dropout: Dropout probability applied after the first GCN layer. Defaults to 0 — the original paper used 0 for the headline MIS results.
  • num_replicas: Number of independent output channels. Defaults to 1, which preserves the single-head CRA-PI-GNN behaviour and keeps the forward output shape (N,). With num_replicas >= 2 the network becomes the CPRA multi-head backbone of Ichikawa & Iwashita (TMLR 2025) — a single shared embedding + GCN backbone produces R parallel continuous solutions in one forward pass and the output shape is (N, R).

Notes

The forward pass takes only edge_index because the node "features" are the learned embedding rows; they evolve through the same backward pass as the convolution weights. This is the standard PI-GNN trick from Schuetz et al. (Nature MI 2022). For num_replicas >= 2 only the second convolution's output channels grow — the embedding and first convolution are shared across replicas, matching the CPRA shared-representation design.

forward(edge_index)

Compute soft node assignments p \in (0, 1).

Parameters

edge_index: (2, 2|E|) long tensor produced by :func:qqa.pignn.graph.nx_to_edge_index.

Returns

torch.Tensor (N,) tensor of probabilities when num_replicas == 1 (CRA-PI-GNN compatibility), or (N, num_replicas) when num_replicas >= 2 (CPRA layout).

train_cpra_pi_gnn(problem, *, num_replicas=4, replica_problems=None, vari_param=0.0, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)

Train a CPRA multi-head PI-GNN solver and return an :class:AnnealResult.

CPRA (Continuous Parallel Relaxation Annealing) is the multi-replica extension of CRA-PI-GNN introduced by Ichikawa & Iwashita, Transactions on Machine Learning Research, 2025 (https://openreview.net/forum?id=ix33zd5zCw). A single shared GCN backbone produces R continuous solutions in one forward pass, and the loss combines a per-replica QUBO term, the standard CRA penalty, and an optional inter-replica diversity term.

Two diversification regimes are supported:

  1. Penalty diversification — supply replica_problems (length num_replicas) where each problem instance differs in some hyperparameter (e.g. MaximumIndependentSet(g, penalty=p_r) for a sweep of p_r). One training run yields one solution per penalty level, much cheaper than independent runs.
  2. Variation diversification — leave replica_problems=None so every replica solves the same problem, but set vari_param > 0 to add the diversity term -R · Σ_i std_r(p_{i,r}) (sign chosen so the loss decreases when between-replica spread grows). Replicas then converge to structurally different solutions.
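The variation-diversification term above can be sketched in plain Python: with p as an (N, R) matrix of per-node probabilities, the term is -R times the sum over nodes of the standard deviation across the R replica values. Whether the trainer uses the population or sample std is an implementation detail not stated here; the sketch uses the population std:

```python
from statistics import pstdev

def diversity_term(p):
    """-R * sum_i std_r(p[i][r]) for an N x R probability matrix p."""
    R = len(p[0])
    return -R * sum(pstdev(row) for row in p)

# Identical replicas: zero spread, so no diversity reward.
assert diversity_term([[0.5, 0.5], [0.2, 0.2]]) == 0.0

# Spread-out replicas drive the term negative, lowering the total loss.
assert diversity_term([[0.0, 1.0], [1.0, 0.0]]) < 0.0
```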

Parameters

  • problem: Base graph-based problem (used for graph extraction and as the default score_summary provider when replica_problems is None).
  • num_replicas: Number of parallel continuous solutions R. Defaults to 4.
  • replica_problems: Optional list of num_replicas problem instances. When provided, replica_problems[r].loss_fn evaluates the cost for replica r. Must share the same underlying graph as problem (only the QUBO Q_mat may differ — typically via a different penalty weight). When None, all replicas use problem.loss_fn.
  • vari_param: Coefficient of the diversity term. 0 (default) is pure penalty diversification; positive values reward inter-replica spread (used for variation diversification on a fixed problem).
  • nx_graph, in_feats, hidden_dim, dropout, learning_rate, weight_decay, annealing, init_reg_param, annealing_rate, curve_rate, num_epochs, tol, patience, check_interval, device, seed, verbose: Identical semantics to :func:train_cra_pi_gnn.

Returns

qqa.AnnealResult

  • best_sol — the discrete (N,) assignment of the best replica (lowest QUBO objective on its own loss_fn).
  • best_obj — that replica's float objective.
  • history — per-epoch loss, mean_cost, reg_term, vari_term, reg_param arrays plus a per_replica_obj array of shape (epochs, R) for downstream visualisation.
  • score['extra']['replicas'] — list of {replica, obj, score, sol} dicts so the caller can inspect every diversified solution, not only the best one.

Raises

ValueError On invalid num_replicas, vari_param sign, or a replica_problems list whose length does not match num_replicas.

Notes

  • Backbone vs. CPRA4CO. The reference CPRA implementation uses DGL GraphSAGE; this port reuses :class:GCNNet (a 2-layer GCNConv stack) for full parity with :func:train_cra_pi_gnn so head-to-head ablations across the two solvers measure the training objective, not the message-passing op.
  • Best-tracking. The reference CPRA loop returns the final iteration's discretised bits rather than the best-so-far solution. This trainer deliberately tracks the running best per replica — the QQA-side convention — to avoid losing a good early-epoch solution to a transient late spike.
  • History keys differ from :func:train_cra_pi_gnn. train_cra_pi_gnn reports per-epoch "cost"; CPRA reports "mean_cost" (per-replica average) because the raw cost scales linearly with R and is harder to compare across runs. When you mix the two trainers in a single plot, normalise by R yourself.
  • Replica collapse with vari_param=0 and replica_problems=None. With identical losses on every replica and shared embedding+backbone gradients, the R output channels drift toward the same fixed point. They start different (random init) and remain visibly distinct for the first few hundred epochs, but eventually collapse. For real variation diversification on a fixed problem, set vari_param > 0 (e.g. 0.1 to 0.5 works in practice).

train_cra_pi_gnn(problem, *, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)

Train a CRA-PI-GNN solver and return a :class:qqa.AnnealResult.

Parameters

  • problem: A graph-based binary QUBO problem from :mod:qqa — typically :class:~qqa.MaximumIndependentSet, :class:~qqa.MaxClique, :class:~qqa.MaxCut, :class:~qqa.VertexCover, or :class:~qqa.GraphBisection. Anything that exposes problem.nx_graph and problem.loss_fn works.
  • nx_graph: Override for problem.nx_graph (rare; only needed for custom problems that store the graph elsewhere).
  • in_feats, hidden_dim: GCN widths. Both default to floor(sqrt(N)), matching the reference paper.
  • dropout: Dropout probability between the two GCN layers. 0 (default) reproduces the headline numbers in the paper.
  • learning_rate, weight_decay: AdamW hyper-parameters. Defaults match the reference.
  • annealing: If False the trainer reduces to vanilla PI-GNN (reg_param = 0 for every epoch).
  • init_reg_param, annealing_rate: CRA schedule: reg_param = init_reg_param + epoch * annealing_rate. Set init_reg_param < 0 so the early-epoch loss landscape is concave (encourages exploration); annealing_rate > 0 linearly ramps it through 0 toward the discrete-favouring regime.
  • curve_rate: Penalty exponent (must be even). Defaults to 2.
  • num_epochs: Hard upper bound on gradient steps. Early stopping (see tol / patience) usually terminates earlier.
  • tol, patience: Stop when both the loss and the penalty change by less than tol for patience consecutive epochs.
  • check_interval: How often the verbose log is printed.
  • device: Torch device. Strings like "cuda" are validated up-front to give a clear error if CUDA is unavailable.
  • seed: If supplied, calls :func:qqa.fix_seed before allocating the model and embedding (so the run is reproducible).
  • verbose: If True print periodic progress and a final summary.

Notes

The defaults here match the NeurIPS 2024 paper and are tuned for large instances (N >= 1000). For small graphs (N <= 200) the paper defaults severely under-converge: try init_reg_param=-2.0, annealing_rate=5e-4, learning_rate=1e-3 (or even learning_rate=1e-2) instead. This is a standard PI-GNN quirk — the optimal annealing schedule is sub-linear in problem size because the cost / penalty magnitudes scale very differently.
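The linear CRA schedule from the parameter description, reg_param = init_reg_param + epoch * annealing_rate, is easy to inspect directly. A minimal sketch showing where the paper defaults cross zero:

```python
def reg_param(epoch, init_reg_param=-20.0, annealing_rate=0.001):
    """CRA linear annealing schedule: negative (concave, exploratory)
    early on, crossing zero toward the discrete-favouring regime."""
    return init_reg_param + epoch * annealing_rate

assert reg_param(0) == -20.0
# With the paper defaults the sign flips at epoch -init/rate = 20000.
assert abs(reg_param(20000)) < 1e-9
assert reg_param(30000) > 0
```

The small-graph settings suggested above (init_reg_param=-2.0, annealing_rate=5e-4) move that crossing to epoch 4000, i.e. a much shorter exploratory phase.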

Returns

qqa.AnnealResult With best_sol of shape (N,) (rounded {0, 1} tensor), best_obj the float QUBO loss on that solution, and history containing per-epoch arrays loss, cost, reg_term, reg_param.

Raises

  • TypeError: If problem is not graph-based (no nx_graph attribute and no nx_graph override).
  • RuntimeError: If device requests CUDA but torch.cuda.is_available() is False.
  • ValueError: On obviously-wrong arguments (curve_rate odd, num_epochs negative, ...).

qqa.pignn.trainer

CRA-PI-GNN trainer (PyTorch Geometric).

Faithful port of the fit_model loop from the reference NeurIPS 2024 implementation [#cra]_, with two intentional changes:

  1. The graph backend is :mod:torch_geometric instead of DGL, so the trainer runs on every PyTorch-supported GPU (including NVIDIA Blackwell / sm_100 for which DGL has no prebuilt wheel as of April 2026).
  2. The loss / penalty mathematics are expressed via the QQA primitives :meth:qqa.problems.QUBOProblem.loss_fn and :meth:qqa.relaxation.BinaryRelaxation.penalty, so the function returns a :class:qqa.AnnealResult and is a drop-in alternative to :func:qqa.anneal. Numerically the loss is identical to the original paper for curve_rate=2 (and for any even curve_rate):

.. math::

  L(p; \gamma) \;=\; p^\top Q\, p
                \;+\; \gamma \sum_i \bigl(1 - (1 - 2p_i)^c\bigr)

This matches CRA's cost + reg_param * Σ (1 - (2p - 1)^c) because (1 - 2p)^c = (2p - 1)^c for even c.
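The even-power identity can be checked numerically. A quick sketch confirming that the penalty written with (1 - 2p)^c agrees with the reference's (2p - 1)^c for even c, and only for even c:

```python
def penalty(p, c, flipped=False):
    """Per-node CRA penalty 1 - (2p - 1)^c, with the opposite sign
    convention 1 - (1 - 2p)^c selected via flipped=True."""
    base = (1 - 2 * p) if flipped else (2 * p - 1)
    return 1 - base ** c

p = 0.3
# Even curve_rate: the two sign conventions coincide term by term.
assert penalty(p, 2, flipped=True) == penalty(p, 2, flipped=False)
# Odd curve_rate would break the identity, which is why it is rejected.
assert penalty(p, 3, flipped=True) != penalty(p, 3, flipped=False)
```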

.. [#cra] Y. Ichikawa, "Controlling Continuous Relaxation for Combinatorial Optimization," NeurIPS 2024. https://github.com/Yuma-Ichikawa/CRA4CO

train_cra_pi_gnn(problem, *, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)

Train a CRA-PI-GNN solver and return a :class:qqa.AnnealResult.

Parameters

  • problem: A graph-based binary QUBO problem from :mod:qqa — typically :class:~qqa.MaximumIndependentSet, :class:~qqa.MaxClique, :class:~qqa.MaxCut, :class:~qqa.VertexCover, or :class:~qqa.GraphBisection. Anything that exposes problem.nx_graph and problem.loss_fn works.
  • nx_graph: Override for problem.nx_graph (rare; only needed for custom problems that store the graph elsewhere).
  • in_feats, hidden_dim: GCN widths. Both default to floor(sqrt(N)), matching the reference paper.
  • dropout: Dropout probability between the two GCN layers. 0 (default) reproduces the headline numbers in the paper.
  • learning_rate, weight_decay: AdamW hyper-parameters. Defaults match the reference.
  • annealing: If False the trainer reduces to vanilla PI-GNN (reg_param = 0 for every epoch).
  • init_reg_param, annealing_rate: CRA schedule: reg_param = init_reg_param + epoch * annealing_rate. Set init_reg_param < 0 so the early-epoch loss landscape is concave (encourages exploration); annealing_rate > 0 linearly ramps it through 0 toward the discrete-favouring regime.
  • curve_rate: Penalty exponent (must be even). Defaults to 2.
  • num_epochs: Hard upper bound on gradient steps. Early stopping (see tol / patience) usually terminates earlier.
  • tol, patience: Stop when both the loss and the penalty change by less than tol for patience consecutive epochs.
  • check_interval: How often the verbose log is printed.
  • device: Torch device. Strings like "cuda" are validated up-front to give a clear error if CUDA is unavailable.
  • seed: If supplied, calls :func:qqa.fix_seed before allocating the model and embedding (so the run is reproducible).
  • verbose: If True print periodic progress and a final summary.

Notes

The defaults here match the NeurIPS 2024 paper and are tuned for large instances (N >= 1000). For small graphs (N <= 200) the paper defaults severely under-converge: try init_reg_param=-2.0, annealing_rate=5e-4, learning_rate=1e-3 (or even learning_rate=1e-2) instead. This is a standard PI-GNN quirk — the optimal annealing schedule is sub-linear in problem size because the cost / penalty magnitudes scale very differently.

Returns

qqa.AnnealResult With best_sol of shape (N,) (rounded {0, 1} tensor), best_obj the float QUBO loss on that solution, and history containing per-epoch arrays loss, cost, reg_term, reg_param.

Raises

  • TypeError: If problem is not graph-based (no nx_graph attribute and no nx_graph override).
  • RuntimeError: If device requests CUDA but torch.cuda.is_available() is False.
  • ValueError: On obviously-wrong arguments (curve_rate odd, num_epochs negative, ...).

train_cpra_pi_gnn(problem, *, num_replicas=4, replica_problems=None, vari_param=0.0, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)

Train a CPRA multi-head PI-GNN solver and return an :class:AnnealResult.

CPRA (Continuous Parallel Relaxation Annealing) is the multi-replica extension of CRA-PI-GNN introduced by Ichikawa & Iwashita, Transactions on Machine Learning Research, 2025 (https://openreview.net/forum?id=ix33zd5zCw). A single shared GCN backbone produces R continuous solutions in one forward pass, and the loss combines a per-replica QUBO term, the standard CRA penalty, and an optional inter-replica diversity term.

Two diversification regimes are supported:

  1. Penalty diversification — supply replica_problems (length num_replicas) where each problem instance differs in some hyperparameter (e.g. MaximumIndependentSet(g, penalty=p_r) for a sweep of p_r). One training run yields one solution per penalty level, much cheaper than independent runs.
  2. Variation diversification — leave replica_problems=None so every replica solves the same problem, but set vari_param > 0 to add the diversity term -R · Σ_i std_r(p_{i,r}) (sign chosen so the loss decreases when between-replica spread grows). Replicas then converge to structurally different solutions.

Parameters

  • problem: Base graph-based problem (used for graph extraction and as the default score_summary provider when replica_problems is None).
  • num_replicas: Number of parallel continuous solutions R. Defaults to 4.
  • replica_problems: Optional list of num_replicas problem instances. When provided, replica_problems[r].loss_fn evaluates the cost for replica r. Must share the same underlying graph as problem (only the QUBO Q_mat may differ — typically via a different penalty weight). When None, all replicas use problem.loss_fn.
  • vari_param: Coefficient of the diversity term. 0 (default) is pure penalty diversification; positive values reward inter-replica spread (used for variation diversification on a fixed problem).
  • nx_graph, in_feats, hidden_dim, dropout, learning_rate, weight_decay, annealing, init_reg_param, annealing_rate, curve_rate, num_epochs, tol, patience, check_interval, device, seed, verbose: Identical semantics to :func:train_cra_pi_gnn.

Returns

qqa.AnnealResult

  • best_sol — the discrete (N,) assignment of the best replica (lowest QUBO objective on its own loss_fn).
  • best_obj — that replica's float objective.
  • history — per-epoch loss, mean_cost, reg_term, vari_term, reg_param arrays plus a per_replica_obj array of shape (epochs, R) for downstream visualisation.
  • score['extra']['replicas'] — list of {replica, obj, score, sol} dicts so the caller can inspect every diversified solution, not only the best one.

Raises

ValueError On invalid num_replicas, vari_param sign, or a replica_problems list whose length does not match num_replicas.

Notes

  • Backbone vs. CPRA4CO. The reference CPRA implementation uses DGL GraphSAGE; this port reuses :class:GCNNet (a 2-layer GCNConv stack) for full parity with :func:train_cra_pi_gnn so head-to-head ablations across the two solvers measure the training objective, not the message-passing op.
  • Best-tracking. The reference CPRA loop returns the final iteration's discretised bits rather than the best-so-far solution. This trainer deliberately tracks the running best per replica — the QQA-side convention — to avoid losing a good early-epoch solution to a transient late spike.
  • History keys differ from :func:train_cra_pi_gnn. train_cra_pi_gnn reports per-epoch "cost"; CPRA reports "mean_cost" (per-replica average) because the raw cost scales linearly with R and is harder to compare across runs. When you mix the two trainers in a single plot, normalise by R yourself.
  • Replica collapse with vari_param=0 and replica_problems=None. With identical losses on every replica and shared embedding+backbone gradients, the R output channels drift toward the same fixed point. They start different (random init) and remain visibly distinct for the first few hundred epochs, but eventually collapse. For real variation diversification on a fixed problem, set vari_param > 0 (e.g. 0.1 to 0.5 works in practice).

qqa.pignn.model

GNN architectures for :mod:qqa.pignn.

Mirrors the reference GCN_dev from CRA4CO_:

  • a learnable per-node :class:torch.nn.Embedding provides the input features (no node attributes are assumed),
  • two stacked :class:torch_geometric.nn.GCNConv layers with a ReLU in-between and dropout,
  • a final :func:torch.sigmoid so the output p \in (0, 1)^N is immediately compatible with the QUBO loss problem.loss_fn(p) = p^T Q p.

.. _CRA4CO: https://github.com/Yuma-Ichikawa/CRA4CO

GCNNet

Bases: Module

Two-layer GCN with a learnable node embedding (PI-GNN style).

Parameters

  • num_nodes: Number of nodes in the graph (also the embedding table size).
  • in_feats: Width of the input embedding. Defaults to floor(sqrt(N)) to match the reference implementation.
  • hidden_dim: Hidden width between the two GCN layers. Defaults to in_feats.
  • dropout: Dropout probability applied after the first GCN layer. Defaults to 0 — the original paper used 0 for the headline MIS results.
  • num_replicas: Number of independent output channels. Defaults to 1, which preserves the single-head CRA-PI-GNN behaviour and keeps the forward output shape (N,). With num_replicas >= 2 the network becomes the CPRA multi-head backbone of Ichikawa & Iwashita (TMLR 2025) — a single shared embedding + GCN backbone produces R parallel continuous solutions in one forward pass and the output shape is (N, R).

Notes

The forward pass takes only edge_index because the node "features" are the learned embedding rows; they evolve through the same backward pass as the convolution weights. This is the standard PI-GNN trick from Schuetz et al. (Nature MI 2022). For num_replicas >= 2 only the second convolution's output channels grow — the embedding and first convolution are shared across replicas, matching the CPRA shared-representation design.

forward(edge_index)

Compute soft node assignments p \in (0, 1).

Parameters

edge_index: (2, 2|E|) long tensor produced by :func:qqa.pignn.graph.nx_to_edge_index.

Returns

torch.Tensor (N,) tensor of probabilities when num_replicas == 1 (CRA-PI-GNN compatibility), or (N, num_replicas) when num_replicas >= 2 (CPRA layout).

default_in_feats(num_nodes)

Heuristic used by the original CRA paper: floor(sqrt(N)).
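A minimal sketch of the floor(sqrt(N)) heuristic using math.isqrt, which avoids floating-point edge cases (whether the library computes it via isqrt or floor(sqrt(.)) internally is not specified here):

```python
import math

def default_in_feats(num_nodes):
    """Embedding width heuristic from the CRA paper: floor(sqrt(N))."""
    return math.isqrt(num_nodes)

assert default_in_feats(200) == 14   # floor(sqrt(200))
assert default_in_feats(1000) == 31  # floor(sqrt(1000))
```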

qqa.pignn.graph

Graph extraction & PyG conversion utilities for :mod:qqa.pignn.

CRA-PI-GNN is a graph neural network method, so it only applies to QQA problems whose underlying combinatorial structure is a graph (MIS, MaxClique, MaxCut, VertexCover, GraphBisection). Spin-glass problems defined by a coupling matrix J (Edwards–Anderson, SK, perceptron, Hopfield, …) and pure categorical / permutation problems (TSP, QAP, NQueens, Knapsack, …) have no node-edge structure to convolve over and are silently rejected here with a clear TypeError.

extract_nx_graph(problem, override=None)

Return the networkx graph backing a QQA problem.

Parameters

  • problem: A :class:~qqa.problems.COProblem instance. The function checks problem.nx_graph first (used by MaximumIndependentSet, MaxClique, MaxCut) and falls back to problem.graph (used by VertexCover, GraphBisection).
  • override: If supplied, used directly (for the rare case where a custom problem stores its graph elsewhere). The caller is then responsible for ensuring node labels are 0..N-1 and that the graph matches problem's QUBO matrix.

Returns

networkx.Graph With node labels 0..N-1.

Raises

TypeError If neither override nor any of the supported attribute names on problem resolve to a networkx.Graph. The error lists the supported problem families so the user can diagnose at a glance.
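The resolution order described above (override first, then nx_graph, then graph, else TypeError) can be sketched with plain duck typing. The dummy classes stand in for QQA problems; this is not the library's source:

```python
def resolve_graph(problem, override=None):
    """Return override, problem.nx_graph, or problem.graph, in that order."""
    if override is not None:
        return override
    for attr in ("nx_graph", "graph"):
        g = getattr(problem, attr, None)
        if g is not None:
            return g
    raise TypeError(
        "problem has no graph; CRA-PI-GNN only supports graph-based "
        "problems (MIS, MaxClique, MaxCut, VertexCover, GraphBisection)"
    )

class MISLike:          # stores the graph under .nx_graph
    nx_graph = "G1"

class VertexCoverLike:  # stores it under .graph
    graph = "G2"

class SpinGlassLike:    # no node-edge structure at all -> TypeError
    pass

assert resolve_graph(MISLike()) == "G1"
assert resolve_graph(VertexCoverLike()) == "G2"
```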

nx_to_edge_index(graph, device='cpu')

Convert a networkx graph to a symmetric PyG edge_index.

PyG's :class:~torch_geometric.nn.GCNConv expects edge_index of shape (2, 2|E|) listing both (u, v) and (v, u) for every undirected edge. Self-loops are not added here — :class:GCNConv inserts them internally when add_self_loops=True (the default).
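The symmetrization amounts to emitting every undirected edge in both directions. A dependency-free sketch on a plain edge list (the real helper takes a networkx graph and returns a torch tensor):

```python
def symmetric_edge_index(edges):
    """Turn undirected edges [(u, v), ...] into two rows of length 2|E|,
    listing both (u, v) and (v, u), as GCNConv expects."""
    src, dst = [], []
    for u, v in edges:
        src.extend((u, v))
        dst.extend((v, u))
    return [src, dst]

edge_index = symmetric_edge_index([(0, 1), (1, 2)])
# Two rows, 2|E| = 4 columns: (0,1), (1,0), (1,2), (2,1).
assert edge_index == [[0, 1, 1, 2], [1, 0, 2, 1]]
```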

Parameters

  • graph: Undirected networkx graph with node labels in 0..N-1.
  • device: Target torch device for the returned tensor.

Returns

torch.Tensor (2, 2|E|) long tensor on device.