API reference¶
Below is the auto-generated documentation for the public modules. If you only need to pick one entry point, the hand-curated Backends reference page compares them.
Top-level¶
qqa
¶
Quasi-Quantum Annealing (QQA) for combinatorial and spin-glass optimization.
Reference
Y. Ichikawa, Y. Arai. "Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling." ICLR 2025. https://openreview.net/forum?id=9EfBeXaXf0 (arXiv:2409.02135)
Typical usage::

    import networkx as nx
    import qqa

    qqa.fix_seed(0)
    g = nx.random_regular_graph(d=3, n=50, seed=0)
    problem = qqa.MaximumIndependentSet(g, penalty=2)
    result = qqa.anneal(problem, sol_size=100, num_epochs=1500)
    print(result.best_obj, result.runtime)
Spin-glass example::

    problem = qqa.SherringtonKirkpatrick(N=100, seed=0)
    result = qqa.anneal(problem, sol_size=200, num_epochs=2000)
    print("E_0 per spin:", result.best_obj / 100)
AnnealResult
dataclass
¶
Result returned by :func:anneal.
Attributes¶
best_sol:
Tensor of the best discrete solution(s) found during annealing. Shape
depends on the problem: (sol_size, N) for single-instance, or
(num_instance, max_node) for batched-instance problems.
best_obj:
Best objective value observed. float for single-instance problems,
numpy.ndarray of shape (num_instance,) for batched-instance.
runtime:
Wall-clock time of the annealing loop in seconds.
history:
Dict of per-epoch metrics (loss_mean, penalty_mean,
diversity, bg). Empty if record_history=False.
callbacks:
List of callback instances that were active. Useful for retrieving
e.g. TrajectoryTracker.values.
score = field(default_factory=dict)
class-attribute
instance-attribute
¶
Human-readable problem-specific score produced by
:py:meth:COProblem.score_summary.
- Single-instance: standard dict {label, value, unit, feasible, extra} with scalar fields.
- Batched-instance (problem.num_instance > 1): same keys, but value and feasible are np.ndarray of length num_instance, and extra carries arrays plus a feasible_count tally. score is empty for batched problems whose class did not override :py:meth:COProblem.score_summary.
polished_sol = None
class-attribute
instance-attribute
¶
1-flip-locally-optimal version of :attr:best_sol, populated when
:func:anneal is called with polish=True (the default) on a QUBO
problem. None for non-QUBO problems or when polishing is disabled.
For QUBO problems best_sol / best_obj / score are replaced
by the polished result whenever it is strictly better, so callers that
just read best_sol automatically benefit from the polish.
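The polishing pass is conceptually a greedy 1-flip descent on the QUBO objective. A minimal NumPy sketch (illustrative only — the actual implementation is :func:qqa.polish.greedy_one_flip and operates on torch tensors):

```python
import numpy as np

def greedy_one_flip(x, Q):
    """Greedy 1-flip descent on a QUBO: flip single bits while any flip
    strictly lowers x^T Q x. Q must be symmetric; x is a 0/1 vector."""
    x = x.astype(float).copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            s = 1.0 - 2.0 * x[i]                    # +1 if flipping 0->1, -1 if 1->0
            delta = 2.0 * s * (Q[i] @ x) + Q[i, i]  # exact change in x^T Q x
            if delta < 0.0:
                x[i] += s
                improved = True
    return x
```

The loop terminates because the objective strictly decreases over a finite state space, and on exit no single flip can improve the solution — exactly the "1-flip-locally-optimal" guarantee described above.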
anneal(problem, *, sol_size=100, learning_rate=1.0, temp=0.0, schedule=None, min_bg=None, max_bg=None, curve_rate=2, div_param=0.0, num_epochs=10000, check_interval=1000, device='cpu', callbacks=(), record_history=True, verbose=True, mixed_precision='fp32', initial_state=None, polish=True)
¶
Run Quasi-Quantum Annealing on problem.
Parameters¶
problem:
Any :class:~qqa.problems.COProblem subclass. Must expose
loss_fn(x) and a relaxation attribute.
sol_size:
Number of parallel candidate solutions (batch size).
learning_rate:
AdamW learning rate for the relaxed variable.
temp:
Langevin noise temperature. If 0 no noise is added.
schedule:
Callable (epoch, num_epochs) -> bg. If None a
:class:LinearBGSchedule is built from min_bg/max_bg.
min_bg, max_bg:
Convenience override for the default linear schedule.
curve_rate:
Exponent of the QQA penalty (must be even for the convex regime).
div_param:
Weight of the diversity term. Set to 0 to disable.
num_epochs:
Number of gradient steps.
check_interval:
How often to print progress logs.
device:
torch device.
callbacks:
Additional callbacks. A :class:HistoryRecorder is prepended when
record_history=True.
record_history:
If True, loss/penalty/diversity/bg are recorded per epoch.
verbose:
If True, print periodic progress.
mixed_precision:
If "bf16" and device is CUDA, the forward pass and loss
evaluation run inside torch.amp.autocast with bfloat16 — typically
a 1.3x-2x speedup on Ampere/Hopper/Blackwell with negligible accuracy
loss for QUBO/spin objectives. The relaxed variable, gradients, and
AdamW state stay in float32 for numerical stability. Defaults to
"fp32" so behaviour is bit-for-bit identical to the legacy loop.
On CPU this option is silently downgraded to "fp32".
initial_state:
Optional warm-start bitstring. Shape (N,) (broadcast to every
chain with light Gaussian jitter so the population still
diversifies) or (sol_size, N) (used verbatim). Particularly
effective on near-bipartite MaxCut where
:func:qqa.warmstart.bfs_2color already gives a 0.96+ ApR seed.
Ignored for batched-instance problems.
polish:
If True (default), run :func:qqa.polish.greedy_one_flip on the
winning bitstring at the end of training and replace best_sol /
best_obj / score with the polished result whenever it is
strictly better. The polish costs O(N · #flips) and is silently
skipped for problems without a Q_mat (Spin / Categorical /
batched-instance), so it is safe to leave on globally.
fix_seed(seed)
¶
Seed Python/Numpy/Torch (CPU + CUDA) for deterministic runs.
.. note::
This call flips torch.backends.cudnn.deterministic to True for
the whole process, which disables some CuDNN auto-tuning and may slow
non-QQA code sharing the same interpreter. Call it once near the start
of your script (or test session) and not inside hot loops.
Problems¶
qqa.problems.base
¶
Abstract problem base classes.
Every problem class in QQA exposes:
- loss_fn(x) — the (continuous or discrete) objective, vectorised over the leading batch dimension that qqa.anneal uses for the parallel population.
- relaxation — a :class:~qqa.relaxation.Relaxation instance describing how the variable is represented during annealing.
Binary QUBO problems return losses of shape (B,) for a single graph, or
(B, I) for batched-instance variants. Categorical and spin problems
return losses of shape (B,).
COProblem
¶
Bases: ABC
Abstract base class for any combinatorial optimisation problem.
score_summary(x_disc)
¶
Problem-specific, human-readable breakdown of a discrete solution.
The default implementation evaluates :meth:loss_fn and reports the
raw loss. Concrete subclasses should override to return a dict with
label / value / unit / feasible / extra so the
dashboard can display e.g. "IS size: 22" instead of "loss: -22".
normalize_graph(graph)
¶
Return a graph whose nodes are 0, 1, ..., N-1.
Many QUBO constructors use node labels directly as matrix/tensor indices
(Q[u, v] = ...), so a graph whose nodes are {10, 20, 30} (or
strings, or a subset of range(N)) silently produces a wrong QUBO or
raises an IndexError. This helper returns graph unchanged when
the labels are already a contiguous 0..N-1 range and returns a
relabelled copy otherwise. It does not mutate the input.
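The same relabelling idea can be sketched on a plain edge list (the helper below is hypothetical and not part of the qqa API; the real normalize_graph works on NetworkX graphs):

```python
def normalize_labels(edges, nodes=None):
    """Map arbitrary hashable node labels to a contiguous 0..N-1 range.

    Returns (relabelled_edges, mapping), leaving the input untouched --
    mirroring what normalize_graph does for a NetworkX graph.
    """
    if nodes is None:
        # Collect labels in first-seen order so the mapping is deterministic.
        nodes, seen = [], set()
        for u, v in edges:
            for n in (u, v):
                if n not in seen:
                    seen.add(n)
                    nodes.append(n)
    mapping = {label: i for i, label in enumerate(nodes)}
    return [(mapping[u], mapping[v]) for u, v in edges], mapping
```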
qqa.problems.qubo
¶
Binary QUBO problems: MIS, MaxClique, MaxCut.
All classes compute loss = x^T Q x on the continuous relaxation
x \in [0, 1]^N supplied by :class:~qqa.relaxation.BinaryRelaxation
(or its batched variant). Minimising the loss is equivalent to solving the
corresponding combinatorial problem.
The *Instance variants pack a list of graphs of (possibly different)
sizes into a single (num_instance, max_node, max_node) Q tensor so the
solver can attack all of them in one qqa.anneal call. Each instance
carries a pad_mask that the loss / score_summary multiplies in to keep
padded positions semantically inert — the optimiser may put anything in
x[i, n_i:] because the mask zeroes its contribution to both the loss
and the reported objective.
MaximumIndependentSet
¶
Bases: QUBOProblem
MIS as a QUBO: diag(-1) with penalty on each edge.
The loss x^T Q x is -|S| + penalty * (#violated edges), so when
all constraints are satisfied, -loss equals the independent-set size.
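The construction can be sketched in a few lines of NumPy (`mis_qubo` is a hypothetical name; the class builds an equivalent Q internally):

```python
import numpy as np

def mis_qubo(num_nodes, edges, penalty=2.0):
    """MIS QUBO: diag(-1), plus `penalty` split across the two
    off-diagonal entries of every edge, so that for x in {0,1}^N
    x^T Q x = -|S| + penalty * (#violated edges)."""
    Q = -np.eye(num_nodes)
    for u, v in edges:
        Q[u, v] += penalty / 2.0   # split so the quadratic form counts
        Q[v, u] += penalty / 2.0   # each edge violation exactly once
    return Q
```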
MaximumIndependentSetInstance
¶
Bases: COProblem
Batched-instance MIS, padded to max_node and masked in the loss.
Parameters¶
nx_graph_list
Heterogeneous list of NetworkX graphs (any sizes n_i <= max_node).
max_node
Padding width. If None (recommended), uses
max(g.number_of_nodes() for g in nx_graph_list).
penalty
Edge-violation penalty. Either a scalar (broadcast to all instances)
or a per-instance sequence of length I.
device
Torch device for the dense Q_tensor and pad_mask.
Loss¶
loss[b, i] = (m_i ⊙ x_{b,i})^T Q_i (m_i ⊙ x_{b,i})
where m_i is the per-instance pad mask. The mask makes padded
positions strictly inert: anything the optimiser writes into
x[:, i, n_i:] is squashed to zero before the einsum.
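A NumPy sketch of the masked batched loss (function name illustrative; the library version runs on torch tensors):

```python
import numpy as np

def batched_masked_loss(x, Q, mask):
    """loss[b, i] = (m_i * x_{b,i})^T Q_i (m_i * x_{b,i}).
    x: (B, I, N) relaxed variables, Q: (I, N, N), mask: (I, N) with
    1.0 on real nodes and 0.0 on padding."""
    xm = x * mask[None, :, :]   # zero padded positions before the einsum
    return np.einsum("bin,inm,bim->bi", xm, Q, xm)
```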
score_summary(x_disc)
¶
Per-instance IS sizes & feasibility for best_sol of shape (I, N).
value and feasible are 1-D np.ndarray of length
num_instance; extra carries plain-Python lists / ints so
the whole dict is json.dumps-able (the bench runner relies on this).
MaxClique
¶
MaxCut
¶
qqa.problems.categorical
¶
Categorical (one-hot) problems: balanced graph partitioning and coloring.
BalancedGraphPartition
¶
qqa.problems.spin
¶
Physics-flavoured spin problems for QQA.
All classes in this module use :class:~qqa.relaxation.SpinRelaxation, so the
x tensor fed in during annealing lives in [0, 1] while
problem.loss_fn sees the transformed spin s = 2x - 1 \in [-1, +1]
(and exactly \pm 1 after rounding).
Energies follow physics conventions (lower is better):
- Ising 1D: E = -sum_<i,j> J_{ij} s_i s_j - h sum_i s_i.
- Edwards-Anderson / SK / Hopfield: E = -0.5 s^T J s with symmetric J and diag(J) = 0 (so the full sum equals -sum_<i,j> J_{ij} s_i s_j).
- Binary perceptron: a smooth surrogate for the number of mis-classified teacher-student patterns.
SpinProblem
¶
Bases: COProblem
Base class for \pm 1 spin problems.
Subclasses must populate self.num_spins and attach a
:class:SpinRelaxation. Most subclasses also build a symmetric coupling
matrix self.J and rely on :meth:quadratic_energy.
quadratic_energy(s)
¶
Compute E = -0.5 s^T J s - h . s for a batch of spin configs.
s has shape (B, N); returns a 1D tensor of shape (B,).
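An equivalent NumPy sketch (the real method operates on torch tensors):

```python
import numpy as np

def quadratic_energy(s, J, h=None):
    """E = -0.5 s^T J s - h . s for a batch of spin configurations.
    s: (B, N), J: (N, N) symmetric with zero diagonal; returns (B,)."""
    E = -0.5 * np.einsum("bi,ij,bj->b", s, J, s)
    if h is not None:
        E = E - s @ h
    return E
```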
Ising1D
¶
Bases: SpinProblem
One-dimensional Ising chain with nearest-neighbour coupling J.
Energy: E = -J sum_i s_i s_{i+1} - h sum_i s_i.
For J > 0 and h = 0 with periodic boundaries, the ground state is
all spins aligned with energy -J * N.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| N | int | Number of spins. | required |
| J | float | Uniform nearest-neighbour coupling strength. | 1.0 |
| h | float | Uniform external field. | 0.0 |
| periodic | bool | Whether to close the chain (periodic boundaries). | True |
EdwardsAnderson
¶
Bases: SpinProblem
Edwards-Anderson spin-glass on a hyper-cubic lattice.
Only nearest-neighbour bonds are coupled, with J_{ij} \sim N(0, \sigma^2)
drawn once at construction time. The energy is
E = -0.5 s^T J s with symmetric J.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| L | int | Lattice side length (N = L**dim spins). | required |
| dim | int | Spatial dimension. | 3 |
| seed | int | RNG seed for the couplings. | 0 |
| periodic | bool | Whether to use periodic boundary conditions. | True |
| sigma | float | Standard deviation of the Gaussian couplings. | 1.0 |
from_couplings_txt(path, N, *, L=None, dim=3, periodic=True, device='cpu')
classmethod
¶
Load an EA instance from a text file of i j J_ij rows.
Compatible with the couplings_L{L}_R1_seed{seed}.txt format
produced by related projects: rows of i j J_ij with 0-based
indices. No metadata is assumed; N must be provided. Pass L
explicitly when N != L**dim (the cubic-root fallback below only
yields a correct L for cubic lattices).
SherringtonKirkpatrick
¶
Bases: SpinProblem
Sherrington-Kirkpatrick mean-field spin glass.
All-to-all couplings with J_{ij} \sim N(0, 1/N) for i \ne j and
J_{ii} = 0. Energy: E = -0.5 s^T J s.
The standard normalisation J_{ij} \sim N(0, 1/N) makes the typical
ground-state energy density e_0 = E_0 / N converge to
\approx -0.7632 (Parisi).
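The coupling construction is a one-liner in NumPy (sketch only — function names here are made up, and the class works with torch tensors):

```python
import numpy as np

def sk_couplings(N, seed=0):
    """Symmetric SK couplings: J_ij ~ N(0, 1/N) for i != j, J_ii = 0."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.normal(scale=np.sqrt(1.0 / N), size=(N, N)), k=1)
    return upper + upper.T   # symmetric, zero diagonal by construction

def sk_energy_per_spin(s, J):
    """e = E / N with E = -0.5 s^T J s; s: (B, N), returns (B,)."""
    return -0.5 * np.einsum("bi,ij,bj->b", s, J, s) / J.shape[0]
```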
PSpinGlass
¶
Bases: SpinProblem
Dense p-spin Sherrington-Kirkpatrick model.
Energy:
.. math::
E = -\sum_{i_1 < i_2 < \dots < i_p} J_{i_1 i_2 \dots i_p}
s_{i_1} s_{i_2} \dots s_{i_p}
with i.i.d. Gaussian couplings drawn from
J \sim \mathcal{N}(0,\, p!/(2 N^{p-1})) so that the typical
ground-state energy density is intensive (Crisanti & Sommers, 1992
<https://doi.org/10.1051/jphys:0199200530100128300>_).
For p = 2 this reduces to the classical Sherrington-Kirkpatrick
model (use :class:SherringtonKirkpatrick for the symmetric J
matrix form). For p \ge 3 the model exhibits a discontinuous
1RSB freezing transition and is a canonical hard benchmark for
annealing solvers — small instances already form rugged landscapes
with exponentially many metastable states.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| N | int | Number of spins. | required |
| p | int | Interaction order. | 3 |
| seed | int | RNG seed for the couplings. | 0 |
RandomFieldIsing
¶
Bases: SpinProblem
Random-Field Ising Model on a hyper-cubic lattice.
Energy:
.. math::
E = -J \sum_{\langle i, j \rangle} s_i s_j - \sum_i h_i s_i,
\qquad h_i \sim \mathcal{N}(0,\, \sigma_h^2)
The couplings are uniform ferromagnetic (J > 0) and the disorder
sits in the local fields. This is one of the cleanest models with
quenched randomness in equilibrium statistical physics: in 3D with
Gaussian fields it has a well-studied finite-temperature phase
transition, and the zero-temperature ground-state problem at finite
\sigma_h / J is a classical optimisation benchmark (max-flow
polynomial-time exact solvers exist — useful as a sanity check for
annealing solvers).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| L | int | Lattice side length (N = L**dim spins). | required |
| dim | int | Spatial dimension. | 2 |
| J | float | Uniform ferromagnetic coupling. | 1.0 |
| h_std | float | Standard deviation of the Gaussian random field. | 1.0 |
| seed | int | RNG seed for the fields. | 0 |
| periodic | bool | Periodic boundary conditions. | True |
BinaryPerceptron
¶
Bases: SpinProblem
Teacher-student binary perceptron in the storage formulation.
Patterns \xi^\mu \in \{-1, +1\}^N are drawn uniformly, and a
teacher s^* \in \{-1, +1\}^N generates labels
\sigma^\mu = sign(\frac{1}{\sqrt N} \xi^\mu \cdot s^*).
The learning loss is the number of patterns on which the student s
disagrees with the teacher. During annealing this is replaced by a
smooth surrogate \sum_\mu \sigma(-k z^\mu) where
z^\mu = \sigma^\mu \frac{1}{\sqrt N} \xi^\mu \cdot s and
\sigma is the logistic sigmoid, so gradients are available. After
rounding, :meth:error_count returns the exact number of errors.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| N | int | Input dimension. | required |
| alpha | float | Loading alpha = P / N (sets the number of patterns P). | 0.5 |
| seed | int | RNG seed for patterns and teacher. | 0 |
| sharpness | float | Sigmoid steepness k in the smooth surrogate. | 10.0 |
error_count(s)
¶
Exact number of mis-classified patterns for each config in s.
A pattern is counted as an error when the pre-activation z is
strictly negative; exact ties (z == 0) occur only for the rare
case where the teacher inner product vanishes and are treated as
correct.
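The error count is a single batched matrix product; a NumPy sketch (the class method runs on torch tensors):

```python
import numpy as np

def error_count(s, xi, sigma):
    """Exact number of misclassified patterns per student config.
    s: (B, N) in {-1,+1}; xi: (P, N) patterns; sigma: (P,) teacher labels.
    A pattern counts as an error when z = sigma * (xi . s) / sqrt(N) < 0;
    exact ties (z == 0) are treated as correct."""
    N = s.shape[1]
    z = sigma[None, :] * (s @ xi.T) / np.sqrt(N)
    return (z < 0).sum(axis=1)
```

By construction the teacher itself classifies every pattern correctly (its pre-activations all have z >= 0), which gives a handy sanity check.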
HopfieldMemory
¶
Bases: SpinProblem
Hopfield associative-memory model with Hebbian couplings.
Given P patterns \xi^\mu \in \{-1, +1\}^N, the couplings are
J_{ij} = (1/N) \sum_\mu \xi^\mu_i \xi^\mu_j for i \ne j and
J_{ii} = 0. Energy: E = -0.5 s^T J s.
At a stored pattern s = \xi^\mu and low loading \alpha = P/N,
E \approx -N/2.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| N | int | Number of spins. | required |
| patterns | int \| ndarray \| Tensor \| Sequence[Sequence[int]] | Either an integer P (draw that many random patterns) or the patterns themselves. | 1 |
| seed | int | RNG seed used when patterns is an integer. | 0 |
overlap(s)
¶
Normalised overlap with every stored pattern.
Returns a tensor of shape (B, P) with
m^\mu_b = (1/N) \sum_i \xi^\mu_i s_{b,i}.
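Both the Hebbian rule and the overlap are short NumPy expressions; a sketch under the conventions above (names illustrative):

```python
import numpy as np

def hebbian_couplings(patterns):
    """J_ij = (1/N) sum_mu xi^mu_i xi^mu_j, zero diagonal. patterns: (P, N)."""
    N = patterns.shape[1]
    J = patterns.T @ patterns / N
    np.fill_diagonal(J, 0.0)
    return J

def overlap(s, patterns):
    """m[b, mu] = (1/N) sum_i xi^mu_i s_{b,i}; returns (B, P)."""
    return s @ patterns.T / patterns.shape[1]
```

With a single stored pattern, the energy at the pattern is exactly -(N - 1)/2 (the diagonal terms are excluded), matching the E ≈ -N/2 statement above.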
qqa.problems.extras
¶
Extra problem catalog for QQA.
Eight classic discrete-optimization problems that plug into the unified
qqa.anneal loop alongside the graph / spin / categorical problems
already shipped.
Design¶
- Every class exposes loss_fn(x) -> (B,) that is minimised by QQA (so e.g. Knapsack returns the negative value of the packed items).
- Every class exposes score_summary(x_disc) -> dict returning a human-readable breakdown of the best solution so the dashboard can print e.g. "packed value: 138 / 150, feasible: True".
- Variable sizing attributes are chosen to match the relaxation that each problem consumes:
  - :class:BinaryRelaxation expects num_nodes.
  - :class:SpinRelaxation expects num_spins.
  - :class:CategoricalRelaxation expects num_node + num_category.
NumberPartitioning
¶
Bases: COProblem
Classic number partitioning of N positive integers.
Given a_1, ..., a_N the goal is to split them into two subsets
whose sums are as close as possible, i.e. minimise
.. math:: \Bigl(\sum_i a_i s_i\Bigr)^2, \quad s_i \in \{-1, +1\}.
Uses :class:SpinRelaxation internally so loss_fn receives
:math:s \in [-1, 1]^{B\times N} directly.
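The spin-form loss is a single dot product per batch row; a NumPy sketch of what loss_fn computes (illustrative only):

```python
import numpy as np

def partition_loss(s, a):
    """(sum_i a_i s_i)^2 per batch row -- the squared signed subset-sum
    difference. s: (B, N) in [-1, 1], a: (N,) positive numbers."""
    return (s @ a) ** 2
```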
Knapsack
¶
Bases: COProblem
0/1 knapsack.
Maximise :math:\sum_i v_i x_i subject to
:math:\sum_i w_i x_i \le C. We minimise
.. math:: -\sum_i v_i x_i + \lambda\,\bigl[\max(0, \sum_i w_i x_i - C)\bigr]^2
so solutions that respect the capacity dominate.
VertexCover
¶
Bases: COProblem
Minimum vertex cover on an undirected graph.
Select a minimum-size vertex subset that touches every edge; we use the QUBO form
.. math:: H = \sum_i x_i + \lambda \sum_{(u,v)\in E} (1 - x_u)(1 - x_v)
GraphBisection
¶
Bases: COProblem
Balanced graph bisection.
Partition vertices into two equal-size sets minimising the cut:
.. math:: H = \sum_{(u,v)\in E} (x_u - x_v)^2 + \lambda \bigl(\sum_i x_i - N/2 \bigr)^2
MinimumDominatingSet
¶
Bases: COProblem
Minimum dominating set on an undirected graph.
Select a minimum-size vertex subset :math:S such that every vertex
is either in :math:S or adjacent to some vertex in :math:S. We
minimise
.. math:: H(x) = \sum_v x_v + \lambda \sum_v (1 - x_v)\,\prod_{u \in N(v)} (1 - x_u),
where :math:x \in \{0, 1\}^N. The product factor equals 1 exactly
when v is unselected and none of its neighbours are selected
(i.e. v is uncovered), so the penalty term counts uncovered
vertices.
During annealing we use the same form on the continuous relaxation
:math:x \in [0, 1]^N. Because :math:(1 - x_u) \in [0, 1], the
product is bounded in :math:[0, 1] and matches the discrete
indicator at the corners. To keep the cost :math:O(B \cdot |E|) per
forward pass (no per-vertex Python loop) we evaluate the log-product
on the edges:
.. math:: \prod_{u \in N(v)} (1 - x_u) = \exp\!\left( \sum_{u \in N(v)} \log(1 - x_u + \varepsilon) \right).
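The edge-list evaluation can be sketched in NumPy with an unbuffered scatter-add (np.add.at), which is the same trick torch's scatter-based version relies on; names are illustrative:

```python
import numpy as np

def uncovered_penalty(x, edges, eps=1e-9):
    """Penalty sum_v (1 - x_v) * prod_{u in N(v)} (1 - x_u), evaluated
    on the edge list via exp(sum log(1 - x_u + eps)) in O(B * |E|).
    x: (B, N) in [0, 1]; edges: (E, 2) int array, undirected."""
    B, N = x.shape
    log1mx = np.log(1.0 - x + eps).T       # (N, B)
    acc = np.zeros((N, B))
    u, v = edges[:, 0], edges[:, 1]
    np.add.at(acc, v, log1mx[u])           # each edge contributes its
    np.add.at(acc, u, log1mx[v])           # ... log-factor in both directions
    prod = np.exp(acc).T                   # (B, N): prod over N(v)
    return ((1.0 - x) * prod).sum(axis=1)
```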
SATClause
dataclass
¶
Signed literals of a 3-SAT clause. lits[k] = (var_idx, sign).
MaxSAT3
¶
Bases: COProblem
Random 3-SAT / MaxSAT.
Minimise the number of unsatisfied 3-CNF clauses. A clause is
represented by three signed literals. Let :math:L_i(x) equal
:math:x_i when the literal is positive and :math:1 - x_i otherwise.
The clause is violated iff
:math:(1-L_1)(1-L_2)(1-L_3) = 1, which we sum across all clauses as
the (exact) loss.
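A NumPy sketch of the clause-violation loss (exact on binary inputs; names illustrative):

```python
import numpy as np

def unsat_count(x, clauses):
    """Number of violated 3-CNF clauses per batch row (exact on {0,1}^N).
    x: (B, N); clauses: list of clauses, each a list of three
    (var_idx, sign) pairs with sign=+1 for a positive literal."""
    loss = np.zeros(x.shape[0])
    for lits in clauses:
        violated = np.ones(x.shape[0])
        for i, sign in lits:
            lit = x[:, i] if sign > 0 else 1.0 - x[:, i]
            violated = violated * (1.0 - lit)   # all three literals false
        loss += violated
    return loss
```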
TSP
¶
Bases: COProblem
Symmetric travelling salesperson — solved with the penalty method.
Variable encoding follows Lucas (2014) "Ising formulations of many NP
problems": x ∈ {0, 1}^{N×N} with x[t, i] = 1 iff city i
occupies tour position t. Three additive terms make every
constraint visible inside loss_fn:
- distance — :math:\sum_t \sum_{i \neq j} d_{ij}\, x_{t,i}\, x_{t+1,j} (cyclic, so position N-1 wraps back to 0).
- row penalty — :math:\lambda_r \sum_t (\sum_i x_{t,i} - 1)^2 so every position holds exactly one city.
- column penalty — :math:\lambda_c \sum_i (\sum_t x_{t,i} - 1)^2 so every city appears exactly once.
Unlike the previous CategoricalRelaxation-based formulation (where the
row constraint was enforced by softmax inside the relaxation, hiding
it from the loss), this version uses :class:BinaryRelaxation so QQA
sees both penalties as gradients and the user can tune their weights
from the Streamlit sidebar — a textbook penalty-method setup.
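The three terms can be sketched in a few NumPy lines (with a zero-diagonal distance matrix the i = j restriction is automatic; names illustrative):

```python
import numpy as np

def tsp_loss(x, d, lam_r=1.0, lam_c=1.0):
    """Penalty-method TSP loss for the Lucas encoding.
    x: (B, N, N) with x[b, t, i] = 1 iff city i sits at tour position t;
    d: (N, N) symmetric distance matrix with zero diagonal."""
    x_next = np.roll(x, -1, axis=1)                 # position t+1, cyclic wrap
    dist = np.einsum("bti,ij,btj->b", x, d, x_next)
    row = ((x.sum(axis=2) - 1.0) ** 2).sum(axis=1)  # one city per position
    col = ((x.sum(axis=1) - 1.0) ** 2).sum(axis=1)  # one position per city
    return dist + lam_r * row + lam_c * col
```

For a valid permutation matrix both penalties vanish and the loss is the cyclic tour length, which is the property QQA's gradient descent exploits.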
qqa.problems.user
¶
User-defined problem helper.
Drop-in wrapper that lets a user plug an arbitrary differentiable loss into
:func:qqa.anneal without having to subclass :class:COProblem::
    import torch, qqa

    J = torch.randn(50, 50); J = (J + J.T) / 2; J.fill_diagonal_(0)
    problem = qqa.UserProblem(
        num_vars=50,
        variable_kind="spin",
        loss_fn=lambda s: -0.5 * torch.einsum("bi,ij,bj->b", s, J, s),
    )
    result = qqa.anneal(problem, sol_size=128, num_epochs=1000)
Three variable kinds are supported:
"binary"—x \in [0, 1](rounded to{0, 1})."spin"—s = 2 \, \text{clip}(x, 0, 1) - 1 \in [-1, +1]and projected to\{-1, +1\}."categorical"— one-hot simplexx \in \Delta^Kper variable.
For categorical kinds the num_vars argument sets the number of categorical
variables and num_category must also be passed.
Loss functions must accept a tensor whose leading axis is the parallel batch
B = sol_size and return a tensor of shape (B,).
UserProblem
¶
Bases: COProblem
Wrap an arbitrary loss function as a :class:COProblem.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| num_vars | int | Number of discrete variables. | required |
| loss_fn | Callable[[Tensor], Tensor] | Callable mapping a batched tensor to a (B,) loss tensor. | required |
| variable_kind | VariableKind | One of "binary", "spin", "categorical". | 'binary' |
| num_category | int \| None | Required when variable_kind="categorical". | None |
| relaxation | Relaxation \| None | Advanced — pass a custom :class:Relaxation to override the default. | None |
| name | str | Optional display name used by the GUI/CLI. | 'user-problem' |
| device | str \| device | Torch device hint (only used by QQA's allocation path; the loss itself is device-agnostic as long as any constants it captures live on the right device). | 'cpu' |
user_problem_from_source(source, num_vars, variable_kind='binary', num_category=None, name='inline', device='cpu', extra_globals=None)
¶
Build a :class:UserProblem by exec-ing a Python snippet.
The snippet must define a callable named loss_fn (single argument, the
batched configuration tensor, returning a (B,) loss tensor). The
namespace has torch, np (numpy), and any extra_globals entries
pre-loaded, and the defined loss_fn is closed over that namespace.
.. warning:: This executes arbitrary Python code. Only use with trusted input (e.g. the local GUI or CLI on your own machine).
load_problem_from_file(path)
¶
Load a user-provided problem from a Python file.
The file must either define a top-level variable problem that is a
:class:COProblem instance, or a callable make_problem() / build()
that returns one.
Relaxations¶
qqa.relaxation
¶
Relaxation strategies for QQA.
A Relaxation defines how a combinatorial variable is represented as a
continuous tensor during annealing. It encapsulates:
- initialization of the relaxed variable,
- the transformation fed into
problem.loss_fn(forward), - the discrete projection used to evaluate the true objective (
project), - the quasi-quantum penalty function,
- the diversity term across the parallel batch,
- an in-place Langevin-style perturbation.
All relaxations operate on a leading batch dimension of size sol_size.
Relaxation
¶
Bases: Protocol
Protocol that any relaxation strategy must satisfy.
BinaryRelaxation
¶
Relaxation for binary variables x in [0, 1].
Used for QUBO problems (MIS, MaxClique, MaxCut) on either a single graph
(shape (sol_size, N)) or a batch of graphs via an instance problem
(shape (sol_size, I, N)).
SpinRelaxation
¶
Bases: BinaryRelaxation
Relaxation for ising-style spin variables s \in \{-1, +1\}.
Internally the latent representation x lives in [0, 1] (same as
:class:BinaryRelaxation), but :meth:forward maps it to the spin
s = 2 \, \text{clip}(x, 0, 1) - 1 so that problem.loss_fn can
safely work on real-valued spins in [-1, +1]. The discrete projection
thresholds at 0.5.
Because spin problems typically couple variables quadratically without a
convex QUBO structure, AdamW steps can push the latent x outside
[0, 1]; we clip before the forward so the effective spin stays in
[-1, +1], and :meth:perturb_ always clamps x back even when
temp == 0.
BinaryInstanceRelaxation
¶
Bases: BinaryRelaxation
Binary relaxation for batched instance problems.
Expects the problem to expose num_instance and max_node.
CategoricalRelaxation
¶
Relaxation for one-hot categorical variables.
Variable tensor shape: (sol_size, N, K). The forward pass normalises
across the category axis and project returns one-hot tensors.
penalty_from_forward(x, x_fwd, curve_rate)
¶
Optimised path: reuse the already-normalised tensor from forward.
Used by :func:qqa.anneal to avoid the simplex normalisation
re-running every epoch (the legacy penalty calls forward
internally, which doubled the cost on the hot path).
Schedules¶
qqa.schedule
¶
Annealing schedules for the penalty coefficient bg.
A schedule is any callable schedule(epoch, num_epochs) -> float.
The default linear schedule matches the original QQA paper.
LinearBGSchedule
dataclass
¶
Linear schedule bg(t) = min_bg + (max_bg - min_bg) * t / T.
When min_bg < 0 and max_bg > 0 the landscape transitions from
"convex, half-integer minima" (the quasi-quantum regime) to the discrete
regime where binary corners are favoured.
.. note::
The annealing loop calls this with epoch ∈ {0, ..., T-1}, so the
final value is min_bg + (T-1)/T * (max_bg - min_bg) rather than
exactly max_bg — the schedule approaches max_bg as T → ∞.
For short runs (small num_epochs) the last bg can therefore be
visibly smaller than max_bg. Increase num_epochs if you need to
reach max_bg (matching the published QQA curves), or supply a
custom callable for a different endpoint convention.
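The endpoint behaviour is easy to check with a plain-Python sketch of the schedule (illustrative; the library version is the LinearBGSchedule dataclass):

```python
def linear_bg(epoch, num_epochs, min_bg=-1.0, max_bg=1.0):
    """bg(t) = min_bg + (max_bg - min_bg) * t / T with t = epoch."""
    return min_bg + (max_bg - min_bg) * epoch / num_epochs
```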
Callbacks¶
qqa.callbacks
¶
Callbacks for the QQA annealing loop.
Callbacks receive a CallbackState snapshot at the end of every epoch and
can record metrics, adjust hyper-parameters, or track auxiliary objectives.
CallbackState
dataclass
¶
Mutable context passed to callbacks at each epoch.
The annealing loop writes fields here. Callbacks may read any field and
may write to extras or mutate hyperparams (e.g. div_param).
Callback
¶
Base class. Override on_epoch_end (and optionally other hooks).
HistoryRecorder
¶
Bases: Callback
Record loss / penalty / diversity statistics per epoch.
Performance notes¶
The recorder is on the hot path of every annealing step. Naively calling
tensor.item() for every metric forces a CUDA device->host sync at
each epoch and dominates the wall-clock when the kernels themselves are
cheap (small problems / GPU). To avoid that:
- Per-epoch scalars are kept as GPU 0-dim tensors and appended to a Python list. There is no .item() call inside :meth:on_epoch_end.
- :meth:on_train_end performs a single torch.stack(...).cpu() per metric, which costs one sync regardless of num_epochs.
- stride skips intermediate epochs entirely (the last epoch is always recorded so that final-state observers see a non-empty history).
The exposed self.history dict is still a dict[str, list[float]]
(or list[list[float]] for best_obj of batched-instance problems),
so existing code that reads recorder.history["loss_mean"][-1] is
unchanged.
AutoDivTuner
¶
Bases: Callback
Adaptively tune div_param to target a desired diversity ratio.
At each epoch: ratio = diversity / (sol_size * N). The controller
nudges div_param by lr * (ratio - target) and clips to [0, 1].
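Following the update rule stated above, one controller step can be sketched in plain Python (illustrative only; the callback's real attribute names may differ):

```python
def autodiv_step(div_param, diversity, sol_size, N, target=0.5, lr=0.01):
    """One proportional-controller step: ratio = diversity / (sol_size * N),
    nudge div_param by lr * (ratio - target), then clip to [0, 1]."""
    ratio = diversity / (sol_size * N)
    return min(1.0, max(0.0, div_param + lr * (ratio - target)))
```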
PopulationTracker
¶
Bases: Callback
Snapshot the parallel population for post-hoc parallel-search visualisation.
Records, every stride epochs:
- loss — the (sol_size,) per-replica loss.
- x — optionally, the continuous variables (heavier but lets you reconstruct PCA trajectories or per-variable heatmaps).
Attributes:

| Name | Type | Description |
|---|---|---|
| epochs | list[int] | List of recorded epochs. |
| loss | list[Any] | List of (sol_size,) per-replica loss snapshots. |
| x | list[Any] | List of continuous-variable snapshots (when recorded). |
Visualization¶
qqa.visualization
¶
Visualisation helpers for :class:~qqa.annealing.AnnealResult.
Every plotting function accepts a backend argument:
"matplotlib"(default) — static figures, no optional deps."plotly"— interactive figures, requirespip install qqa[plotly].
If Plotly is not installed and backend="plotly" is requested, the
functions automatically fall back to matplotlib with a warning.
Plot catalog:

- :func:plot_history — loss / penalty / diversity dynamics.
- :func:plot_best_trajectory — best objective value over epochs.
- :func:plot_schedule — the annealing schedule bg(epoch).
- :func:plot_run_comparison — overlay multiple runs.
- :func:plot_parallel_coordinates — hyper-parameter sweep view.
- :func:plot_solution_heatmap — spins / bits of the best solution.
- :func:plot_population_evolution — parallel-population loss heat-map.
- :func:plot_population_embedding — PCA trajectory of the population.
plot_history(result, title='QQA dynamics', backend='matplotlib', show=True)
¶
Plot mean loss, mean penalty and diversity across epochs.
Returns the backend-native figure object ((fig, axes) for matplotlib,
go.Figure for plotly).
plot_best_trajectory(result, title='Best objective per epoch', backend='matplotlib', show=True)
¶
Plot best_obj vs epoch (monotonically non-increasing).
plot_schedule(schedule, num_epochs, title='Annealing schedule', backend='matplotlib', show=True)
¶
Visualise the bg annealing schedule over num_epochs.
plot_run_comparison(results, labels=None, title='Run comparison', backend='matplotlib', show=True)
¶
Overlay best_obj trajectories from multiple runs.
plot_parallel_coordinates(sweep_df, objective='best_obj', title='Hyperparameter sweep', backend='plotly', show=True)
¶
Parallel-coordinates plot of a hyper-parameter sweep.
sweep_df is expected to be a pandas.DataFrame (or any object with a
to_dict(orient="list") method) whose columns are the hyper-parameters
plus one objective column.
The Plotly backend produces a coloured interactive figure (recommended). Matplotlib falls back to a simple scatter-matrix-like rendering.
plot_solution_heatmap(result, problem=None, title='Best solution', backend='matplotlib', show=True)
¶
Render the best discrete solution as a 1D/2D heatmap.
For lattice spin problems (EdwardsAnderson with dim == 2) the
solution is reshaped to (L, L) automatically.
plot_population_evolution(tracker, title='Parallel population loss', backend='plotly', show=True)
¶
Render the parallel population's loss landscape across epochs.
tracker must be a :class:~qqa.callbacks.PopulationTracker instance
that captured snapshots during the run. Each column of the resulting
heat-map is one snapshot epoch, each row is one replica in the
sol_size population, and colour encodes loss. Rows are sorted once,
by final-epoch loss, to keep the panel readable.
The best trajectory is overlaid as a thin white curve.
plot_population_embedding(tracker, title='Population PCA trajectory', backend='plotly', show=True)
¶
2D PCA of the parallel population's continuous variables over time.
Projects every snapshot of tracker.x (shape (sol_size, N, ...))
onto the 2 principal components computed from the concatenation of all
snapshots, then draws the resulting trajectory as a scatter coloured by
epoch with replica paths drawn as light-grey lines.
Requires :class:~qqa.callbacks.PopulationTracker to have been run with
record_x=True.
Optional PyG backends¶
The functions below require the pignn extra
(pip install "qqa[pignn]").
qqa.pignn
¶
Optional PyTorch Geometric backend: CRA-PI-GNN and CPRA trainers.
This subpackage provides self-contained re-implementations of two GNN-based unsupervised-learning combinatorial-optimization solvers:
- CRA-PI-GNN — Y. Ichikawa, "Controlling Continuous Relaxation for Combinatorial Optimization," NeurIPS 2024 (https://openreview.net/forum?id=ykACV1IhjD). Single-replica continuous-relaxation annealing on a 2-layer GCN.
- CPRA — Y. Ichikawa & H. Iwashita, "Continuous Parallel Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems," TMLR 2025 (https://openreview.net/forum?id=ix33zd5zCw). A multi-head extension of CRA-PI-GNN that returns R diverse solutions in one training run, supporting both penalty- and variation-diversification.
Both reference releases use DGL. Because DGL's prebuilt wheels do
not yet target NVIDIA Blackwell (sm_100) and lag the latest PyTorch
/ CUDA combos, we ship ports written in PyTorch Geometric so QQA
users on modern GPUs can compare against either method without juggling
DGL.
Why this lives here, not in the main qqa namespace¶
- PyG and its transitive deps are heavy. We do not want import qqa to pay for them when the user only needs qqa.anneal.
- The trainers are backend alternatives to qqa.anneal, not building blocks of it. Keeping them isolated also makes the README's "QQA vs. CRA-PI-GNN vs. CPRA" comparison story clear.
Quickstart¶
::
pip install "qqa[pignn]"
CRA-PI-GNN (single solution per run)::
import networkx as nx
import qqa
from qqa.pignn import train_cra_pi_gnn
qqa.fix_seed(0)
g = nx.random_regular_graph(d=3, n=200, seed=0)
problem = qqa.MaximumIndependentSet(g, penalty=2)
result = train_cra_pi_gnn(problem, num_epochs=4000)
print(result.score)
CPRA (R diverse solutions per run, e.g. one per penalty level)::
from qqa.pignn import train_cpra_pi_gnn
penalties = [1.5, 2.0, 2.5, 3.0]
replicas = [qqa.MaximumIndependentSet(g, penalty=p) for p in penalties]
result = train_cpra_pi_gnn(
problem,
num_replicas=len(replicas),
replica_problems=replicas,
num_epochs=4000,
)
for record in result.score["extra"]["replicas"]:
print(record["score"])
Both functions return a :class:qqa.AnnealResult, so every downstream
helper that consumes AnnealResult (visualisation, CLI scoring) keeps
working.
See also¶
- CRA reference (DGL): https://github.com/Yuma-Ichikawa/CRA4CO
- CPRA reference (DGL): https://github.com/Yuma-Ichikawa/CPRA4CO
GCNNet
¶
Bases: Module
Two-layer GCN with a learnable node embedding (PI-GNN style).
Parameters¶
num_nodes:
Number of nodes in the graph (also the embedding table size).
in_feats:
Width of the input embedding. Defaults to floor(sqrt(N)) to
match the reference implementation.
hidden_dim:
Hidden width between the two GCN layers. Defaults to in_feats.
dropout:
Dropout probability applied after the first GCN layer. Defaults
to 0 — the original paper used 0 for the headline MIS results.
num_replicas:
Number of independent output channels. Defaults to 1,
which preserves the single-head CRA-PI-GNN behaviour and keeps
the forward output shape (N,). With num_replicas >= 2
the network becomes the CPRA multi-head backbone of
Ichikawa & Iwashita (TMLR 2025) — a single shared embedding +
GCN backbone produces R parallel continuous solutions in
one forward pass and the output shape is (N, R).
Notes¶
The forward pass takes only edge_index because the node
"features" are the learned embedding rows; they evolve through the
same backward pass as the convolution weights. This is the standard
PI-GNN trick from Schuetz et al. (Nature MI 2022). For
num_replicas >= 2 only the second convolution's output channels
grow — the embedding and first convolution are shared across
replicas, matching the CPRA shared-representation design.
forward(edge_index)
¶
Compute soft node assignments p \in (0, 1).
Parameters¶
edge_index:
(2, 2|E|) long tensor produced by
:func:qqa.pignn.graph.nx_to_edge_index.
Returns¶
torch.Tensor
(N,) tensor of probabilities when num_replicas == 1
(CRA-PI-GNN compatibility), or (N, num_replicas) when
num_replicas >= 2 (CPRA layout).
train_cpra_pi_gnn(problem, *, num_replicas=4, replica_problems=None, vari_param=0.0, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)
¶
Train a CPRA multi-head PI-GNN solver and return an :class:AnnealResult.
CPRA (Continuous Parallel Relaxation Annealing) is the multi-replica
extension of CRA-PI-GNN introduced by Ichikawa & Iwashita,
Transactions on Machine Learning Research, 2025
(OpenReview: https://openreview.net/forum?id=ix33zd5zCw).
A single shared GCN backbone produces R continuous solutions in
one forward pass, and the loss combines a per-replica QUBO term, the
standard CRA penalty, and an optional inter-replica diversity term.
Two diversification regimes are supported:
- Penalty diversification — supply replica_problems (length num_replicas) where each problem instance differs in some hyperparameter (e.g. MaximumIndependentSet(g, penalty=p_r) for a sweep of p_r). One training run yields one solution per penalty level, much cheaper than independent runs.
- Variation diversification — leave replica_problems=None so every replica solves the same problem, but set vari_param > 0 to add the diversity term -R · Σ_i std_r(p_{i,r}) (sign chosen so the loss decreases when between-replica spread grows). Replicas then converge to structurally different solutions.
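The variation-diversification term above can be sketched numerically. A minimal sketch in NumPy, assuming p is an (N, R) matrix of soft assignments; variation_term is an illustrative helper, not the library's code:

```python
import numpy as np

def variation_term(p, vari_param):
    """Diversity contribution vari_param * (-R * sum_i std_r(p[i, :])).
    The loss decreases as between-replica spread grows.
    Illustrative sketch of the term described above."""
    _, r = p.shape
    return vari_param * (-r * np.std(p, axis=1).sum())

# Identical replicas earn no bonus; spread-out replicas lower the loss.
same = np.full((5, 4), 0.5)
spread = np.tile(np.linspace(0.1, 0.9, 4), (5, 1))
print(variation_term(same, 0.3) == 0.0)   # True
print(variation_term(spread, 0.3) < 0.0)  # True
```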
Parameters¶
problem:
Base graph-based problem (used for graph extraction and as the
default score_summary provider when replica_problems is
None).
num_replicas:
Number of parallel continuous solutions R. Defaults to 4.
replica_problems:
Optional list of num_replicas problem instances. When
provided, replica_problems[r].loss_fn evaluates the cost for
replica r. Must share the same underlying graph as
problem (only the QUBO Q_mat may differ — typically via
a different penalty weight). When None, all replicas use
problem.loss_fn.
vari_param:
Coefficient of the diversity term. 0 (default) is pure
penalty diversification; positive values reward inter-replica
spread (used for variation diversification on a fixed problem).
nx_graph, in_feats, hidden_dim, dropout, learning_rate,
weight_decay, annealing, init_reg_param, annealing_rate, curve_rate,
num_epochs, tol, patience, check_interval, device, seed, verbose:
Identical semantics to :func:train_cra_pi_gnn.
Returns¶
qqa.AnnealResult
best_sol — the discrete (N,) assignment of the best
replica (lowest QUBO objective on its own loss_fn).
best_obj — that replica's float objective.
history — per-epoch loss, mean_cost, reg_term,
vari_term, reg_param arrays plus a per_replica_obj
array of shape (epochs, R) for downstream visualisation.
score['extra']['replicas'] — list of
{replica, obj, score, sol} dicts so the caller can inspect
every diversified solution, not only the best one.
Raises¶
ValueError
On invalid num_replicas, vari_param sign, or a
replica_problems list whose length does not match
num_replicas.
Notes¶
- Backbone vs. CPRA4CO. The reference CPRA implementation uses DGL GraphSAGE; this port reuses :class:GCNNet (a 2-layer GCNConv stack) for full parity with :func:train_cra_pi_gnn, so head-to-head ablations across the two solvers measure the training objective, not the message-passing op.
- Best-tracking. The reference CPRA loop returns the final iteration's discretised bits rather than the best-so-far solution. This trainer deliberately tracks the running best per replica — the QQA-side convention — to avoid losing a good early-epoch solution to a transient late spike.
- History keys differ from :func:train_cra_pi_gnn. train_cra_pi_gnn reports per-epoch "cost"; CPRA reports "mean_cost" (per-replica average) because the raw cost scales linearly with R and is harder to compare across runs. When you mix the two trainers in a single plot, normalise by R yourself.
- Replica collapse with vari_param=0 and replica_problems=None. With identical losses on every replica and shared embedding+backbone gradients, the R output channels drift toward the same fixed point. They start different (random init) and remain visibly distinct for the first few hundred epochs, but eventually collapse. For real variation diversification on a fixed problem, set vari_param > 0 (e.g. 0.1 to 0.5 works in practice).
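The best-tracking convention from the notes above (running best per replica, rather than the final iteration's bits) can be sketched as follows; track_running_best is an illustrative helper operating on a per-epoch objective array like history's per_replica_obj:

```python
import numpy as np

def track_running_best(per_epoch_obj):
    """Given per-epoch objectives of shape (epochs, R), return the
    best-so-far value and the epoch it occurred at, per replica.
    Sketch of the best-tracking convention, not the library source."""
    obj = np.asarray(per_epoch_obj, dtype=float)
    return obj.min(axis=0), obj.argmin(axis=0)

# A transient late spike must not discard a good early-epoch solution.
objs = [[5.0, 4.0],
        [1.0, 3.0],   # replica 0 bottoms out early...
        [6.0, 2.0]]   # ...then spikes at the final epoch.
vals, epochs = track_running_best(objs)
print(vals.tolist())    # [1.0, 2.0]
print(epochs.tolist())  # [1, 2]
```

Returning the final row instead (the reference CPRA behaviour) would report 6.0 for replica 0.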
train_cra_pi_gnn(problem, *, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)
¶
Train a CRA-PI-GNN solver and return a :class:qqa.AnnealResult.
Parameters¶
problem:
A graph-based binary QUBO problem from :mod:qqa — typically
:class:~qqa.MaximumIndependentSet, :class:~qqa.MaxClique,
:class:~qqa.MaxCut, :class:~qqa.VertexCover, or
:class:~qqa.GraphBisection. Anything that exposes
problem.nx_graph and problem.loss_fn works.
nx_graph:
Override for problem.nx_graph (rare; only needed for custom
problems that store the graph elsewhere).
in_feats, hidden_dim:
GCN widths. Both default to floor(sqrt(N)), matching the
reference paper.
dropout:
Dropout probability between the two GCN layers. 0 (default)
reproduces the headline numbers in the paper.
learning_rate, weight_decay:
AdamW hyper-parameters. Defaults match the reference.
annealing:
If False the trainer reduces to vanilla PI-GNN
(reg_param = 0 for every epoch).
init_reg_param, annealing_rate:
CRA schedule: reg_param = init_reg_param + epoch * annealing_rate.
Set init_reg_param < 0 so the early-epoch loss landscape is
concave (encourages exploration); annealing_rate > 0 linearly
ramps it through 0 toward the discrete-favouring regime.
curve_rate:
Penalty exponent (must be even). Defaults to 2.
num_epochs:
Hard upper bound on gradient steps. Early stopping (see
tol / patience) usually terminates earlier.
tol, patience:
Stop when both the loss and the penalty change by less than
tol for patience consecutive epochs.
check_interval:
How often the verbose log is printed.
device:
Torch device. Strings like "cuda" are validated up-front to
give a clear error if CUDA is unavailable.
seed:
If supplied, calls :func:qqa.fix_seed before allocating the
model and embedding (so the run is reproducible).
verbose:
If True print periodic progress and a final summary.
Notes¶
The defaults here match the NeurIPS 2024 paper and are tuned for
large instances (N >= 1000). For small graphs (N <= 200) the
paper defaults severely under-converge: try
init_reg_param=-2.0, annealing_rate=5e-4, learning_rate=1e-3
(or even learning_rate=1e-2) instead. This is a standard PI-GNN
quirk — the optimal annealing schedule is sub-linear in problem size
because the cost / penalty magnitudes scale very differently.
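The schedule arithmetic behind that tuning advice is easy to check. A minimal sketch of the linear CRA schedule stated in the init_reg_param / annealing_rate entry above:

```python
def reg_param_at(epoch, init_reg_param=-20.0, annealing_rate=1e-3):
    """Linear CRA schedule: reg_param = init_reg_param + epoch * annealing_rate.
    Sketch of the documented schedule, not the library source."""
    return init_reg_param + epoch * annealing_rate

# With the paper defaults the schedule only crosses 0 at epoch 20000,
# so short small-graph runs never leave the concave exploration regime.
print(round(-(-20.0) / 1e-3))  # 20000 (paper defaults)
print(round(-(-2.0) / 5e-4))   # 4000  (suggested small-graph ramp)
```

This is why shrinking |init_reg_param| and annealing_rate together, as suggested for N <= 200, pulls the discrete-favouring regime into reach of a shorter run.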
Returns¶
qqa.AnnealResult
With best_sol of shape (N,) (rounded {0, 1} tensor),
best_obj the float QUBO loss on that solution, and
history containing per-epoch arrays
loss, cost, reg_term, reg_param.
Raises¶
TypeError
If problem is not graph-based (no nx_graph attribute and
no nx_graph override).
RuntimeError
If device requests CUDA but torch.cuda.is_available() is
False.
ValueError
On obviously-wrong arguments (curve_rate odd, num_epochs
negative, ...).
qqa.pignn.trainer
¶
CRA-PI-GNN trainer (PyTorch Geometric).
Faithful port of the fit_model loop from the reference
NeurIPS 2024 implementation [#cra]_, with two intentional changes:
- The graph backend is :mod:torch_geometric instead of DGL, so the trainer runs on every PyTorch-supported GPU (including NVIDIA Blackwell / sm_100, for which DGL has no prebuilt wheel as of April 2026).
- The loss / penalty mathematics are expressed via the QQA primitives :meth:qqa.problems.QUBOProblem.loss_fn and :meth:qqa.relaxation.BinaryRelaxation.penalty, so the function returns a :class:qqa.AnnealResult and is a drop-in alternative to :func:qqa.anneal. Numerically the loss is identical to the original paper for curve_rate=2 (and for any even curve_rate):
.. math::
L(p; \gamma) \;=\; p^\top Q\, p
\;+\; \gamma \sum_i \bigl(1 - (1 - 2p_i)^c\bigr)
matches CRA's cost + reg_param * Σ (1 - (2p - 1)^c) because
(1 - 2p)^c = (2p - 1)^c for even c.
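The even-exponent identity that makes the two penalty writings equal can be verified numerically:

```python
import numpy as np

p = np.linspace(0.0, 1.0, 11)

# For even c, (1 - 2p)^c and (2p - 1)^c are identical elementwise,
# so the QQA penalty matches CRA's term exactly.
for c in (2, 4, 6):
    assert np.allclose((1 - 2 * p) ** c, (2 * p - 1) ** c)

# Odd exponents flip the sign -- hence the even-curve_rate requirement.
print(np.allclose((1 - 2 * p) ** 3, (2 * p - 1) ** 3))  # False
```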
.. [#cra] Y. Ichikawa, "Controlling Continuous Relaxation for Combinatorial Optimization," NeurIPS 2024. https://github.com/Yuma-Ichikawa/CRA4CO
train_cra_pi_gnn(problem, *, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)
¶
Train a CRA-PI-GNN solver and return a :class:qqa.AnnealResult.
Parameters¶
problem:
A graph-based binary QUBO problem from :mod:qqa — typically
:class:~qqa.MaximumIndependentSet, :class:~qqa.MaxClique,
:class:~qqa.MaxCut, :class:~qqa.VertexCover, or
:class:~qqa.GraphBisection. Anything that exposes
problem.nx_graph and problem.loss_fn works.
nx_graph:
Override for problem.nx_graph (rare; only needed for custom
problems that store the graph elsewhere).
in_feats, hidden_dim:
GCN widths. Both default to floor(sqrt(N)), matching the
reference paper.
dropout:
Dropout probability between the two GCN layers. 0 (default)
reproduces the headline numbers in the paper.
learning_rate, weight_decay:
AdamW hyper-parameters. Defaults match the reference.
annealing:
If False the trainer reduces to vanilla PI-GNN
(reg_param = 0 for every epoch).
init_reg_param, annealing_rate:
CRA schedule: reg_param = init_reg_param + epoch * annealing_rate.
Set init_reg_param < 0 so the early-epoch loss landscape is
concave (encourages exploration); annealing_rate > 0 linearly
ramps it through 0 toward the discrete-favouring regime.
curve_rate:
Penalty exponent (must be even). Defaults to 2.
num_epochs:
Hard upper bound on gradient steps. Early stopping (see
tol / patience) usually terminates earlier.
tol, patience:
Stop when both the loss and the penalty change by less than
tol for patience consecutive epochs.
check_interval:
How often the verbose log is printed.
device:
Torch device. Strings like "cuda" are validated up-front to
give a clear error if CUDA is unavailable.
seed:
If supplied, calls :func:qqa.fix_seed before allocating the
model and embedding (so the run is reproducible).
verbose:
If True print periodic progress and a final summary.
Notes¶
The defaults here match the NeurIPS 2024 paper and are tuned for
large instances (N >= 1000). For small graphs (N <= 200) the
paper defaults severely under-converge: try
init_reg_param=-2.0, annealing_rate=5e-4, learning_rate=1e-3
(or even learning_rate=1e-2) instead. This is a standard PI-GNN
quirk — the optimal annealing schedule is sub-linear in problem size
because the cost / penalty magnitudes scale very differently.
Returns¶
qqa.AnnealResult
With best_sol of shape (N,) (rounded {0, 1} tensor),
best_obj the float QUBO loss on that solution, and
history containing per-epoch arrays
loss, cost, reg_term, reg_param.
Raises¶
TypeError
If problem is not graph-based (no nx_graph attribute and
no nx_graph override).
RuntimeError
If device requests CUDA but torch.cuda.is_available() is
False.
ValueError
On obviously-wrong arguments (curve_rate odd, num_epochs
negative, ...).
train_cpra_pi_gnn(problem, *, num_replicas=4, replica_problems=None, vari_param=0.0, nx_graph=None, in_feats=None, hidden_dim=None, dropout=0.0, learning_rate=0.0001, weight_decay=0.01, annealing=True, init_reg_param=-20.0, annealing_rate=0.001, curve_rate=2, num_epochs=100000, tol=0.0001, patience=1000, early_stop_disc_patience=None, check_interval=1000, device='cpu', seed=None, verbose=True, polish=True)
¶
Train a CPRA multi-head PI-GNN solver and return an :class:AnnealResult.
CPRA (Continuous Parallel Relaxation Annealing) is the multi-replica
extension of CRA-PI-GNN introduced by Ichikawa & Iwashita,
Transactions on Machine Learning Research, 2025
(OpenReview: https://openreview.net/forum?id=ix33zd5zCw).
A single shared GCN backbone produces R continuous solutions in
one forward pass, and the loss combines a per-replica QUBO term, the
standard CRA penalty, and an optional inter-replica diversity term.
Two diversification regimes are supported:
- Penalty diversification — supply replica_problems (length num_replicas) where each problem instance differs in some hyperparameter (e.g. MaximumIndependentSet(g, penalty=p_r) for a sweep of p_r). One training run yields one solution per penalty level, much cheaper than independent runs.
- Variation diversification — leave replica_problems=None so every replica solves the same problem, but set vari_param > 0 to add the diversity term -R · Σ_i std_r(p_{i,r}) (sign chosen so the loss decreases when between-replica spread grows). Replicas then converge to structurally different solutions.
Parameters¶
problem:
Base graph-based problem (used for graph extraction and as the
default score_summary provider when replica_problems is
None).
num_replicas:
Number of parallel continuous solutions R. Defaults to 4.
replica_problems:
Optional list of num_replicas problem instances. When
provided, replica_problems[r].loss_fn evaluates the cost for
replica r. Must share the same underlying graph as
problem (only the QUBO Q_mat may differ — typically via
a different penalty weight). When None, all replicas use
problem.loss_fn.
vari_param:
Coefficient of the diversity term. 0 (default) is pure
penalty diversification; positive values reward inter-replica
spread (used for variation diversification on a fixed problem).
nx_graph, in_feats, hidden_dim, dropout, learning_rate,
weight_decay, annealing, init_reg_param, annealing_rate, curve_rate,
num_epochs, tol, patience, check_interval, device, seed, verbose:
Identical semantics to :func:train_cra_pi_gnn.
Returns¶
qqa.AnnealResult
best_sol — the discrete (N,) assignment of the best
replica (lowest QUBO objective on its own loss_fn).
best_obj — that replica's float objective.
history — per-epoch loss, mean_cost, reg_term,
vari_term, reg_param arrays plus a per_replica_obj
array of shape (epochs, R) for downstream visualisation.
score['extra']['replicas'] — list of
{replica, obj, score, sol} dicts so the caller can inspect
every diversified solution, not only the best one.
Raises¶
ValueError
On invalid num_replicas, vari_param sign, or a
replica_problems list whose length does not match
num_replicas.
Notes¶
- Backbone vs. CPRA4CO. The reference CPRA implementation uses DGL GraphSAGE; this port reuses :class:GCNNet (a 2-layer GCNConv stack) for full parity with :func:train_cra_pi_gnn, so head-to-head ablations across the two solvers measure the training objective, not the message-passing op.
- Best-tracking. The reference CPRA loop returns the final iteration's discretised bits rather than the best-so-far solution. This trainer deliberately tracks the running best per replica — the QQA-side convention — to avoid losing a good early-epoch solution to a transient late spike.
- History keys differ from :func:train_cra_pi_gnn. train_cra_pi_gnn reports per-epoch "cost"; CPRA reports "mean_cost" (per-replica average) because the raw cost scales linearly with R and is harder to compare across runs. When you mix the two trainers in a single plot, normalise by R yourself.
- Replica collapse with vari_param=0 and replica_problems=None. With identical losses on every replica and shared embedding+backbone gradients, the R output channels drift toward the same fixed point. They start different (random init) and remain visibly distinct for the first few hundred epochs, but eventually collapse. For real variation diversification on a fixed problem, set vari_param > 0 (e.g. 0.1 to 0.5 works in practice).
qqa.pignn.model
¶
GNN architectures for :mod:qqa.pignn.
Mirrors the reference GCN_dev from CRA4CO_:
- a learnable per-node :class:torch.nn.Embedding provides the input features (no node attributes are assumed),
- two stacked :class:torch_geometric.nn.GCNConv layers with a ReLU in-between and dropout,
- a final :func:torch.sigmoid so the output p \in (0, 1)^N is immediately compatible with the QUBO loss problem.loss_fn(p) = p^T Q p.
.. _CRA4CO: https://github.com/Yuma-Ichikawa/CRA4CO
GCNNet
¶
Bases: Module
Two-layer GCN with a learnable node embedding (PI-GNN style).
Parameters¶
num_nodes:
Number of nodes in the graph (also the embedding table size).
in_feats:
Width of the input embedding. Defaults to floor(sqrt(N)) to
match the reference implementation.
hidden_dim:
Hidden width between the two GCN layers. Defaults to in_feats.
dropout:
Dropout probability applied after the first GCN layer. Defaults
to 0 — the original paper used 0 for the headline MIS results.
num_replicas:
Number of independent output channels. Defaults to 1,
which preserves the single-head CRA-PI-GNN behaviour and keeps
the forward output shape (N,). With num_replicas >= 2
the network becomes the CPRA multi-head backbone of
Ichikawa & Iwashita (TMLR 2025) — a single shared embedding +
GCN backbone produces R parallel continuous solutions in
one forward pass and the output shape is (N, R).
Notes¶
The forward pass takes only edge_index because the node
"features" are the learned embedding rows; they evolve through the
same backward pass as the convolution weights. This is the standard
PI-GNN trick from Schuetz et al. (Nature MI 2022). For
num_replicas >= 2 only the second convolution's output channels
grow — the embedding and first convolution are shared across
replicas, matching the CPRA shared-representation design.
forward(edge_index)
¶
Compute soft node assignments p \in (0, 1).
Parameters¶
edge_index:
(2, 2|E|) long tensor produced by
:func:qqa.pignn.graph.nx_to_edge_index.
Returns¶
torch.Tensor
(N,) tensor of probabilities when num_replicas == 1
(CRA-PI-GNN compatibility), or (N, num_replicas) when
num_replicas >= 2 (CPRA layout).
default_in_feats(num_nodes)
¶
Heuristic used by the original CRA paper: floor(sqrt(N)).
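The heuristic is a one-liner with math.isqrt, which computes the integer floor square root. A sketch of the documented behaviour, not the library source:

```python
import math

def default_in_feats(num_nodes: int) -> int:
    """floor(sqrt(N)) embedding-width heuristic from the CRA paper."""
    return math.isqrt(num_nodes)

print(default_in_feats(200))   # 14 (14^2 = 196 <= 200 < 15^2)
print(default_in_feats(1000))  # 31
```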
qqa.pignn.graph
¶
Graph extraction & PyG conversion utilities for :mod:qqa.pignn.
CRA-PI-GNN is a graph neural network method, so it only applies to QQA
problems whose underlying combinatorial structure is a graph (MIS,
MaxClique, MaxCut, VertexCover, GraphBisection). Spin-glass problems
defined by a coupling matrix J (Edwards–Anderson, SK, perceptron,
Hopfield, …) and pure categorical / permutation problems (TSP, QAP,
NQueens, Knapsack, …) have no node-edge structure to convolve over and
are rejected here with a clear TypeError.
extract_nx_graph(problem, override=None)
¶
Return the networkx graph backing a QQA problem.
Parameters¶
problem:
A :class:~qqa.problems.COProblem instance. The function checks
problem.nx_graph first (used by MaximumIndependentSet,
MaxClique, MaxCut) and falls back to problem.graph
(used by VertexCover, GraphBisection).
override:
If supplied, used directly (for the rare case where a custom
problem stores its graph elsewhere). The caller is then
responsible for ensuring node labels are 0..N-1 and that the
graph matches problem's QUBO matrix.
Returns¶
networkx.Graph
With node labels 0..N-1.
Raises¶
TypeError
If neither override nor any of the supported attribute names
on problem resolve to a networkx.Graph. The error lists
the supported problem families so the user can diagnose at a
glance.
nx_to_edge_index(graph, device='cpu')
¶
Convert a networkx graph to a symmetric PyG edge_index.
PyG's :class:~torch_geometric.nn.GCNConv expects edge_index of
shape (2, 2|E|) listing both (u, v) and (v, u) for every
undirected edge. Self-loops are not added here — :class:GCNConv
inserts them internally when add_self_loops=True (the default).
Parameters¶
graph:
Undirected networkx graph with node labels in 0..N-1.
device:
Target torch device for the returned tensor.
Returns¶
torch.Tensor
(2, 2|E|) long tensor on device.
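The symmetrisation can be sketched without torch. A minimal sketch using plain Python lists in place of the long tensor the real function returns; symmetric_edge_index is an illustrative helper:

```python
def symmetric_edge_index(edges):
    """Build (2, 2|E|) index lists for an undirected edge list: each
    {u, v} contributes both (u, v) and (v, u). No self-loops are added,
    mirroring the behaviour documented above (GCNConv adds its own).
    Plain-Python sketch of what nx_to_edge_index produces."""
    src, dst = [], []
    for u, v in edges:
        src += [u, v]
        dst += [v, u]
    return [src, dst]

edge_index = symmetric_edge_index([(0, 1), (1, 2)])
print(edge_index)           # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(len(edge_index[0]))   # 4 == 2|E|
```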