Announcing Moreau
We built Moreau to solve convex optimization problems orders of magnitude faster than existing solvers. It compiles problem structure once and solves batches of problems on the CPU or GPU without overhead, making it fast enough to embed in ML training pipelines, large-scale simulations, and real-time control. It is differentiable, batchable, and has native PyTorch and JAX interfaces.
Why Now?
Two things have materially changed since we published the cvxpylayers paper and package in 2019. On the hardware/software side, GPUs have proliferated, driven by the transformer architecture behind LLMs, and NVIDIA's cuDSS library makes sparse linear algebra, the core bottleneck in interior-point methods, practical on GPUs.
On the demand side, AI is moving into the physical world (robotics, vehicles, energy grids, datacenters), where decisions must satisfy safety and physics constraints and trade off competing objectives. This kind of physical AI needs solvers that run in the loop, which in turn requires differentiability, batching, and speed. Optimization already has a long track record in these decision-making systems: model predictive control in robotics, portfolio optimization in finance, and power-grid dispatch in energy.
What is Moreau?
Moreau solves convex conic programs of the form

    minimize    (1/2) xᵀPx + qᵀx
    subject to  Ax + s = b
                s ∈ 𝒦,

where 𝒦 is a product of convex cones. This covers LPs, QPs, SOCPs, and SDPs, that is, nearly all convex programs encountered in practice.
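For example, a box-constrained QP is covered by this conic form; one standard encoding, shown purely for illustration:

```latex
% Box-constrained QP:  minimize (1/2) x'Px + q'x  subject to  l <= x <= u.
% Stack both bounds into  Ax + s = b  with s in the nonnegative cone:
\[
\begin{aligned}
\text{minimize}\quad & \tfrac{1}{2}\, x^\top P x + q^\top x \\
\text{subject to}\quad &
\begin{bmatrix} I \\ -I \end{bmatrix} x + s
= \begin{bmatrix} u \\ -l \end{bmatrix},
\qquad s \in \mathbb{R}^{2n}_{+},
\end{aligned}
\]
% s >= 0 in the first block gives x <= u; in the second, x >= l.
```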
Moreau separates fixed problem structure from changing problem data. You define the sparsity pattern once, and Moreau compiles a solver specialized to that structure. Then, in a simulation, a training loop, or online, you solve repeatedly with new problem data and zero per-problem overhead. This, along with batching and implicit (smoothed) differentiation of the solution map, makes it practical to embed optimization layers in training loops. Moreau runs on CPUs and NVIDIA GPUs, with AMD GPU and TPU support coming soon.
Speed
Single-instance solves
| Problem | Size | Moreau | Mosek | Clarabel | Speedup |
|---|---|---|---|---|---|
| Multi-Period OPF (LP) | 134K vars, 293K cons | 0.25 s | 5.9 s | 10.7 s | 24× |
| HVAC MPC (QP) | 123K vars, 181K cons | 3.8 s | 107 s | 146 s | 28× |
| Solar Data-Fitting (SOCP) | 102K vars, 104K cons | 8 s | — | 12 min | 90× |
Batched solves
| Problem | Size | Moreau | Mosek | Clarabel | Speedup |
|---|---|---|---|---|---|
| Robotics MPC (QP) | 1K vars, 1K cons × 512 | 79 ms | 29.2 s | 12.8 s | 162× |
| Portfolio (QP) | 2K vars, 2K cons × 1000 | 200 ms | 32 s | 54 s | 160× |
Moreau CUDA on H100 vs Mosek 11 and Clarabel on AMD EPYC 9554P (64-core, bare metal). Forward pass only, compile time excluded. All solvers at default tolerances. 3-trial medians. Batched Mosek/Clarabel times are multithreaded solves.
The dominant cost in an interior-point method is factoring a sparse linear system at each iteration. Moreau compiles the sparsity structure once, then re-solves with zero per-solve overhead. On the GPU, cuDSS handles the sparse linear algebra; on the CPU, an optimized multithreaded Rust factorization does. We also employ specialized linear algebra for structured problems (e.g. portfolio optimization, MPC), which often delivers 100× speedups over CPU solvers.
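The factor-once, solve-many idea is easy to see with a generic sparse LU factorization. The sketch below uses SciPy's `splu` as a stand-in for Moreau's internal factorization (the matrix and data are made up):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# Fixed sparsity structure, changing data: pay for the factorization once,
# then each subsequent solve is a cheap pair of triangular back-solves.
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 1.0],
                         [0.0, 1.0, 2.0]]))
lu = splu(A)                                # expensive step, done once
rhs = np.random.default_rng(0).standard_normal((3, 100))
X = lu.solve(rhs)                           # 100 cheap back-solves
assert np.allclose(A @ X, rhs)              # each column solves A x = b_i
```

An interior-point solver does this at a much larger scale: the symbolic analysis of the sparsity pattern is amortized across every iteration and every problem instance.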
Differentiability
Moreau computes exact gradients through the KKT conditions of the solved problem. The backward pass solves a similar linear system to the one solved every iteration in the forward pass, so differentiation costs less than the solve itself. Forward and backward both run on the same device. For problems where exact gradients are degenerate (e.g. certain parameters in LPs), Moreau also supports user-specified smoothing of the gradients.
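To make the mechanics concrete, here is a self-contained sketch of implicit differentiation through the KKT system of an equality-constrained QP. This is plain NumPy with invented problem data, not Moreau's API:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 2
P = 2.0 * np.eye(n)                  # quadratic cost
A = rng.standard_normal((p, n))      # equality constraints A x = b
q = rng.standard_normal(n)
b = rng.standard_normal(p)

# KKT system for: minimize (1/2) x'Px + q'x  subject to  Ax = b
#   [P  A'] [x ]   [-q]
#   [A  0 ] [nu] = [ b]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
z = np.linalg.solve(K, np.concatenate([-q, b]))
x = z[:n]

# Backward pass: for a loss L(x) with gradient g = dL/dx, one more solve
# with the same (symmetric) KKT matrix yields gradients w.r.t. the data.
g = np.ones(n)                       # stand-in for dL/dx
u = np.linalg.solve(K, np.concatenate([g, np.zeros(p)]))
dL_dq = -u[:n]
dL_db = u[n:]

# Finite-difference check on dL/dq[0].
eps = 1e-6
def solve_x(qv):
    return np.linalg.solve(K, np.concatenate([-qv, b]))[:n]
qe = q.copy(); qe[0] += eps
fd = (g @ solve_x(qe) - g @ solve_x(q)) / eps
assert abs(fd - dL_dq[0]) < 1e-4
```

The key point is that the backward pass costs one extra solve against a matrix that is already factored from the forward pass, which is why differentiation is cheaper than the solve itself.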
This is what makes Moreau more than just a fast solver that runs on GPUs. Any optimization problem you can express, you can now quickly and reliably differentiate through, which means you can learn the problem's parameters end-to-end. Cost-function weights, constraint bounds, and dynamics models are all examples of parameters you could train.
Moreau also enables end-to-end learning in the ubiquitous predict-then-optimize architecture, where a neural network makes a prediction that feeds into an optimization problem, which makes the final decision. Previously, only the prediction stage of the pipeline was trained, disregarding its effect on the downstream optimization.
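A toy version of this pipeline, using an unconstrained QP so the solve has a closed form (NumPy only; all names and data here are illustrative, not Moreau's API):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 3, 5
P = np.diag([2.0, 1.0, 3.0])     # fixed quadratic cost
W = rng.standard_normal((n, d))  # prediction model (the trainable part)
f = rng.standard_normal(d)       # input features
x_target = np.ones(n)            # desired decision

# Forward: predict the linear cost, then optimize.
q_hat = W @ f
x_star = np.linalg.solve(P, -q_hat)   # argmin (1/2) x'Px + q_hat'x

# Backward: loss on the *decision*, differentiated through the solve.
g = x_star - x_target                 # dL/dx for L = (1/2)||x - x_target||^2
dL_dq = np.linalg.solve(P.T, -g)      # chain rule through x* = -P^{-1} q
dL_dW = np.outer(dL_dq, f)            # and through q_hat = W f

# Finite-difference check on one weight.
eps = 1e-6
W2 = W.copy(); W2[0, 0] += eps
x2 = np.linalg.solve(P, -(W2 @ f))
fd = (0.5 * np.sum((x2 - x_target) ** 2)
      - 0.5 * np.sum((x_star - x_target) ** 2)) / eps
assert abs(fd - dL_dW[0, 0]) < 1e-4
```

Training the predictor against the decision loss, rather than a prediction loss, is exactly what a differentiable solver unlocks.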
As a concrete example, suppose you have an MPC controller with hand-tuned comfort weights for each zone in a building. Instead of tuning by hand, define a loss over the resulting trajectories and backpropagate through the entire MPC solve to learn the weights. Moreau makes this a single backward() call.
```python
from moreau.torch import Solver

# Define the problem structure once (dimensions, sparsity patterns, cones).
solver = Solver(n, m, P_row_offsets, P_col_indices,
                A_row_offsets, A_col_indices, cones)
solver.setup(P_values, A_values)   # tensors with requires_grad=True
solution = solver.solve(q, b)
loss = trajectory_cost(solution.x)
loss.backward()                    # gradients flow through the solve
```
The same pattern applies to learning cost functions for robotic manipulation, training optimization layers inside neural networks, and differentiable simulation with hard constraints.
Batching
Moreau solves many instances of the same problem structure in parallel. Since the sparsity pattern is shared across the batch, only the numerical data varies. This is the natural interface for training loops and Monte Carlo simulations, and GPUs are particularly good at it.
```python
import moreau

settings = moreau.Settings(batch_size=128)
solver = moreau.CompiledSolver(
    n=2, m=3,
    P_row_offsets=[0, 1, 2], P_col_indices=[0, 1],
    A_row_offsets=[0, 2, 3, 4], A_col_indices=[0, 1, 0, 1],
    cones=moreau.Cones(num_zero_cones=1, num_nonneg_cones=2),
    settings=settings,
)
solver.setup(P_values=[1., 1.], A_values=[1., 1., 1., 1.])  # shared across batch
solution = solver.solve(qs=qs_batch, bs=bs_batch)  # (128, n) and (128, m)
```
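Conceptually, shared-structure batching amounts to reusing one factorization across many instances. A rough NumPy analogy (not Moreau's actual kernels), for a batch of unconstrained QPs that share a quadratic term:

```python
import numpy as np

rng = np.random.default_rng(1)
n, batch = 3, 128

# One shared quadratic term P (the structure plus shared data) and a
# batch of linear terms q_i (the per-instance data), stored as columns.
P = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.5]])
Q = rng.standard_normal((n, batch))

# Each instance: x_i = argmin (1/2) x'Px + q_i'x, i.e. P x_i = -q_i.
# Solving all columns against one matrix amortizes the factorization.
X = np.linalg.solve(P, -Q)
assert np.allclose(P @ X, -Q)
```

On a GPU the same idea applies with batched factorizations and batched triangular solves, which is where the large speedups in the table above come from.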
Applications
Control
MPC for 1,000-zone buildings, robotic manipulation, autonomous vehicle trajectory planning
Finance
Multi-period portfolio construction, trade execution, scenario-based risk analysis
Energy
Optimal power flow for large grids, battery dispatch, electricity market clearing
ML
Optimization layers in neural networks, learning constraint parameters, differentiable simulation
Who we are
We’re Optimal Intellect, a research lab from the team behind CVXPY and CVXPYlayers, building toward optimization as a first-class primitive in ML and physical AI.

Shane Barratt, Cofounder
Parth Nobel, Cofounder
Steven Diamond, Cofounder
We’re backed by Menlo Ventures, Anthropic, Soumith Chintala (creator of PyTorch), Trevor Capezza (co-founder Erebor), Jonny Dyer (co-founder Muon Space), and Matt Wytock (co-founder Gridmatic), among other angels.
Get started
Moreau is available now. Academic licenses are free: request access and you'll have one today. Enterprise use requires a commercial license; see our pricing page for details.
If you have a large optimization workload, send us a problem instance and we’ll tell you if Moreau is a fit.