Neurosymbolic (NeSy) models integrate neural networks with symbolic reasoning for robust and interpretable AI. State-of-the-art NeSy models require the symbolic component to be expressed differentiably, which complicates the use of approximate inference. EM-NeSy recasts probabilistic NeSy learning as an instance of the Expectation-Maximization (EM) algorithm. In the E-step, probabilistic inference computes the posterior over the neurally predicted symbols, conditioned on the label. In the M-step, gradient descent updates the neural parameters towards this posterior, passing gradients through the neural component alone. This brings the full flexibility of EM to NeSy learning and places no differentiability requirements on the symbolic side.
Standard NeSy learning backpropagates gradients through both the symbolic and the neural components. EM-NeSy breaks this dependency: the symbolic component no longer needs to be differentiable, only able to compute a posterior, and that posterior becomes the training signal for the neural component.
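A minimal sketch of one EM-NeSy update, assuming a hypothetical toy task in which two digit symbols must sum to an observed label. The neural component is replaced by free per-input logits, and `symbolic_program`, `e_step_exact`, `m_step`, and `em_nesy_step` are illustrative names, not the authors' code. The symbolic program is only ever called as a black box inside the E-step. The M-step uses the standard fact that the gradient of the cross-entropy between a target q and softmax(logits) is softmax(logits) - q.

```python
import math
from itertools import product


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def symbolic_program(z1, z2):
    # Hypothetical black-box symbolic component: called, never differentiated.
    return z1 + z2


def e_step_exact(p1, p2, y):
    """Posterior marginals q(z1 | y), q(z2 | y) by enumerating all symbol pairs."""
    q1, q2, total = [0.0] * len(p1), [0.0] * len(p2), 0.0
    for z1, z2 in product(range(len(p1)), range(len(p2))):
        if symbolic_program(z1, z2) == y:  # deterministic likelihood p(y | z) in {0, 1}
            w = p1[z1] * p2[z2]            # prior weight from the neural prediction
            q1[z1] += w
            q2[z2] += w
            total += w
    if total == 0.0:
        raise ValueError("label is inconsistent with every symbol assignment")
    return [v / total for v in q1], [v / total for v in q2]


def m_step(logits, q, lr):
    """One gradient step on -sum(q * log softmax(logits)); its gradient is softmax - q."""
    p = softmax(logits)
    return [l - lr * (pi - qi) for l, pi, qi in zip(logits, p, q)]


def em_nesy_step(logits1, logits2, y, lr=1.0):
    p1, p2 = softmax(logits1), softmax(logits2)
    q1, q2 = e_step_exact(p1, p2, y)  # E-step: posterior given the label
    return m_step(logits1, q1, lr), m_step(logits2, q2, lr)  # M-step: neural side only


# Usage: the label 0 is only reachable via (0, 0), so both predictions move to digit 0.
logits1, logits2 = [0.0] * 10, [0.0] * 10
for _ in range(20):
    logits1, logits2 = em_nesy_step(logits1, logits2, y=0)
print(softmax(logits1)[0])
```

In a real model, the posterior marginals would serve as soft cross-entropy targets for the network's output layer, and backpropagation would run through the network only.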
EM-NeSy reframes NeSy learning as a latent variable model. The subsymbolic input X is observed. The symbols Z predicted by the neural component are latent. The target Y is observed and is derived from Z by the symbolic component, which encodes background knowledge.
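Under this latent-variable view, the standard EM equations take the form below. Here x, y, z are values of X, Y, Z and θ denotes the neural parameters. The sketch assumes the symbolic component defines a likelihood p(y | z) that does not depend on θ. For a deterministic program, this likelihood is simply an indicator of whether z yields y.

```latex
\begin{aligned}
p_\theta(y \mid x) &= \sum_{z} p(y \mid z)\, p_\theta(z \mid x) \\
\text{E-step:}\quad q^{(t)}(z) &= p_{\theta^{(t)}}(z \mid x, y)
  = \frac{p(y \mid z)\, p_{\theta^{(t)}}(z \mid x)}{\sum_{z'} p(y \mid z')\, p_{\theta^{(t)}}(z' \mid x)} \\
\text{M-step:}\quad \theta^{(t+1)} &= \arg\max_{\theta}\; \mathbb{E}_{z \sim q^{(t)}}\big[\log p(y \mid z) + \log p_\theta(z \mid x)\big]
  = \arg\max_{\theta}\; \mathbb{E}_{z \sim q^{(t)}}\big[\log p_\theta(z \mid x)\big]
\end{aligned}
```

Because log p(y | z) does not depend on θ, it drops out of the M-step objective. This is why no gradient has to pass through the symbolic component. Replacing the argmax with a few gradient steps gives the gradient-descent M-step.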
The E-step is agnostic to the inference engine. Exact and approximate engines each trade accuracy against cost in their own way, but EM-NeSy needs no method-specific adaptation for any of them: the framework stays fixed and only the inference engine is swapped.
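To illustrate the swap, the sketch below fixes the EM update and accepts any E-step with the same signature. Both engines are illustrative assumptions, not the engines evaluated in the results: one is exact enumeration, and the other is a simple rejection sampler that draws symbols from the neural prediction and keeps only the samples that the symbolic program maps to the label. The toy addition task and all function names (`exact_engine`, `make_rejection_engine`, `em_step`) are hypothetical.

```python
import math
import random
from itertools import product
from typing import Callable, List, Sequence, Tuple

Marginals = Tuple[List[float], List[float]]
# Any inference engine maps (p1, p2, y) to posterior marginals over (z1, z2).
InferenceEngine = Callable[[Sequence[float], Sequence[float], int], Marginals]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def symbolic_program(z1: int, z2: int) -> int:
    # Hypothetical black-box symbolic component (digit addition).
    return z1 + z2


def exact_engine(p1, p2, y) -> Marginals:
    """Exact posterior marginals by enumerating every symbol pair."""
    q1, q2, total = [0.0] * len(p1), [0.0] * len(p2), 0.0
    for z1, z2 in product(range(len(p1)), range(len(p2))):
        if symbolic_program(z1, z2) == y:
            w = p1[z1] * p2[z2]
            q1[z1] += w
            q2[z2] += w
            total += w
    return [v / total for v in q1], [v / total for v in q2]


def make_rejection_engine(num_samples: int, seed: int = 0) -> InferenceEngine:
    """Approximate posterior: sample z from the neural prior, keep samples consistent with y."""
    rng = random.Random(seed)

    def engine(p1, p2, y) -> Marginals:
        q1, q2, kept = [0.0] * len(p1), [0.0] * len(p2), 0
        for _ in range(num_samples):
            z1 = rng.choices(range(len(p1)), weights=p1)[0]
            z2 = rng.choices(range(len(p2)), weights=p2)[0]
            if symbolic_program(z1, z2) == y:
                q1[z1] += 1
                q2[z2] += 1
                kept += 1
        if kept == 0:
            # No accepted sample: return the prior itself, which makes the M-step gradient zero.
            return list(p1), list(p2)
        return [v / kept for v in q1], [v / kept for v in q2]

    return engine


def em_step(logits1, logits2, y, engine: InferenceEngine, lr=1.0):
    """The EM update stays fixed; only `engine` (the E-step) changes."""
    p1, p2 = softmax(logits1), softmax(logits2)
    q1, q2 = engine(p1, p2, y)  # E-step with whichever engine is plugged in
    # M-step: gradient of -sum(q * log softmax(logits)) is softmax - q
    new1 = [l - lr * (p - q) for l, p, q in zip(logits1, p1, q1)]
    new2 = [l - lr * (p - q) for l, p, q in zip(logits2, p2, q2)]
    return new1, new2


# Usage: identical training loop, two different inference engines.
for engine in (exact_engine, make_rejection_engine(num_samples=20_000)):
    l1, l2 = [0.0] * 10, [0.0] * 10
    for _ in range(5):
        l1, l2 = em_step(l1, l2, y=0, engine=engine)
    print(softmax(l1)[0])
```

The rejection sampler's cost per call is fixed by `num_samples` rather than by the number of symbol combinations. However, its acceptance rate drops when the label is unlikely under the current prediction, which is the kind of tradeoff that motivates choosing among engines.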
The three benchmarks below all combine visual perception with symbolic reasoning under weak supervision: only the final label is given, with no direct annotation of the intermediate symbols.

| Method | Acc. 4 digits (%) | Acc. 15 digits (%) | Acc. 100 digits (%) | Time 4 digits | Time 15 digits | Time 100 digits |
|---|---|---|---|---|---|---|
| A-NeSI | 93.28 | 55.88 | overflow | 25.75 | 57.55 | — |
| EXAL | 90.71 | 62.62 | 6.67 | 4.17 | 42.44 | 372.20 |
| DeepStochLog (exact) | 92.70 | T/O | T/O | — | T/O | T/O |
| BP-std (exact) | 92.16 | 72.79 | 11.60 | 2.21 | 7.93 | 36.26 |
| BP-EM (ours) | 92.53 | 74.85 | 9.20 | 1.33 | 1.95 | 8.51 |

| Method | Acc. 4×4 (%) | Acc. 9×9 (%) | Time 4×4 | Time 9×9 |
|---|---|---|---|---|
| A-NeSI | 87.20 | 59.20 | 46.08 | 73.33 |
| Scallop | 75.00 | T/O | 1.52 | T/O |
| SL (exact) | 86.70 | T/O | — | T/O |
| BP-std | 87.80 | 0.50 | 38.23 | — |
| BP-EM (ours) | 89.70 | 0.50 | 21.23 | — |
| ABC-EM (ours) | 86.30 | 53.30 | 1.40 | 8.24 |

| Method | Acc. 12×12 (%) | Acc. 30×30 (%) | Time 12×12 | Time 30×30 |
|---|---|---|---|---|
| A-NeSI | 98.96 | 67.57 | 439.10 | 1596.51 |
| I-MLE | 95.34 | 93.40 | 26.77 | 227.82 |
| EXAL | 94.19 | 80.85 | 11.1 | 84.3 |
| SPL (exact) | 78.20 | T/O | — | T/O |
| ABC-EM (ours) | 98.80 | 69.60 | 7.26 | 8.37 |

Standard NeSy models require inference-specific modifications to stay differentiable under approximate inference. EM-NeSy removes that requirement entirely: a single unified framework accommodates any inference method that can compute the posterior, without modification.
The symbolic component doesn't need to backprop.