Sim2Science: ML with Imperfect Scientific Models
NeurIPS 2026 Workshop | Venue TBD
Important Dates
| Submission Deadline: | TBD |
| Author Notification: | TBD |
| Camera-Ready Deadline: | TBD |
| Workshop Date: | December 2026, TBD |
All deadlines are 23:59 Anywhere on Earth (AoE).
About the Workshop
AI4Science has matured into an established field, with ML now embedded throughout the natural sciences. Much of this progress runs through simulators (mechanistic models hand-crafted by domain experts and fit to data) that encode our scientific theories and underpin prediction, parameter inference, experimentation, and decision-making. Across fields, ML now accelerates and emulates these simulators and performs inference within the workflows built on them.
Yet the progress of ML in science rests on a fragile foundation. "All models are wrong, but some are useful" cuts to the core of AI4Science: an ML method coupled to a simulator can only be as good as that simulator. Simulators are wrong for concrete reasons—they deliberately simplify complex systems, omit poorly understood or computationally intractable physics, and depend on uncertain parameters—inducing an unavoidable discrepancy between simulated and observed data. Left unaccounted for, this discrepancy propagates through the coupled ML–simulator system and biases the scientific conclusions we draw and the real-world decisions we make from it.
Scientists manage this wrongness by maintaining, for the same phenomenon, a hierarchy of simulators at different fidelities. In chemistry and materials science, density-functional theory spans a hierarchy of exchange–correlation functionals and basis sets, with classical force fields as a cheaper tier. In fusion, fast core-transport simulators stand alongside high-fidelity turbulence simulations. In neuroscience, detailed biophysical circuit models reduce to point-neuron and mean-field approximations. The same pattern recurs in climate, astrophysics, and fluid dynamics. Although the sources of wrongness differ, these hierarchies share a vocabulary (multi-fidelity and multi-resolution modeling) and feed the same downstream tasks: parameter and state inference, uncertainty quantification, control, and decision-making.
The central question of this workshop is: How can we best leverage imperfect scientific simulators when confronted with real-world data, and how can ML help to account for and mitigate limitations in simulator-based workflows across a wide range of domains?
We pursue two complementary aims:
- (i) to define how machine learning can systematically detect, absorb, and correct model misspecification when integrating simulators with empirical data, and
- (ii) to foster cross-domain synergy by establishing a shared understanding of how simulator discrepancies are identified and quantified across the natural sciences.
Tagline: Cross-domain machine learning for imperfect, misspecified scientific simulators.
Topics of Interest
We invite contributions on theory and methods as well as applications spanning biology, chemistry, physics, materials science, climate science, and related fields. Topics include:
- Simulation-based inference and related parameter inference methods
- Understanding and mitigating model misspecification, including simulator diagnostics and discrepancy modeling
- Emulator and surrogate modeling, as well as hybrid and physics-informed approaches
- Analysis of simulator structure, degeneracy, simplifications, and identifiability
- Simulator pipelines, including data handling, preprocessing, and integration with downstream ML models
- Active learning and Bayesian optimization for fitting parameters or model components
- Closed-loop and experiment-in-the-loop scientific workflows
- Multi-fidelity and multi-resolution modeling
- (Agentic) model and equation discovery
- Differentiable frameworks, LLM-assisted scientific reasoning, and workflow automation
Contact
For questions or inquiries about the workshop, please contact us at:
sim2science@gmail.com