A hands-on, step-by-step guide to designing end-to-end DRL solutions for combinatorial optimization problems!
Combinatorial optimization problems (COPs) arise across transportation, manufacturing, healthcare, and many other domains, but their NP-hard nature and discrete solution spaces make them poorly suited to traditional gradient-based methods.
Deep reinforcement learning (DRL) has emerged as a compelling alternative, offering the ability to automatically learn high-quality solution strategies with minimal manual design. The primary focus of this tutorial is a hands-on, step-by-step guide to developing end-to-end DRL solutions for COPs. Participants will be walked through the four essential design pillars: 1) instance encoding schemes, 2) Markov decision process formulations, 3) neural network architectures, and 4) reinforcement learning training algorithms, with vehicle routing and job shop scheduling serving as concrete running examples throughout.
Attendees will leave with the practical knowledge to design and implement a DRL-based solver from scratch, a research overview of how these methods extend to complex real-world variants, and a clear picture of the open challenges that define the field.
Why DRL for combinatorial optimization? Problem landscape and real-world relevance.
Deep dive into instance encoding, MDP formulation, and neural architectures.
Brief intro to policy gradient methods, actor-critic, REINFORCE, and modern variants.
Routing (VRP/TSP) and scheduling (JSP) as concrete running examples.
Extending to variants with multiple objectives, constraints, and uncertainties.
DRL as an end-to-end paradigm for solving problems from diverse domains.
Open questions from the audience.
Open challenges, for example scalability, generalization, and hybrid methods.
Environment installation and notebook walkthrough.
Representing problem instances.
State, action, transition, and reward design.
Designing a neural network architecture suited to the problem.
End-to-end training loop.
Explore the environment and problem characteristics.
Key takeaways and pointers to further reading.
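To give a flavour of how the hands-on steps listed above fit together (representing instances; state, action, transition, and reward design; the policy model; the training loop), here is a minimal sketch for a small TSP. It is not the tutorial's code: every name in it (`make_instance`, `rollout`, `train`, `theta`) is hypothetical, and the neural network is deliberately replaced by a single learnable parameter `theta` in a softmax over negative distances so that the example runs with the Python standard library only. Training uses plain REINFORCE with a batch-mean baseline.

```python
import math
import random


def make_instance(n, rng):
    """Instance encoding (assumed): n cities as (x, y) points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def rollout(coords, theta, rng, greedy=False):
    """Run one MDP episode for TSP, starting and ending at city 0.

    State: (current city, list of unvisited cities). Action: next city.
    Reward: negative travel distance of the chosen edge.
    Policy: softmax(-theta * distance), a one-parameter stand-in for a network.
    Returns (tour, tour_length, d/dtheta of the summed log-probabilities).
    """
    current = 0
    unvisited = list(range(1, len(coords)))
    tour, length, grad = [0], 0.0, 0.0
    while unvisited:
        d = [dist(coords[current], coords[j]) for j in unvisited]
        logits = [-theta * x for x in d]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(l - m) for l in logits]
        z = sum(w)
        probs = [x / z for x in w]
        if greedy:
            k = max(range(len(probs)), key=probs.__getitem__)
        else:
            k = rng.choices(range(len(probs)), weights=probs)[0]
        # d/dtheta log pi(k) = -d_k + sum_j p_j * d_j
        grad += -d[k] + sum(p * x for p, x in zip(probs, d))
        length += d[k]
        current = unvisited.pop(k)  # transition: move to the chosen city
        tour.append(current)
    length += dist(coords[current], coords[0])  # close the tour
    return tour, length, grad


def train(steps=200, batch=16, n=10, lr=0.5, seed=0):
    """REINFORCE with a batch-mean baseline; the return is -tour_length."""
    rng = random.Random(seed)
    theta = 0.0  # theta = 0 is a uniformly random policy
    for _ in range(steps):
        coords = make_instance(n, rng)
        samples = [rollout(coords, theta, rng) for _ in range(batch)]
        baseline = sum(s[1] for s in samples) / batch
        # Policy-gradient estimate of d/dtheta E[-length]
        g = sum(-(length - baseline) * grad for _, length, grad in samples) / batch
        theta += lr * g  # gradient ascent on expected return
    return theta


if __name__ == "__main__":
    theta = train()
    rng = random.Random(123)
    coords = make_instance(10, rng)
    _, random_len, _ = rollout(coords, 0.0, rng)
    _, greedy_len, _ = rollout(coords, theta, rng, greedy=True)
    print(f"theta={theta:.2f} random={random_len:.2f} learned={greedy_len:.2f}")
```

In a full solver the scalar `theta` would be replaced by a neural network over the encoded instance, and the same loop structure (sample rollouts, compute returns, apply a policy-gradient update) would carry over.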
All materials are based on the tutorial paper authored by the organizers. Slides, notebooks, and code will be released here.
Wu, Bukhsh & Zhang (2025). Deep Reinforcement Learning for Combinatorial Optimization: A Tutorial.
Presentation slides for all four parts of the tutorial.
Hands-on coding exercises covering all four design pillars, with worked examples.
Full Python codebase with setup instructions.
Research on deep reinforcement learning for learning-based optimization methods addressing resource utilization, planning, and scheduling. Publications in JMLR, Machine Learning, ICML, ECAI.
Research spans deep learning, combinatorial optimization, and integer programming. Published in ICML, ICLR, NeurIPS, AAAI, KDD. 2000+ Google Scholar citations. Area chair at major ML venues.
Chair of BNVKI (Benelux AI Association). Initiated the Data Science meets Optimisation workshop series at IJCAI (2019–2026). Tutorials at SIKS, EURO PhD School, ACP Summer School, SAIL Spring School, and IJCAI 2024.
Research on DRL for complex scheduling and planning, bridging theoretical models and real-world deployment. Published in Neural Computing and Applications and Machine Learning on multi-objective DRL and offline RL for JSP.
Research on neural combinatorial optimization and DRL, particularly for iterative double auction mechanisms and vehicle routing. Contributions include diffusion-guided frameworks for auctions and cluster-aware attention modules. Published in Transportation Research Part E.