DRL for Combinatorial Optimization – IJCAI-ECAI 2026 Tutorial

About the Tutorial

Abstract

Combinatorial optimization problems (COPs) arise across transportation, manufacturing, healthcare, and many other domains, but their NP-hard nature and discrete solution spaces make them poorly suited to traditional gradient-based methods.

Deep reinforcement learning (DRL) has emerged as a compelling alternative, able to learn high-quality solution strategies automatically with minimal manual design. This tutorial is a hands-on, step-by-step guide to developing end-to-end DRL solutions for COPs. Participants will be walked through the four essential design pillars: 1) instance encoding schemes, 2) Markov decision process formulations, 3) neural network architectures, and 4) reinforcement learning training algorithms, with vehicle routing and job shop scheduling serving as concrete running examples throughout.

Attendees will leave with the practical knowledge to design and implement a DRL-based solver from scratch, a research overview of how these methods extend to complex real-world variants, and an understanding of the open challenges that define the field.

Program

Tutorial Schedule

Part 1 35 min

Motivation and Design Pillars

5 min

Motivation and Background

Why DRL for combinatorial optimization? Problem landscape and real-world relevance.

15 min

The Four Design Pillars: A Conceptual Overview

Deep dive into instance encoding, MDP formulation, and neural architectures.

15 min

Overview of DRL Algorithms

Brief intro to policy gradient methods, actor-critic, REINFORCE, and modern variants.

Part 2 55 min

DRL for Classical COPs and Beyond

15 min

DRL for Classical COPs

Routing (VRP/TSP) and scheduling (JSP) as concrete running examples.

15 min

Beyond Classical TSP and JSP

Extending to variants with multiple objectives, constraints, and uncertainties.

15 min

DRL for Real-World Problems

DRL as an end-to-end paradigm for solving problems from diverse domains.

5 min

Conclusions and Q&A

Open questions from the audience.

Part 3 1 h 30 min

Hands-On Exercise

15 min

Open Challenges and Discussion

For example: scalability, generalization, and hybrid methods.

5 min

Recap and Setup

Environment installation and notebook walkthrough.

10 min

Code Walkthrough: Instance Encoding

Representing problem instances.

10 min

Code Walkthrough: MDP Formulations

State, action, transition, and reward design.

10 min

Code Walkthrough: Neural Network Architectures

Designing a neural network architecture suited to the problem.

10 min

Putting It All Together

End-to-end training loop.

25 min

Exercise: Exploration and Experimentation

Explore the environment and problem characteristics.

5 min

Wrap-up

Key takeaways and pointers to further reading.

Resources

Tutorial Materials

All materials are based on the tutorial paper authored by the organizers. Slides, notebooks, and code will be released here.

📄

Tutorial Paper

Wu, Bukhsh & Zhang (2025). Deep Reinforcement Learning for Combinatorial Optimization: A Tutorial.

📊

Slides

Presentation slides for all parts of the tutorial.

Coming Soon

💻

Jupyter Notebooks

Hands-on coding exercises covering all four design pillars, with worked examples.

Coming Soon

🐍

Code Repository

Full Python codebase with setup instructions.

Coming Soon

People

Organizers

Zaharah Bukhsh

Assistant Professor · TU/e

Research on deep reinforcement learning for learning-based optimization methods addressing resource utilization, planning, and scheduling. Publications in JMLR, Machine Learning, ICML, ECAI.

Google Scholar DBLP

Yaoxin Wu

Assistant Professor · TU/e

Research spans deep learning, combinatorial optimization, and integer programming. Published in ICML, ICLR, NeurIPS, AAAI, KDD. 2000+ Google Scholar citations. Area chair at major ML venues.

Google Scholar DBLP

Yingqian Zhang

Associate Professor · TU/e

Chair of BNVKI (Benelux AI Association). Initiated the Data Science meets Optimisation workshop series at IJCAI (2019–2026). Tutorials at SIKS, EURO PhD School, ACP Summer School, SAIL Spring School, and IJCAI 2024.

Google Scholar DBLP

Jesse van Remmerden

PhD Student · TU/e

Research on DRL for complex scheduling and planning, bridging theoretical models and real-world deployment. Published in Neural Computing and Applications and Machine Learning on multi-objective DRL and offline RL for JSP.

Google Scholar DBLP

Yue Yu

PhD Student · TU/e

Research on neural combinatorial optimization and DRL, particularly for iterative double auction mechanisms and vehicle routing. Contributions include diffusion-guided frameworks for auctions and cluster-aware attention modules. Published in Transportation Research Part E.

Contact