IJCAI-ECAI 2026 · Tutorial Track · Half Day

Deep Reinforcement Learning
for Combinatorial Optimization

A hands-on, step-by-step guide to designing end-to-end DRL solutions for combinatorial optimization problems!

Abstract

Combinatorial optimization problems (COPs) arise across transportation, manufacturing, healthcare, and many other domains, but their NP-hard nature and discrete solution spaces make them ill-suited to traditional gradient-based methods.

Deep reinforcement learning (DRL) has emerged as a compelling alternative, offering the ability to automatically learn high-quality solution strategies with minimal manual design. The primary focus of this tutorial is a hands-on, step-by-step guide to developing end-to-end DRL solutions for COPs. Participants will be walked through the four essential design pillars: 1) instance encoding schemes, 2) Markov decision process formulations, 3) neural network architectures, and 4) reinforcement learning training algorithms, with vehicle routing and job shop scheduling serving as concrete running examples throughout.

Attendees will leave with the practical knowledge to design and implement a DRL-based solver from scratch, a research overview of how these methods extend to complex real-world variants, and an understanding of the open challenges that define the field.

Tutorial Schedule

Part 1 · 35 min

Motivation and Design Pillars

5 min
Motivation and Background

Why DRL for combinatorial optimization? Problem landscape and real-world relevance.

15 min
The Four Design Pillars: A Conceptual Overview

Deep dive into instance encoding, MDP formulation, and neural architectures.

15 min
Overview of DRL Algorithms

Brief introduction to policy gradient methods such as REINFORCE, actor-critic methods, and modern variants.
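To make the REINFORCE idea concrete, here is a minimal sketch in plain Python, assuming a toy one-step problem in which the policy picks one of several options with known costs; the names `softmax` and `reinforce_choice` and all hyperparameters are hypothetical. The update combines the score-function gradient of a log-softmax policy with a moving-average baseline, the same two ingredients that REINFORCE-style training for COPs applies at every decoding step of a constructed solution.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_choice(costs: List[float], steps: int = 2000, lr: float = 0.1,
                     beta: float = 0.9, seed: int = 0) -> List[float]:
    """REINFORCE with a moving-average baseline on a one-step choice problem.

    The policy is a softmax over one logit per option, the reward is the
    negative cost of the chosen option, and the final probabilities are returned.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(costs)
    baseline: Optional[float] = None
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(costs)), weights=probs)[0]
        reward = -costs[action]
        if baseline is None:
            baseline = reward
        # The baseline only uses past rewards, so subtracting it lowers the
        # variance of the estimate without biasing it.
        advantage = reward - baseline
        # Gradient of log softmax(logits)[action] w.r.t. logits: onehot(action) - probs.
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * advantage * (indicator - probs[i])
        baseline = beta * baseline + (1 - beta) * reward
    return softmax(logits)
```

For example, `reinforce_choice([3.0, 1.0, 2.0])` should concentrate most of the probability on the cheapest option (index 1). An actor-critic method would replace the moving-average baseline with a learned value estimate.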

Part 2 · 55 min

DRL for Classical COPs and Beyond

15 min
DRL for Classical COPs

Routing (VRP/TSP) and scheduling (JSP) as concrete running examples.
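As a small illustration of the scheduling running example, the sketch below decodes a job shop schedule from a dispatching sequence, assuming an operation-based encoding in which each occurrence of a job index schedules that job's next operation at its earliest start given the job and machine availability. The function `makespan` and the instance format are hypothetical; in a DRL dispatching policy, each entry of the sequence would correspond to one action.

```python
from typing import List, Tuple

Operation = Tuple[int, int]  # (machine id, processing time)


def makespan(jobs: List[List[Operation]], sequence: List[int]) -> int:
    """Decode an operation-based dispatching sequence and return its makespan."""
    next_op = [0] * len(jobs)          # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)        # time at which each job's previous operation ends
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    machine_ready = [0] * n_machines   # time at which each machine becomes free
    for j in sequence:
        if next_op[j] >= len(jobs[j]):
            raise ValueError(f"job {j} has no remaining operations")
        machine, duration = jobs[j][next_op[j]]
        # Earliest start respecting job precedence and machine availability.
        start = max(job_ready[j], machine_ready[machine])
        end = start + duration
        job_ready[j] = end
        machine_ready[machine] = end
        next_op[j] += 1
    if any(k != len(job) for k, job in zip(next_op, jobs)):
        raise ValueError("sequence does not schedule every operation")
    return max(job_ready)
```

With `jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]`, the sequence `[0, 1, 0, 1]` yields a makespan of 7, while `[0, 0, 1, 1]` yields 11, which shows how the dispatching order alone changes solution quality.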

15 min
Beyond Classical TSP and JSP

Extending to variants with multiple objectives, constraints, and uncertainties.
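One common way to handle multiple objectives is to scalarize them into a single reward under a preference vector. The sketch below shows weighted-sum and weighted Tchebycheff scalarization, assuming all objectives are costs to be minimized; the function names, and the choice of these two schemes, are illustrative assumptions rather than the tutorial's prescribed method.

```python
from typing import Optional, Sequence


def weighted_sum(costs: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: sum_i w_i * f_i."""
    if len(costs) != len(weights):
        raise ValueError("costs and weights must have the same length")
    return sum(w * f for w, f in zip(weights, costs))


def tchebycheff(costs: Sequence[float], weights: Sequence[float],
                ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff scalarization: max_i w_i * |f_i - z_i|."""
    if not len(costs) == len(weights) == len(ideal):
        raise ValueError("costs, weights and ideal must have the same length")
    return max(w * abs(f - z) for w, f, z in zip(weights, costs, ideal))


def scalarized_reward(costs: Sequence[float], weights: Sequence[float],
                      ideal: Optional[Sequence[float]] = None) -> float:
    """Reward for a minimization problem: the negative scalarized cost."""
    if ideal is None:
        return -weighted_sum(costs, weights)
    return -tchebycheff(costs, weights, ideal)
```

For costs `(10.0, 4.0)` (say, total distance and number of vehicles) and weights `(0.5, 0.5)`, the weighted sum is 7.0; with the ideal point `(8.0, 2.0)`, the Tchebycheff value is 1.0. Training one policy per weight vector, or conditioning a single policy on the weights, then approximates a Pareto front.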

15 min
DRL for Real-World Problems

DRL as an end-to-end paradigm for solving problems from diverse domains.

5 min
Conclusions and Q&A

Open questions from the audience.

Part 3 · 1 h 30 min

Hands-On Exercise

15 min
Open Challenges and Discussion

For example: scalability, generalization, and hybrid methods.

5 min
Recap and Setup

Environment installation and notebook walkthrough.

10 min
Code Walkthrough: Instance Encoding

Representing problem instances.
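A minimal sketch of one possible instance encoding for the capacitated VRP, assuming each node is described by its coordinates and its demand relative to vehicle capacity; the function `encode_cvrp` and the feature layout are hypothetical. Coordinates are rescaled with a single shared factor so that instances of different sizes share a common input range while the ratios between distances are preserved.

```python
from typing import List, Tuple


def encode_cvrp(depot: Tuple[float, float],
                customers: List[Tuple[float, float, float]],
                capacity: float) -> List[List[float]]:
    """Return one [x, y, demand / capacity] feature vector per node (node 0 = depot)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    xs = [depot[0]] + [c[0] for c in customers]
    ys = [depot[1]] + [c[1] for c in customers]
    x_min, y_min = min(xs), min(ys)
    # One shared scale factor for both axes preserves the ratios between distances.
    scale = max(max(xs) - x_min, max(ys) - y_min) or 1.0
    demands = [0.0] + [c[2] / capacity for c in customers]  # the depot has no demand
    return [[(x - x_min) / scale, (y - y_min) / scale, d]
            for x, y, d in zip(xs, ys, demands)]
```

For a depot at `(0, 0)`, customers `(10, 0, 5)` and `(0, 5, 10)`, and capacity 20, the encoding is `[[0.0, 0.0, 0.0], [1.0, 0.0, 0.25], [0.0, 0.5, 0.5]]`.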

10 min
Code Walkthrough: MDP Formulations

State, action, transition, and reward design.
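The four MDP components can be sketched for a constructive TSP solver as follows, assuming a hypothetical `TSPEnv` class: the state is the current node plus the visited set, an action selects an unvisited node, the transition marks it visited, and the reward is the negative distance travelled. The action mask illustrates one common way of exposing feasibility constraints to the policy.

```python
import math
from typing import FrozenSet, List, Tuple

State = Tuple[int, FrozenSet[int]]


class TSPEnv:
    """Construction MDP for the TSP (hypothetical minimal interface).

    Reward is the negative length of each traversed edge; the closing edge back
    to the start is charged on the final step, so the episode return equals
    minus the tour length.
    """

    def __init__(self, coords: List[Tuple[float, float]]):
        self.coords = coords
        self.reset()

    def reset(self, start: int = 0) -> State:
        self.start = start
        self.current = start
        self.visited = {start}
        return self.current, frozenset(self.visited)

    def action_mask(self) -> List[bool]:
        """True for nodes that may still be visited."""
        return [i not in self.visited for i in range(len(self.coords))]

    def _dist(self, a: int, b: int) -> float:
        return math.dist(self.coords[a], self.coords[b])

    def step(self, action: int) -> Tuple[State, float, bool]:
        if action in self.visited:
            raise ValueError(f"node {action} already visited")
        reward = -self._dist(self.current, action)
        self.current = action
        self.visited.add(action)
        done = len(self.visited) == len(self.coords)
        if done:
            reward -= self._dist(action, self.start)  # return to the start node
        return (self.current, frozenset(self.visited)), reward, done
```

On the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`, stepping through actions 1, 2, 3 ends the episode with a total reward of -4.0, which is minus the tour length.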

10 min
Code Walkthrough: Neural Network Architectures

Designing a neural network architecture suited to the problem.
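For routing problems, a common architectural choice is an encoder that embeds the nodes followed by a decoder that points at the next node via attention. The sketch below shows only that pointer step in plain Python, assuming the query and key vectors have already been produced by some encoder; the function `pointer_probs` and the tanh clip value of 10 are illustrative assumptions. Masked (infeasible) nodes receive zero probability.

```python
import math
from typing import List, Sequence


def pointer_probs(query: Sequence[float], keys: Sequence[Sequence[float]],
                  mask: Sequence[bool], clip: float = 10.0) -> List[float]:
    """Masked, clipped scaled dot-product attention over candidate nodes."""
    d = len(query)
    scores = []
    for key, feasible in zip(keys, mask):
        if not feasible:
            scores.append(-math.inf)  # infeasible nodes can never be selected
            continue
        u = sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        scores.append(clip * math.tanh(u))  # bound logits to [-clip, clip]
    m = max(scores)
    if m == -math.inf:
        raise ValueError("no feasible action")
    exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Here the first candidate aligns with the query and dominates, while the masked third candidate gets probability 0 regardless of its score.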

10 min
Putting It All Together

End-to-end training loop.
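The end-to-end loop can be sketched without any deep learning library, assuming a deliberately tiny policy: a single hypothetical parameter `w` scores each unvisited TSP node by `-w * distance`, standing in for a neural network. Each iteration samples random instances, rolls out the stochastic policy, and applies REINFORCE with a batch-mean baseline; `rollout`, `train`, and all hyperparameters are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def rollout(coords: List[Point], w: float, rng: random.Random) -> Tuple[float, float]:
    """Sample one tour; return (tour length, d log pi(tour) / d w)."""
    current, unvisited = 0, set(range(1, len(coords)))
    length, score_grad = 0.0, 0.0
    while unvisited:
        cand = sorted(unvisited)
        dists = [math.dist(coords[current], coords[j]) for j in cand]
        logits = [-w * d for d in dists]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        k = rng.choices(range(len(cand)), weights=probs)[0]
        # d/dw log softmax(-w * d)[k] = -d_k + sum_j p_j * d_j
        score_grad += -dists[k] + sum(p * d for p, d in zip(probs, dists))
        length += dists[k]
        current = cand[k]
        unvisited.remove(current)
    length += math.dist(coords[current], coords[0])  # close the tour
    return length, score_grad


def train(n_nodes: int = 8, batch: int = 8, iters: int = 300,
          lr: float = 0.2, seed: int = 0) -> float:
    rng = random.Random(seed)
    w = 0.0  # w = 0 is a uniformly random policy
    for _ in range(iters):
        samples = []
        for _ in range(batch):
            coords = [(rng.random(), rng.random()) for _ in range(n_nodes)]
            samples.append(rollout(coords, w, rng))
        rewards = [-length for length, _ in samples]
        baseline = sum(rewards) / batch
        grad = sum((r - baseline) * g for r, (_, g) in zip(rewards, samples)) / batch
        w += lr * grad  # gradient ascent on the expected reward
    return w
```

Because shorter edges tend to produce shorter tours, `w` should grow during training, moving the policy from random tours towards nearest-neighbor behavior. Replacing the scorer with an attention model and the manual gradient with automatic differentiation gives the usual neural setup.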

25 min
Exercise: Exploration and Experimentation

Explore the environment and problem characteristics.

5 min
Wrap-up

Key takeaways and pointers to further reading.

Tutorial Materials

All materials are based on the tutorial paper authored by the organizers. Slides, notebooks, and code will be released here.

📄

Tutorial Paper

Wu, Bukhsh & Zhang (2025). Deep Reinforcement Learning for Combinatorial Optimization: A Tutorial.

📊

Slides

Presentation slides for all three parts of the tutorial.

Coming Soon
💻

Jupyter Notebooks

Hands-on coding exercises covering all four design pillars, with worked examples.

Coming Soon
🐍

Code Repository

Full Python codebase with setup instructions.

Coming Soon

Organizers

Zaharah Bukhsh

Assistant Professor · TU/e

Research on deep reinforcement learning and learning-based optimization methods for resource utilization, planning, and scheduling. Publications in JMLR, Machine Learning, ICML, ECAI.

Yaoxin Wu

Assistant Professor · TU/e

Research spans deep learning, combinatorial optimization, and integer programming. Published in ICML, ICLR, NeurIPS, AAAI, KDD. 2000+ Google Scholar citations. Area chair at major ML venues.

Yingqian Zhang

Associate Professor · TU/e

Chair of BNVKI (Benelux AI Association). Initiated the Data Science meets Optimisation workshop series at IJCAI (2019–2026). Tutorials at SIKS, EURO PhD School, ACP Summer School, SAIL Spring School, and IJCAI 2024.

Jesse van Remmerden

PhD Student · TU/e

Research on DRL for complex scheduling and planning, bridging theoretical models and real-world deployment. Publications on multi-objective DRL and offline RL for JSP in Neural Computing and Applications and Machine Learning.

Yue Yu

PhD Student · TU/e

Research on neural combinatorial optimization and DRL, particularly for iterative double auction mechanisms and vehicle routing. Contributions include diffusion-guided frameworks for auctions and cluster-aware attention modules. Published in Transportation Research Part E.