Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Posts

publications

Paper Title Number 1

Published in Journal 1, 2009

This paper is about the number 1. The number 2 is left for future work.

Recommended citation: Your Name, You. (2009). "Paper Title Number 1." Journal 1. 1(1).
Download Paper | Download Slides

Paper Title Number 2

Published in Journal 1, 2010

This paper is about the number 2. The number 3 is left for future work.

Recommended citation: Your Name, You. (2010). "Paper Title Number 2." Journal 1. 1(2).
Download Paper | Download Slides

Paper Title Number 3

Published in Journal 1, 2015

This paper is about the number 3. The number 4 is left for future work.

Recommended citation: Your Name, You. (2015). "Paper Title Number 3." Journal 1. 1(3).
Download Paper | Download Slides

Paper Title Number 4

Published in GitHub Journal of Bugs, 2024

This paper is about fixing template issue #693.

Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper

PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

Published in arXiv preprint arXiv:2509.19128, 2025

We present PipelineRL, a fully pipelined on-policy reinforcement learning stack for large language models that keeps accelerators saturated without generating stale, off-policy trajectories. The system overlaps rollout, preference modeling, and policy updates to deliver up to 2× faster convergence on long-context reasoning benchmarks while retaining stability guarantees for PPO-style algorithms.

Recommended citation: Alexandre Piché, Ehsan Kamalloo, Rafael Pardinas, Xiaoyin Chen, Dzmitry Bahdanau. (2025). "PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation." arXiv preprint arXiv:2509.19128.
Download Paper

Alex Piché


Pages

Page Not Found

About

Archive Layout with Content

Posts by Category

Posts by Collection

CV

Markdown

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

publications

Paper Title Number 1

Paper Title Number 2

Paper Title Number 3

Paper Title Number 4

PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

talks

AI safety and reward misspecification

Probabilistic Planning with Sequential Monte Carlo methods

LLMs can learn self-restraint through iterative self-reflection

PipelineRL: RL training speed through the roofline