Decoding the Controversy: A Closer Look at OpenRLHF vs. veRL

Hey there! Have you heard of OpenRLHF and veRL? The two RLHF training frameworks are generating a lot of buzz in the machine learning community right now. The original author of OpenRLHF recently stepped in to clear the air amid the drama, and I thought it was worth sharing what’s going on.

So, here’s the scoop. The author likens OpenRLHF to KartRider, the classic racing game, and veRL’s FSDP version to QQ Speed, a game often seen as a KartRider lookalike. In other words, the author is calling it a copycat. This is quite the claim!

### Performance Similarities
The author argues that there’s no real difference in performance between OpenRLHF and veRL. Both frameworks are built on similar technology: vLLM for inference, and ZeRO3-style sharded training (DeepSpeed ZeRO3 on one side, FSDP on the other). Interestingly, if you do see a significant performance gap, the author suggests it probably comes from how the framework is being configured and used rather than from the framework itself.

Just a heads-up: as of now, FSDP is a bit faster, but big updates are coming to DeepSpeed. So, the game isn’t over yet!
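To make the ZeRO3 side of that comparison a bit more concrete, here’s a minimal sketch of what a DeepSpeed ZeRO stage 3 config can look like when written as a Python dict. The helper name `make_zero3_config` and the batch-size values are my own placeholders, not settings taken from OpenRLHF or veRL, so treat this as an illustration rather than either project’s actual configuration.

```python
import json


def make_zero3_config(micro_batch_size=1, grad_accum_steps=8):
    """Build a small DeepSpeed-style config dict with ZeRO stage 3 enabled.

    The function name and default values are illustrative assumptions.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "bf16": {"enabled": True},
        "zero_optimization": {
            # Stage 3 shards parameters, gradients, and optimizer state
            # across data-parallel ranks.
            "stage": 3,
            # Overlap gradient communication with the backward pass.
            "overlap_comm": True,
            # Gather full 16-bit weights when saving, so the checkpoint
            # can be loaded by other tools.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, the format DeepSpeed config files use.
    print(json.dumps(make_zero3_config(), indent=2))
```

FSDP’s full-sharding mode follows the same basic idea of sharding parameters, gradients, and optimizer state, which is part of why the author expects the two stacks to land in the same performance ballpark.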

### On HybridFlow Free Scheduling
The author also touches on HybridFlow’s “free scheduling,” meaning the ability to place the different models in an RLHF pipeline on GPUs however you like. The argument is that any RLHF framework built on Ray can do this, and that what veRL calls HybridFlow is just a fancy name for Ray’s Placement Group API. According to the author, OpenRLHF did this first, and it also ships features that help avoid crashes when training large models.
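If you haven’t used Ray’s Placement Group API, here’s a rough sketch of the idea: reserve one bundle per GPU, then pin two actors to the same bundle so they share that GPU. The names `plan_bundles`, `launch_colocated`, and `Worker`, and the “actor” and “rollout” roles, are made up for illustration. This is not code from OpenRLHF or veRL, and the Ray part assumes a cluster that actually has the requested GPUs.

```python
def plan_bundles(num_gpus, cpus_per_gpu=4):
    """Describe one resource bundle per GPU (plain Python, no Ray needed)."""
    return [{"GPU": 1, "CPU": cpus_per_gpu} for _ in range(num_gpus)]


def launch_colocated(num_gpus=2):
    """Pin two workers to each GPU bundle using a Ray placement group."""
    # Imported here so plan_bundles() stays usable without Ray installed.
    import ray
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    ray.init(ignore_reinit_error=True)

    # Reserve the bundles; PACK tries to keep them on as few nodes as possible.
    pg = placement_group(plan_bundles(num_gpus), strategy="PACK")
    ray.get(pg.ready())  # blocks until the cluster can satisfy the request

    @ray.remote(num_gpus=0.5, num_cpus=1)
    class Worker:
        def __init__(self, role):
            self.role = role

        def whoami(self):
            return self.role

    workers = []
    for index in range(num_gpus):
        strategy = PlacementGroupSchedulingStrategy(
            placement_group=pg,
            placement_group_bundle_index=index,
        )
        # Two half-GPU workers on the same bundle end up sharing one GPU.
        for role in ("actor", "rollout"):
            workers.append(
                Worker.options(scheduling_strategy=strategy).remote(role)
            )

    return ray.get([w.whoami.remote() for w in workers])


if __name__ == "__main__":
    print(plan_bundles(2))
```

Changing which workers go to which bundle index is all it takes to move between colocated and separated layouts, which is the kind of flexibility the author says comes from Ray itself.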

### Who Came Up With What?
Another interesting point: the author says the Hybrid Engine idea was actually introduced by DeepSpeedChat, not by either of these two projects. Both frameworks support it now, so there’s at least some common ground in the community.

The author also highlights the combination of Ray, vLLM, and other technologies as what makes for a straightforward and effective RLHF training solution. By the author’s account, OpenRLHF was the first to open-source this architecture, and veRL reportedly took inspiration from it.

### Respecting Originality
In the end, the author emphasizes the importance of recognizing originality in the open-source world. OpenRLHF is described as a zero-budget project, in contrast to larger commercially backed ones like veRL. The post closes with a call for everyone in the community to appreciate each other’s contributions rather than undermine them.

It’s fascinating to see how these frameworks develop and how creators defend their work. If you’re interested in RLHF, this is a space to watch! What are your thoughts on these claims? Is it fair to compare them, or do you see other nuances?

Let’s chat!