A Better Practice to Define Reward Model with HuggingFace's transformers
Cong Chen University of Edinburgh There are various implementation of reward modeling in RLHF(reinforcement learning with human feedback), each has different pros and cons. Inspired by some open-sourced works about reward modeling, I would like to share one of the best practice for reward modeling. For those who are not familiar with reward modeling and RLHF, I recommend take a look at the Huggingface rlhf blog1 or OpenAI rlhf paper2....