refactor: Task3 reward model changed, agent adjusted for new model 48661cd ajaxwin committed 23 days ago