refactor: Task3 reward model changed, agent adjusted for new model (48661cd, ajaxwin committed 24 days ago)