Interview question at Huawei Technologies

How do you use reinforcement learning (RL) to optimize a large language model (LLM)?
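A common answer is policy-gradient fine-tuning against a learned reward, as in RLHF, where PPO is the usual optimizer. The sketch below is a deliberately tiny stand-in under stated assumptions: the "LLM" is a single categorical distribution over a 4-token vocabulary, `reward_fn` is a hypothetical reward model that simply prefers token 2, and plain REINFORCE with a moving-average baseline replaces PPO. The shaped reward `r - beta * log(pi / pi_ref)` keeps the policy close to a frozen reference copy, mirroring the KL penalty used in RLHF.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_fn(token):
    # Hypothetical reward model (assumption): rewards token index 2 only.
    return 1.0 if token == 2 else 0.0


def reinforce_with_kl(vocab_size=4, steps=2000, lr=0.1, beta=0.05, seed=0):
    """Toy RL fine-tuning loop: REINFORCE with a KL-shaped reward (sketch)."""
    rng = random.Random(seed)
    logits = [0.0] * vocab_size       # trainable policy (stand-in for the LLM)
    ref_probs = softmax(logits)       # frozen reference policy (pre-RL model)
    baseline = 0.0                    # moving-average baseline to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # "Generate": sample one token from the current policy.
        token = rng.choices(range(vocab_size), weights=probs)[0]
        # KL-shaped reward: task reward minus beta * log(pi / pi_ref).
        shaped = reward_fn(token) - beta * math.log(probs[token] / ref_probs[token])
        advantage = shaped - baseline
        baseline = 0.9 * baseline + 0.1 * shaped
        # Gradient of log softmax w.r.t. the logits is one_hot(token) - probs.
        for i in range(vocab_size):
            grad = (1.0 if i == token else 0.0) - probs[i]
            logits[i] += lr * advantage * grad  # gradient ascent on expected reward
    return softmax(logits)


if __name__ == "__main__":
    print(reinforce_with_kl())
```

After training, most of the probability mass should sit on the rewarded token. In a realistic setup the logits would come from a transformer conditioned on a prompt, the reward from a model trained on human preference comparisons, and the update would typically be PPO's clipped objective applied to whole generated sequences rather than single tokens.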