LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards (May

Channel: AI Paper Slop