Speculative Decoding: When Two LLMs are Faster than One

Channel: Efficient NLP
34K views • 12/06/2024