Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
2023-09-11
Artificial Intelligence, Information Processing | Computing, Research
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, and Tri Dao (together.ai) unveil Medusa, a simple framework that accelerates LLM generation by using multiple decoding heads.