A Summary of Mainstream Techniques for Long-Context LLM Question Answering
This post surveys four key techniques for enhancing large language models (LLMs) in long-context question-answering scenarios. It begins with Retrieval-Augmented Generation (RAG), which retrieves relevant knowledge snippets and supplies them as context. It then discusses sparse attention mechanisms, such as those in BigBird and Longformer, which improve efficiency by attending only between selected tokens. The post also covers context compression methods such as MemoryBank, which let an LLM retain essential user information across dialogues. Finally, it highlights MemAgent, a system that recursively summarizes long inputs into a working memory used for reasoning, and is trained with reinforcement learning via GRPO; a minimal sketch of this chunked-reading idea follows below.
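To make the MemAgent-style workflow more concrete, here is a minimal Python sketch of reading a long document chunk by chunk while maintaining a bounded memory. This is an illustration under my own assumptions, not the paper's implementation: `call_llm`, the prompt wording, and the chunk size are all hypothetical placeholders.

```python
# Minimal sketch (not the paper's code) of MemAgent-style chunked reading:
# the model processes a long document piece by piece, carrying a bounded
# "memory" summary forward instead of the full context.
# `call_llm` is a hypothetical stand-in for any chat-completion API call.

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call (e.g. an OpenAI-compatible client)."""
    raise NotImplementedError

def chunk(text: str, size: int = 4000) -> list[str]:
    """Split a long document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def answer_with_memory(document: str, question: str) -> str:
    memory = ""  # running summary, kept short by the prompt instructions
    for piece in chunk(document):
        # Fold the new chunk into the existing memory, keeping only
        # information relevant to the question.
        memory = call_llm(
            f"Question: {question}\n"
            f"Current memory: {memory}\n"
            f"New text: {piece}\n"
            "Update the memory, keeping only information useful for the question."
        )
    # The final answer is produced from the compressed memory alone.
    return call_llm(f"Question: {question}\nMemory: {memory}\nAnswer the question.")
```

In MemAgent, the memory-update behavior itself is optimized with reinforcement learning (GRPO) so the model learns what to keep and what to discard; the sketch above only shows the inference-time loop.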