Summary
Google researchers have introduced Infini-attention, a technique that lets large language models process arbitrarily long inputs with bounded memory and compute. It extends a model's 'context window', the span of text the model can consider at once, and maintains quality on sequences of up to one million tokens by pairing standard local attention with a 'compressive memory' module that stores information from earlier parts of the input. In the researchers' tests, models using Infini-attention outperformed baseline models on long-context tasks, staying coherent while using less memory.
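To make the idea concrete, the sketch below shows roughly how a single Infini-attention-style head could combine ordinary attention over the current segment with a compressive memory carried across segments, following the update rules described in the paper. The function names, tensor shapes, and the scalar gate are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of an Infini-attention-style head (assumed simplification):
# per-segment local attention plus a linear "compressive memory" that is read
# with (ELU+1)-activated queries and updated with key-value outer products.

import torch
import torch.nn.functional as F


def elu_plus_one(x):
    # Non-negative feature map used for linear-attention-style memory access.
    return F.elu(x) + 1.0


def infini_attention_segment(q, k, v, memory, norm, beta):
    """Process one segment and return its output plus the updated memory.

    q, k   : (seq, d_k) query/key projections for this segment
    v      : (seq, d_v) value projections for this segment
    memory : (d_k, d_v) compressive memory carried over from earlier segments
    norm   : (d_k,) running normalization term for memory reads
    beta   : scalar gate mixing the memory read with local attention
    """
    d_k = q.shape[-1]

    # 1) Retrieve context from the compressive memory with activated queries.
    sigma_q = elu_plus_one(q)                                   # (seq, d_k)
    mem_out = (sigma_q @ memory) / (sigma_q @ norm.unsqueeze(-1) + 1e-6)

    # 2) Standard causal dot-product attention within the current segment.
    scores = (q @ k.transpose(-2, -1)) / d_k ** 0.5
    causal_mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(causal_mask, float("-inf"))
    local_out = scores.softmax(dim=-1) @ v

    # 3) Fold this segment's keys/values into the memory, then blend outputs.
    sigma_k = elu_plus_one(k)
    memory = memory + sigma_k.transpose(-2, -1) @ v             # (d_k, d_v)
    norm = norm + sigma_k.sum(dim=0)                            # (d_k,)

    gate = torch.sigmoid(beta)
    out = gate * mem_out + (1.0 - gate) * local_out
    return out, memory, norm
```

Because the memory is a fixed-size matrix per head, processing additional segments updates it in place rather than growing it, which is what keeps memory use bounded no matter how long the input stream becomes.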