I recently released the initial version of the chat-splitter crate. This blog post gives an overview of what chat-splitter is, the problem it addresses, and how it works.


The context length problem

Large language models struggle with long sequences of text. Even state-of-the-art models like OpenAI’s GPT-4 have a maximum token limit beyond which they cannot take in additional context. This context window typically ranges from 2k tokens for legacy GPT-3 models up to 32k tokens for the most recent GPT-4 model, and potentially more [1]. The limit becomes a real obstacle when generating coherent and meaningful responses over long interactions. For example, when building a chat application, what should be done once the conversation no longer fits within the model’s context window?

What is chat-splitter?

chat-splitter is a Rust library for splitting chat messages intended for OpenAI’s chat models. Its main purpose is to ensure that the messages sent to a model always fit within the context length limits discussed above. By handling the splitting, chat-splitter aims to simplify the development of applications built on these models.

The code for chat-splitter is open-source and can be accessed on GitHub. Additionally, it is available on crates.io for easy integration into projects. For comprehensive documentation, please visit docs.rs.

Integrating chat-splitter into your project

To incorporate chat-splitter into your Rust project, you can easily add it as a dependency using Cargo.

$ cd your/rust/project
$ cargo add chat-splitter

Next, create a ChatSplitter instance:

use chat_splitter::ChatSplitter;
let cs = ChatSplitter::default();

There are currently two ways to control the splitting: by setting a maximum message count and by setting a maximum token count [2]:

let cs = ChatSplitter::new("gpt-3.5-turbo")  // same as ::default()
    .max_messages(20)
    .max_tokens(1000);

ℹ️ The max_tokens parameter here is the number of tokens left available in the context window for the large language model to generate its completion. It is the same max_tokens value that you pass when making requests to the OpenAI API.
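In other words, the splitter keeps the prompt small enough that $\text{prompt tokens} + \text{max\_tokens}$ does not exceed the model’s context window, which is 4,096 tokens for the original gpt-3.5-turbo; with the settings above, that leaves roughly 3,000 tokens for the messages themselves.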

The main method is split, which divides the chat messages into two groups: those that fit within the context window and the rest [3]:

let chat_messages: Vec<_> = /* way too many messages */;
let (rest, fit) = cs.split(&chat_messages);
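
To make the hand-off concrete, here is a hedged sketch of what might come next, assuming chat_messages are async-openai request messages (which chat-splitter supports out of the box); the builder shown is async-openai’s, and the exact integer type it expects for max_tokens has changed between versions:

use async_openai::types::CreateChatCompletionRequestArgs;

// `rest` could be persisted for long-term chat memory instead of being dropped,
// while only the `fit` half is sent to the API, together with the same
// completion budget that was given to `ChatSplitter::max_tokens` above.
let request = CreateChatCompletionRequestArgs::default()
    .model("gpt-3.5-turbo")
    .max_tokens(1000u16)
    .messages(fit.to_vec()) // assumes clonable async-openai request messages
    .build()?;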

For more detailed information and practical examples, please refer to the chat-splitter documentation and the GitHub repository. In particular, you may find the examples/chat.rs file to be helpful.

How does it work?

chat-splitter leverages tiktoken-rs for token counting and works with any message type that implements the IntoChatCompletionRequestMessage trait. The crate provides built-in implementations for the chat message types defined by async-openai and tiktoken-rs, and it is also straightforward to implement the trait for your own message types [4].

The algorithm used by chat-splitter first splits the chat messages by message count, which is straightforward. It then performs the token-based split, which requires counting tokens and finding the split position that keeps as many messages as possible within the given token budget. Token counting relies on the tokenizers from tiktoken-rs, and a naive implementation requiring $\mathcal{O}(n)$ [5] token counts would be too slow for certain real-world scenarios [6].

To keep this fast, chat-splitter uses a binary search implemented in indxvec, which has a worst-case time complexity of $\mathcal{O}(\log_2{n})$. A naive implementation could require up to 2048 token counts, which is the maximum message count limit in OpenAI’s API; with binary search, chat-splitter never needs more than eleven token counts in that scenario.
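
To illustrate the idea, here is a simplified, self-contained sketch (my own illustration, not chat-splitter’s actual code, which relies on indxvec): binary-search for the longest suffix of the conversation that stays within the token budget, assuming that adding a message never decreases the token count, so the expensive token-counting step runs only about $\log_2{n}$ times.

/// Simplified sketch: find where to split `messages` so that the longest
/// possible suffix fits within `budget`, calling the (expensive)
/// `count_tokens` closure only about log2(n) times.
fn split_index<M>(
    messages: &[M],
    budget: usize,
    count_tokens: impl Fn(&[M]) -> usize,
) -> usize {
    // Binary search for the largest number `mid` of most recent messages
    // whose combined token count still fits within the budget.
    let (mut lo, mut hi) = (0, messages.len());
    while lo < hi {
        let mid = (lo + hi + 1) / 2;
        if count_tokens(&messages[messages.len() - mid..]) <= budget {
            lo = mid; // this suffix fits; try keeping more messages
        } else {
            hi = mid - 1; // over budget; keep fewer messages
        }
    }
    // Split here: `messages[..index]` is `rest`, `messages[index..]` is `fit`.
    messages.len() - lo
}

With the 2048-message maximum mentioned above, that works out to the roughly eleven token counts quoted earlier.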


I hope you find chat-splitter useful in your projects. I welcome suggestions and contributions through pull requests on the GitHub repository. If you have any questions or need further assistance, please feel free to reach out to me on Twitter.


  1. Researchers are actively working on lifting this limitation. For example, Anthropic recently announced a Claude model with a 100K context window.

  2. Additional ways of controlling the splitting process may be introduced in the future.

  3. While the rest messages may be disregarded in many applications, there are interesting use cases for them, particularly long-term chat memory.

  4. Perhaps there is demand for a dedicated crate for this purpose? 📦

  5. Here, $n$ is the number of messages remaining after the initial split by message count, so it can be significantly smaller than the length of the input slice.

  6. For more background on this topic, see this issue in the tiktoken-rs repository.