Mon Sep 23 17:11:56 UTC 2024

## Storia.ai Unveils New Code Retrieval Benchmark for AI Copilots

**Storia.ai, the company behind an open-source tool for chatting with code repositories, has released a new benchmark designed to improve the accuracy and performance of AI copilots.** The dataset, developed in collaboration with Morph Labs, focuses on retrieving relevant code from entire codebases rather than isolated functions, offering a more realistic evaluation of AI copilot capabilities.

**The benchmark, consisting of 1,000 question-answer-document pairs focused on Hugging Face’s Transformers library, enables researchers to test various retrieval strategies and techniques.** Storia.ai’s initial findings using this benchmark highlight key areas for optimization:
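A benchmark of question-answer-document pairs is typically scored by checking whether the retriever surfaces the gold document for each question. A minimal sketch of such an evaluation loop, using recall@k; the pair format and retriever interface here are illustrative assumptions, not Storia.ai's actual benchmark schema:

```python
# Hypothetical sketch: scoring a retriever on (question, gold document) pairs
# with recall@k. Data format and retriever signature are assumptions.

def recall_at_k(pairs, retrieve, k=10):
    """Fraction of questions whose gold document appears in the top-k results."""
    hits = 0
    for question, gold_doc in pairs:
        top_k = retrieve(question, k)  # ranked list of document IDs
        if gold_doc in top_k:
            hits += 1
    return hits / len(pairs)

# Toy retriever with a fixed ranking, purely for illustration.
pairs = [("how do I load a model?", "modeling_utils.py"),
         ("where is tokenization done?", "tokenization_utils.py")]
retrieve = lambda q, k: ["modeling_utils.py", "readme.md"][:k]
print(recall_at_k(pairs, retrieve, k=2))  # 0.5
```

Swapping in different embedding models, rerankers, or chunking strategies while holding this loop fixed is what lets the benchmark compare retrieval techniques head-to-head.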

* **Embeddings:** OpenAI’s “text-embedding-3-small” model consistently outperforms other options, including Voyage’s code-specific embeddings.
* **Reranking:** NVIDIA’s reranker emerges as the top choice among popular proprietary APIs, including Voyage, Cohere, and Jina.
* **Retrieval Methods:** Dense retrieval, where documents are ranked by the cosine similarity between their embeddings and the query embedding, significantly outperforms both sparse and hybrid retrieval. Sparse methods, which rely on exact string matching, proved counterproductive: they surfaced irrelevant natural-language files such as READMEs over the relevant Python code.
* **Chunking Strategy:** A chunk size of 800 tokens strikes a good balance between performance and indexing time, with only marginal gains observed from smaller chunk sizes.
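The dense-retrieval and chunking steps above can be sketched in plain Python. The embedding model and tokenizer calls are omitted, and every name here is illustrative rather than Storia.ai's implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dense_retrieve(query_emb, chunk_embs, k=5):
    """Rank code chunks by cosine similarity to the query embedding."""
    scored = sorted(chunk_embs.items(),
                    key=lambda item: cosine_similarity(query_emb, item[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]

def chunk_tokens(tokens, size=800):
    """Split a token sequence into fixed-size chunks (e.g. 800 tokens)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]
```

In practice the token sequences would come from a tokenizer and the embeddings from a model such as OpenAI's text-embedding-3-small; the 800-token chunk size mirrors the benchmark's reported sweet spot between performance and indexing time.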

**These findings are crucial for building more robust and accurate AI copilots that can effectively understand and retrieve information from complex codebases.** Storia.ai emphasizes offering flexibility alongside well-chosen defaults: the aim is not simply to publish code, but to build tools that make retrieval work efficiently in practice.

The full benchmark dataset and further analysis are expected to be made public soon.
