
The industry-standard benchmark and dataset for semantic code search and neural code representation learning.

CodeSearchNet is a research project and dataset developed by GitHub and Microsoft Research to evaluate the state of semantic code search. As of 2026, it remains a foundational benchmark for Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems specialized in software engineering.

The corpus comprises roughly 6 million functions across six programming languages (Go, Java, JavaScript, PHP, Python, and Ruby), about 2 million of which are paired with natural language documentation. The project provides not only the data but also baseline neural models, including Neural Bag-of-Words, 1D-CNN, bidirectional RNN, and self-attention (Transformer-style) encoders.

Today it serves as a primary dataset for fine-tuning 'code-to-text' and 'text-to-code' models, enabling developers to build tools that understand the intent behind code rather than merely matching keywords. Its integration with Weights & Biases (WandB) standardizes experiment tracking, so teams can objectively measure Mean Reciprocal Rank (MRR) improvements when iterating on code-search algorithms. Even alongside newer, larger datasets such as 'The Stack', CodeSearchNet's curated code-documentation pairings keep it indispensable for training intent-aware code intelligence systems.
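The core idea behind intent-aware search is that queries and code are embedded into a shared vector space, and retrieval reduces to nearest-neighbor ranking. A minimal sketch, using random vectors as stand-ins for model output (a real system would produce `query_vec` and `code_vecs` with trained encoders):

```python
import numpy as np

def cosine_rank(query_vec, code_vecs):
    """Rank code snippets by cosine similarity to a query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    C = code_vecs / np.linalg.norm(code_vecs, axis=1, keepdims=True)
    scores = C @ q
    return np.argsort(-scores), scores  # indices in descending-score order

# Toy example: 3 "code" vectors and one "query" vector built near snippet 1
rng = np.random.default_rng(0)
code_vecs = rng.normal(size=(3, 8))
query_vec = code_vecs[1] + 0.1 * rng.normal(size=8)
order, scores = cosine_rank(query_vec, code_vecs)
print(order[0])  # snippet 1 should rank first, since the query was built near it
```

Because both modalities live in the same space, the same ranking routine works whether the query is natural language or another code snippet.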
CodeSearchNet specializes in the following use cases: semantic code retrieval, code summarization training, fine-tuning code LLMs, embedding generation, zero-shot code search evaluation, and code completion.
Includes six distinct programming languages with standardized JSONL formatting for cross-lingual training.
Includes implementations for Neural Bag-Of-Words (NBoW), Bi-RNN, CNN, and Self-Attention (Transformer) models.
A standardized evaluation framework for measuring the effectiveness of code search results.
Native hooks for WandB to track hyperparameters, loss curves, and evaluation metrics in real-time.
Data is hosted on public Amazon S3 buckets, enabling high-speed downloads into distributed training clusters.
Encourages the development of models that map both natural language and code into a shared vector space.
Supports containerized environments to ensure environment parity and reproducible results.
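The standardized JSONL formatting mentioned above means each line of a shard is one JSON record pairing a function (`code`, `code_tokens`) with its documentation (`docstring`, `docstring_tokens`) plus metadata such as `repo`, `path`, `func_name`, `url`, and `language`. A minimal sketch of streaming records from a compressed shard (`load_examples` is an illustrative helper of my own, not part of the official repository):

```python
import gzip
import json

def load_examples(path, language=None, max_rows=None):
    """Stream CodeSearchNet-style records from a .jsonl.gz shard,
    optionally filtering by language and capping the row count."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for i, line in enumerate(fh):
            if max_rows is not None and i >= max_rows:
                break
            row = json.loads(line)  # one function/doc pair per line
            if language is None or row.get("language") == language:
                yield row
```

Streaming line by line keeps memory flat, which matters at the scale of millions of functions.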
Clone the official GitHub repository github.com/github/CodeSearchNet
Ensure Python 3.x and TensorFlow (which the official baselines are built on) are installed in a virtual environment
Download the raw dataset from the public S3 bucket using the provided setup scripts
Install dependencies using 'pip install -r requirements.txt'
Configure Weights & Biases (WandB) for experiment tracking and logging
Run the data preprocessing scripts to convert raw S3 files into JSONL format
Initialize the baseline model (e.g., Transformer or NeuralBagOfWords) using the 'train.py' script
Monitor training convergence via the WandB dashboard
Execute the evaluation script to calculate the Mean Reciprocal Rank (MRR) on the test set
Export trained embeddings or models for integration into production search systems
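The MRR evaluation in the steps above can be sketched as follows for in-batch scoring, where each query's correct snippet competes against the rest of the batch as distractors. This is an illustrative implementation, and `mrr_from_scores` is a name of my own rather than a function from the repository:

```python
import numpy as np

def mrr_from_scores(scores):
    """Mean Reciprocal Rank for in-batch evaluation.

    'scores' is an (n_queries x n_candidates) similarity matrix where
    the correct candidate for query i is assumed to sit at column i.
    """
    correct = np.diag(scores)
    # 1-based rank = 1 + number of distractors scored strictly higher
    ranks = 1 + (scores > correct[:, None]).sum(axis=1)
    return float(np.mean(1.0 / ranks))

# Toy batch of 3 queries; query 1's true match is beaten by one distractor
scores = np.array([[0.9, 0.1, 0.2],
                   [0.3, 0.8, 0.9],
                   [0.1, 0.2, 0.3]])
print(mrr_from_scores(scores))  # (1/1 + 1/2 + 1/1) / 3 ≈ 0.833
```

An MRR of 1.0 means every query ranked its true snippet first; values drop quickly as correct matches slip down the ranking, which is why small MRR gains are meaningful.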
"Highly regarded in the AI research community as the gold standard for code-search benchmarking, though some users note it requires significant GPU resources for full training."