Getting more from each token: How Copilot improves context handling and model routing

Wed, Jun 17 · 7:41 PM UTC · 2 min read

Compiled by KHAO Editorial — aggregated from 1 source. See llms.txt for citation guidance.


Figure 1: Three HyDRA operating points illustrate tunability: (Peak) exceeds Sonnet at 12.9% savings; (Agg.) balances quality for 72.5% savings.

As Copilot takes on more agentic work, from planning and editing to debugging, reviewing, and calling tools across longer sessions, efficiency means more than using fewer tokens.

Key facts

The team trained the routing model on conversations across 16 language families, including CJK, European, and others
Auto with task intent is already live in Visual Studio Code, github.com, and mobile
Two improvements in GitHub Copilot for VS Code are doing most of the work here
A session may need access to MCP tools, terminal commands, file operations, workspace search, and product-specific actions

Summary

Increasing efficiency starts with reducing what Copilot has to repeat from turn to turn, including context, tool definitions, and cached state. The team is working on two fronts: improving the Copilot harness so that more of each session goes toward the task itself, and expanding Auto so that Copilot can pick the model that fits the work without asking developers to make that choice every time. In longer GitHub Copilot sessions in VS Code, the harness prepares a lot of recurring information for the model: instructions, repository context, conversation history, available tools, and the current state of the task.
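To make the cost of repeated context concrete, here is a minimal sketch of how much of one turn's prompt is an unchanged prefix of the previous turn. It is not taken from the article and does not describe Copilot's harness: the `Segment` type, the segment names, the sample tool names, and the whitespace-based token count are all hypothetical, and the sketch assumes that only a byte-identical leading run of segments can be reused from cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One block of prompt text (hypothetical structure, for illustration only)."""
    name: str  # e.g. "instructions", "tools", "history"
    text: str


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def reusable_prefix_tokens(previous: list[Segment], current: list[Segment]) -> int:
    """Count tokens in the leading segments that are identical across two turns."""
    total = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # the first changed segment ends the reusable prefix
        total += rough_tokens(new.text)
    return total


def reuse_ratio(previous: list[Segment], current: list[Segment]) -> float:
    """Fraction of the current prompt that is an unchanged prefix."""
    whole = sum(rough_tokens(s.text) for s in current)
    return reusable_prefix_tokens(previous, current) / whole if whole else 0.0


instructions = Segment("instructions", "You are a coding assistant")
tools = Segment("tools", "read_file write_file run_terminal search_workspace")

turn_1 = [instructions, tools, Segment("history", "user: fix the failing test")]

# Stable layout: instructions and tools are unchanged, only history grows.
turn_2 = [
    instructions,
    tools,
    Segment("history", "user: fix the failing test assistant: patched parser"),
]

# Unstable layout: the tool list was reordered, so the prefix ends earlier.
reordered_tools = Segment("tools", "search_workspace read_file write_file run_terminal")
turn_2_reordered = [instructions, reordered_tools, turn_2[2]]

if __name__ == "__main__":
    print(reusable_prefix_tokens(turn_1, turn_2))
    print(reusable_prefix_tokens(turn_1, turn_2_reordered))
    print(round(reuse_ratio(turn_1, turn_2), 2))
```

Under these assumptions, keeping instructions and tool definitions stable and ordered ahead of the growing conversation history lets a larger share of each turn count as repeated prefix, while reordering the tool list shortens that prefix even though the content is the same.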

Read full article at GitHub Blog →
