Building a Low-Cost AI Coding Assistant with Continue.dev, DeepSeek, and Ollama

Lanhui Chen

September 17

Building a cost-efficient AI coding assistant doesn’t have to mean paying for expensive subscriptions. This article explores how developers can combine Continue.dev with DeepSeek’s affordable cloud API and Ollama’s local CodeLlama models to create a lightweight, hybrid coding assistant. The setup balances cloud power for complex reasoning with local models for fast autocompletion, all while keeping costs under control. Along the way, it highlights practical lessons—like avoiding config pitfalls, monitoring token usage, and leveraging discount pricing—that make AI-driven coding both effective and budget-friendly.

Like many developers, I have been using ChatGPT as a coding assistant. It is powerful, versatile, and often produces good results. However, there is one recurring problem: whenever I need to work on real code, I have to switch away from my development environment, copy snippets into ChatGPT, paste back its suggestions, and repeat the process. This creates friction and sometimes breaks my focus.

On top of that, ChatGPT can only work with what I give it in the prompt. If I forget to include important details from the codebase, it often produces an incomplete or inaccurate solution. Tools that integrate directly into the codebase, such as some commercial AI coding assistants, can solve this problem, but they usually come with two trade-offs: higher subscription costs and a stronger dependency on their ecosystem. For my workflow, I wanted something more lightweight, more under my control, and more cost-efficient.

The idea was simple: try Continue.dev as the interface, connect it to a cloud model for heavy reasoning tasks, and use a local model for autocomplete and smaller edits. The goal was not to match the speed and quality of the top-tier paid tools, but to create a “low-spec” version that is good enough for daily development without a heavy financial commitment. Since I would not be relying on it to generate entire projects from scratch, performance could drop a little as long as it stayed usable.

The Setup

I chose DeepSeek as the cloud model because of its affordable token pricing and large context window. For the local models, I used Ollama running two CodeLlama 13B variants: one tuned for “instruct”-style tasks and another for code completion.

In Continue.dev’s config.yaml, each model was assigned specific roles:

  • DeepSeek: Used for chat, complex edits, and applying larger changes.
  • CodeLlama 13B Instruct: Local backup for chat and edits when I wanted to avoid cloud usage.
  • CodeLlama 13B Code: Autocomplete within the editor.

I also added a few rules to keep the AI’s answers relevant to my tech stack, but I avoided overly long prompt instructions to reduce token usage. For example, instead of dumping an entire “style guide” into every request, I kept it short and targeted. Below is a sample config that works once a valid API key is supplied.

name: Local + Cloud Minimal
version: 0.0.1
schema: v1

models:
  - name: DeepSeek Coder (Cloud)
    provider: deepseek
    model: deepseek-coder
    apiKey: ${{ secrets.DEEPSEEK_API_KEY }}
    roles: [chat, edit, apply]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

  - name: Ollama CodeLlama 13B Instruct (Local)
    provider: ollama
    model: codellama:13b-instruct-q4_K_M
    roles: [chat, edit, apply, autocomplete]
    defaultCompletionOptions:
      temperature: 0.2

  - name: Ollama CodeLlama 13B Code (Local Autocomplete)
    provider: ollama
    model: codellama:13b-code-q4_K_M
    roles: [autocomplete]
    defaultCompletionOptions:
      temperature: 0.2

rules:
  - Keep responses concise and focused on actionable code.
  - Prefer PHP and CodeIgniter conventions when relevant.

context:
  - provider: file
  - provider: code

Challenges Along the Way

The first issue was configuration complexity. Continue.dev is flexible, but that flexibility means you need to set up each model, role, and parameter manually. For someone coming from a plug-and-play tool, this can feel overwhelming, and an outdated version of the setup UI makes it worse: the interface looks user-friendly and appears to let you pick from a list of models and simply enter an API key, but in reality that flow did not work, and I still had to write the config file for the API connection by hand.

The second and more frustrating problem was token overconsumption. I expected to use ten or twenty thousand tokens per request on average, with a ceiling of around 2 million tokens per day. In reality, something strange happened: generating a simple Java “Hello World” program consumed more than twenty thousand tokens. That made no sense for such a small request, and it implied that a complex request could burn hundreds of thousands of tokens, which would make the setup unusable.

Even worse, a misleading example in Continue.dev’s own documentation led me to set a stop parameter that looked harmless but created a serious bug. The stop: ["\n"] setting made the model stop generating every time it reached a newline. Since code almost always contains newlines, the response would cut off after its first line, which led me to re-ask or retry multiple times and massively inflated token usage.
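
For illustration, here is a minimal sketch of what the offending fragment looked like in my config (reconstructed from memory, so treat the exact nesting as approximate):

  - name: DeepSeek Coder (Cloud)
    provider: deepseek
    model: deepseek-coder
    roles: [chat, edit, apply]
    defaultCompletionOptions:
      temperature: 0.2
      stop: ["\n"]   # copied from a documentation example; cuts code output at the first newline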

In one debugging session, I burned through more than 200,000 tokens in less than an hour just trying to fix this issue. Looking back, that was the single biggest factor behind my early high costs.

The Turning Point

After removing the problematic stop setting and adding a sensible maxTokens limit, the abnormal token drain stopped. Generating a Java “Hello World” now takes fewer than 1,000 tokens, and analyzing a small file of around three hundred lines of code costs around 4,500 tokens, most of which is input context rather than output.
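
In config.yaml terms, the fix amounted to deleting the stop list and capping output length instead, matching the sample config shown earlier:

    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048   # cap output length; no stop sequences for code generation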

This matters because, as DeepSeek’s pricing table shows, input tokens are much cheaper than output tokens, especially during discount hours. In my typical usage, about three-quarters of all tokens are input, split roughly evenly between cache hits and cache misses, and the remaining quarter are output tokens.

Cost Analysis

Let’s do some math with DeepSeek’s discount-hour pricing:

  • Cache hit input tokens: $0.035 per million
  • Cache miss input tokens: $0.135 per million
  • Output tokens: $0.55 per million

If I use 1 million tokens in a day, with the typical split of 3/8 cache hit, 3/8 cache miss, and 1/4 output, the cost works out to:

(0.035 * 0.375) + (0.135 * 0.375) + (0.55 * 0.25)
= 0.0131 + 0.0506 + 0.1375
≈ $0.20 per day

At that rate, even moderate usage costs only a few dollars per month. The numbers only start to look worse if I push usage above 5 million tokens per day, which is rare in my workflow. In extreme cases, such as a large refactor where I ask the AI to analyze huge portions of the codebase multiple times, the cost could exceed the flat subscription price of some commercial tools. But those workloads would strain the commercial tools just as much, and it is quite possible to hit their usage limits even faster, since not all cloud models are priced as low as DeepSeek.

The main advantage of buying API access directly is that usage is controllable. Some days I might spend several million tokens; on other days, when I am mostly writing code myself, I might spend only a few hundred thousand, which at the rate above works out to just a few cents. A fixed-price subscription cannot go lower on light-usage days, but a pay-per-token setup naturally scales down.

Practical Lessons Learned

  1. Avoid overly long system prompts or rules unless they are absolutely necessary. Every extra token in your instruction adds to the cost.
  2. Watch for configuration pitfalls like the stop parameter bug I encountered. A single wrong line in your config can silently multiply costs.
  3. Use discount hours if your schedule allows. The price difference is significant.
  4. Balance cloud and local models. Use the local model for quick autocompletion and small edits; save the cloud model for large context or reasoning tasks.
  5. Measure before optimizing. I only realized the cost structure after checking usage breakdowns, which showed how much was input vs output.

Final Thoughts

Continue.dev, when paired with the right combination of cloud and local models, can be a very effective and budget-friendly coding assistant. DeepSeek’s API pricing makes it especially attractive for this setup, as long as you keep an eye on token usage and avoid configuration mistakes that can inflate costs.

The hybrid approach of cloud plus local gives flexibility. The cloud model handles complex reasoning with access to a full codebase context, while the local model ensures you always have autocomplete and small edits without hitting the API. For developers who want more control over their spending, this setup can offer a great balance between capability and cost.

If your usage pattern is highly consistent and you are pushing millions of tokens every day, a fixed subscription tool might still make more sense. But if you want to keep the option to scale down, experiment with different models, and maintain independence from a single provider, building your own AI coding assistant with Continue.dev, DeepSeek, and Ollama is worth trying.
