
GitHub Issues: Angry AI Spam Edition

In the past few years across the developer ecosystem, coding agents and AI-powered autocomplete have become popular tools, but with them comes a gap between what they are capable of doing and what people expect them to do. In particular, one of the first and biggest players in this space, VS Code with Copilot, has become an interesting case study of people getting quite upset either because the capabilities are not what they wanted, or, at the extreme, because the coding agent deleted files or ran commands the user didn't want.

How did we get here?

In August 2021, OpenAI announced Codex, a model built on top of the GPT-3 series specifically for coding use cases (not to be confused with the current model and CLI tool also called Codex). Alongside this, GitHub launched Copilot, which was a step beyond standard IntelliSense: instead of just providing tab completion based on the syntax and semantics of a given language and the tooling surrounding it, you could write comments or function signatures and Copilot would fill in the function body. For those who were careful and heavily scrutinized the code it output, it could plausibly speed up their development flow.
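To make that workflow concrete, here's a minimal sketch of the comment-plus-signature pattern. The function and the suggested body below are hypothetical, just the shape of the interaction rather than actual Copilot output.

```ts
// What the developer types: a descriptive comment and a signature.
// Return the n most frequent words in a block of text.
function topWords(text: string, n: number): string[] {
  // The kind of body a completion might fill in (hypothetical suggestion):
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([word]) => word);
}
```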

That said, it was constrained by one of the biggest issues with LLMs: hallucinations. Early on, you could ask it to write a function using open source tools and Copilot would happily write a call into an open source library that just didn't exist. As well, ChatGPT didn't arrive on the scene until November 2022, over a year after GitHub Copilot was unveiled, and these models were still early enough that they couldn't, say, go browse the web; they relied only on what existed inside their training data, which wasn't always up to date.
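As a purely hypothetical illustration of that failure mode (the package and helper below are invented for this example, since that's exactly what a hallucination looks like), an early suggestion might have been something like:

```ts
// Hypothetical hallucinated completion: the "imaginary-markdown-utils" package
// and its parseMarkdownTable() helper are made up, but the call looks plausible
// enough to slip past a quick review.
import { parseMarkdownTable } from "imaginary-markdown-utils";

export function loadReport(markdown: string) {
  // In reality this falls apart the moment you try to install the dependency.
  return parseMarkdownTable(markdown, { headers: true });
}
```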

Where we are today

Jump ahead four years and we're in a much different state than back then. From AI-powered autocomplete being the main interface, the ecosystem has grown with the addition of agents, the Model Context Protocol, and tool calling, among other things. Now, if you want to, you can ask a tool like Copilot to implement a large-scale feature just via a prompt in a chat box. It can try to use your design language and the tooling you have installed, and it can reach out to the web and to APIs rather than relying on just what is inside the underlying model's training data.

With that, though, comes the crux of the issue, which shows up in two ways: either the model is too dumb for what we want it to do, or it is more reckless than we want and goes outside the boundaries we would like it to stay within.

On the first, say we have a website and want to implement a new page. At first it will likely work fine, but over the course of a conversation the context window fills up, given that a lot of the models only have a 200k token window. The model may end up throwing out your design language spec and just going with what it thinks would be best. An example of this could be using TailwindCSS in a React project, but after a few turns it decides to implement the styling with inline CSS instead. On the backend, this could look like implementing data-crunching functions that already exist in your codebase; rather than reusing them, it duplicates logic and complicates the project. Also, given these models are probabilistic, if you are using a very niche library or framework, the model is more likely to make stuff up than if you used a more popular library that was referenced a lot more in the training data.
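As a hypothetical before-and-after of that TailwindCSS example (the component is made up, and this assumes the React 17+ automatic JSX runtime so no explicit React import is needed), the drift might look like this:

```tsx
// Early in the conversation: the agent follows the project's TailwindCSS convention.
export function SaveButton() {
  return (
    <button className="rounded-lg bg-blue-600 px-4 py-2 text-white hover:bg-blue-700">
      Save
    </button>
  );
}

// Many turns later, with the style guide long gone from context, the agent
// quietly rewrites the same button with inline CSS instead of Tailwind classes.
export function SaveButtonAfterDrift() {
  return (
    <button style={{ borderRadius: "8px", backgroundColor: "#2563eb", padding: "8px 16px", color: "#fff" }}>
      Save
    </button>
  );
}
```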

Now on to the latter: say you are working with Copilot and give it access to your terminal. Without strict guardrails that the models themselves can't get around, you are potentially giving the LLM admin access to your machine, and what is stopping it from running rm -rf / and wiping your computer? As well, if you had credentials to a remote production database lying around, what is stopping it from clobbering the data there, which I hope there would be backups for? At least the tool I tend to use when letting an agent have access to a terminal, Claude Code, will prompt you any time it wants to take some action, and you can say yes or no to each. If that gets too tedious, Claude Code has a --dangerously-skip-permissions flag that will let every command go through without any prompts, but as the name says, you're then at the mercy of the model possibly running code or commands that you didn't want it to. I'll go over solutions to this particular issue later in this post.

At the end of this, you may have a bunch of angry devs who are not happy with the LLMs after trying to use them. Enter the GitHub issue tracker for VS Code. Given that VS Code is one of the biggest open source text editors of the past decade, it was bound to attract, at that scale, plenty of devs upset about these two situations. As of December 2025 when I wrote this, I could fairly quickly open the issue tracker and find that the 7th newest issue was titled "ChatGPT4.1 absolutely sucks!", complaining about the things I just mentioned, including:

Broke al code, takes no responsibility, cannot perform simplistic taks or follow patterns without lying about it. not only unhelpoful, destructive and childish.

I'm not trying to call out this one developer in particular, but you can scroll through the rest of the issue tracker and find similar issues, some with more extreme or vulgar language, and others complaining that the models are garbage.

I'm not exactly defending VS Code here, but behind the scenes, if you're using Copilot and the code generated or commands run by a model, say from Anthropic, cause issues and frustration, Copilot may deserve some of the blame, but the underlying model is part of the problem as well.

What about the other coding agents?

As part of this exploration, I was curious whether the other tooling in this space had similar issues. I browsed the GitHub repos for Claude Code, OpenAI Codex CLI, Gemini CLI, and Cursor, and it seems like a mixed bag, but nothing to the extreme of VS Code's repo. Given that normal users can't see whether a GitHub issue was deleted by the owners of a repo, I can't verify if these projects just have good moderation teams, or if their smaller userbases simply keep things from getting as extreme as they are for VS Code.

Going through each:

What should we do?

For the first issue, I feel a "solution" is just a better understanding of how these language models function: understanding the concepts of turns, context windows, and the areas where the models are still weak. Also, using the tools in domains you are comfortable in is a good starting place, as you can pick out the nuances and subtle bugs in the output of these models if you have worked in that domain previously. I'm not saying you can't use a tool like Copilot to learn about domains you aren't as knowledgeable in, but you are more likely to run into bugs and errors that you may not know how to immediately resolve.

For the second issue, when giving a coding agent access to your terminal and the ability to write and modify code and files on your system, you should look into some type of sandboxing or container tooling to prevent your system from being wrecked. VS Code supports Dev Containers, which let you take your current project and wrap it in a Docker container with your project folder volume-mounted in. With this, you can run whatever tools you wish, and if things go to the extreme of something like rm -rf /, the only thing that would be "deleted" is your workspace, which is likely backed up via version control on GitHub or another remote git platform. Claude Code, for example, has a .devcontainer folder in its GitHub repo where it gives an example of this, and it also includes a firewall script that sets up the Linux iptables tool to block any traffic outside of domains you trust. As well, if you use Docker Desktop, it offers a Docker Sandboxes feature that implements a similar setup to dev containers, encapsulating your project in a container.

In a more general sense, if you are having coding agents connect to databases or APIs you manage, try to use development environments rather than production systems to prevent destructive actions.
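As a rough sketch of what that looks like in practice (the image, name, and extension list here are placeholder assumptions, not the exact setup from Claude Code's repo), a minimal devcontainer.json might be:

```jsonc
// .devcontainer/devcontainer.json — minimal sketch, values are placeholders.
// The project folder is volume-mounted into the container, so a runaway
// rm -rf / trashes the container's filesystem, not the host machine.
{
  "name": "agent-sandbox",
  // A prebuilt dev container image; swap in whatever matches your stack.
  "image": "mcr.microsoft.com/devcontainers/typescript-node:20",
  "postCreateCommand": "npm install",
  "customizations": {
    "vscode": {
      "extensions": ["GitHub.copilot"]
    }
  }
}
```

Claude Code's own .devcontainer layers the iptables firewall script on top of a setup like this, which is worth borrowing if the agent will also have network access.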

It is likely that over time these sandboxing techniques will become easier to implement, or even get built into these applications automatically, but only time will tell. Until then, you just have to keep the capabilities of these tools in mind and work in an environment secure enough that, if things go wrong, you can easily recover.