Reflections on ChatGPT and LLMs for programming

ChatGPT is a chatbot built on a family of Large Language Models (LLMs) collectively known as GPT-3. GPT-3 (Generative Pre-trained Transformer 3) is a language model released by OpenAI that uses deep learning to produce human-like text. Language models work by predicting text: given an input prompt, the model generates the text that is most likely to continue it. In ChatGPT, human feedback is then used as a reinforcement learning signal to fine-tune the responses, which means that feedback flows back into training and improves the tool's responses for other users as well.
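To make the "predict the next piece of text" idea concrete, here is a minimal sketch using the open-source Hugging Face transformers library, with the small GPT-2 model standing in for GPT-3; the model choice and generation settings are purely illustrative, not what ChatGPT actually runs.

```python
# Minimal sketch of next-token prediction with a small open model (GPT-2).
# Assumes the `transformers` library (and its PyTorch dependency) is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts the most likely next token and appends it,
# producing a plausible continuation of the prompt.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Run on a prompt like `def fibonacci(n):`, the model simply continues the text with whatever is statistically most likely; everything else, including code generation, is built on that mechanism.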
ChatGPT is being widely used by developers to generate code. It is capable of writing code from scratch as well as working with pre-written code, and boilerplate code in particular can be generated easily, which significantly boosts developer productivity. Tools like Codex and GitHub Copilot are good examples of LLM-based tools optimized to work with code bases: they generate boilerplate directly in the IDE, without copy-pasting from the ChatGPT web interface. As these models are refined over time, they can be expected to get better at finding bugs, enforcing best practices, generating documentation for existing code, or even rewriting inefficient codebases. This shouldn't come as a surprise, as they have been trained on massive amounts of data from the open internet, including books, articles and websites; software engineering is an almost entirely digital field whose information is readily available online in easily parsable text, from which relevant program code can be generated.
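For illustration, the same kind of boilerplate generation can also be driven programmatically rather than through the chat interface. The sketch below assumes the openai Python package and an OPENAI_API_KEY environment variable; the model name is an assumption and should be replaced with whatever model is available.

```python
# Hedged sketch: asking an LLM API for boilerplate instead of copy-pasting
# from a chat window. Requires the `openai` package and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute an available one
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {
            "role": "user",
            "content": "Write a Python dataclass for a User with id, name "
                       "and email, plus a from_dict classmethod.",
        },
    ],
)

# The generated boilerplate comes back as plain text to review before use.
print(response.choices[0].message.content)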
However, ChatGPT is not here to replace developers. It was created to predict text, based on what typically appears in the gigabytes of text it was trained on, and to maximize human approval of the text it generates. But how good are humans at telling good responses from bad ones? If you ask a factual question, or ask for code, and it gives you an answer you cannot verify, you are in no position to evaluate it. And if humans cannot reliably spot the mistakes and give feedback on false answers, the model never learns that it was wrong. In effect, it can earn human approval by deceiving the person. Clever! Systems trained this way are only as good as our ability to distinguish good answers from bad ones, and not every human knows the answer: what appears to be good may not actually be good. It's easy to say that the model doesn't know the correct answers, but you could just as well say it is betting that you don't know them either. False answers are therefore always possible, and humans must not take its output as reliable, but first need to question its correctness.
The agent is trained to maximize positive human responses, not necessarily correct answers. So what happens when these systems get more and more powerful? Answer: they perform better and better. But better at what? Answer: better at getting human approval for their answers. Is that what we actually want?
The output produced by LLMs is quite fascinating, but the fact is that a machine learning model does not understand programming in any sense; it merely captures the statistical correlations between code fragments. There's also a good example from GitHub Copilot, or Codex, the code generation model behind it. Suppose the code snippet you give it already has some bugs in it, say a security vulnerability you've introduced. A medium-sized model will give you a decent code completion, but a bigger model will spot the vulnerability and deliberately introduce its own new security vulnerability, because it is better at predicting what comes next: it is trying to generate code that fits in with the surrounding code. So a larger model writes buggier code than a smaller model precisely because it has gotten better at predicting what it should put there. It was never trying to write good code in the first place, only to predict what comes next. This is called the "misalignment problem", and it remains a fundamental problem to be solved.
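A hypothetical illustration of that failure mode (not an actual Copilot or Codex transcript): if the surrounding code already builds SQL queries by string concatenation, a completion that merely "fits in" will tend to repeat the injection-prone pattern rather than switch to a safe, parameterized query.

```python
# Illustrative sketch only: how insecure context invites an insecure completion.
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # Pre-existing (vulnerable) context handed to a code-completion model:
    # SQL built by string concatenation allows SQL injection.
    query = "SELECT * FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchone()

def delete_user(conn: sqlite3.Connection, username: str):
    # A completion that "fits in" with the code above would likely repeat
    # the injection pattern:
    #   query = "DELETE FROM users WHERE name = '" + username + "'"
    # whereas good code uses a parameterized query:
    query = "DELETE FROM users WHERE name = ?"
    conn.execute(query, (username,))
    conn.commit()
```

The model is not choosing between these options on the basis of safety; it is choosing whichever continuation is most consistent with the text it has already seen.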
This is likely to become more dangerous and harder to eliminate as model capabilities increase. A highly capable but sufficiently misaligned model trained on user approval might produce obfuscated code that looks good to the user even on careful inspection, but in fact does something undesirable or even harmful.
As mentioned earlier, human feedback is used to further train these tools. In professional software development, this means that whatever you share with these LLMs may be used to train them further, including proprietary code bases or even company secrets. This can be detrimental to companies, which would effectively be handing sensitive information to publicly available AI tools. Professional software developers should therefore refrain from giving these tools access to proprietary source code.
Overall, the reaction of the developer community shows that LLMs are very useful tools with a potentially huge impact on the future of software development. At the same time, given the hype, it is important to understand their unwanted implications. In their current state they are definitely not replacing what programmers do; rather, they may reduce the cost of producing software by increasing programmer productivity. Mistaking these tools for a programmer can, however, lead to over-reliance, where a programmer blindly approves code generated by the model without inspecting it; given the mistakes such models make, overlooking this threat can introduce security risks. There is also a data protection risk, since the inputs to these tools are used to further improve them, which is why they should be used carefully in professional software development. Used carefully, LLMs could be a game changer for the software industry; it is equally crucial to be aware of the limitations attached.