Unveiling the power of GPT-4

It's been almost a year and a half since OpenAI's first release of ChatGPT, and its effects across the tech industry and beyond have been undeniable. Within a year, the app became a productivity staple, sparked an AI boom, and grew into one of the fastest-growing consumer apps of all time.
[Image: AI chatbots in action]

However, long-time followers of AI development will know that ChatGPT started out as just another chatbot built on a Large Language Model (LLM); chatbots themselves have been under development since as early as the 1960s. So how did ChatGPT stand out so much from the rest, and what advantages did it have over its predecessors? In this article, we will look under the hood of GPT-4, the deep learning model behind the latest version of ChatGPT (and a plethora of other GPT-based chatbots), and try to understand just how OpenAI revolutionized the field of AI chatbots.

To understand the success of GPT-4, we should first examine the models that preceded it. Prior to 2017, the leading deep learning models for Natural Language Processing (NLP) were Recurrent Neural Networks (RNNs). RNNs are a type of neural network designed to handle sequential or temporal data. They differ from regular feed-forward networks in that they maintain a hidden state that summarizes previous inputs, capturing dependencies between elements in a sequence. In NLP, this can be used to predict the next word in a sentence based on the words the model has already seen. However, RNNs had significant limitations on longer sequences: during training, the gradients flowing back through many time steps shrink exponentially, so the influence of earlier elements effectively "disappears" (the "vanishing gradient" problem).
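To make the hidden-state idea concrete, here is a minimal sketch of an RNN cell in NumPy. The weights are random (this is an illustration of the mechanics, not a trained model): at each time step, the new hidden state mixes the current input with a transformation of the previous hidden state, so information from earlier steps must survive repeated matrix multiplications to influence later ones.

```python
import numpy as np

# Illustrative RNN cell with random (untrained) weights.
# Update rule: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def rnn_forward(inputs):
    """Run a sequence of input vectors through the cell; return all hidden states."""
    h = np.zeros(hidden_dim)
    states = []
    for x in inputs:
        # The only memory of past inputs is h, squeezed through tanh each step.
        h = np.tanh(W_xh @ x + W_hh @ h + b)
        states.append(h)
    return np.stack(states)

sequence = rng.normal(size=(5, input_dim))  # 5 time steps
states = rnn_forward(sequence)
print(states.shape)  # (5, 8): one hidden state per time step
```

Because each step reuses `W_hh`, the gradient of a late output with respect to an early input involves a long product of Jacobians, which is precisely where the vanishing gradient problem arises.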

In 2017, a team at Google released a paper titled "Attention Is All You Need" (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin). This paper introduced the "Transformer" model, which dramatically improved upon RNNs with one simple idea: the self-attention mechanism, which lets the model weigh every position in the input sequence against every other position, regardless of how far apart they are. With this addition, the new models could interpret longer and more grammatically complex sentences with significantly higher accuracy. Most LLMs you hear about today are Transformer-based, including, of course, GPT (which stands for Generative Pre-trained Transformer). Other examples include Google's Gemini, Meta's Llama, xAI's Grok, and Anthropic's Claude.
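The core of that idea fits in a few lines. Below is a sketch of (single-head) scaled dot-product self-attention with random weights; note that the scores matrix compares every token with every other token directly, with no notion of distance, which is why long-range dependencies are no harder than short-range ones.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_len, seq_len): every token vs. every token
    # Softmax over each row turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                # each output is a weighted mix of ALL positions

rng = np.random.default_rng(1)
seq_len, d_model = 6, 16
X = rng.normal(size=(seq_len, d_model))            # toy token embeddings
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (6, 16): one updated vector per token
```

Real Transformers run many such heads in parallel and stack the results through dozens of layers, but the all-pairs comparison above is the mechanism that replaced the RNN's step-by-step hidden state.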

Another factor that sets GPT apart from its predecessors is its sheer scale. OpenAI has not fully disclosed the details, but estimates place GPT-4 at around 1.8 trillion parameters across 120 layers, roughly ten times the 175 billion parameters of GPT-3. By comparison, Google's BERT, developed around the same time as the first GPT models, had only 110 million parameters. According to OpenAI, GPT-4's training data was drawn from publicly available and third-party sources, likely including CommonCrawl, RefinedWeb, Twitter, and Reddit. Up to this point, the evidence seems to suggest that bigger really is better, which is one reason AI developers have been racing to build ever larger models. However, DeepMind's RETRO (Retrieval-Enhanced Transformer) has matched or outperformed models many times its size on several benchmarks, showing that smaller, more carefully trained models have potential as well.
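A quick back-of-the-envelope calculation gives a feel for these numbers. Using the (unconfirmed) 1.8-trillion-parameter estimate and assuming 2-byte (fp16) weights, just storing GPT-4's parameters would take terabytes:

```python
# Back-of-the-envelope scale comparison (parameter counts are public
# estimates; the GPT-4 figure is not confirmed by OpenAI).
params_gpt4_est = 1.8e12
params_gpt3 = 175e9
params_bert_base = 110e6

bytes_per_param = 2  # assuming fp16 weights
print(f"GPT-4 est. weight storage: {params_gpt4_est * bytes_per_param / 1e12:.1f} TB")
print(f"GPT-4 / GPT-3 parameter ratio: {params_gpt4_est / params_gpt3:.0f}x")
print(f"GPT-4 / BERT-base parameter ratio: {params_gpt4_est / params_bert_base:,.0f}x")
```

Note that parameter count measures the size of the model itself, not the size of its training set; the two are related (larger models can absorb more data) but distinct.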

So far, we’ve seen how Chat-GPT has outperformed other models, but we can also see significant improvement with each new version of GPT as well. Aside from improved accuracy and performance, GPT-4 has several new capabilities, including interpreting image inputs, improved steerability, and better safety features. Steerability refers to the chatbot’s ability to take on different styles or personalities when prompted. However, like its previous versions, GPT-4 still suffers from “hallucinations,” sometimes providing false facts or using poor reasoning. OpenAI has hinted towards future versions of GPT, which will hopefully address these issues.

Over the past year and a half, OpenAI's ChatGPT has rapidly ascended to become a productivity staple and a catalyst for transformative growth within the tech industry and beyond. Its success, epitomized by the latest iteration, GPT-4, can be attributed to the revolutionary Transformer architecture introduced in 2017, which overcame the limitations of traditional RNNs with its self-attention mechanism. Boasting an unprecedented scale of training data and parameters, GPT-4's prowess underscores the pivotal role of data and model size in achieving breakthrough performance. As we continue to witness the remarkable strides made by AI, we should be inspired by its capacity to revolutionize industries, enhance efficiency, and enrich human experiences.