Cathy Huang

What is ChatGPT and how is it so good? (Large language models)

On November 30, 2022, OpenAI, led by Stanford dropout Sam Altman, launched a chatbot that reshaped artificial intelligence in the 21st century. ChatGPT spread like wildfire across the world; by January 2023, it had become the fastest-growing consumer software application in history. Today, ChatGPT is known for the many controversies surrounding both the software itself and how it is used. Still, we should be keen to recognize and appreciate the magic behind this large language model.


What are large language models?


ChatGPT, which stands for Chat Generative Pre-trained Transformer, is a chatbot built on a large language model. Large language models (LLMs) are a type of generative AI that uses a transformer network. An LLM is trained to understand and generate text in a human-like fashion.


What are transformers?


A transformer model is a neural network that tracks relationships in sequential data, such as the words in a sentence. It is made up of multiple transformer blocks, also known as layers, which work together to decipher the input and predict the best output at inference. Google researchers first introduced the transformer in the 2017 paper “Attention Is All You Need.”
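To make “tracks relationships in sequential data” concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside each transformer block, written in Python with NumPy. The names and tiny sizes are illustrative assumptions, not ChatGPT’s actual implementation.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each row of Q, K, and V corresponds to one token in the input sequence.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V  # each token's output is a weighted mix of every token's values

# Three tokens, each represented by a 4-dimensional vector (random numbers for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x))

Stacking many layers of this operation, along with feed-forward networks, is what lets a transformer weigh every word in a sentence against every other word.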


So how does this all work?


An LLM uses the transformer model to receive an input, encode it, and then decode it to produce an output prediction. Encoding is the process of transforming qualitative data, such as text, into numerical values the model can work with. Before an LLM can receive input and generate output, it must be trained to perform specific tasks.
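As a rough illustration of encoding, the toy Python below maps each word of a sentence to an integer ID from a small vocabulary. Real LLMs use learned subword tokenizers (for example, byte-pair encoding), so this word-level scheme is only a simplified stand-in.

# Build a vocabulary from a tiny corpus and encode new text as integer IDs.
def build_vocab(corpus):
    words = sorted({w for sentence in corpus for w in sentence.lower().split()})
    return {w: i for i, w in enumerate(words)}

def encode(sentence, vocab):
    # Map each word to its ID; unknown words fall back to a reserved ID.
    return [vocab.get(w, len(vocab)) for w in sentence.lower().split()]

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = build_vocab(corpus)
print(encode("the cat sat on the rug", vocab))  # [6, 0, 5, 3, 6, 4]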


Training an LLM involves various techniques taken from machine learning (a branch of AI that allows computers to imitate how humans learn). LLMs are pre-trained on textual datasets gathered from sites across the internet (e.g., Wikipedia, GitHub). These datasets consist of trillions of words, sentences, and phrases, and the quality of that writing affects the model’s performance.
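The sketch below shows, under simplified assumptions, how raw text can be sliced into pre-training examples: every position in the text becomes a (context, next word) pair that the model learns to complete. The sample sentence and context size are made up for illustration.

def make_training_pairs(text, context_size=4):
    tokens = text.split()
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]  # the words the model sees
        target = tokens[i]                    # the word it must learn to predict
        pairs.append((context, target))
    return pairs

sample = "large language models are trained to predict the next word in a sentence"
for context, target in make_training_pairs(sample)[:3]:
    print(context, "->", target)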


The model engages in unsupervised learning: it processes the datasets fed to it without labeled answers or explicit instructions. During this period, the LLM learns the semantic meanings of words, the relationships between them, and how to distinguish a word’s senses based on context. For example, the model learns that “right” can be the opposite of “left” or a synonym for “correct,” depending on the context.
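One way to build intuition for how meaning can be picked up from context alone is the small Python sketch below: each word is represented by counts of the words that appear near it, and words used in similar contexts end up with similar vectors. The tiny corpus and window size are illustrative assumptions, not how an LLM actually represents words.

from collections import Counter, defaultdict
import math

corpus = [
    "turn left at the corner",
    "turn right at the corner",
    "that answer is right",
    "that answer is correct",
]

# Count which words appear within two positions of each word.
window = 2
contexts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                contexts[w][words[j]] += 1

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "right" shares context with "left" (directions) and with "correct" (answers).
print(cosine(contexts["right"], contexts["left"]))
print(cosine(contexts["right"], contexts["correct"]))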


In unsupervised learning, the model also learns to predict the next word of a text. By analyzing the language in its datasets, the model can discover rules of human language, because language is highly redundant. The model does not explicitly store these rules; it acquires them implicitly through examples. In this way, an LLM can learn grammar, that is, how words are used in language, by finding patterns in how certain words appear. For example, a model will find that the words “a” and “an” almost always precede a noun or an adjective modifying a noun, so it can use these indefinite articles accurately when generating text.
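As a toy illustration of next-word prediction, the sketch below counts which word follows which in a tiny corpus and predicts the most frequent continuation. A real LLM learns such statistics implicitly in its network weights rather than in an explicit count table.

from collections import Counter, defaultdict

corpus = "she saw a cat and an owl and a dog and an apple".split()

# Count, for each word, which words were observed to follow it.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the most frequently observed continuation of `word`.
    return following[word].most_common(1)[0][0]

print(predict_next("an"))   # a vowel-initial word seen after "an", e.g. "owl"
print(predict_next("and"))  # "an", the most common continuation in this corpus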


Conclusion


A large language model is built on the transformer model, which captures the relationships between words in order to generate text. By analyzing trillions of words from online datasets, the model identifies patterns of redundancy in human language, and it applies the rules it has learned to produce text with accurate syntax and logical flow.



