An Approachable Primer on the Layers of AI and LLMs
Unraveling the Art and Science of Large Language Models: A Deep Dive into Prompt Engineering and Human-AI Collaboration
Midjourney prompt: intelligent machines in the style of Salvador Dali --ar 3:2
Generative AI and Large Language Models (LLMs) are stealing much of the oxygen in venture firm offices, founder forums, and, worst of all, tech Twitter.
While the recent excitement around generative artificial intelligence and LLMs seems to have arisen out of thin air, neither technology is new to computer science. So what's with the renewed interest?
Before we can discuss the “why,” we should first understand the underpinnings of the “what.”
This essay seeks to provide an accessible primer on the fundamentals of AI. We will discuss the building blocks that give rise to these chatbots, their functionalities, and how prompt engineering—an emerging field—profoundly influences their performance.
The remaining sections offer insights into Artificial Intelligence, Machine Learning, Deep Learning, and the facets within Deep Learning that give rise to LLMs.
Artificial intelligence, recently the most talked-about subfield of computer science, concentrates on developing "intelligent machines" - machines designed to undertake tasks that traditionally require human intelligence.
At its core, AI is about creating mathematical models that learn from data (inputs) which then generate predictions or decisions (outputs) based on that data. In the same way that a skilled storyteller weaves a tale from simple words, these models navigate through a complex web of data. They interpret patterns, make sense of the information, and guide us toward insightful predictions or helpful actions.
Until recently, many technologists held less concern over the potential for artificial intelligence to upend human roles. Their guarded optimism stemmed from the dominance of Artificial Narrow Intelligence (ANI): specialized AI systems adept at singular tasks but lacking broader cognition. For example, while a car may have self-driving capabilities, that “intelligence” doesn’t extend to other domains like writing essays or financial analysis. Yet, beyond these task-specific models, the notion of Artificial General Intelligence (AGI) looms. AGI is viewed as the holy grail of AI: a system capable of understanding, learning, and applying knowledge across various tasks at or beyond human competence.
Moving down the artificial intelligence hierarchy, we encounter machine learning. This branch focuses on crafting statistical models tailored to specific problems. Data scientists are at the heart of this process; they create, update, and maintain these models. The work is multifaceted, beginning with refining the datasets destined for training. These datasets form the backbone of the process, serving as inputs to the chosen models and algorithms that learn patterns from the data. In other words, data scientists pair the right data with the right questions to help make data-driven decisions.
To better appreciate how impactful machine learning is in our day-to-day lives, it's useful to highlight some of the modeling types used in the field. Understanding these different models is crucial for grasping the breadth of machine learning applications, so a brief code sketch follows the examples below:
Classification models categorize data into different labels to predict specific outcomes. An everyday example is your email inbox, which uses classification models to automatically sort incoming emails into "spam" and "non-spam."
Regression models identify relationships between variables and predict outcomes based on continuous values. A practical application can be found in real estate platforms like Zillow, which use regression to predict a home's market value based on features like square footage, number of bedrooms, year built, and location.
Clustering models find natural patterns in data to group similar items together. These models underpin many of the recommendation engines media companies leverage. For example, clustering is employed by music streaming services like Spotify, which uses machine learning to analyze your listening history and suggest songs that align with your music taste.
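To make these three categories concrete, here is a minimal sketch using the open-source scikit-learn library. All of the data below (emails, square footages, listening counts) is invented for illustration; real systems need far more data and validation.

```python
# Toy examples of the three model types with scikit-learn.
# All data below is invented for illustration.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Classification: label emails as spam (1) or not spam (0).
emails = ["win a free prize now", "meeting at noon tomorrow",
          "claim your free reward", "quarterly report attached"]
labels = [1, 0, 1, 0]
vectorizer = CountVectorizer()
clf = MultinomialNB().fit(vectorizer.fit_transform(emails), labels)
print(clf.predict(vectorizer.transform(["free prize inside"])))  # likely [1]

# Regression: predict a home's value from square footage alone.
sqft = np.array([[900], [1500], [2200], [3000]])
price = np.array([150_000, 240_000, 350_000, 470_000])
reg = LinearRegression().fit(sqft, price)
print(reg.predict([[1800]]))  # a continuous dollar estimate

# Clustering: group listeners by (rock plays, jazz plays) per week.
listens = np.array([[40, 2], [38, 3], [5, 45], [4, 50]])
print(KMeans(n_clusters=2, n_init=10).fit_predict(listens))  # two taste groups
```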
Next, we reach deep learning, which underpins natural language processing and generative AI. Generative models' primary function is to create new data that closely resembles the data they were trained on. LLMs reside within this sphere.
The development process for these models mirrors that of machine learning, but with an added degree of complexity. Data scientists engage in rigorous data preparation, identifying, compiling, and filtering suitable datasets for generative models to learn from. The aim, however, isn't just to make predictions or categorizations but to produce novel content indistinguishable from the training data.
LLMs are trained on expansive volumes of textual data, which equips them to produce text that is strikingly similar to human-generated content in response to prompts - the input text or phrases that guide the model to generate the desired output. The primary strength of these models lies in their ability to comprehend context, decipher linguistic nuances, and deliver detailed, pertinent outputs.
Highlighting some typical generative AI models underscores the reach and potential of this field:
Text generation models produce human-like text. They've found applications in diverse areas, such as drafting emails, penning articles, and powering chatbots (a short sketch of calling one appears after this list).
Image generation models, like Generative Adversarial Networks or Diffusion Models, can create unique, realistic images. They're being used in areas like art creation, video game design, and even in medical simulations for training purposes.
Music generation models can compose new music pieces, mimicking the style of the music they've been trained on. Services like OpenAI's MuseNet exemplify this application, creating unique musical pieces spanning various styles and periods.
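For a concrete taste of the text-generation category, here is a minimal sketch using the open-source Hugging Face transformers library. The model choice (GPT-2, a small freely available model) and the prompt are illustrative only.

```python
# A minimal text-generation sketch with Hugging Face `transformers`.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Dear team, this quarter we", max_new_tokens=40)
print(result[0]["generated_text"])  # the prompt plus the model's continuation
```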
A special mention must be made of the tech Twitter community, which has embraced LLMs as essential tools for streamlining day-to-day workflows.
The following subsections analyze the components that make LLMs possible.
The Monomers of Large Language Models
An LLM is like a seasoned author who has devoured an enormous library of books, roughly 10 million books' worth of data. This “digital author” utilizes its accumulated knowledge to generate new narratives or provide insights about the texts it has digested, much like a human author would.
Generative AI models consist of a few building blocks:
Neural Networks: Imagine a model's neural network as a labyrinth of intertwined plot lines, similar to the web of thoughts and ideas in an author's mind. Each node, like an idea, receives and processes inputs from other nodes and then propagates an output to the next node. It's akin to one idea sparking another in an author's mind, forming a chain of thoughts.
Diagram, Neural Network
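For the hands-on reader, here is a toy forward pass in NumPy. The layer sizes and numbers are arbitrary, meant only to show inputs flowing node to node.

```python
# A toy forward pass: three inputs flow through a hidden layer of
# four nodes to a single output node. All numbers are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, -0.2, 0.1])        # the input "ideas"
W1 = rng.normal(size=(3, 4)) * 0.5    # connections into the hidden layer
W2 = rng.normal(size=(4, 1)) * 0.5    # connections into the output node

hidden = np.maximum(0, x @ W1)        # each node fires (ReLU) on its inputs
output = hidden @ W2                  # one idea sparks the next
print(output)                         # the network's "conclusion"
```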
Weights: In a network of ideas, akin to our neural network, weights are like the emphasis we place on different thoughts. Similar to how an author might focus more on certain plot points, weights regulate how much one idea influences the next. So, much like how an important event in a novel might heavily impact the storyline, weights determine how significantly one node affects the overall “story” the network creates.
Diagram, weights
Deep Learning: Think of deep learning as a multilayered story, where each layer represents a chapter in the narrative. Much like how you would unravel a story chapter by chapter, the layers of nodes in a neural network process information in a layered manner to reach a conclusion. Yet, these aren't ordinary chapters. They're dynamic, learning and adapting from the information processed through each layer, refining themselves to better predict or generate - a distinctive feature of deep learning. It's this progressive discovery and refinement of ideas that place deep learning a notch above traditional neural networks.
Diagram, deep learning
Transformer: Consider the Transformer model as a master reader of a convoluted novel, capable of effortlessly jumping between and linking different parts of the story. Each plot point (data point) is interconnected and can directly exchange information, much like how our eyes dart around a page, creating associations and connections throughout the story. Enabled by a mechanism known as "self-attention," the model can gauge the significance of each word in context with all other words in the text, not just those immediately preceding or following. This unique characteristic, like a reader cross-referencing chapters and passages, allows for a deeper understanding of sequential data and is the foundation of advanced models like GPT-4.
Diagram, Transformer Architecture
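To ground the "self-attention" idea, here is a bare-bones sketch of scaled dot-product attention in NumPy, stripped of the learned query/key/value projections a real Transformer uses.

```python
# Scaled dot-product self-attention, minus learned projections.
# Every token scores its relevance to every other token, then
# mixes in information from across the whole sequence.
import numpy as np

def self_attention(X):
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                   # token-to-token relevance
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax per token
    return weights @ X                              # context-mixed tokens

tokens = np.random.randn(5, 8)       # 5 tokens, 8-dimensional vectors
print(self_attention(tokens).shape)  # (5, 8): each token now carries context
```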
Tokens: When the model processes text, it segments the text into smaller, more manageable pieces called tokens. Tokens can be words, subwords, or characters, resembling the individual words or phrases of a story. Each token is assigned a unique numeric identifier, translating words into a numerical representation.
Diagram, tokens
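As a quick illustration, OpenAI's open-source tiktoken library exposes the tokenizer behind its recent models; the exact ids depend on the encoding you pick.

```python
# Splitting text into tokens and back with `tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a GPT-4-era encoding
ids = enc.encode("Once upon a time, a model told a story.")
print(ids)                                   # a list of integer token ids
print([enc.decode([i]) for i in ids])        # the text piece behind each id
```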
Embeddings: These numeric representations (vectors) of tokens are then positioned in a multidimensional space akin to a narrative landscape (see image for 2-D representation), where semantically similar tokens are grouped together. Visualize it as a literary universe, where each word is a character. Characters with shared traits form families or clusters.
Diagram, embeddings and graphing per Databricks
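A tiny sketch of "shared traits" as geometry: cosine similarity between made-up embedding vectors. Real embeddings have hundreds or thousands of dimensions, but the intuition is the same.

```python
# Cosine similarity: nearby vectors mean related words.
# These 4-d vectors are invented for illustration.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

king  = np.array([0.90, 0.70, 0.10, 0.30])
queen = np.array([0.85, 0.75, 0.15, 0.35])
apple = np.array([0.10, 0.20, 0.90, 0.80])

print(cosine(king, queen))  # high: the same "family" of characters
print(cosine(king, apple))  # lower: a different cluster entirely
```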
Training these models is akin to refining a novel's draft, where the characters (inputs) and their traits and motivations (weights) are continuously refined in line with the unfolding narrative (training data), all striving to anticipate and effectively deliver the next plot twist (output).
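To demystify "refining weights," here is the smallest possible training loop: plain gradient descent nudging a single weight toward the pattern in toy data. LLM training does the same thing across billions of weights at once.

```python
# A one-weight "training loop": nudge w to predict the next value.
w, lr = 0.0, 0.05
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # toy pairs following y = 2x

for _ in range(200):
    for x, target in data:
        pred = w * x
        grad = 2 * (pred - target) * x   # slope of squared error w.r.t. w
        w -= lr * grad                   # refine the "draft"

print(w)  # converges toward 2.0, the pattern hidden in the data
```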
While LLMs can produce diverse content, they may occasionally veer off track, generating text that is inaccurate or ungrounded, much like the early drafts of any piece of writing. This behavior is known as “hallucinating.” An imperfect draft is excusable; presenting factually incorrect information with confidence is not, especially when lawyers submit unvetted outputs as court citations.
These peculiar instances occur when the model, despite well-crafted guidance, produces output that is syntactically and contextually well-formed yet entirely unanchored to reality, and delivers it with unearned confidence. People shouldn’t outsource sound judgment to these systems.
Prompt Engineering and Human-Computer Interaction
Understanding the mechanisms of LLMs is essential, but the crux lies in our ability to guide these models toward meaningful outputs. This forms a bridge, connecting the intricacies of LLMs with the broader realm of human-computer interaction. Much like a compelling plot steers a well-structured novel, our proficiency in prompt engineering - guiding these models - will shape the future of human-computer interaction.
Prompt engineering is the practice of refining the inputs that direct these large language models, steering them toward creating content that matches the desired output. This practice is akin to an editor providing an author with a narrative structure or stylistic guidance, helping shape the output of these advanced language models.
Guiding an AI model's output is akin to an author navigating the weave of a novel's plot. The model, taking on the role of the author, composes its narrative under the guidance of this “plot,” defined by the user’s prompt. Consequently, the output from the AI mirrors the narrative arc and structure defined by the “plot” (prompt).
To convey what it’s like to work with an LLM, think of it as a dialogue of sorts: you set the stage with a prompt that determines what the model should do. Below are a few techniques for achieving the expected outputs.
1. Zero-shot learning is like asking a dancer to freestyle to an unfamiliar song. For example, you might ask an LLM to "Compose a short poem on how venture capitalists can increase deal flow." The model might generate a response like:
In an arena of ideas, play fair,
Speak wisdom, show grit, and sincere care,
Courage to risk, sight for the unseen,
Seeds well-nurtured yield ventures green.
2. Few-shot learning is like giving a dancer a few key moves before they start improvising. This technique provides the model with several examples before the actual task. For instance, you might prime the model with translation examples:
English: The dog is in the park. French: Le chien est dans le parc.
English: She is reading a book. French: Elle lit un livre.
English: They are playing soccer. French: Ils jouent au football.
Translate the following English sentence to French: “The cat is on the table.”
This technique steers the model toward the desired output using tailored examples.
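In code, a few-shot prompt is just the examples and the task concatenated into one input string. A minimal sketch using the translation examples above:

```python
# Assembling a few-shot prompt: examples first, then the task.
examples = [
    ("The dog is in the park.", "Le chien est dans le parc."),
    ("She is reading a book.", "Elle lit un livre."),
    ("They are playing soccer.", "Ils jouent au football."),
]
task = "The cat is on the table."

lines = [f"English: {en} French: {fr}" for en, fr in examples]
lines.append(f'Translate the following English sentence to French: "{task}"')
prompt = "\n".join(lines)
print(prompt)  # this single string is what the model actually sees
```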
3. Fine-tuning is a process similar to instructing a dancer on the precise choreography for a given routine. In machine learning, a model needs a labeled dataset to learn and accurately perform specific tasks. Think of a labeled dataset as a guidebook for the model, where each piece of data, whether an image, text, or audio, is attached to a relevant label or category. This allows the model to comprehend the connection between the input data and the desired output, learn from the examples, and apply this knowledge to new, unseen data.
One insightful aspect of fine-tuning in prompt engineering is the role of transfer learning. Transfer learning, in conjunction with fine-tuning, involves leveraging a pre-trained model and adapting it to a narrower subject or a more focused goal. This technique lets developers enhance task-specific performance by aligning the model with domain-specific data, making it more adept at generating accurate and contextually relevant outputs for specific tasks. One area where transfer learning has had a significant impact on performance and utility is code generation.
One of the most popular of these code-generation tools, GitHub Copilot, is powered by OpenAI Codex, a descendant of GPT-3, which has been trained on billions of lines of source code from publicly available sources, including code in public GitHub repositories. The fine-tuning process, in this case, involves adapting the pre-trained model to offer real-time code suggestions and even entire functions directly in the user's editor.
When discussing fine-tuning in the context of prompt engineering, it's essential to emphasize data labeling. Data labeling, or data annotation, is a critical step in developing high-performance ML models: the quality of the labels directly bounds how well a model can map inputs to desired outputs and generalize to new, unseen data.
To summarize, fine-tuning plays a vital role in AI by enhancing pre-trained models' accuracy and performance, especially when paired with transfer learning and careful data labeling. It is this process that turns a general-purpose model into a dependable specialist.
Diagram, the result of fine-tuning a language model
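For a sense of what fine-tuning looks like in practice, here is a hedged sketch using Hugging Face's transformers and datasets libraries. The base model (DistilBERT), dataset (IMDB reviews), and hyperparameters are illustrative stand-ins, not the recipe behind any particular product.

```python
# Fine-tuning a small pre-trained model on a labeled dataset.
# Model, dataset, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = load_dataset("imdb").map(tokenize, batched=True)  # labeled reviews

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(1000)),
)
trainer.train()  # adapts the general-purpose model to the labeled task
```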
4. Chain-of-thought prompting is like a choreographer giving specific instructions, guiding the dancer step-by-step toward the final performance. Chain-of-thought prompting can be used effectively for tasks that require sequential steps or multiple pieces of information. Here's an example of this technique:
Prompt 1: "Write a professional greeting for a letter addressed to a client named Alex."
AI Response: "Dear Alex,"
Prompt 2: "Now, write a line expressing gratitude for their past collaboration with our company."
AI Response: "We would like to express our sincere appreciation for your valuable collaboration with our company in the past."
Prompt 3: "Now, mention our new marketing proposal for them."
AI Response: "We are excited to share with you our new marketing proposal, which we believe aligns perfectly with your brand's vision and objectives."
In this conversation, the AI takes each input, processes it with the context of preceding prompts and responses, and then crafts a suitable response. This step-by-step process allows for a cohesive, contextually sensitive conversation.
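Programmatically, this step-by-step exchange is just a growing list of messages resent with each request. A sketch using the OpenAI Python SDK (the model name is illustrative, and responses will vary):

```python
# Multi-turn prompting: each request carries the full history,
# so every reply is grounded in what came before.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
history = []

def ask(prompt):
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-4", messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("Write a professional greeting for a letter addressed to a client named Alex."))
print(ask("Now, write a line expressing gratitude for their past collaboration with our company."))
print(ask("Now, mention our new marketing proposal for them."))
```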
Conclusion
LLMs—our present-day digital authors—represent an intersection of language and machine learning beyond mere word associations. They interpret and generate narratives, opening pathways to unique interactions with AI. As we continuously refine these models and enhance our communication techniques with them, we find ourselves at a point where technology and natural language coalesce.
What gives me the most excitement within LLMs specifically, though, is more philosophical than mathematical: natural language is no longer limited to just human interactions but acts as a key to deciphering the immense capabilities of our artificial counterparts, inevitably redefining the parameters of human-computer interaction.
Acknowledgments
I wish to express my deepest appreciation to those who have contributed their expertise and time towards the editing and enhancement of this essay. Your invaluable input and astute insights have been fundamental in refining the final version of this work.
Specifically, my gratitude extends to:
Content: I am immensely grateful to Adam Kaufman (Up2 Fund), Zoe Enright (ClearView Healthcare Partners), Camden McRae (Nextwave X Partners), and Nick Ruzicka for their insightful suggestions to improve content clarity and coherence.
Subject-Matter: I extend my sincere appreciation to Brandon Cui (MosaicML) and Fan-Yun Sun (Stanford Ph.D., Computer Science), whose expertise in Generative AI and LLMs has significantly enriched the depth and accuracy of this essay.
Proofreading: A special note of thanks to Samuel Wheeler (Redacted) for his meticulous attention to detail and commitment to linguistic precision.