One of the buzziest companies in AI at the moment is Notion, maker of the popular productivity app. In February, four months after its alpha debut, Notion publicly launched a new AI assistant called Notion AI. It is an impressive tool that can summarize notes, generate action items, improve grammar and spelling, and assist with creative tasks such as brainstorming.
The tool took off fast: the alpha announcement generated nearly two million waitlist signups in the run-up to general availability in February 2023. That put Notion well ahead of the curve; ChatGPT itself had only launched in November 2022.
How did Notion move so fast? Co-founder Simon Last began prototyping in October 2022 and had an early proof of concept two weeks later, before staffing a team.
This is a pattern I’m seeing among companies building new AI products: small, scrappy teams shipping features built on the latest large language models (LLMs). Today, after launching numerous features, the Notion AI team is still small, made up of a mix of product, machine learning, and infrastructure engineers, as well as a new breed of engineer: the LLM-native or AI engineer.
I was excited to get the opportunity to speak with Notion’s recent recruit, AI engineer Sophia Xu, who joined in May to continue the team’s work on Notion AI.
Sophia is one of the first people to define what this new generation of AI engineers does. She studied mathematics and natural language processing at McGill University, but more important than her academic background is her experience building her own AI tools and products. Her first product, built with early ML models, was a Chrome extension that scanned web pages and surfaced additional information about subjects mentioned on the page. In 2022, she built Unigraph, which showcased her skill at building with LLMs and led to her current role at Notion.
Sophia sat down to answer a few questions I’ve been wondering about:
What does an AI engineer actually do all day?
While the answer varies a lot from engineer to engineer, Sophia told me that most of her time is spent generating (or attempting to generate) structured data using AI models. To accomplish this, she uses a variety of techniques such as few-shot learning and chain-of-thought prompting. I was curious to understand more about these techniques, so I investigated with the help of a prompt engineering guide that Sophia suggested.
Per the Lil’Log guide, few-shot learning “presents a set of high-quality demonstrations, each consisting of both input and desired output, on the target task. As the model first sees good examples, it can better understand human intention and criteria for what kinds of answers are wanted.”
For example, if you wanted an LLM to give concise descriptions of book genres, you could include demonstrations like the following in your prompt:
Prompt: "Fantasy" → "A genre involving magic, mythical creatures, and often set in alternate worlds."
Prompt: "Mystery" → "A genre revolving around solving a crime or uncovering secrets."
With these examples in the prompt, if you then ask about "Science Fiction", the model might reply with "A genre exploring futuristic concepts, advanced technology, and often set in space or on other planets." Prompting a model this way helps it generalize to new inputs from just a handful of demonstrations, without any additional training.
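To make this concrete, here is a minimal sketch of how those few-shot examples might be packaged into an API call. It uses the pre-v1 OpenAI Python SDK’s chat interface (newer versions of the SDK use a client object instead), and the model name and system instruction are illustrative assumptions rather than anything Notion actually uses:

```python
# A minimal sketch of few-shot prompting via the OpenAI chat API.
# Model choice, system instruction, and examples are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # assumed to be set for this sketch

few_shot_examples = [
    # Each demonstration pairs an input with the desired output format.
    {"role": "user", "content": "Fantasy"},
    {"role": "assistant",
     "content": "A genre involving magic, mythical creatures, and often set in alternate worlds."},
    {"role": "user", "content": "Mystery"},
    {"role": "assistant",
     "content": "A genre revolving around solving a crime or uncovering secrets."},
]

messages = [
    {"role": "system", "content": "Give a one-sentence description of each book genre."},
    *few_shot_examples,
    {"role": "user", "content": "Science Fiction"},  # the new input we want answered
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",   # any chat-capable model works here
    messages=messages,
    temperature=0,           # keep the output as repeatable as possible
)
print(response["choices"][0]["message"]["content"])
```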
Chain-of-thought prompting, a technique developed by researchers at Google, is particularly valuable for AI engineers because it forces an LLM to show its reasoning. LLMs are prone to producing erroneous information. This is a major frustration for anyone working with LLMs, but it’s even more frustrating when the model doesn’t show the steps that led it to a false conclusion. Chain-of-thought prompting asks the LLM to work through a problem step by step, which tends to improve its answers and provides critical insight into how it reached its final conclusion.
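In its simplest form, the technique is just an instruction to reason step by step before answering. The word problem, instruction wording, and model choice in this rough sketch are illustrative assumptions, not how Notion applies it:

```python
# A minimal sketch of zero-shot chain-of-thought prompting.
import openai

question = (
    "A library has 120 books. It lends out 45 on Monday and receives "
    "30 returns on Tuesday. How many books are on the shelves now?"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        # Asking for intermediate steps is the core of the technique: the
        # reasoning the model prints exposes how it reached its conclusion.
        "content": f"{question}\n\nThink through this step by step, "
                   "then state the final answer on its own line.",
    }],
    temperature=0,
)
print(response["choices"][0]["message"]["content"])
```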
The reality is that working with large language models is tedious, painstaking work, requiring a lot of trial and error. Results are non-deterministic, and answers can change drastically if the prompt varies even slightly.
What is Sophia’s favorite tool?
The tool Sophia works with most is the OpenAI Playground, an interactive web interface for experimenting with OpenAI’s models, such as the GPT family, and observing their behavior in real time. It’s a sandbox that gives engineers direct access to the models. For AI engineers, it’s particularly valuable because it makes it fast to test the effectiveness of different prompts.
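When an experiment outgrows the Playground’s text boxes, the same models can be reached through the API. Here is a rough sketch of comparing two candidate prompts against the same input; the note and prompt wording are made up for illustration, and the call again uses the pre-v1 SDK interface:

```python
# A rough sketch of what the Playground does interactively: run the same
# input through a few candidate prompts and compare the results.
import openai

note = "Met with design re: onboarding flow. Need mocks by Fri, eng estimate next wk."

candidate_prompts = [
    "Summarize this note in one sentence:",
    "Rewrite this note as a list of action items with due dates:",
]

for prompt in candidate_prompts:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"{prompt}\n\n{note}"}],
        temperature=0,
    )
    print(f"--- {prompt}")
    print(response["choices"][0]["message"]["content"])
    print()
```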
What’s the most challenging aspect of Sophia’s work?
One of the biggest challenges AI engineers currently face is one that Sophia regularly encounters in her daily work: validating AI-generated results.
Evaluating non-deterministic results is especially difficult because you cannot write effective unit tests. To address this, Notion is hacking together its own tooling and experimenting with using one language model to validate another’s output, though that approach can itself be unreliable.
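To make the two approaches concrete, here is a sketch of what such validation can look like: a deterministic check that structured output parses and contains the expected fields, plus an LLM-as-judge pass layered on top. The function names, grading prompt, and model choice are illustrative assumptions, not Notion’s actual tooling:

```python
# Sketch only: neither function reflects Notion's real validation tooling.
import json
import openai  # pre-v1 SDK interface assumed

def check_structure(raw_output: str, required_keys: set) -> bool:
    """Deterministic check: is the output valid JSON with the expected fields?"""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data.keys())

def check_with_llm(source_text: str, raw_output: str) -> str:
    """Ask one model to grade another model's output -- useful, but itself unreliable."""
    response = openai.ChatCompletion.create(
        model="gpt-4",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": (
                "Here is a document and a JSON summary generated from it.\n\n"
                f"Document:\n{source_text}\n\nSummary:\n{raw_output}\n\n"
                "Reply PASS if the summary is faithful to the document, or FAIL "
                "otherwise, followed by one sentence of justification."
            ),
        }],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]
```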
This validation challenge reminds me of something my friend Sam Shah, VP of engineering at Databricks, told me in an earlier newsletter: “The best solutions for generative AI are in places where it’s difficult to construct the artifact, but easy to verify the result.”
If you’re an engineer who wants to become an AI engineer, what does Sophia advise you to do?
Sophia suggests following people who are building AI products on Twitter and paying close attention to what they’re talking about. But more importantly, she says, go build stuff. If you want to be an AI engineer, “you should tinker and experiment,” she told me. “Prompt engineering can be really fun.”
My conversation with Sophia reminds me of the early days of web engineering in the late 1990s, when I first worked at an internet startup. Back then, it was impossible to have a lot of experience as a web engineer for one obvious reason: the web simply hadn’t been around all that long.
Today, the field of AI engineering is a lot like those early days building out the web. There isn’t a guidebook on how to become an AI engineer. Nor is there a clear-cut definition of what an AI engineer is or does. Many of the people working in this fast-evolving field are learning about AI through conversations with other engineers, tinkering with new products and features, and hanging out on Twitter. To succeed in this emerging world requires not so much academic rigor as it does scrappy perseverance, deep curiosity, and most importantly, a passion for the future of AI.
The only way to gain experience working in AI is, as Sophia says, to spend a lot of time tinkering and experimenting. Go forth and play.
Thanks for reading,
Tamar
Good framing -- "difficult to generate but easy to verify" is a helpful way to think about which problems are well-suited for today's LLMs, and applies also to the human-in-the-loop vulnerability detection in your most recent post.