Brief summary
In an age of vibe coding and LLMs, do we really need to care about documentation? Do we need to spend time and energy producing it — time when we could just be shipping code? Of course we do; particularly if we want to communicate and share software with other humans.
To discuss documentation in 2025, Technology Podcast host Lilly Ryan is joined by Heidi Waterhouse, a very special guest with an esteemed and varied career in technical communication.
In this episode, Lilly and Heidi tackle the challenges of documentation in a world increasingly infused with AI-generated code and text, explore whether prompt engineering is really just technical writing in disguise and examine the difficulties of writing for highly specific audiences.
They also discuss Heidi's upcoming book, Progressive Delivery, about bridging the gap between software delivery and business value. Written with James Governor, Kim Harrison and Adam Zimman, it's due to be released in late 2025.
Find out more about Heidi Waterhouse by visiting her website.
Learn more about Progressive Delivery.
Episode transcript
Lilly Ryan: Hello, and welcome to the Thoughtworks Technology Podcast. I'm one of your regular hosts, Lilly Ryan. Joining me today, we have Heidi Waterhouse. Heidi is a public speaker, a documentarian, writer, collaborator, progressive delivery thinker, and many other kinds of things. She's working in marketing advisory, and her second book, Progressive Delivery, is coming out in November 2025. She joins us today to talk about AI and documentation, a topic that I think many of us are thinking about quite a lot at the moment, and if we're not, we should be. Heidi, welcome to the show.
Heidi Waterhouse: It's good to be here.
Lilly: Your second book, as I said, is coming out in November. Your first book, though, was in 2021, which was Docs for Developers, and you've been a technical writer for, I think, most of your career, if I'm not wrong. What is documentation to you?
Heidi: Documentation is the way we describe software to humans, just like code is the way we describe software to computers. Software is something that sits in the middle of our work processes and our humanity and our computers. When I'm talking to people about documentation, I really want to figure out why they're using it. The primary reason that almost everyone uses documentation is because they're angry. By the time you're reading documentation, you're usually pissed off because you haven't been able to do a thing with software.
Knowing that makes me understand that what people are trying to do with both software and documentation is get their work done and not use a computer. There are actually very few people in the world who wake up in the morning and think, "I would like to use a computer."
Lilly: That's fair and reasonable. I can't remember the last time I actually thought about doing it that way, but I do inevitably end up using one. I think I end up engaging with documentation, frankly, multiple times a day at this point. When it comes to software development and the people who are putting the software together, there is documentation meant for software developers. Software developers are, in some cases, expected to produce documentation. Do we need it?
Heidi: It depends. I think that we need it if we want to communicate with other humans. I think if you are building a vibe coding interface only for yourself, you don't need documentation. I think if you want any other human to be able to figure out what you were thinking, then you're going to need some documentation. I think if you ever want to make it interoperable, then you're going to need documentation on its endpoints and how other people could access it. When you say, "Do we need documentation?" the answer is very much it depends, and the reason that we're using software dictates a lot of that.
Lilly: There's a growing undercurrent of discussion, perhaps, that we don't explicitly need to write documentation anymore because we're able to give the code that has been written directly to a large language model, and it can produce the documentation for us, or for example, if it's something that is able to generate text very easily, we can just hand it off to a system that can do that rather than have a human doing it. Personally, I still think there's a lot of value in human beings doing this.
You mentioned documentation being something that glues all of the processes together with the people. Do you think that there are any cases where it is possible to just generate documentation and have that documentation be helpful? Where do you think that this sits in an age where people have machines that can generate reams of text for them that look very plausible?
Heidi: I think APIs are a great example. The OpenAPI standard is an easily readable way to describe what your API does. There is nothing stopping anyone from putting a chat interface on the front of it and saying, "What's the call to retrieve a date?" That would work fine because it's an extremely well-known standard that everyone uses, and we are not asking a computer to generate anything. We're asking it to retrieve something that's extremely well-indexed, and because those two things are true, I would feel confident having a chatbot that returned information about a well-architected API to me.
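To make that contrast concrete, here is a minimal Python sketch of the kind of lookup Heidi describes: answering "what's the call to retrieve a date?" by searching a well-structured OpenAPI document rather than generating anything. The spec fragment, paths and operation names are invented for illustration, not taken from any real API.

```python
# Retrieval over a well-indexed OpenAPI spec: nothing is generated, only looked up.
spec = {
    "paths": {
        "/events/{id}/date": {
            "get": {"summary": "Retrieve the date of an event", "operationId": "getEventDate"}
        },
        "/events": {
            "post": {"summary": "Create a new event", "operationId": "createEvent"}
        },
    }
}

def find_operations(spec: dict, query: str) -> list:
    """Return (method, path, summary) for operations whose summary mentions every query term."""
    terms = query.lower().split()
    hits = []
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            summary = op.get("summary", "").lower()
            if all(term in summary for term in terms):
                hits.append((method.upper(), path, op["summary"]))
    return hits

print(find_operations(spec, "retrieve date"))
# [('GET', '/events/{id}/date', 'Retrieve the date of an event')]
```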
Where I don't feel comfortable is where we're using generative AI to guess at the intentions of something that does not have explicit intentions. API documentation has explicit intentions and is well-structured, but instructions on a lot of other things are a lot squishier. What are the intentions behind a corporate dress code? Sure, that's the sort of thing that AI advocates will tell you you don't have to write anymore, you can just say, "Write me a standard corporate dress code."
You could get that, but you would not have the context necessary to make it make sense. The LLM would not understand what country you lived in or what formality level your office had, or why there are gender problems with assigning clothing to one way or the other. There's a ton of things that seem like you should be able to just run up a standard thing, and an LLM should be able to handle it, but it is exactly the place that we think we can trust it that it becomes dangerous.
Dress codes are an easy example, but what are the intentions behind medical treatment? What are the intentions behind even-- I was reading a thing about food ordering services and what happens if they make up a description of Kung Pao chicken that leaves out the fact that it might have nuts in it? Now we have a real problem.
Lilly: Yes. The relationship that LLMs have to truth matters in a lot of contexts, but it especially matters here. Speaking in generalities about documentation is difficult, but one of the main things documentation is intended to convey, regardless of what type it is, is some kind of truth or fact or critical piece of information about that software, if we're talking about software documentation, or about that code, that rule, whatever it is.
You mentioned a couple of concepts earlier that I wanted to get into some more, in terms of how they help us with documentation: indexing, as well as structure. Documentation can come in many different formats. Indexing is how a system would then break it down and pick it back up again for retrieval, in the LLM sense. In other senses, what are we talking about?
Heidi: One of the interesting things about our understanding of indexing is that we have conflated two different terms, and they're very different. When a computer tells you that it's indexing a page, what it's actually doing is making a concordance of the page. It's just looking at how many times-- it doesn't even know what a word is, a word as a concept is very fuzzy for a computer-- how many times a word appears on a page or in a text. Then it has a heat map.
The place that people are familiar with concordances is frequently they are used in religious texts, so that you can say, "I would like to know every place in the Christian Bible that we reference goats." You can find every place in the Christian Bible that references goats. There are concordances that do that. It doesn't tell you anything about that text except where it lives. An index, on the other hand, is a human-generated contextual map of what's going on.
There's this beautiful passage in Kurt Vonnegut's Cat's Cradle, where he's talking to an indexer on a plane, and she goes on about how the shape of an index gives you the shape of an author's mind and that she is in conversation with this text by making the index. What an index does is it gives you context. Back in the stone ages, when I started doing technical writing, we did hand indexing. You'd be writing along, and there would be some automatic indexing. All headings were indexed.
Obviously, if you've put in a heading that says, "Installing the database," that's a useful thing to find in an index. You could also highlight and index specific phrases so that people could find them later, because you didn't want them just searching on the word database. You wanted them to be able to find the place in the text that they needed it. We're talking 700, 1000 pages of text. You don't want them having to flip through every time you mention the word database. You want the crucial thing.
Interestingly, you could also, as you were indexing it, add synonyms or other pointers. My favorite story about this is that, for years, you could not search the Microsoft help site for the phrase "blue screen of death," even though, for literally decades, that is what people called it when a Windows machine failed and came up with the blue screen of death. That's not what Microsoft called it. They had never indexed that term. What they called it was a critical error message. As a Windows user and not a Microsoft person, it would never occur to me to look it up that way.
They finally started indexing it that way probably 10 years ago, and it increased people's happiness, even though their computer had just blue-screened, because until then they couldn't find the information that they wanted. Sometimes indexing includes things that aren't even in the text, that are implied. An LLM is the closest that we think we are getting to that: we can feed it a bunch of text, and the boosters say that it really integrates this with all the other things that it knows and gives us that inferred information, but that is way out on the limb of supposition. It's so dangerous to say, "Well, the LLM told me that," because, no, it's just statistically probable that those were some words that came up together.
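A small Python sketch of the distinction Heidi draws: a concordance is a mechanical map of where words literally occur, while an index is a human-curated map of concepts that can include the synonyms readers actually use, even when those words never appear in the text. The page contents and page numbers below are invented for illustration.

```python
from collections import defaultdict

pages = {
    12: "A critical error message appears on a blue screen when the kernel fails",
    97: "Restart the machine after a critical error message",
}

# Concordance: which pages does each literal word appear on?
concordance = defaultdict(set)
for page, text in pages.items():
    for word in text.lower().split():
        concordance[word].add(page)

print(sorted(concordance["critical"]))           # [12, 97]
print(sorted(concordance.get("death", set())))   # [] -- the phrase readers search for isn't in the text

# Index: curated entries, including the synonym a human knows readers will try.
index = {
    "critical error message": {12, 97},
    "blue screen of death": {12, 97},  # added by an indexer, even though the text never says it
}
print(sorted(index["blue screen of death"]))     # [12, 97]
```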
Lilly: It's interesting to look at the concept of indexing when it comes to search and retrieval from a technical point of view and what LLMs are doing under the hood, apart from the generative responses in terms of how they would ingest information and retrieve it in order to show it back to you, regardless of whether they do that by just listing out what they have linked to or they're trying to synthesize something out of the text itself.
Those webs of semantic association, I think it's interesting where, if you were taking a corpus that included people complaining about the blue screen of death, that probably would have an association with the state that you're talking about, where the machine has crashed. It would have come into the picture earlier because we've got this ambient sense of things. We're not waiting for somebody inside of that corporate context to come along and pick it up.
It's so highly dependent on what's in the corpus in the first place, either the one that it's been trained on or the one that you've provided to it in order to augment it, that it doesn't replace the fact that human beings want people to know certain things about this particular thing, just as with a corporate dress code. It will also have ingested a lot of information, again, depending on the training corpus, about historical gendered presentations of a corporate dress code.
Given the prevalence of those over time, and the fact that those things have changed within a shorter period of time, it's more likely to regurgitate something that takes into account the heavier weight of that larger piece of information and all of those biases. How do you think about striking that balance, I suppose, between getting the benefit of something that will make semantic associations based on a wide pool of information, and being able to preemptively pull out associations that the people looking at it may not yet have realized are there?
How much of it is really about the human beings telling the system what to do? Where do we need to strike that balance between automation and curation?
Heidi: I don't think we know yet. I don't think we will know for a while. I do think that human in the loop is a super valuable way to make sure that we're not adding horrifying things, but that also means that we're asking humans to look at horrifying things. It's rough. I once worked on a team that had an absolutely unfiltered email box. Y'all, you do not know what is happening in your email. You only see about 5% of the email that gets sent to you.
Lilly: Oh, the background noise of the internet, the spam and the phishing, and all that junk?
Heidi: Yes. That is beautiful, sophisticated machine learning. We pretty much never lose email that we want. If we do, we have workarounds for it, but we also don't see most of the terrible stuff. That is because machine learning plus human integration allows us to filter out a bunch of it. We can pattern-match that enough so that we're not seeing terrible things. We're still getting spam, but it's also getting sent to the spam bucket. Again, that's a tiny percentage of the actual things that were sent to your email address.
When we're thinking about the balance of human curation and automation, I think that we need to work with our automation buddies in order to set parameters that make it easier for us to filter. One way I think about this is that Confluence search is notoriously terrible. If you're looking for something in Confluence, what it has done is indexed the entire Confluence space with no semantic weighting. It doesn't know if it's in something that you looked at recently or something that's eight years old. It doesn't know if it's somebody on your team or somebody who isn't on your team.
The new things that are coming along, like Glean, and we've talked about this, are using a kind of sophisticated automation that allows us to say, "Okay, semantically, this person sits closer to this other person in the structure of the organization. They share teams. Let's make it more likely that that result rises to the top and less likely that somebody that they have never sent an email to is in that first page of responses." If it's more recent, it's probably more useful than if it's less recent. All of these things allow us to do search that feels much more useful to us and also isn't actually hiding any information from us.
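As a rough illustration of that kind of weighting, here is a Python sketch that boosts plain keyword matches by recency and by whether the author sits on the searcher's team. The weights, scoring formula and example documents are assumptions made for the sketch; they are not how Confluence or Glean actually rank results.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Doc:
    title: str
    author_team: str
    updated: date

def score(doc: Doc, query: str, searcher_team: str, today: date) -> float:
    keyword = 1.0 if query.lower() in doc.title.lower() else 0.0
    age_years = (today - doc.updated).days / 365
    recency = 1.0 / (1.0 + age_years)                             # newer pages score higher
    proximity = 1.0 if doc.author_team == searcher_team else 0.3  # same-team authors score higher
    return keyword * (0.5 + 0.3 * recency + 0.2 * proximity)

docs = [
    Doc("Database runbook", "platform", date(2017, 3, 1)),
    Doc("Database runbook (new)", "payments", date(2024, 11, 5)),
]
ranked = sorted(docs, key=lambda d: score(d, "database runbook", "payments", date(2025, 6, 1)), reverse=True)
print([d.title for d in ranked])  # the recent, same-team page rises to the top
```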
Lilly: This is where we're talking about not necessarily large language models in general, but search and retrieval and other types of machine learning and AI that don't involve the generative elements in terms of coming back up with the responses at the other end of things, right?
Heidi: Yes.
Lilly: This is, for me, perhaps one of the lesser-told stories. A lot of the value that I think people are finding in some of these technologies is about navigating information that has been difficult to search because of the problem that you just articulated. It's not necessarily that we want the answer shown to us as a synthesized paragraph of text; it's just that the problem is hard. It's hard to find things. It's always been hard to find things. Information science is a very old science for this reason.
Human beings have always struggled with this as long as we've had information that we needed to retrieve, which has been about as long as we've had human societies, I suppose. To be able to do this in a way that does have that kind of balance, I think has only emerged in the last decade or so, and I think has only really gotten into people's hands in the last couple of years through a lot of the generative AI stuff, but it's not actually the generative element that's working here.
One thing that I have seen people struggle with, whether they write documentation for a living or simply care about documentation and value the ability to communicate clearly, is search and retrieval that has synthesized text on top of it, where you have an LLM generating the output. LLMs, as has been widely discussed, don't really have a concept of truth, which is one of the very hard problems in this field, but they will reconstruct the information and word it in partially novel ways.
When you're writing documentation, historically, you would have known that people would look at what you had written, and the precise words you had written would verbatim be the thing that was teaching them how to use software. That's no longer the case, where people will be pointed to the chunk of information that describes how this software might work in this particular way, or this feature might do this thing, but the words that you chose as the writer are not what is presented to the user.
How do you deal with that as somebody who is a professional writer and a professional technical writer, knowing that the agency over what users are being told, not just in terms of facts, but in terms of wording and the way that we craft wording choices, have been significantly changed?
Heidi: I think as a professional writer, you learn really fast not to be precious about the way you word things, because it will get changed, and you don't own it. I think it's very useful for people to understand that most technical writing is work for hire. Much like writing for a newspaper, you turn in your copy, and then it's not yours anymore. It is not a representation of you, and you don't own it. Although you can take samples of it for your portfolio, what happens to it after you finalize it is not up to you.
In that sense, I'm not very precious about how people are reading my words. I just want to get the message across to them. If that is better in a different language, if that is better in a simplified way, as long as it's accurate, I don't care. The thing that scares me the most is that I don't have any confidence that, again, the truth is going to be preserved in the translation. Without that confidence, then I get protective of I want things to appear the way I meant them because otherwise, who's deciding it?
Lilly: There have definitely been a lot of issues with package maintainers and product owners who are responding to complaints that users have about their products because they have interacted with the documentation via an LLM, which has given them information about the product that's false, about features that don't exist. I can see how this becomes a real problem, not just for the people who are writing the docs, but the people who are doing the support and the people who are trying their best to make sure that everything works, not to mention that it's extremely frustrating for the end user.
Is there a way that you've seen this work well, where we know that this technology has these flaws baked in, that there's a lot of effort going into trying to mitigate that, but at the same time, it's not something that can be entirely patched? Where do we go from there?
Heidi: I think that we stop trying to make what I think of as an android, a machine that does all of the things reasonably well. What we work on instead is using all of this compute power and all of these statistics to make robots that do one or two things well. I think it's amazing that we can do queries in natural language. That's honestly astonishing. It was a real generational difference. I still type queries in machine-English because that's how I learned to search, and my kids never have. By the time they were competent at searching, it became possible to say, "How do I do this in Roblox?" It became possible to say, "What is the capital of India?"
That natural-language searching is immensely powerful. I absolutely want it applied to all of my documentation. "How do I install SQL 7?" Not SQL 7, because that's pretty old, but "How do I" is a hugely powerful way to approach documentation. "What's the command for...?" Immensely powerful. I want to keep that. What I don't want is for the retrieval-augmented generation to go off on its own and say, "And actually, two small rocks a day is the FDA recommendation." That's the point where it becomes problematic.
Let's keep natural language search and let's keep context- and syntax-aware search, and let's say, "Okay, I've retrieved something and I don't understand it. Can you make me a video about it?" Explaining things to people about their cell phone while you're on the cell phone with them is a nightmare, as anybody who has tried to do this knows. Like, "Okay, no, you have to slide from the right side of the screen in order to get your password keeper to unlock." Frequently people are looking at it on their phone. Then we have this recursion problem.
If the instructions are not making sense, or if they're in unreadably small gray-on-gray text, say, "Read it to me out loud," or "Make me a video," or "Show me an example." That seems like a great use, but I wish there were a way that we could turn a fantasy dial, where 100% fantasy is Midjourney, like, "This dog is having an acid trip. Okay, fine." 0% fantasy is, "I'm not even going to make these into sentences. These are just text bits that I have retrieved."
What I want is documentation that's 20% fantasy, where it's smoothing out the text segments so that they can go together and make sense in response to a query, but it's not generating anything. That's not how it works currently, but it would be interesting.
Lilly: One of the reasons I wanted to have this conversation with you is because I think that everybody has become a technical writer even more than they already were, without realizing because of the way that so much work has focused on crafting prompts for LLMs. When people are talking about how to develop software and augment it using LLMs, when they're trying to describe something to a system, they are creating documentation, whether or not they realize that's what they're doing.
I think people are increasingly thinking about it in this way, but some folks aren't yet. If this is news to you, you are writing documentation when you're doing this. I firmly believe that prompting is a form of documentation. It's a very valuable one because what you do in that process is that you are articulating to yourself and you're having the conversations with others, or you ought to be, about how you want the system to work, about how it does work, about what needs to change with it in the future, and that these pieces of information become very valuable parts of the stack.
We've seen folks talking about how prompt libraries are becoming a thing. In my view, that's documentation. What I wanted to know is from your perspective, Heidi, as somebody who is a technical writer, what advice would you give to people who have just realized that they are also technical writers?
Heidi: Welcome. I think what I would say is, again, I go back to the thing where documentation is the thing that sticks together what a computer understands and what a human understands. Code is half of it, and documentation is the other half of it. When people have realized that thinking through a problem is creating the business case, is creating a structure for how they want to solve it, is allowing them to have a dialog with-- it's not another person, but it feels like it. We all feel like that when we're talking to a chatbot. That really gives us a lot of opportunity to say, "What did I learn?"
When documentation takes the form of prompts, I want it to consist of two things. I want the pure flow of how it actually ends up working, because that's the finished product, but I also want to retain all of the false starts and dead ends, because that is how you're thinking. Say you asked, "Make me a calendar about 1875, and when there was a comet." It tried that, but it gave you a Gregorian calendar, and maybe that's not a good fit for what you're trying to do because you're working with texts that are not Gregorian.
That dead end is still valuable information, just like negative information from experiments is valuable information. Without keeping that in our documentation, we're losing a lot of why we didn't go a certain direction. When people are doing a lot of prompt writing and thinking through a product, I think they end up with a finished product, and then they copy it and they try and use it again, but I think that they should have a personal file of, "Things that I've tried and found out didn't work," because that is experimental data that they should retain.
Every writer I know has an ugly, messy, terrible folder or flat file of things that they cut out of their documentation that they hated to lose. We all hoard these things because you never know when they'll be useful. Do we reread this? We do not, but we keep it because it was part of the learning process. If you figure out that writing prompts is documentation, I think it makes you more considerate about how you are trying to implement a solution for a problem you have, so you have to say the problem.
Frequently, the problem gets dropped out of the prompt in the end. You don't say, "I'm trying to write a historical fantasy about comets." That gets dropped off the front, but knowing that business case in a loose sense gives you a lot of information about why the rest of the prompt works the way it does. It's frequently inferred. We don't write it down.
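One lightweight way to keep those dead ends, sketched in Python: treat each prompt attempt as an experiment and record the goal, the prompt, what came back, and why it was or wasn't usable. The fields and the example entry are assumptions, not a standard format.

```python
from dataclasses import dataclass

@dataclass
class PromptAttempt:
    goal: str       # the business case that often gets dropped from the final prompt
    prompt: str
    outcome: str    # roughly what came back
    kept: bool      # did this survive into the "finished" prompt?
    notes: str = ""

log = [
    PromptAttempt(
        goal="Timeline for a historical fantasy set around an 1875 comet",
        prompt="Make me a calendar about 1875, and when there was a comet.",
        outcome="Returned a Gregorian calendar",
        kept=False,
        notes="Source texts aren't Gregorian; specify the calendar system next time.",
    ),
]

# The dead ends are the experimental data worth keeping.
for attempt in log:
    if not attempt.kept:
        print(f"Didn't work: {attempt.prompt!r} -- {attempt.notes}")
```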
Lilly: You're arguing that we need to, and we also need to keep those records for ourselves, even if they don't need to be in a prompt in the future.
Heidi: Yes. I'm helping my mom. For 30 years, she's wanted to write a program that is a spell-- She did a customized spelling thing. As a teacher, if one of her students misspelled something on one of their other papers, she would pull that out and put it into a personalized spelling list for them. Then that would be their spelling list for the week. If they didn't have enough words to fill it-- There was a list that she would pull from that was appropriate to their circumstance, but it was incredibly manually intensive to enter that for 30 students.
The original idea is, I want a personalized spelling list for each student. Would that get recorded if we were sitting down to do prompt engineering, or would that just live in our head?
Lilly: It's a great question to be asking.
Heidi: Yes.
Lilly: With the example that you just gave, you're talking a lot about a very specific application of that idea or that technology, which leads me to consider one of the important parts of documentation in terms of who we're writing it for and what format it should take. We've already talked about prompts as a novel format for documentation, and that when we're writing a prompt or using an LLM to iterate on system requirements, the LLM is a type of audience, but it's not necessarily a human audience.
When we're trying to write, we're writing for ourselves and we're also writing for other people, other users, other developers, future versions of ourselves, and so on. Where do you think we need to put our focus when we're writing, and who is documentation for ultimately? Where should it be living? Where should it be going? Who should we be speaking to?
Heidi: It should be where it's going to do the most use. We used to have a very rich ecosystem of context-sensitive help. I just got a brand new sewing machine and it's very smart. It's actually not a sewing machine. It's a sewing computer according to the extensive documentation that it came with. One of the things that you can do as you're sewing with it is tap a little information icon, and it will tell you where in the menu you are. It has what we call breadcrumbs to tell you how far down in the menu you are.
It knows which needle you have in. It knows a bunch of contextual information. I don't have to flip through the 250 pages of manual to figure out which foot I'm using to do what. It says, "Oh, you have this kind of foot in. These are the available options." That contextual help is where we should all aim to put our documentation, wherever it's going to be most useful.
In classical documentation that I was taught to write, all documents start with, "This is the installation guide. It is intended for system administrators. It is not for end users." Most of us who have been reading documentation for a long time never noticed that. We don't even see it anymore, but it exists in a lot of places still because it's very useful to say, "Look, you're not in the right place. This is for the people who have a server room or a cloud or whatever. They're operating at a different level than the end user."
I think it's really important for us to say that in all of our documentation, like, "This is instruction intended for beginners. This is a quick tutorial on how to do a specific thing." I think maybe we should consider adding it into our prompts. I haven't done a lot of experimentation in this, but do we say, "Write me a memo for an executive who is at a Fortune 500 company that says roughly these things." Specifying the audience is the number one thing.
When I know that I am having trouble with a document and I can't write it, 80% of the time it's because I don't know who I'm writing it for. I don't know what their context is. I don't know what they understand. I don't know what they're trying to accomplish. Identifying the audience unsticks me, like, "Oh, okay." That was one of the things that I asked in the preamble to this show: "Who am I talking to?" You told me that we're talking to technologists, and that means this conversation is different than it would have been if we were talking to AI enthusiasts, or to executives, or to middle schoolers. Those are all very different conversations. Lots of the same content, but a different framing, a different understanding of what they already know.
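A tiny Python sketch of putting the audience and goal up front, in the spirit of what Heidi describes. The template wording, the audience label and the key points are illustrative assumptions, not a recommended standard.

```python
def build_prompt(audience: str, goal: str, key_points: str) -> str:
    # State who the document is for and why it exists before anything else.
    return (
        f"Audience: {audience}.\n"
        f"Goal: {goal}.\n"
        f"Write for that audience, covering only what they need to know.\n"
        f"Key points: {key_points}"
    )

print(build_prompt(
    audience="an executive at a Fortune 500 company",
    goal="a one-page memo recommending a documentation review",
    key_points="cost of bad docs, support load, proposed next step",
))
```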
Lilly: I think that some folks who are deep into prompt engineering have maybe been used to thinking about this in terms of telling an LLM at the other end who it is supposed to be. "You are an expert at JavaScript," for example. I think that if you're new to thinking about audiences that flipping that around and saying, "Cool, all right, I know who I want to be talking to in order to answer this question. Who should I be talking to when I'm trying to explain it? Who am I imagining I'm explaining it to?" should hopefully help as well.
I haven't necessarily tried all of that, but there have certainly been cases where I have suggested to an LLM, "You should imagine that you are a middle schooler. What kinds of questions would you ask me about this?" That could be a good way to flip it around and get yourself into the headspace of thinking about an audience, if that's not something that you have been used to considering in what you're doing.
Heidi: Yes. It is my theory, and I would have to do a lot of experiments to prove this, that identifying the audience and the goal of your documentation up front when you're doing a prompt or any other kind of documentation really allows a lot of other things to fall into place and makes it much less intimidating to start writing.
Lilly: That's my hope, too. It's not that we necessarily need folks to start writing. My hope is that we will get people to realize that they are already writing, that they're doing quite a lot of this work already in the work that they might be doing right now, if they are working with prompts in any kind of fashion. That's a form of writing. It's a form of documentation. It's something that I think we could get used to thinking about in that context. If that's not already on your radar, I would suggest it.
If we wanted to learn any more about documentation in general, if this is something that has piqued your interest, Heidi had a great book that came out in 2021 called Docs for Developers, which you should pick up and check out. It's got a lot of very useful information in there. You've got a book coming out in November, Heidi, which is Progressive Delivery, which is about this, but also about the squishy human parts. Can you tell us a bit more about it?
Heidi: Of course. When I spent a bunch of time in the DevOps world, I always felt like there was something missing, and the observability community helped point out what it was. We're not using user experience as part of our DevOps loop. We stop at DevOps as if it were okay to throw things over the wall to users, in the same way that we understand we can't just throw things over the wall to ops. Progressive Delivery is about incorporating the user and the user experience into the entire software delivery life cycle, so that we understand how our software is making an impact on people and what it's going to do when it gets into the hands of real humans, instead of the imaginary humans that we have constructed.
Lilly: Excellent. Definitely make sure you look out for that one as well. We'll put links in the show notes to all of that. Thank you very much for joining us today, Heidi Waterhouse, and talking all about AI and documentation. There's a lot more we could get into here and a lot more that I'm sure we will have to talk about in the future. Many thanks again.
Heidi: You're very welcome.