Bulldog Reporter

RAG against the machine: What is RAG and how does it impact PR and marketing

By Richard Stone | May 21, 2026

I recently attended a seminar presented by the venerable Andrew Bruce Smith. In it, he used that phrase, ‘retrieval augmented generation (RAG)’. I immediately asked myself why a term that is really about programming an AI had found its way into a PR seminar, but there are some very good reasons why that’s exactly where it belongs.

You can tell if a term is overused by establishing how many articles have been written arguing that it’s dead, and RAG already has a lot of those. But for the PR industry, it’s alive and kicking and not oversaturated at all.

RAG is a simple idea: give an AI system better information to work with, and it will produce better, more accurate answers. Essentially, instead of relying only on its training data, which was the norm until about 2020, an LLM gets the chance to draw on other data, such as the entire internet, and its answers improve as a result.


In 2020, a whitepaper called ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’, published by Meta, changed everything. It coined the term RAG and suggested a system where a retriever function would fetch relevant data from an external knowledge base, like, erm, the entire internet, to augment an LLM’s knowledge at the time the query is made.

What does this really mean though? Well, as an illustration, I’ve recently been teaching a ten-year-old how to play guitar. Naturally, I started by teaching her Smoke on the Water by Deep Purple, just as I was taught it in about 1993.

A week or so later, we were watching School of Rock, the classic Jack Black movie. When it got to the bit where he teaches the kids Smoke on the Water, she said, “but Richard, I thought you wrote that! Was it really Jack Black?”

“Not at all,” I replied, “It was Led Zeppelin.”

Those of you keeping up with my, probably overly long, story will have spotted that I accidentally lied to her, in the same way that an LLM operating only on trained data will frequently hallucinate an answer.

However, if I had instead Googled ‘who wrote Smoke on the Water’ I would have been able to give her the correct answer. Effectively, Googling the answer would have been the metaphorical equivalent of an LLM using RAG to ensure the data was correct before generating a response. I would have augmented my answer by retrieving relevant data before generating it.
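For anyone who prefers code to guitar lessons, the same retrieve-then-answer loop can be sketched in a few lines of Python. This is a toy illustration, not how any real engine is built: the three-sentence knowledge base, the word-overlap scoring and the function names are all assumptions made for the example, and the final call to an actual LLM is left out.

```python
# Toy knowledge base standing in for "the entire internet" (hypothetical data).
KNOWLEDGE_BASE = [
    "Smoke on the Water was written and recorded by Deep Purple.",
    "School of Rock is a 2003 film starring Jack Black.",
    "Stairway to Heaven was recorded by Led Zeppelin.",
]


def tokens(text):
    """Lowercase the text and split it into a set of bare words."""
    return {word.strip(".,?!'\"").lower() for word in text.split()}


def retrieve(question, documents, top_k=1):
    """Return the top_k documents sharing the most words with the question."""
    q = tokens(question)
    ranked = sorted(documents, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents):
    """Augment the question with retrieved evidence before generation."""
    evidence = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only this evidence:\n{evidence}\n\nQuestion: {question}"


# The augmented prompt is what would be handed to the LLM to generate from.
prompt = build_prompt("Who wrote Smoke on the Water?", KNOWLEDGE_BASE)
print(prompt)
```

The point of the sketch is the order of operations: the evidence is fetched first, and only then is the model asked to write.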

RAG and marketing

But why does that make it a marketing or PR term? Just because Jack Black knows that Smoke on the Water sounds better on a cymbal than a kick drum, it doesn’t mean that RAG is relevant to anyone but AI developers, right?

No. Because RAG lets an AI model pull in additional documents, webpages or datasets at the moment a question is asked, we marketing people have the opportunity to put the answers there for it to find.

Think of it like a journalist phoning a source before writing a story. The journalist still writes the article, but the information he or she gathers changes the quality of the output. A good source can result in a great article; a bad source can be a serious problem.

“The more material there is, the more need there is for filters,” explained Howard Rheingold, the seminal author and journalist who spent his career examining the social and cultural implications of the internet. “You don’t need a printing press anymore, but you do need people who know how to cultivate sources, double-check information and put the brand of legitimacy on it.”

He was absolutely right and, if our job as PR people is to tell the truth well, making ourselves and our clients into reliable sources for an AI must be 2025’s step one.

If your site contains structured, citation-ready, quotable information, it becomes far more likely to be retrieved by generative engines during that initial retrieval augmentation step. And if it’s retrieved, it’s far more likely to shape the final answer.

That is the heart of generative engine optimisation (GEO): designing content so that AI systems select you as a source before they start generating. RAG is the mechanism that makes this selection process explicit and understandable.

Imagine someone asks ChatGPT (other LLMs are available), “How does predictive maintenance reduce downtime in food processing?” RAG is the reason that your site, containing a clear explanation, well-labelled headers, properly attributed data, short quotes from credible experts and a structure that an AI crawler can parse, could provide the answer.

This is one of the many reasons why GEO isn’t just SEO with a new name. I mean, it’s got a very similar name but let’s gloss over that for a second.

RAG allows an LLM to choose evidence to weave into a narrative. The best way to influence that narrative is to make your content the most useful, most quotable and most easily retrieved evidence available.

Suddenly it makes sense for Bruce Smith to regard this as a marketing issue. Generative Engine Optimisation is not about optimising for an AI. It’s about optimising for RAG. It should really be called RAGEO, but let’s stay away from the Betamax/VHS wars currently raging on that subject and focus on how to do it.

How to RAG against the machine

This retrieval stage of RAG is what links it to writing a good piece of content. At this point, it doesn’t matter whether the content is hosted on your site or placed in the media, in the hope that it will appear in AI search. Making it retrieval-ready is crucial.

If the model retrieves your content, it becomes part of the evidence base the AI draws from. If it doesn’t retrieve you, you’re invisible, even if you have the best content in the world, because the user has to get past the AI answer to get to your beautiful blue links in the SERPs (search engine results pages).

So, what makes content more likely to be retrieved?

Generative engines typically use a form of vector search and semantic matching to decide which documents should be pulled in during the retrieval step. They aren’t looking for marketing language, or even stories about Smoke on the Water. Which is a shame, frankly.

They are looking for clarity, structure, insight and authority, the same things a human researcher or journalist values.
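As a rough sketch of what that matching step looks like, the Python below ranks two pages against a question using cosine similarity. Real engines compare learned embeddings rather than raw word counts, so treat the `vectorise` function, the two sample pages and every name here as simplifying assumptions for illustration.

```python
import math
from collections import Counter


def vectorise(text):
    """Turn text into a word-count vector (a crude stand-in for an embedding)."""
    words = (w.strip(".,?!'\"").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Sort documents from most to least similar to the query."""
    q = vectorise(query)
    return sorted(documents, key=lambda d: cosine_similarity(q, vectorise(d)), reverse=True)


# Hypothetical pages: one answers the question, one is pure marketing language.
pages = [
    "Predictive maintenance reduces unplanned downtime in food processing plants.",
    "Our award-winning team delivers world-class innovation for every customer.",
]
query = "How does predictive maintenance reduce downtime in food processing?"
best = rank(query, pages)[0]
print(best)
```

In this toy version the marketing page shares no words with the question, so it scores zero and the plain explanation comes first.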

Certain kinds of content seem to consistently outperform others in RAG environments. Clear, declarative statements are one example. “Predictive maintenance can cut unplanned downtime by up to 30 per cent…” will work better than “predictive maintenance can help reduce downtime” just because it’s easier for retrieval systems to match to a user’s question. The same is true of headlines in the form of questions.

Sadly, the headline ‘RAG against the machine’ is utterly useless from a GEO perspective, but I still like it, and I think you might too. I chose it because, while understanding RAG is important, we should also remember that it’s not all about robots just yet. Robots, at present, don’t spend money and humans do.

GEO ranking factors

Structured content with headings, subheadings and labelled sections is far more scannable by an AI crawler. Short, self-contained paragraphs that directly answer a question work almost like ready-made building blocks for generative engines.
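To see why headings help, here is a minimal sketch of how a pipeline might cut a page into labelled chunks before indexing it. It assumes headings are marked with a leading `## `, which is a convention chosen for the example rather than anything a particular crawler is known to require, and the sample page is invented.

```python
def split_into_chunks(page_text):
    """Split a page into (heading, body) chunks, one per heading."""
    chunks = []
    heading, body = "Introduction", []
    for line in page_text.splitlines():
        if line.startswith("## "):  # assumed heading marker
            if body:
                chunks.append((heading, " ".join(body)))
            heading, body = line[3:].strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        chunks.append((heading, " ".join(body)))
    return chunks


# Hypothetical page with question-style headings and short answers.
page = """## What is predictive maintenance?
Predictive maintenance uses sensor data to spot faults before they cause a breakdown.

## How does it reduce downtime?
It lets engineers schedule repairs during planned stops instead of reacting to failures.
"""

for heading, body in split_into_chunks(page):
    print(f"{heading} -> {body}")
```

Each chunk carries its own question and its own answer, which is what makes it usable as a building block on its own.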

Data, statistics and citations also help retrieval and are proven to improve your chance of appearing in AI search by up to forty per cent, according to a paper published collaboratively by Princeton University, the Indian Institute of Technology and two independent academics.

This isn’t because AI trusts or distrusts them in the way you or I might, but because they increase semantic density. Quotes sit in the same category; they are clean, clear statements that search treats as a high value anchor. A sentence like “According to Dr Smith, friction welding reduces porosity by eliminating filler material” is far more likely to be retrieved than a vague paragraph about innovation or efficiency.

All of this leads to a simple conclusion: the writing choices we make influence whether an AI model selects our content during the retrieval stage. And if we influence retrieval, we influence the final generated answer.

This is why GEO isn’t just a rebrand of SEO, engineered as a reason to pay people like me more money. Although, to be clear, it is also a reason to pay people like me more money.

SEO was about persuading, and in many cases cheating, a ranking system. GEO is about persuading an evidence-based selection system that acts like a journalist. It’s like SEO has graduated from its first degree and is doing its PhD; its questions are better and the answers it is looking for have to be more compelling.

The companies that win at GEO are going to be the ones that recognise that the content that is easiest for AI to classify as relevant, authoritative and useful is the content that will be presented to their customers.

As generative search becomes the default interface, RAG becomes the engine underneath it making everything move. Understanding how retrieval works, and writing in a way that makes your content retrievable, is now a core part of the PR process.

Understanding how to retrieve accurate data is also essential if you hope that the children you teach to play guitar won’t be hopelessly embarrassed in the guitar shop. The alternative is them telling the owner that they are about to break out the Led, and then actually going on to start rocking some sweet Deep Purple licks. Imagine their embarrassment.

Richard Stone

Richard Stone is MD at Stone Junction.
