Thoughts on Content Aggregation

TL;DR – ChatGPT is the latest step in a long trend toward aggregating other people’s content and using it without linking to the source. Reasoning:

Aggregated content is worth money

The modern human generates content all the time. It’s very clear that we do it when we post photos on Facebook or write lengthy text on our personal blogs, but it’s not just that. Our browser history is content, our search history is content, and our cookies are worth something to advertisers. We generate content by clicking the TV remote, and most likely even by speaking in the presence of a smart device with a microphone. We generate content every time we tap on our phones beyond the password screen. This content is aggregated and transformed to be used by whoever can convert it to revenue, with some privacy-related exceptions.

Once an engine has a database of aggregated content, it can attempt to monetize it by finding consumers looking for parts of that content, or ideally by monopolizing an audience in a specific area. Big data, stored in a way that allows quick access, is like a black hole: it curls the space around it and makes things happen that would otherwise be impossible. This doesn’t change the nature of the content, though. It’s aggregated from external sources and can’t exist without them. Google, for example, produces very little public content of its own.

For the majority of the existence of the Web, the market for such aggregation was dominated by tools that would also link to the information source and share the traffic or the profit so the information source survives. Wikipedia demands a source for everything and the sources are part of each article. Google links to websites. Foursquare would link to that nice restaurant’s website. Some services would directly share revenue. We grew up with this approach and it sounds fair. But it seems like it is going to be challenged again.

Aggregators are becoming the source of truth

I have observed some questionable developments around aggregation over the last 5-10 years. Google, for example, has been motivated to keep clicks within the service, and it shows. They built an information source called Knol, developed a mechanism for hosting the entire web called AMP, built Google Maps, and integrated summaries in the search results. I can now learn everything about my favorite actors, for example, and see their photos without ever leaving Google. I can make 10, 15, or even 20 content-related clicks and still never hit one that leads outside of Google.

When individuals do that very same thing, it raises eyebrows. People copy/paste with slight modifications to do homework, write a paper, trick crawlers, farm Karma on Reddit, or who knows what else. Other people have tried and succeeded in authoring books with slightly modified content from other sources. Rewording text, translating it from a foreign language, or editing photos so that they are hard to trace back to the author has long been a practice that’s frowned upon and sometimes challenged with legal action.

Now that ChatGPT has appeared, “AI” is the new big thing. It does not look like an intelligent bot to me, though. It looks like the ultimate copy/paste engine, so it’s no wonder it’s so good at writing homework. It has no knowledge of its own, but it appears to know everything. It successfully uses other people’s creative efforts and then shares them as if it just knew them out of nowhere, not citing the sources or sometimes citing them without providing links. It has the knowledge but chooses not to share it unless asked. I’ve not checked how many people work on it (I would assume thousands), but I doubt any of them are content creators. Expert scrapers – probably, experts in aggregation – likely, big data – most certainly, experts in human language processing – absolutely.

Consequences of this trend

If ChatGPT completely replaces Google, the traffic to the original creators will decrease, although, because Google’s tactics already keep so many clicks inside Google, it might not decrease by much. Why should a visitor read a lengthy blog post if a bot can present a brief summary of that effort without even mentioning who created it? The aggregator and the consumer both benefit; it’s just that the content creator is now turned into a free “trainer” of someone’s bot. We’ll all start consuming AI-rewritten text until that breaks too. Given that AI’s text has no creativity, will, or ideas of its own, every time one of us consumes it, thought and effort are diverted away from actual creation.

The trend is concerning, but I doubt we can do much about it. Nothing is more powerful than an idea whose time has come. ChatGPT is part of the negative trend of aggregators claiming ownership of people’s content, but it may also be convenient. It’s convenient in the same way as The Pirate Bay, which has most of the movies ever created, with the small issue that the service disregards the will of the people who created them. Just like The Pirate Bay, Google, Bing, and ChatGPT can’t exist without the people who created the underlying content they use to generate all these clicks.

I personally hope that people will push back against AI’s content rewrites and focus on services that are fair. I also hope that the disruption that’s coming will crack the near-monopoly over search. Some good things might grow in the cracks, or they might not.

