Thoughts on Content Aggregation

TL;DR – ChatGPT is the latest instance of a long trend of aggregating content and using other people’s content without linking to the source. Reasoning:

Aggregated content is worth money

The modern human generates content all the time. It’s obvious that we do it when we post photos on Facebook or write lengthy text on our personal blogs, but it’s not just that. Our browser history is content, our search history is content, and our cookies are worth something to the advertisers. We generate content by clicking the TV remote, and most likely even by speaking in the presence of a smart device with a microphone. We generate content every time we click on our phones beyond the password screen. This content is aggregated and transformed to be used by whoever can convert it to revenue, with some privacy-related exceptions.

Once an engine has a database of aggregated content, it can attempt to monetize it by finding consumers looking for parts of that content, or ideally by monopolizing an audience in a specific area. Big data, stored in a way that allows quick access, is like a black hole: it curves the space around it and makes things happen that would otherwise be impossible. This doesn’t change the content’s nature – it’s aggregated from external sources and can’t exist without them. Google, for example, produces very little public content.

For the majority of the existence of the Web, the market for such aggregation was dominated by tools that would also link to the information source and share the traffic or the profit so the information source survives. Wikipedia demands a source for everything and the sources are part of each article. Google links to websites. Foursquare would link to that nice restaurant’s website. Some services would directly share revenue. We grew up with this approach and it sounds fair. But it seems like it is going to be challenged again.

Aggregators are becoming the source of truth

I observed some questionable developments around aggregation over the last 5-10 years. Google, for example, has been motivated to keep clicks within the service and it shows. They built an information source called Knol, developed a mechanism for hosting the entire web called AMP, built Google Maps, and integrated summaries in the search results. I can now learn everything about my favorite actors, for example, and see their photos without ever leaving Google. I can make 10-15-20 content-related clicks and still never hit one that leads outside of Google.

When individuals do that very same thing, it raises eyebrows. People would copy/paste and make slight modifications to do homework, write a paper, trick crawlers, farm Karma on Reddit, or who knows what else. Other people have tried and succeeded in authoring books with slightly modified content from other sources. Rewording text, translating it from a foreign language, or editing photos so that they are hard to trace back to the author is a practice that has been frowned upon and sometimes challenged with legal action.

Now that ChatGPT has appeared, “AI” is the new big thing. It does not look like an intelligent bot to me, though. It looks like the ultimate copy/paste engine; no wonder it’s so good at writing homework. It has no knowledge of its own, but it appears to know everything. It successfully uses other people’s creative efforts and then shares the result as if it just knew it out of nowhere, not citing the sources or sometimes citing them without providing links. It has the knowledge; it just chooses not to share it unless asked. I’ve not checked how many people work on it; I would assume thousands, but I doubt any of them are content creators. Expert scrapers – probably, experts in aggregation – likely, big data – most certainly, experts in human language processing – absolutely.

Consequences of this trend

If ChatGPT completely replaces Google, the traffic to the original creators will decrease, although thanks to Google’s tactics, it might not decrease by much. Why should a visitor read a lengthy blog post if a bot can present a brief summary of that effort without even mentioning who created it? The aggregators and the consumers both benefit; it’s just that the content creator is now turned into a free “trainer” of someone’s bot. We’ll all start consuming AI-rewritten text until that breaks too. Given that AI’s text has no creativity, will, or ideas of its own, every time one of us consumes it, it diverts thought and effort away from actual creation.

The trend is concerning, but I doubt we can do much about it. Nothing is more powerful than an idea whose time has come. ChatGPT is part of the negative trend of aggregators claiming ownership of people’s content, but it may also be convenient. It’s convenient in the same way as The Pirate Bay – it has most of the movies ever created, with the small issue that the service disregards the will of the people who created them. Just like The Pirate Bay, Google, Bing, and ChatGPT can’t exist without the people who created the underlying content they use to generate all these clicks.

I personally hope that people will push back against AI’s content rewrites and focus on services that are fair. I also hope that the disruption that’s coming will crack the near-monopoly over search. Some good things might grow in the cracks, or they might not.

Superhero movies

I’ve been reading /r/comicbooks and found this gem that made me laugh and think:

Two superheroes I’ve never heard of appeared in a well-known superhero universe and, of course, they saved Earth.

I wonder, what makes people like superheroes so much? Most of them are so overpowered that it takes a major leap of the imagination to find them worthy opponents and make the shows. And why would there be any opponents anyway? The superheroes are so willing to fight each other that each of these superhero universes (Marvel, DC, etc.) should just naturally reduce itself to a state with one superhero overlord and no opposition, like a natural Squid Game.

The supervillains are also way too easy, in the sense that the entire evil is focused on one person, or on a person and a handful of helpers. The great evil of the modern day usually lives in the form of ideas that poison people’s minds. “My country is better than your country, and half of your country used to be part of my country” is one example, but there have been plenty of variations. Bulgaria on three seas. Кримнаш (“Crimea is ours”). There’s no way to personalize the rot when it’s an idea, even if the idea has been spread by carefully organized propaganda. This makes the supervillains and superheroes boring.

Nothing is more powerful than an idea whose time has come.

Victor Hugo

Poor Man’s Bitcoin

Communism withdrew from Bulgaria in 1989, and when the political police became unemployed, all sorts of weird new things popped up to fill the gap. Grocery stores and supermarkets. New TV channels. More than one kind of ice cream. Fortune tellers. Horoscopes. Insurance racket. Chain letters too – you rewrite this letter 5 times and put it in 5 mailboxes and you’ll live a long and happy life. If you don’t, you’ll die in pain. Multiple testimonials included.

One kind of chain letter had a price tag and wasn’t supposed to work, but worked for a while. Let’s call it The Poor Man’s Bitcoin. Here is roughly how it worked; my memories are faint, so excuse any inaccuracies.

There’s a sheet of paper, cut from a notebook, handwritten by a person we can call “The Seller”. That sheet of paper contains the rules of the chain letter and 6 home addresses or PO boxes. Rules are as follows:

  • You need to send 2 Leva to the last 5 people in the chain, and also 2 Leva to the person who invented it (the number varied)
  • The way you prepare new copies is by rewriting the sheet, adding your name at the bottom of the list of 5 people in the chain, and removing the first one
  • It contains terrible curses that will reach you if you violate the rules, sell more or less than 5 sheets of paper, or don’t mail money to the people in the chain
  • In order to get your money back, you need to temporarily become “The Seller”.

If you follow the rules closely, you should quickly get your money back – just find 5 people to buy your sheet of paper and letters with money would start flowing. Does it sound familiar?

  • It inflated algorithmically
  • Its intrinsic value was zero
  • The only things that kept it living for some time were people’s feelings and beliefs, mostly greed and fear of missing out
  • Some people loved it and were willing to argue fiercely that they would become rich once they got their thousands of letters
  • It worked because people didn’t really understand how it worked and didn’t look past the first 5 or 25 people who would give them money
  • Each transaction benefitted the author of the scheme + the regular system that guaranteed money transfer (the post)
  • Everyone could start their own chain letter
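To put numbers on the chain letter rules, here is a minimal Python sketch of the arithmetic. It follows my reading of the rules as described above (5 buyers per seller, 5 paid names on the sheet, 2 Leva per letter). The function names are made up for illustration, the 8,000,000 population figure is a round hypothetical number, and the price of the sheet itself is ignored because I don’t remember it.

```python
BRANCHING = 5     # sheets each participant must sell (from the rules)
CHAIN_LENGTH = 5  # names on the sheet that receive money
PAYMENT = 2       # Leva mailed to each name and to the inventor


def new_participants(generation: int) -> int:
    """Buyers needed in a given generation below a single seller."""
    return BRANCHING ** generation


def expected_letters() -> int:
    """Letters one participant receives if every descendant follows the rules.

    Your name stays on the sheet for CHAIN_LENGTH generations of buyers,
    and each buyer in those generations mails you one letter.
    """
    return sum(new_participants(g) for g in range(1, CHAIN_LENGTH + 1))


def cost_to_join() -> int:
    """Leva mailed out: one payment per name in the chain plus the inventor."""
    return PAYMENT * (CHAIN_LENGTH + 1)


def generations_until_exhausted(population: int) -> int:
    """Full generations that fit in a population, starting from one seller."""
    total, generation = 1, 0
    while total + new_participants(generation + 1) <= population:
        generation += 1
        total += new_participants(generation)
    return generation


if __name__ == "__main__":
    letters = expected_letters()
    print("letters promised:", letters)                 # 3905
    print("Leva promised:", letters * PAYMENT)          # 7810
    print("Leva paid to join:", cost_to_join())         # 12
    # Hypothetical population, used only to show how fast the chain saturates.
    print("generations:", generations_until_exhausted(8_000_000))  # 9
```

The promise is 7810 Leva for 12 Leva, but it only holds while every generation finds five times more buyers than the previous one. With the hypothetical population above, a single chain runs out of people after 9 generations, so the last several generations mail money out and get little or nothing back.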

Once most of us had been exposed to it, the number of letters turned out to be disappointing, and most people realized that the curses meant to guarantee the distribution didn’t work. It vanished, to be replaced years later by lottery tickets. Same feelings but no need to be able to write.

I find it amusing that so many people consider Bitcoin a form of investment while it’s pretty much like the chain letters from the 90s. At least this is how I see things. I do not understand how it works.

If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.