Google Should Pay for Crawling

It’s time for the EU and other regulators to reconsider the deal we’ve made with search engines and how companies like Google are redefining it without consent.

Originally, we allowed Google and other search engines to index our content for free in exchange for traffic. This made sense: we paid for hosting, created content, and in return got visitors from search. They profited from ads and reordering the search results in favor of advertisers. But the rise of generative AI has changed the terms.

Now, Google uses our content not just to link to us, but to generate full answers on its platform, keeping users from ever visiting our sites. This shift erodes the value we once received. Meanwhile, Google and a small group of others continue to monetize the interaction through ads and AI subscriptions.

Google Search results these days barely feature any links and highlight internal content

SEO experts talk about “GEO” (Generative Engine Optimization), but the reality is that no clear playbook exists for it, and most content creators are seeing less and less return. There’s no proven way to optimize for Gemini or OpenAI’s models, especially when those tools don’t send much traffic back. The only instance of GEO I’ve seen was with a meme. Some prankster optimized (on purpose or not) a tweet comparing the size of a blue whale’s vagina to a specific politician, and Gemini picked it up.

At the same time, website owners still bear all the costs: paying for hosting, paying for content creation, and now even for AI tools that were trained on their own data. This suffocates the open web so that the LLM companies can sustain hockey-stick growth toward trillion-dollar valuations.

Should crawling be free in the AI era?

OpenAI at least pays for access to datasets. But many of these datasets were built through unrestricted crawling or by changing terms of service after the fact. Google doesn’t even do that. It simply applies its AI to search and displays that back to the user.

Regulation is needed yesterday. The EU’s Digital Markets Act already limits self-preferencing. Why not extend rules like it to web data? Possibilities include:

  • A licensing system for crawlers
  • Mandatory transparency around crawling and training data
  • A revenue-sharing model for publishers

And GEO will likely turn out to be more of the same: endless content spam generated to feed the models, exploiting knowledge of how they scrape data. It doesn’t feel useful yet, and if that’s the future, we can only expect the enshittification of generative AI.

AI Translate

The advancement of AI in translation tools makes my blogging more difficult. I get strange results when looking up a word, a clear reminder that I should not rely too heavily on this tech.

The Bulgarian word for “Cleaver” is translated as “Satyr” because of the proximity in letters. The Bulgarian word for “Wild Plums” is translated as “Junkies”. The lungwort plant is repeatedly translated as “lungs” across a variety of tools.

I tried ChatGPT and it’s better but still fails, and you can’t really trust a tool that fails for unknown words without checking elsewhere.

Bing Translate can do formal or informal translations but both are questionable. The 3 words above produced 4 different mistakes and 0 correct hits.

I’m switching back to using a dictionary for now. The type of assistance I need is not served well by AI-based translation tools. They’re super quick and convenient but not accurate yet.

Should I worry about Google?

A fellow WordPress.com blogger, Weirdo82, complains that Google hasn’t discovered their site yet. I decided to check my own presence on Google with the Google Search Console (WordPress.com’s support doc).

Google knows about my site but sends no traffic, based on ~2 days of data. Google’s knowledge of my site is limited to a very small number of indexed pages. Most posts aren’t there.

Why is that and should I care about it?

  • Google ranks sites based on an ever-changing algorithm. When Google started, the top factor for ranking was inbound links – whether other sites link to you. This, however, has been abused by SEO experts from day one. People would buy countless assets, use them for linking, and rank themselves high with some garbage content. So, Google pushed back by adding more and more factors, and the battle is still ongoing. They are losing it in areas with a high commercial interest but no brands. Finding a human plumber in Bulgaria with Google is a no-go.
  • Blogs are by nature not great at coming up with unique searchable titles. When I blog about how pretty my cup of coffee was, should I realistically expect to be ranked? Probably not.
  • Blogs that are great with titles and topics that Google wants to see are generally unreadable by humans. I see the content written for bots all the time – 3-4 pages long so that it is considered quality content by the AI overlords. It would have multiple headers in the middle, each with a list from 1 to 10 or so. When I see that, I wish I had a block button to never see it again.
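To make the inbound-link point concrete, here is a minimal sketch of ranking by link count and how a handful of bought sites can game it. It is a toy illustration only: the function name and the `.example` domains are made up, and it is a plain inbound-link count, not PageRank or whatever Google actually runs.

```python
from collections import defaultdict


def rank_by_inbound_links(links):
    """Rank pages by how many distinct sources link to them.

    `links` is an iterable of (source, target) pairs.
    Self-links and duplicate pairs are ignored.
    """
    inbound = defaultdict(set)
    for source, target in links:
        if source != target:
            inbound[target].add(source)
    # Highest inbound count first; ties broken by name for stable output.
    return sorted(
        ((page, len(sources)) for page, sources in inbound.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical link graph: two real blogs link to an honest post.
organic = [
    ("blog-a.example", "honest-post.example"),
    ("blog-b.example", "honest-post.example"),
]
# A spammer buys five throwaway sites and points them all at one page.
farm = [(f"farm-{i}.example", "garbage.example") for i in range(5)]

ranking = rank_by_inbound_links(organic + farm)
for page, count in ranking:
    print(page, count)
```

With link count as the only signal, the page backed by the five bought sites outranks the post that two real blogs chose to link to, which is roughly why more ranking factors had to be added.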

Google used to send hundreds of hits per day to my former blog, but all of these were to 10-year-old posts I didn’t care about. I value one comment on my latest post more than 1,000 of these Google clicks. So the answer is no – for a personal blog, Google doesn’t matter. Optimizing for their ever-changing algorithm would make my site worse.

I care about Google, what do I do?

Google’s strategy is to make you pay for traffic; this is what made them so big. They farm people’s desire to be found, to sell products, and to grow. So, what else?

  • Bring blogrolls back – exchange links or quotations with people. Google doesn’t mention links as ranking factors anymore but they’re likely still using them.
  • Write frequently – you never know which cringe post will be ranked and will boost your site by 100-200 views per day
  • Buy a domain early and register it for a long time
  • Get some social media shares, even if it’s just you sharing your own content. It likely counts.

I would keep acting like Google doesn’t exist for the context of this blog. It would be flattering to have more traffic but I still value human interactions (likes & comments) more than random traffic to old posts. These I get primarily from the Reader, Facebook, and Twitter.