Indeed, it seems that Google IS forgetting the old Web

Deleted member 2362


2018-01-17

XML pioneer and early blogger Tim Bray says that Google may be suffering from deliberate memory loss. I may have found more evidence that this is the case.
Bray writes: "I think Google has stopped indexing the older parts of the Web. I think I can prove it. Google's competition is doing better."

Bray's case: a 2006 review

Back in 2006, Bray published a review of a Lou Reed album on his own blog. A few days ago, when he needed to cite the URL of that review again, he realized that he himself couldn't find it via Google, "Even if you read them first and can carefully conjure up exact-match strings, and then use the "site:" prefix".
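
A minimal sketch, in Python, of the kind of query Bray describes: a quoted exact-match string combined with the "site:" prefix. The function names, the example phrase, the `example.org` domain and the two search URL patterns are illustrative assumptions; none of them come from Bray's post.

```python
from urllib.parse import urlencode


def exact_match_site_query(phrase: str, domain: str) -> str:
    """Quote the phrase for exact matching and restrict results to one domain."""
    return f'"{phrase}" site:{domain}'


def search_urls(phrase: str, domain: str) -> dict[str, str]:
    """Build the same query for two search engines so results can be compared.

    The URL patterns below are assumptions about each engine's query string.
    """
    query = exact_match_site_query(phrase, domain)
    encoded = urlencode({"q": query})
    return {
        "google": "https://www.google.com/search?" + encoded,
        "duckduckgo": "https://duckduckgo.com/?" + encoded,
    }


if __name__ == "__main__":
    # Hypothetical phrase and domain, used only for illustration.
    for engine, url in search_urls("Seven Things", "example.org").items():
        print(engine, url)
```

Opening both URLs side by side is a quick way to repeat the comparison described in this post for any page of your own.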

My own case, again from 2006

Back in 2006, I published the opinion piece "Seven Things we're tired of hearing from software hackers" on one of my domains, digifreedom.net. A few years later, for reasons not relevant here, I froze that whole project. One unwanted consequence was that the "Seven Things", together with other posts, was no longer accessible. I was able to put the post back online only in December 2013, at a new URL on this other website. Last Saturday I needed to email that link to a friend and I had exactly the same experience as Bray: Google would only return links to mentions of the post, or even to whole copies of it, but all archived elsewhere. I asked Google to reindex this whole website, but nothing changed. Yesterday afternoon, through BoingBoing, I discovered Bray's post. As soon as I read it, I tried DuckDuckGo and got the same result he did: Google ignores my copy of my own post, while DuckDuckGo correctly lists it as the first result, as this screenshot shows:
[Image: /img/google-forgets-duckduckgo-not.jpg]

Different stories, same practical result?

Unlike Bray's, my own post disappeared from the Web for a while and then reappeared with its original date, but only after a few years and on a different domain. This is an important difference, which may mean that, in my case, part of Google's failure is my own fault. Still, for all practical purposes, the result is the same:

DuckDuckGo gives as its first result the most correct, if not the only correct, answer for anyone who would be interested in that post today: the current link to the original version, on the (current) website of its author. DuckDuckGo gets things right. Google does not (at least not at the time of writing).

Bray concludes that Google deliberately forgets old pages, because "indexing the whole Web is crushingly expensive, and getting more so every day" and Google "cares about giving you great answers to the questions that matter to you right now". His conclusion is that, if the Web should be "a permanent, long-lived store of humanity's intellectual heritage... it needs to be indexed, just like a library. Google apparently doesn't share that view."

I agree with that conclusion. For the same reason, I also find the title of BoingBoing's report of this story, "Google's forgetting the early web", misleading. The two posts mentioned here are not "early web", nor really "old". Unless we're all missing something here, it seems more correct to say that Google forgets stuff that is more than 10 years old. If this is the case, Google will remember and index a smaller part of the web every year. Google may do so simply because it would be impossible to do more, given economic and/or technological constraints that sooner or later would also hit its competitors. But this only makes bigger the problem of what to remember, what to forget and, above all, who should remember and forget, and how.
 

Jessica3cho雪血⊜青意

Google is a sham. They lie, deliberately, constantly, and do everything in their power to pull the wool over consumers' eyes. The internet has become too vast too quickly for Google, and they pretend they are the bleeding edge. They are a shell of a company that does not even know how to build a website that does what it claims to do. They waste all their money on late-to-the-show products that fail, all the while abandoning what made Google so big in the first place.

For reference: https://forum.agoraroad.com/index.p...n-village-proof-of-dead-internet-theory.3554/
 

inverse_square_matrix

Damn, I knew Google search was kinda shitty, but I thought that just doing some hacks to remove ads and improve the page was enough. I guess I will avoid Google search entirely when trying to find anything that may not be on the central websites.
 
They index what will make them money and delete the crud that doesn't. They are a business that is accountable to shareholders and need to show consistent gains. I bet if gramps there with his 2006 website drove enough ad revenue, it'd be found easy on google.
 

14-27

Google was the gateway drug to the biggest mindkontrol operation in history. Most people trust Google results over their own frigging memories nowadays. Google routinely inserts history, and deletes it from public awareness. Some remember, and are called crazy.

Check out the studies where the majority of people answered test questions wrong because they were deliberately exposed to other people's wrong answers. They then swore up and down, after the fact, that the answers they gave were correct.

Google is literally the Ministry of Truth.
 
I use Ecosia because I care about trees and poor people who like trees. Every 3 NSFW searches equals one new tree planted somewhere poor. They use Bing search results, which often aren't as good as the Google results but are often adequate for my purposes, NSFW and not.

Not sure what the deal is with them getting Bing searches. Maybe they are free? Maybe it's a tax write-off for Bing/Microsoft.
 

14-27

Every 3 NSFW searches equals one new tree planted somewhere poor.

Man, back in the day I would have replanted the entire Amazon if I used Ecosia! In fact, there would be no arable land left in South America at all. bchmmmmm

I haven't used porn in ages (several months at least), so I don't know if this is still true, but Bing was way better at NSFW for at least a year, in my experience. Weird, huh? They used to suck, but Google is utterly useless now at finding ANYTHING related to the terms I enter. Except for the strictly technical searches, like BASH commands, bug fixing, etc. They still seem to be king on that side of things. It is horrifically bad otherwise.

When I go "off Meds", I start to seriously entertain notions of being personally messed with by the AI running Google nowadays. It feels like it is returning the screwiest, most unrelated nonsense it can possibly concoct, especially when I try to use quotes. I think I upset the AI when I used to type long schizoid rants into the search bars, hoping to antagonize the Glowies. I used to be a heavy drinker, and one of my self-amusing pastimes, on the other side of shitfaced, was attempting to get the notice of the dicks snooping in everyone's business (so I could tell them, in the most graphic terms, how little I thought of them). But I swear, ever since then, something has been tailoring search results just to annoy me...


s0ren

honestly i don't think we can ascribe intentionality to "google". i doubt google is "forgetting the old web", it's more likely that the new members simply didn't grow up with the old web
That's not really how this works. It's not that people have directly decided not to index older things (in this case anyway; I'm sure that certain things are declared "do not index" for various political reasons). It's that they have designed their indexing process to exclude certain content via opaque criteria. I think @Pangolin's post is right that this is just based on revenue models. The problem with this is that people have come to rely on private search engines as the way to navigate the never-ending sea of internet content, but these profit incentives mean that certain things will become inaccessible unless you know how to get to them directly. I think intentionality can be ascribed, in that I'm sure those responsible for designing the indexing system know that this is the case and choose not to care. The question really is whether we can assign blame/negativity to these actions. I think probably not. What we really need is a publicly funded/not evil corporation search engine.
 
I am not sure if publicly funded would even be of any benefit since the government would have a clear conflict of interest to bury items that are considered harmful to the ruling regime. Even if such a thing were possible, disagreement would ensue over how the search rankings are calibrated, assuming that the algorithm is visible to begin with.

It also opens potential freedom of speech issues if a website is rated lower than another. Maybe a search engine as known today isn't the right tool, but instead some sort of directory organized by categories and keywords to ensure that a website is easily found. Maybe even a decaying user rating for some of the better websites, to ensure that there is a chance for the smaller, less well-known websites to become notable.
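
To make the "decaying user rating" idea concrete, here is a rough Python sketch. It assumes exponential decay with a fixed half-life, which is only one of many possible choices; the function name, the vote format and the 180-day half-life are all hypothetical.

```python
def decayed_score(votes, now, half_life_days=180.0):
    """Sum user votes, weighting each by how recently it was cast.

    votes: iterable of (value, day_cast) pairs, with days as plain numbers.
    A vote loses half of its weight every `half_life_days` (an assumed value).
    """
    total = 0.0
    for value, day_cast in votes:
        age = max(0.0, now - day_cast)  # votes dated in the future count as new
        total += value * 0.5 ** (age / half_life_days)
    return total


if __name__ == "__main__":
    # Hypothetical data: an established site with ten old votes,
    # and a small site with three votes cast today (day 360).
    established = [(1.0, 0.0)] * 10
    newcomer = [(1.0, 360.0)] * 3
    print(decayed_score(established, now=360.0))
    print(decayed_score(newcomer, now=360.0))
```

With these made-up numbers the ten old votes have decayed to a quarter of their weight each, so the small site with three fresh votes ranks above the established one, which is the effect the paragraph above is after.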

Plus, this is the government we are talking about, so expect it to cost at least tens of millions, be over budget, late as hell, and composed entirely of shit-code written by only the worst programmers found behind the dumpster of an Arby's, and supervised by politicians with vested interests in keeping jobs in their area for reelection purposes.
 
oh and don't forget they'll probably purchase code and staff from some vendor who will overcharge and underdeliver and ultimately deliver some proprietary solution so you can't see the internals of the system. Ring any bells?

At this point the bells should be roaring.
 

s0ren

@Pangolin, sorry, I didn't mean public as in necessarily government-operated. Poor choice of words; something like an NGO could work too. What I think is actually important is to have a diversity of options: a government-operated search engine, various private ones, NGOs, etc. Having a diversity of options will counteract censoring by having multiple places engaging in indexing, and disincentivize it via the market. You could then feasibly have second-order options which pull from this set of search engines to provide a curated and accessible experience. Only having private options right now dampens things.

The idea of alternatives is interesting but I can't imagine anything else that would work as effectively.
 