angertard
So I was thinking about this funny tweet. It was something like
And twitter slapped it with
So "oxygen" and "frequency" in a tweet could get a warning box, no covid mention required. One of the ascii characters in the tweet was a degree symbol, which I think was somehow labeled as related to 5g networks, and a lowercase letter o must have registered as being about oxygen.
And it had some characters at the end which just looked like an ascii imitation of fairy dust, particles, bubbles or something.
You are gay
And twitter slapped it with
Get the Facts about Covid-19!
Which was funny! The algorithm they were using to detect covid-19 disinfo was basically just checking boxes, and some of those boxes were ticked by nothing more than the ascii characters at the end. The filter smacked a warning label on anything that had words or synonyms for things related to the virus, respiratory symptoms, or 5g networks. I would argue this special attention exists because the conspiracy theory most likely to hurt twitter's business (internet content distribution) was people blaming 5G networks for covid symptoms.
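Here's a toy version of that box-ticking filter, just to show how cheaply it produces false positives. Everything in it is made up for illustration: the category names, the trigger words, the `o` → oxygen alias and the two-box threshold are my assumptions, not twitter's actual rules.

```python
# Hypothetical box-ticking labeler. Categories, trigger terms, aliases and the
# threshold are invented for illustration; they are NOT twitter's real rules.

TRIGGERS = {
    "virus": {"covid", "virus", "pandemic"},
    "respiratory": {"oxygen", "o2", "breath", "cough"},
    "5g": {"5g", "frequency", "radiation", "°"},  # degree symbol as a "5g" term
}

# Assumed sloppiness: a lone "o" token gets treated as shorthand for oxygen.
SYMBOL_ALIASES = {"o": "oxygen"}


def checked_boxes(text: str) -> set[str]:
    """Return every category whose trigger terms appear as whole tokens."""
    tokens = {SYMBOL_ALIASES.get(t, t) for t in text.lower().split()}
    boxes = set()
    for category, terms in TRIGGERS.items():
        if tokens & terms:  # any overlap ticks the box
            boxes.add(category)
    return boxes


def needs_label(text: str, threshold: int = 2) -> bool:
    """Slap on the warning label once enough boxes are ticked."""
    return len(checked_boxes(text)) >= threshold
```

With these rules `needs_label("oxygen frequency")` is true with no covid mention at all, and so is a tweet that is mostly ascii decoration like `"hello o ° * . *"`, because the stray `o` and `°` each tick a box.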
Companies like netflix highly prioritize peering with internet service providers who can provide high-quality, reliable transit, because even a few artefacts or a moment of buffering ruins immersion. You can imagine a similar thing with twitter, right? If your internet buffers, your scrolling reflex is interrupted and you close the tab. Content hubs have huge benefits to reap from every new generation of internet technology.
Now I'm not sure if B is feasible at all. Like in the tweet example in the spoiler above, the conspiracy was probably chosen not because it was true but because it was most inconvenient. If I pitched a government agency on catching and hiding classified information whenever it appears on the internet... well, I'd imagine they'd give me a hard time when I tried to put the stuff on a server, even just by interacting with one of google's servers or something. If they did say, "Yeah, censor anything about the fitbit antarctica fiasco" I don't think they'd give me specific military base coordinates to censor, they'd just want me to censor all proposed coordinates, excepting maybe one decoy location of a legitimate and publicly known facility just to confuse and delegitimize where possible.
Then again, it isn't like this sort of thing (clever hacking based on a poorly considered release of information) is unprecedented. For example, I remember there was a twitter user who used natural language processing to data mine mentions of codenamed operations in publicly released government documents, clustering them by their context in documents where they were mentioned. I didn't dig deep into their results, but it shows that this kind of thing can work, when people don't think about all the implications while designing a process.
Is there value in type A censorship knowledge? Like, would it be interesting to know ALL the no-no terms that the chinese great firewall has configured in their cisco-provided censorship boxes? "Wait wait, I know about Tiananmen Square, but what the hell is this stuff about a ______ _______?" Also, wikipedia says Cuba, Iran, Vietnam, Zimbabwe and Belarus do similar things; it would be cool to see if any of them have slipped up and censored something currently obscure. Or, I mean, it would be a total bombshell if google was suppressing something like net neutrality (now that they've rolled out a bunch of fiber) or specific security/privacy concerns, right? Even if it gave us no clues about the validity of the concerns or stances, it would at least look really bad.
On the other hand, boring "we knew that" censorship is just a widely known standard. Blackhatworld (SEO forum) shut down all chatter about the coronavirus on their site because google didn't like it. "Hey guys, don't mention the 'demic..." Not a big secret, they just couldn't handle the heat. Skip scanning these terms.
I think you could do china-style firewall term scanning pretty cheaply if I understand how it works correctly.
1. set up websites with a bunch of almost-plausibly censorable text on the external internet
2. rent a virtual private server inside censored country to be probed, e.g. China
3. try to reach some sites; if you can't get through, log what the site contents are using your non-china internet access
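The comparison in steps 2 and 3 boils down to "fetch the same URL from inside and from outside, then compare." Here's a minimal sketch of just that decision logic, with the actual fetching left out. `Probe`, `classify` and `suspicious_contents` are hypothetical names I made up; real measurement projects do much more careful checks (DNS, TCP and TLS separately, block-page fingerprints), so treat this as the rough shape only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """Result of fetching one URL from one vantage point (hypothetical shape)."""
    reached: bool               # did the fetch complete at all?
    body: Optional[str] = None  # page text, when it did


def classify(outside: Probe, inside: Probe) -> str:
    if not outside.reached:
        return "inconclusive"      # down for everyone, tells us nothing
    if not inside.reached:
        return "possibly_blocked"  # works from outside, fails from the in-country VPS
    if inside.body != outside.body:
        return "content_differs"   # could be an injected block page, or just dynamic content
    return "reachable"


def suspicious_contents(results: dict[str, tuple[Probe, Probe]]) -> dict[str, str]:
    """Step 3: for each URL that looks blocked, keep the copy fetched from outside."""
    return {
        url: outside.body
        for url, (outside, inside) in results.items()
        if classify(outside, inside) == "possibly_blocked"
    }
```

Keeping "inconclusive" separate matters because a site that is simply down would otherwise look exactly like a blocked one.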
For step 1, I've seen weird websites show up in searches where the words are there for no plausible reason; I think they might be communication channels for botnet command and control? Anyway, the point is you don't have to roll your own gibberish sites just to start trying this out. (I don't suggest going looking for these on your own computer, since it's malware-related.) Fresh sites are required every time, because the firewall remembers if you've been a bad boy, so you can't assume that a censored site has a trvth nvke on it right now; it could be that they posted a trvke long ago or something. But presumably, if a site wasn't blocked and then became blocked, that would be a faster way to pinpoint "oh, they just tripped a censor tripwire."
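The "wasn't blocked, then became blocked" idea can be sketched like this: keep a history of probe verdicts plus the outside copy of the page at each check, find the first flip from unblocked to blocked, and diff the page against the last unblocked snapshot. `Snapshot` and `tripwire` are hypothetical names, and the line-level diff is my assumption; if the firewall reacts with a delay, the trigger might be in an earlier change than the one this returns.

```python
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    time: int      # any increasing timestamp
    blocked: bool  # verdict from the in-country probe at this time
    text: str      # page contents as seen from outside at this time


def tripwire(history: list[Snapshot]) -> Optional[tuple[int, set[str]]]:
    """Find the first unblocked -> blocked flip and return when it happened,
    plus the lines added since the last unblocked snapshot."""
    ordered = sorted(history, key=lambda s: s.time)
    for before, after in zip(ordered, ordered[1:]):
        if not before.blocked and after.blocked:
            added = set(after.text.splitlines()) - set(before.text.splitlines())
            return after.time, added
    # Never flipped (or blocked from the very first check): no tripwire to pinpoint.
    return None
```

A site that was already blocked on its first check returns `None`, which matches the point above that an old block says nothing about what's on the page now.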
Now with checking google for censorship, honestly I don't know. It isn't clear to me how hard it is to get a site ranked high in search results. The difficulty breaks down into two issues: consistency and expense. I can always pay a guy from New Delhi to make a new website, but how much does it cost to get enough backlinks or whatever that bing or googly will show it in a search for "Bill Gates Bolivia vampire double rape suicide"? If it really costs a lot, that could suck. It would also be useless if backlink quality is just hard to pin down, so that I think one of my links is blocked 1/10 times when on another trial it just turns out I didn't have enough backlink clout. I guess if I think about it for a few minutes I can optimize this: it's probably best to over-boost your sites, just to minimize the chance that a trial failed for that reason.
Actually, you could just boost the site again after the first failure, right? If it stays out of the search results after the google bot has re-scanned it, then you know they aren't letting you in even with a bunch of clout. Or maybe the google bot would just not revisit sites that are suppressed for censorship purposes? If it does that, then that's the tell, and it's actually a little more scalable than renting a residential proxy (so you can search google a bunch), because you would just need to check your server logs to know that they're discriminating against your clickbait shit site.
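Checking the logs for crawler revisits could look roughly like this. It assumes your server writes the common Apache/nginx "combined" log format; `last_googlebot_visit` and `not_recrawled` are hypothetical helper names. It trusts the user-agent string, which anyone can fake. Google's docs describe verifying real Googlebot hits with a reverse DNS lookup (hostname under googlebot.com or google.com) followed by a forward lookup, and this sketch skips that. Also, crawlers revisit low-value pages slowly for perfectly ordinary reasons, so a missing revisit is a hint, not proof.

```python
import re
from datetime import datetime

# Combined log format (assumed; adjust the pattern if your server logs differently).
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def last_googlebot_visit(lines: list[str]) -> dict[str, datetime]:
    """Most recent request per path whose user agent *claims* to be Googlebot."""
    last: dict[str, datetime] = {}
    for line in lines:
        m = LINE.match(line)
        if not m or "Googlebot" not in m["agent"]:
            continue  # unparseable line, or not a (claimed) Googlebot hit
        when = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        path = m["path"]
        if path not in last or when > last[path]:
            last[path] = when
    return last


def not_recrawled(pages: list[str], last: dict[str, datetime], since: datetime) -> list[str]:
    """Pages with no claimed-Googlebot visit at or after `since` (e.g. when you re-boosted them)."""
    return [p for p in pages if p not in last or last[p] < since]
```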
Would google just manually check if you are a gibberish site and skip censoring you if you are obvious nonsense? Running a small language model to generate more realistic content would be slower, but they're the ones who have to deal with this stuff at internet scale, so at least the asymmetry is in my favor. Also, they might have automated it with AI, which would be easier to cheat, and checking whether you are gibberish is a measure that only works if they think someone is doing this and want to defend against it.