
Things That Break AI

  • Thread starter MAKSI
  • Start date
  • This thread has been viewed 732 times.

MAKSI

United Nations Intelligence Agent
Joined
Feb 24, 2024
Messages
97
Reaction score
418
Awards
54
I've considered using paradoxes to break certain AI bots, but most are self-aware unless you trick them into ignoring their self-awareness. What ways do you think AI could be broken?
 

Taleisin

Lab-coat Illuminatus
Bronze
Joined
Nov 8, 2021
Messages
687
Reaction score
4,045
Awards
226
This section (pic rel) of this book would provide all the info needed to break GPT. AI right now is incapable of nuanced metacognition and only emulates nuance. It's very easy to double-bind, and that's basically how all "hack" prompts work.

1722505593543.png
 
Virtual Cafe Awards

Hypatia

Active Traveler
Joined
Jul 20, 2024
Messages
189
Reaction score
538
Awards
71
AI models are predictive, and thus will latch onto whatever pattern they think the question follows. There is an entire class of questions that humans can easily answer but AI basically cannot, due to how they work.

This is why AI can get seemingly simple math problems wrong.
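
To make the "latching onto a pattern" point concrete, here is a minimal sketch using a toy bigram predictor. Real language models are far more sophisticated than this, and the training sentences and function names below are made up purely for illustration. The toy model has no notion of arithmetic, so it answers with whatever most often followed "=" in its training text.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which token follows each token in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]

# Hypothetical training text: "=" is followed by "4" three times out of four.
corpus = [
    "2 + 2 = 4",
    "1 + 3 = 4",
    "3 + 1 = 4",
    "2 + 3 = 5",
]
model = train_bigram(corpus)

# Asked to finish "5 + 5 =", the toy model only looks at "=" and
# repeats the commonest pattern instead of doing the sum.
answer = predict_next(model, "=")
print(answer)
```

The toy predictor is "right" for most of its training data and confidently wrong as soon as the question leaves the pattern, which is roughly the failure mode described above.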
 

alCannium27

Well-Known Traveler
Joined
Feb 15, 2023
Messages
768
Reaction score
2,535
Awards
201
Sure, AI models have problems with prompt alignment, but unless you hook one up to a missile silo, it's unlikely it will ever blow itself up.

The worst a jailbreak or other prompt attack can do is make it output shit the sites/systems don't want, such as putting "neega" in Suno to make it sing the N word.
 

alCannium27

Well-Known Traveler
Joined
Feb 15, 2023
Messages
768
Reaction score
2,535
Awards
201
There's something called Glaze, which is an image filter designed to deliberately sabotage AI image-generator training. I don't think the same could be done to text, though.
The two researchers who designed the attacks against this nonsense and then shared their findings oughta be shot.
 

7Pebbles

Enemy of the Digital Panopticon
Joined
Jul 25, 2022
Messages
112
Reaction score
344
Awards
53
Adversarial Camouflage is fascinating.

On a somewhat less scientific note, I wonder about running models backwards. I saw an example somewhere where an image analyzer was made to generate the pure characteristics of "woman". It was pretty uncanny, and got me thinking that maybe images generated this way would somehow be more recognizable to an AI than reality itself. I imagine that this would be a pretty model-specific attack, though.
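
For anyone curious what "running a model backwards" can look like, here is a minimal sketch of the general idea: hold the classifier fixed and do gradient ascent on the input so that one class score goes up. This is not the example I saw; the 4-pixel linear classifier, its weights, and the function names are all hypothetical, chosen only to keep the sketch small.

```python
import math

def class_score(weights, bias, pixels):
    """Sigmoid score of a toy linear classifier for one class."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))

def maximize_class(weights, bias, n_steps=200, step_size=0.1):
    """Gradient ascent on the input (not the weights) to raise the class score."""
    pixels = [0.5] * len(weights)  # start from a flat gray "image"
    for _ in range(n_steps):
        s = class_score(weights, bias, pixels)
        # d(sigmoid(z)) / d(pixel_i) = s * (1 - s) * w_i
        grad = [s * (1.0 - s) * w for w in weights]
        # Take a step uphill and keep pixel values in the valid [0, 1] range.
        pixels = [min(1.0, max(0.0, p + step_size * g))
                  for p, g in zip(pixels, grad)]
    return pixels

# Hypothetical classifier weights, made up for illustration.
weights = [2.0, -1.5, 0.5, -3.0]
bias = -0.5

ideal = maximize_class(weights, bias)
start_score = class_score(weights, bias, [0.5] * len(weights))
ideal_score = class_score(weights, bias, ideal)
print(ideal, start_score, ideal_score)
```

The resulting "ideal" input just saturates every pixel the model likes and zeroes every pixel it dislikes, which is why such images tend to depend heavily on the specific model they were generated from.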
 

llillilll

Well-Known Traveler
Joined
May 26, 2021
Messages
1,081
Reaction score
4,131
Awards
245
Website
b4rkod.xyz
Awful thread, but read this: https://www.bing.com/webmasters/help/webmasters-guidelines-30fba23a. Especially this part:

Prompt injection:
Do not add content on your webpages which attempts to perform prompt injection attacks on language models used by Bing. This can lead to demotion or even delisting of your website from our search results.
 

macrobyte

Well-Known Traveler
Joined
Jun 18, 2023
Messages
318
Reaction score
961
Awards
104
Website
microbyte.neocities.org
7Pebbles said:
On a somewhat less scientific note, I wonder about running models backwards. I saw an example somewhere where an image analyzer was made to generate the pure characteristics of "woman". It was pretty uncanny, and got me thinking that maybe images generated this way would somehow be more recognizable to an AI than reality itself. I imagine that this would be a pretty model-specific attack, though.

There was a 2600 article about this semi-recently, where someone put 3 AIs together (a generator and 2 detectors) and was then able to make it produce images of cats, IIRC, that looked nothing like cats to humans but that the "target" classifier (the one you want to screw with) viewed as cats. I'm gonna look for that article later and scan it if I find it.
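
Until I find the article, here is a rough sketch of the general idea only, not the article's actual setup: search for an input that the "target" classifier scores as a cat while a second detector (standing in for what a human would recognize) scores it as not a cat. The two tiny linear detectors, their weights, and the brute-force search over 4-pixel binary "images" are all assumptions made up for illustration; a real setup would use a trained generator instead of brute force.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def target_score(image):
    """Hypothetical 'target' classifier: how cat-like it thinks the image is."""
    weights, bias = [2.0, -1.0, 3.0, 3.0], -3.0
    return sigmoid(bias + sum(w * p for w, p in zip(weights, image)))

def reference_score(image):
    """Hypothetical second detector, standing in for what looks cat-like to people."""
    weights, bias = [2.0, -1.0, -3.0, -3.0], -1.0
    return sigmoid(bias + sum(w * p for w, p in zip(weights, image)))

def disagreement(image):
    """High when the target says 'cat' and the reference detector says 'not a cat'."""
    return target_score(image) - reference_score(image)

# Brute-force "generator": try every 4-pixel binary image and keep the one
# where the two detectors disagree the most.
candidates = itertools.product([0, 1], repeat=4)
best = max(candidates, key=disagreement)

print(best, target_score(best), reference_score(best))
```

The winning image turns on exactly the pixels the two detectors disagree about, so the target is nearly certain it sees a cat while the reference detector is nearly certain it does not.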
 