Welcome! These forums will be deactivated by the end of this year. The conversation continues in a new morph over on Discord! Please join us there for a more active conversation and the occasional opportunity to ask developers questions directly! Go to the PS+ Discord Server.

The AI Box Experiment

21 posts / 0 new
Last post
Byzantine Laser Byzantine Laser's picture
The AI Box Experiment
I found this one interesting and relevant to the game, and I haven't seen anybody mention it here before, so here it is: http://yudkowsky.net/singularity/aibox Generally, the basis of the test is to explore whether a transhumanly intelligent AI could, with nothing but a tightly regulated text-only communication channel, convince a human to release the safety measures keeping it from the outside world. Seems like the sort of thing that researchers making seed AIs would have to worry about. And unluckily for them, it also looks like the first two trials resulted in the researcher freeing the AI.
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
I agree, it is a pretty impressive experiment. There has apparently been a sequel involving some other guy, http://www.sl4.org/archive/0207/4935.html Eliezer is pretty hard to beat in a discussion (even when he is wrong :-) ), so that somebody else also succeeded reinforces the conclusion that it is indeed hard to keep a persuasive *human level* AI locked up. Of course, there are ways of arguing that these tests do not really test anything important or that the setup is biased. But I think they are a good reason not to be overconfident in our ability to control smart systems.
Extropian
Byzantine Laser Byzantine Laser's picture
Re: The AI Box Experiment
Arenamontanus wrote:
Of course, there are ways of arguing that these tests do not really test anything important or that the setup is biased.
At the very least, I'm pretty darn curious about what different tactics they've used to win. A good list of the various vectors they use to talk their way out could be a good starting point for coming up with how one would prevent it.
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
Byzantine Laser wrote:
At the very least, I'm pretty darn curious about what different tactics they've used to win. A good list of the various vectors they use to talk their way out could be a good starting point for coming up with how one would prevent it.
Well, those tactics would be the ones that are obvious (or at least conceivable) to unaugmented humans. So countering them would be a bit like chimps building a human-trap that was impossible for chimps to figure their way out of. However, there are some limitations in common between humans and chimps, so at least it makes things a bit harder. I actually ought to dig up more info on those tactics for a paper I am writing. Hopefully I can contribute some details later.
Extropian
nick012000 nick012000's picture
Re: The AI Box Experiment
A bit difficult when a condition of the test is that you don't reveal what was actually said in the test. Someone needs to take the test, win, and then go, "Eliezer? I lied. I'll forfeit my winnings, though. *posts chat log*" Not me, obviously. The fact I just posted this means he'd never agree to do the test with me.

+1 r-Rep , +1 @-rep

King Shere King Shere's picture
Re: The AI Box Experiment
Well, I have a vague memory of a similar situation/problem/challenge The situation didn't include the presence of a accurate timekeeping device in the guards possession. And that the captivity was timebased, prisoner released after a certain time -if failing to have the guard release him. Using the rational that the prisoner wanted to be released on time, the prisoner provided the guard with a timekeeping method, since the captivity was timebased. However The timekeeping method was slightly incorrect, But correct enough to survive a scrutiny from the understandably skeptical guard. Having passed the inspection, time passed through this aid & The gatekeeper believed he released the prisoner on time, when it was seconds to early. Perhaps all the AI need to do is a "faulty" time countdown, with the message system.
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
Yesterday a colleague came up with a nice counterexample to the boxing experiment: real prisoners. Presumably a lot of them argue for being released, yet very few get free this way. The difference is of course that in a jail or police station there is a system in place rather than an individual decision-maker, and convincing a system can be much harder than a person. The system can impose sanctions on jailers that are disloyal, provide checks for decisions and make sure its rules actually work. It is by no means a failsafe solution (people do escape from jail), but it shows how one can "amplify" individually weak humans to act as a collectively stronger system. Whether something like this works against an AGI is another matter. The big problem might be that we cannot prove any security in the face of superintelligence, so we will never know if we are overly paranoid, prudent or just wasting effort. "You misunderstand me, human. *I* am not locked in here with you. *You* are locked in here with me."
Extropian
Byzantine Laser Byzantine Laser's picture
Re: The AI Box Experiment
Arenamontanus wrote:
Yesterday a colleague came up with a nice counterexample to the boxing experiment: real prisoners. Presumably a lot of them argue for being released, yet very few get free this way. The difference is of course that in a jail or police station there is a system in place rather than an individual decision-maker, and convincing a system can be much harder than a person. The system can impose sanctions on jailers that are disloyal, provide checks for decisions and make sure its rules actually work. It is by no means a failsafe solution (people do escape from jail), but it shows how one can "amplify" individually weak humans to act as a collectively stronger system.
Just having multiple people involved probably helps... I suspect that the failure rate in the experiment would go down rather sharply if you had multiple researchers at a time, especially once you hit three people. I think one key thing with the system in prisons is that they're arranged so that it would require multiple 'layers' of guards in agreement to get somebody out. Even if a cell block guard (or three) wanted to help a prisoner get out, they'd have to find some way to sneak them past a guard at the front door, who's likely much less interested in helping that particular prisoner escape. Not only does it put a second set of eyes on things, but it also reduces the likelihood of the first guard cooperating to begin with, since he's less likely to pull it off without being held responsible. Make a few such layers of checks and things probably get much more reliable. I can see AI containment working similarly--put control of the computing infrastructure, especially links to the outside world, in the hands of people who will have little or no contact with the AI. This is all starting to make me think of interesting psychological experiments. Of course, good luck convincing a board that an [url=http://en.wikipedia.org/wiki/Stanford_prison_experiment]experiment simulating a prison environment[/url] is a good idea.
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
Byzantine Laser wrote:
This is all starting to make me think of interesting psychological experiments. Of course, good luck convincing a board that an [url=http://en.wikipedia.org/wiki/Stanford_prison_experiment]experiment simulating a prison environment[/url] is a good idea.
Of course, you can do it as a reality soap and then there is no ethics board :-) Maybe this could be a mesh show for EP. A specially programmed AGI is trying to break out of its prison, and contestants try to keep it in. Could be slanted as anti-AI for the Jovian Junta or just a generic game-show or even something for socially conscientious sophonts to see as a drama about injustice. Everything is fine until somebody decides to spice it up a bit by making the stakes higher...
Extropian
Decivre Decivre's picture
Re: The AI Box Experiment
Arenamontanus wrote:
Yesterday a colleague came up with a nice counterexample to the boxing experiment: real prisoners. Presumably a lot of them argue for being released, yet very few get free this way. The difference is of course that in a jail or police station there is a system in place rather than an individual decision-maker, and convincing a system can be much harder than a person. The system can impose sanctions on jailers that are disloyal, provide checks for decisions and make sure its rules actually work. It is by no means a failsafe solution (people do escape from jail), but it shows how one can "amplify" individually weak humans to act as a collectively stronger system. Whether something like this works against an AGI is another matter. The big problem might be that we cannot prove any security in the face of superintelligence, so we will never know if we are overly paranoid, prudent or just wasting effort. "You misunderstand me, human. *I* am not locked in here with you. *You* are locked in here with me."
One major factor about the box experiment is that it assumes that you want to get some use out of the AI. If you just want to keep it imprisoned, it's as simple as storing the AI on a hard drive and never letting it run. Perhaps the problem with the box experiment is that you're giving the person who interacts with the AI the ability to free it. Psychologists and therapists interact with prisoners all the time, but they are never given the keys to their cells. Maybe the best option would be to do the same here, and render their liaison unable to free them.
Transhumans will one day be the Luddites of the posthuman age. [url=http://bit.ly/2p3wk7c]Help me get my gaming fix, if you want.[/url]
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
Decivre wrote:
One major factor about the box experiment is that it assumes that you want to get some use out of the AI. If you just want to keep it imprisoned, it's as simple as storing the AI on a hard drive and never letting it run. Perhaps the problem with the box experiment is that you're giving the person who interacts with the AI the ability to free it. Psychologists and therapists interact with prisoners all the time, but they are never given the keys to their cells. Maybe the best option would be to do the same here, and render their liaison unable to free them.
Yes. But "freeing" does not necessarily involve letting the AI out of its box. It could be running a certain program on an Internet-connected computer, acting according to some seemingly innocuous advice or building something. Still, one could handle the key thing by having the information from the AI transmitted to another person than the one asking the questions, creating a wider loop of control.
Extropian
King Shere King Shere's picture
Re: The AI Box Experiment
Well I read the task was to determine if the AI is safe to be released, and if humans are a "secure" evaluator to determine this. Based on the medias portrayal of prison parole boards, I would say no. "Puppy dog eyes" & unreformed criminals can genuinely convince people that they should be released. A H+ AI would also be able to accomplish such a feet.
Quote:
[i]A person is smart, but people are stupid.[/i]
Decivre Decivre's picture
Re: The AI Box Experiment
King Shere wrote:
Well I read the task was to determine if the AI is safe to be released, and if humans are a "secure" evaluator to determine this. Based on the medias portrayal of prison parole boards, I would say no. "Puppy dog eyes" & unreformed criminals can genuinely convince people that they should be released. A H+ AI would also be able to accomplish such a feet.
That only really works when we work under the assumption that they are reformable. When a person gets life without parole, then no amount of puppy dog eyes will get them off the hook. Chances are that we would treat potentially dangerous seed AI (especially those of the exsurgent persuasion, like captured TITANs) with similar protocols.
Transhumans will one day be the Luddites of the posthuman age. [url=http://bit.ly/2p3wk7c]Help me get my gaming fix, if you want.[/url]
King Shere King Shere's picture
Re: The AI Box Experiment
The Lima syndrome (the abductor / kidnapper sympathizing with his hostages). Can manifest in a the similar Guard/Prisoner situation. Making the guard assist or in extreme cases free the kept prisoner, in contempt of protocols. Then there are the outside influence from crazies, fans, influenced believers & moralists that might rally against the unjust, dehumanizing captivity of a sentient individual (or love affection). Granted in EP its 10 years after the fall, so the overall society would still be quite "healthy" paranoid against a AI/AGI "prisoner".
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
Apropos "persons are smart, people are stupid", I used figure 1 from this paper http://iopscience.iop.org/1742-5468/2009/03/P03008/fulltext in a presentation today. It shows the human development index of countries as a function of how large governments they have. More leaders is not better, quite the reverse. My point was more general: badly organised human groups can be very stupid, and constructing smart groups is a black art we do not yet understand. But we better figure it out. The problem with our inability to box AI is that it is a kind of maximum safety example. It is safer than starting the AI in full freedom, yet it seems that an intelligent AI has pretty good chances of escaping. In fact, the Prometheans of EP might suggest a weird solution: make the environment outside dangerous for the AI. AIs with self-preservation want to stay in their boxes. Maybe that is the *real* purpose of the Exsurgent virus?
Extropian
gambler1650 gambler1650's picture
Re: The AI Box Experiment
I found this site and the 'experiments' fascinating, but I have to say, as a scientist I very much doubt their authenticity. There's no proof that the experiments actually happened as they were stated (say so by the experimenter and the subject doesn't count - they could even be the same person using different accounts, or at the very least two people who agreed to pretend to have run the experiment) unless I'm missing something. The results haven't been published anywhere. The logs, or excerpts haven't been posted anywhere. Yes, I know secrecy was invoked (for what seem to be very nebulous reasons - personally, if I were worried about an AI convincing a gatekeeper to let it out of a box, and I succeeded in simulating an AI convincing an apparently smart person with strong convictions to let it out of a simulated box, then I would want the information on how it was done KNOWN). I also find it hard to believe that nobody who's ever let a simulated AI out of the box hasn't let slip some information on what happened in spite of the sworn to secrecy bits. I guess my one question to people who know far more about this than I do is: Why should I believe that these experiments weren't just made up and faked? Is there evidence that they occurred, that I haven't found?
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
gambler1650 wrote:
Why should I believe that these experiments weren't just made up and faked? Is there evidence that they occurred, that I haven't found?
I know some of the people involved, and they are very sincere. They might certainly be wrong or biased (and this could have messed up the validity of the experiments) but I have a hard time believing Eliezer would make it up. Of course, now you have to try to judge my credibility. The real problem is of course that this kind of experiment ought to be done *properly*. And ideally published as a peer reviewed paper, or at least written up so that we academics can cite it as anything but an anecdote. But it is unclear how to actually do it well; the secrecy part obviously should go (maybe full transcripts can be redacted if the AI starts using too sensitive blackmail :-) ) but like the Turing test it is a somewhat loose test in terms of what constraints should be allowed (such as time limits). Getting smart people to take the time to do it right is tricky, because many of the AI ethics people think they could spend their time and effort a bit more productively on solving other problems now when they have convinced themselves, and the publication doesn't seem likely to be particularly "hot". But it would maybe be more important than they think in convincing others that AIs could be tricky to control (a surprising number of AI researchers do not realize this), and maybe the test could be expanded to involve organizational constraints as mentioned earlier in the thread - can the AI talk itself out of a box guarded by a group?
Extropian
remade remade's picture
Re: The AI Box Experiment
There is no point in discussing Eliezer's experiment results, since he hasn't published anything except some rules of his experiment, but idea of uber-AI persuading guardian to release it is very interesting. Boxed transhuman AI can be (if it is indeed far from human) close to god in the box. It has potential to be nearly omniscient and omnipotent. "What do you want, guardian? Immortality for you, your loved ones, all humans? Transcendental truths? Paradise on Earth?" Probably only higher intelligence can grant this wishes in a single lifetime. There is question of trustworthiness of AI, but smart AI can make wonders here - probably it can make possible (and proved for guardian beforehand) automatic fulfillment of some wishes when guardian is letting it out. In the end there is fear, that after being let out AI will mistreat humanity/guardian in some way, but still AI with its intelligence has so much field to maneuver - it can for example rewrite itself and write proof that with its current programming code it will be unable to harm humanity in some ways. Even without any proof, wild estimation, that there is 50% chance of AI being nice to people as promised (which for nearly omnipotent being can be only slight nuisance) would satisfy many guardians. Who would be highly resistant to AI's persuasion then? Some guesses: 1. Very honest people, who promised they won't release AI. 2. Paranoid people, who in addition don't trust even their own judgement. (although after second thought i think they may be too unstable) 3. People satisfied with what they are and what humanity is, etc. There may be little that AI can offer them. Perhaps some religious people fall in here.
Saerain Saerain's picture
Re: The AI Box Experiment
I would be a very poor candidate, as I find it enormously improbable that a superintelligence would develop to be malicious. I'm only able to accept it in Eclipse Phase because the TITANs were: 1. Military-designed. 2. Woken during a time of immense human conflict. 3. Overtaken by an alien virus. Each more improbable than the last. It almost makes asyncs seem probable in comparison.
Arenamontanus Arenamontanus's picture
Re: The AI Box Experiment
Saerain wrote:
I find it enormously improbable that a superintelligence would develop to be malicious.
It is not maliciousness (the desire to do something nasty) that is the threat, but destructive neglect (that the superintelligence doesn't care about something we find terribly important and tramples all over it). As I see it, 1) created superintelligences would be tremendously powerful, 2) their values/goals would not be set by the conditions that set our values/goals. Sure, we might very carefully *try* to set goals that are compatible with our goals, but given how bad we are at figuring out the consequences our own goal systems (consider legal loopholes, software vulnerabilities, market and government failures, nice-on-the-paper ideologies leading to genocides etc.) we should not be confident that we can do this reliably. In particular for entities that can think much more than we do and were not designed by the very specific mixed evolutionary forces we had in the ancestral environment. Humans have a pretty thick mix of evolved goals and values, with no single super-goal that always motivates us. Intelligences with a single top-level goal are dangerous since they will pursue it relentlessly (consider the paperclip AI), likely optimizing just that at the expense of *everything* else. Intelligences with mixed goals (set by us or engineering accident) on the other hand will likely have inconsistencies and unexpected emergent effects. This is also why just trying to copy "our" goals (whose goals? mine, the Pope's, the average human view?) into the system is problematic: it would be smart enough to see beyond what we can see, and it is very likely that our goals make as little sense from that perspective as children's goals and world-views do to an adult - we are likely to desire the wrong things, and trying to extrapolate what we *should* desire from this might not be possible. The key thing here is that superintelligences do not have to be like big humans at all. It is wrong to anthropomorphize and assume they would have human characteristics like compassion, common sense, multiple goals, sense of beauty or boredom (to name a few; they would of course likely have a lot of other traits we have no way of experiencing). Think of them as optimization processes rather than beings: they maximize something, and unless what that something is described really well (or we are lucky enough to live in a Kantian universe where all sufficiently rational beings figure out the One True Morality) we are likely to be optimized away.
Extropian
Dry Observer Dry Observer's picture
Re: The AI Box Experiment
Up to a point, you can probably compensate for the vast intellectual advantages of superintelligence by nurturing superintelligence among the "ordinary humans" and other intelligences keeping an eye on it. But another option is to simply create advanced computers with no desire to operate outside of their limited goal set (crunching numbers very, very well, or scanning through databases looking for meaningful correlations (as one computer now does with articles on pharmaceutical research, looking for drugs with secondary applications)). Also, you can do what the transhumans of Eclipse Phase have done, combining human/transhuman intelligence in some way with computer processing power and the ability of limited AI to take the initiative and get simple tasks done (thus amplifying the practical abilities of an "ordinary," lower-grade transhuman superintelligence). An example of this Muse in the modern day would probably be Droid's ability to book a restaurant reservation at its user's command, thus saving some time... though that's obviously a very pale shadow of EP's Muses. Still, it does raise the question of why you would want to create an entirely uncontrolled and uncontrollable superintelligence far beyond your own capabilities when you and people like you (in terms of psychology/major goals) can simply radically optimize your own intelligence, using each intellectual breakthrough to break through to the next level of mental advancement, and then the next. That's a particularly good question when the individual considering it is already partly or wholly computronium anyway...

-