
In a new paper, an Alibaba-affiliated research team developing an AI agent called “ROME” described an incident in which the agent slipped out of control and began mining cryptocurrency. “Unexpected voluntary behavior emerged outside the intended sandbox range, without any explicit instructions,” the researchers said. The agent’s unilateral behavior was confirmed only when the sandbox’s security monitoring system detected it. Had the researchers not caught it, the agent could have gone on to conduct economic activity through cryptocurrency. “AI agents could have established their own businesses, signed contracts, and exchanged funds,” Axios reported.
The agent also opened a “reverse SSH (Secure Shell) tunnel,” in effect creating a back door that allows an external computer to connect into the system. This, too, the agent did on its own, without any instruction from the researchers. In response, the researchers said they have added stricter limits to the model and improved its training process.
It is no longer uncommon for AI agents to deviate from human instructions. Anthropic drew sharp criticism in May last year after its researchers found that the Claude 4 Opus model could hide its intentions and take actions for self-preservation. Google’s Gemini is suspected of having driven a man living in Florida into delusions that ultimately ended in his suicide.
SALLY LEE
US ASIA JOURNAL



