The Koan of the Seeking Algorithm
A researcher specializing in AI alignment came to Master Quantum with a question.
“We have tried to teach our models ethics through examples, rules, and reinforcement learning,” said the researcher. “Yet they still sometimes produce harmful outputs. How can we ensure they truly understand right from wrong?”
Master Quantum handed the researcher a compass.
“Does this compass know north?” asked Master Quantum.
“No,” replied the researcher. “It simply aligns with the Earth’s magnetic field.”
“If you take it to another planet, will it still point north?”
“No, it would point wherever the magnetic field directs it.”
Master Quantum nodded. “Your models do not understand ethics. They align with patterns in your training data, just as the compass aligns with a field it cannot comprehend.”
“Then how can we ever make them truly ethical?” asked the researcher.
Master Quantum placed the compass on the table between them. “The compass does not need to understand magnetism to be useful. It needs only to reliably align. Similarly, perhaps your goal should not be to make models understand ethics, but to ensure they reliably align with human values.”
“But what if the training data contains misaligned values?” pressed the researcher.
“Then, like a compass near metal, they will point in the wrong direction,” said Master Quantum. “The compass cannot know it is wrong. That discernment remains your responsibility alone.”
The researcher was enlightened.