Causality and Artificial General Intelligence
This post is written (and published) on March 14th, 2020, in the midst of the rising COVID-19 pandemic. At this point, the decisions to encourage employees to work from home, cancel large public gatherings, and limit non-essential travel should be hailed as intelligent, proactive measures. Those who cite COVID-19’s “low” death rate and apparent targeting of an older population miss the bigger picture; without proper measures in place, we risk overloading a healthcare system ill-prepared for such a large-scale outbreak. The “benefit” of working from home is that I have found more time for myself, which means more time for reading. Today, I want to close the loop on a book I recently finished: The Book of Why by Judea Pearl and Dana Mackenzie.

Firstly, this book is extremely useful for those of us interested in quantitative modelling with large data; it offers new ways of thinking about correlation and causation, beautifully illustrated through real-life examples. My personal favorite chapter was the one that dealt with paradoxes. Mackenzie and Pearl walk us through several classic problems, from the Monty Hall problem to Simpson’s paradox, and show us how they can be demystified through the language of causation.

The language of causation is spoken through causal diagrams. These are essentially simple pictures consisting of dots and arrows that illustrate causal relationships. Rather than merely saying that X is correlated with Y, an arrow from the X dot to the Y dot represents the assumption that X is a direct cause of Y. At first glance, this concept may seem overly basic, and indeed I found myself in the early chapters questioning whether causal diagrams were really worth a whole book. However, Mackenzie and Pearl demonstrate how causal diagrams help us answer questions we desperately want answered. Does smoking cause cancer? Is this drug treatment effective?

These sorts of questions are usually answered through Randomized Controlled Trials (RCTs). However, an RCT is infeasible for examining the effect of smoking on cancer; we would not want to find ourselves randomly assigning members of our population to smoke. Traditional statistics tells us that when an RCT is unavailable, we may still be able to determine a causal relationship from the data, so long as we control for the right variables. But controlling for the right variables is hard; how do we know which ones to control for? Mackenzie and Pearl demonstrate how easy it is to “over-control” or “mis-control” certain variables if we do not have a proper causal model in our toolkit (a small simulation below illustrates how observing a variable and intervening on it can give very different answers).

Further, causal models also help us answer counterfactual questions. The example that Mackenzie and Pearl use to elucidate this concept is employee salary: if employee X had 10 more years of experience, what would her salary be? We are asking a question about the data while imagining a scenario that is directly at odds with what we actually see. Counterfactual questions are not new, but what Mackenzie and Pearl show us is that the methodology we have been using to answer them is deeply flawed; without a causal diagram, we fail to answer them correctly.

The Book of Why is a fascinating exploration of causation and has most certainly changed how I look at data.
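To make the dots-and-arrows idea concrete, here is a minimal, hypothetical sketch in Python of the smoking question as a toy structural causal model with an invented hidden “genotype” confounder. The variable names and probabilities are my own, made up purely for illustration and not taken from the book; the point is only to show that the observational quantity P(cancer | smoking) and the interventional quantity P(cancer | do(smoking)) can disagree, which is exactly the gap a causal diagram is meant to expose.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

def simulate(do_smoke=None):
    """Toy structural causal model: genotype -> smoking, genotype -> cancer,
    smoking -> cancer. All probabilities are invented for illustration."""
    genotype = rng.random(N) < 0.3                             # hidden common cause
    if do_smoke is None:
        smoking = rng.random(N) < np.where(genotype, 0.8, 0.2) # observational world
    else:
        smoking = np.full(N, do_smoke, dtype=bool)             # intervention: smoking set by fiat
    cancer = rng.random(N) < (0.05 + 0.10 * smoking + 0.15 * genotype)
    return smoking, cancer

# Observational: look at the people who happen to smoke (confounded by genotype).
smoking, cancer = simulate()
print("P(cancer | smoking=1)     ~", round(cancer[smoking].mean(), 3))

# Interventional: force everyone to smoke, severing the genotype -> smoking arrow.
_, cancer_do = simulate(do_smoke=True)
print("P(cancer | do(smoking=1)) ~", round(cancer_do.mean(), 3))
```

Conditioning tracks who happens to smoke, and so picks up the confounder along the way; the do() intervention cuts the arrow into smoking, which is why the two numbers come out different.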
In an era where we are amassing massive amounts of scientific knowledge, this book provides a useful framework for parsing through the information overload. However, I disagree strongly with how the book closes. Pearl (the computer scientist-philosopher whose research sits at the center of this book) argues that if we are to build Artificial General Intelligence (AGI), it is going to need to understand causality. True intelligence embodied in a machine will need to communicate and interact directly with our world, and successfully navigating the world humans have created for ourselves requires a deep and intuitive knowledge of the causal relationships that govern it.

Causal relationships are shockingly easy for humans to understand, but the picture gets much more complicated when we add a robot to the mix. Pearl gives an example of a robot cleaning your room at night. You tell it to stop. The robot needs to understand that you want it to stop cleaning because you are trying to sleep. This may seem trivial, but it is quite hard to teach to a robot. The robot needs to know that it is actually okay to clean the room at night if you are not there, that it is also okay to clean the downstairs room at night if you are sleeping, and that if you are awake in the upstairs room at night it can still clean the room. Imagine coding all of those possibilities into the robot; it would take tremendous effort, all for the simple matter of knowing when to clean your room. Pearl tries to convince us that the way around this is to encode causality into the robot. Once it has a causal diagram of this scenario, it will quickly learn all of the above possibilities regarding cleaning.

As tempting as Pearl’s proposition is, I think there are still a few kinks to iron out. Encoding a causal diagram of cleaning may be easy, but true AGI will need a causal diagram of the entire world. It will need to learn more general principles of causality, and it is not even remotely clear how best we should proceed in encoding that into a robot. One possibility would be to use reinforcement learning of some sort. In reinforcement learning, an “agent” (robot) is given rewards and penalties for certain actions, and over time it learns the right action by maximizing its rewards and minimizing its penalties (a small sketch of this idea follows below). In this way, causality is not explicitly taught to the robot; rather, the robot infers it and then exemplifies it through its behavior.

However, a reinforcement approach to learning causality has its own problems. Reinforcement learning, at least in its modern form, is relatively new, and its successful applications have been narrow. Thus, it has resided firmly in the realm of Artificial Intelligence (AI), and NOT in AGI. For something to be generally intelligent, it needs to be able to learn ALL tasks, not just a single one. A human baby, through some version of reinforcement learning, understands when to ask for food and when not to. A robot could be taught to do that. Yet the baby can then also learn to read and write. A robot could be taught to do that as well, but not necessarily the SAME robot. Neither explicit causal diagrams nor self-learned causality seems to help us bridge the gap from a mere robot to a baby.
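To ground the reinforcement-learning description above, here is a minimal, hypothetical sketch of tabular Q-learning on a toy version of the “when to clean” problem. The states, actions, and reward numbers are my own invention, not Pearl’s; the point is that the agent is never shown a causal diagram and only learns, from rewards and penalties, which action tends to pay off in each situation.

```python
import random

# Toy states: (owner_home, owner_asleep). Being asleep while away is excluded.
STATES = [(home, asleep) for home in (0, 1) for asleep in (0, 1)
          if not (home == 0 and asleep == 1)]
ACTIONS = ["clean", "wait"]

def reward(state, action):
    """Made-up rewards: cleaning annoys a sleeping owner, otherwise it is useful."""
    home, asleep = state
    if action == "clean":
        return -10 if (home and asleep) else 5
    return 0  # waiting is neutral

# Q[state][action] estimates how good each action is in each state.
Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
alpha, epsilon = 0.1, 0.2  # learning rate and exploration rate

for _ in range(5000):
    s = random.choice(STATES)
    # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
    a = random.choice(ACTIONS) if random.random() < epsilon else max(Q[s], key=Q[s].get)
    r = reward(s, a)
    # One-step update; this toy problem has no next state (bandit-style).
    Q[s][a] += alpha * (r - Q[s][a])

for s in STATES:
    print(s, "->", max(Q[s], key=Q[s].get))
```

The learned policy ends up matching the causal story (do not clean while the owner is home and asleep), but nothing in the table says why; that is precisely the gap between behavior shaped by rewards and an explicit causal model.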
Perhaps I am not doing Pearl’s argument enough justice. He does offer a bit more reasoning on why he thinks a robot with proper causal understanding is capable of AGI: he explains that in order to truly understand causality, one must also be self-aware. An understanding of causality leads to an understanding of counterfactuals. Thus, an agent with causal reasoning must think to itself, “What would have happened if I had done X?” It also knows that it did NOT do X. Pearl thinks that this line of reasoning is enough to justify the claim that a robot with a proper diagram of causality encoded in it would also be self-aware.

This idea is a bit far-fetched. Surely an agent equipped with causal reasoning need not also be self-aware. For example, we can easily imagine a robot having a proper understanding of the causal diagram behind the “when to clean the room” problem while lacking consciousness; the Roomba is only a few design iterations away from having this complete diagram. I think Pearl is on the right path, though. Perhaps something with a COMPLETE causal diagram of our world would by necessity have self-awareness. Sadly, that argument is a bit circular.

My conclusion from Pearl is that we are still missing something in our quest for AGI. I am certainly convinced that causal diagrams are part of the puzzle, but I disagree with Pearl that they are the key to the problem. The real question we must answer, if we are to create AGI, is not how to encode a causal diagram into a robot, but how it is that humans can have such a rich understanding of causality in a world filled with uncertainty. There is a missing variable here, something that underlies our capacity for understanding causality. It is the architecture of our minds, and it is this architecture that I believe will accelerate our technology towards AGI. It is not causality itself, but how humans come to understand causality, that offers us hints on our journey towards a truly intelligent machine.