Recreating Rats Navigating Mazes With Reinforcement Learning



While at McGill, I was lucky enough to take a course in reinforcement learning with Professor Doina Precup. The course was seminar-based and involved a final project on a topic of our choosing. Our group, made up of two of Doina's PhD students, a master's student, and me, decided to work in the realm of biological applications for reinforcement learning.

Our project was based on a paper about mammalian navigation. The researchers had built an agent that replicated the trajectory of a rat as it learned to navigate a maze-like environment. They did this by modeling a section of the rat's brain believed to be responsible for spatial awareness. In our work, we replicated these results, experimented with the effectiveness of different neural firing patterns, and tested the agent on a different, more complex maze environment.

Below, I will explain in layman’s terms what it was we did.


If you really think about it, you'll realize that navigation is actually a complex task made up of two smaller tasks: figuring out where you are, and figuring out where you want to go (slash how to get there). In the original paper, these two tasks have specific names: path integration and vector-based navigation.

Path Integration. Before you can 'navigate' (which in this case means thinking about how to get somewhere and then moving in that direction), you have to figure out where you are located. Just the task of path integration is pretty difficult. Imagine someone drops you in a human-sized maze. You have no idea where it ends, how long it goes, or even what's behind a corner. Sounds pretty intense, no? Ok, they leave you there with a timer going and check how long it takes you to make your way out. Then they throw you in again. This time, it's probably a bit less disorienting. What about the fifteenth time? Or the twenty-fourth? Eventually, you get pretty good at getting out of the maze. But then they switch up the game: not only do they start dropping you in at a different place each time, they also start moving the door.

What I've described is the approach a group of biologists took with live rats. Each time the rats were thrown in, the scientists recorded the rats' trajectories, how long it took them to get to the end, and how they behaved when the doors were moved. They found some pretty concrete patterns which they believe can be generalized to explain how most mammals would behave. In fact, these patterns feel pretty intuitive when I imagine how I would behave. If you dropped a rat in at the same place over and over again, it would find its way out faster each time. If you moved the location of the drop-in, the rat would scramble for a bit until it found its bearings, then alter its original route. If you moved the door, the rat would go to the location of the old door, circle around until it realized it had been bamboozled, and then search again.

Ok, makes sense. But how were the rats doing this? Scientists had a theory about the part of the rat brain responsible. It's called the entorhinal cortex, and the cells in this part of the brain have very specific neural firing patterns. These cells, called grid cells, fire in, as their name would suggest, a grid-like pattern. As the rat interacts with its environment, it builds a mental map of the layout, and these grid cells serve as a representation of that map. We don't need to go into the biological details to understand that if we can formalize this pattern mathematically, we can recreate the way a rat is "thinking" as it goes through a maze, plug that into a simulation, and see how the 'robo-rat' behaves. And that's what we did as part of our first task. We trained a neural network to take the rat's original placement, the cells that were firing, and the velocity of its movement, and output the location the robo-rat thought it was in. Under the hood, we did some research into which kind of cell firing pattern would help the robo-rat locate itself best. But the overall idea was that we trained a neural net to create a representation of the rat's state.
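To give a flavor of what "formalizing the pattern mathematically" looks like, here is a minimal sketch of the classic idealized grid-cell model: the firing rate at a 2-D position is the sum of three cosine waves whose directions are 60 degrees apart, which produces the famous hexagonal bumps. This is a textbook toy model, not the network from our project, and the parameter names are purely illustrative.

```python
import numpy as np

def grid_cell_rate(pos, spacing=0.5, orientation=0.0, phase=(0.0, 0.0)):
    """Idealized grid-cell firing rate at a 2-D position.

    Sums three cosine gratings whose wave vectors are 60 degrees
    apart, giving the hexagonal firing pattern grid cells are known for.
    `spacing` is the distance between firing bumps, `orientation`
    rotates the whole lattice, and `phase` shifts it in space.
    """
    pos = np.asarray(pos, dtype=float) - np.asarray(phase, dtype=float)
    k = 4 * np.pi / (np.sqrt(3) * spacing)  # wave number for the chosen spacing
    angles = orientation + np.array([0.0, np.pi / 3, 2 * np.pi / 3])
    rate = 0.0
    for a in angles:
        wave_vec = k * np.array([np.cos(a), np.sin(a)])
        rate += np.cos(wave_vec @ pos)
    # The raw sum lives in [-1.5, 3]; rescale to [0, 1] so it reads like a rate.
    return (rate + 1.5) / 4.5

# The rate peaks wherever the three waves align, e.g. at the phase origin:
print(grid_cell_rate((0.0, 0.0)))  # 1.0
```

A population of cells with different spacings and phases tiles space at multiple scales, which is what lets a downstream network read off a position estimate.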


Vector-Based Navigation. Once you know where you are, you can think about where you want to go. For this we used a classic reinforcement learning algorithm called Actor-Critic. The way this algorithm works is similar to the navigation task: figure out where you are (your 'state'), figure out what you can do (your 'actions'), and do the thing that brings you closest to your goal (getting out of the maze). To visualize how this works, imagine you are standing on a human-sized chess board. Each square is a state. Depending on which chess piece you are imitating, you have different moves as options. Those moves are called actions. Some actions are better than others. Capturing your opponent's queen is definitely a good move (+10 points for Gryffindor!). Getting captured by your opponent's queen is not as good (-10 points). Now, leave the chess board behind and think about the robo-rat in the maze. If each action has a number of 'points' attached to it, then your robo-rat can just weigh the moves and choose the one with the highest number of points. And if the robo-rat figures out its states, actions, and goal like a real rat, then its points system becomes the same as the rat's internal points system. And if the points systems are the same, then we've basically taught our robo-rat to mimic the choices and moves of a real rat.
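For the curious, here is a tiny tabular Actor-Critic sketch on a toy "maze" (a 5-state corridor with the exit at one end). This is a minimal illustration of the algorithm, not our project's code; the learning rates and the corridor itself are made up for the example. The critic learns how good each state is; the actor learns preferences over actions and reinforces the moves the critic scores as better than expected.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 1-D "maze": states 0..4, the exit (reward) is at state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # step left or step right

V = np.zeros(N_STATES)                    # critic: value estimate per state
prefs = np.zeros((N_STATES, len(ACTIONS)))  # actor: preference per (state, action)

alpha, beta, gamma = 0.1, 0.1, 0.95  # critic lr, actor lr, discount factor

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for episode in range(500):
    s = 0
    while s != GOAL:
        # Actor picks an action according to its current preferences.
        a = rng.choice(len(ACTIONS), p=softmax(prefs[s]))
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Critic scores the move: TD error = how much better than expected it went.
        td = r + gamma * V[s_next] * (s_next != GOAL) - V[s]
        V[s] += alpha * td        # critic update
        prefs[s, a] += beta * td  # actor update: reinforce surprisingly good moves
        s = s_next

# After training, the actor should prefer moving right (action index 1) everywhere.
print([int(np.argmax(prefs[s])) for s in range(GOAL)])  # [1, 1, 1, 1]
```

The same split shows up in our project at a much larger scale: the grid-cell network supplies the state representation, and an Actor-Critic on top of it supplies the "points system" for choosing moves.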


Now you've got the whole picture! That's what we did. We made a robo-rat. And when we tested that robo-rat in an environment like the one the real rats faced, it behaved the same way. So, as mentioned before, if we dropped robo-rat into the same place each time, it learned to find the doors super fast. If we dropped robo-rat in different places, it took longer to learn. When we moved the doors, robo-rat panicked and started looking for them in their old location. By recreating a real rat's trajectory, scientists were able to get a deeper understanding of how grid cells work and why they work the way they do.


Personally, I found this project incredibly rewarding. I got to work with the coolest people and learn about totally mind-blowing things.

