Reconstructing the Mind's Eye
How brain scans, with the help of a neural network, reproduce high-fidelity mental images.
AI in medicine typically makes headlines for drastically improved diagnosis, such as identifying skin cancers or reviewing X-ray imagery with stunning precision and speed. But AI is also at the centre of a breakthrough in the field of neural decoding with brain-computer interfaces (BCIs). One AI-augmented approach caught my attention: a team of researchers has supplemented a well-known technique with AI, allowing them to reconstruct high-fidelity visual perceptions from brain activity alone, with minimal fine-tuning to the specific person. Yes, you read that right: they can essentially read minds with this (or at least certain aspects of them)!
I. Brief intro to BCIs
The idea of BCIs has been around for over 50 years, and the purpose is exactly what the name suggests: an interface to channel interactions between a computer and a brain. There are different philosophies on how to go about this, ranging from non-invasive approaches, which attempt to interact with the brain from outside the skull, to fairly invasive ones, which involve opening up the skull and implanting electrodes. Elon Musk’s Neuralink falls into the latter category, as does Synchron, which recently showed one of its patients operating an Apple Vision Pro with their implant. Companies like Bryan Johnson’s Kernel take the non-invasive approach and work on technology to measure relevant brain functions without drilling into the skull.
Of course, if you think about it, either of those approaches is a tall order. Implanting electrodes in the very delicate structure of the brain with the necessary precision is a real engineering challenge for robotics. Localising brain activity in 3D space across the different brain regions, on the other hand, is no mean feat either: signals are spread across multiple regions and can be masked by parts of the brain that do not partake in the specific task at hand. For language, the distribution of activity looks something like this:
Source: Figure 1b from “Language is primarily a tool for communication rather than thought”, published in Nature on 19 June 2024.
II. The Innovation of MindEye2
The research that led to MindEye2 takes a non-invasive approach based on fMRI.
Nota bene: fMRI – short for functional magnetic resonance imaging – is a non-invasive neuroimaging technique that indirectly measures neural activity by detecting changes in blood oxygenation. Areas of the brain that are working harder appear brighter in the scan.
MindEye2 represents a monumental leap over previous methods in neural image decoding: it achieves high-quality visual reconstructions from as little as 2.5% of the usual fMRI training data – roughly one hour of scanning instead of the dozens of hours previous methods demanded, which is difficult and uncomfortable for subjects.
This feat is accomplished by integrating AI technology into the process: patterns of fMRI brain activity are translated into the embeddings of pre-trained deep learning models and then used to visualise internal mental representations. This visualisation – and, more generally, the ability to map patterns of brain activity into the latent space of pre-trained deep learning models – has the potential to enable novel clinical assessment approaches and BCI applications. So far, practical adoption has been scarce, mostly because single-subject models do not generalise across different people.
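To make this mapping less abstract, here is a minimal sketch of the core idea in Python: learn a regularised linear map from voxel activations to a pretrained vision model’s image embeddings. The shapes, the random stand-in data and the ridge penalty are illustrative assumptions of mine, not the paper’s actual pipeline.

```python
# Minimal sketch: learn a linear map from fMRI voxel patterns to the
# embedding space of a pretrained vision model (e.g. CLIP).
# Shapes and data are illustrative stand-ins, not the MindEye2 code.
import numpy as np
from sklearn.linear_model import Ridge

n_trials, n_voxels, embed_dim = 800, 15_000, 768

# X: one row of flattened voxel activations per viewed image (fake data here)
X = np.random.randn(n_trials, n_voxels)
# Y: the pretrained model's embedding of each viewed image (fake data here)
Y = np.random.randn(n_trials, embed_dim)

# Heavy ridge regularisation keeps the map stable when n_voxels >> n_trials
brain_to_latent = Ridge(alpha=60_000.0)
brain_to_latent.fit(X, Y)

# A new scan can now be projected into the model's latent space, where an
# image generator can turn it back into a picture.
predicted_embedding = brain_to_latent.predict(X[:1])
print(predicted_embedding.shape)  # (1, 768)
```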
1. Training, Fine-tuning and Stable Diffusion
MindEye2 takes an approach that is novel for the field but familiar from advanced AI model training: it is pre-trained on a vast corpus of data from multiple subjects. This shared-subject approach means the model learns from a diverse set of brain data. Once pre-training is complete, the model is fine-tuned with minimal data from a new subject. Like the more familiar LLMs, which learn to generalise from a vast corpus of text, MindEye2 learns to generalise across the patterns read out from different individuals’ brains, drastically reducing the need for extensive data collection from a new subject.
Mapping each person’s brain data into a shared-subject latent space standardises the data across different subjects (a toy sketch of this idea follows the note below).
Nota bene: What’s a latent space? It is a simplified space where complex, high-dimensional data (like images or text) is represented in a more compact form. You can think of it as a way to capture the most important features of the data while reducing its complexity. For example, an image of a cat might be transformed into a latent space where it is represented by a few key features, like the shape of its ears and whiskers, instead of every pixel. This makes it easier for algorithms to analyse and work with the data, helping them to identify patterns and make predictions.
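Here is one way such a shared-subject space could be wired up: a small per-subject “adapter” layer that feeds one shared backbone, so adapting to a new person means training only their adapter. The names, dimensions and freezing recipe below are my own illustrative assumptions, not the MindEye2 implementation.

```python
# Hedged sketch of the shared-subject idea: each person gets a cheap
# alignment layer into a common latent space; the heavy decoding model
# is shared by everyone. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

SHARED_DIM = 4096

class SharedSubjectDecoder(nn.Module):
    def __init__(self, voxels_per_subject: dict[str, int], embed_dim: int = 768):
        super().__init__()
        # One small linear adapter per subject: voxel space -> shared space
        self.adapters = nn.ModuleDict({
            subj: nn.Linear(n_vox, SHARED_DIM)
            for subj, n_vox in voxels_per_subject.items()
        })
        # One expensive backbone shared by all: shared space -> embedding
        self.backbone = nn.Sequential(
            nn.Linear(SHARED_DIM, SHARED_DIM), nn.GELU(),
            nn.Linear(SHARED_DIM, embed_dim),
        )

    def forward(self, subject: str, voxels: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.adapters[subject](voxels))

# Pre-train on many subjects (each with a different voxel count)...
model = SharedSubjectDecoder({"subj01": 15_000, "subj02": 14_200})

# ...then adapt to a new person by adding and training only their adapter,
# which is why so little new data is needed.
model.adapters["subj_new"] = nn.Linear(13_500, SHARED_DIM)
for p in model.backbone.parameters():
    p.requires_grad = False  # freeze the shared weights during fine-tuning
```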
For the image reconstruction from the data read out of the test subject, a separate Stable Diffusion XL model takes over, generating detailed and fairly accurate visual reconstructions of what the subject has seen. It’s certainly not perfect, but let’s not forget that these images are reconstructed from a brain scan!
Source: Figure 4 of the paper.
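To give a feel for this generation step, the snippet below conditions a diffusion model on an image embedding rather than a text prompt. It is a stand-in, not the paper’s setup: the paper uses a Stable Diffusion XL model fine-tuned for unCLIP, whereas this sketch uses the publicly available Stable Diffusion 2.1 unCLIP pipeline from the diffusers library, and a random vector stands in for the embedding predicted from the scan.

```python
# Illustrative stand-in for the final generation step: an unCLIP pipeline
# turns a CLIP image embedding into a picture, no text prompt needed.
import torch
from diffusers import StableUnCLIPImg2ImgPipeline

pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

# In the real pipeline this embedding is predicted from the fMRI scan;
# here a random vector stands in for it (1024 dims for this checkpoint).
predicted_embedding = torch.randn(1, 1024, dtype=torch.float16, device="cuda")

image = pipe(
    prompt="", image_embeds=predicted_embedding, num_inference_steps=25
).images[0]
image.save("reconstruction.png")
```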
2. Why MindEye2 is a Game Changer
The primary innovation of MindEye2 is its ability to align different subjects’ brain data into a shared latent space. This approach makes the model robust to variations in individual brain structures and means it can handle data from new subjects with minimal fine-tuning. The integration of high-level semantic content and low-level image details results in reconstructions that are both perceptually and semantically accurate, allowing a much more detailed look into the functional patterns of the brain.
I readily admit that I couldn’t draw anything realistic if my life depended on it, and the idea that within my lifetime I could ‘beam’ a mental image onto a canvas is tantalising. Granted, I don’t foresee having an MRI available at all times, but think about another critical use case: all of a sudden, eyewitness memory might no longer be such unreliable testimony!
3. Limitations and New Benchmarks
The technology still has notable limitations: fMRI is extremely sensitive to movement and requires subjects to comply with the task – decoding is easily resisted by slightly moving one’s head or thinking about unrelated information. We therefore need not worry about having our thoughts read out in the near future.
MindEye2’s AI-backed approach outperforms previous models on various evaluation metrics such as pixel-wise correlation, structural similarity and retrieval accuracy. Its reconstructions are astonishingly refined, capturing both perceptual and semantic features and setting new benchmarks in fMRI-to-image decoding.
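For a concrete sense of what two of these low-level metrics measure, here is a short sketch using NumPy and scikit-image; the image arrays are random stand-ins rather than real stimulus/reconstruction pairs.

```python
# Two of the low-level metrics mentioned above, computed between a seen
# image and its reconstruction (both faked here for illustration).
import numpy as np
from skimage.metrics import structural_similarity

seen = np.random.rand(256, 256, 3)                  # ground-truth stimulus
recon = seen + 0.1 * np.random.randn(256, 256, 3)   # noisy "reconstruction"

# Pixel-wise correlation: flatten both images and correlate intensities
pixcorr = np.corrcoef(seen.ravel(), recon.ravel())[0, 1]

# Structural similarity: compares local luminance, contrast and structure
ssim = structural_similarity(
    seen, recon, channel_axis=-1, data_range=recon.max() - recon.min()
)

print(f"PixCorr: {pixcorr:.3f}, SSIM: {ssim:.3f}")
```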
III. Conclusion
MindEye2 represents a significant advancement in neural decoding technology, which seeks to accurately reconstruct visual perception from brain activity alone. By leveraging advanced AI models, it provides a practical solution for high-fidelity visual reconstruction from limited fMRI data. This breakthrough opens up new possibilities for clinical applications and brain-computer interfaces, where accurate and efficient decoding of brain activity is essential.
Note 1: Adjacent to this topic is the question of whether we are actually looking at the brain in the right way. Neuroscience is mostly concerned with neurons, trying to explain brain activity by looking at the electrical patterns passed through them. I wouldn’t claim to know more about this than the people who spend their careers doing nothing else. However, one thought I recently came across, which I find at the very least creative and worth exploring further, is this: what if we have misinterpreted the role of the neuron, or underappreciated the relevance of other brain tissue, in our attempt to understand how the brain works? A decent, if not perfect, analogy: imagine aliens had tried to understand humanity and human societies in the late 1800s or early 1900s by looking almost exclusively at the messages passed around via the telegraph. Would that have allowed them to build an accurate picture of how society works? Maybe the focus on neurons leaves us with a kind of myopia, missing the bigger picture.
Note 2: If you are curious about BCIs, Lex Fridman just published an 8.5-hour conversation with Elon Musk and a number of Neuralink scientists. I haven’t listened to all of it yet, but it’s technical, detailed and all kinds of fascinating.