Abstracting, Fast and Slow
How a combination of LLMs and symbolic AI systems with discrete program search can lead the way first to better abstractions and then to AGI.
Over the past weeks we have looked at different aspects of the quest for Artificial General Intelligence (AGI): we compared biological and artificial intelligence across six defining aspects, and we looked at seemingly simple problems that current state-of-the-art LLMs struggle with to a surprising degree. In this Part 4 I want to link back to an idea introduced three weeks ago in Part 1: that intelligence is closely related to the ability to generalise, which in turn is closely tied to abstraction, because abstraction is the engine through which a cognitive system produces generalisation. The claim here is that only a system that can abstract to high levels, so as to identify specific and precisely accurate solutions, and that can do so efficiently, will lead to AGI. A promising suggestion in this regard is the combination of LLMs with symbolic AI systems, specifically discrete program search, which is the subject of this post.
A. Abstraction as a Core to Intelligence
1. Relevance
Abstraction, or inductive reasoning, is the operation of taking specific instances of experience or knowledge and mining them for patterns that can be reused across different contexts. Complex experiences or input data are broken down to identify underlying rules, which the system can then reuse to adapt to new, unseen situations to which those rules apply.
Another way of framing this thought is that intelligence is grounded in a sensitivity to analogies. With a high sensitivity to analogies, one can extract powerful abstractions from very limited experience and then apply these abstractions to navigate a high number of future situations.
It is important to acknowledge that the ability to abstract is not binary, as in either one can or cannot abstract. Abstraction building is more of a continuum: from collecting factoids, to organising knowledge in useful ways, to building models, to abstracting to models of models – ‘meta-models’ – that allow adaptation, synthesis and application in new, unseen situations. Without abstraction, one would be limited to memorising point-by-point facts, which is not only inefficient and brittle in the face of novel challenges; it is safe to say one could not navigate the world at all, as one is continuously exposed to new sets of facts.
2. Levels of Abstraction
a) Abstraction can take place at different levels. At the base level there is simple memorisation of factoids and no abstraction at all, a ‘single use’ case if you will. One can think of a programming function without arguments:
def two_plus_two(): return 4
def two_plus_three(): return 5
def two_plus_four(): return 6
b) Numerous related factoids can be organised into something more like an abstract function with a variable x (an abstraction over x) that encodes knowledge and interpolates between individual factoids:
def two_plus_x(x):
    if x == 0: return 2
    elif x == 1: return 3
    elif x == 2: return 4
    # ... one branch per memorised factoid
The problem here is that it doesn’t generalise very well, because it is a fairly weak form of abstraction. The list gets very long, and if the list (the knowledge) is incomplete, out-of-distribution inputs will make this kind of system struggle.
c) A stronger form of abstraction is then to turn organised knowledge into models that generalise strongly, which is no longer mere interpolation between factoids. Such a model is no longer an approximation, but returns the right result in 100% of cases, no matter what the input is. An example is the addition of two numbers (integers) using bitwise operations:
def addition(x, y):
    if y == 0: return x
    # XOR adds the bits without carrying; AND plus a left shift computes the carry
    return addition(x ^ y, (x & y) << 1)
NB: This function operates on the binary representations of the two numbers: XOR combines the bits without carrying, while AND followed by a left shift computes the carry. The process repeats until y becomes 0, at which point x holds the final sum and is returned. (As written, the sketch assumes non-negative integers.)
d) Stages a)-c) above are not actually intelligence, but increasing levels of skill. The next level up to intelligence would be a system’s ability to create an abstraction by itself based on examples it is provided or has observed:
def find_model(numerous_examples):
    [...]
    [...]
    [...]
    return model_function
e) Another level up would then be to autonomously generate abstractions in a maximally information-efficient way, which means the system can master new situations and tasks based on relatively little experience or sparse information. That is probably what one would require of an AGI system, which would then combine high skill with a strong ability to generalise.
def find_model(very_few_examples):
    [...]
    return model_function
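To make this less abstract, here is a minimal sketch of what stages d) and e) could look like in practice. The candidate pool is a hypothetical, hand-picked illustration rather than a real synthesis engine, but it shows the principle: the system is handed examples, not rules, and recovers the model itself, and notably from very few examples.

# A minimal sketch of stages d) and e): given input-output examples,
# search a pool of candidate models and return the first one that
# explains every example. The pool is hand-picked for illustration.
def find_model(examples):
    candidates = [
        lambda x, y: x + y,
        lambda x, y: x - y,
        lambda x, y: x * y,
        lambda x, y: max(x, y),
    ]
    for candidate in candidates:
        # keep a candidate only if it reproduces every example exactly
        if all(candidate(x, y) == out for (x, y), out in examples):
            return candidate
    return None  # no candidate explains the data

model = find_model([((2, 2), 4), ((2, 3), 5)])  # two examples suffice here
print(model(7, 8))  # 15: the system has abstracted 'addition'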
f) On the spectrum of abstraction, LLMs are probably situated somewhere between b), organised knowledge, and c), generalisable models. Their capability is certainly more than just a collection of individual factoids: LLMs hold a lot of knowledge, structured in a way that generalises from previously seen situations to some limited degree. They are unlikely to be further along, because if they were at the model stage c), they should be able to reliably add numbers or sort lists without trouble.
NB: Of course there are ways to let LLMs call functions or tools, like a calculator, but it should be clear at this point that this is not quite the same ‘development step’, nor a suitable replacement. It works for the moment, in well-defined scenarios, but not in a generalisable way.
The next step up to model synthesis is still quite a tall order. Pure scaling of the existing approach does not seem to be promising in this regard. The question then is: How to get there? How could one build abstraction in machines?
B. Abstraction by AI systems
If you are familiar with psychologist Daniel Kahneman’s 2011 bestseller Thinking, Fast and Slow, you will have encountered the idea that humans think and make decisions via two distinct modes of thought: System 1 for fast, intuitive and automatic thinking, and System 2 as its slow, deliberate and analytical counterpart. Kahneman explains how these two systems shape our judgments and decisions, diving deep into cognitive psychology to demonstrate how much of our thinking is driven by automatic processes and to reveal the complexities of human decision-making.
Abstraction happens in both systems, and while the process is similar, it serves a different function in each.
1. Abstraction via System 1
System 1 abstractions are fast, intuitive and mostly based on recognising patterns in a continuous domain. One way of thinking about this is that an AI or cognitive process relies on continuous functions such as distance metrics (e.g., dot products in LLMs, or the Euclidean distance, the most common way of measuring how far apart two vectors are in machine learning) to rapidly assess the similarity between inputs. These are approximations, often localised toward the right hemisphere of the brain, and similar to System 1 thinking in humans, which processes sensory inputs for quick, automatic judgments based on learned patterns and heuristics. These System 1 abstractions underpin human perception, intuition and pattern recognition.
Transformers, and by extension LLMs, excel at this kind of abstraction and as such were a major breakthrough in AI research. They efficiently generalise patterns across massive datasets, enabling them to perform tasks like language generation, image recognition and large-scale data analysis.
If you look back to Part 2 on the optimisation algorithm employed by machine learning, gradient descent, it becomes intuitively clear that this kind of abstraction is deeply grounded in the geometry of higher-dimensional spaces: LLMs are trained by finding the minimum of a loss function in a high-dimensional space. This process is deeply geometric, as it involves calculating the direction and magnitude of the gradient – i.e. the slope of the loss function in different directions – and then adjusting the parameters of the model accordingly. Gradient descent effectively navigates the geometry of the loss surface to converge toward a (local or global) minimum.
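To give a concrete, if heavily simplified, picture of this geometry, below is a minimal sketch of gradient descent on a toy loss function with an analytically known gradient (real training computes gradients via backpropagation over billions of parameters; the loss surface, starting point and learning rate here are arbitrary choices for illustration):

import numpy as np

# Toy loss surface: a bowl with its minimum at (3, -1).
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2

def gradient(w):
    # the slope of the loss in each parameter direction
    return np.array([2 * (w[0] - 3.0), 2 * (w[1] + 1.0)])

w = np.array([0.0, 0.0])  # arbitrary starting parameters
learning_rate = 0.1
for _ in range(100):
    w = w - learning_rate * gradient(w)  # step downhill along the slope

print(w)  # approaches [3, -1], the minimum of the loss surface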
NB: If you want a simple introduction to this idea, check out this video by 3Blue1Brown. The entire AI playlist, and in fact the entire channel, are really amazing!
At inference, when the model is used, relationships between words are likewise analysed using geometric concepts such as cosine similarity or Euclidean distance.
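As a toy illustration of these measures (the three-dimensional ‘embeddings’ below are made up for the example; real models use hundreds or thousands of dimensions):

import numpy as np

# Made-up 3-d word vectors, for illustration only.
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.7, 0.2])
apple = np.array([0.1, 0.2, 0.9])

def cosine_similarity(a, b):
    # angle-based similarity: 1.0 means 'pointing the same way'
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(king, queen))  # close to 1: related words
print(cosine_similarity(king, apple))  # much lower: unrelated words
print(np.linalg.norm(king - queen))    # small Euclidean distance
print(np.linalg.norm(king - apple))    # large Euclidean distance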
[Image source: https://easyai.tech/]
However, this form of abstraction, while powerful for certain types of pattern recognition, struggles with reasoning and logical tasks that require discrete steps and not mere approximation.
2. Abstraction via System 2
System 2 abstractions, in contrast, are slower, more deliberate and involve logical reasoning. Comparisons are not made based on continuous values seeking approximate similarities, but rather through discrete logic, to seek exact matches of parts of programs or graphs. This is a more structured, logical process that mirrors how System 2 thinking operates in humans, when we engage in analytical tasks like problem-solving or planning. This mode is often localised toward the left brain-hemisphere and kicks in when a task requires careful, step-by-step analysis, by breaking down discrete entities and looking for exact matches.
The System 2 kind of abstraction can be thought of as rooted in topology instead, which is concerned not with exact distances or measurements, but with how objects are connected, capturing global properties that characterise the object as a whole (and remain unchanged when you stretch, twist or deform the object without breaking it).
Turned into a computer-science analogy, System 2 abstractions are like identifying a specific part of a graph where input nodes can take different values, or merging specialised functions into a new, more abstract program based on their common structure. Both will succeed only if there is a ‘true’ match or overlap between the search space and the object being searched for.
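To make the second half of this analogy concrete, here is a minimal sketch of merging two specialised programs into a more abstract one by exact structural matching. Programs are represented as nested tuples (a crude expression tree), and the merge is a very simplified form of what the program-synthesis literature calls anti-unification:

# Merge two expression trees: keep structure where they match exactly,
# introduce a variable where they differ.
def merge(a, b, variables=None):
    if variables is None:
        variables = []
    if a == b:
        return a  # exact match: keep as-is
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        # same operator: recurse into the arguments
        return (a[0],) + tuple(merge(x, y, variables) for x, y in zip(a[1:], b[1:]))
    var = f"x{len(variables)}"  # mismatch: abstract it into a variable
    variables.append(var)
    return var

two_plus_two   = ("add", 2, 2)
two_plus_three = ("add", 2, 3)
print(merge(two_plus_two, two_plus_three))  # ('add', 2, 'x0'): an abstract two_plus_x

Note that, unlike a distance metric, this comparison is all-or-nothing: subtrees either match exactly and are kept, or they don’t and are abstracted away.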
Despite their prowess in pattern recognition, LLMs struggle with tasks that require precise, multi-step logical reasoning, or the use or creation of System 2 abstractions, because the architecture they are built on does not currently provide this kind of generalisation. Moving beyond these limitations will be critical for advancing toward true AGI, and the question then is: how can we go about this?
C. How to Build System 2 Abstraction Capability
An interesting proposal to overcoming the limitations of LLMs with System 2 type of abstraction is to combine deep learning with discrete program search. While deep learning excels at recognising continuous patterns, discrete program search operates in the realm of exact, step-by-step logic.
1. Overview of Discrete Program Search as Symbolic AI
Discrete program search is a specific method within symbolic AI, which refers to systems that use explicit symbols and rules to represent knowledge and perform reasoning. This approach involves searching through a combinatorial space of possible solutions – represented as symbols or programs – defined by a Domain-Specific Language (DSL).
NB: A DSL is tailored to a specific domain or task, offering high expressiveness and simplicity for specialised use cases. This allows for more efficient and accurate problem-solving in familiar contexts, but the trade-off is that DSLs are typically less versatile than general-purpose languages. An intuitive example of a DSL is a first responder delivering specific medical details to ER staff, using a ‘language’ that conveys critical information succinctly. Similarly, DSLs are optimised for specific tasks but can be limited outside of their domain.
In a symbolic AI system, a DSL can be used to build a graph representing all possible solutions, where each node corresponds to a specific symbol, program or a part thereof. These nodes collectively form the search space. The edges of the graph represent connections between different potential solutions, guiding the search for the correct program that perfectly solves the problem.
NB: A graph is a powerful data structure that represents a set of objects (‘nodes’) and the relationships (‘edges’) between them. It is widely used to model real-world structures like social networks or transportation routes, and knowledge can be represented this way too. In the context of symbolic AI and program search, the graph efficiently represents the relationships between possible solutions, making it easier to navigate the search space and find the correct one.
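A minimal sketch may help make this concrete. Below, a toy DSL of four integer primitives defines the nodes, every composition of primitives is a path through the graph, and a brute-force search walks this space, keeping only a program that reproduces all examples (both the DSL and the examples are invented for illustration):

from itertools import product

# Toy DSL: four primitives over integers.
DSL = {
    "inc":    lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg":    lambda x: -x,
    "square": lambda x: x * x,
}

def run(program, x):
    # a program is a sequence of primitives applied left to right
    for name in program:
        x = DSL[name](x)
    return x

def search(examples, max_depth=3):
    # enumerate every composition up to max_depth; the only feedback
    # used is the yes/no check against the examples
    for depth in range(1, max_depth + 1):
        for program in product(DSL, repeat=depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

print(search([(3, 7), (5, 11)]))  # ('double', 'inc'): found from two examples

Note that the number of candidate programs grows as 4^depth; this is the combinatorial explosion discussed below.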
Program search through the graph is highly data-efficient because it relies on a simple ‘yes or no’ feedback signal: whether a given program is correct for the problem at hand or not. In contrast, machine learning algorithms require vast amounts of data to make high-confidence decisions, needing dense sampling of the problem space to generalise effectively.
So why hasn’t symbolic AI, or program search based on a DSL, taken off then? Unfortunately, depicting reality in this way – as you would have to for artificial general intelligence – even in an abstract form leads to an enormous search space, where the key hurdle becomes the combinatorial explosion of searchable solutions.
An intuitive example of combinatorial explosion can be seen in chess. Calculating all possible moves for the next few turns quickly leads to an overwhelming number of possibilities. At the start of the game there are 20 possible moves, and this number increases dramatically as the game progresses. Assuming 20 possibilities per move, one would strictly have to evaluate 20^10 = 10.24 trillion move sequences to consider all options for the next 10 moves.
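The arithmetic is easy to reproduce:

# b possible moves per ply over d plies gives b**d sequences to evaluate
branching_factor = 20
for depth in (2, 5, 10):
    print(depth, branching_factor ** depth)
# depth 10: 10,240,000,000,000 sequences, i.e. ~10.24 trillion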
2. Combining Deep Learning with Discrete Program Search
And this is where the idea of combining machine learning and symbolic AI via discrete program search comes into play. Such a hybrid system leverages the fast, intuitive pattern recognition capabilities of System 1 abstraction, provided by models like LLMs, while enhancing these approximated results with deliberate, logical reasoning via System 2 abstraction through discrete program search.
This hybrid approach combines deep learning, which provides the ‘intuition’ to guide through vast search spaces using probabilistic curve fitting and low-level generalisation, with discrete program search, which applies precise logic to carefully examine and solve specific problems once the space has been narrowed down. This mitigates the challenges posed by combinatorial explosion and targets areas that require deeper inspection via step-by-step logic.
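To sketch how this could look in code, here is the toy DSL search from above, but now guided by a prior over primitives. The prior is a hard-coded stub standing in for a trained neural network; a program’s cost is the negative log-probability of its primitives, so a best-first search checks the most plausible candidates first instead of enumerating blindly:

import heapq
import math

# Toy DSL, repeated from the earlier sketch for completeness.
DSL = {
    "inc":    lambda x: x + 1,
    "double": lambda x: x * 2,
    "neg":    lambda x: -x,
    "square": lambda x: x * x,
}

def run(program, x):
    for name in program:
        x = DSL[name](x)
    return x

# Stub for a learned model: how likely is each primitive to be useful?
PRIOR = {"double": 0.4, "inc": 0.35, "neg": 0.15, "square": 0.1}

def guided_search(examples, max_depth=4):
    # best-first search: cost = -log(probability), so the most plausible
    # partial programs are expanded first (System 1 guiding System 2)
    frontier = [(0.0, ())]
    while frontier:
        cost, program = heapq.heappop(frontier)
        if program and all(run(program, x) == y for x, y in examples):
            return program
        if len(program) < max_depth:
            for name in DSL:
                heapq.heappush(frontier,
                               (cost - math.log(PRIOR[name]), program + (name,)))
    return None

print(guided_search([(3, 7), (5, 11)]))  # ('double', 'inc'), checked early

With a good prior the right program surfaces after a handful of checks; with a poor prior the search degrades gracefully toward blind enumeration rather than failing outright.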
This takes inspiration from how we understand humans to reason, and mirrors the division of labour inside the brain: intuition and experience weed out the obviously bad ideas; closer inspection and careful reasoning are then applied to the remaining solution landscape. To return to the chess example: a grandmaster is capable of thinking 10 moves ahead, but is of course not searching through more than 10 trillion move sequences. Experience accumulated, and abstracted to a high degree, from countless games narrows the focus to the most salient moves.
A good way of intuiting this proposal is to think of how one navigates a public transport app. When the destination lies to the north, one immediately looks toward the upper part of the map, then focuses more closely to select the exact station, and finally lets the app provide the best route that meets the relevant criteria, such as fewest transfers, shortest walking distance, fastest connection etc.
Not every problem can be drawn on a map of this kind, and few real-world domains follow strictly linear rules. Financial markets, for example, behave in a highly non-linear fashion, because prices are influenced by countless factors (global events, investor behaviour, market sentiment etc.) that don’t follow strict cause-and-effect rules, or at least we haven’t discovered those rules yet. In such cases, where multiple factors interact in complex ways, deep learning’s ability to generalise across large datasets makes it a powerful tool for providing rough estimates.
However, to apply deep learning as a perception layer, this approach requires two components: a) an interpolative problem, where the system can estimate unknown values between known data points; and, for this to work efficiently, b) large amounts of data, to allow the deep learning algorithm to generalise effectively.
Where data is sparse, discrete program search steps in. The critical handover at this juncture is that the discrete search space needs to be represented at a sufficient level of abstraction, based on core knowledge foundations and principles such as those that underpin the ARC-AGI test: objectness, the potential goal-directedness of objects, numbers and basic mathematical operations, geometry and topology.
For this hybrid system to be effective, the process must be bidirectional: deep learning’s System 1 approximation should narrow the search space, while System 2’s more precise reasoning informs and refines the search. If the discrete program search encounters a bottleneck, feedback can flow back to System 1 for re-approximation and further refinement, as sketched below.
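In code, the control loop could look as follows; propose_candidates and deep_search are illustrative stubs of my own naming, not a real API:

def propose_candidates(task, feedback):
    # System 1 (stub): fast, approximate narrowing of the search space,
    # conditioned on feedback from the previous round
    return []

def deep_search(task, candidates):
    # System 2 (stub): slow, exact search within the narrowed space;
    # returns a solution, or feedback on where the search got stuck
    return None, "no exact match in candidate region"

def solve(task, max_rounds=5):
    feedback = None
    for _ in range(max_rounds):
        candidates = propose_candidates(task, feedback)   # fast approximation
        solution, feedback = deep_search(task, candidates)  # precise search
        if solution is not None:
            return solution  # verified, exact result
    return None  # give up after max_rounds of mutual refinement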
NB: As a sidenote, this kind of program-search approach has proven to be the most successful in the ARC-AGI challenge, where the current leaders have utilised methods along the lines described here.
Incorporating external verifiers into this hybrid system adds a further layer of reliability, by checking whether the outputs of the deep learning system and the program search align with known rules or logical constraints. Because the real world has a high noise-to-signal ratio, research like VALERIAN, which pursues invariant feature learning from noisy datasets by limiting learning to the relevant ‘clean’ overlap in the data, can further enhance the system’s ability to handle real-world scenarios, ensuring that the transition between the intuitive deep learning layer and the more rigorous discrete search layer is robust and reliable.
D. Conclusion
We know where LLMs perform well and where they fall short of AGI. They are great System 1 operators, but they lack System 2 abstractions and reasoning. Without tangible progress in this regard, the path to AGI appears longer than most commentary would have us believe. François Chollet believes that the next breakthrough will come from an outsider, because the big AI labs are pouring all their effort and money into improving transformer technology. With all the attention transformers are receiving, progress in combining symbolic AI with transformer technology has stalled, while the limitations LLMs exhibit are the same as they were years ago.