‘Attention’ Was Not All You Needed.

So, Nobody Knows Why AI Works, Right?

Yes and No.

The fundamental problem is that we still understand very little about the human brain.

Yet Transformers and Humans share eerily similar responses to natural language.

Let’s explain why.

Storing Information Representations In Transformers

To feed text into a transformer, we split it into tokens (words or pieces of words) and map each token to a vector called an embedding.

Information, therefore, is stored as vectors.
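To make the idea of "information stored as a vector" concrete, here is a minimal sketch of an embedding lookup. The three-word vocabulary, the embedding size of 4, and the random values are all assumptions made for illustration; a real model uses a subword tokenizer and learns its embedding table during training.

```python
import numpy as np

# Hypothetical toy vocabulary; real tokenizers split text into subword tokens.
vocab = {"the": 0, "dog": 1, "barks": 2}
embedding_dim = 4  # assumed size, chosen only to keep the example small

rng = np.random.default_rng(0)
# One row per token. In a trained model these values are learned, not random.
embedding_table = rng.normal(size=(len(vocab), embedding_dim))

def embed(tokens):
    """Map a list of tokens to an array of shape (len(tokens), embedding_dim)."""
    ids = [vocab[t] for t in tokens]
    return embedding_table[ids]

vectors = embed(["the", "dog", "barks"])
print(vectors.shape)  # (3, 4)
```

Each row of `vectors` is the stored representation of one token, and everything downstream in the model operates on these rows.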

Let us consider representation. This representational embedding played a much more fundamental role than attention. Representation in a decodable or workable form had to be reached first!

You may be thinking, how does this relate in any way to all that a transformer like GPT-4 can do when trained on the entire Internet?

The information is encoded through attention, which takes dot products between token vectors to decide how much each token should draw on the others, so that every vector ends up carrying the contextual meaning of the data around it.
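The dot products mentioned here can be sketched in a few lines. This is a simplified version of scaled dot-product attention, assuming a single head and no learned projection matrices (a real transformer first multiplies the inputs by learned query, key, and value weights); the input `X` is random and purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention for one sequence and one head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of every token to every other
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights         # context-mixed vectors and the weights

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))  # 3 tokens, 4 dimensions (illustrative values)

# Assumption: X is used directly as queries, keys, and values.
out, weights = attention(X, X, X)
print(out.shape)             # (3, 4)
print(weights.sum(axis=-1))  # every row of weights sums to 1
```

Each output row is a weighted blend of all the token vectors, which is one way to read "contextual meaning" in this setting.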

So we have a representation of this data stored in the Transformer.


The Secret Sauce

Neural networks owe much of their problem-solving power to a result called the universal approximation theorem.

For those of you who don’t know what universal approximation is: a universal approximator is a family of functions that can match any reasonable (continuous) mapping from a set of inputs to a set of outputs as closely as we like, provided it is given enough capacity. That mapping has a form. But we don’t need to know the form because, given enough data and a large enough network, training can usually find a good approximation of it for us.

What is the function? It’s the problem we’re trying to solve.

In principle, a large enough neural network can approximate any such function. Any function, so universal.
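As a small illustration of approximation in practice, the sketch below fits sin(x) with a network that has a single hidden layer. To keep it short and deterministic, it makes an assumption that differs from how real models are trained: the hidden weights are random and fixed, and only the output weights are solved for by least squares, with no gradient descent. The number of hidden units and the input range are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_units = 50  # assumed width; more units generally allow a closer fit

# Hidden layer: fixed random weights and biases (an assumption, not trained).
W = rng.normal(size=(1, hidden_units))
b = rng.normal(size=hidden_units)

def hidden(x):
    """tanh hidden-layer activations for inputs x of shape (n,)."""
    return np.tanh(x[:, None] @ W + b)

# The "unknown" function we want to approximate, sampled on a grid.
x_train = np.linspace(-np.pi, np.pi, 200)
y_train = np.sin(x_train)

# Output layer: solve for the linear read-out weights by least squares.
out_weights, *_ = np.linalg.lstsq(hidden(x_train), y_train, rcond=None)

def network(x):
    """One-hidden-layer network: tanh features followed by a linear read-out."""
    return hidden(x) @ out_weights

# Check the fit on points that were not used for fitting.
x_test = np.linspace(-np.pi, np.pi, 97)
max_error = np.max(np.abs(network(x_test) - np.sin(x_test)))
print(f"max error on held-out points: {max_error:.2e}")
```

The code never uses the formula for sin(x) inside the network itself; it only sees input and output samples. Note that the theorem only says good weights exist for a wide enough network. It does not promise that any particular training procedure will find them.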

Architectures vary – algorithms vary – models vary – but the underlying principle is the same.

And this is why neural networks work when they have enough data, are given a suitable task, and are adjusted in the correct ways, an art you get better at with time.

What is the underlying representation of a dog in this LLM?

There definitely is one, and in fact many.

Maybe x vectors, y contextual values, and z positional encodings for a single dog, along with the right pathways for data to flow through the architecture.

We just haven’t gone deep into that because it’s unnecessary for our purposes.

The system works.

That’s enough for us.

The system is, thus, a black box.


But, How Does The Brain Store Data?

Just like before – we don’t know the exact details.

As our brains develop, they build internal representations of external objects, concepts, and entities, much as a child does while growing up.

For example, the concept of a dog may involve x neurons, y synapses, and z architecture, along with a dynamic, continuously evolving set of raw data transfer pathways between neurons.

My statement is that the human brain is also a black box. And the same principles that apply to GPT models can also be said to apply to brains – they are Universal Approximators too.

The approximation theorem, I would argue, applies to our brains as well.

The same principles are at work.

The ideas in our brain are stored/represented by embeddings as well!

It’s just that these embeddings are naturally created.

They are families of neurons, axons, and dendrites running on ultra-low electrical energy.

Different forms. Different functions. But the same end result.

The system works.

That’s enough for us.

What more can we say we know?

Only what we can empirically observe.

Again, a black box.

Mysteries - not for much longer!

So, Why Are They So Similar in Natural Language?

We have tried to model general intelligence as best we can by running a very complex training procedure over the data collected in digital form on the Internet. Nearly all the written data in the world.

As far as sequence-to-sequence models and natural language tasks are concerned, I would say we have succeeded.

We have modeled how our brains work in understanding and processing information through a lot of incredibly expensive and systematic hard work.

I’m not happy.

I’m actually scared.

If this is what ChatGPT can do –

The workhorse behind all the research is the GPU.

Then, What Will GPT-4 Achieve?

Researchers at Microsoft have already reported what they call “sparks of artificial general intelligence” in an early version of the GPT-4 model.

They have even released a paper on this finding.

News Page: https://www.microsoft.com/en-us/research/publication/sparks-of-artificial-general-intelligence-early-experiments-with-gpt-4/

Research Paper:

We are taking deep risks here.

When Geoffrey Hinton left his post at Google so that he could speak freely about the risks of AI, I realized things were going really haywire.

Money fuels everything.

No amount of controls will stop the profit-centered companies in China and the USA.

We hope the future works out well.

AI decision-makers must put ethics above profit.

But when have entrepreneurs ever done that?



Hinton said something incredible.

“I have suddenly switched my views on whether these things are going to be more intelligent than us. I think they’re very close to it now and they will be much more intelligent than us in the future. How do we survive that?”

But for now, there is no stopping it.

To wind things up, let me leave you with an interesting question I put to ChatGPT.


How would you go about achieving AGI?


Achieving Artificial General Intelligence (AGI) is a grand challenge that requires significant advancements across multiple scientific and engineering disciplines. While the exact path to AGI is still a subject of debate and exploration, I can provide you with a more detailed explanation of some key areas that researchers are investigating:

  1. Cognitive Architecture: Developing a cognitive architecture that can mimic human intelligence is a fundamental aspect of AGI. This involves understanding the structure and functions of the human brain and designing computational models that capture these processes. Researchers draw inspiration from neuroscience, cognitive psychology, and cognitive science to build architectures that can support high-level reasoning, perception, memory, learning, and decision-making.
  2. Machine Learning and Deep Neural Networks: Machine learning, particularly deep learning, has played a significant role in recent AI advancements. Deep neural networks have achieved remarkable success in perception tasks like image recognition and natural language processing. To progress toward AGI, researchers are exploring ways to enhance the capabilities of deep learning by improving its generalization, interpretability, and the ability to handle limited data and complex reasoning.
  3. Reinforcement Learning and Active Learning: Reinforcement learning is another key area of focus for AGI. It involves training agents to interact with their environment and learn from trial and error. Reinforcement learning techniques, combined with active learning, which involves actively selecting informative training samples, can help agents explore and learn from their environment more efficiently. Developing algorithms that can learn from diverse data sources and generalize knowledge to new situations is crucial for achieving AGI.
  4. Cognitive Robotics and Embodied Intelligence: True AGI would require more than just processing information; it would necessitate interacting with the physical world. Cognitive robotics aims to integrate perception, cognition, and action in robots to enable them to learn and reason about the world in a similar way to humans. Embodied intelligence research explores how agents can develop intelligence through interactions with their environment, leveraging sensorimotor skills and physical experiences.
  5. Symbolic Reasoning and Knowledge Representation: AGI systems should be able to reason symbolically and manipulate complex knowledge representations. Symbolic reasoning involves using logical rules, inference mechanisms, and structured knowledge representations to perform abstract reasoning and problem-solving tasks. Combining symbolic reasoning with statistical approaches, such as neural networks, is an active area of research to enable more powerful and flexible reasoning capabilities.
  6. Transfer Learning and Meta-Learning: AGI systems should possess the ability to transfer knowledge and skills learned in one domain to new and unfamiliar domains. Transfer learning techniques aim to leverage existing knowledge to accelerate learning in related tasks or domains. Meta-learning, or learning to learn, focuses on developing models that can learn how to learn efficiently, adapting quickly to new situations and tasks with limited data.
  7. Data and Computational Resources: AGI development requires vast amounts of high-quality data for training models. Access to diverse datasets across multiple domains is essential to capture the breadth and depth of human knowledge. Furthermore, AGI development demands substantial computational resources for training and running complex models. Advances in hardware, such as specialized AI chips and distributed computing systems, can significantly accelerate AGI research.
  8. Ethics, Safety, and Explainability: As AGI progresses, it is crucial to address ethical concerns, develop robust safety measures, and ensure transparency and interpretability of AGI systems. Researchers are exploring ways to make AI systems more transparent, explainable, and accountable to mitigate risks and enable human users to understand and trust their behavior.
  9. Collaboration and Interdisciplinary Research: Achieving AGI requires collaboration among experts from various fields, including AI, neuroscience, psychology, philosophy, and more. Encouraging interdisciplinary research and knowledge sharing can lead to new insights and innovative approaches in developing AGI.

It’s important to note that while progress is being made in each of these areas, achieving AGI remains an open-ended problem with no clear timeline for completion. It’s a complex and long-term goal that requires continuous research, exploration, and iteration.

Lord have mercy on us all!



This article was originally published by Thomas Cherickal on Hackernoon.