Unlocking the AI Black Box: New Formula Explains How It Recognizes Relevant Patterns

A team at UC San Diego has discovered a method for decoding the learning process of neural networks, using a statistical formula to clarify how features are learned – a breakthrough that promises more understandable and efficient AI systems. Photo credit: SciTechDaily.com

The findings can also be used to increase the efficiency of various machine learning frameworks.

Neural networks have led to breakthroughs in artificial intelligence, including the large language models now used in a wide range of applications, from finance to human resources to healthcare. But these networks remain a black box whose inner workings are difficult for engineers and scientists to understand. Now a team led by data and computer scientists at the University of California, San Diego, has given neural networks the equivalent of an X-ray to find out how they actually learn.

The researchers found that a formula used in statistical analysis provides a streamlined mathematical description of how neural networks like GPT-2, a precursor to ChatGPT, learn relevant patterns in data, called features. This formula also explains how neural networks use these relevant patterns to make predictions.

“We’re trying to understand neural networks from the ground up,” said Daniel Beaglehole, a Ph.D. student in the UC San Diego Department of Computer Science and Engineering and co-first author of the study. “Our formula makes it easy to interpret which features the network uses to make predictions.”

The team presented their findings in the March 7 issue of the journal Science.

Why is that important? AI-supported tools are now omnipresent in everyday life. Banks use them to approve loans. Hospitals use them to analyze medical data such as X-rays and MRI scans. Companies use them to screen applicants. However, it is currently difficult to understand the mechanism that neural networks use to make decisions and the biases in the training data that could affect this.

“If you don’t understand how neural networks learn, it is very difficult to determine whether neural networks produce reliable, accurate and appropriate answers,” said Mikhail Belkin, corresponding author of the paper and a professor at the UC San Diego Halicioglu Data Science Institute. “This is particularly important given the rapid recent growth of machine learning and neural network technology.”

The study is part of a larger effort in Belkin’s research group to develop a mathematical theory that explains how neural networks work. “Technology has outpaced theory many times over,” he said. “We have to catch up.”

The team also showed that the statistical formula they used to understand how neural networks learn, known as the Average Gradient Outer Product (AGOP), can be applied to improve performance and efficiency in other types of machine learning architectures that do not involve neural networks.
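In rough terms, the AGOP of a trained predictor averages, over the training data, the outer product of the predictor's gradient with respect to its inputs; its large entries flag the input directions the predictions are most sensitive to. Below is a minimal sketch of that computation, written in PyTorch and assuming a differentiable scalar-output model on flattened inputs (the function name and setup are illustrative, not the authors' code):

import torch

def average_gradient_outer_product(model, inputs):
    # AGOP = (1/n) * sum_i grad_x f(x_i) grad_x f(x_i)^T, a d-by-d matrix whose
    # large entries mark the input coordinates the predictions depend on most.
    n, d = inputs.shape
    agop = torch.zeros(d, d)
    for x in inputs:
        x = x.clone().requires_grad_(True)
        y = model(x.unsqueeze(0)).squeeze()      # scalar prediction for one input
        (grad,) = torch.autograd.grad(y, x)      # gradient with respect to the input
        agop += torch.outer(grad, grad)
    return agop / n

In practice the loop would be batched for speed, but the idea is unchanged: the resulting matrix summarizes which input coordinates the model actually uses.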

“If we understand the underlying mechanisms that drive neural networks, we should be able to build machine learning models that are simpler, more efficient and more interpretable,” Belkin said. “We hope this will help democratize AI.”

The machine learning systems Belkin envisions would require less computing power and therefore less power from the grid to function. These systems would also be less complex and therefore easier to understand.

Illustrating the new findings using an example

(Artificial) neural networks are computational tools for learning relationships between data features (e.g. identifying specific objects or faces in an image). One example task is determining whether a person in a new image is wearing glasses. Machine learning addresses this problem by providing the neural network with many example images (training images) labeled as “a person wearing glasses” or “a person not wearing glasses”. The neural network learns the relationship between the images and their labels and extracts the data patterns, or features, that it needs to focus on to make a decision. One reason AI systems are considered a black box is that it is often difficult to describe mathematically what criteria the systems actually use to make their predictions, including possible biases. The new work provides a simple mathematical explanation for how the systems learn these features.
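For concreteness, the training setup described above can be sketched in a few lines of PyTorch. The data here are random stand-ins (hypothetical 64×64 grayscale face photos, flattened, with 0/1 labels for “glasses” / “no glasses”); a real pipeline would load the labeled photos instead:

import torch
import torch.nn as nn

images = torch.randn(1000, 64 * 64)              # stand-in for flattened training photos
labels = torch.randint(0, 2, (1000,)).float()    # 1 = "wearing glasses", 0 = "not wearing glasses"

model = nn.Sequential(nn.Linear(64 * 64, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    optimizer.zero_grad()
    logits = model(images).squeeze(1)            # one score per image
    loss = loss_fn(logits, labels)               # compare predictions to the labels
    loss.backward()                              # backpropagation adjusts the weights
    optimizer.step()

The trained weights encode which parts of the image matter, but they are hard to read off directly; this is where the AGOP formula comes in.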

Features are relevant patterns in the data. In the example above, there are a variety of features that the neural network learns and then uses to determine whether a person in a photo is wearing glasses or not. One feature to pay attention to in this task is the upper part of the face. Other features could include the eye or nose area, where glasses usually sit. The network selectively pays attention to the features it learns are relevant and discards the other parts of the image, such as the lower part of the face, the hair, and so on.

Feature learning is the ability to identify relevant patterns in data and then use those patterns to make predictions. In the glasses example, the network learns to pay attention to the upper part of the face. In the new Science paper, the researchers identified a statistical formula that describes how neural networks learn these features.
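To connect the formula to this example (reusing the hypothetical model, images, and average_gradient_outer_product sketches above): the diagonal of the AGOP assigns each pixel a sensitivity weight, which can be reshaped into an image and inspected.

agop = average_gradient_outer_product(model, images[:200])
pixel_importance = torch.diag(agop).reshape(64, 64)   # one sensitivity weight per pixel
# For a well-trained glasses classifier, the largest weights should cluster
# around the upper part of the face, the region the network learned to rely on.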

Alternative machine learning architectures: The researchers further showed that introducing this formula into computer systems that are not based on neural networks allows those systems to learn faster and more efficiently.
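One way to picture this, as a deliberately simplified toy rather than the authors' algorithm: fit a kernel (non-neural) predictor, compute the AGOP of what it has learned, use that matrix to reweight the input coordinates, and repeat. The NumPy sketch below follows that pattern; the names, kernel choice, and parameters are our own illustrative choices.

import numpy as np

def gaussian_kernel(A, B, width=5.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def fit_with_agop(X, y, rounds=3, ridge=1e-3, width=5.0):
    # Toy AGOP-reweighted kernel regression: alternate between (1) kernel ridge
    # regression on feature-reweighted inputs and (2) re-estimating the feature
    # matrix M as the AGOP of the fitted predictor.
    n, d = X.shape
    M = np.eye(d)                                        # start with no feature weighting
    for _ in range(rounds):
        # The matrix square root of M reweights the input coordinates.
        w, V = np.linalg.eigh(M)
        sqrtM = V @ np.diag(np.sqrt(np.maximum(w, 0))) @ V.T
        Z = X @ sqrtM
        K = gaussian_kernel(Z, Z, width)
        alpha = np.linalg.solve(K + ridge * np.eye(n), y)    # kernel ridge regression
        # Analytic input gradients of f(x) = sum_i alpha_i k(x, x_i) at the training
        # points give the AGOP, which becomes the next feature matrix M.
        diffs = (Z[None, :, :] - Z[:, None, :]) / width ** 2
        grads = np.einsum('ij,j,ijd->id', K, alpha, diffs)
        grads = grads @ sqrtM                            # chain rule back to the raw inputs
        M = grads.T @ grads / n
    return alpha, M

The point of the exercise is the mechanism rather than the specific code: a predictor with no layers and no backpropagation can still learn features once the AGOP tells it which input directions matter.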

“How do I ignore what is not necessary? People are good at that,” Belkin said. “Machines do the same thing. Large language models, for example, implement this ‘selective attention,’ and we don’t know how they do it. In our Science paper, we present a mechanism that explains at least part of how neural networks ‘selectively pay attention.’”

Reference: “Mechanism for Feature Learning in Neural Networks and Backpropagation-Free Machine Learning Models” by Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit and Mikhail Belkin, March 7, 2024, Science.
DOI: 10.1126/science.adi5639

Funders of the study included the National Science Foundation and the Simons Foundation for the Collaboration on the Theoretical Foundations of Deep Learning. Belkin is part of the NSF-funded and UC San Diego-led Institute for Learning-enabled Optimization at Scale (TILOS).
