Artificial neural networks are (still) not conscious
In Part 1, we explored what artificial neural networks (ANNs) are, what compression is, and briefly touched on how the two are related. In this article, we’ll dive deeper into the relationship between the two, and illustrate how ANNs themselves carry out compression functions. What’s great about ANNs carrying out compression is the role that machine learning can play in the process.
While regular compression algorithms require an individual to define exactly what each step of the algorithm will do, the process with ANNs is a lot more streamlined; we only have to define the objective function that we want the neural net to optimize against, and it will learn the most efficient way to do so from data.
So how exactly do ANNs compress? Let’s take a look.
Auto-encoders: a clear example of ANNs doing compression
To find an example of ANNs carrying out compression functions, you don’t need to look further than auto-encoders. Put simply, auto-encoders are neural networks that convert inputs into efficient internal representations. An auto-encoder network has two main components: an encoder, which produces that representation, and a decoder (sometimes called a generator), which turns it back into an output. Between the two sits what is known as an informational bottleneck, the latent space representation in Figure 1, which forces the network to learn a novel, compact representation of the input. This is complemented by a loss function that penalizes any output that diverges from the input.
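To make the encoder, bottleneck and loss function concrete, here is a minimal sketch in plain Python. It is a toy linear auto-encoder rather than a realistic network, and the data, starting weights and learning rate are all assumptions chosen for illustration: 2-D points that lie on a line are squeezed through a 1-D latent code and then reconstructed.

```python
# Toy linear auto-encoder: 2-D input -> 1-D latent code -> 2-D reconstruction.
# Data, initial weights and learning rate are illustrative assumptions.

# Every point lies on the line y = 2x, so a single number can describe it.
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

enc = [0.5, 0.1]  # encoder weights: latent = enc . x
dec = [0.1, 0.5]  # decoder weights: x_hat = dec * latent
learning_rate = 0.05


def encode(x):
    """Compress a 2-D point into one latent number (the bottleneck)."""
    return enc[0] * x[0] + enc[1] * x[1]


def decode(z):
    """Expand the latent number back into a 2-D point."""
    return (dec[0] * z, dec[1] * z)


def reconstruction_loss():
    """Mean squared distance between inputs and their reconstructions."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x))
        total += (x_hat[0] - x[0]) ** 2 + (x_hat[1] - x[1]) ** 2
    return total / len(data)


initial_loss = reconstruction_loss()

for _ in range(1000):
    grad_enc = [0.0, 0.0]
    grad_dec = [0.0, 0.0]
    for x in data:
        z = encode(x)
        x_hat = decode(z)
        err = (x_hat[0] - x[0], x_hat[1] - x[1])
        # Chain rule: through the decoder first, then back through the bottleneck.
        dz = 2.0 * (err[0] * dec[0] + err[1] * dec[1])
        for i in range(2):
            grad_dec[i] += 2.0 * err[i] * z / len(data)
            grad_enc[i] += dz * x[i] / len(data)
    for i in range(2):
        enc[i] -= learning_rate * grad_enc[i]
        dec[i] -= learning_rate * grad_dec[i]

final_loss = reconstruction_loss()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.2e}")
```

Nobody told the network how to compress the points. It was only penalized for reconstructing them badly, and the 1-D code is what it settled on.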
Sound familiar? It’s because it is.
This bears a strong resemblance to compression and decompression. The same pattern can be observed in other network architectures, such as convolutional neural networks (CNNs) in computer vision, where a stack of convolutional and pooling layers compresses the input into a much smaller representation (usually called an embedding). Once this is done, a sequence of fully connected layers carries out the classification or regression logic.
Research has even shown that any classifier (SVMs, decision trees, etc.) or regressor can replace these fully connected layers, and in many cases produce better results. Some use cases skip the fully connected layers altogether and simply compare embeddings with distance metrics.
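Comparing embeddings with a distance metric can be as simple as the sketch below. The three embedding vectors are made-up numbers used purely for illustration (a real network would produce much longer vectors), and cosine similarity is just one common choice of metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
embeddings = {
    "cat_1": [0.9, 0.1, 0.3],
    "cat_2": [0.8, 0.2, 0.4],
    "truck": [0.1, 0.9, 0.2],
}


def nearest(name):
    """Return the other item whose embedding is most similar to `name`."""
    others = [key for key in embeddings if key != name]
    return max(
        others,
        key=lambda key: cosine_similarity(embeddings[name], embeddings[key]),
    )


print(nearest("cat_1"))
```

No classification layer is involved here: the decision comes entirely from how close the compressed representations are to each other.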
Application of auto-encoders: when should you use them?
The auto-encoder technique can be used when you need to train a classifier neural network on a large dataset in which only a few samples are labeled. In this instance, you could train a stacked encoder-decoder on the entire unlabeled dataset, which should, in theory, enable the encoder to learn the most efficient way to compress the dataset and therefore produce the most distinctive feature vectors.
At this point, the decoder can be removed, and the encoder can be extended with some decision layers and retrained with the labeled samples. This leads me to believe (and this is an educated guess, as they are mostly black boxes) that neural networks work by first compressing the inputs into the most efficient representation required to carry out a task, after which another part of the network performs that task, be it regression, classification, decoding, etc.
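Here is a rough sketch of that recipe in plain Python. Everything in it is an assumption made for illustration: the toy data, the single-weight linear encoder and decoder, the one-neuron logistic "decision layer", and the choice to keep the encoder frozen during the second step (fine-tuning it together with the decision layer is an equally valid option).

```python
import math

# Unlabeled pool: points on the line y = 2x (toy data, assumed for illustration).
unlabeled = [(t / 10.0, 2.0 * t / 10.0) for t in range(-10, 11)]
# Only four labeled samples: class 1 when x is positive, class 0 otherwise.
labeled = [((-0.8, -1.6), 0), ((-0.3, -0.6), 0), ((0.4, 0.8), 1), ((0.9, 1.8), 1)]

enc = [0.5, 0.1]  # encoder weights (2-D input -> 1-D code)
dec = [0.1, 0.5]  # decoder weights (1-D code -> 2-D reconstruction)


def encode(x):
    return enc[0] * x[0] + enc[1] * x[1]


# Step 1: unsupervised pre-training of encoder + decoder on reconstruction error.
for _ in range(1000):
    g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
    for x in unlabeled:
        z = encode(x)
        err = [dec[i] * z - x[i] for i in range(2)]
        dz = 2.0 * sum(err[i] * dec[i] for i in range(2))
        for i in range(2):
            g_dec[i] += 2.0 * err[i] * z / len(unlabeled)
            g_enc[i] += dz * x[i] / len(unlabeled)
    for i in range(2):
        enc[i] -= 0.05 * g_enc[i]
        dec[i] -= 0.05 * g_dec[i]

# Step 2: discard the decoder, keep the encoder frozen, and train a
# one-neuron logistic decision layer on the codes of the labeled samples only.
w, b = 0.0, 0.0
for _ in range(2000):
    g_w = g_b = 0.0
    for x, y in labeled:
        z = encode(x)
        p = 1.0 / (1.0 + math.exp(-(w * z + b)))
        g_w += (p - y) * z / len(labeled)
        g_b += (p - y) / len(labeled)
    w -= 0.5 * g_w
    b -= 0.5 * g_b


def classify(x):
    """Encoder followed by the decision layer; the decoder is no longer used."""
    return 1 if w * encode(x) + b > 0 else 0


print(classify((0.7, 1.4)), classify((-0.6, -1.2)))
```

The decision layer only ever sees the 1-D code, so the four labeled samples are spent on learning a single threshold rather than on learning what the data looks like.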
So, why exactly is compression important?
It’s estimated that as of 2020, 2.5 quintillion bytes of data are created daily. With so much data being generated, the question of which data is relevant and which is not becomes more of a concern. There’s no conceivable way for all of that data to be stored in its entirety. What’s needed is a compressed form of the data: one that minimizes the storage required while preserving enough of its integrity to remain useful for its task. This is where compression algorithms shine.
The issue of data storage and integrity is so relevant that the premise of HBO’s Silicon Valley was based around a compression algorithm. Let me quote Gavin Belson in the show when he said:
“Data creation is exploding. With all the selfies and useless files people refuse to delete on the cloud, 92% of the world’s data was created in the last two years alone. At the current rate, the world's data storage capacity will be overtaken by next spring. It will be nothing short of a catastrophe. Data shortages, data rationing, data black markets. Someone's compression will save the world from data-geddon, and it sure as hell better be Nucleus and not goddamn Pied Piper!”
As terrible as what he’s saying sounds, it’s true. Today, a large number of companies depend on data-driven decisions; so much data is generated that it becomes cumbersome and inefficient to use massive data lakes and data stores, keeping billions of records that have no value of their own. Here, ANNs can be of huge benefit in one of two ways: a) they can compress the data into an efficient vector space that uses minimal storage and can be used in place of the original data points, or b) they can discard the data completely, retaining only the knowledge and relations learned from it to drive forward business decisions.
Compressing the compression algorithm
Finally, if you’ve worked with neural networks before, then you know that they themselves need compression. A neural network can learn redundant patterns, or end up with multiple nodes that carry out the same function. This is a problem because you want your neural net to be as slim as possible; shedding those redundancies leads to faster, more efficient networks.
This area is still under heavy research. Many approaches have been suggested, such as weight pruning and knowledge distillation. The basic idea behind pruning is to identify the weights or neurons that contribute little, appear to be carrying out the same function, or appear to be activated in similar ways, and to remove them; distillation instead trains a smaller network to mimic the outputs of a larger one.
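As a rough illustration of the "find neurons that are activated in similar ways and remove them" idea, the sketch below uses a tiny hand-built network in which two hidden neurons are exact duplicates. The weights, sample inputs and helper names are all hypothetical, and real pruning methods have to cope with neurons that are only approximately redundant, so treat this as a toy under those assumptions.

```python
# One hidden layer with hand-picked weights (illustrative values only).
# Hidden neurons 0 and 2 are exact duplicates of each other.
hidden_weights = [[1.0, -0.5], [0.3, 0.8], [1.0, -0.5]]  # one row per neuron
output_weights = [0.6, -1.2, 0.4]
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (-1.0, 0.5)]


def hidden_activations(x, h_weights):
    """ReLU activations of every hidden neuron for one input."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in h_weights]


def predict(x, h_weights, o_weights):
    acts = hidden_activations(x, h_weights)
    return sum(a * w for a, w in zip(acts, o_weights))


def find_duplicate_pairs(h_weights, samples, tol=1e-9):
    """Pairs (i, j), i < j, whose activations match on every sample."""
    acts = [hidden_activations(x, h_weights) for x in samples]
    pairs = []
    for i in range(len(h_weights)):
        for j in range(i + 1, len(h_weights)):
            if all(abs(a[i] - a[j]) <= tol for a in acts):
                pairs.append((i, j))
    return pairs


def merge_neurons(h_weights, o_weights, keep, drop):
    """Remove neuron `drop` and fold its outgoing weight into `keep`.

    Assumes keep < drop, so the index of `keep` does not shift.
    """
    merged = list(o_weights)
    merged[keep] += merged[drop]
    new_hidden = [row for idx, row in enumerate(h_weights) if idx != drop]
    new_output = [w for idx, w in enumerate(merged) if idx != drop]
    return new_hidden, new_output


pairs = find_duplicate_pairs(hidden_weights, inputs)
keep, drop = pairs[0]
pruned_hidden, pruned_output = merge_neurons(
    hidden_weights, output_weights, keep, drop
)
print(f"hidden neurons: {len(hidden_weights)} -> {len(pruned_hidden)}")
```

Because the two neurons always fire identically, adding their outgoing weights together leaves the network's predictions unchanged on these inputs while removing a third of the hidden layer.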
However, these methodologies are not very efficient or widely adopted. They are usually applied after training, which is not optimal: removing redundancies during training would mean a smaller network and, ultimately, faster training. Applying these methodologies during training would be incredibly beneficial because, as increasingly large models are trained to solve new problems, figuring out how to trim them means saving millions of dollars in both training and inference.