Recurrent models appeared in the 1980s. By 1998, the MNIST handwritten digit database had appeared, used for benchmarking and comparing image recognition methods. With the advent of new research, RNNs were pushed into the background. But the LSTM architecture proposed by S. Hochreiter and J. Schmidhuber changed everything. Recurrent neural networks were resurrected.
What are these neural networks behind the popularity of voice assistants such as Apple's Siri, Microsoft's Cortana, and Amazon's Alexa? What they are for, how they work, and where they are used: let's figure it out.
This is a type of neural network built for sequential data processing. The basic principle is to perform the same task for each element of the sequence. The main difference is memory: they can remember past experience and feed the results obtained from previous blocks into the current calculation. Because of the relatively simple structure, it seems that the chain could be of any length, but in practice this is not the case.
To make it clearer, here is an example. Let's say you need to analyze a 10-word sentence. The unrolled RNN will then include 10 layers, one per word. In terms of vector transformations, we obtain the following step-by-step algorithm.
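Here is a rough numpy sketch of that unrolled computation, a minimal illustration only: the 16-dimensional word embeddings, the 32-dimensional hidden state, and the random weights are all stand-ins for what a trained network would actually use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: each word is a 16-dimensional embedding,
# the hidden state ("memory") has 32 dimensions.
embed_dim, hidden_dim = 16, 32

# One shared set of weights, reused at every step of the unrolled chain.
W_xh = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # previous hidden -> hidden
b_h = np.zeros(hidden_dim)

# A 10-word sentence becomes 10 embedding vectors (random stand-ins here).
sentence = rng.normal(size=(10, embed_dim))

h = np.zeros(hidden_dim)            # empty "memory" before the first word
for x_t in sentence:                # one step (layer) per word
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (32,) - the final state summarizes the whole sentence
```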
The RNN uses the same parameters for each layer; the difference is in the inputs and outputs. In deep networks, the parameters may change from step to step.
That is, if you need to determine sentiment, the AI will try to do this for each layer. This is not entirely justified, since searching for the emotional coloring of every individual component of the sentence makes little sense. Therefore, the parameters needed for a specific task are set at the input.
RNNs can generate text. During training they memorize structure, style, and grammar, so the outcome depends on the material the AI was trained on. If you feed it disconnected paragraphs for as long as you like, the output will be just a jumble of words. If the AI learns from a meaningful work, you can get a story that stylistically resembles the original.
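As a rough illustration of the generation loop, here is a toy character-level sketch in numpy. The vocabulary, sizes, and random weights are placeholders; a real model would first be trained on text, and only then would the sampled output resemble the source material rather than gibberish.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = list("abcdefghijklmnopqrstuvwxyz ")
V, H = len(vocab), 64

# Stand-ins for trained parameters; with real training these weights
# would encode the structure, style and grammar of the source texts.
W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(V, H))

def sample(seed_char, length=50):
    h = np.zeros(H)
    x = np.eye(V)[vocab.index(seed_char)]    # one-hot encoding of the seed character
    out = [seed_char]
    for _ in range(length):
        h = np.tanh(W_xh @ x + W_hh @ h)     # update the "memory"
        logits = W_hy @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                         # probabilities over the vocabulary
        idx = rng.choice(V, p=p)             # sample the next character
        out.append(vocab[idx])
        x = np.eye(V)[idx]
    return "".join(out)

print(sample("t"))  # gibberish with random weights, stylistic text after real training
```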
A lot also depends on the architecture and settings. Since recurrent networks have a memory effect, they read the context, which is useful when developing voice assistants. They recognize how words are connected in live speech and may even take "offense" and refuse to continue the conversation if you are rude to them.
For the same reason, an RNN produces a readable translation rather than a set of unrelated words, and it also does a good job of recognition.
It can translate from one language into another (or several at once) and predict time series from known values.
The main architectures are unidirectional and bidirectional. The former resemble the perceptron: the neurons are arranged in a chain and pass information in one direction only. If we draw an analogy with a human, it resembles a signal traveling along the route from the sensory organs to the brain, on the basis of which the person concludes that it is raining, the sun is shining, or music is playing.
Bidirectional recurrent neural networks (biRNN) are two unidirectional networks superimposed on each other. One of them processes the input data in the forward order, the other in reverse. This architecture gives the AI access to context from both the past and the future.
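A minimal sketch of the bidirectional idea, assuming the same simple tanh update as above and random illustrative weights: the forward and backward passes are run separately, and their states are concatenated at every position.

```python
import numpy as np

def run_rnn(inputs, W_xh, W_hh):
    """One unidirectional pass; returns the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(2)
seq = rng.normal(size=(10, 16))                    # 10 steps, 16 features each
Wf_xh = rng.normal(scale=0.1, size=(32, 16))       # forward-direction weights
Wf_hh = rng.normal(scale=0.1, size=(32, 32))
Wb_xh = rng.normal(scale=0.1, size=(32, 16))       # backward-direction weights
Wb_hh = rng.normal(scale=0.1, size=(32, 32))

forward = run_rnn(seq, Wf_xh, Wf_hh)               # reads the sequence left to right
backward = run_rnn(seq[::-1], Wb_xh, Wb_hh)[::-1]  # reads it right to left
bi_states = np.concatenate([forward, backward], axis=1)  # past + future context
print(bi_states.shape)  # (10, 64)
```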
A more advanced variant is the LSTM, similar to the biRNN but using different functions for its computations. The memory in this kind of neuron is made up of cells whose inner workings are hidden, black-box-like. Inside these cells, the neural network sets priorities by itself and decides what to keep and what to delete. This approach proved to be an advantage for operations that require capturing and storing long-term dependencies.
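Here is a simplified numpy sketch of a single LSTM cell, showing how the gates decide what to keep and what to delete. The sizes and random weights are illustrative, and real implementations differ in various details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the four gates
    stacked together (input, forget, output, candidate), each of size H."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * H:1 * H])        # input gate: how much new info to write
    f = sigmoid(z[1 * H:2 * H])        # forget gate: how much old memory to erase
    o = sigmoid(z[2 * H:3 * H])        # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])        # candidate values for the cell
    c = f * c_prev + i * g             # updated cell ("long-term" memory)
    h = o * np.tanh(c)                 # updated hidden state
    return h, c

rng = np.random.default_rng(3)
X, H = 16, 32
W = rng.normal(scale=0.1, size=(4 * H, X))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h = c = np.zeros(H)
for x in rng.normal(size=(10, X)):     # run a 10-step sequence through the cell
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```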
Each subtype is used to transform and analyze different kinds of data.
As for additional tools, pixel-level techniques are used to improve image quality. One example is PixelRNN, whose idea is that the most important information for predicting a pixel is contained in its neighboring pixels. When building the function, a condition is added via a masking layer: take the already-known context into account, but not the pixels that have not yet been generated.
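A small sketch of that masking idea, under the assumption of a square convolution kernel: the mask keeps only the positions that are already known when the current pixel is being predicted.

```python
import numpy as np

def causal_mask(kernel_size, include_center=False):
    """Mask for a convolution kernel in an autoregressive pixel model:
    1 for positions above / to the left of the current pixel (already known
    context), 0 for the current pixel and the not-yet-generated positions."""
    mask = np.zeros((kernel_size, kernel_size))
    center = kernel_size // 2
    mask[:center, :] = 1.0          # all rows above the current pixel
    mask[center, :center] = 1.0     # pixels to the left on the same row
    if include_center:
        mask[center, center] = 1.0
    return mask

print(causal_mask(5))
# Multiplying the kernel weights by this mask before the convolution keeps
# each pixel's prediction conditioned only on its already-known neighbours.
```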
One of the disadvantages is exploding and vanishing gradients: with large values the algorithm distributes the weights unreasonably, and with small values it simply stops learning. It was precisely this point that attracted the attention of researchers, and LSTM appeared.
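A quick numpy illustration of why this happens (ignoring the nonlinearity for simplicity): the backpropagated gradient is multiplied by the recurrent weight matrix once per step, so over many steps its norm either shrinks toward zero or blows up.

```python
import numpy as np

rng = np.random.default_rng(4)
H, steps = 32, 50

for scale, label in [(0.05, "small recurrent weights"), (0.2, "larger recurrent weights")]:
    W_hh = rng.normal(scale=scale, size=(H, H))
    grad = np.eye(H)
    for _ in range(steps):
        grad = grad @ W_hh   # one extra factor of W_hh per backpropagated step
    print(label, "-> gradient norm after", steps, "steps:", np.linalg.norm(grad))
# Small weights drive the norm toward zero (learning stalls),
# larger ones blow it up (weights get updated erratically).
```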
To address this, the concept of attention was also invented: the need to prioritize the importance of the available parameters. A human cannot tell the AI what matters; the idea is to teach the AI to figure out for itself which information deserves more attention and which deserves less.
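A minimal sketch of soft attention in numpy, with random vectors standing in for real RNN states and a real query: the softmax weights play the role of the "how much attention to pay" scores described above.

```python
import numpy as np

def attention(query, keys, values):
    """Soft attention: the network decides how much weight each element
    deserves, instead of a human setting the priorities by hand."""
    scores = keys @ query / np.sqrt(query.shape[0])   # similarity of the query to each element
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # importance scores that sum to 1
    return weights @ values, weights

rng = np.random.default_rng(5)
hidden = rng.normal(size=(10, 32))     # e.g. RNN states for a 10-word sentence
query = rng.normal(size=32)            # e.g. a representation of the question
context, weights = attention(query, hidden, hidden)
print(weights.round(2))                # larger numbers = "pay more attention here"
```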
An example implementation of this concept is the Dynamic Memory Networks system with episodic memory. Here, in addition to the input facts, a question is added. The result is a complete episode (for example: "The cat was sitting on the windowsill. It went outside. Where is the cat?"), which allows the AI to find the correct answer to the question: outside.
Since the structure implies sequential processing, training takes more time, and the process itself is laborious and tedious for the "teacher", especially when the Dynamic Memory Networks concept is used. For the same reason, there are problems with processing long sequences. To be fair, developers are constantly adding solutions and running tests to increase the speed of operation without losing effectiveness.
Standard methods are used for training. The error backpropagation method, introduced in the 1960s, is also applied. It is based on the chain rule of differentiation: after a pass through the network, the signal returns in the opposite direction and the main parameters of the model are adjusted. To put it simply, since the weights are shared across the sequence, the gradient at each step depends on the current step's calculations and on the results of the previous one.
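To make the idea concrete, here is a compact numpy sketch of backpropagation through time for the simple tanh RNN from above, with a squared-error loss on the final state; the sizes and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
X, H, T = 8, 16, 10
W_xh = rng.normal(scale=0.1, size=(H, X))
W_hh = rng.normal(scale=0.1, size=(H, H))
xs = rng.normal(size=(T, X))
target = rng.normal(size=H)

# Forward pass: remember every hidden state, they are needed on the way back.
hs = [np.zeros(H)]
for x in xs:
    hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1]))

# Backward pass: walk the chain in reverse, applying the chain rule at each step.
dh = hs[-1] - target                    # d(loss)/d(h_T) for a squared-error loss
dW_xh = np.zeros_like(W_xh)
dW_hh = np.zeros_like(W_hh)
for t in reversed(range(T)):
    dz = dh * (1.0 - hs[t + 1] ** 2)    # back through the tanh nonlinearity
    dW_xh += np.outer(dz, xs[t])        # the same shared weights accumulate
    dW_hh += np.outer(dz, hs[t])        # gradient contributions from every step
    dh = W_hh.T @ dz                    # pass the gradient to the previous step

# One gradient-descent update of the shared parameters.
lr = 0.01
W_xh -= lr * dW_xh
W_hh -= lr * dW_hh
```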
In many industries. For example, analysts use such tools to forecast stock price dynamics and exchange rates over the long and short term, to make weather forecasts (where they sometimes do better than meteorologists), and to predict growth or decline in traffic.
Recurrent networks are also used to transform photos so that the image imitates the style of famous artists. Technically, this happens by splitting the file into fragments, analyzing each one independently, and then blending them together.
An RNN is able to write music, or rather to make a fresh arrangement based on an existing melody. By the way, it handles this much better than it handles text, because notes, unlike words, carry no semantic meaning.
with software code: software written by AI compiles and works successfully if it is explained in detail in advance what it is needed for;
in machine translation technologies, even with rather complex material, unmistakably determining which language identically written characters in a word belong to;
with natural language processing: sentiment analysis and the development of conversational models (the same voice assistants, or chatbots on websites that answer users' questions);
to analyze the appearance of negative reviews about products, to understand whether real buyers are actually writing them (judging by their spread over time) or whether it is the machinations of competitors.
If you are interested in the technology, want to try working with neural networks, and learn how to write prompts, register on the website and get free access to Midjourney, Ideogram, and others.
RNNs are not as popular as convolutional models, despite the obvious improvements and the creation of new optimization techniques. Most likely this is simply because the tasks they solve are limited and highly specialized.