At a high level, OCR has two main tasks:

  • Text detection: locate the portions of text in the image (word-level or character-level detection)
  • Text transcription: convert an image into a sequence of characters

The main problems with OCR are:

  • Tesseract and most other OCR solutions transcribe text from left to right. If the information we want is not laid out in that reading order (key/value extraction rules, for example, assume the value always sits to the right of its key), we cannot extract it correctly.
  • Complex entities that span multiple lines are hard to extract
  • Some Information cannot be extracted with rules post…
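To make the reading-order problem concrete, here is a minimal Python sketch. The `Word` tuple, the pixel coordinates, and the helper names `reading_order` and `value_right_of` are all hypothetical, and the rule shown (take the token immediately to the right of the key) is just one naive post-processing rule, not how Tesseract itself works.

```python
from typing import List, NamedTuple, Optional


class Word(NamedTuple):
    """One OCR token with the top-left corner of its bounding box (hypothetical format)."""
    text: str
    x: int
    y: int


def reading_order(words: List[Word], line_tol: int = 10) -> List[List[Word]]:
    """Group words into lines by vertical position, then sort each line left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        # A word joins the current line if it is vertically close to the line's first word.
        if lines and abs(lines[-1][0].y - word.y) <= line_tol:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w.x) for line in lines]


def value_right_of(key: str, words: List[Word]) -> Optional[str]:
    """Naive rule: the value is the token immediately to the right of the key."""
    for line in reading_order(words):
        texts = [w.text for w in line]
        if key in texts:
            i = texts.index(key)
            return texts[i + 1] if i + 1 < len(texts) else None
    return None


# Layout 1: value sits to the right of the key, so the rule works.
side_by_side = [Word("Total:", 10, 100), Word("42.00", 120, 102)]

# Layout 2: value sits below the key, so the rule finds nothing.
stacked = [Word("Total:", 10, 100), Word("42.00", 10, 130)]

# Layout 3: table with a header row, so the rule returns the next header.
table = [
    Word("Invoice", 10, 10), Word("Date", 200, 10),
    Word("INV-7", 10, 40), Word("2020-01-01", 200, 40),
]

print(value_right_of("Total:", side_by_side))
print(value_right_of("Total:", stacked))
print(value_right_of("Invoice", table))
```

The same rule gives the right answer for the first layout, nothing for the second, and a wrong answer for the third, which is why purely rule-based post-processing tends to break when layouts vary.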

Attention and Memory in Deep Learning and NLP.

Visual Attention

A recent trend in deep learning for NLP is the “attention mechanism”. It is closely related to the visual attention we see in human beings. When we look at an image, we do not take in the whole image at once; we focus on certain parts in high resolution while perceiving the rest in low resolution, and we adjust the focal point over time.

Human attention allows us to focus on certain regions with “high resolution” (ear in the yellow box) while perceiving the “low resolution” (like snowy background and coat of the dog) and adjust the focal…
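One common way to turn this idea into numbers is scaled dot-product attention, sketched below in plain Python. This is an illustration chosen here, not necessarily the variant the article goes on to describe; the function names and the toy vectors are assumptions. Each "region" gets a weight between 0 and 1, the weights sum to 1, and the region most similar to the query receives the most focus.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           keys: List[List[float]],
           values: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The context vector is the attention-weighted average of the values.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Toy example: three "regions"; the first one matches the query best.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]

weights, context = attend(query, keys, values)
print(weights)   # highest weight on the first region
print(context)
```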


Entropy is the measure of the reduction in uncertainty.

Entropy comes from Claude Shannon's information theory, where the goal is to send information from a sender to a recipient as efficiently as possible. We send information in bits; a bit is either 0 or 1.

When we use one bit to send a piece of information, we reduce the recipient's uncertainty by a factor of 2.

Suppose there are two equally likely types of weather, rainy and sunny, and the forecast predicts that the next day will be rainy. Here we have reduced the uncertainty by a factor of 2…
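The arithmetic behind "a factor of 2" can be sketched in a few lines of Python. The function names are illustrative, and the four-outcome and 75/25 forecasts are extra examples assumed here, not taken from the text above. An event with probability p carries -log2(p) bits, and entropy is the average of that quantity over all outcomes.

```python
import math
from typing import List


def information_bits(p: float) -> float:
    """Bits of information gained on learning that an event of probability p occurred."""
    return -math.log2(p)


def entropy_bits(probs: List[float]) -> float:
    """Shannon entropy: the expected information, in bits, over all outcomes."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Two equally likely outcomes (rainy / sunny): one bit halves the uncertainty.
print(information_bits(0.5))       # 1.0

# Four equally likely outcomes: uncertainty shrinks by a factor of 4, i.e. 2 bits.
print(entropy_bits([0.25] * 4))    # 2.0

# A skewed forecast (75% sunny, 25% rainy) is less uncertain than a fair coin.
print(entropy_bits([0.75, 0.25]))
```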

Intelligence is not to act when you are uncertain

Two types of error are associated with any machine learning or deep learning model:

  1. Reducible error (this error can be reduced with more data points)
  2. Irreducible error (this error comes from the inherent variance in the data)

The overall error made by any model is a combination of the above two errors.
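A small simulation can illustrate the split. This is a sketch under assumed settings: a known linear function, Gaussian noise with standard deviation 0.5, and a least-squares line fit, none of which come from the text above. Because the true function is known, the two parts can be measured separately. The reducible part shrinks as the training set grows, while the total error settles near the noise variance (0.25 here).

```python
import random
from typing import List, Tuple

NOISE_STD = 0.5  # assumed noise level; the irreducible error is NOISE_STD ** 2


def true_fn(x: float) -> float:
    """The (normally unknown) function that generates the data."""
    return 2.0 * x + 1.0


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def average_errors(n_train: int, n_trials: int = 200,
                   n_test: int = 500, seed: int = 0) -> Tuple[float, float]:
    """Return (reducible, total) mean squared error averaged over many trials."""
    rng = random.Random(seed)
    reducible = total = 0.0
    for _ in range(n_trials):
        xs = [rng.uniform(-1, 1) for _ in range(n_train)]
        ys = [true_fn(x) + rng.gauss(0, NOISE_STD) for x in xs]
        slope, intercept = fit_line(xs, ys)
        for _ in range(n_test):
            x = rng.uniform(-1, 1)
            y = true_fn(x) + rng.gauss(0, NOISE_STD)
            pred = slope * x + intercept
            reducible += (pred - true_fn(x)) ** 2  # gap to the noise-free target
            total += (pred - y) ** 2               # gap to the noisy observation
    count = n_trials * n_test
    return reducible / count, total / count


for n in (10, 1000):
    red, tot = average_errors(n)
    print(f"n_train={n}: reducible={red:.4f}, total={tot:.4f}")
```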

Aleatoric Uncertainty:

When we run a lab experiment, the values measured across multiple trials are never exactly the same. Even with identical inputs, the output measurements differ on every run. This is what…

Neural structured learning is a framework for training neural networks with structured signals. It can be applied to NLP, vision, or any prediction problem in general (classification, regression).

The structure can be given explicitly (like knowledge graphs in NLP) or generated on the fly during training (like creating adversarial examples by perturbing the data).
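As a rough illustration of structure generated on the fly, the sketch below builds an adversarial neighbour for a tiny logistic-regression model and adds its loss to the training objective. It follows the spirit of the fast gradient sign method and is plain Python, not the Neural Structured Learning library's API. The weights, the input, `epsilon`, and `multiplier` are all assumed values.

```python
import math
from typing import List


def predict(x: List[float], weights: List[float], bias: float) -> float:
    """Probability of the positive class under logistic regression."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(x: List[float], label: int, weights: List[float], bias: float) -> float:
    """Binary cross-entropy for one example."""
    p = predict(x, weights, bias)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def adversarial_example(x: List[float], label: int, weights: List[float],
                        bias: float, epsilon: float = 0.1) -> List[float]:
    """Nudge each feature by epsilon in the direction that increases the loss."""
    error = predict(x, weights, bias) - label
    grad = [error * w for w in weights]  # d(loss)/d(x_i) for logistic regression
    return [xi + epsilon * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]


def regularized_loss(x: List[float], label: int, weights: List[float], bias: float,
                     epsilon: float = 0.1, multiplier: float = 0.2) -> float:
    """Supervised loss plus a weighted loss on the adversarial neighbour."""
    x_adv = adversarial_example(x, label, weights, bias, epsilon)
    return (log_loss(x, label, weights, bias)
            + multiplier * log_loss(x_adv, label, weights, bias))


weights, bias = [1.5, -2.0], 0.1   # hypothetical model parameters
x, label = [0.4, 0.3], 1           # hypothetical training example

x_adv = adversarial_example(x, label, weights, bias)
print("clean loss:      ", log_loss(x, label, weights, bias))
print("adversarial loss:", log_loss(x_adv, label, weights, bias))
print("training loss:   ", regularized_loss(x, label, weights, bias))
```

Minimising the combined loss pushes the model to give similar predictions for an input and its slightly perturbed neighbour.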

Why is this helpful?

  • It helps achieve better accuracy when labeled data is scarce
  • It produces more robust models (because adversarial examples are generated specifically to confuse the model into predicting wrongly)
  • The models will be invariant to slight variations…

ONNX (Open Neural Network Exchange) is a format for easily porting models between frameworks such as PyTorch, TensorFlow, Keras, Caffe2, and Core ML. Most of these frameworks now support the ONNX format.

PyTorch is the framework most researchers prefer for their experiments because its code style is more Pythonic than TensorFlow's. For production deployment, however, TensorFlow has the stronger stack, including TensorFlow Serving. Currently, there is no easy way to convert models directly between PyTorch and TensorFlow. This is where ONNX shines.

Step 1: Convert your PyTorch model to ONNX format

from transformers import

Dropout as applied to regular neural networks cannot be applied directly to RNNs, because it hinders the RNN's ability to retain long-term dependencies.

To overcome this, DropConnect is used:

Instead of operating on the RNN's hidden states, one can regularize the network through restrictions on the recurrent matrices. This restricts the capacity of the matrix and can be applied without any modification to existing LSTM implementations. The weight-dropped LSTM applies recurrent regularisation through a DropConnect mask on the hidden-to-hidden recurrent weights. …
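The idea can be sketched on a toy recurrent cell. The code below uses a plain tanh RNN rather than an LSTM to stay short, and the function names, weight values, and the rescaling by 1 / (1 - drop_prob) are assumptions made for illustration. The point to notice is that the mask is drawn once on the hidden-to-hidden weight matrix and then reused at every time step, so the hidden state itself is never masked.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def drop_connect(weights: Matrix, drop_prob: float, rng: random.Random) -> Matrix:
    """Zero individual weights with probability drop_prob and rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return [[w / keep if rng.random() < keep else 0.0 for w in row]
            for row in weights]


def rnn_forward(inputs: List[List[float]], w_in: Matrix, w_rec: Matrix,
                drop_prob: float = 0.0, seed: int = 0) -> List[float]:
    """Run a tanh RNN over a sequence with DropConnect on the recurrent weights."""
    rng = random.Random(seed)
    # The mask is sampled once per sequence and shared by all time steps.
    w_rec_dropped = drop_connect(w_rec, drop_prob, rng)
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * h for wr, h in zip(w_rec_dropped[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden


# Hypothetical weights for a cell with 2 hidden units and 1 input feature.
w_in = [[0.7], [-0.4]]
w_rec = [[0.5, -0.3], [0.2, 0.8]]
sequence = [[1.0], [0.5], [-1.0]]

print(rnn_forward(sequence, w_in, w_rec, drop_prob=0.0))  # no regularisation
print(rnn_forward(sequence, w_in, w_rec, drop_prob=0.5))  # training-time pass
```

At evaluation time the mask would be switched off (`drop_prob=0.0`), which leaves the recurrent weights unchanged.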

Santhosh Kolloju

Lead AI Applied Research
