Open source frameworks for deep learning

Prateek Dutta
Nov 8, 2020


Implementing these deep learning architectures is certainly possible, but starting from scratch can be time-consuming, and hand-rolled implementations also need time to optimize and mature. Luckily, you can take advantage of several open source frameworks to more easily implement and deploy deep learning algorithms. These frameworks support languages like Python, C/C++, and the Java® language. Let’s explore some of the most popular frameworks and their strengths and weaknesses.

Caffe

One of the most popular deep learning frameworks is Caffe. Caffe was originally developed as part of a Ph.D. dissertation but is now released under the Berkeley Software Distribution license. Caffe supports a wide range of deep learning architectures, including CNN and LSTM, but notably does not support RBMs or DBMs (although the coming release of Caffe2 will include such support).

Caffe has been used for image classification and other vision applications, and it supports GPU-based acceleration with the NVIDIA CUDA Deep Neural Network library. Caffe supports Open Multi-Processing (OpenMP) for parallelizing deep learning algorithms over a cluster of systems. Caffe and Caffe2 are written in C++ for performance and offer a Python and MATLAB interface for deep learning training and execution.
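For illustration, here is a minimal sketch of running inference through pycaffe, Caffe’s Python interface. The model definition file, weight file, and output blob name (‘prob’) are hypothetical placeholders, not files shipped with Caffe.

```python
# Minimal inference sketch with pycaffe; file names and blob names are placeholders.
import numpy as np
import caffe

caffe.set_mode_gpu()  # fall back to caffe.set_mode_cpu() if no CUDA device is available

# Load a trained network in test (inference) mode
net = caffe.Net('deploy.prototxt',     # network architecture (hypothetical file)
                'weights.caffemodel',  # trained parameters (hypothetical file)
                caffe.TEST)

# Feed one preprocessed image (channels x height x width) and run a forward pass
image = np.random.rand(1, 3, 227, 227).astype(np.float32)
net.blobs['data'].reshape(*image.shape)
net.blobs['data'].data[...] = image
output = net.forward()

print(output['prob'].argmax())  # index of the most probable class
```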

Deeplearning4j

Deeplearning4j is a popular deep learning framework that is focused on Java technology, but it includes application programming interfaces for other languages, such as Scala, Python, and Clojure. The framework is released under the Apache license and includes support for RBMs, DBNs, CNNs, and RNNs. Deeplearning4j also includes distributed parallel versions that work with Apache Hadoop and Spark (big data processing frameworks).

Deeplearning4j has been applied to various problems, including fraud detection in the financial sector, recommender systems, image recognition, and cybersecurity (network intrusion detection). The framework integrates with CUDA for GPU optimization and can be distributed with OpenMP or Hadoop.

TensorFlow

TensorFlow was developed by Google as an open source library and a descendant of the closed source DistBelief. You can use TensorFlow to train and deploy various neural networks (CNNs, RBMs, DBNs, and RNNs); it is released under the Apache 2.0 license. TensorFlow has been applied to various problems, such as image captioning, malware detection, speech recognition, and information retrieval. An Android-focused stack called TensorFlow Lite was recently released.

You can develop applications with TensorFlow in Python, C++, the Java language, Rust, or Go (although Python is the most stable) and distribute their execution with Hadoop. TensorFlow also supports CUDA, in addition to specialized hardware interfaces.
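As a rough example of the Python API (assuming a TensorFlow 2.x installation), the sketch below fits a one-variable linear model with gradient descent; the data and learning rate are purely illustrative.

```python
# Fit y ~ w * x with gradient descent using TensorFlow 2.x eager execution.
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = tf.constant([2.0, 4.0, 6.0, 8.0])
w = tf.Variable(0.0)  # the single trainable parameter

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for step in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(y - w * x))  # mean squared error
    grads = tape.gradient(loss, [w])
    optimizer.apply_gradients(zip(grads, [w]))

print(w.numpy())  # should approach 2.0
```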

Distributed Deep Learning

Dubbed the “jet engine of deep learning,” IBM Distributed Deep Learning (DDL) is a library that links into leading frameworks such as Caffe and TensorFlow. DDL can be used to accelerate deep learning algorithms over clusters of servers and hundreds of GPUs. It optimizes the communication of neuron calculations by defining optimal paths that the resulting data must take between GPUs. IBM demonstrated that resolving this bottleneck pays off by beating a prior image recognition record that Microsoft had recently set.

Keras

Keras is a high-level interface for deep learning platforms such as Google’s TensorFlow, discussed above. Created by François Chollet in 2015, Keras grew to become the second most popular DL framework after TensorFlow. This is called out in its mission: “to make drafting DL models as easy as writing new methods in Python.”

For those who have faced some difficulty using TensorFlow, Keras is usually a welcome relief. With Keras you can create common neural network layers; choose metrics, optimization methods, and error functions; and get a model trained rapidly and efficiently.
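A minimal sketch of that workflow might look like the following, assuming TensorFlow’s bundled Keras; the layer sizes, toy data, and hyperparameters are illustrative only.

```python
# Stack layers, pick an optimizer, loss, and metric, then train on toy data.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 1000 samples with 20 features each, binary labels
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",             # optimization method
              loss="binary_crossentropy",   # error function
              metrics=["accuracy"])         # metric

model.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.1)
```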

The main advantage of Keras is its modularity. Every neural building block is available in the library, and users can easily compose them on top of one another, creating more customized and elaborate models.

SciKit-learn

Given the recent expansion of deep learning, it is easy to assume that the more traditional machine learning models are now redundant. However, this is far from the truth. Many everyday ML tasks are still solved with the classic models that were the industry standard before the so-called deep learning boom, and these traditional models are refreshingly simpler and easier to use.

Although DL models are excellent at capturing patterns, it tends to be hard to explain what they have actually learned, and they are often expensive to train and deploy. The more routine problems of dimensionality reduction, clustering, and feature selection can all be solved easily with the help of more traditional models.

This is where SciKit-learn comes in. SciKit-learn is a framework backed by academia and has just celebrated its 10th year in the field. It implements practically every classic machine learning model available today, from linear and logistic regression to random forests and SVM classifiers. SciKit-learn also comes with a comprehensive toolbox of preprocessing methods, including text transformations, dimensionality reduction, and others.
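As a rough sketch of SciKit-learn’s uniform estimator API, the following chains a preprocessing step and a classic model into a pipeline and evaluates it on a bundled toy dataset; the particular model and parameters are illustrative.

```python
# Preprocessing + a classic model in one pipeline, fit and scored on iris data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100))
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```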

SciKit-learn is perhaps one of the greatest achievements of the Python community. Its user guide is almost as good as a textbook for machine learning and data science. Even a small startup looking to jump into the deep learning fray should consider SciKit-learn as a starting point: it often delivers similar results at a fraction of the development time.

Apache Spark

Apache Spark forms a significant part of IBM’s deep learning capabilities and is designed with cluster computing in mind. It contains MLlib, a distributed machine learning framework that works in conjunction with Spark’s distributed memory architecture. MLlib comes with a vast number of commonly used statistical tools as well as machine learning algorithms.
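A brief, hedged sketch of MLlib’s DataFrame-based Python API is shown below; the column names and tiny in-memory dataset are illustrative only.

```python
# Logistic regression with Spark MLlib on a tiny in-memory DataFrame.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

df = spark.createDataFrame(
    [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.9, 0.1, 1.0)],
    ["f1", "f2", "label"],
)

# Assemble raw columns into the single feature vector MLlib expects
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=10)
model = lr.fit(train)
model.transform(train).select("label", "prediction").show()

spark.stop()
```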

Another significant component of IBM’s platform is SystemML. SystemML provides a high-level, declarative R-like syntax with built-in statistical functions, linear algebra primitives, and constructs specific to machine learning. Its scripts can execute within a single SMP enclosure, or they can run across many different nodes in a distributed computation using Apache Spark or Hadoop’s MapReduce.

SystemML and Spark work well together: Spark can collect data using Spark Streaming and render it in a suitable representation, while SystemML determines the optimal algorithm for analyzing that data given the layout and configuration of the cluster.
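A hedged sketch of driving a DML (Declarative Machine Learning) script from Python through SystemML’s MLContext follows; the exact package and API details can vary between SystemML versions, and the script itself is a toy example.

```python
# Run a tiny DML script from Python through SystemML's MLContext on Spark.
from pyspark.sql import SparkSession
from systemml import MLContext, dml

spark = SparkSession.builder.appName("systemml-sketch").getOrCreate()
ml = MLContext(spark)

# A toy DML script: generate a random matrix and compute a summary statistic.
script = dml("""
    X = rand(rows=1000, cols=10)
    s = sum(X)
""").output("s")

result = ml.execute(script)
print(result.get("s"))

spark.stop()
```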

One of the platform’s most notable features is that it can automatically optimize data analysis based on the characteristics of the data and the cluster. This ensures that the model is scalable and efficient. Competing deep learning frameworks are not able to make similar optimization decisions for the user without outside input.

Edward

Perhaps one of the most intriguing and promising developments the community has witnessed in a while is Edward.

Created by Dustin Tran, a researcher at Google, along with a group of AI contributors and researchers, Edward is built atop TensorFlow and combines machine learning, Bayesian statistics, deep learning, and probabilistic programming.

Edward enables users to create probabilistic graphical models (PGMs) and can be used to build Bayesian neural networks, along with other models that can be expressed as a graph using probabilistic representations.
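As a rough sketch (assuming Edward running on top of TensorFlow 1.x), the following builds a small Bayesian linear regression and fits it with variational inference; the data, dimensions, and iteration count are illustrative.

```python
# Bayesian linear regression in Edward: priors, likelihood, variational inference.
import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Normal

N, D = 50, 3
X_train = np.random.randn(N, D).astype(np.float32)
w_true = np.array([1.0, -2.0, 0.5], dtype=np.float32)
y_train = X_train.dot(w_true) + 0.1 * np.random.randn(N).astype(np.float32)

# Model: priors over weights and bias, Gaussian likelihood over observations
X = tf.placeholder(tf.float32, [N, D])
w = Normal(loc=tf.zeros(D), scale=tf.ones(D))
b = Normal(loc=tf.zeros(1), scale=tf.ones(1))
y = Normal(loc=ed.dot(X, w) + b, scale=tf.ones(N))

# Variational approximations for the latent variables
qw = Normal(loc=tf.Variable(tf.zeros(D)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(D))))
qb = Normal(loc=tf.Variable(tf.zeros(1)),
            scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

# Minimize the KL divergence between the approximation and the posterior
inference = ed.KLqp({w: qw, b: qb}, data={X: X_train, y: y_train})
inference.run(n_iter=500)
```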

Edward’s uses are currently limited to more advanced AI models rather than real-world applications. However, since PGMs are becoming increasingly useful in AI research, it is safe to assume that the near future holds more practical uses for Edward.

Lime

Among the toughest challenges in ML is debugging a model’s internal representations, or explaining what exactly the model has learned.

Lime is an easy-to-use Python package that explains these internal representations intelligently. It takes a trained model as input, fits a second “meta” approximator to the learned model, and approximates the model’s behavior across many data points. The output is essentially an explanation of the model: it identifies which parts of the input helped the model reach a decision and which parts did not.

Once this conclusion is reached, the results are displayed in an accessible, legible, and interpretable format. For a text classifier, for example, Lime highlights the words that pushed the model toward its conclusion, along with their respective weights.

Lime works hand in hand with SciKit-learn models, just as it does with any other classifier that accepts raw text or arrays and outputs a probability for each class.
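A small, hedged sketch of that workflow is shown below; the pipeline, newsgroup categories, and sample document are illustrative, and any text classifier exposing predict_proba would work equally well.

```python
# Explain a scikit-learn text classifier with Lime's text explainer.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

categories = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=categories)

# Any pipeline mapping raw text to class probabilities will do
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train.data, train.target)

explainer = LimeTextExplainer(class_names=categories)
explanation = explainer.explain_instance(
    train.data[0],         # the document to explain
    clf.predict_proba,     # classifier_fn: raw text -> class probabilities
    num_features=6,        # how many words to include in the explanation
)
print(explanation.as_list())  # (word, weight) pairs
```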

Lime can also explain the classification of images. For an image classified as “cat,” for example, Lime can show which areas of the image the model relied on for that classification (regions with positive weight) and which parts of the image had a negative weight, pulling the prediction toward “dog.”

Deep learning is represented by a spectrum of architectures that can build solutions for a range of problem areas. These solutions can be feed-forward networks or recurrent networks that permit consideration of previous inputs. Although building these types of deep architectures can be complex, various open source solutions, such as Caffe, Deeplearning4j, TensorFlow, DDL, Keras, SciKit-learn, and Apache Spark, are available to get you up and running quickly.

Thank You
Prateek Dutta
Student, B.Tech. Artificial Intelligence
