Denoising AutoEncoders in Ruby

Posted by Henry Chinner on April 28, 2015

In this tutorial we will train and visualize a denoising autoencoder on the MNIST data set. We will use this tutorial as a basis for constructing deep neural networks in follow-up tutorials.

Denoising autoencoders are regular autoencoders whose input signal is corrupted during training. Corrupting the input forces the autoencoder to learn how to impute missing or corrupted values in the input signal, and in doing so it ends up learning useful representations of the data.

[Image: Denoising AutoEncoder]

Regular autoencoders place too little restriction on what is learned and tend to end up with an encoding similar to PCA, which is not as powerful a building block for classifiers as the representations learned by denoising autoencoders.
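To make the corruption step concrete, here is a minimal Ruby sketch of masking noise, where each input value is independently zeroed with probability equal to the corruption level. The corrupt_input method and the example values are illustrative and are not part of the tutorial code.

# Masking noise: each input element is independently set to 0 with
# probability corruption_level. The autoencoder is then trained to
# reconstruct the original, uncorrupted input from the corrupted version.
def corrupt_input(input, corruption_level)
  input.map { |x| rand < corruption_level ? 0.0 : x }
end

original  = [0.1, 0.8, 0.0, 0.9, 0.4]
corrupted = corrupt_input(original, 0.5)
# => e.g. [0.1, 0.0, 0.0, 0.9, 0.4] (roughly half of the values masked)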

Quick Start

Let’s go ahead and train an autoencoder.

  1. Pull or clone the rubylab git repo

  2. cd into /denoising_autoencoder

  3. Unzip the train, pretrain and test files in the data folder.

  4. cd into the /src directory and run bundle install

  5. Create a new .rb file and give it any name.

  6. Paste the following code into yourfile.rb

require 'rubygems'
require 'bundler/setup'
require './network_types.rb'

# Data layer
layer0 = DataLayer.new({:name =>'data',
                        :mode =>'pretrain', 
                        :data_file=>'pretrain',
                        :batch_size => 20,
                        :corruption_level => 0.5,
                        :a_func => {:name =>'scalar',:params=>{:scalar => 1/255.0}},
                        :fan_out =>784})

# Encoding Layer
layer1 = Layer.new({:weight_decay => 0.00001,
                    :a_func => {:name =>'tanh'},
                    :corruption_level => nil,
                    :weight_update_status => true,
                    :with_momentum => true,
                    :momentum => 0.9,
                    :name =>'l1',
                    :fan_in => 784,
                    :fan_out =>300})

# Decoding Layer
layer1_d = Layer.new({:weight_decay => 0.00001,
                    :a_func => {:name =>'sigmoid'},
                    :name =>'l1_d',
                    :weight_update_status => true,
                    :with_momentum => true,
                    :momentum => 0.9,
                    :fan_in => 300,
                    :fan_out =>784,
                    :y_pointer =>{:l_name=>'data',:l_attr => :y}})


# Setting up the network
net = FFNetwork.new([layer0,layer1,layer1_d])

# Training the network
net.train({:alpha =>0.005,:max_epocs=>30000})

# Save the model to disk so we can use it later
net.persist('denoising_autoencoder_mnist')

Then run yourfile.rb (for example, ruby yourfile.rb from the /src directory).

While in training mode, a PNG is written to the /src folder every 100 training steps. Each tile in the image represents a hidden unit in the encoding layer and shows the input image that would maximally activate that unit. You should eventually see an image similar to this one.

[Image: MNIST Dataset random sample]

You will notice the network has learned some interesting pen-stroke-like features. It is easy to imagine how these lower-level features could be combined to reconstruct the MNIST handwritten digits.
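For intuition about how the tiles are produced: under a unit-norm constraint on the input, the image that maximally activates a hidden unit is that unit's incoming weight vector rescaled to unit length. The sketch below assumes the unit's 784 incoming weights are available as a plain Ruby array; the actual plotting code in the repo may differ.

# The norm-constrained input that maximally activates a hidden unit is its
# incoming weight vector divided by its L2 norm; reshape the result to
# 28x28 to render one tile of the visualization.
def max_activation_image(weights)
  norm = Math.sqrt(weights.reduce(0.0) { |sum, w| sum + w * w })
  weights.map { |w| w / norm }
end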

Light Documentation

yourfile.rb requires ./network_types.rb, which contains the FFNetwork (feed-forward) class. An instance of FFNetwork is created by passing it an array of Layer and DataLayer objects, where the order of the objects defines the layer ordering of the network.

layer0, layer1 and layer1_d are the 3 layers we need to construct a denoising autoencoder. The following table describes the configuration options of DataLayer and Layer.

Option                 | Description                                                                              | Applies to
:name                  | Unique name given to each layer, used so that a layer can be referenced by other layers | DataLayer, Layer
:mode                  | train, test or pretrain                                                                  | DataLayer
:data_file             | Name of the data source                                                                  | DataLayer
:batch_size            | How many observations the DataLayer loads at a time                                      | DataLayer
:corruption_level      | Post-activation masking probability                                                      | DataLayer, Layer
:fan_out               | Number of units in the layer                                                             | DataLayer, Layer
:fan_in                | Number of units in the previous layer                                                    | Layer
:weight_decay          | Rate of weight decay; set to 0 if not needed                                             | Layer
:weight_update_status  | Setting to false prevents the weights from being updated                                 | Layer
:with_momentum         | Whether momentum is used in stochastic gradient descent                                  | Layer
:y_pointer             | Location of the target/output variable                                                   | Layer
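As a rough example of how these options combine, the sketch below configures a data layer for test mode and an encoding layer whose weights are frozen, for instance when reusing pretrained weights in a later model. The option names come from the table above, but the specific values are illustrative, and the framework may expect additional options such as those shown in the full example earlier.

# Data layer reading the test split in batches of 100, with no corruption.
test_data = DataLayer.new({:name => 'data',
                           :mode => 'test',
                           :data_file => 'test',
                           :batch_size => 100,
                           :corruption_level => 0.0,
                           :a_func => {:name => 'scalar', :params => {:scalar => 1/255.0}},
                           :fan_out => 784})

# Encoding layer with frozen weights: :weight_update_status => false
# prevents the pretrained weights from being changed during training.
frozen_encoder = Layer.new({:weight_decay => 0.0,
                            :a_func => {:name => 'tanh'},
                            :weight_update_status => false,
                            :with_momentum => false,
                            :name => 'l1',
                            :fan_in => 784,
                            :fan_out => 300})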

You can play around with different layer and network configurations; the mini-framework is set up to handle networks with an arbitrary number of layers. In a follow-up tutorial we will construct a deep neural network with the same framework to improve on our baseline MNIST classification results.

Useful Resources:

  1. The architecture of the code was inspired by a modular deep learning framework in Python called deeppy
  2. Denoising Autoencoder Video
  3. Visualizing hidden representations