Homework 4
In this assignment you will implement recurrent networks, and apply them to image captioning on Microsoft COCO. We will also introduce the TinyImageNet dataset, and use a pretrained model on this dataset to explore different applications of image gradients.
The goals of this assignment are as follows:
- Understand the architecture of recurrent neural networks (RNNs) and how they operate on sequences by sharing weights over time
- Understand the difference between vanilla RNNs and Long Short-Term Memory (LSTM) RNNs
- Understand how to sample from an RNN at test-time
- Understand how to combine convolutional neural nets and recurrent nets to implement an image captioning system
- Understand how a trained convolutional network can be used to compute gradients with respect to the input image
- Implement different applications of image gradients, including saliency maps, fooling images, class visualizations, feature inversion, and DeepDream.
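As a rough illustration of the first goal, the sketch below runs a vanilla RNN over a sequence, reusing the same weights at every timestep. It is a minimal NumPy sketch, not the starter code's interface: the names rnn_step, rnn_forward, Wx, Wh, and b, and the tensor shapes, are assumptions made for this example.

```python
import numpy as np

def rnn_step(x, h_prev, Wx, Wh, b):
    """One vanilla RNN step: h_t = tanh(x_t Wx + h_{t-1} Wh + b)."""
    return np.tanh(x @ Wx + h_prev @ Wh + b)

def rnn_forward(xs, h0, Wx, Wh, b):
    """Apply the same step (same Wx, Wh, b) at every timestep.

    xs: (N, T, D) input sequences, h0: (N, H) initial hidden state.
    Returns all hidden states with shape (N, T, H).
    """
    N, T, _ = xs.shape
    H = h0.shape[1]
    hs = np.zeros((N, T, H))
    h = h0
    for t in range(T):
        # The weights do not depend on t: this is the weight sharing.
        h = rnn_step(xs[:, t], h, Wx, Wh, b)
        hs[:, t] = h
    return hs

# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
N, T, D, H = 2, 5, 3, 4
xs = rng.standard_normal((N, T, D))
h0 = np.zeros((N, H))
Wx = rng.standard_normal((D, H)) * 0.1
Wh = rng.standard_normal((H, H)) * 0.1
b = np.zeros(H)

hs = rnn_forward(xs, h0, Wx, Wh, b)
print(hs.shape)  # (2, 5, 4)
```

Because the parameters are shared, the number of weights does not grow with the sequence length T.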
Setup
Get the starter code by cloning the hw4 GitHub repository with the following command:
git clone https://github.com/comp150DL/hw4.git
Set up Virtualenv: If you have not created a virtualenv for handling the Python dependencies related to this course, please follow the Virtualenv tutorial.
If you would like to work on the provided AWS instances, please follow the Tufts AWS tutorial for how to connect to your Jupyter Notebook remotely.
To satisfy all software dependencies, start your virtualenv and double check that all required packages are installed:
workon deep-venv
cd hw4
pip install -r requirements.txt
Download data: Once you have the starter code, you will need to download the processed MS-COCO dataset, the TinyImageNet dataset, and the pretrained TinyImageNet model. Run the following from the hw4 directory:
cd datasets
./get_coco_captioning.sh
./get_tiny_imagenet_a.sh
./get_pretrained_model.sh
Compile the Cython extension: Convolutional Neural Networks require a very efficient implementation. We have implemented some of the functionality using Cython; you will need to compile the Cython extension before you can run the code. From the hw4/hw4 directory, run the following command:
python setup.py build_ext --inplace
Start Jupyter Notebook: After you have the data, you should start the Jupyter Notebook server from the hw4 directory. If you are unfamiliar with Jupyter, you should read the Jupyter tutorial.
Submitting your work
To make sure everything is working properly, remember to do a clean run (“Kernel -> Restart & Run All”) of each notebook after you finish working on it, and submit the final version with all the outputs. Once you are done working, compress all the code and notebooks into a single file and submit your archive by emailing it to comp150dl@gmail.com. On Linux or macOS you can run the provided collectSubmission.sh script from hw4/ to produce a file hw4.zip (or hw4.tar.gz if zip is not on your system).
Q1: Image Captioning with Vanilla RNNs (40 points)
The Jupyter notebook RNN_Captioning.ipynb will walk you through the implementation of an image captioning system on MS-COCO using vanilla recurrent networks.
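To give a feel for test-time sampling, here is a small sketch of greedy decoding with a vanilla RNN: start from a start token, feed each predicted word back in as the next input, and stop at an end token or a maximum length. Everything here is illustrative and assumed rather than taken from the notebook: the function name sample_caption, the token ids 1 (start) and 2 (end), the parameter names, and the random toy weights.

```python
import numpy as np

def sample_caption(h0, W_embed, Wx, Wh, b, W_vocab, b_vocab,
                   start=1, end=2, max_length=10):
    """Greedy decoding for one example.

    h0: (H,) initial hidden state (e.g. projected image features).
    W_embed: (V, D) word embeddings, W_vocab: (H, V) hidden-to-vocab weights.
    The start/end token ids are assumptions for this sketch.
    """
    h = h0
    token = start
    caption = []
    for _ in range(max_length):
        x = W_embed[token]                  # embed the previous word
        h = np.tanh(x @ Wx + h @ Wh + b)    # one vanilla RNN step
        scores = h @ W_vocab + b_vocab      # scores over the vocabulary
        token = int(np.argmax(scores))      # pick the highest-scoring word
        if token == end:
            break
        caption.append(token)
    return caption

# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
V, D, H = 6, 3, 4
W_embed = rng.standard_normal((V, D))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = np.zeros(H)
W_vocab = rng.standard_normal((H, V))
b_vocab = np.zeros(V)
h0 = rng.standard_normal(H)

caption = sample_caption(h0, W_embed, Wx, Wh, b, W_vocab, b_vocab)
print(caption)
```

Taking the argmax at each step is only one choice; drawing the next word from the softmax distribution over the scores is another common option.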
Q2: Image Captioning with LSTMs (35 points)
The Jupyter notebook LSTM_Captioning.ipynb will walk you through the implementation of Long Short-Term Memory (LSTM) RNNs and their application to image captioning on MS-COCO.
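For orientation, a single LSTM timestep can be sketched as below. Unlike a vanilla RNN, the LSTM carries a separate cell state and uses input, forget, and output gates to control what is written to and read from it. The function name lstm_step, the packing of the four gates into one (D, 4H) weight matrix, and the gate ordering (input, forget, output, candidate) are assumptions of this sketch; the notebook may organize things differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step.

    x: (N, D), h_prev and c_prev: (N, H),
    Wx: (D, 4H), Wh: (H, 4H), b: (4H,).
    Gate order (i, f, o, g) is an assumption of this sketch.
    """
    H = h_prev.shape[1]
    a = x @ Wx + h_prev @ Wh + b       # all four pre-activations at once
    i = sigmoid(a[:, 0:H])             # input gate
    f = sigmoid(a[:, H:2 * H])         # forget gate
    o = sigmoid(a[:, 2 * H:3 * H])     # output gate
    g = np.tanh(a[:, 3 * H:4 * H])     # candidate cell update
    c = f * c_prev + i * g             # new cell state
    h = o * np.tanh(c)                 # new hidden state
    return h, c

# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
N, D, H = 2, 3, 4
x = rng.standard_normal((N, D))
h_prev = np.zeros((N, H))
c_prev = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, 4 * H)) * 0.1
Wh = rng.standard_normal((H, 4 * H)) * 0.1
b = np.zeros(4 * H)

h, c = lstm_step(x, h_prev, c_prev, Wx, Wh, b)
print(h.shape, c.shape)  # (2, 4) (2, 4)
```

The additive update of the cell state (f * c_prev + i * g) is what lets gradients flow across many timesteps more easily than through repeated tanh layers.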
Q3: Image Gradients: Saliency maps and Fooling Images (10 points)
The Jupyter notebook ImageGradients.ipynb will introduce the TinyImageNet dataset. You will use a pretrained model on this dataset to compute gradients with respect to the image, and use them to produce saliency maps and fooling images.
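The two ideas can be sketched on a toy model. In the example below a random linear classifier stands in for the pretrained TinyImageNet network, so the gradient of a class score with respect to the image is simply that class's weight tensor; with the real network you would obtain this gradient from a backward pass instead. The saliency map takes the maximum absolute gradient over the color channels, and the fooling image is made by gradient ascent on the score of a wrong target class. All names (class_scores, score_gradient, saliency_map, make_fooling_image) and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 5, 8, 8                               # classes, height, width (toy sizes)
weights = rng.standard_normal((C, 3, H, W))     # toy linear model, NOT the pretrained CNN
image = rng.standard_normal((3, H, W))

def class_scores(x):
    """Scores of the toy model: one dot product per class, shape (C,)."""
    return np.tensordot(weights, x, axes=3)

def score_gradient(x, label):
    """Gradient of score[label] with respect to x.

    For a linear model this is weights[label]; a real network would
    compute it with backpropagation.
    """
    return weights[label]

def saliency_map(x, label):
    """Max absolute gradient over the 3 color channels, shape (H, W)."""
    return np.abs(score_gradient(x, label)).max(axis=0)

def make_fooling_image(x, target, lr=0.1, max_iters=100):
    """Gradient ascent on the target class score until the model predicts it."""
    x_fool = x.copy()
    for _ in range(max_iters):
        if np.argmax(class_scores(x_fool)) == target:
            break
        x_fool += lr * score_gradient(x_fool, target)
    return x_fool

label = int(np.argmax(class_scores(image)))     # the toy model's own prediction
target = (label + 1) % C                        # any other class
sal = saliency_map(image, label)
fooled = make_fooling_image(image, target)
print(sal.shape, int(np.argmax(class_scores(fooled))) == target)
```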
Q4: Image Generation: Classes, Inversion, DeepDream (15 points)
In the Jupyter notebook ImageGeneration.ipynb you will use the pretrained TinyImageNet model to generate images. In particular you will generate class visualizations and implement feature inversion and DeepDream.
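As a rough picture of class visualization, the sketch below performs gradient ascent on an image to maximize a class score minus an L2 penalty on the pixels. A random linear classifier again stands in for the pretrained model, so the gradient has a closed form; the function name visualize_class, the hyperparameters, and the starting image of zeros are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 5, 8, 8                               # toy sizes
weights = rng.standard_normal((C, 3, H, W))     # toy linear model, NOT the pretrained CNN

def visualize_class(target, l2_reg=0.5, lr=0.1, num_iters=200):
    """Gradient ascent on: score[target](x) - l2_reg * ||x||^2.

    For the toy linear model, d score / d x is weights[target]; a real
    network would get this gradient from a backward pass.
    """
    x = np.zeros((3, H, W))                     # starting image (assumption)
    for _ in range(num_iters):
        grad = weights[target] - 2.0 * l2_reg * x
        x += lr * grad
    return x

vis = visualize_class(target=3)
print(vis.shape)  # (3, 8, 8)
```

For this toy model the iteration converges to weights[target] / (2 * l2_reg), which shows how the regularizer keeps the generated image from growing without bound.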
Q5: Do something extra! (up to +10 points)
Given the components of the assignment, try to do something cool. Maybe there is some way to generate images that we did not implement in the assignment?