Papers

Paper Presentations

In this assignment you will examine recent progress from the deep learning and computer vision literature. You will lead a class discussion about your assigned paper in the format of a graduate seminar or reading group. The goals of this assignment are

  • Practice reading and reviewing academic papers
  • Learn to identify novel concepts
  • Practice critical evaluation of research in Deep Learning
  • Learn canonical experimental frameworks and common metrics
  • Lead an interesting discussion
  • Practice using or re-implementing systems described in an academic paper

Presentations should be approximately 40 minutes (+/- 5 minutes) in length. The exact length depends on the scope of the paper and any presented demos; however, presentations should be no shorter than 30 minutes and no longer than 45 minutes.

Q1: Summarize the Paper (25 points)

Your presentation should cover all the major contributions of the assigned paper and make comparisons to related work.

Q2: Explain Key Contributions (20 points)

Tell the class what is special about this paper compared to related work. Examine why the paper has been well cited, or whether you anticipate that it will be.

Q3: Explain Weaknesses (20 points)

Point out difficulties the paper's authors experienced or things that you think the paper did not address.

Q4: Address in-class and blog comment questions (25 points)

Respond to questions asked by your classmates.

Q5: Demo something from the paper (10 points)

Most papers will have a project page or some demo of the system presented in the paper. Show the class this demo and explain what is going on. If possible, run the demo on new images or on new use cases not shown in the paper.

Q6: Do something extra! (up to +10 points)

You should feel free to implement any part of the paper that you find interesting. This is a chance to combine your paper presentation with your final project: implement some aspect of the paper that contributes to your own research. If the paper's authors have released code or a pretrained model, try running it on the data you will use in your final project, either as a performance comparison or as a baseline; a minimal sketch of this kind of baseline run is shown below.
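For example, here is a minimal sketch of running a pretrained ImageNet classifier (AlexNet, from the Krizhevsky et al. paper in the list below) on your own images with PyTorch/torchvision. This is only one possible setup, not a required tool; the image filenames are hypothetical placeholders, and you should adapt the preprocessing and label handling to your final-project data.

    # Minimal baseline sketch (assumes PyTorch and torchvision are installed).
    # "example1.jpg" / "example2.jpg" are placeholders for your own project images.
    import torch
    from torchvision import models, transforms
    from PIL import Image

    # Standard ImageNet preprocessing: resize, center-crop, normalize.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Pretrained AlexNet; the weights are downloaded on first use.
    model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    model.eval()

    for path in ["example1.jpg", "example2.jpg"]:
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = torch.softmax(model(x), dim=1)
        confidence, class_index = probs.max(dim=1)
        print(path, class_index.item(), round(confidence.item(), 3))

Reporting how a pretrained model performs on your project data gives the class a concrete point of comparison for anything you build on top of it.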

Suggested Topics

Title, Authors, Venue, Link to Paper, Project page
CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised learning. CVPR 2014 tutorial
ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. pdf
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer. arXiv 2016. arXiv
ConvNet detection and segmentation
Object Detectors Emerge in Deep Scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. ICLR, 2015. project page, arXiv
Learning Deep Features for Scene Recognition using Places Database. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014. project page, pdf, demo
DeepBox: Learning Objectness with Convolutional Networks. Weicheng Kuo, Bharath Hariharan, Jitendra Malik. ICCV 2015. arXiv
Selective Search for Object Recognition. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013. project page
Fast R-CNN. Ross Girshick. ICCV 2015. arXiv, code
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS 2015. pdf
Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015. arXiv
Deep Neural Decision Forests. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. ICCV 2015. Project page
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, J. Malik. CVPR 2014. arXiv
Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 2014. arXiv
Diagnosing error in object detectors. Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. ECCV 2012. project page
Visualizing ConvNets
Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015. arXiv
Visualizing and Understanding Convolutional Networks. Matthew D Zeiler, Rob Fergus. ECCV 2014. pdf
Weakly Supervised and Unsupervised ConvNets
Using very deep autoencoders for content-based image retrieval. Alex Krizhevsky and Geoffrey E. Hinton. ESANN 2011. pdf
Unsupervised Visual Representation Learning by Context Prediction. Carl Doersch, Abhinav Gupta, Alexei A. Efros. ICCV 2015. project page
Learning a Discriminative Model for the Perception of Realism in Composite Images. Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, Alexei A. Efros. ICCV 2015. project page
Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks. Fang Wang, Le Kang, Yi Li. CVPR 2015. arXiv
Multi-view Convolutional Neural Networks for 3D Shape Recognition. Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. ICCV 2015. project page
A High Performance CRF Model for Clothes Parsing. E. Simo-Serra, S. Fidler, F. Moreno-Noguer, R. Urtasun. ACCV 2014. pdf, code
Siamese and Ranking ConvNets
Learning Visual Similarity for Product Design with Convolutional Neural Networks. Sean Bell, Kavita Bala. Siggraph 2015. author page, pdf
Learning Deep Representations for Ground-to-Aerial Geolocalization. Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays. CVPR 2015. pdf
Joint Embeddings of Shapes and Images via CNN Image Purification. Yangyan Li, Hao Su, Charles Ruizhongtai Qi, Noa Fish, Daniel Cohen-Or, Leonidas Guibas. Siggraph Asia 2015. project page
Images and Words
VISALOGY: Answering Visual Analogy Questions. Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi. NIPS 2015. arXiv
Exploring Nearest Neighbor Approaches for Image Captioning. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick. arXiv 2015. arXiv
Visual Madlibs: Fill in the blank Description Generation and Question Answering. Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg. ICCV, 2015. project page, pdf
VQA: Visual Question Answering. S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. ICCV, 2015. project page, arXiv
Visual Turing test for computer vision systems. Donald Geman et al. PNAS 112(12): 3618-3623, 2015. PNAS page
Generative ConvNets
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus. 2015. project page, arXiv
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015. project page, arXiv
Learning to Generate Chairs, Tables and Cars with Convolutional Networks. Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox. CVPR 2015. arXiv
A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. 2015. implementation, arXiv
Co-Attention Networks
Hierarchical Question-Image Co-Attention for Visual Question Answering. Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh. NIPS 2016. arXiv
Residual Networks
Identity Mappings in Deep Residual Networks. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. ECCV 2016. paper
Residual networks behave like ensembles of relatively shallow networks. Andreas Veit, Michael J. Wilber, Serge Belongie. NIPS 2016. nips.cc
Datasets
Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014. project page, paper
The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. Genevieve Patterson, Chen Xu, Hang Su, James Hays. IJCV 2014. project page
Attribute-based Representations
Learning Deep Representations of Fine-grained Visual Descriptions. Scott Reed, Zeynep Akata, Bernt Schiele, Honglak Lee. CVPR 2016. arXiv
Automatic attribute discovery and characterization from noisy web data. Tamara L. Berg, Alexander C. Berg, Jonathan Shih. ECCV 2010. pdf
Discovering the Spatial Extent of Relative Attributes. Fanyi Xiao, Yong Jae Lee. ICCV 2015. pdf
Misc
How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012. project page
Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. Siggraph 2014. project page
Learning to predict where humans look. T. Judd, K. Ehinger, F. Durand, A. Torralba. ICCV 2009. project page
What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012. project page
Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015. project page
Sketch2Photo: Internet Image Montage. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. Siggraph Asia 2009 (ACM Transactions on Graphics). project page
Eulerian video magnification for revealing subtle changes in the world. Hao-Yu Wu et al. Siggraph 2012 (ACM Trans. Graph. 31(4)). project page
Photo tourism: Exploring photo collections in 3D. Noah Snavely, Steven M. Seitz, Richard Szeliski. Siggraph 2006. pdf, project page