Papers
Paper Presentations
In this assignment you will examine recent progress from the deep learning and computer vision literature. You will lead a class discussion about your assigned paper in the format of a graduate seminar or reading group. The goals of this assignment are:
- Practice reading and reviewing academic papers
- Learn to identify novel concepts
- Practice critical evaluation of research in Deep Learning
- Learn canonical experimental frameworks and common metrics
- Lead an interesting discussion
- Practice using or re-implementing systems described in an academic paper
Presentations should be approximately 40 minutes (+/- 5 minutes) long. The exact length depends on the scope of the paper and any demos you present, but presentations should be no shorter than 30 minutes and no longer than 45 minutes.
Q1: Summarize the Paper (25 points)
Your presentation should cover all the major contributions of the assigned paper. Comparisons should be made to related works.
Q2: Explain Key Contributions (20 points)
Tell the class what is special about this paper compared to related work. Examine why the paper has been well cited, or whether you anticipate that it will be.
Q3: Explain Weaknesses (20 points)
Point out difficulties the paper's authors experienced or things that you think the paper didn't address.
Q4: Address in-class and blog comment questions (25 points)
Respond to questions asked by your classmates.
Q5: Demo something from the paper (10 points)
Most papers will have a project page or some demo of the system presented in the paper. Show the class this demo and explain what's going on. If possible, run the demo on new images or for new use cases not shown in the paper.
Q6: Do something extra! (up to +10 points)
You should feel free to implement any part of the paper that you find interesting. This is a chance to combine your paper presentation with your final project. Implement some aspect of the paper that contributes to your own research. If the paper's authors have released code or a pretrained model, try running it on the data you will use in your final project, either as a performance comparison or as a baseline.
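As a rough illustration of the baseline comparison suggested above, the sketch below scores two classifiers on the same labeled samples using top-1 accuracy. It is only a minimal sketch: the function names `top1_accuracy` and `compare_to_baseline` are hypothetical, each model is assumed to be wrapped as a plain Python callable that maps one sample to one predicted label, and top-1 accuracy is just one possible metric. Your paper may call for a different one.

```python
from typing import Any, Callable, Dict, Sequence


def top1_accuracy(predict: Callable[[Any], Any],
                  samples: Sequence[Any],
                  labels: Sequence[Any]) -> float:
    """Fraction of samples for which predict(sample) equals the true label."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if len(samples) == 0:
        raise ValueError("need at least one sample to compute accuracy")
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(samples)


def compare_to_baseline(baseline: Callable[[Any], Any],
                        candidate: Callable[[Any], Any],
                        samples: Sequence[Any],
                        labels: Sequence[Any]) -> Dict[str, float]:
    """Score a released (baseline) model and your own model on the same data.

    Both arguments are hypothetical wrappers: in practice `baseline` would
    call the authors' pretrained model and `candidate` would call yours.
    """
    base_acc = top1_accuracy(baseline, samples, labels)
    cand_acc = top1_accuracy(candidate, samples, labels)
    return {
        "baseline": base_acc,
        "candidate": cand_acc,
        "difference": cand_acc - base_acc,
    }
```

To use it, wrap each model's inference call in a function that takes one image and returns a label, then pass both wrappers together with your final-project test set to `compare_to_baseline`.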
Suggested Topics
Title, Authors | Link to Paper, Project page |
CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee. Read only the first two sets of slides, labeled Introduction and Supervised learning. | CVPR 2014 tutorial |
ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. NIPS 2012. | |
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer. arXiv 2016. | arXiv |
Object Detectors Emerge in Deep Scene CNNs. Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba. ICLR, 2015. | project page, arXiv |
Learning Deep Features for Scene Recognition using Places Database. B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. NIPS 2014. | project page, pdf, demo |
DeepBox: Learning Objectness with Convolutional Networks. Weicheng Kuo, Bharath Hariharan, Jitendra Malik. ICCV 2015. | arXiv |
Selective Search for Object Recognition. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013. | project page |
Fast R-CNN. Ross Girshick. ICCV 2015. | arXiv, code |
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. NIPS 2015. | |
Fully Convolutional Networks for Semantic Segmentation. Jonathan Long, Evan Shelhamer, Trevor Darrell. CVPR 2015. | arXiv |
Deep Neural Decision Forests. Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulo. ICCV 2015. | Project page |
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. R. Girshick, J. Donahue, T. Darrell, J. Malik. CVPR 2014. | arXiv |
Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich. 2014. | arXiv |
Diagnosing error in object detectors. Derek Hoiem, Yodsawalai Chodpathumwan, and Qieyun Dai. ECCV 2012. | project page |
Understanding Deep Image Representations by Inverting Them. Aravindh Mahendran, Andrea Vedaldi. CVPR 2015. | arXiv |
Visualizing and Understanding Convolutional Networks. Matthew D Zeiler, Rob Fergus. ECCV 2014. | |
Using very deep autoencoders for content-based image retrieval. Krizhevsky, Alex, and Geoffrey E. Hinton. ESANN. 2011. | |
Unsupervised Visual Representation Learning by Context Prediction. Carl Doersch, Abhinav Gupta, Alexei A. Efros. ICCV 2015. | project page |
Learning a Discriminative Model for the Perception of Realism in Composite Images. Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, Alexei A. Efros. ICCV 2015. | project page |
Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks. Fang Wang, Le Kang, Yi Li. CVPR 2015. | arXiv |
Multi-view Convolutional Neural Networks for 3D Shape Recognition. Hang Su, Subhransu Maji, Evangelos Kalogerakis, Erik Learned-Miller. ICCV 2015. | project page |
A High Performance CRF Model for Clothes Parsing. E Simo-Serra, S Fidler, F Moreno-Noguer, R Urtasun. Computer Vision - ACCV 2014. | pdf, code |
Learning Visual Similarity for Product Design with Convolutional Neural Networks. Sean Bell, Kavita Bala. Siggraph 2015. | author page, pdf |
Learning Deep Representations for Ground-to-Aerial Geolocalization. Tsung-Yi Lin, Yin Cui, Serge Belongie, James Hays. CVPR 2015. | |
Joint Embeddings of Shapes and Images via CNN Image Purification. Yangyan Li, Hao Su, Charles Ruizhongtai Qi, Noa Fish, Daniel Cohen-Or, Leonidas Guibas. Siggraph Asia 2015. | project page |
VISALOGY: Answering Visual Analogy Questions. Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi, NIPS 2015 | arXiv |
Exploring Nearest Neighbor Approaches for Image Captioning. Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C Lawrence Zitnick. arXiv, 2015. | arXiv |
Visual Madlibs: Fill in the blank Description Generation and Question Answering. Licheng Yu, Eunbyung Park, Alexander C. Berg, Tamara L. Berg. ICCV, 2015. | project page, pdf |
VQA: Visual Question Answering. S. Antol*, A. Agrawal*, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. ICCV, 2015. | project page, arXiv |
Visual Turing test for computer vision systems. Geman, Donald, et al. Proceedings of the National Academy of Sciences 112.12 (2015): 3618-3623. | PNAS page |
Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus. 2015. | project page, arXiv |
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015. | project page, arXiv |
Learning to Generate Chairs, Tables and Cars with Convolutional Networks. Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox. CVPR 2015. | arXiv |
A Neural Algorithm of Artistic Style. Leon A. Gatys, Alexander S. Ecker, Matthias Bethge. 2015. | implementation, arXiv |
Hierarchical Question-Image Co-Attention for Visual Question Answering. Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh. NIPS 2016. | arXiv |
Identity Mappings in Deep Residual Networks. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. ECCV 2016. | paper |
Residual networks behave like ensembles of relatively shallow networks. Andreas Veit, Michael J Wilber, Serge Belongie. NIPS 2016. | nips.cc |
Microsoft COCO: Common Objects in Context. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. ECCV 2014. | project page, paper |
The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. Genevieve Patterson, Chen Xu, Hang Su, James Hays. IJCV 2014. | project page |
Learning Deep Representations of Fine-grained Visual Descriptions. Scott Reed, Zeynep Akata, Bernt Schiele, Honglak Lee, CVPR 2016. | arXiv |
Automatic attribute discovery and characterization from noisy web data. Berg, Tamara L., Alexander C. Berg, and Jonathan Shih. Computer Vision - ECCV 2010. Springer Berlin Heidelberg, 2010. 663-676. | |
Discovering the Spatial Extent of Relative Attributes. Fanyi Xiao, Yong Jae Lee. ICCV 2015. | |
How do humans sketch objects? Mathias Eitz, James Hays, and Marc Alexa. Siggraph 2012. | project page |
Transient Attributes for High-Level Understanding and Editing of Outdoor Scenes. Pierre-Yves Laffont, Zhile Ren, Xiaofeng Tao, Chao Qian, James Hays. Siggraph 2014. | project page |
Learning to predict where humans look. T. Judd, K. Ehinger, F. Durand, and A. Torralba. IEEE International Conference on Computer Vision (ICCV), 2009. | project page |
What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012. | project page |
Learning Visual Biases from Human Imagination. Carl Vondrick, Hamed Pirsiavash, Aude Oliva, Antonio Torralba. NIPS 2015. | project page |
Sketch2Photo: Internet Image Montage. ACM SIGGRAPH ASIA 2009, ACM Transactions on Graphics. Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu. | project page |
Eulerian video magnification for revealing subtle changes in the world. Wu, Hao-Yu, et al. ACM Trans. Graph. 31.4 (2012): 65. | project page |
Photo tourism: Exploring photo collections in 3D. Noah Snavely, Steven M. Seitz, Richard Szeliski. Siggraph 2006. | pdf, project page |