Final Project
Final Project Demo Schedule
- 12:00 - 12:20 pm: Amit Patel - Deep Reinforcement Learning for Dota 2
- 12:20 - 12:40 pm: Dylan, Nate, and Sam B. - Network Architectures for Multi-Label Video Tagging
- 12:40 - 1:00 pm: Team Mango (Beibei, Cole, Hogyan) - RNN for 2.5D CNN
- 1:00 - 1:20 pm: Jason, Lisa, and Sam - Sketch to Photo Translation
- 1:20 - 1:40 pm: Jason, Jay, and Alex - Lung Cancer Detection
- 1:40 - 2:00 pm: Justin, Jon, and Ben - Deep 3D+
- 2:00 - 2:20 pm: Jie, Jorge, and Chris
- 2:20 - 2:40 pm: Hossein and Xinmeng
- 2:40 - 3:00 pm: Takuto
Important Dates
Course project proposal: March 2 (due 11:59pm, email to comp150dl@gmail.com).
Course project milestone: April 4 (due 11:59pm, email to comp150dl@gmail.com).
Final project presentations will be held from 12pm-3pm on Friday May 12 at Microsoft New England Research and Development (NERD), 1 Memorial Drive 1st Floor, Cambridge, MA. Adams/Attucks Conference Room.
Final project write-up: Friday May 12 (due 11:59pm, email to comp150dl@gmail.com).
Groups
These are the suggested teams for the semester project. If scheduling or other significant concerns arise, please email comp150dl@gmail.com. Teams will receive an email from the instructor with recommended project topics and research references.
- Team 1: Jason Krone, Lisa Fan, Sam Woolf
- Team 2: Alex Tong, Jay DeStories, Jason Fan
- Team 3: Beibei Du, Hogyan Wang, Cole Springate-Combs
- Team 4: Nathan Watts, Dylan Cashman, Sam Bruck
- Team 5: Ben Papp, Justin Lee, Jonathan Hohrath
- Team 6: Mohammad Hossein Chaghazardi, Xinmeng Li, Sambit Pradhan
- Team 7: Jie Li, Jorge Sendino Lopez de la Reina, Chris Mattoli
- Team 8: Amit Patel
- Team 9: Takato Sato
Overview
The Final Project is an opportunity for you to apply what you have learned in class to a problem of your interest. You are encouraged to select a topic with your group and work on your own project. Potential projects usually fall into these two tracks:
- Applications. If you're coming to the class with a specific background and interests (e.g. biology, engineering, physics), we'd love to see you apply deep neural networks to problems related to your particular domain of interest. Pick a real-world problem and apply deep neural networks to solve it.
- Models. You can build a new model (algorithm) with deep neural networks, or a new variant of existing models, and apply it to tackle vision tasks. This track might be more challenging, and sometimes leads to a piece of publishable work.
Here you can find some sample project ideas:
- Any well known vision challenge such as COCO Detection or Keypoint Estimation, Visual Question Answering, or a Kaggle competition
- Describe images with Natural Language (COCO Attributes or Visual Genome Dataset)
- Yelp Dataset Challenge: describe restaurants from their Yelp images
- Dating artifacts with deep learning (use dataset collected from Spring 2016 version of this class)
- Deep Humor: generate funny captions for images (dataset collection required, contact instructor)
- Breast Cancer Prediction from Mammograms (contact instructor for dataset)
- Breast Cancer Prediction from Mammograms: Multiple Instance Learning (contact instructor for dataset)
- Rules for Social Media: characterize social network behavior of trans teens on tumblr (contact instructor for dataset)
- Geolocation: predict a photo's GPS coordinates (several datasets available)
- Segment reflective surfaces with Light Field Data
- Recyclables recognition -- build your own dataset for this!
- Butterfly species recognition (ask instructor who to contact for this data)
- High fashion understanding using repository of New York Times fashion week images from last 20 years (instructor has this dataset)
- Alternate update scheme for faster training of Adversarial Nets (talk to instructor for more details)
- New loss function that optimizes for better generalization of network solution over lowest possible loss (talk to instructor for more details)
- Sample project ideas from UMass Amherst (Google Docs)
To inspire ideas, you might look at the papers read or recommended in this class, recent deep learning publications from top-tier vision conferences, as well as other resources below.
- Awesome Deep Vision
- CVPR: IEEE Conference on Computer Vision and Pattern Recognition
- ICCV: International Conference on Computer Vision
- ECCV: European Conference on Computer Vision
- NIPS: Neural Information Processing Systems
- ICLR: International Conference on Learning Representations
- Kaggle challenges: An online machine learning competition website. For example, a Yelp classification challenge.
For applications, this type of project involves careful data preparation, an appropriate loss function, details of training and cross-validation, and good test set evaluations and model comparisons. Don't be afraid to be creative. Some successful examples can be found below:
- Teaching Deep Convolutional Neural Networks to Play Go
- Playing Atari with Deep Reinforcement Learning
- Winning the Galaxy Challenge with convnets
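The data handling described for application projects (a held-out test set plus cross-validation on the remaining data) can be sketched as follows. This is a minimal illustration under stated assumptions, not a required workflow: the function names `train_test_split` and `k_fold`, the 80/20 split, and the choice of 5 folds are all hypothetical and chosen only for the example. It works on example indices, so it is independent of any particular framework.

```python
import random


def train_test_split(indices, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out a fraction as the final test set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Assumed toy dataset of 100 examples, identified by index.
dev, test = train_test_split(range(100))
splits = list(k_fold(dev, k=5))

for fold_id, (train, validation) in enumerate(splits):
    # Tune hyperparameters here using train/validation only.
    print(fold_id, len(train), len(validation))

# The test indices are used once, at the end, for the reported numbers.
print("held-out test examples:", len(test))
```

Keeping the test indices out of every fold is the point of the design: model comparisons made on the validation folds do not leak into the final test set evaluation.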
For models, deep neural networks have been successfully used in a variety of computer vision and NLP tasks. This type of project involves understanding state-of-the-art vision or NLP models, and building new models or improving existing ones. The list below presents some papers on recent advances in deep neural networks from the computer vision community.
- Object recognition: [Krizhevsky et al.], [Russakovsky et al.], [Szegedy et al.], [Simonyan et al.], [He et al.]
- Object detection: [Girshick et al.], [Sermanet et al.], [Erhan et al.]
- Image segmentation: [Long et al.]
- Video classification: [Karpathy et al.], [Simonyan and Zisserman]
- Scene classification: [Zhou et al.]
- Face recognition: [Taigman et al.]
- Depth estimation: [Eigen et al.]
- Image-to-sentence generation: [Karpathy and Fei-Fei], [Donahue et al.], [Vinyals et al.]
- Visualization and optimization: [Szegedy et al.], [Nguyen et al.], [Zeiler and Fergus], [Goodfellow et al.], [Schaul et al.]
We also provide a list of popular computer vision datasets:
- Meta Pointer: A large collection organized by CV Datasets.
- Yet another Meta pointer
- ImageNet: a large-scale image dataset for visual recognition organized by WordNet hierarchy
- SUN Database: a benchmark for scene recognition and object detection with annotated scene categories and segmented objects
- Places Database: a scene-centric database with 205 scene categories and 2.5 million labeled images
- NYU Depth Dataset v2: an RGB-D dataset of segmented indoor scenes
- Microsoft COCO: a new benchmark for image recognition, segmentation and captioning
- Flickr100M: 100 million creative commons Flickr images
- Labeled Faces in the Wild: a dataset of 13,000 labeled face photographs
- Human Pose Dataset: a benchmark for articulated human pose estimation
- YouTube Faces DB: a face video dataset for unconstrained face recognition in videos
- UCF101: an action recognition data set of realistic action videos with 101 action categories
- HMDB-51: a large human motion dataset of 51 action classes
Grading Policy
Final Project: 40%
- milestone: 5%
- write-up: 10%
  • clarity, structure, language, references: 3%
  • background literature survey, good understanding of the problem: 3%
  • good insights and discussions of methodology, analysis, results, etc.: 4%
- technical: 12%
  • correctness: 4%
  • depth: 4%
  • innovation: 4%
- evaluation and results: 10%
  • sound evaluation metric: 3%
  • thoroughness in analysis and experimentation: 3%
  • results and performance: 4%
- presentation: 3% (+2% bonus for best few presentations)
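As a quick check of the arithmetic in the rubric above, the short sketch below sums the listed weights. The dictionary layout and the shortened key names are assumptions made for illustration; the numbers are copied from the grading policy, and the 2% presentation bonus is left out because it sits on top of the 40%.

```python
# Rubric weights in percentage points, copied from the grading policy.
# Key names are shortened labels chosen for this example.
rubric = {
    "milestone": 5,
    "write-up": {
        "clarity, structure, language, references": 3,
        "background and understanding": 3,
        "insights and discussion": 4,
    },
    "technical": {"correctness": 4, "depth": 4, "innovation": 4},
    "evaluation and results": {
        "sound evaluation metric": 3,
        "thoroughness": 3,
        "results and performance": 4,
    },
    "presentation": 3,  # the +2% bonus is not included
}


def total(node):
    """Sum a weight, or recursively sum a dictionary of weights."""
    if isinstance(node, dict):
        return sum(total(value) for value in node.values())
    return node


components = {name: total(part) for name, part in rubric.items()}
project_total = sum(components.values())

print(components)
print("Final Project total:", project_total)
```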
Project Proposal
The project proposal should be one paragraph (200-400 words). Your proposal should contain:
- Who are the group members (1-3)? What will each person do? (This needs to be a separate, detailed paragraph.)
- What is the problem that you will be investigating? Why is it interesting?
- What data will you use? If you are collecting new datasets, how do you plan to collect them?
- What method or algorithm are you proposing? If there are existing implementations, will you use them and how? How do you plan to improve or modify such implementations?
- What reading will you examine to provide context and background?
- How will you evaluate your results? Qualitatively, what kind of results do you expect (e.g. plots or figures)? Quantitatively, what kind of analysis will you use to evaluate and/or compare your results (e.g. what performance metrics or statistical tests)?
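For the quantitative evaluation question above, one common choice of performance metrics for a binary classification task is precision, recall, and F1. The sketch below is only an illustration of that choice, not a required metric: the function name `precision_recall_f1` and the toy labels are hypothetical, and other tasks (detection, segmentation, captioning) call for different metrics.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy ground-truth labels and predictions, made up for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Stating the metric this concretely in the proposal makes it easier to compare against existing implementations later.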
Each group should submit a plain-text proposal to comp150dl@gmail.com. If your proposed project is joint with another class' project (with the consent of the other class' instructor), make this clear in the proposal.
Project Milestone
Your project milestone report should be 2-3 pages long, using the provided template. The following is a suggested structure for your report:
- Title, Author(s)
- Introduction: This section introduces your problem and the overall plan for approaching it
- Problem statement: Describe your problem precisely, specifying the dataset to be used, the expected results, and the evaluation
- Technical Approach: Describe the methods you intend to apply to solve the given problem
- Intermediate/Preliminary Results: State and evaluate your results up to the milestone
Submission: Please email a PDF file named <your ID>_milestone.pdf to comp150dl@gmail.com. One submission for each group is sufficient.
Final Submission
Your final write-up should be 4-8 pages long, using the provided template. After the class, we will post all the final reports online so that you can read about each other's work. If you do not want your write-up to be posted online, please let us know at least a week before the final write-up submission deadline.
You will submit one or two files:
- A pdf file of your final report
- (OPTIONAL) zip file (or pdf file) with Supplementary Materials
The following is a suggested structure for your report:
- Title, Author(s)
- Abstract: It should not be more than 300 words
- Introduction: This section introduces your problem and the overall plan for approaching it
- Background/Related Work: This section discusses relevant literature for your project
- Approach: This section details the framework of your project. Be specific, which means you might want to include equations, figures, plots, etc.
- Experiment: This section begins with what kind of experiments you're doing, what dataset(s) you're using, and how you measure or evaluate your results. It then shows the results of your experiments in detail. By detail, we mean both quantitative evaluations (numbers, figures, tables, etc.) and qualitative results (images, example results, etc.).
- Conclusion: What have you learned? Suggest future ideas.
- References: This is absolutely necessary.
Examples of things to put in your supplementary material:
- Source code (if your project proposed an algorithm, or if the code is relevant and important for your project).
- Cool videos, interactive visualizations, demos, etc.
Examples of things to not put in your supplementary material:
- All of Caffe source code.
- Various ordinary data preprocessing scripts.
- Any code that is larger than 1MB.
- Model checkpoints.
- A computer virus.
Final Presentation
The culmination of this course will be Demo Day on May 12 from 12pm-3pm at Microsoft New England Research and Development (NERD), 1 Memorial Drive 1st Floor, Cambridge, MA. Adams/Attucks Conference Room. Each team will present the results of their project in a talk not exceeding 20 minutes. Please leave 2-3 minutes for questions. Teams that go over 20 minutes will be given no mercy and will be hauled off stage. For examples of good talks, look for videos of oral presentations at NIPS, CVPR, ICCV, etc.
Honor Code
You may consult any papers, books, online references, or publicly available implementations for ideas and code that you may want to incorporate into your strategy or algorithm, so long as you clearly cite your sources in your code and your writeup. However, under no circumstances may you look at another group’s code or incorporate their code into your project.
If you are doing a similar project for another class, you must make this clear and write down the exact portion of the project that is being counted for COMP 150DL.
Previous Years' Projects
Brown Data-Driven Vision Final Projects (May 10, 2016)
Rapid content based image retrieval by Gustave Marques Netto
Deep Learning for Natural Image Segmentation Priors by Gabe Hope
Determining artifact date and culture from images by Christine Whalen