CVPR15

Monday, June 8

8:30am-8:40am	Ballrooms A,B,C	Opening Remarks from Conference Chairs
8:40am-10:10am		Oral Session
10:10am-12:30pm	Exhibit Hall A	Poster Session 1A
12:30pm-2:00pm	Exhibit Hall B	Lunch
2:00pm-3:30pm		Oral Session
3:30pm-6:00pm	Exhibit Hall A	Poster Session 1B
6:00pm-7:30pm	Ballrooms A,B,C	Reception & Awards
7:30pm-8:30pm	Rooms 302,304,306	PAMI Technical Committee/Computer Vision Foundation Meeting

Tuesday, June 9

8:30am-10:00am		Oral Session
10:00am-12:30pm	Exhibit Hall A	Poster Session 2A
12:30pm-2:00pm	Exhibit Hall B	Lunch
2:00pm-3:30pm		Oral Session
3:30pm-6:00pm	Exhibit Hall A	Poster Session 2B
6:00pm-9:00pm	Sheraton Grand Ballroom	Banquet Dinner

Wednesday, June 10

8:30am-10:00am		Oral Session
10:30am-11:25am	Ballrooms A,B,C	Plenary Speaker:
11:30am-12:25pm	Ballrooms A,B,C	Plenary Speaker:
12:30pm-2:00pm	Exhibit Hall B	Lunch
2:00pm-3:30pm		Oral Session
3:30pm-6:00pm	Exhibit Hall A	Poster Session 3B

Monday June 8, 8:40am-10:10am

CNN Architectures	Depth and 3D Surfaces
Ballrooms A,B,C	Rooms 302,304,306
Hypercolumns for Object Segmentation and Fine-Grained Localization	DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection	3D Scanning Deformable Objects With a Single RGBD Sensor
Improving Object Detection With Deep Convolutional Networks via Bayesian Optimization and Structured Prediction	An Efficient Volumetric Framework for Shape Tracking
Going Deeper With Convolutions	Part-Based Modelling of Compound Scenes From Images
Understanding Image Representations by Measuring Their Equivariance and Equivalence	SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images	Small-Variance Nonparametric Clustering on the Hypersphere

Monday June 8, 10:10am-12:30pm

Poster Session
	Session 1A, Exhibit Hall A
Poster #	Title and Authors
1	Going Deeper With Convolutions
2	Propagated Image Filtering
3	Web Scale Photo Hash Clustering on a Single Machine
4	Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos
5	Supervised Discrete Hashing
6	What Do 15,000 Object Categories Tell Us About Classifying and Localizing Actions?
7	Landmarks-Based Kernelized Subspace Alignment for Unsupervised Domain Adaptation
8	Blur Kernel Estimation Using Normalized Color-Line Prior
9	A Light Transport Model for Mitigating Multipath Interference in Time-of-Flight Sensors
10	Traditional Saliency Reloaded: A Good Old Model in New Shape
11	Automatic Construction of Robust Spherical Harmonic Subspaces
12	Leveraging Stereo Matching With Learning-Based Confidence Measures
13	Saliency Detection via Cellular Automata
14	Efficient Sparse-to-Dense Optical Flow Estimation Using a Learned Basis and Layers
15	Learning Multiple Visual Tasks While Discovering Their Structure
16	Projection Metric Learning on Grassmann Manifold With Application to Video Based Face Recognition
17	Structural Sparse Tracking
18	Data-Driven Depth Map Refinement via Multi-Scale Sparse Representation
19	Uncalibrated Photometric Stereo Based on Elevation Angle Recovery From BRDF Symmetry of Isotropic Materials
20	Attributes and Categories for Generic Instance Search From One Example
21	Heat Diffusion Over Weighted Manifolds: A New Descriptor for Textured 3D Non-Rigid Shapes
22	A Dynamic Programming Approach for Fast and Robust Object Pose Recognition From Range Images
23	Beyond Gaussian Pyramid: Multi-Skip Feature Stacking for Action Recognition
24	A Geodesic-Preserving Method for Image Warping
25	Shape Driven Kernel Adaptation in Convolutional Neural Network for Robust Facial Traits Recognition
26	From Categories to Subcategories: Large-Scale Image Classification With Partial Class Label Refinement
27	Combination Features and Models for Human Detection
28	Improving Object Detection With Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
29	A Metric Parametrization for Trifocal Tensors With Non-Colinear Pinholes
30	An Efficient Volumetric Framework for Shape Tracking
31	Structured Sparse Subspace Clustering: A Unified Optimization Framework
32	Delving Into Egocentric Actions
33	Latent Trees for Estimating Intensity of Facial Action Units
34	Robust Regression on Image Manifolds for Ordered Label Denoising
35	Privacy Preserving Optics for Miniature Vision Sensors
36	Deep Transfer Metric Learning
37	Small-Variance Nonparametric Clustering on the Hypersphere
38	DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
39	Reliable Patch Trackers: Robust Visual Tracking by Exploiting Reliable Patches
40	Predicting Eye Fixations Using Convolutional Neural Networks
41	Kernel Fusion for Better Image Deblurring
42	Direction Matters: Depth Estimation With a Surface Normal Classifier
43	Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
44	Grasp Type Revisited: A Modern Perspective on a Classical Feature for Vision
45	Learning Hypergraph-Regularized Attribute Predictors
46	A Coarse-to-Fine Model for 3D Pose Estimation and Sub-Category Recognition
47	Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images
48	Deformable Part Models Are Convolutional Neural Networks
49	Hypercolumns for Object Segmentation and Fine-Grained Localization
50	Mapping Visual Features to Semantic Profiles for Retrieval in Medical Imaging
51	Event-Driven Stereo Matching for Real-Time 3D Panoramic Vision
52	Graph-Based Simplex Method for Pairwise Energy Minimization With Binary Variables
53	Image Denoising via Adaptive Soft-Thresholding Based on Non-Local Samples
54	3D Scanning Deformable Objects With a Single RGBD Sensor
55	Nested Motion Descriptors
56	Efficient Minimal-Surface Regularization of Perspective Depth Maps in Variational Stereo
57	Maximum Persistency via Iterative Relaxed Inference With Graphical Models
58	Deep Hierarchical Parsing for Semantic Segmentation
59	Designing Deep Networks for Surface Normal Estimation
60	Layered RGBD Scene Flow Estimation
61	Hashing With Binary Autoencoders
62	SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
63	Collaborative Feature Learning From Social Media
64	Diversity-Induced Multi-View Subspace Clustering
65	Building a Bird Recognition App and Large Scale Dataset With Citizen Scientists: The Fine Print in Fine-Grained Dataset Collection
66	Early Burst Detection for Memory-Efficient Image Retrieval
67	Indoor Scene Structure Analysis for Single Image Depth Estimation
68	Light Field Layer Matting
69	Depth Camera Tracking With Contour Cues
70	Radial Distortion Homography
71	Efficient Object Localization Using Convolutional Networks
72	Just Noticeable Defocus Blur Detection and Estimation
73	How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps
74	Rotating Your Face Using Multi-Task Deep Neural Network
75	Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks
76	Super-Resolution Person Re-Identification With Semi-Coupled Low-Rank Discriminant Dictionary Learning
77	Dual Domain Filters Based Texture and Structure Preserving Image Non-Blind Deconvolution
78	Region-Based Temporally Consistent Video Post-Processing
79	Global Refinement of Random Forest
80	Adaptive Region Pooling for Object Detection
81	Discriminative and Consistent Similarities in Instance-Level Multiple Instance Learning
82	MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking
83	Finding Action Tubes
84	Learning a Convolutional Neural Network for Non-Uniform Motion Blur Removal
85	Complexity-Adaptive Distance Metric for Object Proposals Generation
86	High-Fidelity Pose and Expression Normalization for Face Recognition in the Wild
87	Transformation of Markov Random Fields for Marginal Distribution Estimation
88	Sparse Convolutional Neural Networks
89	FaceNet: A Unified Embedding for Face Recognition and Clustering
90	Cascaded Hand Pose Regression
91	Cross-Scene Crowd Counting via Deep Convolutional Neural Networks
92	The Application of Two-Level Attention Models in Deep Convolutional Neural Network for Fine-Grained Image Classification
93	End-to-End Integration of a Convolution Network, Deformable Parts Model and Non-Maximum Suppression
94	A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions
95	Neuroaesthetics in Fashion: Modeling the Perception of Fashionability
96	Part-Based Modelling of Compound Scenes From Images
97	Efficient Parallel Optimization for Potts Energy With Hierarchical Fusion
98	Pooled Motion Features for First-Person Videos
99	Functional Correspondence by Matrix Completion
100	Elastic-Net Regularization of Singular Values for Robust Subspace Learning
101	Hardware Compliant Approximate Image Codes
102	Photometric Refinement of Depth Maps for Multi-Albedo Objects
103	Predicting the Future Behavior of a Time-Varying Probability Distribution
104	Classifier Based Graph Construction for Video Segmentation
105	ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
106	Mid-Level Deep Pattern Mining
107	Prediction of Search Targets From Fixations in Open-World Settings
108	Understanding Image Representations by Measuring Their Equivariance and Equivalence
109	Effective Learning-Based Illuminant Estimation Using Simple Features
110	PAIGE: PAirwise Image Geometry Encoding for Improved Efficiency in Structure-From-Motion
111	Dense, Accurate Optical Flow Estimation With Piecewise Parametric Model
112	Single-Image Estimation of the Camera Response Function in Near-Lighting
113	Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
114	A Low-Dimensional Step Pattern Analysis Algorithm With Application to Multimodal Retinal Image Registration
115	Bilinear Heterogeneous Information Machine for RGB-D Action Recognition
116	MRF Optimization by Graph Approximation
117	SALICON: Saliency in Context
118	Weakly Supervised Object Detection With Convex Clustering
119	Interleaved Text/Image Deep Mining on a Very Large-Scale Radiology Database
120	Learning Semantic Relationships for Better Action Retrieval in Images
121	Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

Monday June 8, 2:00pm-3:30pm

Discovery and Dense Correspondences	3D Shape: Matching, Recognition, Reconstruction
Ballrooms A,B,C	Rooms 302,304,306
Discovering States and Transformations in Image Collections	Category-Specific Object Reconstruction From a Single Image
Unsupervised Object Discovery and Localization in the Wild: Part-Based Matching With Bottom-Up Region Proposals	Discriminative Shape From Shading in Uncalibrated Illumination
FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences	Learning to Generate Chairs With Convolutional Neural Networks
EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow	3D ShapeNets: A Deep Representation for Volumetric Shapes
Phase-Based Frame Interpolation for Video	Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks
Towards Open World Recognition	Data-Driven 3D Voxel Patterns for Object Category Recognition

Monday June 8, 3:30pm-6:00pm

Poster Session
	Session 1B, Exhibit Hall A
Poster #	Title and Authors
1	Depth and Surface Normal Estimation From Monocular Images Using Regression on Deep Features and Hierarchical CRFs
2	Discriminative Shape From Shading in Uncalibrated Illumination
3	Multi-Manifold Deep Metric Learning for Image Set Classification
4	Target Identity-Aware Network Flow for Online Multiple Target Tracking
5	Adaptive As-Natural-As-Possible Image Stitching
6	EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
7	Learning Coarse-to-Fine Sparselets for Efficient Object Detection and Scene Classification
8	Continuous Visibility Feature
9	FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences
10	Unsupervised Object Discovery and Localization in the Wild: Part-Based Matching With Bottom-Up Region Proposals
11	Supervised Descriptor Learning for Multi-Output Regression
12	A Statistical Model of Riemannian Metric Variation for Deformable Shape Analysis
13	Temporally Coherent Interpretations for Long Videos Using Pattern Theory
14	Line-Sweep: Cross-Ratio for Wide-Baseline Matching and 3D Reconstruction
15	Simplified Mirror-Based Camera Pose Computation via Rotation Averaging
16	On the Relationship Between Visual Attributes and Convolutional Networks
17	Saliency Detection by Multi-Context Deep Learning
18	DeepShape: Deep Learned Shape Descriptor for 3D Shape Matching and Retrieval
19	Bayesian Adaptive Matrix Factorization With Automatic Model Selection
20	Joint Action Recognition and Pose Estimation From Video
21	Fast Action Proposals for Human Action Detection and Search
22	Joint Multi-Feature Spatial Context for Scene Recognition on the Semantic Manifold
23	Large-Scale Damage Detection Using Satellite Imagery
24	A Novel Locally Linear KNN Model for Visual Recognition
25	Bilinear Random Projections for Locality-Sensitive Binary Codes
26	Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation
27	Superpixel Segmentation Using Linear Spectral Clustering
28	Person Count Localization in Videos From Noisy Foreground and Detections
29	Good Features to Track for Visual SLAM
30	Discovering States and Transformations in Image Collections
31	Generalized Deformable Spatial Pyramid: Geometry-Preserving Dense Correspondence Estimation
32	Classifier Adaptation at Prediction Time
33	Phase-Based Frame Interpolation for Video
34	Matching-CNN Meets KNN: Quasi-Parametric Human Parsing
35	Absolute Pose for Cameras Under Flat Refractive Interfaces
36	Protecting Against Screenshots: An Image Processing Approach
37	Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction
38	VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases
39	A Graphical Model Approach for Matching Partial Signatures
40	From Captions to Visual Concepts and Back
41	Semi-Supervised Low-Rank Mapping Learning for Multi-Label Classification
42	ConceptLearner: Discovering Visual Concepts From Weakly Labeled Image Collections
43	Computationally Bounded Retrieval
44	Viewpoints and Keypoints
45	Discrete Hyper-Graph Matching
46	Rolling Shutter Motion Deblurring
47	Learning to Generate Chairs With Convolutional Neural Networks
48	Accurate Depth Map Estimation From a Lenslet Light Field Camera
49	Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval
50	Similarity Learning on an Explicit Polynomial Kernel Feature Map for Person Re-Identification
51	Learning to Propose Objects
52	Basis Mapping Based Boosting for Object Detection
53	Computing the Stereo Matching Cost With a Convolutional Neural Network
54	Recognize Complex Events From Static Images by Fusing Deep Channels
55	Multi-Feature Max-Margin Hierarchical Bayesian Model for Action Recognition
56	Model Recommendation: Generating Object Detectors From Few Samples
57	A Linear Least-Squares Solution to Elastic Shape-From-Template
58	Robust Large Scale Monocular Visual SLAM
59	Membership Representation for Detecting Block-Diagonal Structure in Low-Rank or Sparse Subspace Clustering
60	Bayesian Inference for Neighborhood Filters With Application in Denoising
61	Deep LAC: Deep Localization, Alignment and Classification for Fine-Grained Recognition
62	Unconstrained Realtime Facial Performance Capture
63	Blind Optical Aberration Correction by Exploring Geometric and Visual Priors
64	Ontological Supervision for Fine Grained Classification of Street View Storefronts
65	Finding Distractors in Images
66	From Image-Level to Pixel-Level Labeling With Convolutional Networks
67	Semantic Alignment of LiDAR Data at City Scale
68	Oriented Edge Forests for Boundary Detection
69	Query-Adaptive Late Fusion for Image Search and Person Re-Identification
70	Filtered Feature Channels for Pedestrian Detection
71	GRSA: Generalized Range Swap Algorithm for the Efficient Optimization of MRFs
72	PatchCut: Data-Driven Object Segmentation via Local Shape Transfer
73	Illumination and Reflectance Spectra Separation of a Hyperspectral Image Meets Low-Rank Matrix Factorization
74	Semantic Part Segmentation Using Compositional Model Combining Shape and Appearance
75	A Discriminative CNN Video Representation for Event Detection
76	24/7 Place Recognition by View Synthesis
77	Understanding Image Virality
78	Book2Movie: Aligning Video Scenes With Book Chapters
79	3D Model-Based Continuous Emotion Recognition
80	Learning to Rank in Person Re-Identification With Metric Ensembles
81	Making Better Use of Edges via Perceptual Grouping
82	Real-Time Joint Estimation of Camera Orientation and Vanishing Points
83	Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks
84	Salient Object Detection via Bootstrap Learning
85	Towards Open World Recognition
86	Data-Driven 3D Voxel Patterns for Object Category Recognition
87	3D ShapeNets: A Deep Representation for Volumetric Shapes
88	Robust Image Alignment With Multiple Feature Descriptors and Matching-Guided Neighborhoods
89	Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A
90	Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence
91	New Insights Into Laplacian Similarity Search
92	Feature-Independent Context Estimation for Automatic Image Annotation
93	Category-Specific Object Reconstruction From a Single Image
94	Active Sample Selection and Correction Propagation on a Gradually-Augmented Graph
95	Efficient and Accurate Approximations of Nonlinear Convolutional Networks
96	Ranking and Retrieval of Image Sequences From Multiple Paragraph Queries
97	Casual Stereoscopic Panorama Stitching
98	Superpixel Meshes for Fast Edge-Preserving Surface Reconstruction
99	Best-Buddies Similarity for Robust Template Matching
100	Superdifferential Cuts for Binary Energies
101	The S-Hock Dataset: Analyzing Crowds at the Stadium
102	Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets
103	Texture Representations for Image and Video Synthesis
104	Shadow Optimization From Structured Deep Edge Detection
105	Total Variation Regularization of Shape Signals
106	Learning Similarity Metrics for Dynamic Scene Segmentation
107	Subspace Clustering by Mixture of Gaussian Regression
108	DASC: Dense Adaptive Self-Correlation Descriptor for Multi-Modal and Multi-Spectral Correspondence
109	In Defense of Color-Based Model-Free Tracking
110	Best of Both Worlds: Human-Machine Collaboration for Object Annotation
111	Robust Multiple Homography Estimation: An Ill-Solved Problem
112	Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition
113	Articulated Motion Discovery Using Pairs of Trajectories
114	A Solution for Multi-Alignment by Transformation Synchronisation
115	A Convex Optimization Approach to Robust Fundamental Matrix Estimation
116	Simultaneous Pose and Non-Rigid Shape With Particle Dynamics
117	Semi-Supervised Learning With Explicit Relationship Regularization
118	Person Re-Identification by Local Maximal Occurrence Representation and Metric Learning
119	Joint Patch and Multi-Label Learning for Facial Action Unit Detection
120	Real-Time Visual Analysis of Microvascular Blood Flow for Critical Care

Tuesday June 9, 8:30am-10:00am

Images and Language	Multiple View Geometry
Ballrooms A,B,C	Rooms 302,304,306
Show and Tell: A Neural Image Caption Generator	Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset)
Deep Visual-Semantic Alignments for Generating Image Descriptions	Joint Vanishing Point Extraction and Tracking
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description	Robust Camera Location Estimation by Convex Programming
Image Specificity	Efficient Globally Optimal Consensus Maximisation With Tree Search
Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks	R6P - Rolling Shutter Absolute Camera Pose
Becoming the Expert - Interactive Multi-Class Machine Teaching	Building Proteins in a Day: Efficient 3D Molecular Reconstruction

Tuesday June 9, 10:00am-12:30pm

Poster Session
	Session 2A, Exhibit Hall A
Poster #	Title and Authors
1	JOTS: Joint Online Tracking and Segmentation
2	Gaze-Enabled Egocentric Video Summarization via Constrained Submodular Maximization
3	Sparse Depth Super Resolution
4	Efficient Illuminant Estimation for Color Constancy Using Grey Pixels
5	Can Humans Fly? Action Understanding With Multiple Classes of Actors
6	Reweighted Laplace Prior Based Hyperspectral Compressive Sensing for Unknown Sparsity
7	Class Consistent Multi-Modal Fusion With Binary Features
8	R6P - Rolling Shutter Absolute Camera Pose
9	Embedded Phase Shifting: Robust Phase Shifting With Embedded Signals
10	Shape and Light Directions From Shading and Polarization
11	3D Deep Shape Descriptor
12	Cross-Age Face Verification by Coordinating With Cross-Face Age Verification
13	Beyond Mahalanobis Metric: Cayley-Klein Metric Learning
14	From Dictionary of Visual Words to Subspaces: Locality-Constrained Affine Subspace Coding
15	FPA-CS: Focal Plane Array-Based Compressive Imaging in Short-Wave Infrared
16	BOLD - Binary Online Learned Descriptor for Efficient Image Matching
17	Defocus Deblurring and Superresolution for Time-of-Flight Depth Cameras
18	Burst Deblurring: Removing Camera Shake Through Fourier Burst Accumulation
19	SOM: Semantic Obviousness Metric for Image Quality Assessment
20	DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
21	Efficient Globally Optimal Consensus Maximisation With Tree Search
22	Mind's Eye: A Recurrent Visual Representation for Image Caption Generation
23	Hierarchical Sparse Coding With Geometric Prior for Visual Geo-Location
24	P3.5P: Pose Estimation With Unknown Focal Length
25	Joint Vanishing Point Extraction and Tracking
26	Learning a Non-Linear Knowledge Transfer Model for Cross-View Action Recognition
27	Random Tree Walk Toward Instantaneous 3D Human Pose Estimation
28	Deep Hashing for Compact Binary Codes Learning
29	Completing 3D Object Shape From One Depth Image
30	Encoding Based Saliency Detection for Videos and Images
31	Online Sketching Hashing
32	Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation
33	Representing 3D Texture on Mesh Manifolds for Retrieval and Recognition Applications
34	Saliency Propagation From Simple to Difficult
35	Learning an Efficient Model of Hand Shape Variation From Depth Images
36	On the Minimal Problems of Low-Rank Matrix Factorization
37	Symmetry-Based Text Line Detection in Natural Scenes
38	DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting
39	Learning to Detect Motion Boundaries
40	Improving Object Proposals With Multi-Thresholding Straddling Expansion
41	Visual Recognition by Counting Instances: A Multi-Instance Cardinality Potential Kernel
42	Unconstrained 3D Face Reconstruction
43	Becoming the Expert - Interactive Multi-Class Machine Teaching
44	Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
45	Zero-Shot Object Recognition by Semantic Manifold Distance
46	Hyper-Class Augmented and Regularized Deep Learning for Fine-Grained Image Classification
47	Direct Structure Estimation for 3D Reconstruction
48	Global Supervised Descent Method
49	Robust Camera Location Estimation by Convex Programming
50	Practical Robust Two-View Translation Estimation
51	Learning From Massive Noisy Labeled Data for Image Classification
52	KL Divergence Based Agglomerative Clustering for Automated Vitiligo Grading
53	Robust Saliency Detection via Regularized Random Walks Ranking
54	Weakly Supervised Semantic Segmentation for Social Images
55	Image Specificity
56	A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs With a Costly Max-Oracle
57	Web-Scale Training for Face Identification
58	Dynamically Encoded Actions Based on Spacetime Saliency
59	Three Viewpoints Toward Exemplar SVM
60	Visual Recognition by Learning From Web Data: A Weakly Supervised Domain Generalization Approach
61	Clustering of Static-Adaptive Correspondences for Deformable Object Tracking
62	Geo-Semantic Segmentation
63	Towards Unified Depth and Semantic Prediction From a Single Image
64	Towards Force Sensing From Vision: Observing Hand-Object Interactions to Infer Manipulation Forces
65	A MRF Shape Prior for Facade Parsing With Occlusions
66	Probability Occupancy Maps for Occluded Depth Images
67	Segment Based 3D Object Shape Priors
68	Shape-From-Template in Flatland
69	Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition
70	Deep Roto-Translation Scattering for Object Classification
71	Non-Rigid Registration of Images With Geometric and Photometric Deformation by Using Local Affine Fourier-Moment Matching
72	Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning
73	Deeply Learned Face Representations Are Sparse, Selective, and Robust
74	Unsupervised Visual Alignment With Similarity Graphs
75	Video Anomaly Detection and Localization Using Hierarchical Feature Representation and Gaussian Process Regression
76	Inferring 3D Layout of Building Facades From a Single Image
77	Evaluation of Output Embeddings for Fine-Grained Image Classification
78	Virtual View Networks for Object Reconstruction
79	Real-Time Coarse-to-Fine Topologically Preserving Segmentation
80	Supervised Mid-Level Features for Word Image Representation
81	Learning Lightness From Human Judgement on Relative Reflectance
82	Scene Classification With Semantic Fisher Vectors
83	Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks
84	Co-Saliency Detection via Looking Deep and Wide
85	Adopting an Unconstrained Ray Model in Light-Field Cameras for 3D Shape Reconstruction
86	Towards 3D Object Detection With Bimodal Deep Boltzmann Machines Over RGBD Imagery
87	An Active Search Strategy for Efficient Object Class Detection
88	Geodesic Exponential Kernels: When Curvature and Linearity Conflict
89	Transformation-Invariant Convolutional Jungles
90	Exemplar SVMs as Visual Feature Encoders
91	Object Scene Flow for Autonomous Vehicles
92	Reflectance Hashing for Material Recognition
93	Joint Photo Stream and Blog Post Summarization and Exploration
94	Video Summarization by Learning Submodular Mixtures of Objectives
95	Building Proteins in a Day: Efficient 3D Molecular Reconstruction
96	Learning Descriptors for Object Recognition and 3D Pose Estimation
97	Image Partitioning Into Convex Polygons
98	Deep Visual-Semantic Alignments for Generating Image Descriptions
99	Unsupervised Learning of Complex Articulated Kinematic Structures Combining Motion and Skeleton Information
100	Elastic Functional Coding of Human Actions: From Vector-Fields to Latent Variables
101	Show and Tell: A Neural Image Caption Generator
102	Descriptor Free Visual Indoor Localization With Line Segments
103	Fixation Bank: Learning to Reweight Fixation Candidates
104	Deep Networks for Saliency Detection via Local Estimation and Global Search
105	Reflection Removal Using Ghosting Cues
106	A Dataset for Movie Description
107	Fast and Robust Hand Tracking Using Detection-Guided Optimization
108	Efficient SDP Inference for Fully-Connected CRFs Based on Low-Rank Decomposition
109	Discriminative Learning of Iteration-Wise Priors for Blind Deconvolution
110	Eye Tracking Assisted Extraction of Attentionally Important Objects From Videos
111	Multi-View Feature Engineering and Learning
112	Self Scaled Regularized Robust Regression
113	Simultaneous Feature Learning and Hash Coding With Deep Neural Networks
114	MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
115	Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset)
116	Exact Bias Correction and Covariance Estimation for Stereo Vision
117	Computing Similarity Transformations From Only Image Correspondences
118	Image Segmentation in Twenty Questions
119	Interaction Part Mining: A Mid-Level Approach for Fine-Grained Action Recognition
120	Sparse Projections for High-Dimensional Binary Codes

Tuesday June 9, 2:00pm-3:30pm

Segmentation in Images and Video	3D Models and Images
Ballrooms A,B,C	Rooms 302,304,306
Causal Video Object Segmentation From Persistence of Occlusions	Picture: A Probabilistic Programming Language for Scene Perception
Semantic Object Segmentation via Detection in Weakly Labeled Video	Rent3D: Floor-Plan Priors for Monocular Layout Estimation
Fully Convolutional Networks for Semantic Segmentation	The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose
Shape-Tailored Local Descriptors and Their Application to Segmentation and Tracking	3D Shape Estimation From 2D Landmarks: A Convex Relaxation Approach
Deep Filter Banks for Texture Recognition and Segmentation	Holistic 3D Scene Understanding From a Single Geo-Tagged Image
Active Learning for Structured Probabilistic Models With Histogram Approximation	Joint SFM and Detection Cues for Monocular 3D Localization in Road Scenes

Tuesday June 9, 3:30pm-6:00pm

Poster Session
	Session 2B, Exhibit Hall A
Poster #	Title and Authors
1	Hierarchically-Constrained Optical Flow
2	The k-Support Norm and Convex Envelopes of Cardinality and Rank
3	Matching Bags of Regions in RGBD Images
4	Recurrent Convolutional Neural Network for Object Recognition
5	Feedforward Semantic Segmentation With Zoom-Out Features
6	The Aperture Problem for Refractive Motion
7	Saliency-Aware Geodesic Video Object Segmentation
8	DEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural Nets
9	Rent3D: Floor-Plan Priors for Monocular Layout Estimation
10	Learning a Sequential Search for Landmarks
11	Fully Convolutional Networks for Semantic Segmentation
12	Deep Correlation for Matching Images and Text
13	Multi-Objective Convolutional Learning for Face Labeling
14	Deep Multiple Instance Learning for Image Classification and Auto-Annotation
15	Multi-Instance Object Segmentation With Occlusion Handling
16	Material Recognition in the Wild With the Materials in Context Database
17	Understanding Pedestrian Behaviors From Stationary Crowd Groups
18	Depth From Focus With Your Mobile Phone
19	Fusion Moves for Correlation Clustering
20	Second-Order Constrained Parametric Proposals and Sequential Search-Based Structured Prediction for Semantic Segmentation in RGB-D Images
21	Metric Imitation by Manifold Transfer for Efficient Vision Applications
22	The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose
23	Scene Labeling With LSTM Recurrent Neural Networks
24	FAemb: A Function Approximation-Based Embedding Method for Image Retrieval
25	Automatically Discovering Local Visual Material Attributes
26	Depth Image Enhancement Using Local Tangent Plane Approximations
27	Video Co-Summarization: Video Summarization by Visual Co-Occurrence
28	Watch and Learn: Semi-Supervised Learning for Object Detectors From Video
29	Generalized Tensor Total Variation Minimization for Visual Data Recovery
30	Active Learning for Structured Probabilistic Models With Histogram Approximation
31	Image Parsing With a Wide Range of Classes and Scene-Level Context
32	Bayesian Sparse Representation for Hyperspectral Image Super Resolution
33	Semantic Object Segmentation via Detection in Weakly Labeled Video
34	Learning With Dataset Bias in Latent Subcategory Models
35	Project-Out Cascaded Regression With an Application to Face Alignment
36	Image Retrieval Using Scene Graphs
37	Unifying Holistic and Parts-Based Deformable Model Fitting
38	Small Instance Detection by Integer Programming on Object Density Maps
39	Motion Part Regularization: Improving Action Recognition via Trajectory Selection
40	Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection
41	Fine-Grained Visual Categorization via Multi-Stage Metric Learning
42	Saturation-Preserving Specular Reflection Separation
43	Joint SFM and Detection Cues for Monocular 3D Localization in Road Scenes
44	Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture
45	UniHIST: A Unified Framework for Image Restoration With Marginal Histogram Constraints
46	Human Action Segmentation With Hierarchical Supervoxel Consistency
47	Robust Manhattan Frame Estimation From a Single RGB-D Image
48	Learning to Segment Under Various Forms of Weak Supervision
49	Fast and Accurate Image Upscaling With Super-Resolution Forests
50	Light Field From Micro-Baseline Image Pair
51	Efficient ConvNet-Based Marker-Less Motion Capture in General Scenes With a Low Number of Cameras
52	Learning Scene-Specific Pedestrian Detectors Without Real Data
53	Deep Filter Banks for Texture Recognition and Segmentation
54	Multiple Random Walkers and Their Application to Image Cosegmentation
55	Beyond the Shortest Path: Unsupervised Domain Adaptation by Sampling Subspaces Along the Spline Flow
56	Spherical Embedding of Inlier Silhouette Dissimilarities
57	Semantics-Preserving Hashing for Cross-View Retrieval
58	Object Proposal by Multi-Branch Hierarchical Segmentation
59	Ambient Occlusion via Compressive Visibility Estimation
60	Shape-Tailored Local Descriptors and Their Application to Segmentation and Tracking
61	Scalable Object Detection by Filter Compression With Regularized Sparse Coding
62	An Improved Deep Learning Architecture for Person Re-Identification
63	Understanding Classifier Errors by Examining Influential Neighbors
64	Riemannian Coding and Dictionary Learning: Kernels to the Rescue
65	Scalable Structure From Motion for Densely Sampled Videos
66	Parsing Occluded People by Flexible Compositions
67	Joint Calibration of Ensemble of Exemplar SVMs
68	Holistic 3D Scene Understanding From a Single Geo-Tagged Image
69	A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
70	DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection
71	Convolutional Feature Masking for Joint Object and Stuff Segmentation
72	A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects
73	Low-Level Vision by Consensus in a Spatial Hierarchy of Regions
74	Line Drawing Interpretation in a Multi-View Context
75	Toward User-Specific Tracking by Detection of Human Shapes in Multi-Cameras
76	Intra-Frame Deblurring by Leveraging Inter-Frame Camera Motion
77	Salient Object Subitizing
78	Hierarchical-PEP Model for Real-World Face Recognition
79	The Common Self-Polar Triangle of Concentric Circles and Its Application to Camera Calibration
80	Taking a Deeper Look at Pedestrians
81	Learning to Segment Moving Objects in Videos
82	GMMCP Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking
83	Learning Graph Structure for Multi-Label Image Classification via Clique Generation
84	Matrix Completion for Resolving Label Ambiguity
85	Video Magnification in Presence of Large Motions
86	Flying Objects Detection From a Single Moving Camera
87	Line-Based Multi-Label Energy Optimization for Fisheye Image Rectification and Calibration
88	Adaptive Eye-Camera Calibration for Head-Worn Devices
89	Modeling Object Appearance Using Context-Conditioned Component Analysis
90	Displets: Resolving Stereo Ambiguities Using Object Knowledge
91	Time-to-Contact From Image Intensity
92	Transferring a Semantic Representation for Person Re-Identification and Search
93	Robust Video Segment Proposals With Painless Occlusion Handling
94	Face Alignment Using Cascade Gaussian Process Regression Trees
95	Regularizing Max-Margin Exemplars by Reconstruction and Generative Models
96	A Fast Algorithm for Elastic Shape Distances Between Closed Planar Curves
97	Reflection Removal for In-Vehicle Black Box Videos
98	Tree Quantization for Large-Scale Similarity Search and Classification
99	Integrating Parametric and Non-Parametric Models for Scene Labeling
100	Mining Semantic Affordances of Visual Object Categories
101	Causal Video Object Segmentation From Persistence of Occlusions
102	Multiple Instance Learning for Soft Bags via Top Instances
103	Multiclass Semantic Video Segmentation With Object-Level Active Inference
104	Effective Face Frontalization in Unconstrained Images
105	Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors
106	Weakly Supervised Localization of Novel Objects Using Appearance Transfer
107	First-Person Pose Recognition Using Egocentric Workspaces
108	Simultaneous Time-of-Flight Sensing and Photometric Stereo With a Single ToF Sensor
109	Active Learning and Discovery of Object Categories in the Presence of Unnameable Instances
110	Learning to Compare Image Patches via Convolutional Neural Networks
111	Watch-n-Patch: Unsupervised Understanding of Actions and Relations
112	Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation
113	DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
114	Picture: A Probabilistic Programming Language for Scene Perception
115	Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization
116	Fusing Subcategory Probabilities for Texture Classification
117	Video Event Recognition With Deep Hierarchical Context Model
118	Object-Based RGBD Image Co-Segmentation With Mutex Constraint
119	Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors
120	3D Shape Estimation From 2D Landmarks: A Convex Relaxation Approach

Wednesday June 10, 8:30am-10:00am

Action and Event Recognition	Computational Photography
Ballrooms A,B,C	Rooms 302,304,306
How Many Bits Does it Take for a Stimulus to Be Salient?	Visual Vibrometry: Estimating Material Properties From Small Motion in Video
Deeply Learned Attributes for Crowded Scene Understanding	Recovering Inner Slices of Translucent Objects by Multi-Frequency Illumination
Joint Inference of Groups, Events and Human Roles in Aerial Videos	Fast Bilateral-Space Stereo for Synthetic Defocus
Modeling Video Evolution for Action Recognition	Simultaneous Video Defogging and Stereo Reconstruction
Space-Time Tree Ensemble for Action Recognition	One-Day Outdoor Photometric Stereo via Skylight Estimation
Social Saliency Prediction

Wednesday June 10, 10:30am-12:25pm

Plenary Speakers

Ballrooms A,B,C

What's Wrong with Deep Learning?
Yann LeCun
Facebook AI Research & New York University

Deep learning methods have had a profound impact on a number of areas in recent years, including natural image understanding and speech recognition. Other areas seem on the verge of being similarly impacted, notably natural language processing, biomedical image analysis, and the analysis of sequential signals in a variety of application domains. But deep learning systems, as they exist today, have many limitations.

First, they lack mechanisms for reasoning, search, and inference. Complex and/or ambiguous inputs require deliberate reasoning to arrive at a consistent interpretation. Producing structured outputs, such as a long text or a label map for image segmentation, requires sophisticated search and inference algorithms to satisfy complex sets of constraints. One approach to this problem is to marry deep learning with structured prediction (an idea first presented at CVPR 1997). While several deep learning systems augmented with structured prediction modules trained end to end have been proposed for OCR, body pose estimation, and semantic segmentation, new concepts are needed for tasks that require more complex reasoning.

Second, they lack short-term memory. Many tasks in natural language understanding, such as question-answering, require a way to temporarily store isolated facts. Correctly interpreting events in a video and being able to answer questions about it requires remembering abstract representations of what happens in the video. Deep learning systems, including recurrent nets, are notoriously inefficient at storing temporary memories. This has led researchers to propose neural net systems augmented with separate memory modules, such as LSTM, Memory Networks, Neural Turing Machines, and Stack-Augmented RNN. While these proposals are interesting, new ideas are needed.

Lastly, they lack the ability to perform unsupervised learning. Animals and humans learn most of the structure of the perceptual world in an unsupervised manner. While the interest of the ML community in neural nets was revived in the mid-2000s by progress in unsupervised learning, the vast majority of practical applications of deep learning have used purely supervised learning. There is little doubt that future progress in computer vision will require breakthroughs in unsupervised learning, particularly for video understanding. But what principles should unsupervised learning be based on?

Preliminary works in each of these areas pave the way for future progress in image and video understanding.

Biography:

Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the Electrical and Computer Engineering Department.

He received the Electrical Engineer Diploma from Ecole Superieure d'Ingenieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Universite Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor, after a brief period as a Fellow of the NEC Research Institute in Princeton. He directed NYU's initiative in data science and became the founding director of the NYU Center for Data Science. He was named Director of AI Research at Facebook in late and retains a part-time position on the NYU faculty.

His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and on dedicated circuits and architectures for computer perception. The character recognition technology he developed at Bell Labs is used by several banks around the world to read checks and was reading between 10 and 20% of all the checks in the US in the early 2000s. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web. Since the late 80's he has been working on deep learning methods, particularly the convolutional network model, which is the basis of many products and services deployed by companies such as Facebook, Google, Microsoft, Baidu, IBM, NEC, AT&T and others for image and video understanding, document recognition, human-computer interaction, and speech recognition.

LeCun has been on the editorial board of IJCV, IEEE PAMI, and IEEE Trans. Neural Networks, was program chair of CVPR'06, and is chair of ICLR. He is on the science advisory board of the Institute for Pure and Applied Mathematics, and has advised many large and small companies about machine learning technology, including several startups he co-founded. He is the lead faculty at NYU for the Moore-Sloan Data Science Environment, a $36M initiative in collaboration with UC Berkeley and the University of Washington to develop data-driven methods in the sciences. He is the recipient of the IEEE Neural Network Pioneer Award.

Reverse Engineering the Human Visual System
Jack L. Gallant
University of California at Berkeley

The human brain is the most sophisticated image processing system known, capable of impressive feats of recognition and discrimination under challenging natural conditions. Reverse-engineering the brain might enable us to design artificial systems with the same capabilities. My laboratory uses a data-driven system identification approach to tackle this reverse-engineering problem. Our approach consists of four broad stages. First, we use functional MRI to measure brain activity while people watch naturalistic movies. We divide these data into two parts, one used to fit models and one for testing model predictions. Second, we use a system identification framework (based on multiple linearizing feature spaces) to model activity measured at each point in the brain. Third, we inspect the most accurate models to understand how the brain represents low-, mid- and high-level information in the movies. Finally, we use the estimated models to decode brain activity, reconstructing the structural and semantic content in the movies. Any effort to reverse-engineer the brain is inevitably limited by the spatial and temporal resolution of brain measurements, and at this time the resolution of human brain measurements is relatively poor. Still, as measurement technology progresses, this framework could inform development of biologically-inspired computer vision systems, and it could aid in development of practical new brain reading technologies.

Biography:

Jack Gallant is Chancellor's Professor of Psychology at the University of California at Berkeley. He is affiliated with the graduate programs in Bioengineering, Biophysics, Neuroscience and Vision Science. He received his Ph.D. from Yale University and did post-doctoral work at the California Institute of Technology and Washington University Medical School. His research program focuses on computational modeling of the human brain. These models accurately describe how the brain encodes information during complex, naturalistic tasks, and they show how information about the external and internal world is mapped systematically across the surface of the cerebral cortex. These models can also be used to decode information in the brain in order to reconstruct mental experiences. Gallant's brain decoding algorithm was one of Time Magazine's Inventions of the Year, and he appears frequently on radio and television. Further information about ongoing work in the Gallant lab, links to talks and papers, and links to an online interactive brain viewer.

Wednesday June 10, 2:00pm-3:30pm

Learning and Matching Local Features	Image and Video Processing and Restoration
Ballrooms A,B,C	Rooms 302,304,306
Domain-Size Pooling in Local Descriptors: DSP-SIFT	Generalized Video Deblurring for Dynamic Scenes
Learning Deep Representations for Ground-to-Aerial Geolocalization	Approximate Nearest Neighbor Fields in Video
Understanding Deep Image Representations by Inverting Them	Single Image Super-Resolution From Transformed Self-Exemplars
Situational Object Boundary Detection	L0TV: A New Method for Image Restoration in the Presence of Impulse Noise
Fast 2D Border Ownership Assignment	On Learning Optimized Reaction Diffusion Processes for Effective Image Restoration
A Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching	Fast and Flexible Convolutional Sparse Coding

Wednesday June 10, 3:30pm-6:00pm

Poster Session
	Session 3B, Exhibit Hall A
Poster #	Title and Authors
1	3D All The Way: Semantic Segmentation of Urban Scenes From Start to End in 3D
2	Fast Bilateral-Space Stereo for Synthetic Defocus
3	Large-Scale and Drift-Free Surface Reconstruction Using Online Subvolume Registration
4	Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization
5	LMI-Based 2D-3D Registration: From Uncalibrated Images to Euclidean Scene
6	Clique-Graph Matching by Preserving Global & Local Structure
7	Appearance-Based Gaze Estimation in the Wild
8	One-Day Outdoor Photometric Stereo via Skylight Estimation
9	A New Retraction for Accelerating the Riemannian Three-Factor Low-Rank Matrix Completion Algorithm
10	Heteroscedastic Max-Min Distance Analysis
11	Sparse Composite Quantization
12	Sparse Representation Classification With Manifold Constraints Transfer
13	CIDEr: Consensus-Based Image Description Evaluation
14	Joint Inference of Groups, Events and Human Roles in Aerial Videos
15	Photometric Stereo With Near Point Lighting: A Solution by Mesh Deformation
16	Efficient Label Collection for Unlabeled Image Datasets
17	Separating Objects and Clutter in Indoor Scenes
18	FaLRR: A Fast Low Rank Representation Solver
19	Simulating Makeup Through Physics-Based Manipulation of Intrinsic Image Layers
20	Correlation Filters With Limited Boundaries
21	Shape-Based Automatic Detection of a Large Number of 3D Facial Landmarks
22	Material Classification With Thermal Imagery
23	Deeply Learned Attributes for Crowded Scene Understanding
24	Learning to Look Up: Realtime Monocular Gaze Correction Using Machine Learning
25	Background Subtraction via Generalized Fused Lasso Foreground Modeling
26	Mirror, Mirror on the Wall, Tell Me, Is the Error Small?
27	Beyond Short Snippets: Deep Networks for Video Classification
28	segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection
29	Situational Object Boundary Detection
30	Real-Time 3D Head Pose and Facial Landmark Estimation From Depth Images Using Triangular Surface Patch Features
31	Aligning 3D Models to RGB-D Images of Cluttered Scenes
32	A Stable Multi-Scale Kernel for Topological Machine Learning
33	The Treasure Beneath Convolutional Layers: Cross-Convolutional-Layer Pooling for Image Classification
34	Face Video Retrieval With Image Query via Hashing Across Euclidean Space and Riemannian Manifold
35	EgoSampling: Fast-Forward and Stereo for Egocentric Videos
36	Social Saliency Prediction
37	Beyond Principal Components: Deep Boltzmann Machines for Face Modeling
38	Statistical Inference Models for Image Datasets With Systematic Variations
39	Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues
40	Superpixel-Based Video Object Segmentation Using Perceptual Organization and Location Prior
41	Robust Image Filtering Using Joint Static and Dynamic Guidance
42	Solving Multiple Square Jigsaw Puzzles With Missing Pieces
43	A Dynamic Convolutional Layer for Short Range Weather Prediction
44	SWIFT: Sparse Withdrawal of Inliers in a First Trial
45	VIP: Finding Important People in Images
46	Dataset Fingerprints: Exploring Image Collections Through Data Mining
47	Transport-Based Single Frame Super Resolution of Very Low Resolution Face Images
48	3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion
49	Deep Sparse Representation for Robust Image Registration
50	Real-Time Part-Based Visual Tracking via Adaptive Correlation Filters
51	Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains
52	HC-Search for Structured Prediction in Computer Vision
53	Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval
54	High-Speed Hyperspectral Video Acquisition With a Dual-Camera Architecture
55	More About VLAD: A Leap From Euclidean to Riemannian Manifolds
56	Camera Intrinsic Blur Kernel Estimation: A Reliable Framework
57	Classifier Learning With Hidden Information
58	Single Target Tracking Using Adaptive Clustered Decision Trees and Dynamic Multi-Level Appearance Models
59	Simultaneous Video Defogging and Stereo Reconstruction
60	Face Alignment by Coarse-to-Fine Shape Searching
61	Learning Deep Representations for Ground-to-Aerial Geolocalization
62	Unsupervised Simultaneous Orthogonal Basis Clustering Feature Selection
63	Space-Time Tree Ensemble for Action Recognition
64	Subgraph Decomposition for Multi-Target Tracking
65	Understanding Image Structure via Hierarchical Shape Parsing
66	Coarse-to-Fine Region Selection and Matching
67	Label Consistent Quadratic Surrogate Model for Visual Saliency Prediction
68	Subgraph Matching Using Compactness Prior for Robust Feature Correspondence
69	Pedestrian Detection Aided by Deep Learning Semantic Tasks
70	Multihypothesis Trajectory Analysis for Robust Visual Tracking
71	Domain-Size Pooling in Local Descriptors: DSP-SIFT
72	Object Detection by Labeling Superpixels
73	Fast 2D Border Ownership Assignment
74	From Single Image Query to Detailed 3D Reconstruction
75	Fast and Flexible Convolutional Sparse Coding
76	Iteratively Reweighted Graph Cut for Multi-Label MRFs With Non-Convex Priors
77	Pairwise Geometric Matching for Large-Scale Object Retrieval
78	Deep Convolutional Neural Fields for Depth Estimation From a Single Image
79	Data-Driven Sparsity-Based Restoration of JPEG-Compressed Images in Dual Transform-Pixel Domain
80	TVSum: Summarizing Web Videos Using Titles
81	Understanding Deep Image Representations by Inverting Them
82	Single Image Super-Resolution From Transformed Self-Exemplars
83	Constrained Planar Cuts - Object Partitioning for Point Clouds
84	A Weighted Sparse Coding Framework for Saliency Detection
85	Handling Motion Blur in Multi-Frame Super-Resolution
86	Approximate Nearest Neighbor Fields in Video
87	Inverting RANSAC: Global Model Detection via Inlier Rate Estimation
88	Robust Multi-Image Based Blind Face Hallucination
89	On Learning Optimized Reaction Diffusion Processes for Effective Image Restoration
90	A Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching
91	TILDE: A Temporally Invariant Learned DEtector
92	A Maximum Entropy Feature Descriptor for Age Invariant Face Recognition
93	Sense Discovery via Co-Clustering on Images and Text
94	An Approximate Shading Model for Object Relighting
95	Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes
96	A Convolutional Neural Network Cascade for Face Detection
97	Visual Vibrometry: Estimating Material Properties From Small Motion in Video
98	Jointly Learning Heterogeneous Features for RGB-D Activity Recognition
99	Convolutional Neural Networks at Constrained Time Cost
100	Fine-Grained Histopathological Image Analysis via Robust Segmentation and Large-Scale Retrieval
101	L0TV: A New Method for Image Restoration in the Presence of Impulse Noise
102	Modeling Video Evolution for Action Recognition
103	Long-Term Correlation Tracking
104	Joint Tracking and Segmentation of Multiple Targets
105	RGBD-Fusion: Real-Time High Precision Depth Recovery
106	Modeling Deformable Gradient Compositions for Single-Image Super-Resolution
107	Generalized Video Deblurring for Dynamic Scenes
108	Active Pictorial Structures
109	Ego-Surfing First-Person Videos
110	Visual Saliency Based on Multiscale Deep Features
111	Recovering Inner Slices of Translucent Objects by Multi-Frequency Illumination
112	Local High-Order Regularization on Data Manifolds
113	Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art
114	Curriculum Learning of Multiple Tasks
115	How Many Bits Does it Take for a Stimulus to Be Salient?
116	Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction
117	SOLD: Sub-Optimal Low-rank Decomposition for Efficient Video Segmentation
118	On the Appearance of Translucent Edges
119	On Pairwise Costs for Network Flow Multi-Object Tracking
120	Fine-Grained Recognition Without Part Annotations
121	Robust Reconstruction of Indoor Scenes
