Neural networks and deep learning methods continue to intrigue the community because of the consistently strong performance they deliver. As architectures grow increasingly complex, discerning the reasons for that performance becomes exponentially harder. The trade-off between computational cost and accuracy, hyper-parameter tuning, and related issues are significant and worth understanding; doing so builds the scientific temper behind a remarkable machine that began its journey with perceptron learning. Besides state-of-the-art architectures, the course offers a unique perspective by fusing advanced metaheuristics with deep learning methods. It dives deep into the origin of activation functions and explores loss functions, back-propagation, and feature extraction for parsimonious computing. The focus is on exploiting fundamental concepts to arrive at elegant, accurate, and computationally cheaper solutions.
The structure of the course is designed to fulfill this ambitious goal.

Day-01: Foundations for deep learning: Optimization principles

Stochastic gradient descent for loss minimization, momentum, and interpreting MLE in "derivative-centered" optimization; quick recap of back-propagation; "derivative-free" optimization; gentle introduction to PSO and Quantum PSO (QPSO); PSO/QPSO as viable alternatives to gradient learning; establishing the equivalence between GD and PSO, and between stochastic GD and QPSO; the trade-off between accuracy and convergence rate; the Moore-Penrose inverse approach in back-propagation; benchmark problems and benchmark functions for optimization; performance measures: cross-validation/holdout and confusion matrices.
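
Since PSO is positioned here as a derivative-free alternative to gradient learning, a minimal sketch may help fix ideas before Day 1. The following is an illustrative implementation of canonical PSO on the sphere benchmark f(x) = sum(x_i^2); the hyper-parameters (inertia w, cognitive/social weights c1, c2) are common textbook defaults, not values prescribed by the course.

```python
import numpy as np

def sphere(x):
    # Sphere benchmark: f(x) = sum of squared coordinates, minimum 0 at the origin.
    return np.sum(x ** 2, axis=-1)

def pso(f, dim=2, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                   # each particle's personal best position
    pbest_val = f(pbest)
    gbest = pbest[np.argmin(pbest_val)]  # swarm-wide best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + cognitive pull (pbest) + social pull (gbest)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        val = f(pos)
        improved = val < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest, float(np.sum(gbest ** 2))

best_x, best_f = pso(sphere)
```

Note that no gradient of f is ever evaluated, which is exactly what makes PSO a candidate replacement for gradient-based learning on non-differentiable losses.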

Day-02: Graduating from linearly separable data to non-linearity: Verification exercise

A standard DL architecture with a loss function and a soft-max layer; hidden layers: the underlying mathematical framework; what is it, and why does it work the way it does? Rationale behind hyperparameter tuning.
How can we compute large, adaptive learning rates, and why do they matter? Analytical insights into the loss function; loss functions and learning-rate computation; how can we get the system to perform better, faster?
Comparison with fixed learning rates; performance comparison on a DL architecture with several loss functions: MSE, Hausdorff, binary cross-entropy, and a loss function for multi-class classification; adaptive RMSProp; hands-on experiments with data; adaptive LR on ResNet; QUIZ
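
Two of the loss functions compared on Day 2 can be written in a few lines; the sketch below shows MSE and binary cross-entropy in NumPy. This is illustrative only; the clipping constant eps is an assumption added to keep log(0) finite, not part of the course's reference code.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average of squared residuals.
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logs stay finite.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y = np.array([1.0, 0.0, 1.0, 0.0])   # ground-truth labels
p = np.array([0.9, 0.1, 0.8, 0.2])   # predicted probabilities
```

MSE penalizes residuals quadratically regardless of scale, while cross-entropy grows without bound as a confident prediction approaches the wrong label, which is one reason it pairs better with a soft-max output layer.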

Day-03: What do the networks ‘see’? Understanding convergence

How do we obtain the adaptive learning rate formula? Deep dive into analytical properties of the loss function; the Mean Value Theorem (MVT) and applications; Lipschitz continuity; regularization; Lipschitz continuity and the MVT; sample derivation of an adaptive learning rate; evolution to cross-entropy for multi-class problems; performance gain and system utilization metrics; hands-on exploration on covert data and PHL-EC data; QUIZ
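
One standard route from Lipschitz continuity to a learning-rate bound is the descent lemma; the derivation below is a hedged sketch of that textbook argument, not necessarily the exact adaptive-learning-rate derivation presented in the session.

```latex
% If \nabla f is L-Lipschitz, a quadratic upper bound on f gives, for the
% gradient step w' = w - \eta \nabla f(w):
f(w') \;\le\; f(w) \;-\; \eta\left(1 - \frac{\eta L}{2}\right)\|\nabla f(w)\|^{2}.
% The right-hand side is minimized at \eta = 1/L, which yields the
% guaranteed per-step decrease
f\!\left(w - \tfrac{1}{L}\,\nabla f(w)\right) \;\le\; f(w) \;-\; \frac{1}{2L}\,\|\nabla f(w)\|^{2}.
```

This is why estimating (a local) Lipschitz constant of the gradient immediately suggests a principled, adaptive step size rather than a hand-tuned constant.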

Day-04: “Special” Activation functions

Gentle recap of ordinary differential equations; the Picard condition; Banach spaces; the contraction mapping principle; an activation function as the solution to an ODE; SBAF, a "special" activation function; power series expansion of SBAF; sigmoid as a special case of SBAF; SBAF as a fixed point; gentle introduction to stability and first return maps of SBAF fixed points; parameter "fixing" through stability theory; hands-on exploration on SDSS-GALEX and PHL-EC data; QUIZ
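
SBAF is often written in the literature as f(x) = 1 / (1 + k * x^alpha * (1 - x)^(1 - alpha)) on 0 < x < 1; the sketch below uses that form with illustrative parameter values (k and alpha here are assumptions, not the tuned values from the course) and demonstrates the fixed-point iteration that the stability discussion builds on.

```python
import numpy as np

def sbaf(x, k=0.91, alpha=0.5):
    # SBAF in a commonly cited form (hedged sketch):
    #   f(x) = 1 / (1 + k * x**alpha * (1 - x)**(1 - alpha)),  for 0 < x < 1.
    # k and alpha are illustrative values, not course-prescribed parameters.
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + k * x ** alpha * (1.0 - x) ** (1.0 - alpha))

# Fixed-point iteration x_{n+1} = f(x_n): if f is a contraction near x*,
# the sequence converges to the fixed point x* = f(x*).
x_star = 0.5
for _ in range(50):
    x_star = float(sbaf(x_star))
```

The rapid convergence of this iteration is what the contraction mapping principle predicts, and the magnitude of f'(x*) is what the stability/first-return-map discussion examines.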

Day-05: Keep the design simple: more on Activation functions

Gradient training of SBAF parameters to match values obtained from Picard iteration; relating SBAF to binary logistic regression: regression under uncertainty; the Cybenko approximation for SBAF; understanding ReLU and its approximations; limitations of ReLU and sigmoid, and improvements by special sibling units, A-ReLU and SBAF; approximating ReLU with a continuous sibling, A-ReLU; error approximation and estimation of A-ReLU parameters; differentiability of A-ReLU and justification for A-ReLU as an independent activation unit; representational power of SBAF and A-ReLU; system utility benchmarking comparisons; OPEN BOOK WRITTEN TEST
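
As a rough illustration of A-ReLU's shape, the sketch below implements a unit of the form k * x^n on the positive half-line together with its derivative for x > 0. The values of k and n are illustrative assumptions, not the fitted parameters estimated in the course.

```python
import numpy as np

def arelu(x, k=0.5, n=1.2):
    # A-ReLU-style unit: k * x**n for x > 0, and 0 otherwise.
    # k and n are illustrative, not the course's fitted values.
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, k * np.clip(x, 0.0, None) ** n, 0.0)

def arelu_grad(x, k=0.5, n=1.2):
    # Derivative k * n * x**(n-1) for x > 0; for n > 1 the one-sided
    # derivative at 0 is 0, so the unit is smoother at the origin than ReLU.
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, k * n * np.clip(x, 1e-12, None) ** (n - 1), 0.0)
```

Unlike ReLU, whose derivative jumps from 0 to 1 at the origin, this continuous sibling has a derivative that vanishes at 0 when n > 1, which is part of the differentiability argument for treating it as an independent activation unit.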

Evaluation and Assessment

Assessment will be based on hands-on assignments throughout the course (assigned on Days 1 through 4), a quiz component (Days 2 through 4), and a written test on Day 5.

Quizzes – 10 points (x 4 days) = 40 points
Assignments – 10 points (x 4 days) = 40 points
Written Test – 20 points (Day 5) = 20 points
Total grade = 100 points