Notable References
- Breakthroughs in matching and recommendation algorithms by Alibaba
- Caffe2
- Criteo dataset
- DeepBench
- MLPerf
- MovieLens 20M dataset
- PyTorch
- TensorFlow: A system for large-scale machine learning
- Fathom: Reference workloads for modern deep learning methods
- Deep Speech 2: End-to-end speech recognition in English and Mandarin
- Exploring neural transducers for end-to-end speech recognition
- Jigsaw: Scalable software-defined caches
- Lessons learned at Instagram Stories and Feed Machine Learning
- Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks
- DaDianNao: A machine-learning supercomputer
- Wide & deep learning for recommender systems
- Notes from the AI frontier: Insights from hundreds of use cases
- Serving DNNs in real time at datacenter scale with Project Brainwave
- Empirical evaluation of gated recurrent neural networks on sequence modeling
- DAWNBench: An end-to-end deep learning benchmark and competition
- Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1
- Deep neural networks for YouTube recommendations
- The tail at scale
- Coordinated control of multiple prefetchers in multi-core systems
- Bandana: Using non-volatile memory for storing deep learning models
- The Netflix Recommender System: Algorithms, Business Value, and Innovation
- Accurate, large minibatch SGD: Training ImageNet in 1 hour
- Deep learning with limited numerical precision
- EIE: Efficient inference engine on compressed deep neural network
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- Deep Speech: Scaling up end-to-end speech recognition
- Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
- Deep residual learning for image recognition
- Neural Collaborative Filtering
- High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches
- In-Datacenter Performance Analysis of a Tensor Processing Unit
- Stripes: Bit-serial deep neural network computing
- Profiling a warehouse-scale computer
- RecNMP: Accelerating personalized recommendation with near-memory processing
- TensorDIMM: A practical near-memory processing architecture for embeddings and tensor operations in deep learning
- Whare-Map: Heterogeneity in homogeneous warehouse-scale computers
- MLPerf Training Benchmark
- On the dimensionality of embeddings for sparse features and data
- Deep Learning Recommendation Model for Personalization and Recommendation Systems
- Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
- Minerva: Enabling low-power, highly-accurate deep neural network accelerators
- You Only Look Once: Unified, Real-Time Object Detection
- Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor
- Simultaneous multithreading: Maximizing on-chip parallelism
- Use Cases of Recommendation Systems in Current Applications and Methods
- Deep & Cross Network for Ad Click Predictions
- Machine Learning at Facebook: Understanding Inference at the Edge
- PACMan: Prefetch-aware cache management for high performance caching
- Characterization and dynamic mitigation of intra-application cache interference
- Personalized recommendation systems: Five hot research topics you must know
- OpenRec: A modular framework for extensible and adaptable recommendation algorithms
- ImageNet training in minutes
- Cambricon-X: An accelerator for sparse neural networks
- Benchmarking and analyzing deep neural network training