Posts

Showing posts from December, 2019

Deep Leakage from Gradients

Venue: NeurIPS 2019 Authors: Ligeng Zhu, Zhijian Liu, and Song Han Introduction. As deep learning is deployed across a variety of fields and at scale, new problems are emerging. In distributed learning, where sharing training data is not an option, each participant trains a shared global model on its local data and communicates only the gradients to the other participants. The gradients are then averaged and applied to the global model. This setup is very similar to federated learning. In such a setting, a malicious attacker can intercept the gradients and try to extract information about the data from which they were produced. This paper proposes an attack, called deep leakage from gradients (DLG), which can recover pixel-level information for image classification tasks and token-level information for NLP tasks from the gradients alone. The authors also propose two defense techniques that their attack cannot break. Deep Leakage ...
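A minimal sketch of the gradient-matching idea behind DLG, on a toy linear model rather than the paper's deep networks (the setup, names, and optimizer here are illustrative, not the paper's): the attacker sees only a gradient g and optimizes dummy data (x, y) until its gradient matches g.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)            # shared model weights (known to attacker)
x_true = rng.normal(size=4)       # private input
y_true = 1.0                      # private label

def model_grad(x, y):
    """Gradient of the loss (w.x - y)^2 with respect to w."""
    return 2.0 * (w @ x - y) * x

g_obs = model_grad(x_true, y_true)    # the gradient the attacker intercepts

def match(z):
    """Distance between the dummy data's gradient and the observed one."""
    d = model_grad(z[:4], z[4]) - g_obs
    return float(d @ d)

def num_grad(f, z, eps=1e-6):
    """Forward-difference numerical gradient of f at z."""
    g = np.zeros_like(z)
    for i in range(len(z)):
        zp = z.copy(); zp[i] += eps
        g[i] = (f(zp) - f(z)) / eps
    return g

z = np.append(rng.normal(size=4), 0.0)   # dummy (x, y) initialization
start = match(z)
for _ in range(300):
    g = num_grad(match, z)
    step, cur = 0.5, match(z)
    while step > 1e-12 and match(z - step * g) > cur:  # backtracking line search
        step *= 0.5
    z = z - step * g
final = match(z)
print(f"gradient-matching loss: {start:.4f} -> {final:.2e}")
```

Driving the matching loss toward zero forces the dummy data to produce the same gradient as the private data, which is the leakage the paper exploits.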

Sparse Tensor Core: Algorithm and Hardware Co-Design for Vector-wise Sparse Neural Networks on Modern GPUs

Authors: Maohua Zhu, Tao Zhang, Zhenyu Gu, Yuan Xie Venue: MICRO 2019 Introduction. The existence of sparsity in the weights and activations of neural networks is common knowledge. Many works have been proposed to exploit this inherent sparsity in order to achieve speed and energy gains. Existing sparsity techniques can be classified into two categories: (1) generic or fine-grained sparsity, and (2) unified or coarse-grained sparsity. Generic sparsity achieves very high sparsity with little to no loss of accuracy, but suffers from irregular memory accesses, often resulting in a net slowdown. Unified sparsity, also called structured sparsity, takes a more structured approach to sparsifying NNs and improves performance, but impacts accuracy negatively. From a GPU standpoint, generic sparsity is 18% slower than its dense counterpart and cannot make use of the modern Tensor Cores. Unified sparsity, on the other hand, provides a 50% speedup compared...
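A hedged sketch of vector-wise pruning at the granularity described above (my reading of the idea, not the paper's exact algorithm): each weight row is split into fixed-length vectors, and only the k largest-magnitude entries in every vector are kept, so each vector ends up with the same hardware-friendly number of nonzeros.

```python
import numpy as np

def vectorwise_prune(W, L=8, k=2):
    """Keep the top-k |values| in every length-L vector of each row of W."""
    W = W.copy()
    rows, cols = W.shape
    assert cols % L == 0, "row length must be a multiple of the vector length"
    for r in range(rows):
        for c in range(0, cols, L):
            v = W[r, c:c+L]                      # view into the copied matrix
            keep = np.argsort(np.abs(v))[-k:]    # indices of the k largest magnitudes
            mask = np.zeros(L, dtype=bool)
            mask[keep] = True
            v[~mask] = 0.0                       # zero out the rest in place
    return W

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 16))
Wp = vectorwise_prune(W, L=8, k=2)
sparsity = float((Wp == 0).mean())
print(f"sparsity after pruning: {sparsity:.2f}")   # 1 - k/L = 0.75
```

Because every vector keeps exactly k values, the nonzero pattern is regular enough to pack into dense tiles, which is what makes this granularity amenable to Tensor Cores.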

Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler

Authors: Yu Ji, Youhui Zhang, Wenguang Chen, Yuan Xie Venue: ASPLOS 2018 With the machine learning community trying to push the limits of neural networks on one hand, and the architecture community proposing its own constraints and dataflows to accelerate neural networks on the other, this paper tries to bridge the gap between the two communities with a neural network compiler. Its main aim is to run a given neural network on given hardware, no matter what the constraints are. The authors achieve this by modelling the target NN as a computational graph, restructuring it based on the constraints of the target architecture, and fine-tuning the graph to minimise accuracy loss. One of the main conflicts between the NN and the hardware is the precision of inputs. The paper addresses this by using an autoencoder network that produces a low-precision representation of the inputs. The accuracy loss incurred in translating to low-precision values can be...
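The precision conflict can be made concrete with a naive uniform quantizer (the paper trains an autoencoder to do this mapping better; this sketch only illustrates the baseline round-trip error that such a learned encoding is meant to reduce):

```python
import numpy as np

def quantize(x, n_bits, lo=-1.0, hi=1.0):
    """Map x in [lo, hi] to one of 2**n_bits uniform levels and back."""
    levels = 2 ** n_bits - 1
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=1000)
for n in (2, 4, 8):
    err = np.abs(quantize(x, n) - x).max()
    print(f"{n}-bit round trip, max error {err:.4f}")
```

The round-trip error shrinks by roughly half per extra bit (it is bounded by half a quantization step), which is the accuracy/precision trade-off the compiler has to negotiate with the hardware.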

Scaling Datacenter Accelerators with Computation Reuse Architectures

Authors: Adi Fuchs, David Wentzlaff. Princeton University Venue: ISCA 2018 Being the third paper at ISCA-18 that exploits input redundancy in one way or another (after EVA2 and Euphrates), COREx (COmputation-REuse Accelerators) proposes an effective idea to improve the speedup and energy efficiency of datacenters. The paper is motivated by the manifestation of Zipf's law in datacenter workloads such as internet traffic and data compression. As the paper title suggests, COREx stores the inputs and outputs of common kernels, and skips computation by sending the stored output to the host if the current input matches a stored input. They define this storing step as "memoization". Trading communication for computation, this work is the exact opposite of AMNESIAC (published at ASPLOS-17), which trades computation for communication. They define 3 constraints that need to be satisfied in order for memoization to be successful. (1) Correct results: Memoization must p...
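The reuse idea can be sketched as a content-addressed lookup table in front of a kernel (the table keying, eviction, and kernel here are my illustration, not COREx's hardware design): on a hit the stored output is returned and the computation is skipped entirely.

```python
def expensive_kernel(data):
    """Stand-in for an accelerator kernel: sum of squares."""
    return sum(d * d for d in data)

table = {}          # memoization table: input contents -> stored output
hits = misses = 0

def memoized(data):
    global hits, misses
    key = tuple(data)               # content-addressed lookup key
    if key in table:
        hits += 1                   # reuse: skip the computation
        return table[key]
    misses += 1
    out = expensive_kernel(data)    # compute once, store for later
    table[key] = out
    return out

# Zipf-like request stream: a few inputs dominate, so reuse pays off.
stream = [[1, 2, 3], [4, 5, 6], [1, 2, 3], [1, 2, 3], [4, 5, 6]]
results = [memoized(d) for d in stream]
print(f"hits={hits} misses={misses}")   # hits=3 misses=2
```

The more skewed the input distribution (Zipf's law), the higher the hit rate, which is exactly why the paper targets datacenter workloads.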

ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions

Authors: Berkin Akin, Zeshan A. Chishti, and Alaa R. Alameldeen Venue: MICRO 2019 With accelerators dominating the deep learning space in architecture conferences, this paper stands out as it focuses on reducing DNN inference/training overhead on a CPU. An obvious question is: with the widespread use of GPUs/TPUs for deep learning, why should we focus on optimizing CPUs for it? In a recent paper published at HPCA 2019, Facebook claims that CPUs are preferred for applications where tight integration is required between DNN and non-DNN tasks. Also, Intel's recent AVX-512 has specialized support for DNNs in the form of new instructions called Vector Neural Network Instructions (VNNI). Broadly, optimizing DNNs can be viewed from two perspectives: computation and communication. This paper targets reducing communication overhead, more specifically activation (feature-map) communication overhead, by compressing them. Compressing activations/weights...
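A hedged sketch of why compressing activations pays off (my illustration of the underlying idea, not ZCOMP's vectorized on-chip format): post-ReLU feature maps are mostly zeros, so storing a presence bitmap plus only the nonzero values shrinks the cross-layer memory footprint, losslessly.

```python
import numpy as np

def compress(act):
    """Split an activation array into a nonzero bitmap and the nonzero values."""
    mask = act != 0
    return mask, act[mask]

def decompress(mask, values):
    """Rebuild the dense activation array from bitmap + values."""
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values
    return out

rng = np.random.default_rng(3)
act = np.maximum(rng.normal(size=1024), 0.0).astype(np.float32)  # ReLU output
mask, values = compress(act)
restored = decompress(mask, values)

orig_bytes = act.nbytes
comp_bytes = mask.size // 8 + values.nbytes   # 1 bit per element + nonzeros
print(f"{orig_bytes} B -> {comp_bytes} B, lossless={np.array_equal(act, restored)}")
```

With roughly half the elements zeroed by ReLU, the compressed form already saves close to half the traffic; real feature maps are often far sparser, and ZCOMP's contribution is doing this transformation efficiently with vector instructions.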