Deep learning

For deep versus shallow learning in educational psychology, see Student approaches to learning.
For more information, see Artificial neural network.
Deep learning (also known as deep structured learning or hierarchical learning) is part of
a broader family of machine learning methods based on learning data representations,
as opposed to task-specific algorithms. Learning can be supervised, semi-supervised
or unsupervised.[1][2][3]
Deep learning architectures such as deep neural networks, deep belief networks and
recurrent neural networks have been applied to fields including computer vision,
speech recognition, natural language processing, audio recognition, social network
filtering, machine translation, bioinformatics and drug design,[4] where they have
produced results comparable to and in some cases superior[5] to human experts.[6]
Deep learning models are vaguely inspired by information processing and communication patterns
in biological nervous systems, yet they differ in various ways from the structural and
functional properties of biological brains, which makes them incompatible with
neuroscience evidence.[7][8][9]
Contents
1 Definition
2 Overview
3 Interpretations
4 History
4.1 Deep learning revolution
5 Artificial neural networks
6 Deep neural networks
6.1 Challenges
7 Applications
7.1 Automatic speech recognition
7.2 Image recognition
7.3 Visual art processing
7.4 Natural language processing
7.5 Reliability of infrastructure systems
7.6 Drug discovery and toxicology
7.7 Customer relationship management
7.8 Recommendation systems
7.9 Bioinformatics
7.10 Mobile advertising
7.11 Image restoration
8 Relation to human cognitive and brain development
9 Commercial activity
10 Criticism and comment
10.1 Theory
10.2 Errors
10.3 Cyberthreat
11 See also
12 References
13 External links
Definition
Deep learning is a class of machine learning algorithms that:[10](pp199–200)
- use a cascade of multiple layers of nonlinear processing units for feature
  extraction and transformation, where each successive layer uses the output from the
  previous layer as input;
- learn in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners;
- learn multiple levels of representations that correspond to different levels of abstraction;
  the levels form a hierarchy of concepts.
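A minimal sketch of the first property, assuming Python with NumPy (the layer sizes and
the tanh nonlinearity are illustrative choices, not part of the definition):

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, w, b):
        # One nonlinear processing unit: affine transformation, then tanh.
        return np.tanh(x @ w + b)

    # A cascade of three layers; each successive layer uses the output
    # from the previous layer as input.
    sizes = [8, 16, 16, 4]        # input dim, two hidden dims, output dim
    weights = [rng.normal(size=(m, n)) for m, n in zip(sizes, sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]

    x = rng.normal(size=8)        # raw input features
    for w, b in zip(weights, biases):
        x = layer(x, w, b)        # previous output becomes the next input
    print(x.shape)                # (4,) -- the final representation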
Overview
Most modern deep learning models are based on an artificial neural network, although they
can also include propositional formulas[11] or latent variables organized layer-wise in deep
generative models such as the nodes in Deep Belief Networks and Deep Boltzmann Machines.
In deep learning, each level learns to transform its input data into a slightly more abstract
and composite representation. In an image recognition application, the raw input may be
a matrix of pixels; the first representational layer may abstract the pixels and encode edges;
the second layer may compose and encode arrangements of edges; the third layer may encode a nose
and eyes; and the fourth layer may recognize that the image contains a face. Importantly,
a deep learning process can learn which features to optimally place in which level on its own.
(Of course, this does not completely obviate the need for hand-tuning; for example,
varying numbers of layers and layer sizes can provide different degrees of abstraction.)[1][12]
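The layered abstraction described above can be sketched, for illustration only, as a stack
of convolutional stages (assuming PyTorch; the channel counts are arbitrary, and no claim
is made that the stages learn exactly edges, noses or faces):

    import torch
    import torch.nn as nn

    # Each stage sees a larger effective receptive field than the one below,
    # so later stages can compose lower-level features into larger parts.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),   # local structure, e.g. edges
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # arrangements of edges
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),  # larger object parts
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 2),                                        # e.g. face / no face
    )
    x = torch.rand(1, 3, 64, 64)     # a dummy RGB image
    print(model(x).shape)            # torch.Size([1, 2])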
The "deep" in "deep learning" refers to the number of layers through which the data is transformed.
More precisely, deep learning systems have a substantial credit assignment path (CAP) depth.
The CAP is the chain of transformations from input to output. CAPs describe potentially causal
connections between input and output. For a feedforward neural network, the depth of the CAPs
is that of the network and is the number of hidden layers plus one (as the output layer is
also parameterized). For recurrent neural networks, in which a signal may propagate through
a layer more than once, the CAP depth is potentially unlimited.[2] No universally agreed-upon
threshold of depth divides shallow learning from deep learning, but most researchers
agree that deep learning involves CAP depth > 2. A CAP of depth 2 has been shown to be
a universal approximator in the sense that it can emulate any function.[citation needed]
Beyond that, more layers do not add to the network's function-approximation ability.
The extra layers help in learning features.
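The counting rule for feedforward networks can be made concrete in a few lines (plain
Python; describing the network as a list of layer sizes is an assumption of this sketch):

    def cap_depth(layer_sizes):
        # layer_sizes = [input, hidden_1, ..., hidden_k, output].
        # CAP depth is the number of hidden layers plus one, because the
        # output layer is also parameterized.
        return (len(layer_sizes) - 2) + 1

    print(cap_depth([8, 16, 16, 4]))  # 3, i.e. "deep" under the CAP depth > 2 convention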
Deep learning architectures are often constructed with a greedy layer-by-layer
method.[citation needed] Deep learning helps to disentangle these abstractions and
pick out which features improve performance.[1]
For supervised learning tasks, deep learning methods obviate feature engineering
by translating the data into compact intermediate representations akin to principal
components, and they derive layered structures that remove redundancy in the representation.
Deep learning algorithms can be applied to unsupervised learning tasks.
This is an important benefit because unlabeled data are more abundant than labeled data.
Examples of deep structures that can be trained in an unsupervised manner are neural
history compressors[13] and deep belief networks.[1][14]
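A sketch of this greedy, layer-by-layer training, here using stacked autoencoders as the
unsupervised building block (assuming PyTorch; the cited deep belief networks use
restricted Boltzmann machines instead, but the greedy training loop has the same shape):

    import torch
    import torch.nn as nn

    def pretrain_greedily(data, sizes, epochs=50):
        # Fit one autoencoder per layer; each layer trains on the codes
        # produced by the already-trained layers below it.
        encoders, x = [], data
        for d_in, d_out in zip(sizes, sizes[1:]):
            enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
            dec = nn.Linear(d_out, d_in)
            opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()))
            for _ in range(epochs):
                opt.zero_grad()
                loss = nn.functional.mse_loss(dec(enc(x)), x)  # reconstruction error
                loss.backward()
                opt.step()
            encoders.append(enc)
            x = enc(x).detach()   # codes become the next layer's training data
        return encoders           # the stack can then be fine-tuned with backpropagation

    unlabeled = torch.rand(256, 64)   # toy unlabeled data
    stack = pretrain_greedily(unlabeled, [64, 32, 16])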
Interpretations
Deep neural networks are generally interpreted in terms of the universal
approximation theorem[15][16][17][18][19] or probabilistic inference.[10][11][1][2][14][20][21]
The universal approximation theorem concerns the capacity of feedforward
neural networks with a single hidden layer of finite size to approximate
continuous functions.[15][16][17][18][19] In 1989, the first proof was
published by George Cybenko for sigmoid activation functions[16] and was
generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik.[17]
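In Cybenko's single-hidden-layer form, the statement reads, in standard notation: for any
continuous function f on [0,1]^n and any ε > 0, there exist a finite N and parameters
α_i, w_i, b_i such that

    \sup_{x \in [0,1]^n} \left| f(x) - \sum_{i=1}^{N} \alpha_i \,
        \sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon ,

where σ is a sigmoidal activation function.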
The probabilistic interpretation[20] derives from the field of machine
learning. It features inference,[10][11][1][2][14][20] as well as the
optimization concepts of training and testing, related to fitting and
generalization, respectively. More specifically, the probabilistic
interpretation considers the activation nonlinearity as a cumulative
distribution function.[20] The probabilistic interpretation led to the
introduction of dropout as a regularizer in neural networks.[22] The probabilistic
interpretation was introduced by researchers including Hopfield, Widrow and Narendra
and popularized in surveys such as the one by Bishop.[23]
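A minimal sketch of dropout at training time ("inverted" dropout in NumPy; the rate of 0.5
is a conventional choice rather than anything prescribed by the sources above):

    import numpy as np

    def dropout(x, rate=0.5, training=True, rng=np.random.default_rng(0)):
        # During training, zero each activation with probability `rate` and
        # scale the survivors so the expected activation is unchanged;
        # at test time the layer is the identity.
        if not training:
            return x
        mask = rng.random(x.shape) >= rate
        return x * mask / (1.0 - rate)

    h = np.ones((2, 4))
    print(dropout(h))   # roughly half the entries zeroed, the rest scaled to 2.0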
History
The term Deep Learning was introduced to the machine learning community by Rina Dechter
in 1986,[24][13] and to artificial neural networks by Igor Aizenberg and colleagues in 2000,
in the context of Boolean threshold neurons.[25][26]
The first general, working learning algorithm for supervised, deep, feedforward,
multilayer perceptrons was published by Alexey Ivakhnenko and Lapa in 1965.[27] A 1971
paper described a deep network with 8 layers trained by the group method of data
handling algorithm.[28]
Other deep learning working architectures, specifically those built for computer
vision, began with the Neocognitron introduced by Kunihiko Fukushima in
1980.[29] In 1989, Yann LeCun et al. applied the standard backpropagation
algorithm, which had been around as the reverse mode of automatic differentiation
since 1970,[30][31][32][33] to a deep neural network with the purpose
of recognizing handwritten ZIP codes on mail. While the algorithm worked,
training required 3 days.[34]
By 1991 such systems were used for recognizing isolated 2-D hand-written digits,
while recognizing 3-D objects was done by matching 2-D images with a handcrafted
3-D object model. Weng et al. suggested that a human brain does not use a
monolithic 3-D object model and in 1992 they published Cresceptron,[35][36][37]
a method for performing 3-D object recognition in cluttered scenes. Cresceptron
is a cascade of layers similar to Neocognitron. But while Neocognitron required
a human programmer to hand-merge features, Cresceptron learned an open number of
features in each layer without supervision, where each feature is represented by
a convolution kernel. Cresceptron segmented each learned object from a cluttered
scene by back-analysis through the network. Max pooling, now often adopted by
deep neural networks (e.g. in ImageNet tests), was first used in Cresceptron to reduce
the position resolution, mapping each 2×2 block of positions to one value through the
cascade for better generalization.
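The 2×2 max-pooling operation itself is easy to state (a NumPy sketch of the operation,
not of Cresceptron's implementation):

    import numpy as np

    def max_pool_2x2(x):
        # Halve each spatial dimension by keeping the maximum of every
        # non-overlapping 2x2 block (height and width assumed even).
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.arange(16).reshape(4, 4)
    print(max_pool_2x2(x))   # [[ 5  7]
                             #  [13 15]]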
In 1994, André de Carvalho, together with Mike Fairhurst and David Bisset,
published experimental results of a multi-layer boolean neural network, also known
as a weightless neural network, composed of a 3-layer self-organising feature
extraction neural network module (SOFT) followed by a multi-layer classification
neural network module (GSN); the two modules were trained independently. Each layer
in the feature extraction module extracted features of growing complexity relative to
the previous layer.[38]
In 1995, Brendan Frey demonstrated that it was possible to train (over two days)
a network containing six fully connected layers and several hundred hidden units
using the wake-sleep algorithm, co-developed with Peter Dayan and Hinton.[39] Many
factors contributed to the slow speed, including the vanishing gradient problem
analyzed in 1991 by Sepp Hochreiter.[40][41]
Simpler models that use task-specific handcrafted features such as Gabor filters
and support vector machines (SVMs) were a popular choice in the 1990s and 2000s,
because of ANNs' computational cost and a lack of understanding of how the brain
wires its biological networks.
Both shallow and deep learning (e.g., recurrent nets) of ANNs have been explored
for many years.[42][43][44] These methods never outperformed non-uniform
internal-handcrafting Gaussian mixture model/Hidden Markov model (GMM-HMM)
technology based on generative models of speech trained discriminatively.[45]
Key difficulties have been analyzed, including gradient diminishing[40] and
weak temporal correlation structure in neural predictive models.[46][47]
Additional difficulties were the lack of training data and limited computing power.
Most speech recognition researchers moved away from neural nets to pursue generative
modeling. An exception was at SRI International in the late 1990s.
Funded by the US government's NSA and DARPA, SRI studied deep neural
networks in speech and speaker recognition. Heck's speaker recognition
team achieved the first significant success with deep neural networks in
speech processing in the 1998 National Institute of Standards and Technology
Speaker Recognition evaluation.[48] While SRI experienced success with deep
neural networks in speaker recognition, they were unsuccessful in demonstrating
similar success in speech recognition. The principle of elevating "raw" features
over hand-crafted optimization was first explored successfully in the architecture
of deep autoencoder on the "raw" spectrogram or linear filter-bank features in the
late 1990s,[48] showing its superiority over the Mel-Cepstral features that contain
stages of fixed transformation from spectrograms. The raw features of speech,
waveforms, later produced excellent larger-scale results.[49]
Many aspects of speech recognition were taken over by a deep learning method called
Long short-term memory (LSTM), a recurrent neural network published by
Hochreiter and Schmidhuber in 1997.[50] LSTM RNNs avoid the vanishing gradient
problem and can learn "Very Deep Learning" tasks[2] that require memories of
events that happened thousands of discrete time steps before, which is important
for speech. In 2003, LSTM started to become competitive with traditional
speech recognizers on certain tasks.[51] Later it was combined with connectionist
temporal classification (CTC)[52] in stacks of LSTM RNNs.[53] In 2015, Google's
speech recognition reportedly experienced a dramatic performance jump of 49%
through CTC-trained LSTM, which they made available through Google Voice Search.[54]
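As a rough illustration of the building block involved (PyTorch's nn.LSTM standing in
for the LSTM RNNs described here; the dimensions are arbitrary):

    import torch
    import torch.nn as nn

    # The LSTM's gated memory cell lets gradients flow across many time
    # steps without vanishing, which matters for long speech utterances.
    lstm = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True)
    frames = torch.rand(1, 1000, 40)     # e.g. 1000 acoustic feature frames
    outputs, (h_n, c_n) = lstm(frames)   # one output vector per time step
    print(outputs.shape)                 # torch.Size([1, 1000, 128])

In the CTC setups described above, a CTC loss would then be applied to per-frame label
scores derived from these outputs.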
In 2006, publications by Geoff Hinton, Ruslan Salakhutdinov, Osindero and Teh[55]
[56][57] showed how a many-layered feedforward neural network could be
effectively pre-trained one layer at a time, treating each layer in turn
as an unsupervised restricted Boltzmann machine, then fine-tuning it using
supervised backpropagation.[58] The papers referred to learning for deep belief nets.
Deep learning is part of state-of-the-art systems in various disciplines,
particularly computer vision and automatic speech recognition (ASR). Results on
commonly used evaluation sets such as TIMIT (ASR) and MNIST (image classification),
as well as a range of large-vocabulary speech recognition tasks have steadily improved.
[59][60][61] Convolutional neural networks (CNNs) were superseded for ASR by CTC[52] for
LSTM,[50][54][62][63][64][65][66] but are more successful in computer vision.
The impact of deep learning in industry began in the early 2000s, when CNNs already processed
an estimated 10% to 20% of all the checks written in the US, according
to Yann LeCun.[67] Industrial applications of deep learning to large-scale
speech recognition started around 2010.
The 2009 NIPS Workshop on Deep Learning for Speech Recognition[68] was
motivated by the limitations of deep generative models of speech, and the
possibility that, given more capable hardware and large-scale data sets,
deep neural nets (DNN) might become practical. It was believed that pre-training
DNNs using generative models of deep belief nets (DBN) would overcome the main
difficulties of neural nets.[69] However, it was discovered that replacing pre-training
with large amounts of training data for straightforward backpropagation when using DNNs
with large, context-dependent output layers produced error rates dramatically lower
than then-state-of-the-art Gaussian mixture model (GMM)/Hidden Markov Model (HMM) and
also than more-advanced generative model-based systems.[59][70] The nature of the
recognition errors produced by the two types of systems was characteristically different,
[71][68] offering technical insights into how to integrate deep learning into the existing
highly efficient, run-time speech decoding system deployed by all major speech recognition
systems.[10][72][73] Analysis around 2009–2010, contrasting the GMM (and other generative
speech models) with DNN models, stimulated early industrial investment in deep learning
for speech recognition,[71][68] eventually leading to pervasive and dominant use in that
industry. That analysis was done with comparable performance (less than 1.5% in error rate)
between discriminative DNNs and generative models.[59][71][69][74]
In 2010, researchers extended deep learning from TIMIT to large vocabulary speech recognition,
by adopting large output layers of the DNN based on context-dependent HMM states constructed
by decision trees.[75][76][77][72]
Advances in hardware enabled the renewed interest. In 2009, Nvidia was involved in what was
called the “big bang” of deep learning, “as deep-learning neural networks were
trained with Nvidia graphics processing units (GPUs).”[78] That year, Google
Brain used Nvidia GPUs to create capable DNNs. While there, Andrew Ng determined that
GPUs could increase the speed of deep-learning systems by about 100 times.[79] In particular,
GPUs are well-suited for the matrix/vector math involved in machine learning.[80][81] GPUs
speed up training algorithms by orders of magnitude, reducing running times from weeks to
days.[82][83] Specialized hardware and algorithm optimizations can be used for
efficient processing.[84]
Deep learning revolution
In 2012, a team led by Dahl won the "Merck Molecular Activity Challenge" using
multi-task deep neural networks to predict the biomolecular target of one drug.
[85][86] In 2014, Hochreiter's group used deep learning to detect off-target and
toxic effects of environmental chemicals in nutrients, household products and drugs
and won the "Tox21 Data Challenge" of NIH, FDA and NCATS.[87][88][89]
Significant additional impacts in image or object recognition were felt from 2011 to
2012. Although CNNs trained by backpropagation had been around for decades, and GPU
implementations of NNs for years, including CNNs, fast implementations of CNNs with
max-pooling on GPUs in the style of Ciresan and colleagues were needed to progress
on computer vision.[80][81][34][90][2] In 2011, this approach achieved for the first
time superhuman performance in a visual pattern recognition contest. Also in 2011,
it won the ICDAR Chinese handwriting contest, and in May 2012, it won the ISBI image
segmentation contest.[91] Until 2011, CNNs did not play a major role at computer vision
conferences, but in June 2012, a paper by Ciresan et al. at the leading conference CVPR[5]
showed how max-pooling CNNs on GPU can dramatically improve many vision benchmark records.
In October 2012, a similar system by Krizhevsky et al.[6] won the large-scale ImageNet
competition by a significant margin over shallow machine learning methods. In November 2012,
Ciresan et al.'s system also won the ICPR contest on analysis of large medical images for cancer
detection, and in the following year also the MICCAI Grand Challenge on the same topic.[92]
In 2013 and 2014, the error rate on the ImageNet task using deep learning was further reduced,
following a similar trend in large-scale speech recognition. The Wolfram Image Identification
project publicized these improvements.[93]
Image classification was then extended to the more challenging task of generating descriptions
(captions) for images, often as a combination of CNNs and LSTMs.[94][95][96][97][98]
Some researchers assess that the October 2012 ImageNet victory anchored the start of
a "deep learning revolution" that has transformed the AI industry.[99]