Activation function

In artificial neural networks, the activation function is a mathematical function applied to the signal at the output of an artificial neuron. The term "activation function" comes from its biological equivalent, the "activation potential": a stimulation threshold that, once reached, triggers a response from the neuron. The activation function is usually non-linear. One example is the Heaviside step function, which always returns 1 if the input signal is positive or zero, and 0 if it is negative.

Characteristics of activation functions

Activation functions are chosen according to their characteristics:

Non-linearity: When the activation function is non-linear, a two-layer neural network can be regarded as a universal function approximator^[1]. Note: the identity function has the opposite effect, since it makes a multilayer neural network equivalent to a single-layer one.
Differentiable everywhere: This property makes gradient-based optimization methods applicable^[2].
Range: When the range of the activation function is finite, gradient-based training methods tend to be more stable, because each update significantly affects only a limited number of weights. When the range is infinite, training is generally more efficient, because each update affects more of the weights.
Monotonic: When the activation function is monotonic, the error surface associated with a single-layer model is guaranteed to be convex^[3].
Smooth (monotonic derivative): Functions with a monotonic derivative have been shown to generalize better in some cases. Such functions are considered more consistent with principles such as Occam's razor^[4].
Identity near 0 ( $f(x)\approx x$ when $x\approx 0$ ): These functions allow fast training with randomly initialized weights. If the function does not approximate the identity near 0, special care must be taken when initializing the weights^[5].
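
To illustrate the note on non-linearity, here is a minimal sketch in plain Python showing that a two-layer network with the identity activation collapses into a single linear layer, whereas a non-linear activation such as ReLU does not. The weight matrices `W1` and `W2`, the input `x` and all helper names are hypothetical and chosen only for illustration; biases are omitted for brevity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(w, x):
    """Apply matrix w to vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def identity(x):
    return x


def relu(x):
    return max(0.0, x)


def two_layer(w1, w2, x, activation):
    """Forward pass of a bias-free two-layer network."""
    hidden = [activation(h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


# Hypothetical weights and input, for illustration only.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0]]
x = [1.0, 3.0]

collapsed = matmul(W2, W1)                     # the equivalent single layer
out_identity = two_layer(W1, W2, x, identity)  # two layers, identity activation
out_single = matvec(collapsed, x)              # one layer, same result
out_relu = two_layer(W1, W2, x, relu)          # differs: ReLU zeroes a hidden unit
```

With these values, `out_identity` and `out_single` coincide, while `out_relu` differs because the negative hidden pre-activation is clipped to zero.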

List of common activation functions

Comparison of the main functions, giving their range and order of continuity, and whether they are monotonic, smooth, and approximate the identity near 0.

Name	Equation	Derivative	Range	Order of continuity	Monotonic	Smooth (monotonic derivative)	Identity near 0
Identity/Ramp	$f(x)=x$	$f'(x)=1$	$\mathbb {R}$	$C^{\infty }$	Yes	Yes	Yes
Step/Heaviside	$f(x)=\left\{{\begin{array}{rcl}0&{\mbox{if}}&x<0\\1&{\mbox{if}}&x\geq 0\end{array}}\right.$	$f'(x)=\left\{{\begin{array}{rcl}0&{\mbox{if}}&x\neq 0\\{\mbox{undefined}}&{\mbox{if}}&x=0\end{array}}\right.$	$\{0,1\}$	$C^{-1}$	Yes	No	No
Logistic (also soft step or sigmoid)	$f(x)={\frac {1}{1+{\rm {e}}^{-x}}}$	$f'(x)=f(x)(1-f(x))$	$(0,1)$	$C^{\infty }$	Yes	No	No
Hyperbolic tangent	$f(x)=\tanh(x)={\frac {2}{1+{\rm {e}}^{-2x}}}-1$	$f'(x)=1-f(x)^{2}$	$(-1,1)$	$C^{\infty }$	Yes	No	Yes
Arctangent	$f(x)=\tan ^{-1}(x)$	$f'(x)={\frac {1}{x^{2}+1}}$	$\left(-{\frac {\pi }{2}},{\frac {\pi }{2}}\right)$	$C^{\infty }$	Yes	No	Yes
Softsign^[6]	$f(x)={\frac {x}{1+|x|}}$	$f'(x)={\frac {1}{(1+|x|)^{2}}}$	$(-1,1)$	$C^{1}$	Yes	No	Yes
Rectified linear unit (ReLU)^[7]	$f(x)=\left\{{\begin{array}{rcl}0&{\mbox{if}}&x<0\\x&{\mbox{if}}&x\geq 0\end{array}}\right.$	$f'(x)=\left\{{\begin{array}{rcl}0&{\mbox{if}}&x<0\\1&{\mbox{if}}&x\geq 0\end{array}}\right.$	$[0,+\infty )$	$C^{0}$	Yes	Yes	Yes
Parametric rectified linear unit (PReLU)^[8]	$f(x)=\left\{{\begin{array}{rcl}\alpha x&{\mbox{if}}&x<0\\x&{\mbox{if}}&x\geq 0\end{array}}\right.$	$f'(x)=\left\{{\begin{array}{rcl}\alpha &{\mbox{if}}&x<0\\1&{\mbox{if}}&x\geq 0\end{array}}\right.$	$\mathbb {R}$	$C^{0}$	Yes	Yes	Yes
Exponential linear unit (ELU)^[9]	$f(x)=\left\{{\begin{array}{rcl}\alpha ({\rm {e}}^{x}-1)&{\mbox{if}}&x<0\\x&{\mbox{if}}&x\geq 0\end{array}}\right.$	$f'(x)=\left\{{\begin{array}{rcl}f(x)+\alpha &{\mbox{if}}&x<0\\1&{\mbox{if}}&x\geq 0\end{array}}\right.$	$(-\alpha ,+\infty )$	$C^{1}$ if $\alpha =1$	Yes	Yes	Yes, iff $\alpha \approx 1$
Smooth rectified linear unit (SoftPlus)^[10]	$f(x)=\ln(1+{\rm {e}}^{x})$	$f'(x)={\frac {1}{1+{\rm {e}}^{-x}}}$	$(0,+\infty )$	$C^{\infty }$	Yes	Yes	No
Bent identity	$f(x)={\frac {{\sqrt {x^{2}+1}}-1}{2}}+x$	$f'(x)={\frac {x}{2{\sqrt {x^{2}+1}}}}+1$	$\mathbb {R}$	$C^{\infty }$	Yes	Yes	Yes
Soft exponential^[11]	$f(\alpha ,x)=\left\{{\begin{array}{rcl}-{\frac {\ln(1-\alpha (x+\alpha ))}{\alpha }}&{\mbox{if}}&\alpha <0\\x&{\mbox{if}}&\alpha =0\\{\frac {{\rm {e}}^{\alpha x}-1}{\alpha }}+\alpha &{\mbox{if}}&\alpha >0\end{array}}\right.$	$f'(\alpha ,x)=\left\{{\begin{array}{rcl}{\frac {1}{1-\alpha (\alpha +x)}}&{\mbox{if}}&\alpha <0\\{\rm {e}}^{\alpha x}&{\mbox{if}}&\alpha \geq 0\end{array}}\right.$	$\mathbb {R}$	$C^{\infty }$	Yes	Yes	Yes, iff $\alpha \approx 0$
Sinusoid	$f(x)=\sin(x)$	$f'(x)=\cos(x)$	$[-1,1]$	$C^{\infty }$	No	No	Yes
Sinc	$f(x)=\left\{{\begin{array}{rcl}{\frac {\sin(x)}{x}}&{\mbox{if}}&x\neq 0\\1&{\mbox{if}}&x=0\end{array}}\right.$	$f'(x)=\left\{{\begin{array}{rcl}0&{\mbox{if}}&x=0\\{\frac {\cos(x)}{x}}-{\frac {\sin(x)}{x^{2}}}&{\mbox{if}}&x\neq 0\end{array}}\right.$	$[\approx -0.217234,1]$	$C^{\infty }$	No	No	No
Gaussian	$f(x)={\rm {e}}^{-x^{2}}$	$f'(x)=-2x{\rm {e}}^{-x^{2}}$	$(0,1]$	$C^{\infty }$	No	No	No
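
As a sanity check on the table, the following sketch implements a few of the listed functions together with their closed-form derivatives and compares each derivative with a central finite difference. It uses only Python's standard `math` module. The function names, the sample points and the step size `h` are assumptions made for this illustration, and the code is not tuned for numerical robustness (for example, `math.exp` overflows for very large inputs).

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_prime(x):
    fx = logistic(x)
    return fx * (1.0 - fx)          # f'(x) = f(x)(1 - f(x))


def tanh_prime(x):
    return 1.0 - math.tanh(x) ** 2  # f'(x) = 1 - f(x)^2


def softsign(x):
    return x / (1.0 + abs(x))


def softsign_prime(x):
    return 1.0 / (1.0 + abs(x)) ** 2


def elu(x, alpha=1.0):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def elu_prime(x, alpha=1.0):
    return 1.0 if x >= 0 else elu(x, alpha) + alpha


def softplus(x):
    return math.log1p(math.exp(x))  # ln(1 + e^x)


def softplus_prime(x):
    return logistic(x)


def numerical_derivative(f, x, h=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


PAIRS = [
    (logistic, logistic_prime),
    (math.tanh, tanh_prime),
    (softsign, softsign_prime),
    (elu, elu_prime),
    (softplus, softplus_prime),
]

# Largest gap between a closed-form derivative and its finite difference.
max_error = max(
    abs(df(x) - numerical_derivative(f, x))
    for f, df in PAIRS
    for x in (-2.0, -0.5, 0.7, 3.0)
)
```

A small `max_error` indicates that the derivative formulas agree with the functions at the sampled points. The sketch also makes the "Identity near 0" column concrete: `softplus(0.0)` equals $\ln 2$ rather than 0, whereas `math.tanh` and `softsign` both return 0 at the origin.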

Alternative structures

A special class of activation functions is known as radial basis functions (RBFs). They are often used in RBF neural networks, which are very effective as universal function approximators. Although these functions can take many forms, they usually have one of the following three forms (as a function of an input vector $v_{i}$):

Gaussian: $\,\phi (v_{i})=\exp \left(-{\frac {\|v_{i}-c_{i}\|^{2}}{2a^{2}}}\right)$
Multiquadric: $\,\phi (v_{i})={\sqrt {\|v_{i}-c_{i}\|^{2}+a^{2}}}$
Inverse multiquadric: $\,\phi (v_{i})={\frac {1}{\sqrt {\|v_{i}-c_{i}\|^{2}+a^{2}}}}$

where $c_{i}$ is the vector representing the center of the function and $a$ is a parameter that controls the spread of the function.
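
The three forms above can be sketched in plain Python as follows. The function names and the sample values of the center, the spread and the input point are hypothetical, chosen so that the distance to the center is exactly 5.

```python
import math


def squared_distance(v, c):
    """Squared Euclidean distance ||v - c||^2."""
    return sum((vi - ci) ** 2 for vi, ci in zip(v, c))


def gaussian_rbf(v, c, a):
    return math.exp(-squared_distance(v, c) / (2.0 * a ** 2))


def multiquadric_rbf(v, c, a):
    return math.sqrt(squared_distance(v, c) + a ** 2)


def inverse_multiquadric_rbf(v, c, a):
    return 1.0 / multiquadric_rbf(v, c, a)


center = [0.0, 0.0]   # hypothetical center c_i
spread = 1.0          # hypothetical spread parameter a
point = [3.0, 4.0]    # hypothetical input v_i, at distance 5 from the center

at_center = gaussian_rbf(center, center, spread)           # peak of the Gaussian
far_gauss = gaussian_rbf(point, center, spread)            # decays with distance
far_mq = multiquadric_rbf(point, center, spread)           # grows with distance
far_imq = inverse_multiquadric_rbf(point, center, spread)  # decays with distance
```

The Gaussian and inverse multiquadric forms respond most strongly at the center and decay as the input moves away from it, while the multiquadric form grows with the distance.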

Support vector machines (SVMs) can use a class of activation functions that includes both sigmoids and RBFs. In this case, the input is transformed to reflect a decision boundary hyperplane based on a few inputs, called support vectors $x$. The activation function for the hidden layers of these machines is often called the "inner product kernel": $K(v_{i},x)=\phi (v_{i})$. The support vectors are represented as the centers of RBFs whose kernel equals the activation function, but they take a unique form in the perceptron:

$\,\phi (v_{i})=\tanh \left(\beta _{1}+\beta _{0}\sum _{j}v_{i,j}x_{j}\right)$,

where $\beta _{0}$ and $\beta _{1}$ must satisfy certain convergence criteria. These machines can also accept polynomial activation functions of arbitrary order^[12]:

$\,\phi (v_{i})=\left(1+\sum _{j}v_{i,j}x_{j}\right)^{p}$.
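
As a rough illustration, the two kernel forms above can be written in plain Python as shown below. The vectors and the values of `beta0`, `beta1` and `p` are arbitrary placeholders; in particular, this sketch does not check the convergence criteria that $\beta _{0}$ and $\beta _{1}$ must satisfy.

```python
import math


def dot(v, x):
    """Inner product sum_j v_j * x_j."""
    return sum(vj * xj for vj, xj in zip(v, x))


def tanh_kernel(v, x, beta0, beta1):
    """phi(v_i) = tanh(beta1 + beta0 * sum_j v_ij * x_j)."""
    return math.tanh(beta1 + beta0 * dot(v, x))


def polynomial_kernel(v, x, p):
    """phi(v_i) = (1 + sum_j v_ij * x_j) ** p."""
    return (1.0 + dot(v, x)) ** p


v = [1.0, 2.0]          # hypothetical input vector v_i
support = [0.5, -1.0]   # hypothetical support vector x

k_tanh = tanh_kernel(v, support, beta0=1.0, beta1=0.0)
k_poly = polynomial_kernel(v, support, p=2)
```

Both kernels depend on the input only through its inner product with the support vector, which is what distinguishes them from the distance-based RBF forms.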

References

[1] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems 2.4 (1989): 303-314.
[2] Snyman, Jan. Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Vol. 97. Springer Science & Business Media, 2005.
[3] Wu, Huaiqin. "Global stability analysis of a general class of discontinuous neural networks with linear growth activation functions." Information Sciences 179.19 (2009): 3432-3441.
[4] Gashler, Michael S., and Stephen C. Ashmore. "Training Deep Fourier Neural Networks to Fit Time-Series Data." Intelligent Computing in Bioinformatics. Springer International Publishing, 2014. 48-55. arXiv:1405.2262.
[5] Sussillo, David, and L. F. Abbott. "Random walks: Training very deep nonlinear feed-forward networks with smart initialization." CoRR (2014): 286. arXiv:1412.6558.
[6] Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10). Society for Artificial Intelligence and Statistics, 2010.
[7] Nair, Vinod, and Geoffrey E. Hinton. "Rectified linear units improve restricted Boltzmann machines." Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010.
[8] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." Computer Vision and Pattern Recognition, 2015. arXiv:1502.01852.
[9] Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)." Machine Learning, 2016. arXiv:1511.07289v3.
[10] Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. "Deep sparse rectifier neural networks." International Conference on Artificial Intelligence and Statistics, 2011.
[11] Godfrey, Luke B., and Michael S. Gashler. "A Continuum among Logarithmic, Linear, and Exponential Functions, and Its Potential to Improve Generalization in Neural Networks." Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR, Lisbon, Portugal, November 2015, pp. 481-486. arXiv:1602.01321.
[12] Haykin, Simon. Neural Networks: A Comprehensive Foundation. 2nd ed. Prentice Hall, 1998. 842 pp. ISBN 0-13-273350-1.
