IEEJ Electronic Library
No Extended Summary is available.

■ Pages: 10



A Method of Learning Rate Range Control for Adam to Suppress Sudden Changes of Parameters in Early Learning Stage

■ Authors: Daiki Nameki (Graduate School of Information and Computer Science, Chiba Institute of Technology), Satoshi Yamaguchi (Department of Computer Science, Chiba Institute of Technology)
■ Price: Members ¥550 / Non-members ¥770
■ Publication type: Transactions (per-paper)
■ Group: [C] Electronics, Information and Systems Society
■ Journal: IEEJ Transactions on Electronics, Information and Systems, Vol.142, No.10 (2022), Special issue: Recent Advances in Technologies Related to Electronic Materials
■ Pages in journal: 1156-1165
■ Manuscript type: Paper / Japanese
■ Link to electronic version: https://www.jstage.jst.go.jp/article/ieejeiss/142/10/142_1156/_article/-char/ja/
■ Keywords: neural networks, optimization algorithms, Adam, AdaBound, RAdam, WarmUp
■ Abstract (English): Adam is one of the most widely used optimization algorithms for neural networks and accelerates convergence during training. It has, however, two problems. First, in applications to large-scale networks, the final performance of a network trained with Adam, such as its generalization ability, is worse than that of one trained with SGD. Second, the learning rate tends to be large at the early learning stage; as a result, network parameters such as weights and biases become too large within the first few iterations. In recent years, research has been conducted to solve these problems. AdaBound, a method that dynamically switches from Adam to SGD, has been proposed for the first problem. RAdam has been proposed for the second problem; it applies to Adam a method called WarmUp, which sets a small learning rate at the early learning stage and gradually increases it. In this study, we propose applying WarmUp to the upper limit of AdaBound's learning rate. The proposed algorithm prevents parameter updates with extremely large learning rates in the early learning stage, so more efficient learning can be expected than with the conventional methods. The proposed method has been applied to training several types of networks, including CNN, ResNet, DenseNet, and BERT. The results show that our method improves performance compared with the conventional methods, and in an image classification task it tends to be more effective on larger networks.
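The abstract's core idea can be sketched in a few lines: bound Adam's element-wise step size between AdaBound-style lower and upper limits, and apply a WarmUp factor to the upper limit so that early updates cannot be extremely large. A minimal NumPy sketch, assuming AdaBound's published bound schedules and a simple linear warmup; the paper's exact schedules and hyperparameters may differ, and all names here are illustrative:

```python
import numpy as np

def clipped_lr(step, final_lr=0.1, gamma=1e-3, warmup_steps=1000):
    """Lower/upper bounds on the element-wise step size, AdaBound-style,
    with a linear WarmUp factor applied to the upper bound only.
    The bound schedules follow the published AdaBound formulas; the
    concrete warmup shape is an illustrative assumption."""
    t = step + 1
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))  # rises toward final_lr
    upper = final_lr * (1.0 + 1.0 / (gamma * t))        # falls toward final_lr
    warm = min(1.0, t / warmup_steps)                   # linear WarmUp in (0, 1]
    return lower, max(lower, upper * warm)              # keep the range valid

def adabound_warmup_step(param, grad, m, v, step,
                         base_lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                         **sched_kw):
    """One update: Adam's bias-corrected moments, then the element-wise
    learning rate clipped into the warmed-up AdaBound range."""
    b1, b2 = betas
    t = step + 1
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad ** 2
    m_hat = m / (1.0 - b1 ** t)
    v_hat = v / (1.0 - b2 ** t)
    lower, upper = clipped_lr(step, **sched_kw)
    lr = np.clip(base_lr / (np.sqrt(v_hat) + eps), lower, upper)
    return param - lr * m_hat, m, v
```

Without the warmup factor, the upper bound is huge at the first step (roughly `final_lr / gamma`), which is exactly the early-stage blow-up the abstract describes; with it, early updates are capped near `final_lr * t / warmup_steps`, and both bounds converge to `final_lr` (the SGD-like regime) as training proceeds.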
■ Format: A4
©Contents Works Inc.