{"id":1675,"date":"2016-08-26T18:53:17","date_gmt":"2016-08-26T09:53:17","guid":{"rendered":"http:\/\/blog.themusio.com\/?p=1675"},"modified":"2024-05-01T10:56:34","modified_gmt":"2024-05-01T01:56:34","slug":"alternatives-to-the-softmax-layer","status":"publish","type":"post","link":"https:\/\/blog.themusio.com\/?p=1675","title":{"rendered":"Alternatives to the softmax layer"},"content":{"rendered":"<div id=\"table-of-contents\">\n<h2>Table of Contents<\/h2>\n<div id=\"text-table-of-contents\">\n<ul>\n<li><a href=\"#org9eeba93\">1. Alternatives to the softmax layer&#xa0;&#xa0;&#xa0;<span class=\"tag\"><span class=\"softmax\">softmax<\/span><\/span><\/a>\n<ul>\n<li><a href=\"#org96f3239\">1.1. goal<\/a><\/li>\n<li><a href=\"#org8f6d0d3\">1.2. motivation<\/a><\/li>\n<li><a href=\"#org13f0828\">1.3. ingredients<\/a><\/li>\n<li><a href=\"#orge9bb5a2\">1.4. steps<\/a><\/li>\n<li><a href=\"#orgb798bd9\">1.5. outlook<\/a><\/li>\n<li><a href=\"#orgbd4c972\">1.6. resources<\/a><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<h1>Alternatives to the softmax layer     <a id=\"org9eeba93\"><\/a><\/h1>\n<h2>goal<a id=\"org96f3239\"><\/a><\/h2>\n<p>This week's post deals with some possible alternatives to the softmax layer for computing word probabilities over large vocabularies.<\/p>\n<h2>motivation<a id=\"org8f6d0d3\"><\/a><\/h2>\n<p>Natural language tasks such as neural machine translation or dialogue generation rely on word embeddings at the input and output layers.<br \/>\nFurthermore, decent performance requires a very large vocabulary to reduce the number of out-of-vocabulary words, which cannot be embedded properly and therefore cannot be processed.<br \/>\nThe language models used for these tasks usually end with a softmax layer that computes the probabilities of all words in the vocabulary.<br \/>\nHuge vocabularies make this final computation extremely expensive and therefore slow down the whole training process.<br \/>\nThis makes it necessary to look 
for more efficient alternatives to the softmax layer.<\/p>\n<h2>ingredients<a id=\"org13f0828\"><\/a><\/h2>\n<p>softmax, hierarchical softmax, differentiated softmax, importance sampling, noise contrastive estimation, negative sampling<\/p>\n<h2>steps<a id=\"orge9bb5a2\"><\/a><\/h2>\n<p>Before we start, let us recall the general procedure for generating a translation of a sentence or a response to an utterance in a dialogue.<br \/>\nGiven some context, most often the previous words in the sequence, the next word is selected according to a normalized probability distribution over all words in the vocabulary.<br \/>\nThis distribution is computed by a softmax layer, which first computes an activation for each word in the vocabulary using a weight matrix whose size is the vocabulary size times the output dimension of the previous layer.<br \/>\nSince vocabularies from tens of thousands to around a million words are becoming standard, this final softmax layer is far larger than any other layer in the network.<br \/>\nFurthermore, normalizing the activations into probabilities requires a sum over all words in the vocabulary, and this sum has to be recomputed for every prediction.<\/p>\n<p>Apart from the obviously expensive summation over a large vocabulary, there are further problems that arise indirectly and affect the overall training time.<br \/>\nA large vocabulary also requires a large training set, so that there are sufficient training examples for infrequent words.<br \/>\nOn top of that, larger training sets usually allow, and even require, larger models.<br \/>\nHence the training time per epoch increases, and the deeper network architectures slow down training even further.<\/p>\n<p>Alternatives to the softmax layer can in principle be divided into two classes.<br \/>\nWe first look at modifications of the softmax layer itself, before we explain sampling approaches, which avoid computing the full softmax during training altogether.<\/p>\n<p>One alternative, known as hierarchical softmax, 
modifies the network architecture by organizing the vocabulary into a tree.<br \/>\nEvery word is then reachable by a unique path from the root, and the probability of a word is the product of the probabilities of the branching decisions along that path.<br \/>\nTypically one uses balanced binary trees, whose depth is the base-2 logarithm of the vocabulary size, so that computing the probability of a single word takes on the order of log |V| instead of |V| operations, where |V| is the vocabulary size.<br \/>\nAnother variant, closely related to Huffman coding in information theory, builds the tree from word frequencies (frequency clustering).<br \/>\nMore frequent words then get shorter paths, which yields a further speed-up compared to balanced binary trees.<br \/>\nThis speed-up, however, requires knowing the target word. At test time, when the target word is unknown and the probabilities of all words are needed, we essentially fall back to the cost of the standard softmax layer.<\/p>\n<p>The idea behind differentiated softmax is to partition the vocabulary into groups and to assign a fixed number of parameters to each group.<br \/>\nFormulated differently, we split the final linear transformation into block matrices: each group of words only uses its own slice of the previous layer's output, so fewer parameters are involved in computing the activations.<br \/>\nUsually more parameters are assigned to more frequent words, which, on the other hand, makes it even more difficult to learn good embeddings for infrequent words.<br \/>\nSince this method still computes the full, normalized distribution, only more cheaply, it can also be applied at test time.<\/p>\n<p>We now come to sampling approaches, which approximate the loss function, and in particular the sum over the whole vocabulary.<br \/>\nA closer look at the gradient of the loss function shows how sampling can be applied.<br \/>\nThe gradient consists of a positive reinforcement term for the target word and a negative reinforcement term over all words in the vocabulary, each weighted by its probability.<br \/>\nThis negative term is the expectation, under the network's distribution, of the gradient of the word activations.<\/p>\n<p>The first approach, called importance sampling, approximates this expectation with Monte Carlo methods.<br \/>\nIn principle, we could approximate the expectation by sampling words from the network's probability distribution.<br \/>\nSince this is 
exactly what we wanted to avoid (sampling from the network requires computing its full distribution), we instead introduce a proposal distribution that should be close to the network distribution but much easier to sample from.<br \/>\nIn practice one chooses the unigram distribution of the training set.<br \/>\nThis makes sampling easy; however, the importance weights still depend on the normalized network distribution, whose denominator is again a sum over the whole vocabulary.<br \/>\nFortunately, we can use a biased estimator that approximates this sum in the denominator with the same samples drawn from the proposal distribution.<br \/>\nThe quality and speed of importance sampling depend heavily on the number of samples drawn from the proposal distribution: more samples give a better approximation but cost more computation.<\/p>\n<p>Two other methods introduce a binary classification task with an auxiliary loss function that indirectly maximizes the probability of the correct words.<br \/>\nWe first present noise contrastive estimation, before going one step further and introducing an approximation of it.<br \/>\nBy generating a certain number of noise samples from a noise distribution, e.g. 
the unigram distribution of the training set, we train the network to tell the actual target word apart from the sampled noise words.<br \/>\nFor this we introduce a mixture distribution consisting of the network distribution and the noise distribution.<br \/>\nWords from the noise distribution are labeled false and the actual target words are labeled true.<br \/>\nThe normalizing sum over the vocabulary in the denominator was found to stay close to one, with low variance, when it was treated as a free parameter learned by the network.<br \/>\nHence it is reasonable to fix it to one, which removes the expensive sum altogether.<\/p>\n<p>Negative sampling is an approximation of noise contrastive estimation.<br \/>\nIt simplifies the classification probability further by replacing the product of the number of noise samples and the noise probability of a word with one.<br \/>\nThis replacement would be exact if the number of noise samples equaled the vocabulary size and the noise distribution were uniform; in practice, however, only a few noise samples are drawn.<br \/>\nSince this approximation no longer optimizes the likelihood of the correct words, the method is not used for language modeling.<br \/>\nHowever, it has proven useful for learning word embeddings.<\/p>\n<p>All of these methods speed up training.<br \/>\nCompared to the standard softmax, computation time is reduced by factors between 2 and 100.<br \/>\nHowever, most of these methods only apply to training; only differentiated softmax, with a speed-up of a factor of 2, is also applicable at test and inference time.<br \/>\nThe performance of each method also varies heavily with the size of the vocabulary, and some methods perform worse on infrequent words.<\/p>\n<h2>outlook<a id=\"orgb798bd9\"><\/a><\/h2>\n<p>Since beam search is nowadays standard during inference and might also become important during training for performance reasons, adjusting beam search with the methods above seems very promising.<\/p>\n<h2>resources<a id=\"orgbd4c972\"><\/a><\/h2>\n<p><a href=\"http:\/\/aclweb.org\/anthology\/P\/P16\/P16-1186.pdf\">http:\/\/aclweb.org\/anthology\/P\/P16\/P16-1186.pdf<\/a><br \/>\n<a 
href=\"http:\/\/sebastianruder.com\/word-embeddings-softmax\/\">http:\/\/sebastianruder.com\/word-embeddings-softmax\/<\/a><br \/>\n<a href=\"http:\/\/arxiv.org\/pdf\/1410.8251v1.pdf\">http:\/\/arxiv.org\/pdf\/1410.8251v1.pdf<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Table of Contents 1. Alternatives to the softmax layer&#xa0;&#xa0;&#xa0;softmax 1.1. goal 1.2. motivation 1.3. [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3642,3640],"tags":[3650,3758,3760,3656,3762,3658,3700,3900,3902,3904,3710,3906,3908,3878,3886],"class_list":["post-1675","post","type-post","status-publish","format-standard","hentry","category-ai-en","category-all-en","tag-ai-ja-en","tag-aka-intelligence-en","tag-artificial-intelligence-en","tag-baggage-en","tag-children-book-ja-en","tag-christmas-en","tag-cmos-en","tag-differentiated-softmax-en","tag-hierarchical-softmax-en","tag-importance-sampling-en","tag-musio-en","tag-negative-sampling-en","tag-noise-contrastive-estimation-en","tag-parents-en","tag-softmax-en"],"aioseo_notices":[],"aioseo_head":"\n\t\t<!-- All in One SEO 4.9.8 - aioseo.com -->\n\t<meta name=\"description\" content=\"Table of Contents 1. 
Alternatives to the softmax layer&amp;\" \/>\n\t<meta name=\"robots\" content=\"max-image-preview:large\" \/>\n\t<meta name=\"author\" content=\"Musio Team\"\/>\n\t<link rel=\"canonical\" href=\"https:\/\/blog.themusio.com\/?p=1675\" \/>\n\t<meta name=\"generator\" content=\"All in One SEO (AIOSEO) 4.9.8\" \/>\n\t\t<meta property=\"og:locale\" content=\"ja_JP\" \/>\n\t\t<meta property=\"og:site_name\" content=\"Musio Blog\" \/>\n\t\t<meta property=\"og:type\" content=\"article\" \/>\n\t\t<meta property=\"og:title\" content=\"Alternatives to the softmax layer | Musio Blog\" \/>\n\t\t<meta property=\"og:description\" content=\"Table of Contents 1. Alternatives to the softmax layer&amp;\" \/>\n\t\t<meta property=\"og:url\" content=\"https:\/\/blog.themusio.com\/?p=1675\" \/>\n\t\t<meta property=\"article:published_time\" content=\"2016-08-26T09:53:17+00:00\" \/>\n\t\t<meta property=\"article:modified_time\" content=\"2024-05-01T01:56:34+00:00\" \/>\n\t\t<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n\t\t<meta name=\"twitter:title\" content=\"Alternatives to the softmax layer | Musio Blog\" \/>\n\t\t<meta name=\"twitter:description\" content=\"Table of Contents 1. 
Alternatives to the softmax layer&amp;\" \/>\n\t\t<script type=\"application\/ld+json\" class=\"aioseo-schema\">\n\t\t\t{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"BlogPosting\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#blogposting\",\"name\":\"Alternatives to the softmax layer | Musio Blog\",\"headline\":\"Alternatives to the softmax layer\",\"author\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?author=2#author\"},\"publisher\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/#organization\"},\"datePublished\":\"2016-08-26T18:53:17+09:00\",\"dateModified\":\"2024-05-01T10:56:34+09:00\",\"inLanguage\":\"ja\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#webpage\"},\"isPartOf\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#webpage\"},\"articleSection\":\"A.I\\ud83c\\uddfa\\ud83c\\uddf8, All Articles, AI, AKA Intelligence, Artificial Intelligence, Baggage, Children Book, Christmas, CMOS, Differentiated softmax, Hierarchical Softmax, Importance sampling, Musio, Negative sampling, Noise contrastive estimation, parents, Softmax, English\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#breadcrumblist\",\"itemListElement\":[{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com#listItem\",\"position\":1,\"name\":\"\\u5bb6\",\"item\":\"https:\\\/\\\/blog.themusio.com\",\"nextItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3640#listItem\",\"name\":\"All Articles\"}},{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3640#listItem\",\"position\":2,\"name\":\"All 
Articles\",\"item\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3640\",\"nextItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3642#listItem\",\"name\":\"A.I\\ud83c\\uddfa\\ud83c\\uddf8\"},\"previousItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com#listItem\",\"name\":\"\\u5bb6\"}},{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3642#listItem\",\"position\":3,\"name\":\"A.I\\ud83c\\uddfa\\ud83c\\uddf8\",\"item\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3642\",\"nextItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#listItem\",\"name\":\"Alternatives to the softmax layer\"},\"previousItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3640#listItem\",\"name\":\"All Articles\"}},{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#listItem\",\"position\":4,\"name\":\"Alternatives to the softmax layer\",\"previousItem\":{\"@type\":\"ListItem\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?cat=3642#listItem\",\"name\":\"A.I\\ud83c\\uddfa\\ud83c\\uddf8\"}}]},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/#organization\",\"name\":\"musio_blog\",\"description\":\"Meet Musio, Your Curious New Friend.\",\"url\":\"https:\\\/\\\/blog.themusio.com\\\/\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?author=2#author\",\"url\":\"https:\\\/\\\/blog.themusio.com\\\/?author=2\",\"name\":\"Musio Team\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#authorImage\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/383633914125f30a1407c18aab62c47a25a6098f59185eb06b491dccb5b8fe42?s=96&d=mm&r=g\",\"width\":96,\"height\":96,\"caption\":\"Musio Team\"}},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#webpage\",\"url\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675\",\"name\":\"Alternatives to the softmax 
layer | Musio Blog\",\"description\":\"Table of Contents 1. Alternatives to the softmax layer&\",\"inLanguage\":\"ja\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/#website\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?p=1675#breadcrumblist\"},\"author\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?author=2#author\"},\"creator\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/?author=2#author\"},\"datePublished\":\"2016-08-26T18:53:17+09:00\",\"dateModified\":\"2024-05-01T10:56:34+09:00\"},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/#website\",\"url\":\"https:\\\/\\\/blog.themusio.com\\\/\",\"name\":\"musio_blog\",\"description\":\"Meet Musio, Your Curious New Friend.\",\"inLanguage\":\"ja\",\"publisher\":{\"@id\":\"https:\\\/\\\/blog.themusio.com\\\/#organization\"}}]}\n\t\t<\/script>\n\t\t<!-- All in One SEO -->\n\n","aioseo_head_json":{"title":"Alternatives to the softmax layer | Musio Blog","description":"Table of Contents 1. Alternatives to the softmax layer&","canonical_url":"https:\/\/blog.themusio.com\/?p=1675","robots":"max-image-preview:large","keywords":"","webmasterTools":{"miscellaneous":""},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"BlogPosting","@id":"https:\/\/blog.themusio.com\/?p=1675#blogposting","name":"Alternatives to the softmax layer | Musio Blog","headline":"Alternatives to the softmax layer","author":{"@id":"https:\/\/blog.themusio.com\/?author=2#author"},"publisher":{"@id":"https:\/\/blog.themusio.com\/#organization"},"datePublished":"2016-08-26T18:53:17+09:00","dateModified":"2024-05-01T10:56:34+09:00","inLanguage":"ja","mainEntityOfPage":{"@id":"https:\/\/blog.themusio.com\/?p=1675#webpage"},"isPartOf":{"@id":"https:\/\/blog.themusio.com\/?p=1675#webpage"},"articleSection":"A.I\ud83c\uddfa\ud83c\uddf8, All Articles, AI, AKA Intelligence, Artificial Intelligence, Baggage, Children Book, Christmas, CMOS, Differentiated softmax, Hierarchical Softmax, Importance 
sampling, Musio, Negative sampling, Noise contrastive estimation, parents, Softmax, English"},{"@type":"BreadcrumbList","@id":"https:\/\/blog.themusio.com\/?p=1675#breadcrumblist","itemListElement":[{"@type":"ListItem","@id":"https:\/\/blog.themusio.com#listItem","position":1,"name":"\u5bb6","item":"https:\/\/blog.themusio.com","nextItem":{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?cat=3640#listItem","name":"All Articles"}},{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?cat=3640#listItem","position":2,"name":"All Articles","item":"https:\/\/blog.themusio.com\/?cat=3640","nextItem":{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?cat=3642#listItem","name":"A.I\ud83c\uddfa\ud83c\uddf8"},"previousItem":{"@type":"ListItem","@id":"https:\/\/blog.themusio.com#listItem","name":"\u5bb6"}},{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?cat=3642#listItem","position":3,"name":"A.I\ud83c\uddfa\ud83c\uddf8","item":"https:\/\/blog.themusio.com\/?cat=3642","nextItem":{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?p=1675#listItem","name":"Alternatives to the softmax layer"},"previousItem":{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?cat=3640#listItem","name":"All Articles"}},{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?p=1675#listItem","position":4,"name":"Alternatives to the softmax layer","previousItem":{"@type":"ListItem","@id":"https:\/\/blog.themusio.com\/?cat=3642#listItem","name":"A.I\ud83c\uddfa\ud83c\uddf8"}}]},{"@type":"Organization","@id":"https:\/\/blog.themusio.com\/#organization","name":"musio_blog","description":"Meet Musio, Your Curious New Friend.","url":"https:\/\/blog.themusio.com\/"},{"@type":"Person","@id":"https:\/\/blog.themusio.com\/?author=2#author","url":"https:\/\/blog.themusio.com\/?author=2","name":"Musio 
Team","image":{"@type":"ImageObject","@id":"https:\/\/blog.themusio.com\/?p=1675#authorImage","url":"https:\/\/secure.gravatar.com\/avatar\/383633914125f30a1407c18aab62c47a25a6098f59185eb06b491dccb5b8fe42?s=96&d=mm&r=g","width":96,"height":96,"caption":"Musio Team"}},{"@type":"WebPage","@id":"https:\/\/blog.themusio.com\/?p=1675#webpage","url":"https:\/\/blog.themusio.com\/?p=1675","name":"Alternatives to the softmax layer | Musio Blog","description":"Table of Contents 1. Alternatives to the softmax layer&","inLanguage":"ja","isPartOf":{"@id":"https:\/\/blog.themusio.com\/#website"},"breadcrumb":{"@id":"https:\/\/blog.themusio.com\/?p=1675#breadcrumblist"},"author":{"@id":"https:\/\/blog.themusio.com\/?author=2#author"},"creator":{"@id":"https:\/\/blog.themusio.com\/?author=2#author"},"datePublished":"2016-08-26T18:53:17+09:00","dateModified":"2024-05-01T10:56:34+09:00"},{"@type":"WebSite","@id":"https:\/\/blog.themusio.com\/#website","url":"https:\/\/blog.themusio.com\/","name":"musio_blog","description":"Meet Musio, Your Curious New Friend.","inLanguage":"ja","publisher":{"@id":"https:\/\/blog.themusio.com\/#organization"}}]},"og:locale":"ja_JP","og:site_name":"Musio Blog","og:type":"article","og:title":"Alternatives to the softmax layer | Musio Blog","og:description":"Table of Contents 1. Alternatives to the softmax layer&amp;","og:url":"https:\/\/blog.themusio.com\/?p=1675","article:published_time":"2016-08-26T09:53:17+00:00","article:modified_time":"2024-05-01T01:56:34+00:00","twitter:card":"summary_large_image","twitter:title":"Alternatives to the softmax layer | Musio Blog","twitter:description":"Table of Contents 1. 
Alternatives to the softmax layer&amp;"},"aioseo_meta_data":{"post_id":"1675","title":null,"description":null,"keywords":null,"keyphrases":null,"primary_term":null,"canonical_url":null,"og_title":null,"og_description":null,"og_object_type":"default","og_image_type":"default","og_image_url":null,"og_image_width":null,"og_image_height":null,"og_image_custom_url":null,"og_image_custom_fields":null,"og_video":"","og_custom_url":null,"og_article_section":null,"og_article_tags":null,"twitter_use_og":false,"twitter_card":"default","twitter_image_type":"default","twitter_image_url":null,"twitter_image_custom_url":null,"twitter_image_custom_fields":null,"twitter_title":null,"twitter_description":null,"schema":{"blockGraphs":[],"customGraphs":[],"default":{"data":{"Article":[],"Course":[],"Dataset":[],"FAQPage":[],"Movie":[],"Person":[],"Product":[],"ProductReview":[],"Car":[],"Recipe":[],"Service":[],"SoftwareApplication":[],"WebPage":[]},"graphName":"BlogPosting","isEnabled":true},"graphs":[]},"schema_type":"default","schema_type_options":null,"pillar_content":false,"robots_default":true,"robots_noindex":false,"robots_noarchive":false,"robots_nosnippet":false,"robots_nofollow":false,"robots_noimageindex":false,"robots_noodp":false,"robots_notranslate":false,"robots_max_snippet":"-1","robots_max_videopreview":"-1","robots_max_imagepreview":"large","priority":null,"frequency":"default","local_seo":null,"breadcrumb_settings":null,"limit_modified_date":false,"ai":null,"created":"2024-04-11 17:03:47","updated":"2025-06-30 02:28:46","seo_analyzer_scan_date":null},"aioseo_breadcrumb":"<div class=\"aioseo-breadcrumbs\"><span class=\"aioseo-breadcrumb\">\n\t\t\t<a href=\"https:\/\/blog.themusio.com\" title=\"\u5bb6\">\u5bb6<\/a>\n\t\t<\/span><span class=\"aioseo-breadcrumb-separator\">&raquo;<\/span><span class=\"aioseo-breadcrumb\">\n\t\t\t<a href=\"https:\/\/blog.themusio.com\/?cat=3640\" title=\"All Articles\">All Articles<\/a>\n\t\t<\/span><span 
class=\"aioseo-breadcrumb-separator\">&raquo;<\/span><span class=\"aioseo-breadcrumb\">\n\t\t\t<a href=\"https:\/\/blog.themusio.com\/?cat=3642\" title=\"A.I\ud83c\uddfa\ud83c\uddf8\">A.I\ud83c\uddfa\ud83c\uddf8<\/a>\n\t\t<\/span><span class=\"aioseo-breadcrumb-separator\">&raquo;<\/span><span class=\"aioseo-breadcrumb\">\n\t\t\tAlternatives to the softmax layer\n\t\t<\/span><\/div>","aioseo_breadcrumb_json":[{"label":"\u5bb6","link":"https:\/\/blog.themusio.com"},{"label":"All Articles","link":"https:\/\/blog.themusio.com\/?cat=3640"},{"label":"A.I\ud83c\uddfa\ud83c\uddf8","link":"https:\/\/blog.themusio.com\/?cat=3642"},{"label":"Alternatives to the softmax layer","link":"https:\/\/blog.themusio.com\/?p=1675"}],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1675","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1675"}],"version-history":[{"count":2,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1675\/revisions"}],"predecessor-version":[{"id":10871,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1675\/revisions\/10871"}],"wp:attachment":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1675"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1675"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1675"}],"curies":[{"name"
:"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}