{"id":1503,"date":"2016-06-13T02:12:03","date_gmt":"2016-06-13T02:12:03","guid":{"rendered":"http:\/\/blog.themusio.com\/?p=1503"},"modified":"2024-05-01T11:02:04","modified_gmt":"2024-05-01T02:02:04","slug":"expectation-maximization-algorithm","status":"publish","type":"post","link":"https:\/\/blog.themusio.com\/?p=1503","title":{"rendered":"Expectation Maximization Algorithm"},"content":{"rendered":"<p><strong>Goal<\/strong><br \/>\nIn today&#8217;s summary we have a look at the expectation maximization algorithm that allows to optimize latent variable models when analytic inference of the posterior probability of latent variables is intractable.<\/p>\n<p><strong>Motivation<\/strong><br \/>\nLatent variable models are itself interesting, because they are related to variational autoencoders and encoder-decoder frameworks that are popular in unsupervised and semi-supervised learning.<br \/>\nThey allow to sample from the data distribution and are believed to enhance the expressiveness of the hierarchical recurrent encoder decoder models.<br \/>\nWe can think of them as memorizing higher abstract information, such as emotional states that allow to generate sentimental utterances in the encoder.<\/p>\n<p><strong>Ingredients<\/strong><br \/>\nvariational autoencoder, observable variables, latent variables, maximum likelihood, posterior probability, complete data log likelihood<\/p>\n<p><strong>Steps<\/strong><br \/>\nIn general we are concerned with finding good models, which means determining parameters of this model that can explain the data.<br \/>\nMore mathematically spoken, we try to find the maximum likelihood of the probability function of the model given the observed data.<br \/>\nBut now and then we encounter data sets, where not all data points have labels and we are left with a certain amount of unobserved data.<br \/>\nIt is also sometimes helpful to assume that there simply are hidden variables that lead to the observation of the data.<br \/>\nThese hidden variables are also called latent variables and might better explain the structure of the data.<\/p>\n<p>Then in order to determine the likelihood of the observed data we have to marginalize over these latent variables and have to take into account the values the latent variables can take.<br \/>\nThe sum over these values of the latent variables becomes particular tricky since it sits inside the logarithm of the likelihood function and an analytic solution is hard to find.<\/p>\n<p>Therefore it makes sense to look out for alternatives that may allow us to eventually get a handle on the marginalized log likelihood.<br \/>\nWe could make use of the complete data log likelihood function, if we would be able to observe the latent variables.<br \/>\nFor this we need the posterior probability of the latent variables given the data, which itself is a quantity that is not easily accessible.<br \/>\nBut there are a certain number of tools around that allow to approximate it, like Monte Carlo methods and variational inference.<br \/>\nNow we are at the point where we can calculate the expectation of the complete data log likelihood function and that is also where the naming for the expectation maximization algorithm comes from.<br \/>\nThis first step is also called the E-step and usually followed by the maximization of the expectation called the M-step.<br \/>\nThere is now some math left to be done to show that this algorithm of picking initial parameters of the network, estimating the latent variables and determining the optimized parameters will increase the marginalized log likelihood function.<br \/>\nWe will skip the proof here and only state that the expectation maximization algorithm converges, but it is not ensured that we can reach a global maximum.<br \/>\nHowever in practice the algorithm works well for optimizing latent variable models.<\/p>\n<p><strong>Outlook<\/strong><br \/>\nAlternative methods for inferring the posterior probability of latent variables have recently been under consideration in situations where the sampling process becomes more difficult.<br \/>\nIn particular, variational inference seems to scale well enough to be used in deep neural networks, such as autoencoder or encoder decoder frameworks.<\/p>\n<p><strong>Resources<br \/>\n<\/strong>&#8220;<a href=\"http:\/\/www.ece.rochester.edu\/~zduan\/resource\/eecs349_expectation_maximization.pdf\" target=\"_blank\" rel=\"noopener\">Machine Learning<\/a>&#8221; (PDF). <em>Machine Learning<\/em>. Published Fall 2012. Accessed 10 June 2016.<br \/>\n&#8220;<a href=\"https:\/\/www.cs.utah.edu\/~piyush\/teaching\/EM_algorithm.pdf\" target=\"_blank\" rel=\"noopener\">The Expectation Maximization Algorithm<\/a>&#8221; (PDF). <em>The Expectation Maximization Algorithm<\/em>. Published July 18 2004.\u00a0Accessed 10 June 2016.<br \/>\n&#8220;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Expectation%E2%80%93maximization_algorithm\" target=\"_blank\" rel=\"noopener\">Expectation\u2013maximization algorithm<\/a>&#8221; (WEB).\u00a0<em>Expectation\u2013maximization algorithm<\/em>.\u00a0Accessed 10 June 2016.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Goal In today&#8217;s summary we have a look at the expectation maximization algorithm that allows to optimize [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[3642,3640],"tags":[3656,4016,3896,4018,4020,4022,3898],"class_list":["post-1503","post","type-post","status-publish","format-standard","hentry","category-ai-en","category-all-en","tag-baggage-en","tag-complete-data-log-likelihood-en","tag-latent-variables-en","tag-maximum-likelihood-en","tag-observable-variables-en","tag-posterior-probability-en","tag-variational-autoencoder-en"],"aioseo_notices":[],"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1503","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1503"}],"version-history":[{"count":2,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1503\/revisions"}],"predecessor-version":[{"id":10879,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=\/wp\/v2\/posts\/1503\/revisions\/10879"}],"wp:attachment":[{"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1503"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1503"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.themusio.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1503"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}