Data Voyage - Blog of Holger PetersA personal blog on all things data-science and software engineering, with the occasional endeavours in functional programming
http://www.holger-peters.de/
Fri, 14 Dec 2018 08:27:40 +0100

Optimizing for secondary virtues

<p>When your environment claims or strives to be agile, you
will probably stumble over an interesting pattern: many
people who identify with the agile approach will give you
simple, straightforward advice (or rule-like guidelines) on
how to be agile. Such advice typically revolves
around things like sprint-lengths (god forbid if they are
longer than two weeks), the stand-up meeting (everyone
should be there, and it should be < 15 minutes), or the
kanban/task board (each day you should see some movement
there).</p>
<p>What you typically do not hear (or at least so infrequently
that it is hardly worth mentioning) are quotes from the agile
manifesto like</p>
<blockquote>
<p>Deliver working software frequently, from a
couple of weeks to a couple of months, with a
preference to the shorter timescale.</p>
</blockquote>
<p>or the Scrum Guide, which tells us about sprint lengths:</p>
<blockquote>
<p>The heart of Scrum is a Sprint, a time-box of one month or
less during which a “Done”, useable, and potentially
releasable product Increment is created.</p>
</blockquote>
<p>Yet when teams struggle with two-week sprints, you still
hear agile evangelists insisting on them, instead of
suggesting that the team deliver work in frequent
increments, test which timescale suits their current
situation, and go on from there.</p>
<p>Similarly, for the daily standup meeting, the <em>Daily Scrum</em>,
the Scrum Guide is very clear that this is a meeting in
which the development team (which includes neither the PO
nor the SM) plans its work.</p>
<blockquote>
<p>The Daily Scrum is an internal meeting for the Development
Team. If others are present, the Scrum Master ensures that
they do not disrupt the meeting.</p>
</blockquote>
<p>Yet, many agile enthusiasts will insist on having a reporting
meeting with individuals reporting in 1-2 minute time boxes.</p>
<h2 id="second-hand-virtues">Second hand virtues</h2>
<p>We observe these behaviours because it is easier to focus on
<em>secondary virtues</em>, which are easy to remember, than on
<em>primary virtues</em>, which are harder to remember and to
achieve. Pointing out that the team did not deliver within a
two-week time window is easy; finding the right fit between
sprint duration and work complexity is hard. Conducting a
daily reporting meeting with a 15-minute time box is easy;
focussing daily on conducting a meeting for the benefit of
the development team is hard.</p>
<p>The huge problem with focussing on <em>secondary virtues</em> is
that the <em>primary virtues</em> are forgotten. The
<em>secondary virtues</em> then lose their benefit, because they
start to disorient people instead of supporting them in
their pursuit of the <em>primary virtues</em>.</p>
<h2 id="how-to-break-the-cycle">How to break the cycle</h2>
<p>My hypothesis is that a lot of secondary virtues are simply
easier to remember and apply. More important but more
complicated guidelines are forgotten or deprioritized in
favour of the easy-to-remember rules.</p>
<p>This is why we see Scrum Masters fighting tooth and nail
against four-week sprints, while remaining fairly silent on
sprints and stories that deliver no value.</p>
<p>This is also why we see Daily Scrum meetings that do not
help the development team deliver value, but instead serve
as reporting meetings with tight time-keeping.</p>
<h2 id="what-can-we-do-about-it">What can we do about it?</h2>
<p>The first step is to rediscover the intent of the
recommendation, reading the primary and early sources.
Then, you can start looking for obvious contradictions
between the way you work and (a) the primary sources and (b)
your model of well-organized and successful work. These
are the starting points for meaningful change and
experimentation. Would longer sprints help? Are sprints
themselves helpful for delivering value
frequently? How would the developers prefer to communicate
and organize their work? How can the standup become their
meeting?</p>
<blockquote>
<p>We are uncovering better ways of developing
software by doing it and helping others do it.
– agilemanifesto.org</p>
</blockquote>
<p>If you are working in an agile team, you must be given the
leeway to explore, inspect and adapt - just as the
signatories of the agile manifesto did.</p>
Thu, 13 Dec 2018 20:00:00 +0100
http://www.holger-peters.de/agile/2018/12/13/secondary-virtues.html

Tea Leaf Reading

<p>The longer I am concerned with stats and machine-learning,
the more wary I get of what I like to call tea leaf
reading, i.e. substantiating arguments with numbers. The
problem is that such evidence is mostly arbitrary: without
too much effort, someone could come up with a
counterargument substantiated by some other numbers found in
the same context. The truth is, most systems are really
complex, and quite often it would take substantial effort to
produce numerical evidence for or against an argument that
would hold up to statistical scrutiny. Even disciplines
that are well versed in basic statistics, such as sociology
and psychology, have recently learned how challenging
empirical evidence can be.</p>
<p>So when I read the Scrum Guide, which emphasizes several
times how empirical the method is, alarm bells start to
ring. I know some Scrum consultant will use this to advise
people to track their velocities and base decisions upon
them. Yet many changes in velocity may well be within the
range of the usual fluctuation of that quantity. Scrum teams
that evaluate whether an action they undertook was effective
based on their velocity are reading tea leaves. Roughly
speaking, I believe that you can use a “velocity”
measurement (I would prefer throughput as a metric) as a
guide for estimating when a backlog item might be
implemented, plus or minus an error margin. I also believe
that the quantity is so volatile, depends on so many
parameters, and is so highly stochastic (meaning that even
if the circumstances and parameters do not change, the team
will by coincidence take more or less time to complete
comparable work items) that it is not of much use for
anything else.</p>
<p>In many cases, a conceptual, deductive argument is much
more useful in my opinion, and yes, so are personal
experience and narrative – unless, that is, I am very sure
that a sound empirical argument can be made.</p>
Sun, 29 Jul 2018 12:05:00 +0200
http://www.holger-peters.de/agile/2018/07/29/tealeafs.html

Work Around the Workarounds

<p>In their worst incarnation, workarounds can put development
and migrations of software projects to a grinding halt. Their <em>ad hoc</em>
nature — they are rarely a long-term plan — and their
tendency to span across abstraction layers and to depend on
circumstances make them a tough issue in the long run. So just why do we
introduce them in the first place?</p>
<p>Workarounds often start with two things: a goal and a
problem. A goal that we want to achieve, and a problem that
is deemed either unfixable or too costly or difficult to fix.
The plan is then to work around the obstacle instead of
fixing the root cause of the issue.</p>
<h3 id="software-workarounds-are-hard-to-keep-in-check">Software Workarounds Are Hard To Keep In Check</h3>
<p><em>The workaround is the software equivalent of placing a
bucket under a leaky roof. The difference in software is
that it is sometimes hard to see when a workaround has
reached the complexity of the fix. You wouldn’t install a
sewer pipe to that bucket, would you?</em></p>
<p>At the heart of the decision to introduce a workaround are
estimates (formal or informal ones): an estimate of the
effort needed to fix the root cause, and an estimate of the
effort needed to work around the issue. Even if the estimate
is not a conscious one, implementing a workaround means that
we think the root-cause fix is more costly than the
workaround. This is especially tempting under a time
constraint. A deadline, a closing merge window, or our
urge to complete the implementation of a feature are all
incentives to work around.</p>
<h2 id="the-interest-rate">The Interest Rate</h2>
<p>However, again and again I have observed situations like the
following: <em>The codebase needs to be migrated to run on a newer
system. As a consequence, I need to make some adjustments
to reflect API changes. This is when I stumble over some
awkward code that needs to be changed as well. I
realise that test coverage for these functions isn’t great, and
I finally see: this is a workaround, or at least it used to
be. It takes me about half an hour to fix the actual problem,
and I can delete the workaround code without porting it
(which takes another half an hour until the remnants are
cleared up). I ask myself: how much effort was spent before,
when even migrating the workaround code once seems more
difficult than fixing the problem it works around?</em></p>
<p>The realisation: workarounds are a mortgage. They may ease
your life for the moment, but in the end you have to pay
back with interest.</p>
<p>So why do we so often opt for the workaround and
underestimate its costs? Because we never estimate the
<em>interest rate</em> that we will have to pay for the
workaround. We fixate on a single goal: the deadline, the
closing of a ticket, deploying a new version of the
software, or getting it to run in some other configuration.
We make these our primary goals, and any means of achieving
them seems justified.</p>
<h3 id="comfort-zones">Comfort Zones</h3>
<p>Another important factor is comfort. Workarounds happen in
our own artefacts, in our comfort zone, while the root cause
often lies in an <em>upstream</em> package, a (hopefully open-source)
piece of software, or parts of the codebase we just don’t
know so well. So when there is a bug in one of our
dependencies, we are much more likely to work around it than
to submit a PR with a fix.</p>
<p>Even more astonishing is the observation that many
such workarounds are not even accompanied by a bug
report in the upstream package’s issue tracker, which is the
lowest-effort step I can think of towards getting an
upstream issue fixed.</p>
<h2 id="avoiding-the-workaround">Avoiding the Workaround</h2>
<p>Surely, not every workaround can be avoided, so my goal is
not to introduce them lightly, but also to be reasonable
when they are needed. A checklist might help here:</p>
<ul>
<li>
<p>Has the (root) cause of our problem been triaged and
identified? <em>More often than not, we are acting on rough
assumptions or loose correlations.</em></p>
</li>
<li>
<p>Have experts been consulted (colleagues, StackOverflow,
supporting consultants, IRC, …)? <em>If a workaround is
endorsed by third parties, it might be a hint that a
workaround is indeed the best resolution for now.</em></p>
</li>
</ul>
<p>Don’t start implementing a workaround for an upstream issue
without having reported the problem upstream (bug ticket
filed) or having verified that someone else has already
reported it as an issue.</p>
<p><em>Has the option to fix the root cause been discussed?</em> Too
often, fixing the root cause of a problem is not considered
as an option at all. So make sure you actually talk about
it.</p>
<h2 id="embracing-the-workaround">Embracing The Workaround</h2>
<p>There are tons of reasons why, even after fixing the root
cause upstream, a workaround might still be necessary.
Classic examples: OSS release cycles are longer than your
sprint, or the project is not responsive to your pull
request. In such cases we have no choice and need to
implement a temporary workaround. Then we definitely need to
make sure that the workaround is marked as such in the code
and communicated accordingly. A new colleague shouldn’t have
to assume that “it seems this is the way XYZ is done here”;
they should be able to see that this patch of code is a
workaround that can hopefully be removed already, or in the
foreseeable future.</p>
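<p>A minimal sketch of how such a marker could look in Python. The function, the issue number, and the upstream misbehaviour are all invented for illustration; the point is only that the comment names the workaround, the upstream issue, and the removal condition:</p>

```python
def parse_timestamp(raw: str) -> str:
    """Normalize a timestamp string coming from an upstream library."""
    # WORKAROUND (hypothetical upstream issue #1234): on some platforms
    # the upstream library emits a duplicated trailing "Z".
    # Remove this branch once the upstream fix has been released.
    if raw.endswith("ZZ"):
        raw = raw[:-1]
    return raw
```

<p>With a marker like this, the next person migrating the code can delete the branch instead of faithfully porting it.</p>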
Sun, 19 Mar 2017 12:05:00 +0100
http://www.holger-peters.de/python/2017/03/19/workarounds.html

Bayes'n'Bootstrap

<p>With the advent of machine learning into our IT landscapes,
a previously rather academic conflict of the statistical
community surfaces in blogs and other forums of discussion
every other week. It is the question of <em>frequentism</em> vs.
<em>Bayesianism</em>. This debate, often one that is as emotional
as the famous <em>editor-wars</em>, is in fact a very fundamental
one that touches the foundations of statistics and
probability theory. In that sense, it isn’t your usual bike
shedding discussion, even if it is sometimes led as one.
Metaphorically, it is a custody trial to determine who may
claim the interpretational sovereignty over nothing less
than the <em>Theory of Probability</em>.</p>
<p>Frequentism and Bayesianism are both established approaches
to statistics. Their differences start with their core
definitions. Frequentism treats probabilities as ratios of
frequency counts collected from an infinite number of
trials; frequentist practitioners will tell you that a
finite number of trials also suffices (as in: from 100
coin flips we will obtain <em>head</em> 50 times, thus the
probability for head is <script type="math/tex">\wp(H=1)=0.5</script>).</p>
<p>For Bayesianists, probabilities are <strong>degrees of belief</strong>,
and Bayesianists use Bayes’ theorem for inference. A
Bayesianist would take the probability <script type="math/tex">\wp(H=1)=0.5</script> in
the coin-flip example above to mean something like “It is
credible, that head and non-head (tail) are results of a
coin flip, without one option being more likely than the
other”. A programmer can think of the Bayesian
interpretation of probabilities as an extension of Boolean
algebra: <code class="highlighter-rouge">true</code> (1, <em>firm-belief</em>) and <code class="highlighter-rouge">false</code> (0,
<em>firm-disbelief</em>) are complemented with a spectrum of
values <script type="math/tex">0 \ldots 1</script>.</p>
<p>These brief characterizations are already enough to
understand much of the criticism either method faces:</p>
<ul>
<li>Bayesian probabilities are criticised as “subjective” or
as not a genuine measurement parameter (degree of belief).</li>
<li>frequentist probabilities are said to be limited to
infinitely repeatable trials, and thus not applicable to
any real world data set, with a finite number of
measurements.</li>
</ul>
<p>This criticism is too simplistic, however. And for those who
strongly associate with one camp, there are probably many
embarrassing commonalities: both approaches often lead to
very similar results. In this post I will show you how you
can solve a problem with both methods and compare the
results.</p>
<h1 id="estimating-the-probability-for-a-bernoulli-trial">Estimating The Probability for a Bernoulli-Trial</h1>
<p>We all expect coins to be fairly balanced: if we flip a
coin, we expect to obtain head roughly half of the time,
and tail the other half. Yet there are many
processes with two outcomes where we don’t know the
individual probabilities beforehand. For example, a
researcher might be interested in the immunization rate of a
population.</p>
<p>Our researcher determines the immunization rate of <script type="math/tex">N=40</script>
people. The measured results could be a series of numbers
(1 for immunized and 0 for not-immunized) like: <code class="highlighter-rouge">[1 1 1 1 1
1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1 0 1 1 0 1 1
1 1 1 1 1]</code>.</p>
<p>The questions we want to answer are:</p>
<ul>
<li>What is the immunization rate (i.e. the probability
for a person to be immunized)?</li>
<li>How reliable (and under what circumstances) is that
inferred probability?</li>
</ul>
<h1 id="frequentist-approaches">Frequentist Approaches</h1>
<p>I have divided this section into three parts:</p>
<ol>
<li>first we apply a common-sense approach to the problem</li>
<li>then, we see that our first approach is in fact the
maximum-likelihood solution</li>
<li>we apply the bootstrap method to get more than just the
maximum-likelihood estimate of the immunization rate.</li>
</ol>
<h2 id="common-sense-naïve-treatment">Common-sense (naïve) Treatment</h2>
<p>A very simple approach to this problem is to just count the
number of immunized people and the number of people
screened. For the above list, we have <script type="math/tex">k = 34</script> immunized people out of a
total of <script type="math/tex">N = 40</script> people, which leads to an immunization
probability of <script type="math/tex">0.85</script>.</p>
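<p>The counting estimate can be reproduced in a few lines of Python, using the series listed above:</p>

```python
# The researcher's series: 1 = immunized, 0 = not immunized.
data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
        1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

k = sum(data)        # immunized subjects: 34
N = len(data)        # subjects in total: 40
print(k / N)         # 0.85

# The estimate depends on which part of the sample we look at:
print(sum(data[:20]) / 20)   # 0.95 for the first 20 subjects
```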
<p>If the researcher had only screened the first 20 people, the
result would have looked a bit different: <script type="math/tex">0.95</script>. If we
had only looked at subjects 20-40, we would have
gotten a probability lower than <script type="math/tex">0.85</script>. Thus,
we have a method that gives us immunization rates, yet it
heavily depends on the sample. Also, we have no
means to quantify how certain we are about these numbers.</p>
<h2 id="maximum-likelihood-estimate">Maximum Likelihood Estimate</h2>
<p>The likelihood function<sup id="fnref:likelihood"><a href="#fn:likelihood" class="footnote">1</a></sup> <script type="math/tex">L(\mu \mid N, k)</script>
is the probability of observing <script type="math/tex">k</script> immunized subjects out of <script type="math/tex">N</script>
subjects in total, given a parameter <script type="math/tex">\mu</script>, which we write as <script type="math/tex">\wp(N, k \mid \mu)</script>. We
identify that this <script type="math/tex">\wp(N, k \mid \mu)</script> is the
<a href="https://www.wikiwand.com/en/Binomial_distribution">binomial distribution</a>:</p>
<script type="math/tex; mode=display">L(\mu) = \wp(N, k \mid \mu) = \text{Binomial}(k\mid N, \mu) = \binom N k \mu^k(1-\mu)^{N-k}</script>
<p>We now want to find the <script type="math/tex">\mu</script> that maximizes the
likelihood <script type="math/tex">L(\mu)</script>. We could of course work out the
equations by hand. I use <a href="http://www.sympy.org">sympy</a> here, which
will do all the tiresome calculations for us:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In [1]: from sympy import symbols, binomial, diff, solve
   ...: N, k, mu = symbols("N, k, mu")
In [2]: likelihood = binomial(N, k) * mu**k * (1-mu)**(N-k)
</code></pre></div></div>
<p>Sympy will nicely render the likelihood term</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In [3]: likelihood
Out[3]:
k N - k ⎛N⎞
μ ⋅(-μ + 1) ⋅⎜ ⎟
⎝k⎠
</code></pre></div></div>
<p>Now let’s see if sympy can come up with the derivative with
respect to <script type="math/tex">\mu</script>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In [4]: diff(likelihood, mu)
Out[4]:
k N - k ⎛N⎞ k N - k ⎛N⎞
k⋅μ ⋅(-μ + 1) ⋅⎜ ⎟ μ ⋅(-N + k)⋅(-μ + 1) ⋅⎜ ⎟
⎝k⎠ ⎝k⎠
────────────────────── + ─────────────────────────────
μ -μ + 1
</code></pre></div></div>
<p>Finally, we are only interested in the value of <script type="math/tex">\mu</script>,
for which the derivative is zero.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In [5]: solve(diff(likelihood, mu), mu)
Out[5]:
⎡k⎤
⎢─⎥
⎣N⎦
</code></pre></div></div>
<p><strong>Result:</strong> We obtain <script type="math/tex">\mu = \frac{k}{N}</script> as the
maximum likelihood estimate, which is exactly the
naïve result we expected.</p>
<h2 id="the-bootstrap-method">The Bootstrap Method</h2>
<p>We can use the fact that different subsets of our data
yield different results to get a better picture of the
reliability and variance of our rate estimate. Just like
above, we will take a look at subsets of the total data set.
This time we will take a systematic approach; we will:</p>
<ul>
<li>construct a new data set from the recorded trials by
randomized sampling with replacement. The new data set has
the same size as the original one, but may contain some
data points multiple times, while other data points will be
missing;</li>
<li>calculate the rate on the newly constructed data set;</li>
<li>repeat this <strong>many</strong> times to get as many rates as
possible (ideally: calculate it for all possible
combinations, but mind the combinatorial explosion);</li>
<li>plot the histogram of these rates (each subsample yielding
one data point).</li>
</ul>
<h3 id="distribution-of-rates-in-subsample">Distribution of Rates in Subsample</h3>
<p>The histogram below was made from a sample of 100
measurements, drawing many (100,000) subsamples of 100
measurements each, from which the rates were calculated.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

np.random.seed(1)
N = 100
x = scipy.stats.bernoulli(0.86).rvs(size=N)
m = np.fromiter((np.mean(np.random.choice(x, N))
                 for _ in range(100000)),
                dtype=float)
plt.hist(m, np.linspace(0, 1, 50))
plt.xlabel("Rate")
plt.ylabel("Occurences")</code></pre></figure>
<p><img src="/assets/images/histogram.png" alt="histogram" /></p>
<p>Depending on the subsample, we get different results for the
calculated rate. All calculated rates together form a
distribution. We must assume that the single rate calculated
in the maximum-likelihood approach above is just as noisy as
the rates calculated in the bootstrap approach, because we
assume independence of the individual events. So the
bootstrapped rate distribution gives us an idea of how
credible and accurate the maximum-likelihood rate is.</p>
<h1 id="bayesian-approach">Bayesian Approach</h1>
<p><strong>Note:</strong> <em>If you aren’t so much interested in Bayes’
theorem, you can just scroll down to the heading
“Incremental updates” and enjoy the graphs.</em></p>
<h2 id="bayes-theorem">Bayes Theorem</h2>
<p>Bayes’ theorem is the central hub of Bayesian methods. It
is, however, not a postulated assumption that just happens
to work, but a direct consequence of <em>conditional
probabilities</em>. If you look at the joint probability of two
propositions<sup id="fnref:prop"><a href="#fn:prop" class="footnote">2</a></sup> <script type="math/tex">A</script> and <script type="math/tex">B</script> — <script type="math/tex">\wp(A,
B)</script> — you can express it in terms of
conditional probabilities:</p>
<script type="math/tex; mode=display">\wp(A, B) = \wp(A\mid B) \wp(B) = \wp(B \mid A) \wp (A)</script>
<p>If we take <script type="math/tex">A</script> to be the proposition that the street is
wet, and <script type="math/tex">B</script> that there was rainfall in the last
hour, then <script type="math/tex">\wp(A, B)</script> is the probability for <em>rainfall and
a wet street</em>. <script type="math/tex">\wp(A\mid B)</script> is the conditional
probability for a wet street, <strong>given</strong> that it has rained,
<script type="math/tex">\wp(B \mid A)</script> the conditional probability for rainfall,
given that the street is wet; <script type="math/tex">\wp(B)</script> is the
probability of rainfall, and <script type="math/tex">\wp(A)</script> is the probability of a
wet street.</p>
<p>We can rearrange the above equation by dividing both sides
by <script type="math/tex">\wp(B)</script>, obtaining an equation that expresses
the conditional probability <script type="math/tex">\wp(A\mid B)</script> in terms of the inverse
probability <script type="math/tex">\wp(B\mid A)</script>:</p>
<script type="math/tex; mode=display">\wp(A\mid B) = \frac{\wp(B \mid A)}{\wp(B)} \wp(A)</script>
<p>Until now, <em>I think</em>, frequentists and Bayesianists can
agree. The disagreement starts over when and how to use this
equation. Bayesianists infer (conclude) <script type="math/tex">\wp(A\mid B)</script>
on the left side of the equation from the right-hand side,
whereas critics of Bayesianism consider this a
dangerous endeavour.</p>
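<p>A quick numerical check of the rearranged equation, with probabilities invented purely for illustration:</p>

```python
# Made-up numbers for the rain/wet-street example.
p_wet_given_rain = 0.9   # p(A|B): street is wet, given rain
p_rain = 0.3             # p(B)
p_wet = 0.4              # p(A)

# Bayes' theorem: infer p(B|A) from the inverse probability p(A|B).
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet

# Both factorizations give the same joint probability p(A, B).
assert abs(p_wet_given_rain * p_rain - p_rain_given_wet * p_wet) < 1e-12
print(p_rain_given_wet)  # roughly 0.675
```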
<h3 id="bayes-theorem-and-model-estimation">Bayes’ Theorem and model estimation</h3>
<p>Bayesian statistics is concerned with data <script type="math/tex">D</script>
and a model <script type="math/tex">M</script> (a fit parameter, a parameter of a
distribution, a quantity that should be inferred). By
substituting <script type="math/tex">A\to M, B \to D</script>, we get Bayes’ theorem with
Bayesian semantics:</p>
<script type="math/tex; mode=display">\wp(M\mid D) = \frac{\wp(D \mid M)}{\wp(D)} \wp(M)</script>
<p>The probabilities involved are:</p>
<ul>
<li>
<p><script type="math/tex">\wp(M\mid D)</script> is the probability for the model, given
data. Obtaining this probability distribution, we have the
inference (Inference means nothing but: Estimating a model
from/given data).</p>
<p>We also call this quantity the <em>posterior</em>.</p>
</li>
<li><script type="math/tex">\wp(D\mid M)</script> is the probability for data, given the
model. This is the <em>likelihood</em> we have already seen above
<sup id="fnref:likelihood:1"><a href="#fn:likelihood" class="footnote">1</a></sup>.</li>
<li><script type="math/tex">\wp(M)</script> is the prior, a probability (distribution), that
is independent of <script type="math/tex">D</script>.</li>
<li><script type="math/tex">\wp(D)</script> is the probability for data<sup id="fnref:evidence"><a href="#fn:evidence" class="footnote">3</a></sup>. For this
treatment here, we can think of it as a normalization
constant (which can be very costly to compute).</li>
</ul>
<p>The most important aspect of this theorem is: <strong>We can
express the probability for a model, given data by some
term, that involves the probability for data, given the
model</strong>.</p>
<p>Why is this important? Because it is often much easier to
give an expression for the likelihood than to come up with the
posterior <script type="math/tex">\wp(M\mid D)</script> directly.</p>
<h2 id="bayesian-estimation-of-immunization-rates">Bayesian Estimation of Immunization Rates</h2>
<p>The immunization rate <script type="math/tex">\mu</script> that we are looking for is a
probability (in the frequentist sense). Nevertheless, it is,
in the Bayesian sense, also a model parameter <script type="math/tex">M = \mu</script>,
whose probability distribution given data, <script type="math/tex">\wp(\mu\mid
D)</script>, can be inferred using Bayes’ theorem. We would like to
find this distribution <script type="math/tex">\wp(\mu\mid D)</script>, so that we can compute an expectation value <script type="math/tex">\mathcal{E}[\mu
\mid D]</script> and the variance.</p>
<p>Bayes’ rule gives us this distribution. First, we will
ignore the denominator, which acts as a normalization
constant, and write</p>
<script type="math/tex; mode=display">\wp(\mu\mid D) \propto \wp(D \mid \mu) \wp(\mu).</script>
<p>What could <script type="math/tex">\wp(D\mid \mu)</script> be? During the trial, we have
<script type="math/tex">k</script> observations of immunized subjects, out of <script type="math/tex">N</script>
observations in total, so <script type="math/tex">\wp(D\mid \mu)= \wp(k \mid N,
\mu)</script>. Does the k-out-of-N pattern sound familiar? It is what the
<a href="http://en.wikipedia.org/wiki/Binomial_distribution">binomial distribution</a>
describes:</p>
<script type="math/tex; mode=display">\wp(N, k \mid \mu) \sim \text{Binomial}(k\mid N, \mu).</script>
<p>Now we need a suitable expression for <script type="math/tex">\wp(\mu)</script>. It needs
to be a probability distribution that just contains the
model alone (no data). This is a tough choice (and one of
the main sources for distrust of the Bayesian methods,
probably). Luckily, the Bayesian literature tells us that
the Beta-distribution is a suitable choice for this,
<script type="math/tex; mode=display">\wp(M) \sim \text{Beta}(\mu\mid \alpha_0, \beta_0)</script>
<p>and <script type="math/tex">\alpha_0</script> and <script type="math/tex">\beta_0</script> are the <em>a priori</em>
parameters. Choosing <script type="math/tex">\alpha_0=\beta_0=1</script> for example
would assume a uniform prior distribution between <script type="math/tex">0 \leq
\mu \leq 1</script>. I use <script type="math/tex">\alpha_0=\beta_0=\frac{1}{2}</script>
<sup id="fnref:jeffrey"><a href="#fn:jeffrey" class="footnote">4</a></sup>.</p>
<h3 id="incremental-updates">Incremental updates</h3>
<p>Due to some neat properties of the binomial and beta
distributions (the beta is the conjugate prior of the
binomial), Bayes’ theorem in this instance simplifies to a
very simple rule: starting with a Beta prior distribution,
we obtain the posterior distribution by just adding our
observed <script type="math/tex">k</script> and <script type="math/tex">N-k</script>
data points to the parameters of the beta distribution.</p>
<script type="math/tex; mode=display">\wp(\mu \mid N,k) = \text{Beta}(\mu\mid \alpha_0 + k, \beta_0 + (N-k))</script>
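<p>A small sketch of this update rule in plain Python, using the <script type="math/tex">k=34</script>, <script type="math/tex">N=40</script> data from above and the Jeffreys prior; the mean and variance formulas are the standard closed-form results for a Beta distribution:</p>

```python
# Conjugate update: Beta prior + binomial data -> Beta posterior.
alpha_0 = beta_0 = 0.5     # Jeffreys prior
k, N = 34, 40              # 34 immunized out of 40 subjects

alpha = alpha_0 + k        # posterior Beta parameters
beta = beta_0 + (N - k)

# Mean and variance of a Beta(alpha, beta) distribution.
mean = alpha / (alpha + beta)
var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print(round(mean, 3))      # 0.841
```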
<h2 id="data">Data</h2>
<p>Using the above beta-prior model with our collected data, we
can obtain posterior distributions for the immunization
rate. The following plot shows such distributions, adding
more data with each subplot.</p>
<p><img src="/assets/images/distributions.png" alt="distributions" /></p>
<p>In the plot, we can follow how, with more data, the
distribution gets narrower (i.e. with more data, we are more
certain). The first subplot, labelled <script type="math/tex">N= 0</script>, shows the prior
distribution.</p>
<p>The inferred value for the immunization rate, <script type="math/tex">\mu</script>, is
the expectation value of these distributions (represented by
the vertical black lines in the plot). For comparison, the
red dotted line is the true immunization rate, which we
know because we put it into the random number generator
that provides us with the data set.</p>
<p>We can see in the plots that our estimate gradually
changes as the data fluctuates, although with many
measurements it seems to stabilize. The learning curve is a
good way to visualize this.</p>
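<p>This stabilization can be reproduced with a short simulation that performs one Bayesian update per observation (a hypothetical sketch; the true rate and sample size here are made up, not the ones behind the plots):</p>

```python
import random

random.seed(0)
TRUE_RATE = 0.9  # the rate we feed into the random number generator

# Simulated survey answers: 1 = immunized, 0 = not immunized
data = [1 if random.random() < TRUE_RATE else 0 for _ in range(500)]

# One conjugate update per observation, recording the running estimate
alpha, beta = 0.5, 0.5  # Jeffreys prior
estimates = []
for x in data:
    alpha, beta = alpha + x, beta + (1 - x)
    estimates.append(alpha / (alpha + beta))

# Early estimates fluctuate; the final one is close to TRUE_RATE
print(estimates[0], estimates[-1])
```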
<h1 id="learn-curve-for-bayes">Learning curve for Bayes</h1>
<p>Pretending that we only perform one trial at a time, we can
plot a learning curve that shows how our method performs and
how the inference quality improves with more data:</p>
<p><img src="/assets/images/learncurvebayes.png" alt="learn curve for bayes" /></p>
<h1 id="learn-curve-for-boostrap">Learning curve for Bootstrap</h1>
<p>We have plotted a learning curve for the Bayesian approach;
how does the bootstrap method compare? Let’s add a learning
curve for the bootstrap method to the plot:</p>
<p><img src="/assets/images/learncurve-both.png" alt="learn curve of both" /></p>
<p>What we can see here is that with enough data points, both
methods give very similar results, to the point where they
seem equivalent. However, the Bayesian approach is better
when inferring from fewer data points. Why is that?</p>
<ul>
<li>
<p>Bootstrap relies on constructing ad-hoc data-sets from the
original samples. With few data points, it suffers from
the same bias as the original sample.</p>
<p>In our data set, the first few samples are consistently
<code class="highlighter-rouge">1</code>, and thus the bootstrap approach must yield a
rate of 1 with zero-sized error bars.</p>
</li>
<li>
<p>The Bayesian approach uses a prior. This prevents the
method from focussing too strongly on the first few
events. Recording the first event, we do have error bars
that are fairly large (which fits nicely with our
expectation of being uncertain about the true value of our
parameter).</p>
</li>
</ul>
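<p>The contrast can be made concrete with a small sketch (again my own illustrative code, not the code behind the plots): bootstrapping three consistent <code class="highlighter-rouge">1</code>s yields a rate of 1 with zero spread, while the Beta posterior after the same data still carries uncertainty.</p>

```python
import random

random.seed(1)

def bootstrap_rate(sample, n_resamples=1000):
    """Bootstrap mean and standard deviation of the rate (a sketch)."""
    rates = []
    for _ in range(n_resamples):
        resample = [random.choice(sample) for _ in sample]
        rates.append(sum(resample) / len(resample))
    mean = sum(rates) / len(rates)
    std = (sum((r - mean) ** 2 for r in rates) / len(rates)) ** 0.5
    return mean, std

# All observations are 1: every resample is identical
print(bootstrap_rate([1, 1, 1]))  # (1.0, 0.0)

# Bayesian posterior after the same three 1s (Jeffreys prior)
alpha, beta = 0.5 + 3, 0.5 + 0
mean = alpha / (alpha + beta)
std = (alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))) ** 0.5
print(mean, std)  # mean below 1, with a sizeable error bar
```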
<p>The subtle, systematic difference that remains between the
Bayesian and the bootstrap method for larger sample sizes
(the Bayesian rate is consistently smaller than the
bootstrapped rate) is the influence of the prior, which never
completely vanishes (although it is negligible given the
statistical fluctuations).</p>
<h1 id="conclusion">Conclusion</h1>
<p>There are many conclusions that one can draw from this
simple example. Of course, not everything that we can learn
from this example generalizes to all questions about Bayes
and frequentism. So I’ll limit my conclusion here to one
very simple piece of advice: I learned a lot more about
statistical methods and algorithms by constantly looking at
how Bayesians and frequentists approach and derive them.
Most statistical topics are treated in both the Bayesian and
frequentist literature.</p>
<h1 id="notes">Notes</h1>
<p>Thank you Christopher and Daniel for your feedback on this
blog post.</p>
<h2 id="further-reading">Further Reading</h2>
<ul>
<li><a href="http://www.sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/">The Non-parametric Bootstrap as a Bayesian Model</a>
by Rasmus Bååth</li>
</ul>
<h2 id="footnotes">Footnotes</h2>
<div class="footnotes">
<ol>
<li id="fn:likelihood">
<p>It is often said that the likelihood isn’t a
probability (distribution), but another kind of function.
This isn’t correct. The likelihood is a probability
(distribution). What people mean when they say it isn’t
a probability is that they don’t use it as a
probability (distribution) in that context, but as a
function of some parameter that is maximized (maximum
likelihood estimation). <a href="#fnref:likelihood" class="reversefootnote">↩</a> <a href="#fnref:likelihood:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:prop">
<p>A proposition is a statement that can either be true or
false. <a href="#fnref:prop" class="reversefootnote">↩</a></p>
</li>
<li id="fn:evidence">
<p>It doesn’t have much of a canonical name. Calling it
the <em>evidence</em> is popular. Since <script type="math/tex">\wp(D) = \sum \wp(D
\mid M_i) \wp(M_i)</script>, I like “marginalized likelihood”. <a href="#fnref:evidence" class="reversefootnote">↩</a></p>
</li>
<li id="fn:jeffrey">
<p>The prior distribution <script type="math/tex">\text{Beta}\left(\mu \mid
\frac{1}{2}, \frac{1}{2}\right)</script> is called the
<a href="https://www.wikiwand.com/en/Jeffreys_prior">Jeffreys prior</a>.
Choice of priors would be the material for a series of
blog posts. <a href="#fnref:jeffrey" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Sun, 11 Dec 2016 11:48:00 +0100
http://www.holger-peters.de/data-science/2016/12/11/bayes-and-bootstrap.html
http://www.holger-peters.de/data-science/2016/12/11/bayes-and-bootstrap.htmldata sciencebayesData-ScienceFixing Python-only feeds<p>I moved this blog to a new blog software and had lost the
ability to build custom feeds on tags. This should at least
work now for the Python tag again, so your feed readers
should be able to load the <a href="http://www.holger-peters.de/feeds/python.atom.xml">Python feed</a>
again. Let me know if this is not working.</p>
<p>The IDs of posts in that feed are probably not the same as
in the static site generator I used before, so your feed
reader might show some old posts as unread, although you
have read them already.</p>
Sat, 03 Dec 2016 21:08:00 +0100
http://www.holger-peters.de/python/2016/12/03/feed.html
http://www.holger-peters.de/python/2016/12/03/feed.htmlPythonpythonUsing pyenv and tox<p>I usually use <a href="https://github.com/yyuu/pyenv">pyenv</a> to manage my Python
interpreters and obtain them in whatever version I need. Another tool I
occasionally use is <a href="https://tox.readthedocs.io">tox</a> by Holger Krekel, which
nicely generates build matrices for library and Python interpreter
versions; these come in handy when you develop a library targeting multiple
Python versions (and dependencies).</p>
<p>However, until recently I didn’t know how to use the two of them
together. With <code class="highlighter-rouge">pyenv</code>, I usually ended up with one python interpreter
in my path, so tox had only one interpreter to choose from, and I was
missing out on tox’ selling point: testing your code over various
versions of Python.</p>
<h1 id="install-multiple-python-version-with-pyenv">Install Multiple Python Versions With Pyenv</h1>
<p>Setting up your pyenv usually looks like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% pyenv install 3.5.1
% pyenv install 2.7.10
% cd my_project_dir
% pyenv local 3.5.1
</code></pre></div></div>
<p>Now it is possible to use multiple Python versions here:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>% pyenv local 3.5.1 2.7.10
% python3.5 --version
Python 3.5.1
% python2.7 --version
Python 2.7.10
</code></pre></div></div>
<p>Then tox can find the interpreters. Typically you will have a <code class="highlighter-rouge">tox.ini</code> in
your project that starts with something like this:</p>
<figure class="highlight"><pre><code class="language-ini" data-lang="ini"><span class="nn">[tox]</span>
<span class="py">envlist</span> <span class="p">=</span> <span class="s">py27,py34,py35</span>
<span class="py">skip_missing_interpreters</span> <span class="p">=</span> <span class="s">True</span>
<span class="nn">[testenv]</span>
<span class="py">commands</span><span class="p">=</span><span class="s">py.test</span>
<span class="py">deps</span> <span class="p">=</span> <span class="s">-rrequirements.txt</span></code></pre></figure>
<p>Invoking <code class="highlighter-rouge">tox</code> should now run tox with the two available Python
versions, 2.7 and 3.5, skipping 3.4 unless it is installed.</p>
Sat, 14 May 2016 17:26:00 +0200
http://www.holger-peters.de/using-pyenv-and-tox.html
http://www.holger-peters.de/using-pyenv-and-tox.htmlPythontoxpytestpythonLearning Haskell by Type (Signatures)<p>Getting a better understanding of Haskell has always been on my list. My
typical toolbox<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> for learning another programming language is not so
effective with Haskell, because in contrast to say - Ruby<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> - learning
Haskell requires me to learn new concepts. On the other hand, Haskell
offers some unique features, which make learning it surprisingly easy
again. Of these tools, type signatures have quickly become invaluable.
Embrace them or perish, I might say, for if you don’t learn to utilize
them, everything people typically criticize about the Haskell ecosystem
(sparse documentation, obscure love for operators, being completely lost
in abstraction) will hit you hard. On the other hand, if you learn to
read the Haskell type (signatures), you often know things from quick,
formal considerations early on, without having even started to think
about the semantics of that piece of code.</p>
<p>Much can be written about type signatures but in this blog post, I try
to focus on type signatures of Haskell’s most common abstractions, and
point out some patterns and parallels in them (and as it turns out,
these are not only parallels in the type signatures, but in semantics
too.)</p>
<p>With all the talk about Monads, a lot of introductory material kind of
leaves out Functors, Applicative Functors and the merits of Applicative
Functor style. If you have so far diligently learned some Haskell, but
were put off by Haskell’s liberal use of <em>weird</em> operators, applicative
Functor style will show you how operators can be used for great benefit.</p>
<p>The following compilation is of things I understood only recently, so
bear in mind that I might have missed one or the other connection.</p>
<h1 id="overview">Overview</h1>
<p>The purpose of this blog post is to explain some properties of typical
Haskell type classes by looking at the type signatures<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> of their
member functions. So first of all, the objective is to have these
signatures ready for reading. The following signatures were obtained by
querying them interactively at GHC’s ghci prompt. One
can also <a href="https://hackage.haskell.org/package/base-4.8.2.0/docs/Control-Applicative.html">look into the
source</a>,
though; they should tell you the same.</p>
<h2 id="normal-functions">Normal Functions</h2>
<p>We’ll start having a look at normal function applications.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="o">$</span><span class="p">)</span> <span class="o">::</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="p">(</span><span class="o">.</span><span class="p">)</span> <span class="o">::</span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">a</span> <span class="o">-></span> <span class="n">c</span></code></pre></figure>
<p>The . operator for function composition allows us in Haskell to write
<code class="highlighter-rouge">(f . g) x</code> instead of <code class="highlighter-rouge">f (g x)</code>.</p>
<p><code class="highlighter-rouge">$</code> is a low-priority operator which represents the function
application, so instead of <code class="highlighter-rouge">f x</code>, we can also write <code class="highlighter-rouge">f $ x</code>. It is
mostly used to avoid parentheses in code (to write <code class="highlighter-rouge">f . g $ x</code> for the
above example), but in this blog post, I will use it to represent
function application in general.</p>
<h2 id="functor">Functor</h2>
<p>In a functional, statically typed programming language without the
mathematical obsession of the Haskell community, a Functor might have
been named “Mappable”. Haskell took the name Functor<sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup> from a
<a href="http://www.wikipedia.com/wiki/Functor">mathematical concept in category
theory</a>.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="o"><$></span><span class="p">)</span> <span class="o">::</span> <span class="kt">Functor</span> <span class="n">f</span> <span class="o">=></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span></code></pre></figure>
<p>Depending on personal preference and style, there is also <code class="highlighter-rouge">fmap</code>, which
is just another name for <code class="highlighter-rouge">(<$>)</code>.</p>
<h2 id="applicative">Applicative</h2>
<p>An Applicative is a special kind of Functor, that extends Functors. It
features the operator <code class="highlighter-rouge"><*></code> for sequencing computations (combining their
results), and <code class="highlighter-rouge">pure</code>, a function to bring values into an applicative
context.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">pure</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span>
<span class="p">(</span><span class="o"><*></span><span class="p">)</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="n">f</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span> <span class="c1">-- sequential application</span></code></pre></figure>
<p>While <code class="highlighter-rouge">pure</code> and <code class="highlighter-rouge"><*></code> constitute a minimal implementation, typically
the operators <code class="highlighter-rouge"><*</code> and <code class="highlighter-rouge">*></code> are also used, which discard some
computation results instead of combining them like <code class="highlighter-rouge"><*></code>. This is very
handy when <a href="https://hackage.haskell.org/package/megaparsec-4.4.0">writing
megaparsec</a>
parsers. My mnemonic to not confuse them: the angle bracket points to
the value that is not discarded:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="o">*></span><span class="p">)</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span> <span class="c1">-- discard the first value</span>
<span class="p">(</span><span class="o"><*</span><span class="p">)</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="c1">-- discard the second value</span></code></pre></figure>
<p>Just by looking at the type signature
<code class="highlighter-rouge">f a -> f b -> f b</code>, you can infer that <code class="highlighter-rouge">(*>)</code> keeps
its right-hand-side value and discards the one to the left.</p>
<h2 id="monad">Monad</h2>
<p>Monads are characterized by the bind operator <code class="highlighter-rouge">>>=</code> and the <code class="highlighter-rouge">return</code>
function. <code class="highlighter-rouge">>>=</code> passes a “monadic” value <code class="highlighter-rouge">m a</code> to a monadic function
<code class="highlighter-rouge">(a -> m b)</code>; <code class="highlighter-rouge">return</code> puts a value into a monadic container.</p>
<p>Monads are also Applicatives and Functors, i.e. they also implement
<code class="highlighter-rouge"><$></code>, <code class="highlighter-rouge"><*></code>, etc.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="c1">-- Sequentially compose two actions, passing any value produced</span>
<span class="c1">-- by the first as an argument to the second</span>
<span class="p">(</span><span class="o">>>=</span><span class="p">)</span> <span class="o">::</span> <span class="kt">Monad</span> <span class="n">m</span> <span class="o">=></span> <span class="n">m</span> <span class="n">a</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span>
<span class="n">return</span> <span class="o">::</span> <span class="kt">Monad</span> <span class="n">m</span> <span class="o">=></span> <span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">a</span>
<span class="p">(</span><span class="o">>></span><span class="p">)</span> <span class="o">::</span> <span class="kt">Monad</span> <span class="n">m</span> <span class="o">=></span> <span class="n">m</span> <span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span> <span class="c1">-- discards value of first monad</span>
<span class="p">(</span><span class="o"><=<</span><span class="p">)</span> <span class="o">::</span> <span class="kt">Monad</span> <span class="n">m</span> <span class="o">=></span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">m</span> <span class="n">c</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">c</span><span class="p">)</span> <span class="c1">-- kleisli composition</span></code></pre></figure>
<p>Note: Trying to explain a Monad by allegories and metaphors is in my
experience often futile (and a common pitfall for Haskell learners). Way
more effective is to gain some basic understanding on the type level and
imitate Monad usage with various examples.</p>
<h1 id="operations-that-apply">Operations that Apply</h1>
<p>If you think about it, the <code class="highlighter-rouge"><*></code> operation of the Applicative
(sequential application) and the function application operator <code class="highlighter-rouge">$</code> have
a pretty similar signature, this is also true for <code class="highlighter-rouge"><$></code>, the map
operation</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="o">$</span><span class="p">)</span> <span class="o">::</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="p">(</span><span class="o"><$></span><span class="p">)</span> <span class="o">::</span> <span class="kt">Functor</span> <span class="n">f</span> <span class="o">=></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span>
<span class="p">(</span><span class="o"><*></span><span class="p">)</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="n">f</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span></code></pre></figure>
<p>The first operand of those operators all map from one type <code class="highlighter-rouge">a</code> to the
other <code class="highlighter-rouge">b</code> (in the case of <code class="highlighter-rouge"><*></code> that <code class="highlighter-rouge">a -> b</code> is hidden in an
applicative). The second operand is the argument to the application. In
the case of normal function application this is plainly the function
argument, with the Functor (“Mappable”) it is a Functor, in the case of
the applicative it is an applicative.</p>
<p>The result of the operation is either of type <code class="highlighter-rouge">b</code>, Functor of <code class="highlighter-rouge">b</code> or
applicative of <code class="highlighter-rouge">b</code>.</p>
<p>One instance of Functor and Applicative (an Applicative is always a
Functor) is the list <code class="highlighter-rouge">[]</code> type. The following ghci interactive session
will demonstrate the three applying operators:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="mi">10</span><span class="p">)</span> <span class="o">$</span> <span class="mi">1</span>
<span class="mi">11</span>
<span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="mi">10</span><span class="p">)</span> <span class="o"><$></span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span>
<span class="p">[</span><span class="mi">11</span><span class="p">,</span><span class="mi">12</span><span class="p">,</span><span class="mi">13</span><span class="p">]</span>
<span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="o"><$></span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="o"><*></span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">30</span><span class="p">]</span>
<span class="p">[</span><span class="mi">11</span><span class="p">,</span><span class="mi">21</span><span class="p">,</span><span class="mi">31</span><span class="p">,</span><span class="mi">12</span><span class="p">,</span><span class="mi">22</span><span class="p">,</span><span class="mi">32</span><span class="p">,</span><span class="mi">13</span><span class="p">,</span><span class="mi">23</span><span class="p">,</span><span class="mi">33</span><span class="p">]</span></code></pre></figure>
<p>In Haskell, the list type implements <code class="highlighter-rouge">Monad</code>, which means it also is an
<code class="highlighter-rouge">Applicative</code> and a <code class="highlighter-rouge">Functor</code>. Treating the list as a Functor, we can
apply the function that increments by 10 to each element, and treating
the list as an applicative, we can sequentially join two lists by adding
their elements (building the sum of the cartesian product of their
combinations).</p>
<p>Let’s investigate the type properties of that last statement that used
the <code class="highlighter-rouge">f <$> arg1 <*> arg2</code> pattern (we call this “applicative style”):</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="o">></span> <span class="kr">let</span> <span class="n">mapAndApply</span> <span class="n">f</span> <span class="n">arg1</span> <span class="n">arg2</span> <span class="o">=</span> <span class="n">f</span> <span class="o"><$></span> <span class="n">arg1</span> <span class="o"><*></span> <span class="n">arg2</span>
<span class="o">></span> <span class="o">:</span><span class="n">t</span> <span class="n">mapAndApply</span>
<span class="n">mapAndApply</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="p">(</span><span class="n">a1</span> <span class="o">-></span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a1</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span></code></pre></figure>
<p>Thus, Haskell infers the types <code class="highlighter-rouge">f :: (a1 -> a -> b)</code> for the function,
<code class="highlighter-rouge">arg1 :: f a1</code> for the second argument, and <code class="highlighter-rouge">arg2 :: f a</code> for the third.</p>
<h2 id="lifting">Lifting</h2>
<p>This combination is a common function, called <code class="highlighter-rouge">liftA2</code></p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">liftA2</span> <span class="o">::</span> <span class="kt">Applicative</span> <span class="n">f</span> <span class="o">=></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span> <span class="o">-></span> <span class="n">f</span> <span class="n">a</span> <span class="o">-></span> <span class="n">f</span> <span class="n">b</span> <span class="o">-></span> <span class="n">f</span> <span class="n">c</span></code></pre></figure>
<p>We can read <code class="highlighter-rouge">liftA2 (+)</code> as “lift the addition to an applicative
action”. After lifting, we have an addition for all applicatives.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="o">></span> <span class="kr">let</span> <span class="n">addApplicative</span> <span class="o">=</span> <span class="n">liftA2</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span>
<span class="n">addApplicative</span> <span class="o">::</span> <span class="p">(</span><span class="kt">Num</span> <span class="n">c</span><span class="p">,</span> <span class="kt">Applicative</span> <span class="n">f</span><span class="p">)</span> <span class="o">=></span> <span class="n">f</span> <span class="n">c</span> <span class="o">-></span> <span class="n">f</span> <span class="n">c</span> <span class="o">-></span> <span class="n">f</span> <span class="n">c</span></code></pre></figure>
<p>To prove the point, we can experiment with this using various
applicatives from Haskell’s standard library:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="o">></span> <span class="n">addApplicative</span> <span class="p">(</span><span class="kt">Just</span> <span class="mi">1</span><span class="p">)</span> <span class="kt">Nothing</span>
<span class="kt">Nothing</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="p">(</span><span class="kt">Just</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="kt">Just</span> <span class="mi">2</span><span class="p">)</span>
<span class="kt">Just</span> <span class="mi">3</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="kt">Nothing</span> <span class="p">(</span><span class="kt">Just</span> <span class="mi">2</span><span class="p">)</span>
<span class="kt">Nothing</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="kt">Nothing</span> <span class="kt">Nothing</span>
<span class="kt">Nothing</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="p">(</span><span class="kt">Right</span> <span class="mi">5</span><span class="p">)</span> <span class="p">(</span><span class="kt">Right</span> <span class="mi">6</span><span class="p">)</span>
<span class="kt">Right</span> <span class="mi">11</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="p">(</span><span class="kt">Right</span> <span class="mi">5</span><span class="p">)</span> <span class="p">(</span><span class="kt">Left</span> <span class="s">"a"</span><span class="p">)</span>
<span class="kt">Left</span> <span class="s">"a"</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span><span class="mi">20</span><span class="p">,</span><span class="mi">30</span><span class="p">]</span>
<span class="p">[</span><span class="mi">11</span><span class="p">,</span><span class="mi">21</span><span class="p">,</span><span class="mi">31</span><span class="p">,</span><span class="mi">12</span><span class="p">,</span><span class="mi">22</span><span class="p">,</span><span class="mi">32</span><span class="p">,</span><span class="mi">13</span><span class="p">,</span><span class="mi">23</span><span class="p">,</span><span class="mi">33</span><span class="p">]</span>
<span class="o">></span> <span class="n">addApplicative</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="kt">[]</span>
<span class="kt">[]</span></code></pre></figure>
<p>Using a lifted function gives you the impression of working with
ordinary functions; the symmetry between <code class="highlighter-rouge">(f $ x) y</code> and <code class="highlighter-rouge">f <$> x <*> y</code>
makes this possible.</p>
<h2 id="applicative-style">Applicative Style</h2>
<p>The same evaluations can also be written in applicative style.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="o"><$></span> <span class="kt">Just</span> <span class="mi">1</span> <span class="o"><*></span> <span class="kt">Nothing</span>
<span class="kt">Nothing</span>
<span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="o"><$></span> <span class="kt">Just</span> <span class="mi">1</span> <span class="o"><*></span> <span class="kt">Just</span> <span class="mi">2</span>
<span class="kt">Just</span> <span class="mi">3</span>
<span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="o"><$></span> <span class="kt">Nothing</span> <span class="o"><*></span> <span class="kt">Just</span> <span class="mi">2</span>
<span class="kt">Nothing</span>
<span class="o">></span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="o"><$></span> <span class="kt">Nothing</span> <span class="o"><*></span> <span class="kt">Nothing</span>
<span class="kt">Nothing</span></code></pre></figure>
<p>Using applicative style emphasizes the resemblance between function
application with arguments, <code class="highlighter-rouge">(f $ x) y</code>, and applicative <code class="highlighter-rouge">f <$> x <*> y</code>,
without requiring pre-defined <code class="highlighter-rouge">liftAx</code> functions (x representing the
arity).</p>
<h2 id="example-generating-a-stream-of-unique-labels">Example: Generating a stream of unique labels</h2>
<p>This will be a “more real-world” example of applicative style. Suppose
we need to generate labels in code, for example while performing
operations on an abstract syntax tree. Each label needs to be unique,
and we need labels in various functions. Since we use Haskell and
pure functions, we cannot just mutate some counter variable.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="kr">import</span> <span class="nn">Control.Monad.State</span>
<span class="kr">import</span> <span class="nn">Control.Applicative</span>
<span class="kr">type</span> <span class="kt">LabelM</span> <span class="o">=</span> <span class="kt">State</span> <span class="kt">Int</span>
<span class="n">increment</span> <span class="o">::</span> <span class="kt">LabelM</span> <span class="kt">String</span>
<span class="n">increment</span> <span class="o">=</span> <span class="n">state</span> <span class="o">$</span> <span class="nf">\</span><span class="n">i</span> <span class="o">-></span> <span class="kr">let</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>
<span class="kr">in</span> <span class="p">(</span><span class="s">"$"</span> <span class="o">++</span> <span class="n">show</span> <span class="n">j</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>
<span class="n">labels</span> <span class="o">::</span> <span class="kt">Bool</span> <span class="o">-></span> <span class="kt">LabelM</span> <span class="p">[(</span><span class="kt">String</span><span class="p">,</span> <span class="kt">String</span><span class="p">)]</span>
<span class="n">labels</span> <span class="n">discard</span> <span class="o">=</span> <span class="n">f</span> <span class="o"><$></span> <span class="n">twoLabels</span>
<span class="o"><*></span> <span class="n">twoLabels</span>
<span class="o"><*></span> <span class="n">twoLabels</span>
<span class="kr">where</span> <span class="n">f</span> <span class="n">a</span> <span class="n">b</span> <span class="n">c</span> <span class="o">=</span> <span class="kr">if</span> <span class="n">discard</span>
<span class="kr">then</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span>
<span class="kr">else</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span>
<span class="c1">-- (,) is an operator creating a tuple</span>
<span class="n">twoLabels</span> <span class="o">::</span> <span class="kt">LabelM</span> <span class="p">(</span><span class="kt">String</span><span class="p">,</span> <span class="kt">String</span><span class="p">)</span>
<span class="n">twoLabels</span> <span class="o">=</span> <span class="p">(,)</span> <span class="o"><$></span> <span class="n">increment</span> <span class="o"><*></span> <span class="n">increment</span>
<span class="n">main</span> <span class="o">::</span> <span class="kt">IO</span> <span class="nb">()</span>
<span class="n">main</span> <span class="o">=</span> <span class="kr">do</span> <span class="n">putStrLn</span> <span class="s">"Enter `True`, or `False`"</span>
<span class="n">discard</span> <span class="o"><-</span> <span class="n">getLine</span>
<span class="n">print</span> <span class="p">(</span><span class="n">evalState</span> <span class="p">(</span><span class="n">labels</span> <span class="o">.</span> <span class="n">read</span> <span class="o">$</span> <span class="n">discard</span><span class="p">)</span> <span class="mi">0</span><span class="p">)</span></code></pre></figure>
<p>When executed, this program will prompt you to enter either <code class="highlighter-rouge">True</code> or
<code class="highlighter-rouge">False</code>, and then it will print results depending on the input:
either <code class="highlighter-rouge">[("$1","$2"), ("$5","$6")]</code> or
<code class="highlighter-rouge">[("$1","$2"),("$3","$4"),("$5","$6")]</code>. Notice how even when the second
label-pair is discarded, the counter is still incremented. The
entry point is the evaluation of <code class="highlighter-rouge">evalState</code> in <code class="highlighter-rouge">main</code>. Here, we
initialize the state monad’s state with 0 and evaluate the monadic
<code class="highlighter-rouge">labels</code> function. The state is managed by the state monad
<code class="highlighter-rouge">LabelM = State Int</code>, which directly tells us that our state consists of
an integer variable. Finally we have <code class="highlighter-rouge">increment</code>, which increments that
internal state and returns a label, as well as <code class="highlighter-rouge">twoLabels</code>, which
generates a pair of such labels (by lifting <code class="highlighter-rouge">increment</code>). Note that both
<code class="highlighter-rouge">increment</code> and <code class="highlighter-rouge">twoLabels</code> are of type <code class="highlighter-rouge">LabelM _</code>, once as <code class="highlighter-rouge">LabelM String</code>
and once as <code class="highlighter-rouge">LabelM (String, String)</code>.</p>
<p>We use <code class="highlighter-rouge">twoLabels</code> in the <code class="highlighter-rouge">labels</code> function, where we use applicative
style to obtain the unique labels and either return them all, or throw
some away<sup id="fnref:intuition"><a href="#fn:intuition" class="footnote">5</a></sup>. I condensed this use case from abstract syntax tree (AST)
rewriting code; if it wouldn’t blow up the example, I would show code
here that introduces labels depending on the AST input to the
program.</p>
<p>Solving this issue with <code class="highlighter-rouge">LabelM</code> has some benefits. First of all, it makes
the state explicit in the type signatures, which gives you the guarantee
that if you are not using the <code class="highlighter-rouge">LabelM</code> type, you are not touching that
state. Then, the state is handled just like any other value in Haskell
– immutable. <code class="highlighter-rouge">evalState</code> is the bottleneck (in a good sense) that
allows us to evaluate our “stateful” code and carry the result over into the
LabelM-free world.</p>
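<p>For readers more at home in Python, the same label generation can be sketched by threading the counter through explicitly: every function takes the current counter and returns it alongside its result, which is essentially what <code class="highlighter-rouge">State Int</code> does behind the scenes. This is my analogy, not code from the original post:</p>

```python
def increment(i):
    """Return a fresh label and the updated counter,
    mirroring the Haskell `increment :: LabelM String`."""
    j = i + 1
    return "$" + str(j), j

def two_labels(i):
    a, i = increment(i)
    b, i = increment(i)
    return (a, b), i

def labels(discard, i=0):
    pairs = []
    for _ in range(3):
        pair, i = two_labels(i)
        pairs.append(pair)
    if discard:
        del pairs[1]  # the counter was still advanced for the discarded pair
    return pairs

print(labels(False))  # [('$1', '$2'), ('$3', '$4'), ('$5', '$6')]
print(labels(True))   # [('$1', '$2'), ('$5', '$6')]
```

<p>Note how the counter still reaches 6 even when the middle pair is discarded, just like in the Haskell version; the monad merely hides the explicit threading.</p>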
<h1 id="composition-patterns">Composition Patterns</h1>
<p>Another interesting pair of operations with a similar signature are the
operators <code class="highlighter-rouge">(.)</code> and <code class="highlighter-rouge">(<=<)</code>.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="o">.</span><span class="p">)</span> <span class="o">::</span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">c</span><span class="p">)</span>
<span class="p">(</span><span class="o"><=<</span><span class="p">)</span> <span class="o">::</span> <span class="kt">Monad</span> <span class="n">m</span> <span class="o">=></span> <span class="p">(</span><span class="n">b</span> <span class="o">-></span> <span class="n">m</span> <span class="n">c</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">c</span><span class="p">)</span></code></pre></figure>
<p>The correspondence here is between functions of type <code class="highlighter-rouge">(b -> c)</code> and
monadic functions of signature <code class="highlighter-rouge">Monad m => (b -> m c)</code>. I.e. the
signatures of <code class="highlighter-rouge">(.)</code> and <code class="highlighter-rouge">(<=<)</code> have almost the same pattern.</p>
<p>We know this <code class="highlighter-rouge">Monad m => (b -> m c)</code> signature from the bind operator’s
second operand:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="p">(</span><span class="o">>>=</span><span class="p">)</span> <span class="o">::</span> <span class="kt">Monad</span> <span class="n">m</span> <span class="o">=></span> <span class="n">m</span> <span class="n">a</span> <span class="o">-></span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">m</span> <span class="n">b</span></code></pre></figure>
<p>By joining two <code class="highlighter-rouge">m a >>= \x -> m b</code> operations, we can infer <code class="highlighter-rouge">(<=<)</code>.
We’ll use the <code class="highlighter-rouge">Maybe</code> monad, and I’ll write the signatures of the lambda
functions to the right as comments.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">printLengthPrint</span> <span class="o">::</span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Maybe</span> <span class="kt">Double</span>
<span class="n">printLengthPrint</span> <span class="o">=</span> <span class="nf">\</span><span class="n">w</span> <span class="o">-></span> <span class="kt">Just</span> <span class="p">(</span><span class="n">show</span> <span class="n">w</span><span class="p">)</span> <span class="c1">-- :: Int -> Maybe String</span>
<span class="o">>>=</span> <span class="nf">\</span><span class="n">x</span> <span class="o">-></span> <span class="kt">Just</span> <span class="p">(</span><span class="n">length</span> <span class="n">x</span><span class="p">)</span> <span class="c1">-- :: String -> Maybe Int</span>
<span class="o">>>=</span> <span class="nf">\</span><span class="n">y</span> <span class="o">-></span> <span class="kt">Just</span> <span class="p">(</span><span class="mf">2.0</span> <span class="o">^^</span> <span class="n">y</span><span class="p">)</span> <span class="c1">-- :: Int -> Maybe Double</span></code></pre></figure>
<p>We can more or less identify the signature of <code class="highlighter-rouge">(<=<)</code> just by looking at
this. Now spell out the lambda functions in point-free style (I called
them <code class="highlighter-rouge">f</code>, <code class="highlighter-rouge">g</code>, <code class="highlighter-rouge">h</code>) and we can implement the <code class="highlighter-rouge">printLengthPrint</code> function by
Kleisli composition:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">f</span> <span class="o">::</span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Maybe</span> <span class="kt">String</span>
<span class="n">f</span> <span class="o">=</span> <span class="kt">Just</span> <span class="o">.</span> <span class="n">show</span>
<span class="n">g</span> <span class="o">::</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Maybe</span> <span class="kt">Int</span>
<span class="n">g</span> <span class="o">=</span> <span class="kt">Just</span> <span class="o">.</span> <span class="n">length</span>
<span class="n">h</span> <span class="o">::</span> <span class="kt">Int</span> <span class="o">-></span> <span class="kt">Maybe</span> <span class="kt">Double</span>
<span class="n">h</span> <span class="o">=</span> <span class="kt">Just</span> <span class="o">.</span> <span class="p">(</span><span class="mf">2.0</span> <span class="o">^^</span><span class="p">)</span>
<span class="n">plp1</span> <span class="o">=</span> <span class="n">h</span> <span class="o"><=<</span> <span class="n">g</span> <span class="o"><=<</span> <span class="n">f</span>
<span class="n">plp2</span> <span class="o">=</span> <span class="n">f</span> <span class="o">>=></span> <span class="n">g</span> <span class="o">>=></span> <span class="n">h</span></code></pre></figure>
<p>To sum it up: functional programming is often defined as programming by
function composition and application. Monads are a functional concept,
and we can see that monads compose in a very similar way. This
underlines the fact that monads are indeed a functional concept (and not
– as sometimes stated – imperative programming in sheep’s clothing).</p>
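<p>The Kleisli chains above have counterparts in mainstream languages, too. As a hedged analogy (mine, not the author's), composing <code class="highlighter-rouge">None</code>-propagating functions in Python captures the same idea as <code class="highlighter-rouge">(>=>)</code>:</p>

```python
def kleisli(*fs):
    """Compose functions of type a -> Optional[b] left to right,
    short-circuiting on None -- an analogue of Haskell's (>=>)."""
    def composed(x):
        for f in fs:
            if x is None:
                return None
            x = f(x)
        return x
    return composed

f = lambda w: str(w)    # like Int -> Maybe String (never fails here)
g = lambda s: len(s)    # like String -> Maybe Int
h = lambda y: 2.0 ** y  # like Int -> Maybe Double

plp = kleisli(f, g, h)
print(plp(12345))  # 32.0  (2.0 ** len("12345"))
```

<p>As with <code class="highlighter-rouge">(.)</code>, the composition itself is just another function; the only extra ingredient is the short-circuit on the failure value.</p>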
<h1 id="example">Example</h1>
<p>So far this blog post has been a bit abstract, looking at type signature
after type signature. Now we’ll work through an example: a parser for simple
arithmetic expressions, where we will see when we can use the applicative
style shown above, and when not.</p>
<p>The first parser parses expressions in prefix (Polish) notation, the
mirror image of <a href="https://www.wikiwand.com/en/Reverse_Polish_notation">Reverse Polish
Notation</a>: the infix expression <code class="highlighter-rouge">1 + 2 * 3</code>
is written as <code class="highlighter-rouge">+ 1 * 2 3</code>. It is especially simple to parse in
contrast to the more common infix notation. We use megaparsec here.</p>
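<p>As an aside (my own illustration, independent of the Haskell parser below), a tiny recursive evaluator in Python shows why this notation is so easy to handle: each operator simply consumes the next two sub-terms, with no lookahead or precedence rules needed.</p>

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def parse(tokens):
    """Parse one prefix-notation term; return (value, remaining tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in OPS:
        lhs, rest = parse(rest)  # first operand
        rhs, rest = parse(rest)  # second operand
        return OPS[head](lhs, rhs), rest
    return int(head), rest

value, _ = parse("+ 1 * 2 3".split())
print(value)  # 7
```
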
<p>First of all we need to import our parser library (including the
expression-parser helpers we will need for infix parsing later).</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="kr">import</span> <span class="k">qualified</span> <span class="nn">Text.Megaparsec.Lexer</span> <span class="k">as</span> <span class="n">L</span>
<span class="kr">import</span> <span class="nn">Text.Megaparsec</span>
<span class="kr">import</span> <span class="nn">Text.Megaparsec.Expr</span>
<span class="kr">import</span> <span class="nn">Text.Megaparsec.String</span></code></pre></figure>
<p>Now we define an algebraic datatype representing our computation:
<code class="highlighter-rouge">Term</code>. A term can either be an addition, a subtraction, a
multiplication, a division, or an integer value here.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="kr">data</span> <span class="kt">Term</span> <span class="o">=</span> <span class="kt">Add</span> <span class="kt">Term</span> <span class="kt">Term</span>
<span class="o">|</span> <span class="kt">Sub</span> <span class="kt">Term</span> <span class="kt">Term</span>
<span class="o">|</span> <span class="kt">Mul</span> <span class="kt">Term</span> <span class="kt">Term</span>
<span class="o">|</span> <span class="kt">Div</span> <span class="kt">Term</span> <span class="kt">Term</span>
<span class="o">|</span> <span class="kt">Val</span> <span class="kt">Integer</span>
<span class="kr">deriving</span> <span class="p">(</span><span class="kt">Show</span><span class="p">,</span> <span class="kt">Eq</span><span class="p">)</span></code></pre></figure>
<p>Our parsing strategy is to always consume trailing whitespaces with
every parsed term.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">trimTrailing</span> <span class="o">=</span> <span class="kt">L</span><span class="o">.</span><span class="n">lexeme</span> <span class="n">space</span>
<span class="n">op</span> <span class="o">::</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Parser</span> <span class="kt">String</span>
<span class="n">op</span> <span class="o">=</span> <span class="n">trimTrailing</span> <span class="o">.</span> <span class="n">string</span></code></pre></figure>
<p>Define multiplication, division, addition and subtraction expressions in
applicative style (the next 5 expressions all have the type
<code class="highlighter-rouge">Parser Term</code>)</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">mult</span> <span class="o">=</span> <span class="kt">Mul</span> <span class="o"><$></span> <span class="p">(</span><span class="n">op</span> <span class="s">"*"</span> <span class="o">*></span> <span class="n">term</span><span class="p">)</span> <span class="o"><*></span> <span class="n">term</span>
<span class="n">divi</span> <span class="o">=</span> <span class="kt">Div</span> <span class="o"><$></span> <span class="p">(</span><span class="n">op</span> <span class="s">"/"</span> <span class="o">*></span> <span class="n">term</span><span class="p">)</span> <span class="o"><*></span> <span class="n">term</span>
<span class="n">addi</span> <span class="o">=</span> <span class="kt">Add</span> <span class="o"><$></span> <span class="p">(</span><span class="n">op</span> <span class="s">"+"</span> <span class="o">*></span> <span class="n">term</span><span class="p">)</span> <span class="o"><*></span> <span class="n">term</span>
<span class="n">subt</span> <span class="o">=</span> <span class="kt">Sub</span> <span class="o"><$></span> <span class="p">(</span><span class="n">op</span> <span class="s">"-"</span> <span class="o">*></span> <span class="n">term</span><span class="p">)</span> <span class="o"><*></span> <span class="n">term</span>
<span class="n">intval</span> <span class="o">=</span> <span class="kt">Val</span> <span class="o"><$></span> <span class="n">trimTrailing</span> <span class="kt">L</span><span class="o">.</span><span class="n">integer</span></code></pre></figure>
<p>Now all left to do is define a parser for our expression as an
alternative of all arithmetic operations:</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">term</span> <span class="o">::</span> <span class="kt">Parser</span> <span class="kt">Term</span>
<span class="n">term</span> <span class="o">=</span> <span class="n">mult</span>
<span class="o"><|></span> <span class="n">divi</span>
<span class="o"><|></span> <span class="n">addi</span>
<span class="o"><|></span> <span class="n">subt</span>
<span class="o"><|></span> <span class="n">intval</span></code></pre></figure>
<h2 id="infix-parsing">Infix Parsing</h2>
<p>If you are interested in infix parsing: it is algorithmically more
complex. When an infix parser arrives at a number, it
cannot easily know whether this number stands alone, or whether it
belongs to a binary operation with the operator to the right (in
<code class="highlighter-rouge">3 * 4 + 5</code> the parser has to find out that 3 is part of a
multiplication expression, and then find out that the multiplication is
part of an addition expression later on).</p>
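<p>One classical way to resolve this, without backtracking, is precedence climbing: parse a left operand, then greedily absorb operators as long as their precedence is high enough. Utilities like <code class="highlighter-rouge">makeExprParser</code> below solve the same problem for you; here is a minimal Python sketch of the idea (my illustration, left-associative operators only, integer division for simplicity):</p>

```python
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, min_prec=1):
    """Precedence climbing: parse a left operand, then absorb
    operators whose precedence is at least `min_prec`."""
    lhs = int(tokens.pop(0))
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # operands of a higher-precedence operator bind tighter
        rhs = parse_expr(tokens, PREC[op] + 1)
        lhs = {"+": lhs + rhs, "-": lhs - rhs,
               "*": lhs * rhs, "/": lhs // rhs}[op]
    return lhs

print(parse_expr("3 * 4 + 5".split()))  # 17
```

<p>The recursive call with a raised minimum precedence is exactly what lets the parser discover that the 3 belongs to the multiplication before the addition is completed.</p>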
<p>Luckily the megaparsec library has utilities to make parsing infix
notation easier. I included a snippet for completeness.</p>
<figure class="highlight"><pre><code class="language-haskell" data-lang="haskell"><span class="n">parens</span> <span class="o">=</span> <span class="n">between</span> <span class="p">(</span><span class="n">symbol</span> <span class="s">"("</span><span class="p">)</span> <span class="p">(</span><span class="n">symbol</span> <span class="s">")"</span><span class="p">)</span>
<span class="kr">where</span> <span class="n">symbol</span> <span class="o">=</span> <span class="kt">L</span><span class="o">.</span><span class="n">symbol</span> <span class="n">space</span>
<span class="n">infixExpr</span> <span class="o">=</span> <span class="n">makeExprParser</span> <span class="n">infixTerm</span> <span class="n">table</span>
<span class="n">infixTerm</span> <span class="o">=</span> <span class="n">parens</span> <span class="n">infixExpr</span>
<span class="o"><|></span> <span class="n">intval</span>
<span class="n">table</span> <span class="o">=</span> <span class="p">[</span> <span class="p">[</span> <span class="kt">InfixL</span> <span class="p">(</span><span class="n">op</span> <span class="s">"*"</span> <span class="o">>></span> <span class="n">return</span> <span class="kt">Mul</span><span class="p">)</span>
<span class="p">,</span> <span class="kt">InfixL</span> <span class="p">(</span><span class="n">op</span> <span class="s">"/"</span> <span class="o">>></span> <span class="n">return</span> <span class="kt">Div</span><span class="p">)]</span>
<span class="p">,</span> <span class="p">[</span> <span class="kt">InfixL</span> <span class="p">(</span><span class="n">op</span> <span class="s">"+"</span> <span class="o">>></span> <span class="n">return</span> <span class="kt">Add</span><span class="p">)</span>
<span class="p">,</span> <span class="kt">InfixL</span> <span class="p">(</span><span class="n">op</span> <span class="s">"-"</span> <span class="o">>></span> <span class="n">return</span> <span class="kt">Sub</span><span class="p">)]]</span></code></pre></figure>
<p>Here, at least, we can see that for this kind of problem applicatives
are not enough and we need monads.</p>
<h1 id="resources">Resources</h1>
<p>For more detail on Haskell’s types see the
<a href="https://wiki.haskell.org/Typeclassopedia">Typeclassopedia</a>.</p>
<p>To familiarize yourself with Functors and Applicatives, it is really
great to write parsers with
<a href="https://mrkkrp.github.io/megaparsec/">Megaparsec</a>.</p>
<p><a href="http://dev.stephendiehl.com/hask/">What I wish I knew when learning
Haskell</a> by Stephen Diehl is also a
great source.</p>
<h1 id="footnotes">Footnotes</h1>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Some notes on tooling</p>
<p>In my experience, I learned Haskell best when I used
appropriate tooling; it accelerated my learning considerably.</p>
<p><a href="https://hackage.haskell.org/package/hlint">hlint</a> is your friend
with invaluable information. It notifies you when you use redundant
brackets and this feedback will familiarize you with operator
precedence much quicker. Like any linter, I suppose that hlint’s
value is probably at its peak when used by beginners and I expect it
will be less valuable to me over time. Nevertheless I don’t want to
go without it right now.</p>
<p>I use neovim with these plugins:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Plug 'benekastah/neomake'
Plug 'dag/vim2hs'
Plug 'bitc/vim-hdevtools'
</code></pre></div> </div>
<p>Pointfree is another tool I use (curiously): it transforms
your code to point-free style. I often use it when I feel that a
line of code could possibly be written in point-free style, check it
out, and revert if I feel normal-style Haskell is better. This
has taught me some things I probably wouldn’t have discovered for a
long time, for example that <code class="highlighter-rouge">(,)</code> and <code class="highlighter-rouge">(+3)</code> exist, etc. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>A Python programmer will probably pick up Ruby’s language features
rather quickly, and huge portions of the time learning Ruby will be
spent familiarizing oneself with the standard library. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Type signatures can be obtained by running ghci and asking it for
types:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Prelude> import Control.Monad
> :t (>>=)
(>>=) :: Monad m => m a -> (a -> m b) -> m b
> :t (>>)
(>>) :: Monad m => m a -> m b -> m b
> :t return
return :: Monad m => a -> m a
> :t fail
fail :: Monad m => String -> m a
> :t (<$>)
(<$>) :: Functor f => (a -> b) -> f a -> f b
> :t (<$)
(<$) :: Functor f => a -> f b -> f a
> :t pure
pure :: Applicative f => a -> f a
> :t (<*>)
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
> :t (*>)
(*>) :: Applicative f => f a -> f b -> f b
> :t (<*)
(<*) :: Applicative f => f a -> f b -> f a
> :t ($)
($) :: (a -> b) -> a -> b
> :t fmap
fmap :: Functor f => (a -> b) -> f a -> f b
> :t (<=<)
(<=<) :: Monad m => (b -> m c) -> (a -> m b) -> a -> m c
> :t (.)
(.) :: (b -> c) -> (a -> b) -> a -> c
</code></pre></div> </div>
<p><a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>In Haskell, Functors are something entirely different from
Functors in C++. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:intuition">
<p>My first intuition here was to use monadic functionality (<code class="highlighter-rouge">>>=</code>),
yet as it turns out, Functor and applicative (<code class="highlighter-rouge"><*></code>) is enough. This
confused me: If applicatives were about sequential actions, where
the current item does not know about its predecessor, how could it
increment the state-monads state? The answer is in the signatures:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(<*>) :: Applicative f => f (a -> b) -> f a -> f b
</code></pre></div> </div>
<p>The <code class="highlighter-rouge">f (a -> b)</code> piece tells us that we map from one value of the
applicative to another. The consecutive <code class="highlighter-rouge">-> f a -> f b</code> tells us
that our <code class="highlighter-rouge">(a -> b)</code> operation is applied to <code class="highlighter-rouge">f a</code> to yield <code class="highlighter-rouge">f b</code>.
Thus it shouldn’t have surprised me that applicative is in fact capable
of incrementing the counter.</p>
<p>For comparison, Monad’s bind also has this mapping from <code class="highlighter-rouge">a</code> to <code class="highlighter-rouge">b</code>
in its signature, however in the form of <code class="highlighter-rouge">(a -> m b)</code>.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(>>=) :: Monad m => m a -> (a -> m b) -> m b
</code></pre></div> </div>
<p><a href="#fnref:intuition" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Thu, 21 Apr 2016 21:00:00 +0200
http://www.holger-peters.de/haskell-by-types.html
http://www.holger-peters.de/haskell-by-types.htmlFunctional ProgrammingHaskellHaskellPython Data Science Going Functional - Or: Benchmarking Dask<p>This weekend, I visited PyCon Italy in the picturesque town of
<a href="http://en.wikipedia.com/wiki/Florence">Firenze</a>. It was a great
conference with great talks and encounters (great thanks to all the
volunteers who made it happen) and amazing coffee.</p>
<p>I held a talk titled “Python Data Science Going Functional”
in the Data Science Track, where I mostly presented Dask, one of those
libraries-to-watch in the Python data-science ecosystem. Slides are
available on Speaker Deck.</p>
<script async="" class="speakerdeck-embed" data-id="4e611c21c0564db3a37dc3db37cd4e1c" data-ratio="1.33333333333333" src="//speakerdeck.com/assets/embed.js"></script>
<h1 id="dask">Dask</h1>
<p>The talk introduces Dask as a functional abstraction in the Python data
science stack. While creating the slides I had stumbled over an exciting
tweet about Dask by Travis Oliphant</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Cool to see dask.array achieving similar performance to Cython + OpenMP: <a href="https://t.co/3tsWCAgWWQ">https://t.co/3tsWCAgWWQ</a> Much simpler code with <a href="https://twitter.com/hashtag/dask?src=hash">#dask</a>. <a href="https://twitter.com/PyData">@PyData</a></p>— Travis Oliphant (@teoliphant) <a href="https://twitter.com/teoliphant/status/717077047000965120">April 4, 2016</a></blockquote>
<script async="" src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>And after verifying the results on my machine (with some modifications,
as I do not trust <code class="highlighter-rouge">timeit</code>), I included a very similar benchmark in my
slides. While reproducing and adapting the benchmarks, I stumbled over
some weirdly long execution times for the dask
<a href="http://dask.pydata.org/en/latest/array-api.html#dask.array.core.from_array">from_array</a>
function. So I included this finding in my talk’s slides without
really being able to attribute this delay to a specific reason.</p>
<p>After delivering my talk I felt a bit unsatisfied about this. Why did
<code class="highlighter-rouge">from_array</code> perform so badly? So I decided to ask. The answer: Dask
hashes the whole array in <code class="highlighter-rouge">from_array</code> to generate a key for it,
which is the reason it is so slow. The solution is surprisingly
simple. By passing <code class="highlighter-rouge">name='identifier'</code> to <code class="highlighter-rouge">from_array</code>, one can
provide a custom key, and <code class="highlighter-rouge">from_array</code> is suddenly a cheap operation.
So the current state of my benchmark shows that Dask improves upon pure
numpy or numexpr performance, but does not quite reach the
performance of a Cython implementation:</p>
<p><img src="assets/images/dask-corrected-benchmark.png" alt="A corrected benchmark showing execution times for numexpr (NX), numpy
(np), Cython (with OpenMP parallelization) and Dask (including
`from_array`)." width="100%" /></p>
<p>The expression evaluated in that benchmark was</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">x</span> <span class="o">=</span> <span class="n">da</span><span class="o">.</span><span class="n">from_array</span><span class="p">(</span><span class="n">x_np</span><span class="p">,</span> <span class="n">chunks</span><span class="o">=</span><span class="n">arr</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">/</span> <span class="n">CPU_COUNT</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">'x'</span><span class="p">)</span>
<span class="n">mx</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="nb">max</span><span class="p">()</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">mx</span><span class="p">)</span><span class="o">.</span><span class="nb">sum</span><span class="p">()</span> <span class="o">*</span> <span class="n">mx</span>
<span class="n">x</span><span class="o">.</span><span class="n">compute</span><span class="p">()</span></code></pre></figure>
<p>I plan to upload a revised edition of the slides on Speakerdeck (once I
have a decent internet connection again), to include the improved
benchmark, so that it is not misleading for people who stumble on
it without context.</p>
<h1 id="learnings">Learnings</h1>
<p>What can we conclude from this?</p>
<ul>
<li>The conversion overhead of converting a dask array to a numpy array
is not as bad as I feared.</li>
<li>There are two aspects to a benchmark: performance and usability.</li>
<li>Dask should be watched not only for out-of-core computations, but
also for parallelizing simple, blocking numpy expressions.</li>
</ul>
Mon, 18 Apr 2016 11:35:00 +0200
http://www.holger-peters.de/python-data-science-going-functional-or-benchmarking-dask.html
http://www.holger-peters.de/python-data-science-going-functional-or-benchmarking-dask.htmlPythonData-ScienceSpeakingpythonAn Interesting Fact About The Python Garbage Collector<p>While Python prides itself on being a simple, straightforward
programming language, and being explicit is pointed out as a core value,
one can always discover interpreter specifics and
implementation details that one did not expect to find when working at
the surface. These days I learned more about a peculiar property of the
Python garbage collector that I would like to share.</p>
<p>Let’s start by introducing the problem quickly. Python manages its
objects primarily by reference counting: each object stores how
many times it is referenced from other places, and this reference count
is updated over the runtime of the program. If the reference count drops
to zero, the object cannot be reached from Python code anymore, and
its memory can be freed or reused by the interpreter.</p>
<p>An optional method <code class="highlighter-rouge">__del__</code> is called by the Python interpreter when
the object is about to be destroyed. This allows us to do some cleanup,
for example closing database connections. <code class="highlighter-rouge">__del__</code>
rarely has to be defined; for our example we will use it to illustrate
when the disposal of an object happens:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">>>></span> <span class="k">class</span> <span class="nc">A</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="o">...</span> <span class="k">def</span> <span class="nf">__del__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="o">...</span> <span class="k">print</span><span class="p">(</span><span class="s">"no reference to {}"</span><span class="o">.</span><span class="nb">format</span><span class="p">(</span><span class="bp">self</span><span class="p">))</span>
<span class="o">...</span>
<span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="n">A</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="n">a</span>
<span class="o">>>></span> <span class="n">c</span> <span class="o">=</span> <span class="n">a</span></code></pre></figure>
<p>The situation in memory resembles this schematic:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────┐
│ a │────────────┐
└────┘ ▼
┌────┐ ┌───────────────┐
│ b │───▶│A() refcount=3 │
└────┘ └───────────────┘
┌────┐ ▲
│ c │────────────┘
└────┘
</code></pre></div></div>
<p>Now we let the variables <code class="highlighter-rouge">a</code>, <code class="highlighter-rouge">b</code>, and <code class="highlighter-rouge">c</code> point to <code class="highlighter-rouge">None</code> instead of
the instance <code class="highlighter-rouge">A()</code>:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="bp">None</span>
<span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="bp">None</span>
<span class="o">>>></span> <span class="n">c</span> <span class="o">=</span> <span class="bp">None</span>
<span class="n">No</span> <span class="n">reference</span> <span class="n">to</span> <span class="o"><</span><span class="n">__main__</span><span class="o">.</span><span class="n">A</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x102ace9d0</span><span class="o">></span></code></pre></figure>
<p>Changing the situation to:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────┐ ┌────┐
│ a │─┬─▶│None│
└────┘ │ └────┘
┌────┐ │ ┌───────────────┐
│ b │─┤ │A() refcount=0 │
└────┘ │ └───────────────┘
┌────┐ │
│ c │─┘
└────┘
</code></pre></div></div>
<p>After we have overwritten the last reference (<code class="highlighter-rouge">c</code>) to our instance of
<code class="highlighter-rouge">A</code>, its reference count drops to zero and <code class="highlighter-rouge">__del__</code> is called just
before the object is destroyed.</p>
<h1 id="cyclic-references">Cyclic References</h1>
<p>However, there are situations where the reference count cannot simply
drop to zero: the case of cyclic references:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ┌────┐
│ a │
└────┘
│
▼
┌───────────────┐
┌──│A() refcount=2 │◀─┐
│ └───────────────┘ │
│ ┌───────────────┐ │
└─▶│B() refcount=1 │──┘
└───────────────┘
</code></pre></div></div>
<p>Setting <code class="highlighter-rouge">a</code> to <code class="highlighter-rouge">None</code>, we will still have refcounts of <code class="highlighter-rouge">>= 1</code>. For these
cases, Python employs a garbage collector, some code that traverses
memory and applies more complicated heuristics to discover unused
objects. We can use the <code class="highlighter-rouge">gc</code> module to manually trigger a garbage
collection run.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="n">A</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="n">A</span><span class="p">()</span>
<span class="o">>>></span> <span class="n">a</span><span class="o">.</span><span class="n">other</span> <span class="o">=</span> <span class="n">b</span>
<span class="o">>>></span> <span class="n">b</span><span class="o">.</span><span class="n">other</span> <span class="o">=</span> <span class="n">a</span>
<span class="o">>>></span> <span class="n">a</span> <span class="o">=</span> <span class="bp">None</span>
<span class="o">>>></span> <span class="n">b</span> <span class="o">=</span> <span class="bp">None</span>
<span class="o">>>></span> <span class="kn">import</span> <span class="nn">gc</span>
<span class="o">>>></span> <span class="n">gc</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
<span class="mi">11</span></code></pre></figure>
<p>However, since <code class="highlighter-rouge">A</code> implements <code class="highlighter-rouge">__del__</code>, Python refuses to clean them
up, arguing that it cannot tell which <code class="highlighter-rouge">__del__</code> method to call first.
Instead of doing the wrong thing (invoking them in the wrong order),
Python decides to do nothing at all – avoiding undefined behaviour, but
introducing a potential memory leak.</p>
<p>In fact, Python will not clean any object in the cycle, which can
leave a much larger group of objects polluting memory (see
<a href="https://docs.python.org/2/library/gc.html#gc.garbage">https://docs.python.org/2/library/gc.html#gc.garbage</a>). We can inspect
the list of objects that could not be garbage collected:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">>>></span> <span class="n">gc</span><span class="o">.</span><span class="n">garbage</span>
<span class="p">[</span><span class="o"><</span><span class="n">__main__</span><span class="o">.</span><span class="n">A</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x102ace9d0</span><span class="o">></span><span class="p">,</span> <span class="o"><</span><span class="n">__main__</span><span class="o">.</span><span class="n">A</span> <span class="nb">object</span> <span class="n">at</span> <span class="mh">0x102aceb10</span><span class="o">></span><span class="p">]</span></code></pre></figure>
<p>Finally, if you remove the <code class="highlighter-rouge">__del__</code> method from the class, you would
not find these objects in <code class="highlighter-rouge">gc.garbage</code>, as Python would just dispose of
them.</p>
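<p>On Python 2, a common way to sidestep the problem is to keep one direction of the cycle weak, so reference counts can reach zero without the collector’s help. A sketch using the standard <code class="highlighter-rouge">weakref</code> module (the <code class="highlighter-rouge">Node</code> class is made up; the immediate destruction relies on CPython’s reference counting):</p>

```python
import weakref

class Node(object):
    def __init__(self):
        self.other = None

a = Node()
b = Node()
a.other = b                  # strong reference in one direction only
b.other = weakref.ref(a)     # weak back-reference: no cycle of strong refs

assert b.other() is a        # calling the weakref yields the object while alive
back = b.other
a = None                     # the first Node's refcount drops to zero (CPython)
assert back() is None        # ... so the weak reference is dead immediately
```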
<h1 id="python-3">Python 3</h1>
<p>As it turns out, from Python 3.4 on, the issue I wrote about does not
exist anymore. <code class="highlighter-rouge">__del__</code> s do not impede garbage collection any more, so
<code class="highlighter-rouge">gc.garbage</code> will only be filled for other reasons. For details, you can
read <a href="https://www.python.org/dev/peps/pep-0442/">PEP 442</a> and the
<a href="https://docs.python.org/3.5/library/gc.html#gc.garbage">Python docs</a>.</p>
<p>Since the adoption of Python 3.4 is far from universal, most Python
code bases still have to be careful about when to use <code class="highlighter-rouge">__del__</code>.</p>
Tue, 16 Feb 2016 20:34:00 +0100
http://www.holger-peters.de/an-interesting-fact-about-the-python-garbage-collector.html
http://www.holger-peters.de/an-interesting-fact-about-the-python-garbage-collector.htmlPythonInterpreterpythonExceptions - The Dark Side of the Force<p>A recent blog post <a href="http://stupidpythonideas.blogspot.de/2015/05/if-you-dont-like-exceptions-you-dont.html">“If you don’t like exceptions, you don’t like
Python”</a>
has made the rounds lately, and compelled me to write a partial rebuttal.
It is not that the blog post is completely wrong, but it is not the be-all
and end-all of this topic. And, if I may add, it is rather opinionated.</p>
<p>The original article states that exceptions are central to Python and
that the common advice “exceptions should only be for errors, not for
normal flow control” is wrong. It goes on to explain that exceptions are
used in core implementations such as the iterator protocol and attribute
access, and are thus a central feature of the language. Longer parts of
the blog post are concerned with debunking misconceptions commonly held
by Java and C++ programmers.</p>
<p>Roughly speaking, the article portrays exceptions very favourably;
amid all that praise, criticism of and questions about their use are
eclipsed.</p>
<h1 id="use-exceptions-for-error-handling">Use exceptions for error handling</h1>
<p>This is a point where I just whole-heartedly agree with barnert. Errors
should be propagated using exceptions, so</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">min</span><span class="p">(</span><span class="o">*</span><span class="n">lst</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">lst</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">"min(..) requires a list with at least one element"</span><span class="p">)</span>
<span class="n">minimum</span> <span class="o">=</span> <span class="n">lst</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">lst</span><span class="p">:</span>
<span class="k">if</span> <span class="n">minimum</span> <span class="o">></span> <span class="n">item</span><span class="p">:</span>
<span class="n">minimum</span> <span class="o">=</span> <span class="n">item</span>
<span class="k">return</span> <span class="n">minimum</span></code></pre></figure>
<p>is a perfectly fine usage of exceptions, and callers should check for
this exception if their code does not guarantee that at least one
element is passed.</p>
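A caller that cannot guarantee a non-empty argument would then handle the error explicitly. A sketch (the function above is restated here, renamed `min_` to keep the example self-contained and avoid shadowing the builtin):

```python
def min_(*lst):
    # same behaviour as the min(..) above
    if not lst:
        raise ValueError("min_(..) requires at least one element")
    minimum = lst[0]
    for item in lst:
        if minimum > item:
            minimum = item
    return minimum

def smallest_or_default(values, default=0):
    # caller-side handling: the empty case is an expected error condition
    try:
        return min_(*values)
    except ValueError:
        return default

assert smallest_or_default([3, 1, 2]) == 1
assert smallest_or_default([]) == 0
```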
<h1 id="exceptions-are-dissociated-from-values-and-variables">Exceptions are dissociated from values and variables</h1>
<p>Sometimes I stumble over code that uses a pattern like this:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">result</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">dosomething</span><span class="p">(</span><span class="n">bar</span><span class="p">))</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">bar</span><span class="p">[</span><span class="n">key</span><span class="p">][</span><span class="n">anotherkey</span><span class="p">]</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">dosomething</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span>
<span class="n">result</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="n">evenanotherkey</span><span class="p">])</span>
<span class="k">except</span> <span class="nb">KeyError</span><span class="p">:</span>
<span class="o">....</span>
<span class="k">finally</span><span class="p">:</span>
<span class="k">return</span> <span class="n">result</span></code></pre></figure>
<p>This snippet has many exception-related issues and shows how not to use
exceptions. First of all, it is unclear which key access in the try
block raises the exception. It could be <code class="highlighter-rouge">bar[key]</code>,
<code class="highlighter-rouge">_[anotherkey]</code>, or <code class="highlighter-rouge">res[evenanotherkey]</code>, or it could
happen inside <code class="highlighter-rouge">dosomething(foo)</code>. The exception mechanism dissociates error
handling from the values and variables. My question is: can you tell
whether catching KeyErrors from <code class="highlighter-rouge">dosomething()</code> is intended?</p>
<p>So when using exceptions, one has to be really careful about which
exceptions are caught and which aren’t. With defensive programming (i.e.
<code class="highlighter-rouge">key in dict</code>-style checks), it is unambiguous and hardly as “intrusive” to
the code as writing out individual <code class="highlighter-rouge">try-except</code> blocks for each indexing
operation.</p>
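As a sketch of what such a value-local, defensive version of the snippet above could look like (all names are hypothetical, as in the original):

```python
def collect(bar, key, anotherkey, evenanotherkey, dosomething):
    result = [dosomething(bar)]
    # every lookup is checked right next to the value it concerns, so
    # there is no ambiguity about which access would have failed
    if key in bar and anotherkey in bar[key]:
        res = dosomething(bar[key][anotherkey])
        if evenanotherkey in res:
            result.append(res[evenanotherkey])
    return result

data = {"a": {"b": {"c": 42}}}
assert collect(data, "a", "b", "c", lambda x: x) == [data, 42]
assert collect({}, "a", "b", "c", lambda x: x) == [{}]
```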
<h2 id="exceptional-dangers">Exceptional Dangers</h2>
<p>So there are basically two risks when using exceptions:</p>
<ol>
<li>An exception that should be caught is not caught</li>
<li>An exception is caught wrongfully</li>
</ol>
<p>The first risk is definitely real, but one that I don’t worry too much
about. The second is a risk I definitely fear. How many functions in
your code can throw <code class="highlighter-rouge">KeyError</code>, <code class="highlighter-rouge">ValueError</code>, <code class="highlighter-rouge">IndexError</code>,
<code class="highlighter-rouge">TypeError</code>, or <code class="highlighter-rouge">RuntimeError</code>?</p>
<h1 id="exceptions-as-pythonic-gotos">Exceptions as Pythonic gotos</h1>
<p>Exceptions can emulate goto statements. Usually they are jumps to
upper levels on the stack, but they can also serve as jumps within a
function. In C code, gotos are a primary means of function-local control
flow and error handling (and for error handling, they are rather
uncontroversial):</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span>
<span class="nf">max_in_two_dim</span><span class="p">(</span><span class="kt">double</span> <span class="o">*</span> <span class="n">array</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">N</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">M</span><span class="p">,</span> <span class="kt">double</span> <span class="o">*</span><span class="n">out</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">N</span> <span class="o">*</span> <span class="n">M</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="k">goto</span> <span class="n">empty_array_lbl</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">max</span> <span class="o">=</span> <span class="n">array</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">N</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">M</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">double</span> <span class="n">val</span> <span class="o">=</span> <span class="n">array</span><span class="p">[</span><span class="n">j</span> <span class="o">*</span> <span class="n">N</span> <span class="o">+</span><span class="n">k</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">val</span> <span class="o">!=</span> <span class="n">val</span><span class="p">)</span> <span class="c1">// NaN case</span>
<span class="k">goto</span> <span class="n">err_lbl</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">max</span> <span class="o"><</span> <span class="n">val</span><span class="p">)</span>
<span class="n">max</span> <span class="o">=</span> <span class="n">val</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="nl">nan_lbl:</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"encountered a not-a-number value when unexpected"</span><span class="p">);</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="nl">empty_array_lbl:</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"no data in array with given dims"</span><span class="p">);</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">2</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>You can model this usage with exceptions in Python. I have seen such
code in the wild.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">whatever</span><span class="p">(</span><span class="n">arg1</span><span class="p">,</span> <span class="n">arg2</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">M</span><span class="p">):</span>
<span class="c1"># ..
</span> <span class="k">if</span> <span class="o">...</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">RuntimeError</span><span class="p">(</span><span class="s">"jump"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">out</span>
<span class="k">except</span> <span class="nb">RuntimeError</span><span class="p">:</span>
<span class="c1"># cleanup
</span> <span class="c1"># ..</span></code></pre></figure>
<p>In most cases there are preferable ways to avoid this pattern.
Python’s for loops have an optional <code class="highlighter-rouge">else</code> branch that helps avoid
such jumps. Nevertheless, this pattern can go awry when a <code class="highlighter-rouge">RuntimeError</code>
happens at some other place in the loop.</p>
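For illustration, the `else` branch runs only when a loop finishes without `break`, which covers many goto-like escapes from nested loops without raising anything (a minimal sketch, function name made up):

```python
def first_negative(rows):
    for row in rows:
        for value in row:
            if value < 0:
                break
        else:
            continue   # inner loop finished normally: check the next row
        return value   # inner loop broke out: we found a negative value
    return None

assert first_negative([[1, 2], [3, -4]]) == -4
assert first_negative([[1, 2]]) is None
```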
<h1 id="meta-ingroup-outgroup-thinking">Meta: Ingroup, Outgroup Thinking</h1>
<p>What I dislike most about barnert’s article is what one can already
read in the title: “If …, you don’t like Python”. It is in
line with a lot of talk I hear about code/software/solutions being
“Pythonic”. What this seems to imply is that one must take sides: either
you are in line with an orthodox Python community, or you are an
outsider, someone who is not “Pythonic” enough. None of this is
helpful for improving code.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Exceptions are a central and powerful tool in Python. But use them with
care and caution. Do not treat them as a magic wand, and do not use them
to show your love for Python. Use them when the individual situation
calls for them.</p>
Wed, 13 Jan 2016 23:16:00 +0100
http://www.holger-peters.de/exceptions-the-dark-side-of-the-force.html
http://www.holger-peters.de/exceptions-the-dark-side-of-the-force.htmlPythonClean CodeBest Practicepython