<p>data.onebiglibrary.net: http://data.onebiglibrary.net/</p> <h2>animating Anscombe's Quartet regression diagnostics</h2> <p>dchud, 2014-11-12T00:00:00-05:00 (tag:data.onebiglibrary.net,2014-11-12:2014/11/12/animated-anscombe-quartet-regression-diagnostics/)</p> <p>Using the sketch developed in animating regression parts <a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/">1</a> and <a href="http://data.onebiglibrary.net/2014/10/18/animating-regression-part-2/">2</a>, let's take a look at <a href="http://en.wikipedia.org/wiki/Anscombe's_quartet">Anscombe's Quartet</a>. What makes these datasets useful, as Wikipedia points out, is their near-identical summary statistics: all four x/y pairs have the same x mean and variance, nearly the same y mean and variance, the same correlation (about 0.816), and essentially the same simple linear regression line (about y = 3 + 0.5x). That makes the quartet a clear example of what to watch out for when developing simple linear regressions, and the problem each dataset highlights becomes clear in the different diagnostic plots.</p> <p>The technical challenge here is to use the sketch developed in part 2 four times. That code is a mess; it reflects my learning process, but it's not anything I'd want to reuse. The simplest approach is to turn the viz element into a <a href="http://bost.ocks.org/mike/chart/">reusable chart</a> (also on my to-read list: <a href="http://bocoup.com/weblog/reusability-with-d3/">Exploring Reusability with D3.js</a>).</p> <p>I'm under several class deadlines just now, so I won't go as far as I could in making this cleanly configurable and modifiable, but I certainly don't want to write the same code out four times, so I'll look for a middle ground that achieves some code cleanup and a modicum of reuse.</p> <p>First off, we need to pull the source data into this page. The Anscombe datasets and their summary statistics are readily available, but their linear model residuals and Cook's distance values require a little calculation.
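</p>
<p>To make that near-equivalence concrete, here is a small JavaScript sketch, separate from the chart code on this page, that computes the headline statistics for all four sets. The arrays are copied from R's built-in <code>anscombe</code> data frame; <code>summarize()</code> is a hypothetical helper written only for this illustration, and it uses sample variances with an n - 1 denominator, as R's <code>var()</code> does.</p>

```javascript
// Anscombe's four datasets, copied from R's built-in `anscombe` data frame.
const x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const quartet = {
  a1: { x: x123, y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
  a2: { x: x123, y: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74] },
  a3: { x: x123, y: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73] },
  a4: {
    x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
  },
};

const sum = (v) => v.reduce((s, a) => s + a, 0);
const mean = (v) => sum(v) / v.length;

// Means, sample variances (n - 1 denominator), Pearson correlation,
// and the ordinary least-squares line y = intercept + slope * x.
function summarize({ x, y }) {
  const n = x.length;
  const mx = mean(x);
  const my = mean(y);
  const sxx = sum(x.map((xi) => (xi - mx) ** 2));
  const syy = sum(y.map((yi) => (yi - my) ** 2));
  const sxy = sum(x.map((xi, i) => (xi - mx) * (y[i] - my)));
  const slope = sxy / sxx;
  return {
    xmean: mx,
    xvar: sxx / (n - 1),
    ymean: my,
    yvar: syy / (n - 1),
    r: sxy / Math.sqrt(sxx * syy),
    slope,
    intercept: my - slope * mx,
  };
}

for (const [name, set] of Object.entries(quartet)) {
  const s = summarize(set);
  const line = Object.entries(s).map(([k, v]) => k + "=" + v.toFixed(3)).join(" ");
  console.log(name, line);
}
```

<p>Each set should report an x mean of 9 and variance of 11, a y mean of about 7.50, a y variance between 4.12 and 4.13, a correlation of about 0.816, and a fitted line of roughly y = 3.00 + 0.500x. The summary numbers can't tell the sets apart, which is why the diagnostic plots matter.</p>
<p>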
There are JavaScript stats libraries that can handle the regression, but they don't seem to ship with a Cook's distance implementation. (To-do: pull request.) Fortunately R ships with the <code>anscombe</code> data pre-loaded, and it's easy to put all this together and write it out as JSON for use here:</p> <div class="highlight"><pre>library(rjson)

# fit y ~ x for one Anscombe pair and attach the diagnostics the chart uses
diagnose &lt;- function(x, y) {
  d &lt;- data.frame(x = x, y = y)
  fit &lt;- lm(y ~ x, d)
  d$cooks &lt;- cooks.distance(fit)
  d$error &lt;- fit$residuals
  d$quantile &lt;- scale(d$error)  # residuals rescaled to mean 0, sd 1
  d
}

# one data frame per set: a1 pairs x1 with y1, and so on
aout &lt;- list(
  a1 = diagnose(anscombe$x1, anscombe$y1),
  a2 = diagnose(anscombe$x2, anscombe$y2),
  a3 = diagnose(anscombe$x3, anscombe$y3),
  a4 = diagnose(anscombe$x4, anscombe$y4)
)
toJSON(aout)
</pre></div> <p>The output can be written to a file for later use, as it is here.</p> <p>The plan is to make at least the following changes to the chart:</p> <ul> <li>define function <code>regcycle()</code> as the reusable chart and invoke it four times</li> <li>instead of creating separate scales and axes up front, redefine them inside each plot/view mode function so each mode is self-contained</li> <li>move the axis updates to the top of each plot function</li> <li>load the source Anscombe data and initialize the charts using a <code>d3.json()</code> callback</li> <li>bind the selections and the data to each of the four charts</li> </ul> <p>Let's see how it goes.</p> <div class="container-fluid"> <div class="row"> <div id="a1" class="col-xs-6"></div> <div id="a2" class="col-xs-6"></div> </div> <div class="row"> <div id="a3" class="col-xs-6"></div> <div id="a4" class="col-xs-6"></div> </div> </div> <style>
.axis path, .axis line { fill: none; stroke: black; shape-rendering: crispEdges; }
.axis text { font-family: sans-serif; font-size: 11px; }
.label { font-family: sans-serif; font-variant: small-caps; font-weight: normal; font-size: x-large; }
</style> <script>
// Each set in Anscombe's Quartet has (nearly) the same summary numbers.
// Better to use a JavaScript stats lib to calculate all this
// inside the chart; oh well, a shortcut for now
var xmean = 9;
var ymean = 7.5;
var xsd = 11;      // note: 11 and 4.1245 are sample variances, not standard deviations
var ysd = 4.1245;  // fudging slightly three digits down
var slope = 0.5;
var intercept = 3.0;
// theoretical normal quantiles for the Q-Q plot (n = 11, plotting positions i / 12)
var qnorm = [-1.383,
  -0.967, -0.674, -0.431, -0.210, 0.0, 0.210, 0.431, 0.674, 0.967, 1.383];

// colorbrewer "spectral" 11
var colors = ["#9e0142", "#d53e4f", "#f46d43", "#fdae61", "#fee08b", "#ffffbf",
              "#e6f598", "#abdda4", "#66c2a5", "#3288bd", "#5e4fa2"];
// one domain entry per observation index (0-10), so the colors follow the
// spectral order; with a two-value domain like [0, 10] the ordinal scale
// would append 1-9 after 10 and shuffle the palette
var color_scale = d3.scale.ordinal()
    .domain(d3.range(colors.length))
    .range(colors);

// fitted value of the shared regression line at a given x
function expected(xval) {
  return (slope * xval) + intercept;
};

function regcycle() {
  var width = 400;
  var height = 400;
  var padding = 30;
  var buffer = 1.1;
  var duration = 2000;
  var delay = 2000;

  function my(sel) {
    var data = [];
    // the datum bound with .datum(): one object of column arrays
    var seldata = sel.datum();
    // reshape the data into one object per observation
    for (var i = 0; i < seldata.x.length; i++) {
      var obs = {
        x: seldata.x[i],
        y: seldata.y[i],
        residual: seldata.error[i],
        cooks: seldata.cooks[i],
        quantile: seldata.quantile[i],
      };
      data.push(obs);
    };
    // generate a unique id for the named anchors
    var uid = Math.round(Math.random() * 1024);
    var min_x = d3.min(data, function(d) { return d.x; }) - 1;
    var max_x = d3.max(data, function(d) { return d.x; }) + 1;
    var min_y = d3.min(data, function(d) { return d.y; }) - 1;
    var max_y = d3.max(data, function(d) { return d.y; }) + 1;
    var max_residual = d3.max(data, function(d) { return Math.abs(d.residual); });
    var max_cooks = d3.max(data, function(d) { return d.cooks; });
    var max_quantile = d3.max(data, function(d) { return Math.abs(d.quantile); });
    var max_qnorm = d3.max(qnorm);
    // if the largest Cook's value is near 1 (0.5 to 1.1), extend the axis to
    // 1.1 so the threshold line at 1 shows; if all values are low, keep the
    // narrower domain so we can still discern individual values
    if (max_cooks >= 0.5) {
      if (max_cooks <= 1.1) { max_cooks = 1.1; };
    };
    // check for NaN values, set a high value if present
    if (seldata.cooks.some(isNaN)) { max_cooks = 2; };
    var svg = sel.append("svg")
        .attr("width", width)
        .attr("height", height);
    // how much of the setup should be outside of the specific
// x and y scales, axes, for the basic fit plot var x = d3.scale.linear() .domain([min_x, max_x]) .range([padding, width - padding]); var x_axis = d3.svg.axis() .orient("bottom") .scale(x); var y = d3.scale.linear() .domain([min_y, max_y]) .range([height - padding, padding]); var y_axis = d3.svg.axis() .orient("left") .scale(y); // sel contains general data/info like the regression line svg.append("line") .attr("id", "line" + uid) .attr("x1", x(min_x)) .attr("y1", y(expected(min_x))) .attr("x2", x(max_x)) .attr("y2", y(expected(max_x))) .attr("stroke-width", 2) .attr("stroke", "steelblue"); // g binds to the data; this feels like an unneeded two-step // when sel is already bound too, perhaps a mistake? var g = svg.selectAll("g") .data(data) .enter().append("g") .attr("class", "object"); // styling elements should be in css, not here g.each(function(d, i) { var o = d3.select(this); o.attr("class", "observation"); o.append("line") .attr("x1", x(d.x)) .attr("y1", y(d.y)) .attr("x2", x(d.x)) .attr("y2", y(expected(d.y))) .attr("class", "residual-bar") .attr("stroke-width", 0) .attr("stroke", "gray"); o.append("circle") .attr("r", 5) // hard-coded! .attr("cx", x(d.x)) .attr("cy", y(d.y)) .attr("class", "data-point") .attr("stroke", "black") .attr("fill", color_scale(i)); }); // establish initial axes svg.append("g") .attr("id", "x_axis" + uid) .attr("class", "axis") .attr("transform", "translate(0, " + (height - padding) + ")") .call(x_axis); svg.append("g") .attr("id", "y_axis" + uid) .attr("class", "axis") .attr("transform", "translate(" + padding + ", 0)") .call(y_axis); // initial label svg.append("text") .attr("id", "label" + uid) .attr("class", "label") .attr("x", 40) // hard-coded! .attr("y", 40) // hard-coded! .text("model fit"); setTimeout(residual, delay); // should these be inside this function or one level up? // does it matter? 
    function fit() {
      // reset scales/axes for fit plot
      x = d3.scale.linear()
          .domain([min_x, max_x])
          .range([padding, width - padding]);
      x_axis = d3.svg.axis()
          .orient("bottom")
          .scale(x);
      y = d3.scale.linear()
          .domain([min_y, max_y])
          .range([height - padding, padding]);
      y_axis = d3.svg.axis()
          .orient("left")
          .scale(y);
      svg.select("#x_axis" + uid).transition()
          .duration(duration)
          .call(x_axis);
      svg.select("#y_axis" + uid).transition()
          .duration(duration)
          .call(y_axis);
      label = svg.select("#label" + uid).transition()
          .duration(duration)
          .text("model fit");
      line = svg.select("#line" + uid).transition()
          .duration(duration)
          .attr("x1", x(min_x))
          .attr("y1", y(expected(min_x)))
          .attr("x2", x(max_x))
          .attr("y2", y(expected(max_x)));
      var c = svg.selectAll(".observation");
      c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("x1", x(d.x))
            .attr("y1", y(d.y))
            .attr("x2", x(d.x))
            .attr("y2", y(expected(d.x)))  // fitted value at this x
            .attr("stroke-width", 0);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cx", x(d.x))
            .attr("cy", y(d.y));
      });
      setTimeout(residual, delay + duration);
    };

    function residual() {
      // reset y scale/axis
      y = d3.scale.linear()
          .domain([-max_residual, max_residual])
          .range([height - padding, padding]);
      y_axis = d3.svg.axis()
          .orient("left")
          .scale(y);
      svg.select("#y_axis" + uid).transition()
          .duration(duration)
          .call(y_axis);
      label = svg.select("#label" + uid).transition()
          .duration(duration)
          .text("residuals");
      line = svg.select("#line" + uid).transition()
          .duration(duration)
          .attr("y1", y(0))
          .attr("y2", y(0));
      var c = svg.selectAll(".observation");
      c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cy", y(d.residual));
        o.select(".residual-bar").transition()
            .delay(duration)
            .attr("x1", x(d.x))
            .attr("y1", y(d.residual))
            .attr("x2", x(d.x))
            .attr("y2", y(0))
            .attr("stroke-width", 3);  // style hard-coded
      });
      setTimeout(cooks, delay + duration);
    };

    function cooks() {
      // reset scale / axis for cooks, x in order, not by value
      x = d3.scale.linear()
          .domain([0, data.length])
          .range([padding, width - padding]);
      x_axis = d3.svg.axis()
          .orient("bottom")
          .scale(x);
      svg.select("#x_axis" + uid).transition()
          .duration(duration)
          .call(x_axis);
      y = d3.scale.linear()
          .domain([0, max_cooks])
          .range([height - padding, padding]);
      y_axis = d3.svg.axis()
          .orient("left")
          .scale(y);
      svg.select("#y_axis" + uid).transition()
          .duration(duration)
          .call(y_axis);
      label = svg.select("#label" + uid).transition()
          .duration(duration)
          .text("cook's distance");
      line = svg.select("#line" + uid).transition()
          .duration(duration)
          .attr("x1", x(0))
          .attr("y1", y(1))
          .attr("x2", x(data.length))
          .attr("y2", y(1));
      var c = svg.selectAll(".observation");
      c.each(function(d, i) {
        var o = d3.select(this);
        // a NaN distance is drawn far above the top of the plot
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cx", x(i + 1))
            .attr("cy", y(isNaN(d.cooks) ? 50 : d.cooks));
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("x1", x(i + 1))
            .attr("y1", y(isNaN(d.cooks) ? 50 : d.cooks))
            .attr("x2", x(i + 1))
            .attr("y2", y(0));
      });
      setTimeout(qq, delay + duration);
    };

    function qq() {
      // reset x scale/axis to normal quantiles
      x = d3.scale.linear()
          .domain([-max_qnorm * buffer, max_qnorm * buffer])
          .range([padding, width - padding]);
      x_axis = d3.svg.axis()
          .orient("bottom")
          .scale(x);
      svg.select("#x_axis" + uid).transition()
          .duration(duration)
          .call(x_axis);
      // reset y scale/axis to observed quantiles
      y = d3.scale.linear()
          .domain([-max_quantile * buffer, max_quantile * buffer])
          .range([height - padding, padding]);
      y_axis = d3.svg.axis()
          .orient("left")
          .scale(y);
      svg.select("#y_axis" + uid).transition()
          .duration(duration)
          .call(y_axis);
      label = svg.select("#label" + uid).transition()
          .duration(duration)
          .text("q-q normal vs. observed");
      line = svg.select("#line" + uid).transition()
          .duration(duration)
          .attr("y1", y(-max_quantile))
          .attr("y2", y(max_quantile));
      // rank each observation's quantile so every point (and its color)
      // moves to its own position on the Q-Q plot
      var order = d3.range(data.length).sort(function(a, b) {
        return data[a].quantile - data[b].quantile;
      });
      var rank = [];
      order.forEach(function(obsIndex, r) { rank[obsIndex] = r; });
      var c = svg.selectAll(".observation");
      c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cx", x(qnorm[rank[i]]))
            .attr("cy", y(d.quantile));
        o.select(".residual-bar").transition()
            .attr("stroke-width", 0);
      });
      setTimeout(fit, delay + duration);
    };
  };

  // add accessors here some other time :)
  return my;
};

// init the four charts
var a1_cycle = regcycle();
var a2_cycle = regcycle();
var a3_cycle = regcycle();
var a4_cycle = regcycle();

// grab data, bind to charts, and render
d3.json("/data/20141112-animating-anscombe.json", function(data) {
  d3.select("#a1")
      .datum(data.a1)
      .call(a1_cycle);
  d3.select("#a2")
      .datum(data.a2)
      .call(a2_cycle);
  d3.select("#a3")
      .datum(data.a3)
      .call(a3_cycle);
  d3.select("#a4")
      .datum(data.a4)
      .call(a4_cycle);
});
</script> <p>This seems about right. Some additional changes proved necessary:</p> <ul> <li> <p>color! Highlighting each data point with its own color from the <a href="http://colorbrewer2.org/">ColorBrewer</a> "Spectral" scheme should help a viewer follow any specific observation through the four plots.</p> </li> <li> <p>the JSON output from R described above is column-oriented (one array per variable), which was more awkward to work with than a row- or observation-oriented shape, so there's a quick reshaping step.
This results in simple references to the data values.</p> <div class="highlight"><pre>// reshape the data into observations
for (var i = 0; i &lt; seldata.x.length; i++) {
  var obs = {
    x: seldata.x[i],
    y: seldata.y[i],
    residual: seldata.error[i],
    cooks: seldata.cooks[i],
    quantile: seldata.quantile[i],
  };
  data.push(obs);
};
</pre></div> </li> <li> <p>the Cook's distance calculation for dataset four returns NaN for the far-right point. That point is the only observation not at x = 8, so it has leverage 1 (the fitted line has to pass right through it) and its distance is undefined. I added a check that shoots that data point straight up, well off the top of the plot: it is drawn at 50 on a Cook's axis that tops out at 2. This is perhaps not statistically defensible, but it serves the animation well, especially in the transition to the Q-Q plot, and makes the story clearer to my eye. In the bottom-right pane, watch the green dot snap back into place at the very end of the transition to Q-Q to see the effect. The NaN check is simple but effective:</p> <div class="highlight"><pre>var c = svg.selectAll(&quot;.observation&quot;);
c.each(function(d, i) {
  var o = d3.select(this);
  o.select(&quot;.data-point&quot;).transition()
      .duration(duration)
      .attr(&quot;cx&quot;, x(i + 1))
      .attr(&quot;cy&quot;, y(isNaN(d.cooks) ? 50 : d.cooks));
  o.select(&quot;.residual-bar&quot;).transition()
      .duration(duration)
      .attr(&quot;x1&quot;, x(i + 1))
      .attr(&quot;y1&quot;, y(isNaN(d.cooks) ? 50 : d.cooks))
      .attr(&quot;x2&quot;, x(i + 1))
      .attr(&quot;y2&quot;, y(0));
});
</pre></div> </li> <li> <p>added a 1.1 buffer factor around the domains of the Q-Q plot's axes so the outermost points are drawn inside the axis lines.</p> </li> <li> <p>bringing the residual and distance bars into the two relevant plots required more attention. Because the data points move around, a bar can be left in a position from an earlier plot that no longer makes sense two plots later, so it whooshes in from an odd angle when it reappears. That's awkward, and it detracts from the intended narrative. This temporary mistake shows the effect:</p> </li> </ul> <p><img alt="this temporary mistake" src="http://data.onebiglibrary.net/images/20141112-anscombe-error.png" /> </p> <h3>Left as an exercise for the writer</h3> <p>There are several unresolved issues I would like to revisit:</p> <ul> <li> <p>synchronizing the transitions among four different chart instances doesn't seem to have a single obvious solution. Each chart schedules its own chain of <code>setTimeout()</code> calls, so small timing differences accumulate independently, and you can see the charts fall out of sync if you watch the cycle long enough. Avoiding that might require some sort of clock check or a simple communication pattern.
Even so, the eye can only meaningfully follow one plot at a time, so it doesn't ruin the effect; and if you believe Google Analytics, few readers spend more than one minute per page on this site, so it's not a serious problem here, for now.</p> </li> <li> <p>I can see a case for pulling the axis resetting back out of the individual plot modes; it's a little cumbersome to reassign the scales and axes each time. On the other hand, this way all the logic for a plot is self-contained, so it would be cleaner to add more plots to the reel without bouncing around to keep track of a dozen different scale and axis variables.</p> </li> <li> <p>it would be nice to jitter or spread out the residuals so the bars don't overlap, as they do in the fourth dataset, where ten points share x = 8.</p> </li> <li> <p>without configuration accessors, this isn't particularly reusable by anybody else, but that's the other side of the line I drew in going for a middle ground. Other homework awaits!</p> </li> <li> <p>several details are hard-coded, like the color scale and the size and styling of different elements.</p> </li> <li> <p>if <a href="http://jstat.github.io/">jStat</a> or <a href="https://github.com/tmcw/simple-statistics">simple-statistics</a> had a Cook's distance function, we could take arbitrary datasets and compute and render their diagnostics inline, or at least inside the reusable chart.</p> </li> </ul> <p>Always good to have something to work on next.</p> <h2>animating regression, part 2</h2> <p>dchud, 2014-10-18T00:00:00-04:00 (tag:data.onebiglibrary.net,2014-10-18:2014/10/18/animating-regression-part-2/)</p> <p>Returning to the question of animating a regression model and its residuals. <a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/">Part 1</a> stepped forward through basic animation with d3 toward a simplistic regression model and a first view of residuals. Let's extend that with two new views: a Q-Q plot and a plot of potentially influential observations using Cook's Distance.
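</p>
<p>A Q-Q plot pairs each sorted residual with a theoretical normal quantile. The Anscombe Quartet post, which builds on this one, hard-codes eleven such quantiles in its <code>qnorm</code> array, and they match, to three decimals, the standard normal quantiles at the plotting positions i/(n + 1) for n = 11. Here is a minimal JavaScript sketch for generating them under that assumption; R's own <code>qqnorm()</code> uses <code>ppoints()</code> instead, which spreads the extreme values further out. The function names are illustrative, not from any library, and the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf rather than an exact implementation.</p>

```javascript
// Standard normal CDF using the Abramowitz and Stegun 7.1.26 approximation
// to erf (absolute error below about 1.5e-7), which is plenty for plotting.
function normalCdf(z) {
  const t = 1 / (1 + 0.3275911 * (Math.abs(z) / Math.SQRT2));
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-(z * z) / 2);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Invert the CDF by bisection: the CDF increases with z, so 100 halvings
// of [-10, 10] pin the quantile down far beyond three decimals.
function normalQuantile(p) {
  let lo = -10;
  let hi = 10;
  for (let k = 0; k < 100; k++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Theoretical quantiles for a normal Q-Q plot of n points, using the
// plotting positions i / (n + 1) for i = 1..n (an assumed choice).
function qqPositions(n) {
  return Array.from({ length: n }, (_, i) => normalQuantile((i + 1) / (n + 1)));
}

console.log(qqPositions(11).map((q) => q.toFixed(3)).join(", "));
```

<p>The printed values should match the hard-coded <code>qnorm</code> array to three decimals (the middle one may print as -0.000, an artifact of the approximation).</p>
<p>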
In this part we'll complete a full cycle through these four plots, and in a final piece we'll review and tweak the look of it all and add some more interesting data into the mix.</p> <p>Picking up where we left off, we have a regression line transitioning to a view of residuals. Let's start by emphasizing that those residual distances are most valuable in the latter view, and remove them from the model view.</p> <div id='fig1'></div> <script>
var width = 400;
var height = 400;
var duration = 2000;
var delay = 2000;
var data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
// least-squares fit of data on x = 0..10: slope = 994 / 110, intercept = 18
var slope = 9.036;
var intercept = 18;

function expected(index) {
  return (slope * index) + intercept;
};

var fig1 = d3.select("#fig1").append("svg")
    .attr("width", width)
    .attr("height", height);
var padding = 30;
var x = d3.scale.linear()
    .domain([0, data.length])
    .range([padding, width - padding]);
var y = d3.scale.linear()
    .domain([d3.min(data), d3.max(data)])
    .range([height - padding, padding]);
var g = fig1.selectAll("g")
    .data(data)
  .enter().append("g")
    .attr("class", "object");
fig1.append("line")
    .attr("id", "line")
    .attr("x1", x(0))
    .attr("y1", y(intercept))
    .attr("x2", x(11))
    .attr("y2", y(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");
g.each(function(d, i) {
  var o = d3.select(this);
  o.attr("class", "observation");
  o.append("line")
      .attr("x1", x(i))
      .attr("y1", y(d))
      .attr("x2", x(i))
      .attr("y2", y(expected(i)))
      .attr("class", "residual-bar")
      .attr("stroke-width", 0)
      .attr("stroke", "gray");
  o.append("circle")
      .attr("r", 5)
      .attr("cx", x(i))
      .attr("cy", y(d))
      .attr("stroke", "black")
      .attr("fill", "darkslategrey");
});
setTimeout(residual, delay);

function fit() {
  line = fig1.select("#line").transition()
      .duration(duration)
      .attr("y1", y(intercept))
      .attr("y2", y(expected(11)));
  var c = fig1.selectAll(".observation");
  c.each(function(d, i) {
    var o = d3.select(this);
    o.transition()
        .duration(duration)
        .attr("transform", "translate(0, 0)");
    residual_bars =
o.select(".residual-bar").transition()
        .duration(duration)
        .attr("stroke-width", 0);
  });
  setTimeout(residual, delay + duration);
};

function residual() {
  line = fig1.select("#line").transition()
      .duration(duration)
      .attr("y1", height/2)
      .attr("y2", height/2);
  var c = fig1.selectAll(".observation");
  c.each(function(d, i) {
    var o = d3.select(this);
    o.transition()
        .duration(duration)
        .attr("transform", "translate(0, " + (200 - y(expected(i))) + ")");
    residual_bars = o.select(".residual-bar").transition()
        .delay(delay / 2)
        .duration(duration)
        .attr("stroke-width", 3);
  });
  setTimeout(fit, delay + duration);
};
</script> <h4>Showing residuals properly</h4> <p>Let's improve on this by showing appropriate axes for each view. Then, during the transitions, we will re-scale the y-axis to the residual values and back to the observed data values again.</p> <div id='fig2'></div> <style>
.axis path, .axis line { fill: none; stroke: black; shape-rendering: crispEdges; }
.axis text { font-family: sans-serif; font-size: 11px; }
</style> <script>
// using same data, add in the residuals too this time
var residuals = [];
data.forEach(function(d, i) {
  residuals.push(d - expected(i));  // residual = observed - fitted
});
var max_residual = d3.max(residuals, function(d) { return Math.abs(d); });
var fig2 = d3.select("#fig2").append("svg")
    .attr("width", width)
    .attr("height", height);
var x = d3.scale.linear()
    .domain([0, data.length])
    .range([padding, width - padding]);
var y = d3.scale.linear()
    .domain([d3.min(data), d3.max(data)])
    .range([height - padding, padding]);
var y_residuals = d3.scale.linear()
    .domain([-max_residual, max_residual])
    .range([height - padding, padding]);
var x_axis = d3.svg.axis()
    .orient("bottom")
    .scale(x);
var y_axis = d3.svg.axis()
    .orient("left")
    .scale(y);
var y_residuals_axis = d3.svg.axis()
    .orient("left")
    .scale(y_residuals);
var g = fig2.selectAll("g")
    .data(data)
  .enter().append("g")
    .attr("class", "object");
fig2.append("line")
    .attr("id", "line")
    .attr("x1", x(0))
    .attr("y1",
y(intercept))
    .attr("x2", x(11))
    .attr("y2", y(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");
g.each(function(d, i) {
  var o = d3.select(this);
  o.attr("class", "observation");
  o.append("line")
      .attr("x1", x(i))
      .attr("y1", y(d))
      .attr("x2", x(i))
      .attr("y2", y(expected(i)))
      .attr("class", "residual-bar")
      .attr("stroke-width", 0)
      .attr("stroke", "gray");
  o.append("circle")
      .attr("r", 5)
      .attr("cx", x(i))
      .attr("cy", y(d))
      .attr("stroke", "black")
      .attr("fill", "darkslategrey");
});
fig2.append("g")
    .attr("id", "x_axis")
    .attr("class", "axis")
    .attr("transform", "translate(0, " + (height - padding) + ")")
    .call(x_axis);
fig2.append("g")
    .attr("id", "y_axis")
    .attr("class", "axis")
    .attr("transform", "translate(" + padding + ", 0)")
    .call(y_axis);
setTimeout(residual2, delay);

function fit2() {
  line = fig2.select("#line").transition()
      .duration(duration)
      .attr("y1", y(intercept))
      .attr("y2", y(expected(11)));
  var c = fig2.selectAll(".observation");
  c.each(function(d, i) {
    var o = d3.select(this);
    o.transition()
        .duration(duration)
        .attr("transform", "translate(0, 0)");
    residual_bars = o.select(".residual-bar").transition()
        .duration(duration)
        .attr("stroke-width", 0);
  });
  fig2.select("#y_axis").transition()
      .duration(duration)
      .call(y_axis);
  setTimeout(residual2, delay + duration);
};

function residual2() {
  line = fig2.select("#line").transition()
      .duration(duration)
      .attr("y1", y_residuals(0))
      .attr("y2", y_residuals(0));
  var c = fig2.selectAll(".observation");
  c.each(function(d, i) {
    var o = d3.select(this);
    o.transition()
        .duration(duration)
        .attr("transform", "translate(0, " + (y_residuals(0) - y(expected(i))) + ")");
    residual_bars = o.select(".residual-bar").transition()
        .duration(duration)
        .attr("stroke-width", 3);
  });
  fig2.select("#y_axis").transition()
      .duration(duration)
      .call(y_residuals_axis);
  setTimeout(fit2, delay + duration);
};
</script> <p>Some quick notes about this:</p> <ul> <li> <p>To relocate the residuals and the
corresponding scale, the residual values are now explicitly calculated:</p> <div class="highlight"><pre>var residuals = [];
data.forEach(function(d, i) {
  residuals.push(d - expected(i));  // residual = observed - fitted
});
</pre></div> </li> <li> <p>Then, we look for the largest absolute residual to define a symmetric domain for the y-scale:</p> <div class="highlight"><pre>var max_residual = d3.max(residuals, function(d) { return Math.abs(d); });
var y_residuals = d3.scale.linear()
    .domain([-max_residual, max_residual])
    .range([height - padding, padding]);
</pre></div> </li> <li> <p>Finally, whereas in the first version above I just picked an arbitrary y-location (200) to anchor the residual bars after the transition...</p> <div class="highlight"><pre>o.transition()
    .duration(duration)
    .attr(&quot;transform&quot;, &quot;translate(0, &quot; + (200 - y(expected(i))) + &quot;)&quot;);
</pre></div> </li> <li> <p>...now we can anchor them on the zero line of the residuals scale. We just have to replace the arbitrary location with <code>y_residuals(0)</code>, which is the exact midpoint of the y_residuals scale because its domain is symmetric around zero:</p> <div class="highlight"><pre>o.transition()
    .duration(duration)
    .attr(&quot;transform&quot;, &quot;translate(0, &quot; + (y_residuals(0) - y(expected(i))) + &quot;)&quot;);
</pre></div> <p>Because this translates each group as a whole, the points keep their pixel offsets from the data scale; with this data they sit closer to the zero line than the residual axis ticks indicate. Setting each circle's <code>cy</code> with the residual scale instead, as the later Anscombe version does, places them exactly.</p> </li> </ul> <h4>Adding Cook's Distance</h4> <p>In simple linear regression models like this, a few outlying points can pull the fitted slope significantly, making predictions from the resulting model less accurate than desired. A standard way to check whether any single observation has an outsized influence on the fit is to examine <a href="http://en.wikipedia.org/wiki/Cook's_distance">Cook's Distance</a>, which measures how much the fitted values would change if that observation were left out.
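</p>
<p>The Anscombe Quartet post notes that JavaScript stats libraries don't seem to ship Cook's distance. For a simple regression with a single predictor (p = 2 coefficients) it has a closed form, D<sub>i</sub> = e<sub>i</sub><sup>2</sup> / (p s<sup>2</sup>) &times; h<sub>i</sub> / (1 - h<sub>i</sub>)<sup>2</sup>, where e<sub>i</sub> is the residual, s<sup>2</sup> = SSE / (n - p), and h<sub>i</sub> = 1/n + (x<sub>i</sub> - mean(x))<sup>2</sup> / S<sub>xx</sub> is the leverage. A minimal JavaScript sketch under those assumptions follows; <code>cooksDistance()</code> is a hypothetical helper, and returning NaN for a point with leverage 1 is a choice made here to mirror the NaN that R reports for Anscombe's fourth dataset.</p>

```javascript
// Cook's distance for the simple linear regression y = intercept + slope * x.
// Assumes p = 2 fitted coefficients and the closed-form leverage
// h_i = 1/n + (x_i - mean(x))^2 / Sxx.
function cooksDistance(x, y) {
  const n = x.length;
  const p = 2;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let sxx = 0;
  let sxy = 0;
  x.forEach((xi, i) => {
    sxx += (xi - mx) ** 2;
    sxy += (xi - mx) * (y[i] - my);
  });
  const slope = sxy / sxx;
  const intercept = my - slope * mx;

  // residuals e_i = observed - fitted, and s^2 = SSE / (n - p)
  const resid = y.map((yi, i) => yi - (intercept + slope * x[i]));
  const s2 = resid.reduce((s, e) => s + e * e, 0) / (n - p);

  return resid.map((e, i) => {
    const h = 1 / n + (x[i] - mx) ** 2 / sxx;
    // a point with leverage 1 pins the line; its distance is undefined
    if (1 - h < 1e-8) return NaN;
    return ((e * e) / (p * s2)) * (h / (1 - h) ** 2);
  });
}

// the data used in these figures, observed at x = 0..10
const ys = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
const xs = ys.map((_, i) => i);
console.log(cooksDistance(xs, ys).map((d) => d.toFixed(4)).join(", "));
```

<p>For this data it reproduces the hard-coded <code>cooks</code> array in the fig3 script, including 0.7934 for the last point.</p>
<p>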
It's easy enough to calculate in R (here <code>d</code> holds the eleven observed values, fit against x = 0 through 10):</p> <div class="highlight"><pre>fit <span class="o">&lt;-</span> lm<span class="p">(</span>formula<span class="o">=</span>d <span class="o">~</span> <span class="kp">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">10</span><span class="p">))</span>
cd <span class="o">&lt;-</span> cooks.distance<span class="p">(</span>fit<span class="p">)</span>
</pre></div> <p>A common rule of thumb is to flag any observation with a Cook's Distance of 1 or greater. That makes the plot straightforward: typical graphs of Cook's D show the values as vertical bars, with a horizontal reference line at 1. We'll need to transition the y scale and axis again, and there's one detail to handle: the Cook's D values might all be well below 1. If they are, we can leave the line off completely; if any data point approaches or passes 1, we should show it. The risk is that if none of the values are particularly large (e.g. all below 0.10, as in the example diagnostics image at the top of <a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/">part 1</a>), scaling the y axis all the way up to 1 will squash the variation among the small values into nothing.
To handle this, we'll pick a mid-range threshold (0.5 here; something like 0.33 would also work). If the maximum Cook's D is below the threshold, we scale the axis to the narrower domain of the data; otherwise, we extend the domain to at least 1.1 so the reference line at 1 fits on the plot.</p> <div id='fig3'></div> <style> .axis path, .axis line { fill: none; stroke: black; shape-rendering: crispEdges; } .axis text { font-family: sans-serif; font-size: 11px; } </style> <script> /* using same data, add in the residuals too this time */ var residuals = []; data.forEach(function(d, i) { residuals.push(d - expected(i)); }); var max_residual = d3.max(residuals, function(d) { return Math.abs(d); }); var cooks = [0.0267, 0.0445, 0.0047, 0.045, 0.0202, 0.0048, 0.2774, 0.0037, 0.0057, 0.1642, 0.7934]; var max_cooks = d3.max(cooks); /* if any Cook's D value reaches the 0.5 threshold, extend the domain to at least 1.1 so the reference line at 1 is visible */ if (max_cooks >= 0.5) { max_cooks = Math.max(max_cooks, 1.1); } var fig3 = d3.select("#fig3").append("svg") .attr("width", width) .attr("height", height); var x = d3.scale.linear() .domain([0, data.length]) .range([padding, width - padding]); var y = d3.scale.linear() .domain([d3.min(data), d3.max(data)]) .range([height - padding, padding]); var y_residuals = d3.scale.linear() .domain([-max_residual, max_residual]) .range([height - padding, padding]); var y_cooks = d3.scale.linear() .domain([0, max_cooks]) .range([height - padding, padding]); var x_axis = d3.svg.axis() .orient("bottom") .scale(x); var y_axis = d3.svg.axis() .orient("left") .scale(y); var y_residuals_axis = d3.svg.axis() .orient("left") .scale(y_residuals); var y_cooks_axis = d3.svg.axis() .orient("left") .scale(y_cooks); var g = fig3.selectAll("g") .data(data) .enter().append("g") .attr("class", "object"); fig3.append("line") .attr("id", "line") .attr("x1", x(0)) .attr("y1", y(intercept)) .attr("x2", x(11)) .attr("y2", y(expected(11))) .attr("stroke-width", 2) .attr("stroke", "steelblue"); g.each(function(d, i) { var o = d3.select(this); o.attr("class", "observation"); o.append("line") .attr("x1", x(i)) .attr("y1", y(d)) .attr("x2", x(i)) .attr("y2",
y(expected(i))) .attr("class", "residual-bar") .attr("stroke-width", 0) .attr("stroke", "gray"); o.append("circle") .attr("r", 5) .attr("cx", x(i)) .attr("cy", y(d)) .attr("class", "data-point") .attr("stroke", "black") .attr("fill", "darkslategrey"); }); fig3.append("g") .attr("id", "x_axis") .attr("class", "axis") .attr("transform", "translate(0, " + (height - padding) + ")") .call(x_axis); fig3.append("g") .attr("id", "y_axis") .attr("class", "axis") .attr("transform", "translate(" + padding + ", 0)") .call(y_axis); setTimeout(residual3, delay); function fit3() { line = fig3.select("#line").transition() .duration(duration) .attr("y1", y(intercept)) .attr("y2", y(expected(11))); var c = fig3.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.select(".residual-bar").transition() .duration(duration) .attr("y1", y(d)) .attr("y2", y(expected(i))) .attr("stroke-width", 0); o.select(".data-point").transition() .duration(duration) .attr("cy", y(d)); }); fig3.select("#y_axis").transition() .duration(duration) .call(y_axis); setTimeout(residual3, delay + duration); }; function residual3() { line = fig3.select("#line").transition() .duration(duration) .attr("y1", y_residuals(0)) .attr("y2", y_residuals(0)); var c = fig3.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.select(".data-point").transition() .duration(duration) .attr("cy", y_residuals(residuals[i])); o.select(".residual-bar").transition() .duration(duration) .attr("y1", y_residuals(residuals[i])) .attr("y2", y_residuals(0)) .attr("stroke-width", 3); }); fig3.select("#y_axis").transition() .duration(duration) .call(y_residuals_axis); setTimeout(cooks3, delay + duration); }; function cooks3() { line = fig3.select("#line").transition() .duration(duration) .attr("y1", y_cooks(1)) .attr("y2", y_cooks(1)); var c = fig3.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.select(".data-point").transition() .duration(duration) .attr("cy", 
y_cooks(cooks[i])); o.select(".residual-bar").transition() .duration(duration) .attr("y1", y_cooks(cooks[i])) .attr("y2", y_cooks(0)); }); fig3.select("#y_axis").transition() .duration(duration) .call(y_cooks_axis); setTimeout(fit3, delay + duration); } </script> <p>This took some fiddling. Ultimately, removing the transform/translate() bits described earlier made this simpler: instead of translating the elements, each view mode now sets the y values directly with its own scale function. As it turns out, the translation above wasn't even correct (it moved the bars into place, but their lengths stayed in pixels of the original y scale rather than the residuals scale), so this brings us back to cleaner code where it's easier to verify that the values are accurate. Lesson learned: don't fall back to SVG translate() when simpler (and higher-level) d3 scales will do the job.</p> <p>Once it started working correctly, an immediate benefit of this animation approach became clear. Look at the data point at <code>x=6</code>. It looks like a big outlier, especially when we shift into the residuals view, where it has the largest residual. But when shifting into the Cook's Distance view, its influence (about 0.28) proves to be much smaller than that of the point at <code>x=10</code> (roughly 0.8). This makes intuitive sense when shifting back to the model fit view: <code>x=10</code> sits at the end of the x range, where a point has the most leverage, and it drags the slope of the model down substantially, enough to be wary of, even if not enough to consider throwing the value out.</p> <h4>Adding the Q-Q Plot</h4> <p>To add the <a href="http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot">Q-Q plot</a>, we standardize each observed value (its z-score) and plot it against the corresponding theoretical normal quantile.
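</p>
<p>Here is one way to compute both halves of that pairing in plain JavaScript. It's a sketch under a few assumptions: the observed values are standardized with the sample standard deviation; the theoretical quantile for rank r (counting from 1) out of n is taken at probability r/(n + 1), which appears to match the hard-coded <code>qnorm</code> array in the figure below (1/12 gives about -1.383); and the normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf, inverted by bisection. All helper names are hypothetical.</p>

```javascript
// Z-scores using the sample standard deviation (n - 1 denominator).
function zScores(values) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const sd = Math.sqrt(values.reduce((s, v) => s + (v - mean) ** 2, 0) / (n - 1));
  return values.map((v) => (v - mean) / sd);
}

// Standard normal CDF via the Abramowitz and Stegun 7.1.26 erf
// approximation (absolute error around 1.5e-7).
function normalCdf(z) {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Inverse of normalCdf by bisection: simple rather than fast.
function normalQuantile(p) {
  let lo = -10;
  let hi = 10;
  for (let step = 0; step !== 60; step += 1) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) > p) hi = mid; else lo = mid;
  }
  return (lo + hi) / 2;
}

// Pair every observation with the normal quantile of its own rank,
// returning results in the original order so point i stays point i.
function qqPairs(values) {
  const n = values.length;
  const z = zScores(values);
  const order = z.map((_, i) => i).sort((a, b) => z[a] - z[b]);
  const rank = new Array(n);
  order.forEach((obs, r) => { rank[obs] = r; });
  return z.map((observed, i) => ({
    index: i,
    theoretical: normalQuantile((rank[i] + 1) / (n + 1)),
    observed: observed,
  }));
}

const raw = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
qqPairs(raw).forEach((d) =>
  console.log(d.index, d.theoretical.toFixed(3), d.observed.toFixed(3)));
```

<p>Because <code>qqPairs</code> keeps the original order and only looks up each point's rank, the i-th circle can be moved to its own Q-Q position rather than to the i-th slot of a sorted copy, which is what keeps object constancy intact across the animation.</p>
<p>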
To make this work with the animation loop, we also have to sort the standardized values so that each one is paired with the normal quantile of the same rank.</p> <p>The plot itself should be straightforward, with the normal quantiles and the standardized observed values on the x- and y-axes, respectively, and a 45&deg; reference line running through it all.</p> <p>Because the axes and points move around so much, we'll add a simple title label and update it as we switch plots.</p> <div id='fig4'></div> <style> .label { font-family: sans-serif; font-variant: small-caps; font-weight: normal; font-size: x-large; } </style> <script> var data4 = [ {"cooks": 0.0267, "error": -3.0, "q": -1.522, "index": 0, "raw": 15, "qqindex": 0}, {"cooks": 0.0445, "error": -5.036, "q": -1.3009, "index": 1, "raw": 22, "qqindex": 1}, {"cooks": 0.0047, "error": -2.072, "q": -0.9218, "index": 2, "raw": 34, "qqindex": 2}, {"cooks": 0.045, "error": 7.892, "q": -0.3216, "index": 3, "raw": 53, "qqindex": 4}, {"cooks": 0.0202, "error": -6.144, "q": -0.4796, "index": 4, "raw": 48, "qqindex": 3}, {"cooks": 0.0048, "error": -3.18, "q": -0.1005, "index": 5, "raw": 60, "qqindex": 5}, {"cooks": 0.2774, "error": 22.784, "q": 1.0051, "index": 6, "raw": 95, "qqindex": 7}, {"cooks": 0.0037, "error": -2.252, "q": 0.4997, "index": 7, "raw": 79, "qqindex": 8}, {"cooks": 0.0057, "error": -2.288, "q": 0.784, "index": 8, "raw": 88, "qqindex": 10}, {"cooks": 0.1642, "error": 9.676, "q": 1.4473, "index": 9, "raw": 109, "qqindex": 6}, {"cooks": 0.7934, "error": -16.36, "q": 0.9103, "index": 10, "raw": 92, "qqindex": 9} ]; var qnorm = [-1.383, -0.967, -0.674, -0.431, -0.210, 0.0, 0.210, 0.431, 0.674, 0.967, 1.383]; var xbar = 63.182; var min_raw4 = d3.min(data4, function(d) { return d.raw; }); var max_raw4 = d3.max(data4, function(d) { return d.raw; }); var max_residual4 = d3.max(data4, function(d) { return Math.abs(d.error); }); var max_cooks4 = d3.max(data4, function(d) { return d.cooks; }); var max_q = d3.max(data4, function(d) { return Math.abs(d.q); }); var
max_qnorm = d3.max(qnorm); var buffer = 1.1; if (max_cooks4 >= 0.5) { if (max_cooks4 >= 1.1) { ; } else { max_cooks4 = 1.1; } }; var fig4 = d3.select("#fig4").append("svg") .attr("width", width) .attr("height", height); var x4 = d3.scale.linear() .domain([0, data4.length]) .range([padding, width - padding]); var x_qnorm4 = d3.scale.linear() .domain([-max_qnorm * buffer, max_qnorm * buffer]) .range([padding, width - padding]); var y4 = d3.scale.linear() .domain([min_raw4, max_raw4]) .range([height - padding, padding]); var y_residuals4 = d3.scale.linear() .domain([-max_residual4, max_residual4]) .range([height - padding, padding]); var y_cooks4 = d3.scale.linear() .domain([0, max_cooks4]) .range([height - padding, padding]); var y_q4 = d3.scale.linear() .domain([-max_q * buffer, max_q * buffer]) .range([height - padding, padding]); var x_axis4 = d3.svg.axis() .orient("bottom") .scale(x4); var x_qnorm_axis4 = d3.svg.axis() .orient("bottom") .scale(x_qnorm4); var y_axis4 = d3.svg.axis() .orient("left") .scale(y4); var y_residuals_axis4 = d3.svg.axis() .orient("left") .scale(y_residuals4); var y_cooks_axis4 = d3.svg.axis() .orient("left") .scale(y_cooks4); var y_q_axis4 = d3.svg.axis() .orient("left") .scale(y_q4); var g4 = fig4.selectAll("g") .data(data4) .enter().append("g") .attr("class", "object"); fig4.append("line") .attr("id", "line4") .attr("x1", x4(0)) .attr("y1", y4(intercept)) .attr("x2", x4(11)) .attr("y2", y4(expected(11))) .attr("stroke-width", 2) .attr("stroke", "steelblue"); g4.each(function(d, i) { var o = d3.select(this); o.attr("class", "observation"); o.append("line") .attr("x1", x4(i)) .attr("y1", y4(d.raw)) .attr("x2", x4(i)) .attr("y2", y4(expected(i))) .attr("class", "residual-bar") .attr("stroke-width", 0) .attr("stroke", "gray"); o.append("circle") .attr("r", 5) .attr("cx", x4(i)) .attr("cy", y4(d.raw)) .attr("class", "data-point") .attr("stroke", "black") .attr("fill", "darkslategrey"); }); fig4.append("g") .attr("id", "x_axis4") 
.attr("class", "axis") .attr("transform", "translate(0, " + (height - padding) + ")") .call(x_axis4); fig4.append("g") .attr("id", "y_axis4") .attr("class", "axis") .attr("transform", "translate(" + padding + ", 0)") .call(y_axis4); fig4.append("text") .attr("id", "label") .attr("class", "label") .attr("x", 40) .attr("y", 40) .text("model fit"); setTimeout(residual4, delay); function fit4() { label = fig4.select("#label").transition() .duration(duration) .text("fit model"); line = fig4.select("#line4").transition() .duration(duration) .attr("y1", y4(intercept)) .attr("y2", y4(expected(11))); var c = fig4.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.select(".residual-bar").transition() .duration(duration) .attr("x1", x4(i)) .attr("y1", y4(d.raw)) .attr("y2", y4(expected(i))) .attr("stroke-width", 0); o.select(".data-point").transition() .duration(duration) .attr("cx", x4(i)) .attr("cy", y4(d.raw)); }); fig4.select("#x_axis4").transition() .duration(duration) .call(x_axis4); fig4.select("#y_axis4").transition() .duration(duration) .call(y_axis4); setTimeout(residual4, delay + duration); }; function residual4() { label = fig4.select("#label").transition() .duration(duration) .text("residuals"); line = fig4.select("#line4").transition() .duration(duration) .attr("y1", y_residuals4(0)) .attr("y2", y_residuals4(0)); var c = fig4.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.select(".data-point").transition() .duration(duration) .attr("cy", y_residuals4(d.error)); o.select(".residual-bar").transition() .duration(duration) .attr("y1", y_residuals4(d.error)) .attr("y2", y_residuals4(0)) .attr("stroke-width", 3); }); fig4.select("#y_axis4").transition() .duration(duration) .call(y_residuals_axis4); setTimeout(cooks4, delay + duration); }; function cooks4() { label = fig4.select("#label").transition() .duration(duration) .text("cook's distance"); line = fig4.select("#line4").transition() .duration(duration) 
.attr("y1", y_cooks4(1)) .attr("y2", y_cooks4(1)); var c = fig4.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.select(".data-point").transition() .duration(duration) .attr("cy", y_cooks4(d.cooks)); o.select(".residual-bar").transition() .duration(duration) .attr("y1", y_cooks4(d.cooks)) .attr("y2", y_cooks4(0)); }); fig4.select("#y_axis4").transition() .duration(duration) .call(y_cooks_axis4); setTimeout(qq4, delay + duration); } function qq4() { label = fig4.select("#label").transition() .duration(duration) .text("q-q normal vs. observed"); line = fig4.select("#line4").transition() .duration(duration) .attr("y1", y_q4(-max_q)) .attr("y2", y_q4(max_q)); /* rank points on a sorted copy (leaving data4 untouched), then move each circle to the normal quantile of its own rank so every point keeps its identity */ var ranked = data4.slice().sort(function(a, b) { return a.q - b.q; }); var c = fig4.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); var rank = ranked.indexOf(d); o.select(".data-point").transition() .duration(duration) .attr("cx", x_qnorm4(qnorm[rank])) .attr("cy", y_q4(d.q)); o.select(".residual-bar").transition() .duration(duration) .attr("stroke-width", 0); }); fig4.select("#x_axis4").transition() .duration(duration) .call(x_qnorm_axis4); fig4.select("#y_axis4").transition() .duration(duration) .call(y_q_axis4); setTimeout(fit4, delay + duration); } </script> <p>That rounds out the sketch.</p> <p>Next: clean this up enough to render multiple regressions side by side. Stay tuned...</p>year's worth of dots2014-10-01T00:00:00-04:00dchudtag:data.onebiglibrary.net,2014-10-01:2014/10/01/years-worth-of-dots/<p>For a project at work, we've collected a year's worth of samples from a major non-US social media site. A sample is taken every 30 seconds: a snapshot of the most recent 200 public posts from all users. This created a lot of files, and along the way we missed some, sometimes in chunks, for various reasons (network outage, service error, reboot, etc.).
The researcher we're supporting has happily taken a copy of the 100+ GB (compressed) of data to start poring through, but asked that we help prepare a simple visualization of the data that's present, or more importantly, what's missing.</p> <p>Because it's natural to miss a few files here and there over a year, gaps aren't a problem unless big chunks are missing or there are patterns of errors that would make sampling from this data problematic. An image of what's there and what isn't needs to hit a few key points:</p> <ul> <li>cover the entire collection period (actually ~13 months)</li> <li>show missing files</li> <li>show empty files</li> <li>easily spot large gaps</li> <li>easily spot significant patterns</li> </ul> <p>In addition to the immediate use (the researcher's own knowledge of what they have), this visualization needs to work for their advisors and others interested in the work, so it should be easy to digest, by which I mean:</p> <ul> <li>should fit on one screen</li> <li>shouldn't require much explanation</li> </ul> <p>To give a sense of volume, this set of files should be roughly <code>365 day/yr * 24 hr/day * 120 files/hr = 1,051,200</code> files (one file every 30 seconds makes 120 per hour). It's a good number. It's too many to read from disk in real time, so this will require preprocessing.</p> <h4>First sketch</h4> <p>Let's start with a rough picture of what it will take to fit the dots onto one screen. One day's worth is <code>24 * 120 = 2880</code> dots, which is too many for the width of one screen in pixels, but if we can pack at least two dots into each pixel, we're getting closer. The 365-day year is easier; we can give each day two or three pixels of height and still keep the total height manageable.
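</p>
<p>The sizing arithmetic is easy to check with a few lines of plain JavaScript. This is just a sketch: <code>gridSize</code> is a hypothetical helper, and it assumes exactly one file every 30 seconds and a 365-day year, ignoring the extra month of collection.</p>

```javascript
// Expected volume, assuming exactly one sample file every 30 seconds.
const filesPerHour = 3600 / 30;          // 120
const filesPerDay = 24 * filesPerHour;   // 2880
const filesPerYear = 365 * filesPerDay;  // 1051200

// Hypothetical helper: pixel dimensions of a grid with one row per day,
// packing `dotsPerPixel` files into each horizontal pixel and giving
// each day `rowHeight` pixels of vertical space.
function gridSize(dotsPerPixel, rowHeight) {
  return {
    width: filesPerDay / dotsPerPixel,
    height: 365 * rowHeight,
  };
}

console.log(filesPerYear);   // 1051200
console.log(gridSize(2, 2)); // { width: 1440, height: 730 }
console.log(gridSize(4, 3)); // { width: 720, height: 1095 }
```

<p>Packing two files per pixel with two-pixel rows gives 1440x730; four files per pixel with three-pixel rows gives 720x1095.</p>
<p>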
So with this in mind, here's a <code>1440x730</code> grid.</p> <div id='sketch1'></div> <script> var width = 1440; var height = 730; var sketch1 = d3.select("#sketch1").append("svg") .attr("width", width) .attr("height", height); var y = d3.scale.linear() .domain([0, 365]) .range([0, height]); d3.range(0, 365).forEach(function(ye, yi, ya) { sketch1.append("line") .attr("x1", 0) .attr("y1", y(ye)) .attr("x2", width) .attr("y2", y(ye)) .attr("stroke", "cadetblue") .attr("stroke-width", 1); } ); </script> <p>Yah ok that's too wide.</p> <p>Let's try again at half the width and, just for fun (and because vertical scrolling isn't so hard), make it taller.</p> <div id='sketch2'></div> <script> var width = 720; var height = 1095; var sketch2 = d3.select("#sketch2").append("svg") .attr("width", width) .attr("height", height); var y = d3.scale.linear() .domain([0, 365]) .range([0, height]); d3.range(0, 365).forEach(function(ye, yi, ya) { sketch2.append("line") .attr("x1", 0) .attr("y1", y(ye)) .attr("x2", width) .attr("y2", y(ye)) .attr("stroke", "cadetblue") .attr("stroke-width", 1); } ); </script> <p>One more time, with gridlines and some hour and month scales to shape it all out better:</p> <div id='sketch3'></div> <style> .axis path, .axis line { fill: none; stroke: black; shape-rendering: crispEdges; } .axis text { font-family: sans-serif; font-size: 11px; } </style> <script> var padding = 40; var width = 720 + padding; var height = 1095 + padding; var sketch3 = d3.select("#sketch3").append("svg") .attr("width", width) .attr("height", height); var x = d3.scale.linear() .domain([0, 720]) .range([padding, width]); var y = d3.scale.linear() .domain([0, 365]) .range([padding/2, height - padding/2]); var x_hours = d3.scale.linear() .domain([0, 23]) .range([padding, width]); var y_months = d3.scale.linear() .domain([0, 12]) .range([padding/2, height - padding/2]); d3.range(0, 24).forEach(function(he, hi, ha) { sketch3.append("line") .attr("x1", x_hours(he)) .attr("y1", y(0))
.attr("x2", x_hours(he)) .attr("y2", y(365)) .attr("stroke", "#ccc") .attr("stroke-width", 2); } ); d3.range(0, 13).forEach(function(me, mi, ma) { sketch3.append("line") .attr("x1", x(0)) .attr("y1", y_months(me)) .attr("x2", x(720)) .attr("y2", y_months(me)) .attr("stroke", "#ccc") .attr("stroke-width", 2); } ); d3.range(0, 365).forEach(function(ye, yi, ya) { sketch3.append("line") .attr("x1", x(0)) .attr("y1", y(ye)) .attr("x2", x(720)) .attr("y2", y(ye)) .attr("stroke", "cadetblue") .attr("stroke-width", 1); } ); var x_axis1 = d3.svg.axis() .scale(x_hours) .orient("bottom"); var x_axis2 = d3.svg.axis() .scale(x_hours) .orient("top"); sketch3.append("g") .attr("class", "axis") .attr("transform", "translate(0, " + y(365 + 1) + ")") .call(x_axis1); sketch3.append("g") .attr("class", "axis") .attr("transform", "translate(0, " + y(0 - 1) + ")") .call(x_axis2); var y_axis = d3.svg.axis() .scale(y_months) .orient("left"); sketch3.append("g") .attr("class", "axis") .attr("transform", "translate(" + x(0 - 3) + ", 0)") .call(y_axis); </script> <h4>Adding real data</h4> <p>Okay, now we're getting somewhere. It's time to work with some real data and place it onto the scales using dates and times. As a first cut, I've extracted a file count for each hour in the dataset. This resulted in a JSON file with content like this:</p> <div class="highlight"><pre>...
&quot;2014-09-29 08:00:00Z&quot;: 120,
&quot;2014-09-29 09:00:00Z&quot;: 120,
&quot;2014-09-29 10:00:00Z&quot;: 119,
&quot;2014-09-29 11:00:00Z&quot;: 120,
&quot;2014-09-29 12:00:00Z&quot;: 120,
&quot;2014-09-29 13:00:00Z&quot;: 120,
...
</pre></div> <p>Loading this into a sketch is easy with <code>d3.json()</code>. The keys are sorted, but just to be thorough I'll also use <code>d3.min()</code> and <code>d3.max()</code> to get the first and last date/times from the set.</p> <p>The next piece of all this is to set the scales to use the dates.
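</p>
<p>Two details are worth checking before feeding these keys to a time scale. First, the ECMAScript spec only guarantees parsing for its ISO 8601-based format, which puts a <code>T</code> between the date and the time; a string like <code>2014-09-29 08:00:00Z</code> falls back to implementation-specific parsing, so results can differ between browsers. Second, <code>getHours()</code> reports the hour in the viewer's local time zone, while <code>getUTCHours()</code> reports it in UTC. Here's a minimal sketch, assuming the keys always have the form shown above; <code>parseUtc</code> and <code>utcMidnight</code> are hypothetical helpers.</p>

```javascript
// Hypothetical helper: rewrite "YYYY-MM-DD HH:MM:SSZ" into the spec's
// date-time format ("YYYY-MM-DDTHH:MM:SSZ") before parsing.
function parseUtc(key) {
  return new Date(key.replace(" ", "T"));
}

// Hypothetical helper: date-only ISO strings ("YYYY-MM-DD") are parsed
// as UTC midnight, which gives one anchor per day for y-positioning.
function utcMidnight(key) {
  return new Date(key.slice(0, 10));
}

const key = "2014-09-29 08:00:00Z";
const when = parseUtc(key);
console.log(when.toISOString());             // 2014-09-29T08:00:00.000Z
console.log(when.getUTCHours());             // 8, in every time zone
console.log(when.getHours());                // depends on the viewer's time zone
console.log(utcMidnight(key).toISOString()); // 2014-09-29T00:00:00.000Z
```

<p>Using the UTC accessors for positioning keeps the grid in the same frame as the keys, independent of where the page is viewed.</p>
<p>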
I created that data file knowing that JavaScript should be able to parse the dates cleanly; hopefully they will feed right into the <a href="https://github.com/mbostock/d3/wiki/Time-Scales">d3 time scaling functions</a>.</p> <p>Finally, it'll all come together with line segments drawn in for each hour. The number of files present in each hour (out of 120 expected) will feed into a color scale. To see the contrast of missing files well, the scale will have to be nonlinear rather than linear (earlier discussion of which via Albers is written up <a href="http://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/">in this post</a>). Once again, d3 helps us out, with the <code>d3.scale.pow()</code> power scale. To map the input domain to the output range along a square curve rather than a straight line, we set the scale's exponent to 2 and use colors as the range:</p> <div class="highlight"><pre>var color_scale = d3.scale.pow()
    .exponent(2)
    .domain([0, 120])
    .range([&quot;#fff&quot;, &quot;#000&quot;]);
</pre></div> <p>This should make hours with missing files lighter: a missing file or two will be barely noticeable, but a dozen or more missing should stand out.</p> <div id="sketch4"></div> <style> line.hour { shape-rendering: crispEdges; } </style> <script> var padding = 70; var width = 1200 + padding; var height = 1600 + padding; var sketch4 = d3.select("#sketch4").append("svg") .attr("width", width) .attr("height", height); /* 120 is max number of files per hour */ var color_scale = d3.scale.pow() .exponent(2) .domain([0, 120]) .range(["lightsteelblue", "midnightblue"]); d3.json("/data/20141001-filecounts.json", render); var dataset; var mindate; var maxdate; function render(e, json) { if (e) return console.warn(e); dataset = json; dataset.forEach(function(de, di, da) { /* construct a correct Date object */ dataset[di].push(new Date(de)); /* construct a UTC-midnight-anchored Date for y-positioning */ dataset[di].push(new Date(de.slice(0, 10)));// + "
00:00:00Z")); }); // adjust hours to anchor extremes at UTC-midnight mindate = new Date(d3.min(dataset, function(d) { return d; })); maxdate = new Date(d3.max(dataset, function(d) { return d; })); var x = d3.scale.linear() .domain([0, 24]) .range([padding, width - padding/4]); var y = d3.time.scale() .domain([mindate, maxdate]) .nice(d3.time.day) .rangeRound([padding/2, height - padding/2]); var hours = sketch4.selectAll(".hour") .data(dataset) .enter().append("line") .attr("class", "hour") .attr("x1", function(d, i) { return x(d.getHours()); }) .attr("y1", function(d, i) { return y(d); }) .attr("x2", function(d, i) { return x(d.getHours() + 1); }) .attr("y2", function(d, i) { return y(d); }) .attr("stroke", function(d) { return color_scale(d); }) .attr("title", function(d) { return d + ": " + d + " files";}) .attr("stroke-width", 3.5); hours.append("svg:title") .attr("class", "hourtext") .text(function(d) { return d + ": " + d + " files"; }); // vertical gridlines for hours d3.range(0, 24).forEach(function(he, hi, ha) { sketch4.append("line") .attr("x1", x(he)) .attr("y1", y(mindate)) .attr("x2", x(he)) .attr("y2", y(maxdate)) .attr("stroke", "#ccc") .attr("stroke-width", 2); } ); var x_axis1 = d3.svg.axis() .scale(x) .orient("bottom"); sketch4.append("g") .attr("class", "axis") .attr("transform", "translate(0, " + y(maxdate) + ")") .call(x_axis1); var x_axis2 = d3.svg.axis() .scale(x) .orient("top"); sketch4.append("g") .attr("class", "axis") .attr("transform", "translate(0, " + y(mindate) + ")") .call(x_axis2); var y_axis = d3.svg.axis() .scale(y) .orient("left"); sketch4.append("g") .attr("class", "axis") .attr("transform", "translate(" + x(0) + ", 0)") .call(y_axis); }; </script> <p>That's the trick. This meets the purpose, but has two problems:</p> <ul> <li>The time zones are off. See the way the first day starts at 00:00 but stops at 20:00? That's the four-hour adjustment for eastern (US) time, which is happening in a way I'm not controlling properly. 
You can see this for yourself by mousing over a 00:00 block; it will show 04:00 as the hour. Shifting the hours this way makes no sense, because it introduces a discontinuity. More importantly, it's unclear what time we're looking at for any given block, and in this particular case the data was collected from a Chinese service, so it's doubly annoying for the researcher to have to correct for two offsets. Looks wrong, is wrong.</li> <li>Not working outside of Chrome. Need to debug.</li> </ul> <p>For now I have to leave this aside to get back to other projects. It will be good to circle back to these problems to figure out how to get it right. Can't leave it hanging.</p> <p>FYI, I posted this as a gist with similar text, viewable at <a href="http://bl.ocks.org/dchud/5b6f902d410e1e5253a1">bl.ocks.org/dchud</a>. If you want to poke at the code without futzing with the rest of all this text, follow that link through or go right to <a href="https://gist.github.com/dchud/5b6f902d410e1e5253a1">the original gist</a>.</p>animating regression2014-09-18T00:00:00-04:00dchudtag:data.onebiglibrary.net,2014-09-18:2014/09/18/animating-regression/<p>When performing a simple linear regression, it's important to review all the diagnostic plots that come with it. If the residual errors aren't normally distributed, you will have to rethink your model. As I mentioned in an <a href="http://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/">earlier post</a>, you can't just stop at the fit plot, even if it is pretty (here courtesy of SAS):</p> <p><img alt="regression plot" src="http://data.onebiglibrary.net/images/20140812-things/b3-simple-regression-plot.png" /></p> <p>You have to review its diagnostics:</p> <p><img alt="regression diagnostics" src="http://data.onebiglibrary.net/images/20140812-things/b3-simple-regression-diag.png" /></p> <p>Typically in a set of diagnostic plots like this, you look first at the top left chart to see if the residuals balance around 0.
The Q-Q plot below that should fall close to the 45&deg; line, the histogram below that should look like the familiar bell curve, and the Cook's distance plot at middle right should show no points near or above 1. If any of these plots looks wrong, something is likely amiss with your model. And this is all in addition to reviewing the numbers that come out of the model, like the p-value on the F test of the model, the R-square, the p-value on the t test of the independent variable's coefficient, and the p-value of a normality test on the residual errors.</p> <p>The trick, though, is that it can take time to develop an intuitive feel for reading all these numbers and plots, even for a model as (relatively) simple as linear regression.</p> <p><a href="http://d3js.org/">D3.js</a> offers something better: the chance to animate the relationships between these plots with <a href="http://bost.ocks.org/mike/constancy/">object constancy</a>. Maybe it would be useful to do something like the transitions in the <a href="http://bost.ocks.org/mike/constancy/">showreel</a> with the main fit plot and the diagnostic charts, keeping constancy among the points in the dataset to show where each one lies in each of the plots mentioned above, from the fit through the diagnostic set. Let's try that.</p> <h4>Simple transitions</h4> <p>The first step is to wrap our heads around the timed transitions in that showreel; I haven't done those before. The key seems to be the <code>setTimeout(callback_function, delay)</code> call at the end of each function in the <a href="http://bl.ocks.org/mbostock/1256572">showreel source</a>. Note that <code>setTimeout()</code> is a <a href="http://ejohn.org/blog/how-javascript-timers-work/">JavaScript timer</a>, not a D3 function.</p> <p>This should be easy to replicate.
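</p>
<p>Stripped of d3, the pattern is a chain of callbacks in which each step starts its own transition and then schedules the next step. Here's a minimal sketch of that idea; <code>chainSteps</code> is a hypothetical helper, and the scheduler is passed in as a parameter (in a browser, something like <code>(fn, ms) =&gt; setTimeout(fn, ms)</code>) so the chain can also be exercised without real timers.</p>

```javascript
// Hypothetical helper: run `steps` in an endless cycle. Each step starts a
// transition lasting `duration` ms; the next step is scheduled for
// `delay + duration` ms later, because the scheduling call runs right away
// rather than waiting for the transition to finish.
function chainSteps(steps, duration, delay, schedule) {
  let k = 0;
  function next() {
    steps[k % steps.length](duration); // start this step's transition
    k += 1;
    schedule(next, delay + duration);  // queue the following step
  }
  schedule(next, delay); // first step after an initial pause
}

// Exercise the chain with a fake scheduler that records each requested
// wait and runs callbacks immediately, stopping after six waits.
const waits = [];
const visited = [];
const names = ["right", "down", "left", "up"];
const steps = names.map((name) => () => visited.push(name));
function fakeSchedule(fn, ms) {
  waits.push(ms);
  if (waits.length !== 6) fn();
}
chainSteps(steps, 1000, 1000, fakeSchedule);
console.log(visited.join(" ")); // right down left up right
console.log(waits.join(" "));   // 1000 2000 2000 2000 2000 2000
```

<p>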
To try it out, let's just draw a box, then move it around.</p> <div id='test1'></div> <script> var width = 200; var height = 200; var duration = 1000; var delay = 1000; var test1 = d3.select("#test1").append("svg") .attr("width", width) .attr("height", height); var box = test1.append("rect") .attr("id", "box") .attr("x", 0) .attr("y", 0) .attr("width", 100) .attr("height", 100) .attr("fill", "darkolivegreen"); setTimeout(move_right, duration); function move_right() { test1.select("#box").transition() .duration(duration) .attr("x", 100); setTimeout(move_down, delay + duration); } function move_down() { test1.select("#box").transition() .duration(duration) .attr("y", 100); setTimeout(move_left, delay + duration); } function move_left() { test1.select("#box").transition() .duration(duration) .attr("x", 0); setTimeout(move_up, delay + duration); } function move_up() { test1.select("#box").transition() .duration(duration) .attr("y", 0); setTimeout(move_right, delay + duration); } </script> <p>This is pretty straightforward: we draw a <code>rect</code>, then set the first timeout to call one of four similar functions, each of which does what you'd expect:</p> <div class="highlight"><pre><span class="nx">setTimeout</span><span class="p">(</span><span class="nx">move_right</span><span class="p">,</span> <span class="nx">duration</span><span class="p">);</span>

<span class="kd">function</span> <span class="nx">move_right</span><span class="p">()</span> <span class="p">{</span>
  <span class="nx">test1</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="s2">&quot;#box&quot;</span><span class="p">).</span><span class="nx">transition</span><span class="p">()</span>
      <span class="p">.</span><span class="nx">duration</span><span class="p">(</span><span class="nx">duration</span><span class="p">)</span>
      <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;x&quot;</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
  <span class="nx">setTimeout</span><span class="p">(</span><span class="nx">move_down</span><span class="p">,</span> <span class="nx">delay</span> <span class="o">+</span> <span class="nx">duration</span><span class="p">);</span>
<span class="p">}</span>
</pre></div> <p><code>move_right()</code> uses d3's <code>transition()</code> to shift <code>x</code> over to 100, then sets a timeout for <code>move_down()</code>, which shifts <code>y</code> to 100 and sets a similar timeout for <code>move_left()</code>. That in turn hands off to <code>move_up()</code>, which goes back to <code>move_right()</code>, and we have an endless loop of timed transitions. This isn't a model for building event-driven UI animations, of course, but this kind of showreel-style series of repeating transitions is all we need to show a cycle of plots.</p> <p>Note that the <code>setTimeout</code> delay on each callback isn't just <code>delay</code> but <code>delay + duration</code>. The transition runs asynchronously: it takes <code>duration</code> milliseconds, but JavaScript executes the following call to <code>setTimeout</code> immediately, so the timer starts counting at the same moment the transition does. If we used <code>delay</code> alone (equal to <code>duration</code> here), the next transition would start just as the current one ends, and the box would never appear to "pause" between moves.</p> <h4>Adding constancy</h4> <p>The next trick is to do the same thing, but with multiple moving points based on data. To do this, we'll expand our model above to include a simple four-value dataset. We'll still move right, down, left, up, then right again, but in each of these quadrants we'll use a different scale to position each element.
It's important to use d3's <a href="http://alignedleft.com/tutorials/d3/binding-data">data binding</a> for this rather than, say, a few <code>circle</code> and <code>rect</code> elements we could draw by hand, because ultimately we will want to bind real data from a regression.</p> <p>We'll use the same structure: four functions with obvious names. The first time through, we'll place circles using the raw values as their x and y positions in the upper left quadrant; then in each of the other functions we'll use a different scale to slide them around inside the next quadrant. I've added lines to help distinguish the quadrants.</p> <div id='test2'></div> <script> var width = 200; var height = 200; var duration = 1000; var delay = 1000; var data = [15, 38, 67, 85]; var test2 = d3.select("#test2").append("svg") .attr("width", width) .attr("height", height); var color_scale = d3.scale.ordinal() .domain([0, 1, 2, 3]) .range(["darkgoldenrod", "firebrick", "navajowhite", "slategrey"]); test2.append("line") .attr("x1", 100) .attr("y1", 0) .attr("x2", 100) .attr("y2", 200) .attr("stroke", "#bbb"); test2.append("line") .attr("x1", 0) .attr("y1", 100) .attr("x2", 200) .attr("y2", 100) .attr("stroke", "#bbb"); var g = test2.selectAll("g") .data(data) .enter().append("g") .attr("class", "object"); g.each(function(d, i) { var o = d3.select(this); o.append("circle") .attr("r", 15) .attr("cx", d) .attr("cy", d) .attr("fill-opacity", ".80") .attr("fill", color_scale(i)); }); setTimeout(move_right2, duration); function move_right2() { var x = d3.scale.linear() .domain([0, 100]) .range([70, 30]); var c = test2.selectAll(".object"); c.each(function(d, i) { var o = d3.select(this); o.select("circle").transition() .duration(duration) .attr("cx", x(d)) .attr("cy", x(d)) .attr("transform", "translate(100, 0)"); }); setTimeout(move_down2, delay + duration); } function move_down2() { var x = d3.scale.linear() .domain([0, 100]) .range([10, 90]); var c = test2.selectAll(".object");
c.each(function(d, i) { var o = d3.select(this); o.select("circle").transition() .duration(duration) .attr("cx", x(d)) .attr("cy", x(d)) .attr("transform", "translate(100, 100)"); }); setTimeout(move_left2, delay + duration); } function move_left2() { var x = d3.scale.linear() .domain([0, 100]) .range([60, 40]); var c = test2.selectAll(".object"); c.each(function(d, i) { var o = d3.select(this); o.select("circle").transition() .duration(duration) .attr("cx", x(d)) .attr("cy", x(d)) .attr("transform", "translate(0, 100)"); }); setTimeout(move_up2, delay + duration); } function move_up2() { var x = d3.scale.linear() .domain([0, 100]) .range([0, 100]); var c = test2.selectAll(".object"); c.each(function(d, i) { var o = d3.select(this); o.select("circle").transition() .duration(duration) .attr("cx", x(d)) .attr("cy", x(d)) .attr("transform", "translate(0, 0)"); }); setTimeout(move_right2, delay + duration); } </script> <p>This works pretty well once you are clear about the scope of the object you want to operate on. 
First we create a set of SVG <code>g</code> <a href="https://developer.mozilla.org/en-US/docs/Web/SVG/Element/g">group objects</a> and place a <code>circle</code> inside each one:</p> <div class="highlight"><pre><span class="kd">var</span> <span class="nx">g</span> <span class="o">=</span> <span class="nx">test2</span><span class="p">.</span><span class="nx">selectAll</span><span class="p">(</span><span class="s2">&quot;g&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">data</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="p">.</span><span class="nx">enter</span><span class="p">().</span><span class="nx">append</span><span class="p">(</span><span class="s2">&quot;g&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;class&quot;</span><span class="p">,</span> <span class="s2">&quot;object&quot;</span><span class="p">);</span> <span class="nx">g</span><span class="p">.</span><span class="nx">each</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">d</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">{</span> <span class="kd">var</span> <span class="nx">o</span> <span class="o">=</span> <span class="nx">d3</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="k">this</span><span class="p">);</span> <span class="nx">o</span><span class="p">.</span><span class="nx">append</span><span class="p">(</span><span class="s2">&quot;circle&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;r&quot;</span><span class="p">,</span> <span class="mi">15</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;cx&quot;</span><span
class="p">,</span> <span class="nx">d</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;cy&quot;</span><span class="p">,</span> <span class="nx">d</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;fill-opacity&quot;</span><span class="p">,</span> <span class="s2">&quot;.80&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;fill&quot;</span><span class="p">,</span> <span class="nx">color_scale</span><span class="p">(</span><span class="nx">i</span><span class="p">));</span> <span class="p">});</span> </pre></div> <p>This sets us up with the basic set of "data points" we'll move around. Then we just start firing up transitions like before, using <code>setTimeout()</code>, but with each move function doing a little more:</p> <div class="highlight"><pre><span class="kd">function</span> <span class="nx">move_left2</span><span class="p">()</span> <span class="p">{</span> <span class="kd">var</span> <span class="nx">x</span> <span class="o">=</span> <span class="nx">d3</span><span class="p">.</span><span class="nx">scale</span><span class="p">.</span><span class="nx">linear</span><span class="p">()</span> <span class="p">.</span><span class="nx">domain</span><span class="p">(</span><span class="cp">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="cp">]</span><span class="p">)</span> <span class="p">.</span><span class="nx">range</span><span class="p">(</span><span class="cp">[</span><span class="mi">60</span><span class="p">,</span> <span class="mi">40</span><span class="cp">]</span><span class="p">);</span> <span class="kd">var</span> <span class="nx">c</span> <span class="o">=</span> <span class="nx">test2</span><span class="p">.</span><span class="nx">selectAll</span><span 
class="p">(</span><span class="s2">&quot;.object&quot;</span><span class="p">);</span> <span class="nx">c</span><span class="p">.</span><span class="nx">each</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">d</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">{</span> <span class="kd">var</span> <span class="nx">o</span> <span class="o">=</span> <span class="nx">d3</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="k">this</span><span class="p">);</span> <span class="nx">o</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="s2">&quot;circle&quot;</span><span class="p">).</span><span class="nx">transition</span><span class="p">()</span> <span class="p">.</span><span class="nx">duration</span><span class="p">(</span><span class="nx">duration</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;cx&quot;</span><span class="p">,</span> <span class="nx">x</span><span class="p">(</span><span class="nx">d</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;cy&quot;</span><span class="p">,</span> <span class="nx">x</span><span class="p">(</span><span class="nx">d</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;transform&quot;</span><span class="p">,</span> <span class="s2">&quot;translate(0, 100)&quot;</span><span class="p">);</span> <span class="p">});</span> <span class="nx">setTimeout</span><span class="p">(</span><span class="nx">move_up2</span><span class="p">,</span> <span class="nx">delay</span> <span class="o">+</span> <span class="nx">duration</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>First we define a new scale 
for each quadrant, changing the output range; in this case, the range <code>[60, 40]</code> reverses the ordering and squeezes the circles into a narrow band just 20 pixels wide. Next, we select the <code>.object</code>s we created, which picks up the <code>g</code>s we started with, then loop through them, firing off a transition that moves each circle's <code>cx</code> and <code>cy</code> according to the new scale and also translates its coordinate space into each quadrant in turn.</p> <h4>Simulating a regression</h4> <p>We'll use a more substantial dataset when we put it all together, but for now let's assemble a small dataset and sketch a fit plot and a residual plot, transitioning back and forth between them. I've made up some values and used R to fit a regression (<code>d</code> is just the same data as in the JavaScript below):</p> <div class="highlight"><pre><span class="n">Call</span><span class="o">:</span> <span class="n">lm</span><span class="o">(</span><span class="n">formula</span> <span class="o">=</span> <span class="n">d</span> <span class="o">~</span> <span class="n">seq</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">10</span><span class="o">))</span> <span class="n">Coefficients</span><span class="o">:</span> <span class="o">(</span><span class="n">Intercept</span><span class="o">)</span> <span class="n">seq</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">10</span><span class="o">)</span> <span class="mf">18.000</span> <span class="mf">9.036</span> </pre></div> <p>We can use this regression line to get a feel for transitioning more elements together, and for some of the extra elements we'll want to add to make things pop a bit.</p> <div id='sim'></div> <script> var width = 400; var height = 400; var duration = 1000; var delay = 1000; var data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92]; var slope = 9.036; var intercept = 18; function expected(index) { return (slope * index) +
intercept; }; var sim = d3.select("#sim").append("svg") .attr("width", width) .attr("height", height); var padding = 20; var x = d3.scale.linear() .domain([0, data.length]) .range([padding, width - padding]); var y = d3.scale.linear() .domain([d3.min(data), d3.max(data)]) .range([height - padding, padding]); var g = sim.selectAll("g") .data(data) .enter().append("g") .attr("class", "object"); sim.append("line") .attr("id", "line") .attr("x1", x(0)) .attr("y1", y(intercept)) .attr("x2", x(11)) .attr("y2", y(expected(11))) .attr("stroke-width", 2) .attr("stroke", "steelblue"); g.each(function(d, i) { var o = d3.select(this); o.attr("class", "observation"); o.append("line") .attr("x1", x(i)) .attr("y1", y(d)) .attr("x2", x(i)) .attr("y2", y(expected(i))) .attr("stroke-width", 2) .attr("stroke", "gray"); o.append("circle") .attr("r", 5) .attr("cx", x(i)) .attr("cy", y(d)) .attr("stroke", "black") .attr("fill", "darkslategrey"); }); setTimeout(residual, delay); function fit() { line = sim.select("#line").transition() .duration(duration) .attr("y1", y(intercept)) .attr("y2", y(expected(11))); var c = sim.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.transition() .duration(duration) .attr("transform", "translate(0, 0)"); }); setTimeout(residual, delay + duration); }; function residual() { line = sim.select("#line").transition() .duration(duration) .attr("y1", height/2) .attr("y2", height/2); var c = sim.selectAll(".observation"); c.each(function(d, i) { var o = d3.select(this); o.transition() .duration(duration) .attr("transform", "translate(0, " + (200 - y(expected(i))) + ")"); }); setTimeout(fit, delay + duration); }; </script> <p>For the regression, we apply the results R gave us to define the slope, intercept, and a function that returns expected values from the model:</p> <div class="highlight"><pre><span class="kd">var</span> <span class="nx">slope</span> <span class="o">=</span> <span class="mf">9.036</span><span class="p">;</span> 
<span class="kd">var</span> <span class="nx">intercept</span> <span class="o">=</span> <span class="mi">18</span><span class="p">;</span> <span class="kd">function</span> <span class="nx">expected</span><span class="p">(</span><span class="nx">index</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="p">(</span><span class="nx">slope</span> <span class="o">*</span> <span class="nx">index</span><span class="p">)</span> <span class="o">+</span> <span class="nx">intercept</span><span class="p">;</span> <span class="p">};</span> </pre></div> <p>This function lets us put in an index number for a data value and get back what the model expects the data value to be. We can then use this whenever we need to plot the residual, here in the original rendering of the data points and residual lines against the model:</p> <div class="highlight"><pre><span class="nx">g</span><span class="p">.</span><span class="nx">each</span><span class="p">(</span><span class="kd">function</span><span class="p">(</span><span class="nx">d</span><span class="p">,</span> <span class="nx">i</span><span class="p">)</span> <span class="p">{</span> <span class="kd">var</span> <span class="nx">o</span> <span class="o">=</span> <span class="nx">d3</span><span class="p">.</span><span class="nx">select</span><span class="p">(</span><span class="k">this</span><span class="p">);</span> <span class="nx">o</span><span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;class&quot;</span><span class="p">,</span> <span class="s2">&quot;observation&quot;</span><span class="p">);</span> <span class="nx">o</span><span class="p">.</span><span class="nx">append</span><span class="p">(</span><span class="s2">&quot;line&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;x1&quot;</span><span class="p">,</span> <span class="nx">x</span><span 
class="p">(</span><span class="nx">i</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;y1&quot;</span><span class="p">,</span> <span class="nx">y</span><span class="p">(</span><span class="nx">d</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;x2&quot;</span><span class="p">,</span> <span class="nx">x</span><span class="p">(</span><span class="nx">i</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;y2&quot;</span><span class="p">,</span> <span class="nx">y</span><span class="p">(</span><span class="nx">expected</span><span class="p">(</span><span class="nx">i</span><span class="p">)))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;stroke-width&quot;</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;stroke&quot;</span><span class="p">,</span> <span class="s2">&quot;gray&quot;</span><span class="p">);</span> <span class="nx">o</span><span class="p">.</span><span class="nx">append</span><span class="p">(</span><span class="s2">&quot;circle&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;r&quot;</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;cx&quot;</span><span class="p">,</span> <span class="nx">x</span><span class="p">(</span><span class="nx">i</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;cy&quot;</span><span class="p">,</span> <span 
class="nx">y</span><span class="p">(</span><span class="nx">d</span><span class="p">))</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;stroke&quot;</span><span class="p">,</span> <span class="s2">&quot;black&quot;</span><span class="p">)</span> <span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;fill&quot;</span><span class="p">,</span> <span class="s2">&quot;darkslategrey&quot;</span><span class="p">);</span> <span class="p">});</span> </pre></div> <p>The residual line is vertical, so the x-scale places it horizontally using the index number. It starts at the actual value <code>d</code> and ends at the expected value <code>expected(i)</code>, with both passed through the y-scale <code>y()</code>. Then, in the residual view (the <code>residual()</code> function), we just have to rotate the model line to "level" (<code>height/2</code>) and translate each <code>g</code>-wrapped residual line and data point by the level (200, which is <code>height/2</code> here) minus the y-scaled expected value, which lands every expected value right on the level line:</p> <div class="highlight"><pre><span class="p">.</span><span class="nx">attr</span><span class="p">(</span><span class="s2">&quot;transform&quot;</span><span class="p">,</span> <span class="s2">&quot;translate(0, &quot;</span> <span class="o">+</span> <span class="p">(</span><span class="mi">200</span> <span class="o">-</span> <span class="nx">y</span><span class="p">(</span><span class="nx">expected</span><span class="p">(</span><span class="nx">i</span><span class="p">)))</span> <span class="o">+</span> <span class="s2">&quot;)&quot;</span><span class="p">);</span> </pre></div> <p>And when we switch back to the "fit" view, we just translate them back to <code>(0, 0)</code> and rotate the model line back to the original regression slope.</p> <p>This feels like a good stopping point for today.
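</p> <p>As a cross-check on the slope and intercept hardcoded above, here is a minimal plain-JavaScript sketch (not part of the page's d3 code) that reproduces R's ordinary least-squares fit for the same made-up data, using the index 0 through 10 as x just as <code>seq(0, 10)</code> does. The names <code>fitLine</code>, <code>model</code>, and <code>residuals</code> are illustrative, not from the original.</p>

```javascript
// Ordinary least squares for y = intercept + slope * x, with x taken as
// the index 0..n-1 (the same x values as R's seq(0, 10) for 11 points).
function fitLine(ys) {
  const n = ys.length;
  const xs = ys.map((_, i) => i);
  const mean = (vs) => vs.reduce((a, b) => a + b, 0) / n;
  const meanX = mean(xs);
  const meanY = mean(ys);
  // Centered cross-products (Sxy) and centered sum of squares (Sxx)
  const sxy = xs.reduce((acc, x, i) => acc + (x - meanX) * (ys[i] - meanY), 0);
  const sxx = xs.reduce((acc, x) => acc + (x - meanX) ** 2, 0);
  const slope = sxy / sxx;
  return { slope: slope, intercept: meanY - slope * meanX };
}

const data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
const model = fitLine(data);

// Residual = observed value minus the model's expected value at that index
const residuals = data.map((d, i) => d - (model.intercept + model.slope * i));

console.log(model.slope.toFixed(3), model.intercept.toFixed(3)); // 9.036 18.000
```

<p>These residuals are the unscaled lengths of the gray segments; <code>y()</code> only converts them to pixels. Because the fit includes an intercept, they sum to (essentially) zero.</p> <p>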
Next time, we'll pick up from here, add the additional diagnostic plots, and fill out each stage with axes and other niceties as appropriate.</p>Albers color studies in D3.js, part 22014-09-04T00:00:00-04:00dchudtag:data.onebiglibrary.net,2014-09-04:2014/09/04/albers-color-studies-part-2/<p>(See also part one, <a href="http://data.onebiglibrary.net/2014/08/08/simple-color-relationships/">simple color relationships w/d3</a>.)</p> <p>Picking up where we left off, in the middle of Josef Albers' <a href="http://yupnet.org/interactionofcolor/">Interaction of Color</a> (Yale Press's iPad edition), his study of the "middle mixture" affords a chance to bring in <a href="http://d3js.org/">D3.js</a> support for animations and transitions.</p> <p>In this study, Albers chooses a trio of colors in which the middle one is a mixture halfway between the other two. He recommends sliding the lowest block up slowly: as the darker color takes up more of the view, you can see how much it contributes to the mixture, and as you slide it back down again, the top (lighter) color comes through in the middle mixture.
Concentrate on the middle block as the lower one moves up and down, and you can also see an illusory gradient effect near the top and bottom.</p> <div id='middle'></div> <script> var width = 450, height = 720; var svg = d3.select("#middle").append("svg") .attr("width", width) .attr("height", height); var color_light = '#F5F57F'; var color_middle = '#C4BF7E'; var color_dark = '#918763'; /* top block, light */ var block_top = svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", width) .attr("height", height / 3) .attr("fill", color_light); /* middle block */ var block_middle = svg.append("rect") .attr("x", 0) .attr("y", 240) .attr("width", width) .attr("height", height / 3) .attr("fill", color_middle); /* bottom block, dark */ var block_bottom = svg.append("rect") .attr("x", 0) .attr("y", 480) .attr("width", width) .attr("height", height / 3) .attr("fill", color_dark); var animate = function() { block_bottom .transition() .delay(1000) .duration(5000) .attr("y", 280) .ease("quad-in-out") .transition() .duration(5000) .attr("y", 480) .ease("quad-in-out") .each("end", animate); }; animate(); </script> <hr /> <p>This study demonstrates how varying the quantity of each color present affects the relationships between colors and the overall feeling of a design, even when the structure isn't altered in any other way.
All of these use the same four colors and the same overall shape.</p> <div id='juxtaposition'></div> <script> var width = 630, height = 600; var svg = d3.select("#juxtaposition").append("svg") .attr("width", width) .attr("height", height); var xpad = 40; var ypad = 30; var xscale = d3.scale.linear() .domain([0, 4]) .range([0, width - (xpad * 3)]); var yscale = d3.scale.linear() .domain([0, 6]) .range([0, height - (ypad * 5)]); /* colors */ var pink = '#DEBAD0'; var grey = '#C4C2D1'; var red = '#C95B44'; var green = '#526B5C'; var colors = [pink, grey, red, green]; var interiors = [ /* pink column */ [[red, grey, green], [red, green, grey], [grey, green, red], [grey, red, green], [green, grey, red], [green, red, grey]], /* grey column */ [[pink, green, red], [pink, red, green], [red, pink, green], [red, green, pink], [green, red, pink], [green, pink, red]], /* red column */ [[pink, grey, green], [pink, green, grey], [grey, pink, green], [grey, green, pink], [green, grey, pink], [green, pink, grey]], /* green column */ [[pink, grey, red], [pink, red, grey], [grey, red, pink], [grey, pink, red], [red, pink, grey], [red, grey, pink]] ]; colors.forEach(function(ce, ci, ca) { d3.range(0, 6).forEach(function(ye, yi, ya) { /* outer box */ svg.append("rect") .attr("x", xscale(ci)) .attr("y", yscale(yi)) .attr("width", 100) .attr("height", 63) .attr("fill", ce); /* nested interiors: one color from the row's trio per rect */ svg.append("rect") .attr("x", xscale(ci) + 10) .attr("y", yscale(yi) + 12) .attr("width", 80) .attr("height", 48) .attr("fill", interiors[ci][yi][0]); svg.append("rect") .attr("x", xscale(ci) + 17) .attr("y", yscale(yi) + 18) .attr("width", 66) .attr("height", 24) .attr("fill", interiors[ci][yi][1]); svg.append("rect") .attr("x", xscale(ci) + 23) .attr("y", yscale(yi) + 22) .attr("width", 54) .attr("height", 18) .attr("fill", interiors[ci][yi][2]); }); }); </script> <p>Each has its own distinct feel, right? Taken together they seem to dance chaotically, and it's not particularly pleasant, but the goal is instructive, of course, not aesthetic.
Albers suggests using sheets of paper or your hands to block out smaller sets to look at in turn: a row, a column, etc., and considering which combinations are your favorite and why.</p> <p>In writing this one up I waffled between writing a routine to generate the color permutations and laying them out by hand like the example in the book, and I ended up matching the book exactly. The rest of these exercises have tried to match the book closely, so it seemed okay to just iterate over an array of arrays that had been lined up by hand. I also did a lot of pixel-nudging to get the boxes to line up just so (hence punting on fixing the extra white space at the bottom).</p> <hr /> <p>This next study is a similar look at color mixture. The individual lines can be laid out with scaling easily enough with D3, but making them look uneven/wobbly is a bit of a challenge.</p> <div id='wobbly'></div> <script> var width = 420, height = 720; var svg = d3.select("#wobbly").append("svg") .attr("width", width) .attr("height", height); var padding = 10; var orange = '#D46D42'; var violet = '#A29BD1'; var grey = '#DCE3E1'; var green = '#476E51'; var colors = [green, orange, violet, grey]; var xscale = d3.scale.linear() .domain([0, 18]) .range([padding, width-(padding * 2)]); var yscale = d3.scale.linear() .domain([0, 4]) .range([padding, height-(padding * 2)]); var skewscale = d3.scale.linear() .domain([0, 1]) .range([-1, 1]); var skewer = function() { return skewscale(Math.random()); }; var sizescale = d3.scale.linear() .domain([0, 1]) .range([.96, 1.04]); var scaler = function() { return sizescale(Math.random()); }; var rotatescale = d3.scale.linear() .domain([0, 1]) .range([-2, 2]); var rotater = function() { return rotatescale(Math.random()); }; /* backgrounds */ colors.forEach(function(ce, ci, ca) { svg.append("rect") .attr("x", padding) .attr("y", yscale(ci)) .attr("width", width - (padding * 2)) .attr("height", (height - (padding * 2)) / 4) .attr("fill", ce); }); var
transformer = function() { return "scale(" + scaler() + ") rotate(" + rotater() + ")"; /* skewX(" + skewer() + ") skewY(" + skewer() + ")"; */ }; [violet, grey, green, orange].forEach(function(ce, ci, ra) { svg.append("rect") .attr("x", 0) .attr("y", yscale(ci)) .attr("width", width / 17 - 8) .attr("height", height / 4) .attr("fill", ce); }); [grey, green, orange, violet].forEach(function(ce, ci, ra) { svg.append("rect") .attr("x", xscale(18)) .attr("y", yscale(ci)) .attr("width", width / 17 - 8) .attr("height", height / 4) .attr("fill", ce); }); [green, grey, violet, orange].forEach(function(ce, ci, ra) { d3.range(0, 18).forEach(function(re, ri, ra) { svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("transform", "translate(" + (xscale(re) + 2) + ", " + yscale(3 - ci) + ") scale(" + scaler() + ") skewX(" + skewer() + ") skewY(" + skewer() + ") rotate(" + rotater() + ", " + width/36 + ", " + height/8 + ")") .attr("width", width / 17 - 8) .attr("height", height / 4) .attr("fill", ce); }) }); </script> <p>This mostly recreates the effect of the study in the book but is unsatisfying on a few counts. The "wobble" of the individual patches is decent, but they should scale and skew a little more. The ordering of the stacking throws off the effect, and the y-skew is a little too great. Perhaps the biggest issue is that using <code>translate()</code> to put each strip in place pins its top-left <code>(x,y)</code> to too fixed a point; it needs to vary more. There is a lot going on in the <a href="https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/transform">SVG transform attributes</a> that I don't fully understand, largely around the shift in coordinate systems, and that's holding me back from developing the right approach to skewing and placing each strip correctly.
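</p> <p>The coordinate-system shifts come down to matrix multiplication. As a rough sketch (plain JavaScript, not part of the page, with hypothetical helper names <code>translate</code>, <code>scale</code>, <code>skewX</code>, <code>rotate</code>, <code>rotateAbout</code>, <code>multiply</code>, <code>compose</code>, and <code>apply</code>), each SVG transform can be written as an affine matrix in SVG's <code>[a, b, c, d, e, f]</code> order, and a transform list is the product of its matrices in the order written, so the rightmost transform is applied to the element's own coordinates first:</p>

```javascript
// 2D affine matrices in SVG's [a, b, c, d, e, f] order, which stands for
//   | a c e |
//   | b d f |
//   | 0 0 1 |
const rad = (deg) => (deg * Math.PI) / 180;
const translate = (tx, ty) => [1, 0, 0, 1, tx, ty];
const scale = (s) => [s, 0, 0, s, 0, 0];
const skewX = (deg) => [1, 0, Math.tan(rad(deg)), 1, 0, 0];
const rotate = (deg) => {
  const cos = Math.cos(rad(deg));
  const sin = Math.sin(rad(deg));
  return [cos, sin, -sin, cos, 0, 0];
};

// Matrix product m1 * m2: the result applies m2 first, then m1.
function multiply(m1, m2) {
  const [a1, b1, c1, d1, e1, f1] = m1;
  const [a2, b2, c2, d2, e2, f2] = m2;
  return [
    a1 * a2 + c1 * b2,
    b1 * a2 + d1 * b2,
    a1 * c2 + c1 * d2,
    b1 * c2 + d1 * d2,
    a1 * e2 + c1 * f2 + e1,
    b1 * e2 + d1 * f2 + f1,
  ];
}

// A list like "translate(...) scale(...)" multiplies left to right.
const compose = (...ms) => ms.reduce((acc, m) => multiply(acc, m));

// SVG's rotate(deg, cx, cy) is shorthand for
// translate(cx, cy) rotate(deg) translate(-cx, -cy).
const rotateAbout = (deg, cx, cy) =>
  compose(translate(cx, cy), rotate(deg), translate(-cx, -cy));

// Map a point from the element's own coordinates into its parent's.
function apply(m, x, y) {
  const [a, b, c, d, e, f] = m;
  return [a * x + c * y + e, b * x + d * y + f];
}

console.log(apply(compose(translate(100, 50), scale(2)), 10, 0)); // [120, 50]
console.log(apply(compose(scale(2), translate(100, 50)), 10, 0)); // [220, 100]
```

<p>So <code>translate(100, 50) scale(2)</code> sends (10, 0) to (120, 50), while <code>scale(2) translate(100, 50)</code> sends it to (220, 100), because there the translation itself gets scaled. In the strips above, <code>translate()</code> comes first in the list, so it is applied last; the small rotation, skews, and scaling only nudge each strip's top-left corner a few pixels away from the translated point, which may be part of why the placement looks so fixed.</p> <p>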
I'll have to revisit this to wrap my head around it more fully.</p> <p>This is definitely the most disappointing recreation of a study from the book I've done so far.</p> <hr /> <h4>The Weber-Fechner Law</h4> <p>Wikipedia points out <a href="http://en.wikipedia.org/wiki/Weber%E2%80%93Fechner_law">discrepancies in the term "Weber-Fechner law"</a>, but that's the name Albers uses for the difference between the quantitative and perceptual effects of layering color: linear additions seem to lead to logarithmic effects, and exponential additions seem to lead to linear effects. In the book this study uses translucence, so I'll stick with SVG's opacity support to recreate it.</p> <p>Arithmetic increases in the application of color here, by way of stacking, lead to only slight shifts in the perceived color.</p> <div id='yellow-stack'></div> <script> var width = 600, height = 600; var svg = d3.select("#yellow-stack").append("svg") .attr("width", width) .attr("height", height); var yellow = '#D7E650'; var opacity = '0.75'; /* two horizontals */ svg.append("rect") .attr("x", 0) .attr("y", 300) .attr("width", 500) .attr("height", 175) .attr("fill-opacity", opacity) .attr("fill", yellow); svg.append("rect") .attr("x", 50) .attr("y", 220) .attr("width", 500) .attr("height", 175) .attr("fill-opacity", opacity) .attr("fill", yellow); /* two verticals */ svg.append("rect") .attr("x", 100) .attr("y", 50) .attr("width", 175) .attr("height", 500) .attr("fill-opacity", opacity) .attr("fill", yellow); svg.append("rect") .attr("x", 180) .attr("y", 130) .attr("width", 175) .attr("height", 500) .attr("fill-opacity", opacity) .attr("fill", yellow); </script> <p>Ah, this recreates the effect of the study in the book much more effectively than the previous one (a relief). With 75% <code>fill-opacity</code> we can trace the distinct shades of color as 2, 3, and 4 patches are overlaid in different spots.
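</p> <p>The compositing arithmetic shows how quickly these shades converge. A minimal sketch in plain JavaScript, assuming a white page background and standard source-over blending of each 75%-opaque layer; the function names <code>over</code> and <code>stacked</code> are hypothetical, and it follows a single color channel, the blue channel of <code>#D7E650</code> (0x50, or 80):</p>

```javascript
// Source-over compositing for one 0-255 color channel:
// result = alpha * source + (1 - alpha) * backdrop
const over = (source, backdrop, alpha) => alpha * source + (1 - alpha) * backdrop;

// Composite `layers` copies of the same channel value onto a white (255) ground.
function stacked(channel, alpha, layers) {
  return Array.from({ length: layers }).reduce(
    (backdrop) => over(channel, backdrop, alpha),
    255
  );
}

const blue = 0x50; // blue channel of the yellow #D7E650
const shades = [1, 2, 3, 4].map((n) => stacked(blue, 0.75, n));
console.log(shades); // [ 123.75, 90.9375, 82.734375, 80.68359375 ]
```

<p>Each layer closes only a quarter (1 - 0.75) of the remaining gap to the pure yellow, so in this channel the successive steps are about 32.8, 8.2, and 2.05 levels.</p> <p>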
The difference from one to two is much greater than the difference from three to four.</p> <hr /> <p>This next study repeats a similar process, showing off the difference between linear and exponential layer addition. At left, each succeeding strip from top to bottom has one more layer than the one above it; at right, each strip has twice as many layers as the one above it. So at left, it is {1, 2, 3, 4, 5} layers, and at right, {1, 2, 4, 8, 16}.</p> <div id='red-stacks'></div> <script> var width = 600, height = 600; var svg = d3.select("#red-stacks").append("svg") .attr("width", width) .attr("height", height); var red = '#871315'; var black = '#000'; var opacity = '0.12'; /* left */ svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", 260) .attr("height", height) .attr("fill", red); /* right */ svg.append("rect") .attr("x", 340) .attr("y", 0) .attr("width", 260) .attr("height", height) .attr("fill", red); /* layer two, just like the first, but smaller */ /* left */ svg.append("rect") .attr("x", 0) .attr("y", 120) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); /* right */ svg.append("rect") .attr("x", 340) .attr("y", 120) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); /* layer three, repeating on right */ /* left */ svg.append("rect") .attr("x", 0) .attr("y", 240) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); /* right */ d3.range(0, 2).forEach(function(e, i, a) { svg.append("rect") .attr("x", 340) .attr("y", 240) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); }); /* layer four, repeating on right */ /* left */ svg.append("rect") .attr("x", 0) .attr("y", 360) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); /* right */ d3.range(0, 4).forEach(function(e, i, a) { svg.append("rect") .attr("x", 340) .attr("y", 360) .attr("width", 260) .attr("height", height)
.attr("fill-opacity", opacity) .attr("fill", black); }); /* layer five, repeating on right */ /* left */ svg.append("rect") .attr("x", 0) .attr("y", 480) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); /* right */ d3.range(0, 8).forEach(function(e, i, a) { svg.append("rect") .attr("x", 340) .attr("y", 480) .attr("width", 260) .attr("height", height) .attr("fill-opacity", opacity) .attr("fill", black); }); </script> <p>I misread this one, getting it completely wrong at first. The ground is red, and the added layers are black, using SVG <code>fill-opacity</code>. I had thought at first that the layers were all red, but then the part at right never converged to black; re-reading the study, I saw that the added layers are indeed black, on both sides.</p> <p>I haven't been able to recreate the subtlety of the shifts on the left, which become barely perceptible, but this is fairly close.</p> <hr /> <p>This final study looks at near-equality of light intensity, where Albers carefully warns us how difficult it is to choose good examples.
If the colors are chosen correctly, then when the two sawtooths come together, two colors with similar light intensity should start to blend into each other, even though they are otherwise dissimilar.</p> <div id='sawtooth'></div> <script> var width = 300, height = 600; var svg = d3.select("#sawtooth").append("svg") .attr("width", width) .attr("height", height); var c1 = '#D9C1A0'; var c2 = '#E3BABA'; var y = d3.scale.linear() .domain([0, 6]) .range([0, height]); d3.range(1, 6).forEach(function(e, i, a) { svg.append("rect") .attr("class", "left") .attr("x", 70) .attr("y", 0) .attr("width", 80) .attr("height", 100) .attr("transform", "translate(0, " + y(e) + ") skewX(-10)") .attr("fill", c1); }); d3.range(0, 5).forEach(function(e, i, a) { svg.append("rect") .attr("class", "right") .attr("x", 168) .attr("y", 0) .attr("width", 80) .attr("height", 100) .attr("transform", "translate(0, " + y(e) + ") skewX(-10)") .attr("fill", c2); }); var animate = function() { var left = svg.selectAll('.left'); left.transition() .delay(1000) .duration(3000) .attr("x", 79) .ease("quad-in-out") .transition() .duration(3000) .attr("x", 70) .ease("quad-in-out"); var right = svg.selectAll('.right'); right.transition() .delay(1000) .duration(3000) .attr("x", 157) .ease("quad-in-out") .transition() .duration(3000) .attr("x", 168) .ease("quad-in-out") .each("end", animate); }; animate(); </script> <p>This works nicely: for that brief instant when the two sides touch, the sawtooth pattern at their shared boundary seems to disappear and the colors start to merge.</p> <p>This has been a great exercise, both in learning about color relativity and in digging deeper into the basics of D3, and it makes me want to do more. There is a lot of code in the studies I replicated here and in part one that could be much cleaner, but I gave up on writing clean code in favor of getting it done and keeping things simple.
In future posts I'll be working with real datasets more often than not, and cleaner code will always help there. I also aimed to stay true to the exact studies in the book, to have a clear target, rather than doing the exercises for myself and finding my own color matches; I wanted to learn about D3 at the same time, and reproduction is easier than original work. The app version of the book lets you create your own studies, and I've played around with that some, so I don't feel like I'm missing out too much.</p> <p>I hope you'll stay tuned; it feels like this is just getting started.</p>7±2 things to know about data science2014-08-12T00:00:00-04:00dchudtag:data.onebiglibrary.net,2014-08-12:2014/08/12/things-to-know-about-data-science/<p><em>For a talk given at code4lib DC 2014.</em></p> <h2>Background</h2> <p>I am a professional librarian and software developer with 17 years on the job since finishing my master's. I studied at a strong school, worked at some great institutions, and worked with many great people, and between all this I've learned a lot about being a hacker / librarian, enough that the good people at GW Libraries saw fit to hire me to manage a team exactly three years ago.</p> <p>I am a student of data science, halfway through a two-year program at <a href="http://gwanalytics.org/">GW School of Business</a>. So far I have learned enough to understand a fair amount about what I need to be able to do to apply data science, but I am not yet very good at doing it.</p> <p>As a manager in tech in a research library, my job is to ensure that our team and our library do meaningful work well, reliably. I intend to develop my professional skill at working with data to meet this same goal: do meaningful work reliably well.
With that in mind, I have a rough sense of what librarian and archivist colleagues might need to know about data science, but I still have an awful lot to learn.</p> <h2>Defining "data science" and "business analytics"</h2> <p>Like many aspects of data science, this is best communicated visually.</p> <p>Here is a canonical industry view of the required skills, one that many of us like:</p> <p><img alt="Data Science Venn Diagram" src="http://static.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png?format=1500w" /> </p> <p><cite>by Drew Conway, see <a href="http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram">http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram</a></cite></p> <p>Another helpful view is this layer cake of analytics tasks:</p> <p><img alt="Categories of Analytics" src="http://www.theorsociety.com/Media/Images/Users/CaraQuinton01011978/ActualSize/17_05_2012-16_06_56.jpg" /> </p> <p><cite>by Gavin Blackett, see <a href="http://www.theorsociety.com/Media/Images/Users/CaraQuinton01011978/ActualSize/17_05_2012-16_06_56.jpg">http://www.theorsociety.com/Media/Images/Users/CaraQuinton01011978/ActualSize/17_05_2012-16_06_56.jpg</a></cite></p> <p>The questions we ask at these levels, phrased simply:</p> <p><img alt="Types of Business Analytics Capabilities" src="http://www.morganfranklin.com/website/assets/uploads/weblog/_658_370/4TypesofBusinessAnalyticsCapabilities_658x370px.jpg" /></p> <p><cite>by MorganFranklin Consulting, see <a href="http://www.morganfranklin.com/insights/article/4-types-of-business-analytics-capabilities">http://www.morganfranklin.com/insights/article/4-types-of-business-analytics-capabilities</a></cite></p> <p>Lisa Kurt defined a helpful paradigm for this view at code4lib 2012 in Seattle:</p> <p><img alt="DIPP Framework" src="http://www.mu-sigma.com/analytics/images/dipp.png" /></p> <p><cite>by Mu Sigma, see <a 
href="http://www.mu-sigma.com/analytics/ecosystem/dipp.html">http://www.mu-sigma.com/analytics/ecosystem/dipp.html</a></cite></p> <p>Most simply, we can speak of data science as the application of statistics to support decisions, to understand patterns in data, and to reduce or at least clarify uncertainty in a wide range of domains.</p> <h2>Applying data science</h2> <p>From where I sit (halfway through a degree program) the ability to apply data science techniques meaningfully comes down to something more like this:</p> <div id='skill-venn'></div> <script> var width = 800, height = 600; var svg = d3.select("#skill-venn").append("svg") .attr("width", width) .attr("height", height); svg.append("svg:circle") .attr("cx", 300) .attr("cy", 200) .attr("r", 200) .style("fill", "#1b9e77") .style("fill-opacity", ".5"); svg.append("svg:circle") .attr("cx", 500) .attr("cy", 200) .attr("r", 200) .style("fill", "#d95f02") .style("fill-opacity", ".5"); svg.append("svg:circle") .attr("cx", 400) .attr("cy", 400) .attr("r", 200) .style("fill", "#7570b3") .style("fill-opacity", ".5"); svg.append("svg:text") .attr("x", 160) .attr("y", 160) .style("font-size", "36px") .style("fill", "black") .text("Science"); svg.append("svg:text") .attr("x", 550) .attr("y", 160) .style("font-size", "36px") .style("fill", "black") .text("Skill"); svg.append("svg:text") .attr("x", 300) .attr("y", 480) .style("font-size", "36px") .style("fill", "black") .text("Good sense"); </script> <p>Can you identify the Danger Zone?</p> <p>I am weak on Science, but improving. I am confident in the hacking side of my Skill, but not yet in applying statistical models. 
I would like to believe I have good sense, but there is an art to applying it here.</p> <h2>Asking the right questions</h2> <p>Much of this work comes down to knowing which questions to ask and being steadfast in attempting to answer them honestly:</p> <ul> <li>What goal do we wish to achieve?</li> <li>What data do we have to work with?</li> <li>What gaps in data do we need to fill, and how can we fill them?</li> <li>What assumptions are we working under, and are they acceptable?</li> <li>Which of the many available models fits our data and goals well?</li> <li>What bias is inherent in our data, and what bias are we introducing?</li> <li>With what level of certainty can we make a claim?</li> </ul> <p>It is particularly risky to learn one model (e.g. linear regression) and one tool (e.g. R) and take whatever data you have and only ever attempt linear regressions with R without asking and answering these other questions. It's not about R (or SAS or Python or SPSS or Julia or Stata or Excel or ...) being a magic tool, and linear regression might be a poor fit for your data.</p> <h2>Data context switching, aka munging</h2> <blockquote> <p><strong>There is no such thing as a clean data set.</strong></p> </blockquote> <p>This is important.</p> <p>Any data you start with will have been collected or prepared with a particular purpose. That purpose might or might not have anything to do with your goals. You will most likely need to reframe the data you start with to fit your needs. 
This might involve ETL pipeline processing, recontextualizing, extracting, summarizing, merging, splitting, and otherwise reshaping data.</p> <p>Any decent data person will need to become proficient at some or all of these tasks.</p> <p>There are even style guides for data, such as Hadley Wickham's <a href="http://vita.had.co.nz/papers/tidy-data.pdf">Tidy Data</a>, which proposed the following principles of tidiness:</p> <blockquote> <ul> <li>Each variable forms a column.</li> <li>Each observation forms a row.</li> <li>Each type of observational unit forms a table.</li> </ul> </blockquote> <p>Some tools have their own preferences. In SAS, some procedures prefer "wide form" (each variable a column) and others prefer "long form" (variable names stored as values in a column). All the more reason to develop munging skills.</p> <p>Data munging is often the most time-consuming part of statistics work.</p> <p>Sound familiar?</p> <h2>Applying models</h2> <p>There are many, many types of models, and different models suit different tasks, as the diagrams above show. Some, like simple regression, are widely applicable and easy to understand. Many are narrowly applicable and hard to understand, but prove to be far more effective for certain use cases.</p> <p>Working with most models requires similar steps:</p> <ul> <li>Prep sample data to apply the model</li> <li>Use 2-3 visualizations to explore the data</li> <li>Re-munge data to apply the model</li> <li>Run the model, evaluating results</li> <li>Review residuals/errors</li> <li>Check model assumptions, bias</li> <li>Lather, rinse, repeat</li> </ul> <p>After all that, you'll probably want to try all of the above again with another model. Or two or three.</p> <p>Half of understanding a model is understanding what to look for in its results and how to evaluate its assumptions.
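</p> <p>Here is a small illustration of the "review residuals" step. It is a minimal sketch of a simple linear regression fit by ordinary least squares, written in plain JavaScript rather than SAS or R. It returns the residuals and R-squared alongside the coefficients. The data is the first of Anscombe's four datasets (<code>x1</code>, <code>y1</code> in R's built-in <code>anscombe</code> data frame). The function and its return shape are made up for this example, not taken from any library.</p> <div class="highlight"><pre>
// Hypothetical helper: ordinary least squares for y = intercept + slope * x,
// returning the residuals and R-squared along with the coefficients.
function simpleRegression(xs, ys) {
  const n = xs.length;
  const mean = arr => arr.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs), my = mean(ys);
  const sxy = xs.reduce((s, x, i) => s + (x - mx) * (ys[i] - my), 0);
  const sxx = xs.reduce((s, x) => s + (x - mx) ** 2, 0);
  const slope = sxy / sxx;
  const intercept = my - slope * mx;
  const residuals = ys.map((y, i) => y - (intercept + slope * xs[i]));
  const sse = residuals.reduce((s, r) => s + r * r, 0);  // unexplained variation
  const sst = ys.reduce((s, y) => s + (y - my) ** 2, 0); // total variation
  return { intercept, slope, residuals, rSquared: 1 - sse / sst };
}

// Anscombe's first dataset, the same values R ships as anscombe$x1 / anscombe$y1
const x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5];
const y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68];

const fit = simpleRegression(x1, y1);
console.log("intercept", fit.intercept.toFixed(3), "slope", fit.slope.toFixed(3));
console.log("R-squared", fit.rSquared.toFixed(3));
// residuals against x: look for curvature, fanning out, or single extreme points
fit.residuals.forEach((r, i) => console.log(x1[i], r.toFixed(3)));
</pre></div> <p>Anscombe's four datasets are a well-known demonstration of why those last steps matter. They share almost exactly the same intercept, slope and R-squared, and only the residuals and the plots reveal how differently they behave.</p> <p>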
It is easy to think you might have a great model, but if you don't know how to evaluate residuals and check basic model assumptions, your work might not be meaningful.</p> <p>Here is an example of what this might look like, using SAS.</p> <p>First, we import data in an attempt to find a relationship between age and weight. The data looks like this (thanks, <a href="https://csvkit.readthedocs.org/en/0.8.0/">csvkit</a>!):</p> <p><img alt="a few lines of data" src="http://data.onebiglibrary.net/images/20140812-things/csvlook.png" /></p> <p>A simple regression offers these results:</p> <p><img alt="regression results" src="http://data.onebiglibrary.net/images/20140812-things/b3-simple-regression-table.png" /></p> <p>Every stats app has a report format like this; SAS likes HTML tables. Important details in here are the p-value of the F test, the R-Square, and the p-value of the t test on the coefficient of the independent variable, age.</p> <p>The plot is pretty:</p> <p><img alt="regression plot" src="http://data.onebiglibrary.net/images/20140812-things/b3-simple-regression-plot.png" /></p> <p>But we have to review its diagnostics:</p> <p><img alt="regression diagnostics" src="http://data.onebiglibrary.net/images/20140812-things/b3-simple-regression-diag.png" /></p> <p>And check the residuals more closely:</p> <p><img alt="regression residuals" src="http://data.onebiglibrary.net/images/20140812-things/b3-simple-regression-residuals-b.png" /></p> <p>To be a good data scientist, you have to work any model through all of these steps, knowing which tests to run on the results. Every model has its own characteristics.</p> <h2>Applying tools</h2> <p>In some cases, there are straightforward models that can be applied with straightforward tools. For example, this is a time series of the amount of recent airline travel in the US.
In just a few lines of R, you can produce this decomposition of seasonal and trend lines:</p> <p><img alt="time series decomposition" src="http://data.onebiglibrary.net/images/20140812-things/ts-decomp.png" /></p> <p>This is wonderful, but keep in mind that there is always more to the story. The simplest-seeming models and tools often require a lot of subtlety to wield reliably well.</p> <p>For more on this particular example, see <a href="http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html">Using R for Time Series Analysis</a>.</p> <h2>Learning the craft</h2> <p>On this part I'm shaky, but my hunch is that like learning to be a competent programmer, applying meaningful data science reliably well is a craft that takes time to learn. It took me a good five years to learn many basic lessons about programming that no CS prof ever taught me (granted, I don't have a CS degree, but I've taken many of the basic courses in formal settings). After five years, I was a good enough programmer to get a job on a great team, where I really started to learn the craft - all the details you need to attend to if you want to build systems that scale up, and if you want to sustain projects over years, through staff turnover and changing technologies.</p> <p>Like with CS, the science is critical, but it seems like it will take me several years and a lot of repetition to develop the kind of intuitive feel for choosing models, checking assumptions, and explaining results. It's that "good sense" I know I need to strive for, but I don't have it yet. 
Before I start applying any of this on data in my workplace, I'll be sure to find someone more experienced than me to run ideas by, someone who knows the craft well already.</p> <h2>What can we do to help?</h2> <ul> <li>Fill in gaps on campus</li> <li>Support critical thinking in data selection, munging, and application</li> <li>Encourage a well-rounded view, especially with Ethics</li> <li>Apply our experience with workflows and conventions</li> <li>Learn and apply for ourselves</li> </ul> <h2>What next?</h2> <ul> <li><a href="https://www.khanacademy.org/math/probability">Probability and statistics</a> on Khan Academy</li> <li><a href="https://www.coursera.org/specialization/jhudatascience/1">Johns Hopkins Data Science Specialization</a> on Coursera</li> <li>Leek et al., <a href="https://github.com/jtleek/datasharing">How to share data with a statistician</a></li> <li>Provost and Fawcett, <a href="http://www.data-science-for-biz.com/">Data Science for Business</a></li> </ul>simple color relationships w/d32014-08-08T00:00:00-04:00dchudtag:data.onebiglibrary.net,2014-08-08:2014/08/08/simple-color-relationships/<p>I've been reading Josef Albers' <a href="http://yupnet.org/interactionofcolor/">Interaction of Color</a> (Yale Press's iPad edition) and am learning quite a lot from it. I particularly enjoy his details about what to expect in student reactions to particular exercises; you know he must have anticipated and savored these reactions each time, with every class.</p> <p>The basic principles of the first few chapters should be easy to demonstrate using <a href="http://d3js.org/">d3</a>.</p> <p>In "Chapter IV: A color has many faces" we see the first of several color plates and we are quickly drawn into what he has to teach us about the relativity of color, that "color is the most relative medium in art." Let's mimic the first experiment, making one color look different from itself, using different background colors. I'm guessing (poorly!) 
at colors somewhat close to those in the prepared studies in the text itself, using <a href="http://www.colorpicker.com/">this color picker</a>.</p> <div id='basic'></div> <script> var width = 700, height = 800; var svg = d3.select("#basic").append("svg") .attr("width", width) .attr("height", height); var outer1 = svg.append("rect") .attr("x", 50) .attr("y", 50) .attr("width", 600) .attr("height", 300) .attr("fill", "#4C0A73"); var inner1 = svg.append("rect") .attr("x", 100) .attr("y", 100) .attr("width", 500) .attr("height", 200) .attr("fill", "#5A6E5E"); var outer2 = svg.append("rect") .attr("x", 50) .attr("y", 450) .attr("width", 600) .attr("height", 300) .attr("fill", "#9DD1CE"); var inner2 = svg.append("rect") .attr("x", 100) .attr("y", 500) .attr("width", 500) .attr("height", 200) .attr("fill", "#5A6E5E"); </script> <p>This use of d3 demonstrates several features that make it an appealing toolkit, even for a beginner:</p> <ul> <li>it's just javascript</li> <li>it's just <a href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics">SVG</a></li> <li>you can do simple things very simply</li> </ul> <p>I learned SVG many years ago, back in 2004, when it was still a fairly new web standard and had very few usable implementations. The good news is that it is much more widely implemented now, and that it hasn't changed much since then (there's only been one new revision, a "second edition" of the first version), so if you know a few SVG basics, it's easy to see that d3 just uses an API defined in javascript to generate SVG. This is a <em>lot</em> easier than generating SVG by hand yourself; I know from first-hand experience a decade ago.</p> <p>Another way to think of d3 is as a "domain specific language" for dynamic documents on the web.
It's just javascript, but it's a particular set of types and idioms for generating SVG with javascript, and that set lends itself well to visualizing data.</p> <p>In any case, this copied "plate" demonstrates the basic principle well. The inner color is exactly the same in both rectangles, and the interaction between this color and the differing surrounding / background colors makes it look different from one rectangle to the next.</p> <h4>Changing colors</h4> <p>To make this a little more dynamic (it is the web, after all), let's add the ability to change colors by clicking on the inner boxes. The code is the same, but with a "click" event handler attached to each inner box using <code>.on()</code>.</p> <p>Click on the top inner box to make both inner boxes lighter. Click on the bottom inner box to make both inner boxes darker.</p> <div id='changing-colors'></div> <script> var width = 700, height = 800; var innercolor = "#5A6E5E"; var svg = d3.select("#changing-colors").append("svg") .attr("width", width) .attr("height", height); var outer1 = svg.append("rect") .attr("x", 50) .attr("y", 50) .attr("width", 600) .attr("height", 300) .attr("fill", "#4C0A73"); var inner1 = svg.append("rect") .attr("x", 100) .attr("y", 100) .attr("width", 500) .attr("height", 200) .attr("fill", innercolor) .on("click", function(){ brighten(); }); var outer2 = svg.append("rect") .attr("x", 50) .attr("y", 450) .attr("width", 600) .attr("height", 300) .attr("fill", "#9DD1CE"); var inner2 = svg.append("rect") .attr("x", 100) .attr("y", 500) .attr("width", 500) .attr("height", 200) .attr("fill", innercolor) .on("click", function(){ darken(); }); function brighten () { [inner1, inner2].forEach(function(item) { item.style("fill", d3.hsl(item.style("fill")).brighter(.1)); }); } function darken () { [inner1, inner2].forEach(function(item) { item.style("fill", d3.hsl(item.style("fill")).darker(.1)); }); } </script> <p>This further reinforces the effect. At some points, as you click to ratchet the intensity up or down, the two
inner boxes look like wholly different colors, and at other points (especially the extremes) it is clear that they are the same.</p> <p>Of course this isn't quite what Albers had in mind with the lovely physical interactions designed into his text (which the Yale Press folks very creatively transposed to the iPad app). Perhaps, though, we can put the dynamic side of the web, made so easy by d3, to good use in embodying some of the same lessons he taught.</p> <h4>Lighter and/or darker</h4> <p>To focus us on light intensity, Albers presents several exercises in subtle and not-so-subtle gradations of light. SVG's gradient support should help to recreate them.</p> <div id='light-stripes'></div> <script>
var width = 450, height = 700;
var svg = d3.select("#light-stripes").append("svg") .attr("width", width) .attr("height", height);
// basic gradient
var gradient_up = svg.append("svg:defs") .append("svg:linearGradient") .attr("id", "gradient_up") .attr("x1", "0%") .attr("y1", "0%") .attr("x2", "0%") .attr("y2", "100%");
gradient_up.append("svg:stop") .attr("offset", "0%") .attr("stop-color", "#222") .attr("stop-opacity", 1);
gradient_up.append("svg:stop") .attr("offset", "100%") .attr("stop-color", "#ddd") .attr("stop-opacity", 1);
// now the opposite; perhaps a transform instead?
var gradient_down = svg.append("svg:defs") .append("svg:linearGradient") .attr("id", "gradient_down") .attr("x1", "0%") .attr("y1", "100%") .attr("x2", "0%") .attr("y2", "0%");
gradient_down.append("svg:stop") .attr("offset", "0%") .attr("stop-color", "#222") .attr("stop-opacity", 1);
gradient_down.append("svg:stop") .attr("offset", "100%") .attr("stop-color", "#ddd") .attr("stop-opacity", 1);
// the frame
var outer = svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", width) .attr("height", height) .attr("fill", "#888");
// the inner "background"
var inner = svg.append("rect") .attr("x", 10) .attr("y", 10) .attr("width", width - 20) .attr("height", height - 20) .style("fill", "url(#gradient_up)");
var bar_width = (width-20) / 19;
var x_scale = d3.scale.linear() .domain([0, 18]) .range([10, width - 10 - bar_width]);
// the "foreground": every other bar gets the reversed gradient
for(var i=0; i<19; i++) { if(i % 2) { var barup = svg.append("rect") .attr("x", x_scale(i)) .attr("y", 10) .attr("width", bar_width) .attr("height", height - 20); barup.style("fill", "url(#gradient_down)"); } }
</script> <p>This really comes alive where the two opposing gradients pass each other in intensity, one lightening as the other darkens. It all seems to merge! And little shadows seem to appear around the frame at the top and bottom, just past the strips' ends.</p> <p>The gradients above are explicit.
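</p> <p>Under the hood, both an SVG gradient and a d3 linear scale with a color range blend between two endpoint colors. Below is a minimal sketch of that blend. It assumes plain per-channel linear interpolation in sRGB, which is SVG's default <code>color-interpolation</code> for gradients and roughly what d3's RGB interpolator does. It uses the same <code>#222</code> to <code>#ddd</code> endpoints as these studies. <code>interpolateHex</code> and its helpers are hypothetical names, not part of d3's API.</p> <div class="highlight"><pre>
// Hypothetical helpers: per-channel linear interpolation between two sRGB colors.
function parseHex(hex) {
  // accepts "#rgb" or "#rrggbb"; returns [r, g, b] in 0..255
  let h = hex.replace("#", "");
  if (h.length === 3) h = h.split("").map(ch => ch + ch).join("");
  return [0, 2, 4].map(i => parseInt(h.slice(i, i + 2), 16));
}

function toHex(rgb) {
  return "#" + rgb.map(v => v.toString(16).padStart(2, "0")).join("");
}

function interpolateHex(from, to, t) {
  // t = 0 gives "from", t = 1 gives "to"; values in between blend linearly
  const a = parseHex(from), b = parseHex(to);
  return toHex(a.map((c, i) => Math.round(c + (b[i] - c) * t)));
}

// 17 evenly spaced greys, one per panel index 0..16, like a color scale
// with domain [0, 16] and range ['#222', '#ddd']
const greys = Array.from({ length: 17 }, (_, i) => interpolateHex("#222", "#ddd", i / 16));
console.log(greys.join(" "));
</pre></div> <p>Each step changes every channel by the same small amount (187 / 16, a little under 12 levels out of 255). That is why neighboring panels in the next study are so close in value.</p> <p>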
In this next example from Albers, the gradients are illusions.</p> <div id='gradations'></div> <script>
var width = 300, height = 700;
var svg = d3.select("#gradations").append("svg") .attr("width", width) .attr("height", height);
// the frame
var outer = svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", width) .attr("height", height) .attr("fill", "#888");
var bar_width = (width - 60) / 2;
var bar_height = (height - 20) / 17;
var y_scale = d3.scale.linear() .domain([0, 16]) .range([height - 20 - bar_height, 20]);
var color_scale = d3.scale.linear() .domain([0, 16]) .range(['#222', '#ddd']);
// the panels: a left and a right panel per step, both the same solid grey
for(var i=0; i<17; i++) {
  svg.append("rect") .attr("x", 20) .attr("y", y_scale(i)) .attr("width", bar_width) .attr("height", bar_height) .style("fill", color_scale(i));
  svg.append("rect") .attr("x", (width / 2) + 10) .attr("y", y_scale(i)) .attr("width", bar_width) .attr("height", bar_height) .style("fill", color_scale(i));
}
</script> <p>Every one of the individual rectangles above is a solid color, even though each looks like it has its own gradient. The slightly lighter and darker colors directly above and below make each rectangle appear to hold the two ends of a gradient. The effect seems most pronounced in the corners.</p> <h4>Transparence and Optical Mixture</h4> <p>Albers teaches that we can simulate transparency and the apparent ordering/stacking of layers with color mixtures; SVG also lets us set opacity explicitly.
Let's try it both ways, first with explicit color changes:</p> <div id='transparency'></div> <script>
var width = 450, height = 700;
var svg = d3.select("#transparency").append("svg") .attr("width", width) .attr("height", height);
// the frame
var outer = svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", width) .attr("height", height) .attr("fill", "#ADA0BA");
// black "foreground"
var foreground = svg.append("rect") .attr("x", 210) .attr("y", 50) .attr("width", 200) .attr("height", 600) .attr("fill", "#111");
// white strips, left side
var strip1 = svg.append("rect") .attr("x", 60) .attr("y", 110) .attr("width", 150) .attr("height", 120) .attr("fill", "#eee");
var strip2 = svg.append("rect") .attr("x", 60) .attr("y", 290) .attr("width", 150) .attr("height", 120) .attr("fill", "#eee");
var strip3 = svg.append("rect") .attr("x", 60) .attr("y", 470) .attr("width", 150) .attr("height", 120) .attr("fill", "#eee");
// "white" strips, right side: explicit greys standing in for transparency
var strip4 = svg.append("rect") .attr("x", 210) .attr("y", 110) .attr("width", 120) .attr("height", 120) .attr("fill", "#333");
var strip5 = svg.append("rect") .attr("x", 210) .attr("y", 290) .attr("width", 120) .attr("height", 120) .attr("fill", "#888");
var strip6 = svg.append("rect") .attr("x", 210) .attr("y", 470) .attr("width", 120) .attr("height", 120) .attr("fill", "#ccc");
</script> <p>Note that none of the transparent-seeming sections is actually transparent; the effect is simulated purely by shifting the color mix.
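</p> <p>The arithmetic behind that color mix is simple enough to write down. Draw a flat fill with opacity alpha over an opaque background. Each color channel of the result is then alpha times the fill plus (1 - alpha) times the background. This "source-over" blending in ordinary sRGB values is, roughly, how browsers treat SVG <code>fill-opacity</code>. The minimal sketch below applies it to the <code>#eee</code> strips over the <code>#111</code> black, at the three opacities the next version of this study uses. The helper names are hypothetical.</p> <div class="highlight"><pre>
// Hypothetical helpers: "source-over" compositing of a flat fill onto an
// opaque background, channel by channel: alpha * fill + (1 - alpha) * background.
function parseHex(hex) {
  // accepts "#rgb" or "#rrggbb"; returns [r, g, b] in 0..255
  let h = hex.replace("#", "");
  if (h.length === 3) h = h.split("").map(ch => ch + ch).join("");
  return [0, 2, 4].map(i => parseInt(h.slice(i, i + 2), 16));
}

function toHex(rgb) {
  return "#" + rgb.map(v => v.toString(16).padStart(2, "0")).join("");
}

function compositeOver(fill, background, alpha) {
  const f = parseHex(fill), b = parseHex(background);
  return toHex(f.map((c, i) => Math.round(alpha * c + (1 - alpha) * b[i])));
}

// the three right-hand strips: #eee over the #111 "foreground" at each opacity
[0.15, 0.5, 0.85].forEach(alpha => {
  console.log(alpha, compositeOver("#eee", "#111", alpha));
});
</pre></div> <p>By this arithmetic the three blends come out to <code>#323232</code>, <code>#808080</code> and <code>#cdcdcd</code>. These are close to the <code>#333</code>, <code>#888</code> and <code>#ccc</code> chosen by eye here, although the middle one is somewhat darker than <code>#888</code>. None of this makes the strips above actually transparent, of course.</p> <p>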
Even so, it appears that the one at top is "behind" the black, and the one at bottom is "in front of" the black.</p> <p>Let's do it again, but this time with SVG opacity variations.</p> <div id='transparency2'></div> <script>
var width = 450, height = 700;
var svg = d3.select("#transparency2").append("svg") .attr("width", width) .attr("height", height);
// the frame
var outer = svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", width) .attr("height", height) .attr("fill", "#ADA0BA");
// black "foreground"
var foreground = svg.append("rect") .attr("x", 210) .attr("y", 50) .attr("width", 200) .attr("height", 600) .attr("fill", "#111");
// white strips, left side
var strip1 = svg.append("rect") .attr("x", 60) .attr("y", 110) .attr("width", 150) .attr("height", 120) .attr("fill", "#eee");
var strip2 = svg.append("rect") .attr("x", 60) .attr("y", 290) .attr("width", 150) .attr("height", 120) .attr("fill", "#eee");
var strip3 = svg.append("rect") .attr("x", 60) .attr("y", 470) .attr("width", 150) .attr("height", 120) .attr("fill", "#eee");
// "white" strips, right side: the same #eee, with varying fill-opacity
var strip4 = svg.append("rect") .attr("x", 210) .attr("y", 110) .attr("width", 120) .attr("height", 120) .attr("fill-opacity", 0.15) .attr("fill", "#eee");
var strip5 = svg.append("rect") .attr("x", 210) .attr("y", 290) .attr("width", 120) .attr("height", 120) .attr("fill-opacity", 0.5) .attr("fill", "#eee");
var strip6 = svg.append("rect") .attr("x", 210) .attr("y", 470) .attr("width", 120) .attr("height", 120) .attr("fill-opacity", 0.85) .attr("fill", "#eee");
</script> <p>Looks very similar, right?</p> <p>If you look at the source, you'll see that the structure of this second version is exactly the same.
Only two things change. First, the right halves of the strips are set to the same color as the left halves, <code>#eee</code>, whereas in the first version each is set to an explicitly different grey. Second, the <code>fill-opacity</code> of these three varies from <code>0.15</code> at the top (so more of the black behind shows through) to <code>0.85</code> at the bottom (so more of the white stays "on top"). As in the first version, each "strip" is actually rendered as two separate <code>rect</code> elements.</p> <p>So there it is: you can simulate transparency and ordering / stacking just by varying colors, with results very close to those of actual transparency, as the second diagram shows.</p> <p>One more example from the book shows the effects of "optical mixture". There are four colors in this example: white, blue, olive, and mint (for lack of better terms). The individual circles and their "donut holes" are all the same size, but the color mixing makes them appear otherwise.
Also, the changing contrast between the background colors and the foreground creates its own effects, shifting the sense of what's foreground and what's background.</p> <div id='circles'></div> <script>
var width = 380, height = 800;
var svg = d3.select("#circles").append("svg") .attr("width", width) .attr("height", height);
// colors
var white = "#eee", olive = "#8A8049", blue = "#248591", mint = "#9BC9B2";
// the frame
var outer = svg.append("rect") .attr("x", 0) .attr("y", 0) .attr("width", width) .attr("height", height) .attr("fill", olive);
// padding elements
var padding = 40;
// scales for placing the circles on a 10 x 24 grid
var dia = 30;
var x = d3.scale.linear() .domain([0, 9]) .range([padding + dia/2, width - (padding + dia/2)]);
var y = d3.scale.linear() .domain([0, 23]) .range([padding + dia/2, height - (padding + dia/2)]);
// ranges for counting the circles
var xrange = d3.range(0, 10);
var yrange = d3.range(0, 8);
// draw outer circles, want to repeat per color
var outer_circles = function(range_factor, color) { xrange.forEach(function (xe, xi, xa) { yrange.forEach(function (ye, yi, ya) { svg.append("circle") .attr("cx", x(xe)) .attr("cy", y(ye + range_factor)) .attr("r", dia/2) .attr("fill", color); }); }); };
outer_circles(0, white);
outer_circles(8, mint);
outer_circles(16, blue);
// draw inner circles, arbitrary sets of y-lines and color
var inner_circles = function(ystart, ystop, color) { xrange.forEach(function (xe, xi, xa) { d3.range(ystart, ystop).forEach(function (ye, yi, ya) { svg.append("circle") .attr("cx", x(xe)) .attr("cy", y(ye)) .attr("r", dia/5) .attr("fill", color); }); }); };
inner_circles(2, 4, mint);
inner_circles(4, 6, olive);
inner_circles(6, 10, blue);
inner_circles(10, 12, olive);
inner_circles(14, 18, white);
inner_circles(18, 20, mint);
inner_circles(20, 22, olive);
</script> <p>Wow, that turned out better than I thought, but it took a while.
This was a good exercise in framing scaled elements with padding in d3. I had tried to derive the inner frame and the circle diameters from the padding, width, and height, but nothing lined up until I realized the layout is just an exact 10 x 24 grid.</p> <p>Once I reset the scaling to use that grid (it worked right away), I rewrote the outer/inner circle rendering bits using one function for each. It could be taken a step further with a single function for both that also takes the diameter as a parameter, and the rows and colors could be one simple data structure to loop over, but it's good enough as is.</p> <p>Finally, the colors were a bear to get right. I eyeballed a match to the colors in the iPad app, but the contrast just didn't pop the way it does in the Yale-produced ebook. After playing with the colors a lot I remembered: I use the <a href="https://justgetflux.com/">flux app</a> on my desktop, which warms the screen's colors at night, and I was working on this at night, so everything was completely wrong! After turning flux off I was able to get a lot closer, though the ebook version is still much better.</p> <h4>Summary</h4> <p>This has been a great exercise in working with the lessons in color Albers lays out so elegantly in his book. If this interests you at all, I recommend you get a copy for yourself (the iPad ebook is worth every penny). A colleague at our library told me we have an early print edition with all the fold-outs and flaps, so I will have to take a look at that as well.</p> <p>It's also been a good lesson in using d3 to render simple shapes and colors, and in remembering to look in the d3 docs for a cleanly defined function before wiring something up more awkwardly myself in javascript.
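</p> <p>For example, here is roughly what a hand-written equivalent of <code>d3.range([start, ]stop[, step])</code> might look like. It is a simplified sketch with a hypothetical <code>range</code> function, not d3's source. d3's own version takes more care with fractional steps and floating-point error.</p> <div class="highlight"><pre>
// Hypothetical, simplified stand-in for d3.range([start, ]stop[, step]):
// returns start, start + step, ... up to but not including stop.
function range(start, stop, step) {
  if (stop === undefined) { stop = start; start = 0; }  // range(n) covers 0..n-1
  if (step === undefined) step = 1;
  if (step === 0) throw new Error("step must be non-zero");
  const values = [];
  // count up for a positive step, down for a negative one, stopping before stop
  for (let v = start; step > 0 ? stop > v : v > stop; v += step) values.push(v);
  return values;
}

console.log(range(5));          // [0, 1, 2, 3, 4]
console.log(range(1, 6));       // [1, 2, 3, 4, 5]
console.log(range(10, 0, -3));  // [10, 7, 4, 1]
</pre></div> <p>The circles study above uses it exactly this way: <code>d3.range(0, 10)</code> for the ten columns, <code>d3.range(0, 8)</code> for the eight rows in each color band, and <code>d3.range(ystart, ystop)</code> for each run of inner circles.</p> <p>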
Even something as simple to do by hand as what <code>d3.range()</code> <a href="https://github.com/mbostock/d3/wiki/Arrays#d3_range">offers</a> has a familiar feel and a semantic specificity that make d3 all the more natural to work with.</p> <p>I am about halfway through the text and could use a lot more d3 practice, so before I move on to rendering data more explicitly I might take a stab at a "part two" post along these same lines.</p> <p>If any of the specifics interest you, I'd suggest looking at the source directly in your browser, or using the github links to view or edit the full markdown+javascript file I'm writing here and feeding into <a href="http://blog.getpelican.com/">pelican</a>. Pull requests are welcome, especially if you spot mistakes or just plain bad ideas; I know I still have a lot to learn.</p> <p>(See also part two, <a href="http://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/">Albers color studies in D3.js, part 2</a>)</p>