<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>data.onebiglibrary.net</title><link href="https://data.onebiglibrary.net/" rel="alternate"/><link href="/feeds/all.atom.xml" rel="self"/><id>https://data.onebiglibrary.net/</id><updated>2026-03-03T00:00:00-05:00</updated><entry><title>Formally verifying mrrc</title><link href="https://data.onebiglibrary.net/2026/03/03/formally-verifying-mrrc/" rel="alternate"/><published>2026-03-03T00:00:00-05:00</published><updated>2026-03-03T00:00:00-05:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2026-03-03:/2026/03/03/formally-verifying-mrrc/</id><summary type="html">&lt;p&gt;Using agents to improve engineering rigor for experimental MARC library&lt;/p&gt;</summary><content type="html">&lt;p&gt;One of my experiments these past few months has been developing a Rust-based
MARC21 record library with
a &lt;a href="https://pymarc.readthedocs.io/en/latest/"&gt;Pymarc&lt;/a&gt;-compatible Python wrapper.
It started as something of a lark when I was on a call with a few people in
mid-December, showing them how I was using various agentic tools. One of those
people was &lt;a href="https://inkdroid.org/"&gt;Ed&lt;/a&gt;, who knows a few things about building
MARC tools in new languages (see
&lt;a href="https://github.com/perl4lib/marc-perl"&gt;MARC/Perl&lt;/a&gt; and, well, Pymarc). So I said
"what if we try writing a MARC library in Rust?" I had noticed a few efforts out
there, but the joke was that I don't even know Rust other than to recognize it
by sight. The idea was to show how we could get something basic working
surprisingly quickly.&lt;/p&gt;
&lt;p&gt;And, well, now there's &lt;a href="https://github.com/dchud/mrrc"&gt;mrrc&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It might be useable already as a general-purpose MARC library, though I'm not
sure quite yet. At first I just wanted to see if I could get it to do something
useful quickly. I could, it did. Going from nothing to parsing thousands of binary
MARC records happened faster than we could believe at first. Maybe it was 20-30
minutes? In a language I didn't know.&lt;/p&gt;
&lt;p&gt;We spent a little more time on that call discussing tooling and
prompts as I was getting to know
&lt;a href="https://github.com/steveyegge/beads"&gt;beads&lt;/a&gt; at the time (and am
still happy with it, though I've cut over to the &lt;a href="https://github.com/Dicklesworthstone/beads_rust"&gt;Rust
port&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Then afterward I noticed some frontier model news and thought "well let's see if
it can wrap this Rust API with Python and have a Pymarc-like API" and of course
it could. Then I thought "well can we make it go faster" and the answer was yes.
You get the point.&lt;/p&gt;
&lt;p&gt;Suffice it to say that at this point, mrrc has a lot of useful features built
around the core MARC standards and obvious related standards like MARCXML, MODS,
and BIBFRAME. Have a look at &lt;a href="https://dchud.github.io/mrrc/"&gt;the mrrc
documentation&lt;/a&gt; to see for yourself. What mrrc
lacks is users, though. I mentioned it on the &lt;a href="https://code4lib.org/slack/"&gt;code4lib
slack&lt;/a&gt; #python channel a few weeks ago and some
people were supportive, but I haven't heard back. It's okay, I know there are
other tools that already do all this. And this is just an experiment.&lt;/p&gt;
&lt;p&gt;And as it turns out, some of them might have taken a look at it and dismissed it
quickly, because it wasn't quite what it claimed to be just yet. In particular,
it didn't support &lt;a href="https://www.loc.gov/standards/marcxml/"&gt;MARCXML&lt;/a&gt;, it actually
supported "MARC XML", which was the XML encoding it apparently made up itself.
It was &lt;em&gt;like&lt;/em&gt; MARCXML, but it wasn't actually MARCXML. I realized this by actually trying to use mrrc
through Python myself for something practical: it immediately errored out, reporting that
it couldn't handle the MARCXML namespace, and that was that. Have a look at &lt;a href="https://github.com/dchud/mrrc/issues/15"&gt;the
ticket filed on this&lt;/a&gt; for some details
on where it went wrong, and keep reading for a laugh about the absurd workaround it
gamely figured out (points for getting it to work, but that was not what I had
in mind).&lt;/p&gt;
&lt;p&gt;There have been a few other bumps in the road. The bot (which I won't name here)
made up some benchmark numbers. A few times. At one point I was playing with
trying different binary encodings using more common/contemporary formats like
&lt;a href="https://flatbuffers.dev/"&gt;flatbuffers&lt;/a&gt; and &lt;a href="https://protobuf.dev/"&gt;protobuf&lt;/a&gt; to
see if there might be some clear payoff over the stalwart &lt;a href="https://www.iso.org/standard/41319.html"&gt;ISO
2709&lt;/a&gt; binary format (upshot:
marginally, yes; performance, not for basic use cases; practically, there's no
demand, and initial respondents noticed how much they bloated the dependency
chain, so I scrapped it all, although the notes and discarded code are still in
another repo). Anyway, in that process I templated out a per-format evaluation
approach and after building test implementations of 2-3 of the formats on my
list I noticed that the Nth one got built and tested suspiciously quickly. What
happened? That bot just looked at the other evaluations based on the same
template and made up its own evaluation using the same template without writing
any code. So there's that.&lt;/p&gt;
&lt;p&gt;If you're paying close attention, though, you'll know that something really
spiked in frontier model quality right around the end of 2025. In early 2026
I upgraded, and suddenly I had thoughts of making this library truly useful. And
at this point, I think it's really quite close to that.&lt;/p&gt;
&lt;p&gt;This is all still an experiment in learning, but I've shifted goals
from wanting to understand "how do I build things this way at all?"
to "how can I build something good?" And after listening to the
&lt;a href="https://oxide-and-friends.transistor.fm/episodes/engineering-rigor-in-the-llm-age"&gt;Oxide and Friends show about
rigor&lt;/a&gt;
(thanks Ed for the tip!) now I want to know "how can I build something
&lt;em&gt;reliably&lt;/em&gt; good?"&lt;/p&gt;
&lt;p&gt;So the latest round of work - apart from squashing the embarrassing
bugs as I find them - is to bring in formal methods for verification,
to help specify how the library should behave and confirm that it
does, rather than just to test whether it does certain specific
things the right way with specific examples. I'm building out a &lt;a href="https://github.com/dchud/mrrc-testbed"&gt;testbed for the
library&lt;/a&gt; to make it easy to throw piles
of real-world data at it, and then using that environment to start layering in
verification tools for both Rust and Python. You can see &lt;a href="https://github.com/dchud/mrrc-testbed/blob/main/formal-methods-verification-strategy.md"&gt;the overall
strategy&lt;/a&gt;
and &lt;a href="https://github.com/dchud/mrrc-testbed/blob/main/formal-methods-implementation-plan.md"&gt;the implementation
plan&lt;/a&gt;
I'm working from.&lt;/p&gt;
&lt;p&gt;To be 100% clear, those documents, like the whole &lt;a href="https://github.com/dchud/mrrc-testbed/blob/main/testbed-proposal.md"&gt;testbed
proposal&lt;/a&gt;
that started mrrc-testbed, are largely bot-generated, but I had
particular goals in mind while prompting those proposals, and have
learned a bit more about refining things until they have the right
shape. For example, I over-engineered the heck out of the testbed
proposal, and one morning I recognized that, and asked "have we
over-engineered this?" Thoughtfully, the bot replied "Yes, and
here's how I would simplify it." So we simplified. Still, I don't
feel quite right saying I wrote them, because I didn't.  Honestly,
I don't have any practical experience and only a little education
in formal methods, so I couldn't have written them myself. But
this is something I want to learn, and mrrc seems like a perfect
tool to learn it on.&lt;/p&gt;
&lt;p&gt;And then there's the wonderfully helpful &lt;a href="https://github.com/DrCatHicks/learning-opportunities"&gt;learning-opportunities Claude
skill&lt;/a&gt; by &lt;a href="https://www.drcathicks.com/"&gt;Dr. Cat
Hicks&lt;/a&gt;, which I learned about via &lt;a href="https://www.changetechnically.fyi/2396236/episodes/18692591-you-can-learn-with-ai"&gt;this episode of
Change,
Technically&lt;/a&gt;.
Not only does it work very well, it's exactly the kind of tool that I like to
learn with, developed by somebody who obviously knows a lot about how we learn
with tools. I'm working with it as I go, with a whole learning plan on the side
(it's for me, not the repo, so it's not in there) and getting it to ask me
questions and assess my understanding as I progress. I recommend this skill
highly if you're using these bots, or even (maybe especially) if you're new to
them and want to try some new things out.&lt;/p&gt;
&lt;p&gt;Progress so far includes finishing setting up the testbed, refining
the test workflow, and working through "&lt;a href="https://github.com/dchud/mrrc-testbed/blob/main/formal-methods-implementation-plan.md#wave-a-low-hanging-fruit-phases-1--beginning-of-3"&gt;Wave
A&lt;/a&gt;"
from the implementation plan, incorporating
&lt;a href="https://docs.rs/proptest/latest/proptest/"&gt;proptest&lt;/a&gt; and
&lt;a href="https://github.com/rust-lang/miri"&gt;Miri&lt;/a&gt; for Rust and
&lt;a href="https://hypothesis.readthedocs.io/en/latest/"&gt;Hypothesis&lt;/a&gt; for
Python. The testbed immediately caught something, too! And now there
are even more nightly jobs to run and the start of better assurance
that things like round-tripping data from one serialization to
another will be reliable.&lt;/p&gt;
&lt;p&gt;On to Wave B...&lt;/p&gt;</content><category term="20260303-formally-verifying-mrrc"/><category term="marc21"/><category term="mrrc"/><category term="format methods"/><category term="rust"/><category term="python"/></entry><entry><title>Choosing a Favicon</title><link href="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/" rel="alternate"/><published>2026-02-25T00:00:00-05:00</published><updated>2026-02-25T00:00:00-05:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2026-02-25:/2026/02/25/choosing-a-favicon/</id><summary type="html">&lt;p&gt;This &lt;a href="https://getpelican.com/"&gt;Pelican&lt;/a&gt; blog has sat still since about 2014.
I used to publish it to an S3 bucket with static hosting turned on. I've been
thinking about establishing a new place to write lately, then remembered this
was already here, and the domain kinda covers my professional interests still,
so …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This &lt;a href="https://getpelican.com/"&gt;Pelican&lt;/a&gt; blog has sat still since about 2014.
I used to publish it to an S3 bucket with static hosting turned on. I've been
thinking about establishing a new place to write lately, then remembered this
was already here, and the domain kinda covers my professional interests still,
so I went with it. But retconning it into contemporary versions of things wasn't
a straightforward task. Not an overwhelmingly hard task, or very large, but it
was a bunch of stuff that had to change a little all at once.&lt;/p&gt;
&lt;p&gt;Then there's Claude.&lt;/p&gt;
&lt;p&gt;I'm not going to go on about Claude itself a lot, there are plenty of people
doing that already. But suffice it to say that while I'm not employed
I have a lot of time on my hands and a lot of experience managing
software teams. Tools this good - at least in terms of performance
in building software systems, setting aside the much bigger questions
at play - are really still brand-spanking new. They weren't this
good more than three months ago. They just let me build pretty much
whatever I feel like building, as if I had my own team that works
really quickly and lets me operate in the plan/scope/refine/review
mode I've been in professionally for a while now.&lt;/p&gt;
&lt;p&gt;So while I have this time available, this ridiculous toolkit, and
a bunch of ideas of stuff to build, and before it all crumbles somehow
or the prices go sky-high (both of which will probably happen sooner
than we expect), I'm building stuff.&lt;/p&gt;
&lt;p&gt;Fiddly tiny blog migration project? I know a thing that will help me get that
done a lot faster than I could myself. It created &lt;a href="https://github.com/dchud/data.onebiglibrary.net/pull/1"&gt;this pull
request&lt;/a&gt; to modernize
the environment, and &lt;a href="https://github.com/dchud/data.onebiglibrary.net/pull/2"&gt;this pull
request&lt;/a&gt; to retcon old
posts into the new structure. It coached me on getting migrated to a new hosting
platform and updating DNS. Darned thing just worked, dang it all.&lt;/p&gt;
&lt;p&gt;I was testing the retconned content and noticed a warning about lacking
a favicon. What to do? I asked for proposals. Thoughtfully it reached into my
older content and came up with good ideas, so I asked it to draft the rest of
this post.&lt;/p&gt;
&lt;p&gt;I went with "four patch grid" because it looked best as a favicon,
and there was something familiar about it, beyond just the color
choices (see Albers color study posts from 2014). Took a few minutes
to figure it out, but it's basically the same design as the old
&lt;a href="https://en.wikipedia.org/wiki/Delicious_(website)"&gt;del.icio.us icon&lt;/a&gt;,
just with the Albers study colors taken from those prior posts.
Seemed like a good fit. I don't know if joshu is still out there, but I hope
he's been good, and I hope he doesn't mind.&lt;/p&gt;
&lt;p&gt;Everything below this sentence was bot-generated.&lt;/p&gt;
&lt;h2&gt;The Brief&lt;/h2&gt;
&lt;p&gt;The site needed a favicon. Given that so much of the early content here
is built around Josef Albers' color studies rendered in D3, it made
sense to draw from that same visual language: simple geometric shapes,
deliberate color choices, and the interplay between them.&lt;/p&gt;
&lt;p&gt;The constraint is severe. A favicon lives at 16x16 or 32x32 pixels.
Whatever we choose has to read clearly at those sizes, in a browser tab
alongside a dozen others, often against both light and dark chrome. That
rules out anything with fine detail or subtle gradation. What survives
is flat color and strong shape.&lt;/p&gt;
&lt;p&gt;With that in mind, here were four candidates, each inspired by a
different Albers study from the earlier posts.&lt;/p&gt;
&lt;h2&gt;Option 1: Nested Squares&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-nested.svg" width="128" height="128" alt="nested squares at 128px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-nested.svg" width="64" height="64" alt="nested squares at 64px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-nested.svg" width="32" height="32" alt="nested squares at 32px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-nested.svg" width="16" height="16" alt="nested squares at 16px" /&gt;&lt;/p&gt;
&lt;p&gt;Three concentric squares: deep purple, red-orange, yellow. This echoes
the very first Albers exercise on the site —
&lt;a href="https://data.onebiglibrary.net/2014/08/08/simple-color-relationships/"&gt;placing one color inside another&lt;/a&gt;
to see how the surround changes your perception of the inner
color. The nesting reads well at small sizes because the contrast
between the three layers is high. At 16px the innermost square is
only a few pixels across, but the warm-to-cool progression still
registers.&lt;/p&gt;
&lt;h2&gt;Option 2: Offset Overlap&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-overlap.svg" width="128" height="128" alt="offset overlap at 128px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-overlap.svg" width="64" height="64" alt="offset overlap at 64px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-overlap.svg" width="32" height="32" alt="offset overlap at 32px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-overlap.svg" width="16" height="16" alt="offset overlap at 16px" /&gt;&lt;/p&gt;
&lt;p&gt;Two overlapping rectangles — teal and yellow — with a mint intersection
zone. This one references the
&lt;a href="https://data.onebiglibrary.net/2014/08/08/simple-color-relationships/"&gt;transparency and optical mixture studies&lt;/a&gt;,
where Albers demonstrated that you can simulate the appearance of
overlapping translucent layers using nothing but flat, opaque color.
The intersection is the key: it's a third color that your eye reads
as the blend of the other two, even though it's just a solid fill.&lt;/p&gt;
&lt;p&gt;At 128px this tells the richest story of any of the options. At 16px
the three-zone structure starts to muddy — the intersection becomes
hard to distinguish from its neighbors, and the asymmetric layout
loses its spatial logic. The idea is stronger than its execution at
favicon scale.&lt;/p&gt;
&lt;h2&gt;Option 3: Four-Patch Grid&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-grid.svg" width="128" height="128" alt="four-patch grid at 128px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-grid.svg" width="64" height="64" alt="four-patch grid at 64px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-grid.svg" width="32" height="32" alt="four-patch grid at 32px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-grid.svg" width="16" height="16" alt="four-patch grid at 16px" /&gt;&lt;/p&gt;
&lt;p&gt;A 2x2 grid of colored squares: pink, red, green, grey. These are
the four colors from the juxtaposition study in the
&lt;a href="https://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/"&gt;second Albers post&lt;/a&gt;,
where every permutation of layering was laid out in a grid to show
how quantity and ordering affect the feel of a color combination.&lt;/p&gt;
&lt;p&gt;This one holds up the best at small sizes. Four equal quadrants is
about the simplest spatial structure you can have beyond a single
block, and the four colors are different enough in both hue and value
that they stay distinct even at 16px. The 2x2 grid also has a nice
visual rhythm — it reads as a deliberate pattern, not a smudge.&lt;/p&gt;
&lt;h2&gt;Option 4: Stacked Bands&lt;/h2&gt;
&lt;p&gt;&lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-bands.svg" width="128" height="128" alt="stacked bands at 128px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-bands.svg" width="64" height="64" alt="stacked bands at 64px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-bands.svg" width="32" height="32" alt="stacked bands at 32px" /&gt; &lt;img src="https://data.onebiglibrary.net/2026/02/25/choosing-a-favicon/favicon-bands.svg" width="16" height="16" alt="stacked bands at 16px" /&gt;&lt;/p&gt;
&lt;p&gt;Four horizontal bands progressing from dark olive to bright yellow.
This references the
&lt;a href="https://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/"&gt;middle mixture and light intensity studies&lt;/a&gt;,
where Albers explored how adjacent colors of similar value seem to merge
while colors with strong value contrast maintain their boundaries.&lt;/p&gt;
&lt;p&gt;The bands are clean and simple, but at 16px the four-step gradient
compresses into something that reads more like a generic color swatch
than a distinctive mark. The progression is too smooth — there's no
focal point for the eye to grab.&lt;/p&gt;
&lt;h2&gt;The Choice&lt;/h2&gt;
&lt;p&gt;The four-patch grid won. It has the clearest identity at the sizes
that matter, the colors are directly drawn from the Albers work on
the site, and the 2x2 structure is simple enough to be iconic without
being generic. It's the one you're seeing in your browser tab right
now.&lt;/p&gt;</content><category term="20260225-choosing-a-favicon"/><category term="design"/><category term="meta"/></entry><entry><title>Starting a News Service</title><link href="https://data.onebiglibrary.net/2026/02/25/starting-a-news-service/" rel="alternate"/><published>2026-02-25T00:00:00-05:00</published><updated>2026-02-25T00:00:00-05:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2026-02-25:/2026/02/25/starting-a-news-service/</id><summary type="html">&lt;p&gt;I'm trying to replace my local newspaper with feeds and code&lt;/p&gt;</summary><content type="html">&lt;p&gt;You might have heard that our local paper isn't what it used to be. It's hard to
describe how jarring it has been to watch its decline. I don't have any close
friends there but I've been lucky enough to meet or know several staffers there
over the years and I can't imagine how it's been for them.&lt;/p&gt;
&lt;p&gt;Sadly, it's not something worth paying for any longer, so we've unsubscribed.&lt;/p&gt;
&lt;p&gt;That leaves me without a local paper. And reaching for the ol' feed reader. It
works, as expected, but it's not quiiiiite what I want.&lt;/p&gt;
&lt;p&gt;I didn't know that Terry Godier was working on
&lt;a href="https://www.terrygodier.com/current"&gt;Current&lt;/a&gt; when I read Terry's
post on &lt;a href="https://www.terrygodier.com/phantom-obligation"&gt;Phantom
Obligation&lt;/a&gt; a few
weeks ago. Aside from a very slick degrading-to-ASCII rendering
of diagrams, which really stood out, I agreed fully with their
problem statement that shaping a news reader like a set of growing
inboxes isn't the right fit. There has been lots of praise for
Current since its release, and I bet it's great, but I haven't tried
it yet, because I'm not looking for a river.&lt;/p&gt;
&lt;p&gt;I'm looking for a newspaper.&lt;/p&gt;
&lt;p&gt;I love print newspapers. I loved them before I got to work on projects like
&lt;a href="https://www.loc.gov/collections/chronicling-america/about-this-collection/"&gt;Chronicling America&lt;/a&gt;
and loved them even more after. We used to take the local paper daily, FT on
Saturdays, and NYT on Sundays. Then a while back (you can guess the timeline, no
need to replay it) we dropped the local print edition, switching to only online.
Now that's gone.&lt;/p&gt;
&lt;p&gt;But I need local news. I live here, and I want a city desk and
weather and news about shows and exhibits and news of &lt;a href="https://wtop.com/dc/2026/02/snow-create-glacier-built-at-rfk-parking-lot/"&gt;crazy snow
mounds&lt;/a&gt;
and book reviews.  And sports! I love sports, and really enjoy
following beat and opinion writers and looking forward to their
takes on big games or trades and occasional features that help me
get to know the players I root for and against better.&lt;/p&gt;
&lt;p&gt;You can get all this with a feed reader - there's so much good stuff out there!
- but it doesn't feel like a newspaper.&lt;/p&gt;
&lt;p&gt;I want to open something with real local awareness and see big stories
at different geopolitical scales mixed in with a good local section and an arts
section and on and on. And I have another set of "requirements", for lack of
a better word.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Track far more feeds than I could possibly ever read serially to find good
  stuff from a wide range of sources&lt;/li&gt;
&lt;li&gt;Track a diversity of opinions - or at least a little wider than my
  typical well-circumscribed worldview - to be challenged to rethink things&lt;/li&gt;
&lt;li&gt;Track overlapping sources and look at different angles on the same story&lt;/li&gt;
&lt;li&gt;Keep a living discovery component that pulls in examples from new feeds
  I didn't already track so it's easy to expand the range of voices coming in&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I want to track way more stuff than most people can but with a
coherence and clarity that feels like the wise editorial staff of
a stalwart paper has made thoughtful choices that fit my brain,
sectioning up major news stories and a variety of categories of
stories that I'm probably going to be interested in, even if I never
read them all.&lt;/p&gt;
&lt;p&gt;I also want to be able to find something new and realize I want to know more
and then trace backward to see prior posts about the same thing from a variety
of sources.&lt;/p&gt;
&lt;h2&gt;Okay, so now what?&lt;/h2&gt;
&lt;p&gt;My professional training and experience are in information science
and data science, so naturally, my reaction to all this is to build
a news service. Sort of.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://netnewswire.com/"&gt;NetNewsWire&lt;/a&gt; has been around forever and works great,
so I'm using it as my stand-in reader and collector. I keep adding stuff to it.
There's so much good and interesting writing out there! For years I was locked
into the well-known loop of checking the same 5-7 sites every day, all the time,
and it gnawed at me for ages but now I can actually do something about it. And,
in a way, it feels like I have to, considering the number one item on that site list
basically doesn't exist any more.&lt;/p&gt;
&lt;p&gt;So I have an experimental repo with a messy pile of specs for
experiments and bits of infrastructure plans for folding, spindling,
and mutilating the data coming in through NNW to see if I can start
to get to a "newspaper" that I want to read every day. It's a mess. Really. It's
a private repo on GitHub because I wouldn't expect anyone to understand it.
I don't think I understand it. But there are a lot of interesting pieces in
there already.&lt;/p&gt;
&lt;p&gt;For one thing, the new tools make it ridiculously easy to try stuff, to explore
a design and feature and data space to see if you can make an idea work. And
when you try stuff, you then need to evaluate it, and it's also ridiculously easy to
build evaluation tools that feed back into the process. Here's a taste of a few
of those. None of this is fully baked.&lt;/p&gt;
&lt;h3&gt;Named entity recognition&lt;/h3&gt;
&lt;p&gt;One of the ongoing experiments involves NER. Can we do this well
enough on arbitrary feed content to build features around it? I'm
not sure, but I'm trying. The following image is a little squeezed
and throws off the alignment of the assessment "buttons", but imagine
they look a lot neater, enough that it's an easy tool to assess
lots of extracted names of people, places, and organizations quickly.
Can some popular libraries do this well? Can a local LLM do it
better and also fast enough on my consumer laptop? What does the
data model need to look like, and can it be good enough often
enough to hang features on? This entity canonicalization assessment
tool helps me start to answer some of those questions.&lt;/p&gt;
&lt;p&gt;&lt;img alt="canonicalization-review" src="https://data.onebiglibrary.net/2026/02/25/starting-a-news-service/canonicalization-review.png"&gt;&lt;/p&gt;
&lt;h3&gt;Article sameness&lt;/h3&gt;
&lt;p&gt;Another key area I need some traction on to make my weird vision come together
is a good handle on finding articles that are about the same thing. Either
duplicates, or linklogs posting the same external post, or multiple news stories
about the same event, or many stories related to but not strictly about the same
event. This isn't a new problem, but I'm trying to come up with a banded
sameness factor using embeddings that is fast to compute and easy to build
around. The image below shows evaluating the sameness assessments the
experimental code has generated.&lt;/p&gt;
&lt;p&gt;&lt;img alt="sameness-evaluate" src="https://data.onebiglibrary.net/2026/02/25/starting-a-news-service/sameness-evaluate.png"&gt;&lt;/p&gt;
&lt;h3&gt;Story clusters&lt;/h3&gt;
&lt;p&gt;This last screenshot shows an attempt to assess designation of story clusters
with tagged facets or threads to make it easy to find different ways into and
through a story. It feels promising. It's another case where I have to find
a sweet spot between local LLM quality and fast/cheap computability if this
is going to work at the content scale I'm aiming for. Right now, that's about
300 feeds grouped into 10-12 categories (roughly akin to newspaper sections
we're all familiar with), and I would hope that if this goes well the feed
count will double or triple before long. But that's already something like 1k
new posts coming in every day, and you just can't ask a local LLM to churn
through all that and update clusters on a MacBook Air, however neat the
M-series processors are (they're pretty neat, but not magical).&lt;/p&gt;
&lt;p&gt;&lt;img alt="cluster-review" src="https://data.onebiglibrary.net/2026/02/25/starting-a-news-service/cluster-review.png"&gt;&lt;/p&gt;
&lt;h3&gt;Current thinking&lt;/h3&gt;
&lt;p&gt;I miss having a good local paper. Our digital subscription will run out by this
weekend. I am encouraged that I'm not far off from having something workable,
encouraged that these experiments have started to show a hint that
I can get to where I want to be, or at least closer. I've been putting off
specifying a full UX plan while I figure out which of these experiments are
going to fit together and how to administer them over time. And as I shift
a little further toward confidence that I can cobble together enough tricks to
produce something useful and meaningfully worth using, I've realized a few
things I wasn't clear on at the start.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I'm building something for me. I'll share the code once it adds up to something
  anybody other than me would want to look at, but it needs to fit my brain
  first and foremost.&lt;/li&gt;
&lt;li&gt;I thought I'd cobble together some extractors and relators and then have a UI.
  But the more of these evaluations I do, the more I realize that I'll probably
  keep refining these pieces as I go. Maybe not as intensively, but there will be
  new methods and models and I'll want to try new things. Experimentation with
  evaluation is a core function.&lt;/li&gt;
&lt;li&gt;There's a human-in-the-loop angle to some of the situations I've run into, where a
  local LLM can blow away a "traditional" model in performance but fails at the
  modest but not trivial scale I'm already at. But if you have a person look at
  one set of output and make judgements and choices and then put a briefer form
  of that output and choices into a local LLM, maybe you can get where you want
  to be.&lt;/li&gt;
&lt;li&gt;Maybe that personal mix of assessment and evaluation is not just something
  I want to engage in briefly during early experimentation. Maybe it's a core
  function of the system. I &lt;em&gt;like&lt;/em&gt; doing this stuff. It helps me understand the
  material and how it arises in the world. I've been doing things like this for
  thirty years. Why stop?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://news.kagi.com/"&gt;Kagi News&lt;/a&gt; is very cool. It solves a clear set of my
problems well, but creates others. Its summaries are succinct and efficient but
I lose track of the voices of writers and the details and color that make for great
writing. &lt;a href="https://theweek.com/"&gt;The Week&lt;/a&gt; does an admirable job of counterposing
opinions from different postures, but that one-side-vs-the-other structure
doesn't work for me.&lt;/p&gt;
&lt;p&gt;I need something that's messy, that's local, that's weird, that doesn't fit into
little boxes, that offers up something like a decent comics page, and all that
takes some time. I'll let you know when I get a little further with it.&lt;/p&gt;</content><category term="20260225-starting-a-news-service"/><category term="news service"/><category term="feeds"/><category term="newspapers"/></entry><entry><title>And We're Back</title><link href="https://data.onebiglibrary.net/2026/02/24/and-we-re-back/" rel="alternate"/><published>2026-02-24T00:00:00-05:00</published><updated>2026-02-24T00:00:00-05:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2026-02-24:/2026/02/24/and-we-re-back/</id><summary type="html">&lt;p&gt;Back after a decade-plus hiatus&lt;/p&gt;</summary><content type="html">&lt;p&gt;I've reanimated the site and updated all the bits, and will be publishing here
again. Hooray.&lt;/p&gt;
&lt;p&gt;The short-form update: I spent 10 years as a data scientist, doing work
I enjoyed a great deal that had meaningful impact and reach, but then federal
government funding priorities shifted. Now there's Claude, so I'm working on
stuff. If you're reading this, you will see more soon!&lt;/p&gt;
&lt;p&gt;Thanks for sticking around. Seriously. It means a lot.&lt;/p&gt;</content><category term="20260224-and-we-re-back"/><category term="about"/><category term="meta"/></entry><entry><title>animating Anscombe's Quartet regression diagnostics</title><link href="https://data.onebiglibrary.net/2014/11/12/animated-anscombe-quartet-regression-diagnostics/" rel="alternate"/><published>2014-11-12T00:00:00-05:00</published><updated>2014-11-12T00:00:00-05:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-11-12:/2014/11/12/animated-anscombe-quartet-regression-diagnostics/</id><summary type="html">&lt;p&gt;Using the sketch developed in animating regression parts
&lt;a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/"&gt;1&lt;/a&gt; and
&lt;a href="http://data.onebiglibrary.net/2014/10/18/animating-regression-part-2/"&gt;2&lt;/a&gt;,
let's take a look at &lt;a href="http://en.wikipedia.org/wiki/Anscombe's_quartet"&gt;Anscombe's
Quartet&lt;/a&gt;. What
makes these datasets useful, as Wikipedia points out, is their
near-equivalent stats: the x and y sets share the same mean, sample
variance, correlation and simple linear regression model. It's
instructive …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Using the sketch developed in animating regression parts
&lt;a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/"&gt;1&lt;/a&gt; and
&lt;a href="http://data.onebiglibrary.net/2014/10/18/animating-regression-part-2/"&gt;2&lt;/a&gt;,
let's take a look at &lt;a href="http://en.wikipedia.org/wiki/Anscombe's_quartet"&gt;Anscombe's
Quartet&lt;/a&gt;. What
makes these datasets useful, as Wikipedia points out, is their
near-equivalent stats: the x and y sets share the same mean, sample
variance, correlation and simple linear regression model. It's
instructive as a clear example of what to watch out for when
developing simple linear regressions, and the issues each dataset
highlights become clear in the different diagnostic plots.&lt;/p&gt;
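&lt;p&gt;To make the "near-equivalent stats" claim concrete, here is a small sketch in plain JavaScript that computes the shared numbers for the first and fourth sets. The data values are the ones in R's &lt;code&gt;anscombe&lt;/code&gt; dataset; the helper names (&lt;code&gt;mean&lt;/code&gt;, &lt;code&gt;variance&lt;/code&gt;, &lt;code&gt;slope&lt;/code&gt;, &lt;code&gt;summarize&lt;/code&gt;) are hypothetical and are not used by the charts on this page.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Two of the four Anscombe sets (values as shipped in R's anscombe data).
var sets = {
    a1: { x: [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
          y: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] },
    a4: { x: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          y: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89] }
};

function mean(a) {
    return a.reduce(function(acc, v) { return acc + v; }, 0) / a.length;
}

// sample variance, dividing by n - 1
function variance(a) {
    var m = mean(a);
    return a.reduce(function(acc, v) {
        return acc + (v - m) * (v - m);
    }, 0) / (a.length - 1);
}

// least-squares slope of y on x
function slope(x, y) {
    var mx = mean(x);
    var my = mean(y);
    var sxy = x.reduce(function(acc, v, i) {
        return acc + (v - mx) * (y[i] - my);
    }, 0);
    var sxx = x.reduce(function(acc, v) {
        return acc + (v - mx) * (v - mx);
    }, 0);
    return sxy / sxx;
}

// hypothetical helper: the summary numbers for one set
function summarize(d) {
    return {
        xmean: mean(d.x),
        ymean: mean(d.y),
        xvar: variance(d.x),
        yvar: variance(d.y),
        slope: slope(d.x, d.y)
    };
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;summarize(sets.a1)&lt;/code&gt; and &lt;code&gt;summarize(sets.a4)&lt;/code&gt; gives nearly identical numbers, even though the second set puts ten of its eleven points at x = 8.&lt;/p&gt;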
&lt;p&gt;The technical challenge here is to use the sketch developed in part
2 four times. That code is a mess; it reflects my learning process,
but it's not anything I'd want to reuse. The simplest approach to
solving this is to turn the viz element into a &lt;a href="http://bost.ocks.org/mike/chart/"&gt;reusable
chart&lt;/a&gt; (and to-read: &lt;a href="http://bocoup.com/weblog/reusability-with-d3/"&gt;Exploring
Reusability with D3.js&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;I'm under several class deadlines just now, so I won't go as far
as possible in making this nice and cleanly configurable and
modifiable, but I certainly don't want to write the same code out
four times, so I'll look for a middle ground that achieves some
code cleanup and a modicum of reuse.&lt;/p&gt;
&lt;p&gt;First off, we need to pull the source data into this page. The
Anscombe datasets and their summary statistics are readily available,
but their linear model residuals and Cook's distance values require
a little calculation. There are JavaScript stats libraries that can
handle the regression, but they don't seem to ship with a Cook's
distance implementation. (to-do: pull request.) Fortunately R ships
with the anscombe data pre-loaded, and it's easy to put all this
together and draw it out as JSON for easy use here:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nf"&gt;library&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rjson&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;data.frame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anscombe&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;anscombe&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;names&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;c&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;x&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;a1fit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;lm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;cooks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cooks.distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a1fit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a1fit&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;
&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# repeat for a2, a3, a4&lt;/span&gt;
&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;list&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;names&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;c&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;a1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;a2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;a3&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;a4&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;a1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a1&lt;/span&gt;
&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;a2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a2&lt;/span&gt;
&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;a3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a3&lt;/span&gt;
&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="o"&gt;$&lt;/span&gt;&lt;span class="n"&gt;a4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a4&lt;/span&gt;
&lt;span class="nf"&gt;toJSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This can be written to a file for later use, like here.&lt;/p&gt;
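&lt;p&gt;For the one-predictor case, the missing Cook's distance piece is small enough to sketch by hand in JavaScript. This is only an illustration, not the code behind this page (the values used here came from R): &lt;code&gt;cooksDistance&lt;/code&gt; is a hypothetical helper name, and the formula assumes a simple linear regression with an intercept, so two fitted parameters.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Cook's distance for simple linear regression (y ~ x), plain JavaScript.
// Hypothetical helper, not part of d3 or any stats library.
function cooksDistance(xs, ys) {
    var n = xs.length;
    var p = 2; // fitted parameters: intercept and slope
    var sum = function(a) {
        return a.reduce(function(acc, v) { return acc + v; }, 0);
    };
    var xbar = sum(xs) / n;
    var ybar = sum(ys) / n;
    // sums of squares and cross-products about the means
    var sxx = sum(xs.map(function(x) { return (x - xbar) * (x - xbar); }));
    var sxy = sum(xs.map(function(x, i) {
        return (x - xbar) * (ys[i] - ybar);
    }));
    var slope = sxy / sxx;
    var intercept = ybar - slope * xbar;
    var residuals = xs.map(function(x, i) {
        return ys[i] - (intercept + slope * x);
    });
    // residual variance estimate: SSE / (n - p)
    var s2 = sum(residuals.map(function(e) { return e * e; })) / (n - p);
    return xs.map(function(x, i) {
        // leverage of observation i
        var h = 1 / n + (x - xbar) * (x - xbar) / sxx;
        var e = residuals[i];
        // when h is 1 (a point that alone determines the fit) this is
        // 0 / 0 in exact arithmetic, so do not expect a usable number
        return (e * e / (p * s2)) * (h / ((1 - h) * (1 - h)));
    });
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the fourth Anscombe set the lone x = 19 point has a leverage of 1, which is the 0 / 0 case flagged in the comment.&lt;/p&gt;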
&lt;p&gt;The plan is to make at least the following changes to the chart:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;define function &lt;code&gt;regcycle()&lt;/code&gt; as the reusable chart and invoke it 
 four times&lt;/li&gt;
&lt;li&gt;rather than creating multiple scales and axes, rewrite each within
its plot/view mode function so they're self-contained&lt;/li&gt;
&lt;li&gt;move the axis updates to the top of each plot function&lt;/li&gt;
&lt;li&gt;load the source Anscombe data and initialize the charts using a 
&lt;code&gt;d3.json()&lt;/code&gt; callback&lt;/li&gt;
&lt;li&gt;bind the selections and the data to each of the four charts&lt;/li&gt;
&lt;/ul&gt;
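&lt;p&gt;The first item on that list follows the closure pattern from the reusable chart article linked above. A bare-bones sketch of that shape, with no d3 calls, might look like the following; &lt;code&gt;chart&lt;/code&gt;, &lt;code&gt;small&lt;/code&gt;, and &lt;code&gt;large&lt;/code&gt; are hypothetical names, and the returned object is only a stand-in for real drawing code.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Closure-based reusable chart skeleton; no d3 needed to see the shape.
function chart() {
    // private configuration with defaults
    var width = 400;
    var height = 400;

    // the function d3 would invoke through selection.call(my)
    function my(selection) {
        // hypothetical stand-in for real drawing code
        return { target: selection, width: width, height: height };
    }

    // getter/setter accessors: no argument reads, one argument writes
    my.width = function(value) {
        if (!arguments.length) return width;
        width = value;
        return my; // return the chart so calls can be chained
    };
    my.height = function(value) {
        if (!arguments.length) return height;
        height = value;
        return my;
    };

    return my;
}

// each call to chart() gets its own independent settings
var small = chart().width(200).height(150);
var large = chart();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the settings live inside the closure, four charts made this way can be configured separately without sharing state.&lt;/p&gt;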
&lt;p&gt;Let's see how it goes.&lt;/p&gt;
&lt;div class="container-fluid"&gt;
    &lt;div class="row"&gt;
        &lt;div id="a1" class="col-xs-6"&gt;&lt;/div&gt;
        &lt;div id="a2" class="col-xs-6"&gt;&lt;/div&gt;
    &lt;/div&gt;
    &lt;div class="row"&gt;
        &lt;div id="a3" class="col-xs-6"&gt;&lt;/div&gt;
        &lt;div id="a4" class="col-xs-6"&gt;&lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;style&gt;
.axis path,
.axis line {
    fill: none;
    stroke: black;
    shape-rendering: crispEdges;
}
.axis text {
    font-family: sans-serif;
    font-size: 11px;
}
.label {
    font-family: sans-serif;
    font-variant: small-caps;
    font-weight: normal;
    font-size: x-large;
}
&lt;/style&gt;

&lt;script&gt;
// Each set in Anscombe's Quartet has the same summary numbers
// Better to use a javascript stats lib to calculate all this
// inside the chart; oh well, a shortcut for now
var xmean = 9;
var ymean = 7.5;
var xsd = 11; // sample variance of x, despite the name
var ysd = 4.1245; // sample variance of y; fudging slightly three digits down
var slope = 0.5;
var intercept = 3.0;
var qnorm = [-1.383, -0.967, -0.674, -0.431, -0.210, 0.0, 0.210, 0.431,
    0.674, 0.967, 1.383];

// colorbrewer "spectral" 11
var colors = ["#9e0142", "#d53e4f", "#f46d43", "#fdae61", "#fee08b",
    "#ffffbf", "#e6f598", "#abdda4", "#66c2a5", "#3288bd", "#5e4fa2"];
var color_scale = d3.scale.ordinal()
    .domain(d3.range(colors.length)) // one domain entry per observation index
    .range(colors);

function expected(index) {
    return (slope * index) + intercept;
};

function regcycle() {
    var width = 400;
    var height = 400;
    var padding = 30;
    var buffer = 1.1;
    var duration = 2000;
    var delay = 2000;

    function my(sel) {
        var data = [];
        // this seems wrong
        var seldata = sel.data()[0];
        // reshape the data
        for (var i = 0; i &lt; seldata.x.length; i++) {
            var obs = {
                x: seldata.x[i],
                y: seldata.y[i],
                residual: seldata.error[i],
                cooks: seldata.cooks[i],
                quantile: seldata.quantile[i],
            };
            data.push(obs);
        };
        // generate a unique id for the named anchors
        var uid = Math.round(Math.random() * 1024);
        var min_x = d3.min(data, function(d) { return d.x; }) - 1;
        var max_x = d3.max(data, function(d) { return d.x; }) + 1;
        var min_y = d3.min(data, function(d) { return d.y; }) - 1;
        var max_y = d3.max(data, function(d) { return d.y; }) + 1;
        var max_residual = d3.max(data, 
            function(d) { return Math.abs(d.residual); });
        var max_cooks = d3.max(data, function(d) { return d.cooks; });
        var max_quantile = d3.max(data,
            function(d) { return Math.abs(d.quantile); }); 
        var max_qnorm = d3.max(qnorm);

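        // if the largest Cook's value lands between 0.5 and 1.1, stretch the
        // axis to 1.1 so the reference line at 1 stays in view; if every value
        // is lower, keep the tighter axis so individual values stay discernible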
        if (max_cooks &gt;= 0.5) {
            if (max_cooks &lt;= 1.1) {
                max_cooks = 1.1;
            };
        };
        // check for NaN values, set a high value if present
        if (seldata.cooks.some(isNaN)) {
            max_cooks = 2;
        };

        var svg = sel.append("svg")
            .attr("width", width)
            .attr("height", height);

        // how much of the setup should be outside of the specific 
        // functions? it's repeating a lot for this first one...

        // x and y scales, axes, for the basic fit plot
        var x = d3.scale.linear()
            .domain([min_x, max_x])
            .range([padding, width - padding]);
        var x_axis = d3.svg.axis()
            .orient("bottom")
            .scale(x);
        var y = d3.scale.linear()
            .domain([min_y, max_y])
            .range([height - padding, padding]);
        var y_axis = d3.svg.axis()
            .orient("left")
            .scale(y);

        // sel contains general data/info like the regression line
        svg.append("line")
            .attr("id", "line" + uid)
            .attr("x1", x(min_x))
            .attr("y1", y(expected(min_x)))
            .attr("x2", x(max_x))
            .attr("y2", y(expected(max_x)))
            .attr("stroke-width", 2)
            .attr("stroke", "steelblue");

        // g binds to the data; this feels like an unneeded two-step
        // when sel is already bound too, perhaps a mistake?
        var g = svg.selectAll("g")
            .data(data)
            .enter().append("g")
            .attr("class", "object");

        // styling elements should be in css, not here
        g.each(function(d, i) {
            var o = d3.select(this);
            o.attr("class", "observation");
            o.append("line")
                .attr("x1", x(d.x))
                .attr("y1", y(d.y))
                .attr("x2", x(d.x))
                .attr("y2", y(expected(d.x)))
                .attr("class", "residual-bar")
                .attr("stroke-width", 0)
                .attr("stroke", "gray");
            o.append("circle")
                .attr("r", 5) // hard-coded!
                .attr("cx", x(d.x))
                .attr("cy", y(d.y))
                .attr("class", "data-point")
                .attr("stroke", "black")
                .attr("fill", color_scale(i));
        });

        // establish initial axes
        svg.append("g")
            .attr("id", "x_axis" + uid)
            .attr("class", "axis")
            .attr("transform", "translate(0, " + (height - padding) + ")")
            .call(x_axis);
        svg.append("g")
            .attr("id", "y_axis" + uid)
            .attr("class", "axis")
            .attr("transform", "translate(" + padding + ", 0)")
            .call(y_axis);

        // initial label
        svg.append("text")
            .attr("id", "label" + uid)
            .attr("class", "label")
            .attr("x", 40) // hard-coded!
            .attr("y", 40) // hard-coded!
            .text("model fit");

        setTimeout(residual, delay);

        // should these be inside this function or one level up?
        // does it matter?
        function fit() {
            // reset scales/axes for fit plot
            x = d3.scale.linear()
                .domain([min_x, max_x])
                .range([padding, width - padding]);
            x_axis = d3.svg.axis()
                .orient("bottom")
                .scale(x);
            y = d3.scale.linear()
                .domain([min_y, max_y])
                .range([height - padding, padding]);
            y_axis = d3.svg.axis()
                .orient("left")
                .scale(y);
            svg.select("#x_axis" + uid).transition()
                .duration(duration)
                .call(x_axis);
            svg.select("#y_axis" + uid).transition()
                .duration(duration)
                .call(y_axis);

            var label = svg.select("#label" + uid).transition()
                .duration(duration)
                .text("model fit");

            var line = svg.select("#line" + uid).transition()
                .duration(duration)
                .attr("x1", x(min_x))
                .attr("y1", y(expected(min_x)))
                .attr("x2", x(max_x))
                .attr("y2", y(expected(max_x)));

            var c = svg.selectAll(".observation");
            c.each(function(d, i) {
                var o = d3.select(this);
                o.select(".residual-bar").transition()
                    .duration(duration)
                    .attr("x1", x(d.x))
                    .attr("y1", y(d.y))
                    .attr("x2", x(d.x))
                    .attr("y2", y(expected(d.x)))
                    .attr("stroke-width", 0);
                o.select(".data-point").transition()
                    .duration(duration)
                    .attr("cx", x(d.x))
                    .attr("cy", y(d.y));
            });

            setTimeout(residual, delay + duration);
        };

        function residual() {
            // reset y scale/axis 
            y = d3.scale.linear()
                .domain([-max_residual, max_residual])
                .range([height - padding, padding]);
            y_axis = d3.svg.axis()
                .orient("left")
                .scale(y);
            svg.select("#y_axis" + uid).transition()
                .duration(duration)
                .call(y_axis);

            var label = svg.select("#label" + uid).transition()
                .duration(duration)
                .text("residuals");

            var line = svg.select("#line" + uid).transition()
                .duration(duration)
                .attr("y1", y(0))
                .attr("y2", y(0));

            var c = svg.selectAll(".observation");
            c.each(function(d, i) {
                var o = d3.select(this);
                o.select(".data-point").transition()
                    .duration(duration)
                    .attr("cy", y(d.residual));
                o.select(".residual-bar").transition()
                    .delay(duration)
                    .attr("x1", x(d.x))
                    .attr("y1", y(d.residual))
                    .attr("x2", x(d.x))
                    .attr("y2", y(0))
                    .attr("stroke-width", 3); // style hard-coded
            });

            setTimeout(cooks, delay + duration);
        };

        function cooks() {
            // reset scale / axis for cooks, x in order, not by value
            x = d3.scale.linear()
                .domain([0, data.length])
                .range([padding, width - padding]);
            x_axis = d3.svg.axis()
                .orient("bottom")
                .scale(x);
            svg.select("#x_axis" + uid).transition()
                .duration(duration)
                .call(x_axis);
            y = d3.scale.linear()
                .domain([0, max_cooks])
                .range([height - padding, padding]);
            y_axis = d3.svg.axis()
                .orient("left")
                .scale(y);
            svg.select("#y_axis" + uid).transition()
                .duration(duration)
                .call(y_axis);

            var label = svg.select("#label" + uid).transition()
                .duration(duration)
                .text("cook's distance");

            var line = svg.select("#line" + uid).transition()
                .duration(duration)
                .attr("x1", x(0))
                .attr("y1", y(1))
                .attr("x2", x(data.length))
                .attr("y2", y(1));

            var c = svg.selectAll(".observation");
            c.each(function(d, i) {
                var o = d3.select(this);
                o.select(".data-point").transition()
                    .duration(duration)
                    .attr("cx", x(i + 1))
                    .attr("cy", y(isNaN(d.cooks) ? 50 : d.cooks));
                o.select(".residual-bar").transition()
                    .duration(duration)
                    .attr("x1", x(i + 1))
                    .attr("y1", y(isNaN(d.cooks) ? 50 : d.cooks))
                    .attr("x2", x(i + 1))
                    .attr("y2", y(0));
            });

            setTimeout(qq, delay + duration);
        };

        function qq() {
            // reset x scale/axis to normal quantiles
            x = d3.scale.linear()
                .domain([-max_qnorm * buffer, max_qnorm * buffer])
                .range([padding, width - padding]);
            x_axis = d3.svg.axis()
                .orient("bottom")
                .scale(x);
            svg.select("#x_axis" + uid).transition()
                .duration(duration)
                .call(x_axis);

            // reset y scale/axis to observed quantiles
            y = d3.scale.linear()
                .domain([-max_quantile * buffer, max_quantile * buffer])
                .range([height - padding, padding]);
            y_axis = d3.svg.axis()
                .orient("left")
                .scale(y);
            svg.select("#y_axis" + uid).transition()
                .duration(duration)
                .call(y_axis);

            var label = svg.select("#label" + uid).transition()
                .duration(duration)
                .text("q-q normal vs. observed");

            var line = svg.select("#line" + uid).transition()
                .duration(duration)
                .attr("y1", y(-max_quantile))
                .attr("y2", y(max_quantile));

            // sort the data to align Q-Q
            var quantiles = data.map(function(d) { return d.quantile; });
            var sorted = quantiles.sort(function(a, b) { return a - b; });
            var c = svg.selectAll(".observation");
            c.each(function(d, i) {
                var o = d3.select(this);
                o.select(".data-point").transition()
                    .duration(duration)
                    .attr("cx", x(qnorm[i]))
                    .attr("cy", y(sorted[i]));
                o.select(".residual-bar").transition()
                    .attr("stroke-width", 0);
            });

            setTimeout(fit, delay + duration);
        };
    };

    // add accessors here some other time :)
    return my;
};

// init the four charts
var a1_cycle = regcycle();
var a2_cycle = regcycle();
var a3_cycle = regcycle();
var a4_cycle = regcycle();

// grab data, bind to charts, and render
d3.json("/data/20141112-animating-anscombe.json", function(data) {
    d3.select("#a1")
        .datum(data.a1)
        .call(a1_cycle);
    d3.select("#a2")
        .datum(data.a2)
        .call(a2_cycle);
    d3.select("#a3")
        .datum(data.a3)
        .call(a3_cycle);
    d3.select("#a4")
        .datum(data.a4)
        .call(a4_cycle);
});
&lt;/script&gt;

&lt;p&gt;This seems about right. Some additional changes proved necessary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;color! highlighting each data point with a color from the &lt;a href="http://colorbrewer2.org/"&gt;ColorBrewer&lt;/a&gt;
 "spectral" palette should support a
 viewer's ability to follow any specific observation through the
 four plots.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the JSON output from R I described above was more awkward to
 work with than a more row- or observation-oriented dataset shape,
 so there's a quick reshaping step. This results in simple references
 to the data values.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reshape&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;observations&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seldata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seldata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seldata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;residual&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seldata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;cooks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seldata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooks&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seldata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the Cook's Distance calculation for dataset four results in a
 NaN value for the far-right value, so I added a check that shoots
 that data point straight up, way off the viewpane.
 This is perhaps not viable statistically but it feeds the animation
 well, specifically in the transition to the Q-Q plot, making the
 story told clearer to my eye. Following the bottom right pane,
 watch the green dot snap back into place at the very end of the
 transition to Q-Q and you get the effect.  The data check for the
 NaN is simple but effective:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;svg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selectAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.observation&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.data-point&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cy&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooks&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.residual-bar&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;y1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;isNaN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cooks&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;y2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;added a 1.1 buffer factor around the input domain for several
 of the axes to draw the data inside the axis lines.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;bringing the residual and distance bars into the two relevant
 plots proved to require more attention. because the data points
 move around, the bars can be left in a position from an earlier
 plot that doesn't make sense two plots later. that could lead to
 the bars whooshing in from odd angles as they reappear, which is
 awkward, detracting from the intended narrative. This temporary
 mistake evokes that awkwardness:&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img alt="this temporary mistake" src="https://data.onebiglibrary.net/2014/11/12/animated-anscombe-quartet-regression-diagnostics/20141112-anscombe-error.png"&gt; &lt;/p&gt;
&lt;h3&gt;Left as an exercise for the writer&lt;/h3&gt;
&lt;p&gt;There are several unresolved issues I would like to revisit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;synchronizing the transitions among four different chart instances
 doesn't seem to have a single obvious solution. you can see them fall
 out of sync if you watch the cycle long enough, and avoiding that might
 require some sort of clock check or simple communication pattern. even
 so, the eye can only meaningfully follow one plot at a time, so it
 doesn't ruin the effect, and if you believe google analytics, few readers
 spend more than one minute per page on this site, so it's not a serious
 problem here, now.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;i can see a case for pulling the axis resetting back out of the
 individual plot modes again; it's a little cumbersome to keep
 reassigning each time.  on the other hand, this way all the logic
 for a plot is self-contained, so it would feel a little cleaner
 to add more plots to the reel without having to bounce around and
 keep track of a dozen different scale and axis variables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;would be nice to jitter or spread out the residuals so the bars
 don't overlap like on the fourth dataset.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the lack of configuration accessors keeps this from being particularly
 reusable by anybody else, but that's the other side of that line
 i drew in going for that middle ground. other homework awaits!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;several details are hard-coded, like the color scale and the
 size and styling of different elements.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;if &lt;a href="http://jstat.github.io/"&gt;jstat&lt;/a&gt; or &lt;a href="https://github.com/tmcw/simple-statistics"&gt;simple
 statistics&lt;/a&gt; had a cook's
 distance function we could take arbitrary datasets and render them
 all inline, or at least as part of the reusable graph.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
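&lt;p&gt;For the synchronization item, one possible shape is a single shared timer that tells every chart which plot to show, instead of each chart chaining its own &lt;code&gt;setTimeout&lt;/code&gt; calls. This is only a sketch under assumptions: &lt;code&gt;makeReel&lt;/code&gt;, the &lt;code&gt;show(mode)&lt;/code&gt; method on each chart, and the injected &lt;code&gt;schedule&lt;/code&gt; function are hypothetical names, not part of the code in this post.&lt;/p&gt;

```javascript
// Hypothetical sketch: drive several charts from one shared clock.
// Each chart is assumed to expose show(mode), which starts its own
// d3 transition for that mode.
function makeReel(charts, modes, schedule) {
    var step = 0;

    function tick() {
        var mode = modes[step % modes.length];
        // every chart receives the same mode on the same tick
        charts.forEach(function(chart) {
            chart.show(mode);
        });
        step += 1;
        schedule(tick);
    }

    return {
        start: function() { schedule(tick); }
    };
}
```

&lt;p&gt;In a browser, &lt;code&gt;schedule&lt;/code&gt; could be something like &lt;code&gt;function(fn) { setTimeout(fn, delay + duration); }&lt;/code&gt;. The timer can still drift, but all of the charts drift together because they share one tick.&lt;/p&gt;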
&lt;p&gt;Always good to have something to work on next.&lt;/p&gt;</content><category term="20141112-animated-anscombe-quartet-regression-diagnostics"/></entry><entry><title>animating regression, part 2</title><link href="https://data.onebiglibrary.net/2014/10/18/animating-regression-part-2/" rel="alternate"/><published>2014-10-18T00:00:00-04:00</published><updated>2014-10-18T00:00:00-04:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-10-18:/2014/10/18/animating-regression-part-2/</id><summary type="html">&lt;p&gt;Returning to the question of animating a regression model and its
residuals. &lt;a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/"&gt;Part
1&lt;/a&gt; stepped
forward through basic animation with d3 toward a simplistic regression
model and a first view of residuals. Let's take that through two
new views, a Q-Q plot and a plot of potential outliers with Cook's …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Returning to the question of animating a regression model and its
residuals. &lt;a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/"&gt;Part
1&lt;/a&gt; stepped
forward through basic animation with d3 toward a simplistic regression
model and a first view of residuals. Let's take that through two
new views, a Q-Q plot and a plot of potential outliers with Cook's
Distance. In this part we'll complete a full cycle through these four
plots, and in a final piece we'll review and tweak the look of it all
and add some more interesting data into the mix.&lt;/p&gt;
&lt;p&gt;Picking up where we left off, we have a regression line transitioning
to a view of residuals. Let's start by emphasizing that those residual
distances are most valuable in the latter view, and remove them from the
model view.&lt;/p&gt;
&lt;div id='fig1'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 400;
var height = 400;
var duration = 2000;
var delay = 2000;

var data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];

var slope = 9.036;
var intercept = 18;
function expected(index) {
    return (slope * index) + intercept;
};

var fig1 = d3.select("#fig1").append("svg")
    .attr("width", width)
    .attr("height", height);

var padding = 30;

var x = d3.scale.linear()
    .domain([0, data.length])
    .range([padding, width - padding]);
var y = d3.scale.linear()
    .domain([d3.min(data), d3.max(data)])
    .range([height - padding, padding]);

var g = fig1.selectAll("g")
        .data(data)
        .enter().append("g")
        .attr("class", "object");

fig1.append("line")
    .attr("id", "line")
    .attr("x1", x(0))
    .attr("y1", y(intercept))
    .attr("x2", x(11))
    .attr("y2", y(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");

g.each(function(d, i) {
    var o = d3.select(this);
    o.attr("class", "observation");
    o.append("line")
        .attr("x1", x(i))
        .attr("y1", y(d))
        .attr("x2", x(i))
        .attr("y2", y(expected(i)))
        .attr("class", "residual-bar")
        .attr("stroke-width", 0)
        .attr("stroke", "gray");
    o.append("circle")
        .attr("r", 5)
        .attr("cx", x(i))
        .attr("cy", y(d))
        .attr("stroke", "black")
        .attr("fill", "darkslategrey");
});


setTimeout(residual, delay);

function fit() {
    line = fig1.select("#line").transition()
        .duration(duration)
        .attr("y1", y(intercept))
        .attr("y2", y(expected(11)));

    var c = fig1.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.transition()
            .duration(duration)
            .attr("transform", "translate(0, 0)");
        residual_bars = o.select(".residual-bar").transition()
            .duration(duration)
            .attr("stroke-width", 0);
    });

    setTimeout(residual, delay + duration);
};

function residual() {
    line = fig1.select("#line").transition()
        .duration(duration)
        .attr("y1", height/2)
        .attr("y2", height/2);

    var c = fig1.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.transition()
            .duration(duration)
            .attr("transform", "translate(0, " + (200 - y(expected(i))) + ")");
        residual_bars = o.select(".residual-bar").transition()
            .delay(delay / 2)
            .duration(duration)
            .attr("stroke-width", 3);
    });

    setTimeout(fit, delay + duration);
};
&lt;/script&gt;

&lt;h4&gt;Showing residuals properly&lt;/h4&gt;
&lt;p&gt;Let's improve on this by showing appropriate axes for each view.
Then, during the transitions, we will re-scale the y-axis to the
residual values and back to the true data points again.&lt;/p&gt;
&lt;div id='fig2'&gt;&lt;/div&gt;
&lt;style&gt;
.axis path,
.axis line {
    fill: none;
    stroke: black;
    shape-rendering: crispEdges;
}
.axis text {
    font-family: sans-serif;
    font-size: 11px;
}
&lt;/style&gt;

&lt;script&gt;
// using same data, add in the residuals too this time
var residuals = [];
data.forEach(function(d, i) {
    residuals.push(expected(i) - d);
});
var max_residual = d3.max(residuals, function(d) { return Math.abs(d); });

var fig2 = d3.select("#fig2").append("svg")
    .attr("width", width)
    .attr("height", height);

var x = d3.scale.linear()
    .domain([0, data.length])
    .range([padding, width - padding]);
var y = d3.scale.linear()
    .domain([d3.min(data), d3.max(data)])
    .range([height - padding, padding]);
var y_residuals = d3.scale.linear()
    .domain([-max_residual, max_residual])
    .range([height - padding, padding]);

var x_axis = d3.svg.axis()
    .orient("bottom")
    .scale(x);
var y_axis = d3.svg.axis()
    .orient("left")
    .scale(y);
var y_residuals_axis = d3.svg.axis()
    .orient("left")
    .scale(y_residuals);

var g = fig2.selectAll("g")
    .data(data)
    .enter().append("g")
    .attr("class", "object");

fig2.append("line")
    .attr("id", "line")
    .attr("x1", x(0))
    .attr("y1", y(intercept))
    .attr("x2", x(11))
    .attr("y2", y(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");

g.each(function(d, i) {
    var o = d3.select(this);
    o.attr("class", "observation");
    o.append("line")
        .attr("x1", x(i))
        .attr("y1", y(d))
        .attr("x2", x(i))
        .attr("y2", y(expected(i)))
        .attr("class", "residual-bar")
        .attr("stroke-width", 0)
        .attr("stroke", "gray");
    o.append("circle")
        .attr("r", 5)
        .attr("cx", x(i))
        .attr("cy", y(d))
        .attr("stroke", "black")
        .attr("fill", "darkslategrey");
});

fig2.append("g")
    .attr("id", "x_axis")
    .attr("class", "axis")
    .attr("transform", "translate(0, " + (height - padding) + ")")
    .call(x_axis);
fig2.append("g")
    .attr("id", "y_axis")
    .attr("class", "axis")
    .attr("transform", "translate(" + padding + ", 0)")
    .call(y_axis);

setTimeout(residual2, delay);

function fit2() {
    line = fig2.select("#line").transition()
        .duration(duration)
        .attr("y1", y(intercept))
        .attr("y2", y(expected(11)));

    var c = fig2.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.transition()
            .duration(duration)
            .attr("transform", "translate(0, 0)");
        residual_bars = o.select(".residual-bar").transition()
            .duration(duration)
            .attr("stroke-width", 0);
    });

    fig2.select("#y_axis").transition()
        .duration(duration)
        .call(y_axis);

    setTimeout(residual2, delay + duration);
};

function residual2() {
    line = fig2.select("#line").transition()
        .duration(duration)
        .attr("y1", y_residuals(0))
        .attr("y2", y_residuals(0));

    var c = fig2.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.transition()
            .duration(duration)
            .attr("transform",
                "translate(0, " + (y_residuals(0) - y(expected(i))) + ")");
        residual_bars = o.select(".residual-bar").transition()
            .duration(duration)
            .attr("stroke-width", 3);
    });

    fig2.select("#y_axis").transition()
        .duration(duration)
        .call(y_residuals_axis);

    setTimeout(fit2, delay + duration);
};
&lt;/script&gt;

&lt;p&gt;Some quick notes about this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To relocate the residuals and the corresponding scale, the residual
 values are now explicitly calculated:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Then, we look for the maximum residual value to define the domain of
 the y-scale:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;max_residual&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;residuals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y_residuals&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;max_residual&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;max_residual&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Finally, whereas in the first version above, I just picked an arbitrary 
 y-location (200) to anchor the residual bars after the transition...&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;o.transition()
    .duration(duration)
    .attr(&amp;quot;transform&amp;quot;, &amp;quot;translate(0, &amp;quot; + (200 - y(expected(i))) + &amp;quot;)&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;...now we can locate these correctly according to the residuals scale.
 We just have to replace the arbitrary location with the exact midpoint
 of the y_residuals scale:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;o.transition()
    .duration(duration)
    .attr(&amp;quot;transform&amp;quot;,
        &amp;quot;translate(0, &amp;quot; + (y_residuals(0) - y(expected(i))) + &amp;quot;)&amp;quot;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Adding Cook's Distance&lt;/h4&gt;
&lt;p&gt;In simple linear regression models like this, outlier values can
influence the slope of the model significantly, making predictions
based on the resulting model less accurate than desired.
A standard way to evaluate whether any outliers exist in a dataset
is to examine &lt;a href="http://en.wikipedia.org/wiki/Cook's_distance"&gt;Cook's
Distance&lt;/a&gt;.  It's easy
enough to calculate in R:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;lm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formula&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;cooks.distance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
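&lt;p&gt;For a model with a single predictor, the same quantity can also be worked out in JavaScript. The following is a minimal sketch, assuming an ordinary least squares fit with an intercept and using the textbook form &lt;code&gt;D = e^2 / (p * MSE) * h / (1 - h)^2&lt;/code&gt; with &lt;code&gt;p = 2&lt;/code&gt;. The name &lt;code&gt;cooksDistance&lt;/code&gt; is a hypothetical helper, not something d3 provides.&lt;/p&gt;

```javascript
// Sketch: Cook's distance for simple linear regression (one predictor
// plus an intercept), assuming an ordinary least squares fit.
// cooksDistance is a hypothetical helper, not a d3 function.
function cooksDistance(xs, ys) {
    var n = xs.length;
    var p = 2; // fitted parameters: intercept and slope
    var sum = function(a) {
        return a.reduce(function(s, v) { return s + v; }, 0);
    };
    var xbar = sum(xs) / n;
    var ybar = sum(ys) / n;
    var sxx = sum(xs.map(function(x) { return (x - xbar) * (x - xbar); }));
    var sxy = sum(xs.map(function(x, i) { return (x - xbar) * (ys[i] - ybar); }));
    var slope = sxy / sxx;
    var intercept = ybar - slope * xbar;

    // residuals: observed minus fitted
    var errors = xs.map(function(x, i) {
        return ys[i] - (intercept + slope * x);
    });
    var mse = sum(errors.map(function(e) { return e * e; })) / (n - p);

    return xs.map(function(x, i) {
        // leverage of observation i
        var h = 1 / n + (x - xbar) * (x - xbar) / sxx;
        return (errors[i] * errors[i]) / (p * mse) * h / ((1 - h) * (1 - h));
    });
}

var d = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
var xs = d.map(function(v, i) { return i; });
var cd = cooksDistance(xs, d);
```

&lt;p&gt;For the eleven values used in this post, this gives roughly 0.79 for the last point and roughly 0.28 for the point at &lt;code&gt;x=6&lt;/code&gt;.&lt;/p&gt;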

&lt;p&gt;The basic rule of thumb is to look for any Cook's Distance values
of 1 or greater. It's easy enough to plot with this in mind; typical
graphs for Cook's D show the values as vertical bars with a horizontal
line at 1. We'll need to transition the y scale/axis again, and a detail
to notice is that the Cook's D values might all be well below 1, so we
need to make a choice:  if the values are well below 1, leave the line
off completely. If any data points approach or pass 1, show the line. The
risk is that when none of the values is particularly large (e.g.
all below 0.10, as in the example diagnostics image at the top of
&lt;a href="http://data.onebiglibrary.net/2014/09/18/animating-regression/"&gt;part 1&lt;/a&gt;),
scaling the y axis all the way to 1 will blur the variation among the
small values down into nothing. To handle this, we'll look
for a mid-range value like 0.33 or 0.5, and if the max Cook's D is
below that line, we'll scale the axis with the narrower value domain;
otherwise, we'll scale it up through 1.&lt;/p&gt;
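&lt;p&gt;The choice described above can be sketched as a small function. The cutoff of 0.5 and the ceiling of 1.1 (just past the line at 1) are the values assumed here, and &lt;code&gt;cooksDomainMax&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```javascript
// Sketch of the axis choice: pick the top of the Cook's D domain.
// cooksDomainMax, cutoff and ceiling are illustrative names.
function cooksDomainMax(values, cutoff, ceiling) {
    var largest = Math.max.apply(null, values);
    if (largest >= cutoff) {
        // something is near or past 1: make room for the line at 1
        return Math.max(largest, ceiling);
    }
    // everything is small: keep the narrow domain so variation shows
    return largest;
}

var narrow = cooksDomainMax([0.02, 0.05, 0.08], 0.5, 1.1);
var wide = cooksDomainMax([0.03, 0.28, 0.79], 0.5, 1.1);
```

&lt;p&gt;Here &lt;code&gt;narrow&lt;/code&gt; keeps the small maximum as the top of the domain, while &lt;code&gt;wide&lt;/code&gt; is stretched to the ceiling so the line at 1 stays inside the plot.&lt;/p&gt;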
&lt;div id='fig3'&gt;&lt;/div&gt;
&lt;style&gt;
.axis path,
.axis line {
    fill: none;
    stroke: black;
    shape-rendering: crispEdges;
}
.axis text {
    font-family: sans-serif;
    font-size: 11px;
}
&lt;/style&gt;

&lt;script&gt;
// using same data and residuals, add in Cook's distances this time
var residuals = [];
data.forEach(function(d, i) {
    residuals.push(d - expected(i));
});
var max_residual = d3.max(residuals, function(d) { return Math.abs(d); });

var cooks = [0.0267, 0.0445, 0.0047, 0.045, 0.0202, 0.0048, 0.2774, 0.0037,
    0.0057, 0.1642, 0.7934];
var max_cooks = d3.max(cooks);
if (max_cooks &gt;= 0.5) {
    // stretch the domain just past 1 so the line at 1 stays in view
    max_cooks = Math.max(max_cooks, 1.1);
}

var fig3 = d3.select("#fig3").append("svg")
    .attr("width", width)
    .attr("height", height);

var x = d3.scale.linear()
    .domain([0, data.length])
    .range([padding, width - padding]);
var y = d3.scale.linear()
    .domain([d3.min(data), d3.max(data)])
    .range([height - padding, padding]);
var y_residuals = d3.scale.linear()
    .domain([-max_residual, max_residual])
    .range([height - padding, padding]);
var y_cooks = d3.scale.linear()
    .domain([0, max_cooks])
    .range([height - padding, padding]);

var x_axis = d3.svg.axis()
    .orient("bottom")
    .scale(x);
var y_axis = d3.svg.axis()
    .orient("left")
    .scale(y);
var y_residuals_axis = d3.svg.axis()
    .orient("left")
    .scale(y_residuals);
var y_cooks_axis = d3.svg.axis()
    .orient("left")
    .scale(y_cooks);

var g = fig3.selectAll("g")
    .data(data)
    .enter().append("g")
    .attr("class", "object");

fig3.append("line")
    .attr("id", "line")
    .attr("x1", x(0))
    .attr("y1", y(intercept))
    .attr("x2", x(11))
    .attr("y2", y(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");

g.each(function(d, i) {
    var o = d3.select(this);
    o.attr("class", "observation");
    o.append("line")
        .attr("x1", x(i))
        .attr("y1", y(d))
        .attr("x2", x(i))
        .attr("y2", y(expected(i)))
        .attr("class", "residual-bar")
        .attr("stroke-width", 0)
        .attr("stroke", "gray");
    o.append("circle")
        .attr("r", 5)
        .attr("cx", x(i))
        .attr("cy", y(d))
        .attr("class", "data-point")
        .attr("stroke", "black")
        .attr("fill", "darkslategrey");
});

fig3.append("g")
    .attr("id", "x_axis")
    .attr("class", "axis")
    .attr("transform", "translate(0, " + (height - padding) + ")")
    .call(x_axis);
fig3.append("g")
    .attr("id", "y_axis")
    .attr("class", "axis")
    .attr("transform", "translate(" + padding + ", 0)")
    .call(y_axis);

setTimeout(residual3, delay);

function fit3() {
    line = fig3.select("#line").transition()
        .duration(duration)
        .attr("y1", y(intercept))
        .attr("y2", y(expected(11)));

    var c = fig3.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("y1", y(d))
            .attr("y2", y(expected(i)))
            .attr("stroke-width", 0);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cy", y(d));
    });

    fig3.select("#y_axis").transition()
        .duration(duration)
        .call(y_axis);

    setTimeout(residual3, delay + duration);
};

function residual3() {
    line = fig3.select("#line").transition()
        .duration(duration)
        .attr("y1", y_residuals(0))
        .attr("y2", y_residuals(0));

    var c = fig3.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cy", y_residuals(residuals[i]));
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("y1", y_residuals(residuals[i]))
            .attr("y2", y_residuals(0))
            .attr("stroke-width", 3);
    });

    fig3.select("#y_axis").transition()
        .duration(duration)
        .call(y_residuals_axis);

    setTimeout(cooks3, delay + duration);
};

function cooks3() {
    line = fig3.select("#line").transition()
        .duration(duration)
        .attr("y1", y_cooks(1))
        .attr("y2", y_cooks(1));

    var c = fig3.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cy", y_cooks(cooks[i]));
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("y1", y_cooks(cooks[i]))
            .attr("y2", y_cooks(0));
    });

    fig3.select("#y_axis").transition()
        .duration(duration)
        .call(y_cooks_axis);

    setTimeout(fit3, delay + duration);
}
&lt;/script&gt;

&lt;p&gt;This took some fiddling - ultimately, removing the transform/translate()
bits from the section described earlier made this simpler. Instead
of translating the values, directly setting the y values based on
the appropriate scale functions for each view mode is more direct.
As it turns out, the translation above wasn't even correct, so this
anchors us back in cleaner code with less of a cognitive gap to
verify that the values are accurate. Lesson learned: don't fall
back to SVG translate() when simpler (and higher-order) d3 scales
will do the job.&lt;/p&gt;
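&lt;p&gt;To make that lesson concrete, here is a minimal sketch of the idea: each view has its own scale, and a point's y position is just that scale applied to the matching value. &lt;code&gt;linearScale&lt;/code&gt; is a tiny stand-in for &lt;code&gt;d3.scale.linear()&lt;/code&gt; so the example runs without d3, and &lt;code&gt;views&lt;/code&gt;, &lt;code&gt;cyFor&lt;/code&gt; and the field names on &lt;code&gt;point&lt;/code&gt; are illustrative assumptions.&lt;/p&gt;

```javascript
// Sketch: one scale per view, no translate(). linearScale is a tiny
// stand-in for d3.scale.linear() so the example runs without d3.
function linearScale(domain, range) {
    return function(v) {
        var t = (v - domain[0]) / (domain[1] - domain[0]);
        return range[0] + t * (range[1] - range[0]);
    };
}

var pixels = [370, 30]; // height - padding, padding
var views = {
    fit:      {scale: linearScale([15, 109], pixels), key: "raw"},
    residual: {scale: linearScale([-22.784, 22.784], pixels), key: "error"},
    cooks:    {scale: linearScale([0, 1.1], pixels), key: "cooks"}
};

// the y position of a point is the view's scale applied to the
// value that view cares about
function cyFor(viewName, point) {
    var view = views[viewName];
    return view.scale(point[view.key]);
}

var point = {raw: 95, error: 22.784, cooks: 0.2774}; // the point at x=6
var positions = ["fit", "residual", "cooks"].map(function(name) {
    return cyFor(name, point);
});
```

&lt;p&gt;Nothing here depends on where the point sat in the previous view, which is what makes the values easy to check by eye.&lt;/p&gt;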
&lt;p&gt;Once it started working correctly, an immediate benefit of this
animation approach became clear. Look at the data point at &lt;code&gt;x=6&lt;/code&gt;.
It looks as if it's a big outlier, especially when we shift into
the residuals view, where it carries the largest residual error.
But when shifting into the Cook's Distance view, its impact as an
outlier proves to be much less than that of the point at &lt;code&gt;x=10&lt;/code&gt;,
whose Cook's Distance is roughly 0.8. This makes intuitive sense when shifting back
to the model fit view; &lt;code&gt;x=10&lt;/code&gt; drags the slope of the model down
substantially, enough to be wary of, even if not enough to consider
throwing the value out.&lt;/p&gt;
&lt;h4&gt;Adding the Q-Q Plot&lt;/h4&gt;
&lt;p&gt;To add the &lt;a href="http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot"&gt;Q-Q plot&lt;/a&gt;
we have to calculate the quantile of each value and plot that against
normal quantiles. To make it work with the animation loop, though,
we further have to sort the quantiles and plot them all in the
correct order.&lt;/p&gt;
&lt;p&gt;The plot itself should be straightforward, with the normal quantiles
and observed values on the x- and y-axis, respectively, and a normal
line running through it all.&lt;/p&gt;
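&lt;p&gt;The pairing step can be sketched roughly like this, assuming the normal quantiles are supplied as a precomputed array and the observed values are standardized with the sample standard deviation. &lt;code&gt;qqPoints&lt;/code&gt; and the field names it returns are hypothetical, chosen only for illustration.&lt;/p&gt;

```javascript
// Sketch: standardize the observed values, sort them, and pair the
// k-th smallest with the k-th normal quantile. qqPoints is a
// hypothetical helper; normalQuantiles is assumed to be precomputed.
function qqPoints(values, normalQuantiles) {
    var n = values.length;
    var mean = values.reduce(function(a, b) { return a + b; }, 0) / n;
    var ss = values.reduce(function(a, b) {
        return a + (b - mean) * (b - mean);
    }, 0);
    var sd = Math.sqrt(ss / (n - 1)); // sample standard deviation

    var standardized = values.map(function(v, i) {
        return {index: i, q: (v - mean) / sd};
    });
    standardized.sort(function(a, b) { return a.q - b.q; });

    return standardized.map(function(p, k) {
        return {index: p.index, q: p.q, theoretical: normalQuantiles[k]};
    });
}

var raw = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
var qnormValues = [-1.383, -0.967, -0.674, -0.431, -0.210, 0.0,
    0.210, 0.431, 0.674, 0.967, 1.383];
var points = qqPoints(raw, qnormValues);
```

&lt;p&gt;Each entry keeps the original &lt;code&gt;index&lt;/code&gt;, so a point can still be matched to the same observation in the other views.&lt;/p&gt;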
&lt;p&gt;Because the axes and points are moving around so much, we'll add
a simple title label and update it as we switch plots.&lt;/p&gt;
&lt;div id='fig4'&gt;&lt;/div&gt;
&lt;style&gt;
.label {
    font-family: sans-serif;
    font-variant: small-caps;
    font-weight: normal;
    font-size: x-large;
}
&lt;/style&gt;
&lt;script&gt;
var data4 = [
    {"cooks": 0.0267, "error": -3.0, "q": -1.522, 
        "index": 0, "raw": 15, "qqindex": 0},
    {"cooks": 0.0445, "error": -5.036, "q": -1.3009, 
        "index": 1, "raw": 22, "qqindex": 1},
    {"cooks": 0.0047, "error": -2.072, "q": -0.9218, 
        "index": 2, "raw": 34, "qqindex": 2},
    {"cooks": 0.045, "error": 7.892, "q": -0.3216,
        "index": 3, "raw": 53, "qqindex": 4},
    {"cooks": 0.0202, "error": -6.144, "q": -0.4796,
        "index": 4, "raw": 48, "qqindex": 3},
    {"cooks": 0.0048, "error": -3.18, "q": -0.1005, 
        "index": 5, "raw": 60, "qqindex": 5},
    {"cooks": 0.2774, "error": 22.784, "q": 1.0051, 
        "index": 6, "raw": 95, "qqindex": 7},
    {"cooks": 0.0037, "error": -2.252, "q": 0.4997, 
        "index": 7, "raw": 79, "qqindex": 8},
    {"cooks": 0.0057, "error": -2.288, "q": 0.784, 
        "index": 8, "raw": 88, "qqindex": 10},
    {"cooks": 0.1642, "error": 9.676, "q": 1.4473, 
        "index": 9, "raw": 109, "qqindex": 6},
    {"cooks": 0.7934, "error": -16.36, "q": 0.9103, 
        "index": 10, "raw": 92, "qqindex": 9}
    ];
var qnorm = [-1.383, -0.967, -0.674, -0.431, -0.210, 0.0, 0.210, 0.431,
    0.674, 0.967, 1.383];
var xbar = 63.182;
var min_raw4 = d3.min(data4, function(d) { return d.raw; });
var max_raw4 = d3.max(data4, function(d) { return d.raw; });
var max_residual4 = d3.max(data4, function(d) { return Math.abs(d.error); });
var max_cooks4 = d3.max(data4, function(d) { return d.cooks; });
var max_q = d3.max(data4, function(d) { return Math.abs(d.q); });
var max_qnorm = d3.max(qnorm);
var buffer = 1.1;

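// if any Cook's distance is 0.5 or more, stretch the axis to at least 1.1
// so the reference line drawn at 1 stays in view
if (max_cooks4 &gt;= 0.5) {
    max_cooks4 = Math.max(max_cooks4, 1.1);
}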

var fig4 = d3.select("#fig4").append("svg")
    .attr("width", width)
    .attr("height", height);

var x4 = d3.scale.linear()
    .domain([0, data4.length])
    .range([padding, width - padding]);
var x_qnorm4 = d3.scale.linear()
    .domain([-max_qnorm * buffer, max_qnorm * buffer])
    .range([padding, width - padding]);
var y4 = d3.scale.linear()
    .domain([min_raw4, max_raw4])
    .range([height - padding, padding]);
var y_residuals4 = d3.scale.linear()
    .domain([-max_residual4, max_residual4])
    .range([height - padding, padding]);
var y_cooks4 = d3.scale.linear()
    .domain([0, max_cooks4])
    .range([height - padding, padding]);
var y_q4 = d3.scale.linear()
    .domain([-max_q * buffer, max_q * buffer])
    .range([height - padding, padding]);

var x_axis4 = d3.svg.axis()
    .orient("bottom")
    .scale(x4);
var x_qnorm_axis4 = d3.svg.axis()
    .orient("bottom")
    .scale(x_qnorm4);
var y_axis4 = d3.svg.axis()
    .orient("left")
    .scale(y4);
var y_residuals_axis4 = d3.svg.axis()
    .orient("left")
    .scale(y_residuals4);
var y_cooks_axis4 = d3.svg.axis()
    .orient("left")
    .scale(y_cooks4);
var y_q_axis4 = d3.svg.axis()
    .orient("left")
    .scale(y_q4);

var g4 = fig4.selectAll("g")
    .data(data4)
    .enter().append("g")
    .attr("class", "object");

fig4.append("line")
    .attr("id", "line4")
    .attr("x1", x4(0))
    .attr("y1", y4(intercept))
    .attr("x2", x4(11))
    .attr("y2", y4(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");

g4.each(function(d, i) {
    var o = d3.select(this);
    o.attr("class", "observation");
    o.append("line")
        .attr("x1", x4(i))
        .attr("y1", y4(d.raw))
        .attr("x2", x4(i))
        .attr("y2", y4(expected(i)))
        .attr("class", "residual-bar")
        .attr("stroke-width", 0)
        .attr("stroke", "gray");
    o.append("circle")
        .attr("r", 5)
        .attr("cx", x4(i))
        .attr("cy", y4(d.raw))
        .attr("class", "data-point")
        .attr("stroke", "black")
        .attr("fill", "darkslategrey");
});

fig4.append("g")
    .attr("id", "x_axis4")
    .attr("class", "axis")
    .attr("transform", "translate(0, " + (height - padding) + ")")
    .call(x_axis4);
fig4.append("g")
    .attr("id", "y_axis4")
    .attr("class", "axis")
    .attr("transform", "translate(" + padding + ", 0)")
    .call(y_axis4);

fig4.append("text")
    .attr("id", "label")
    .attr("class", "label")
    .attr("x", 40)
    .attr("y", 40)
    .text("model fit");

setTimeout(residual4, delay);

function fit4() {
    label = fig4.select("#label").transition()
        .duration(duration)
        .text("fit model");

    line = fig4.select("#line4").transition()
        .duration(duration)
        .attr("y1", y4(intercept))
        .attr("y2", y4(expected(11)));

    var c = fig4.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("x1", x4(i))
            .attr("y1", y4(d.raw))
            .attr("y2", y4(expected(i)))
            .attr("stroke-width", 0);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cx", x4(i))
            .attr("cy", y4(d.raw));
    });

    fig4.select("#x_axis4").transition()
        .duration(duration)
        .call(x_axis4);
    fig4.select("#y_axis4").transition()
        .duration(duration)
        .call(y_axis4);

    setTimeout(residual4, delay + duration);
};

function residual4() {
    label = fig4.select("#label").transition()
        .duration(duration)
        .text("residuals");

    line = fig4.select("#line4").transition()
        .duration(duration)
        .attr("y1", y_residuals4(0))
        .attr("y2", y_residuals4(0));

    var c = fig4.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cy", y_residuals4(d.error));
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("y1", y_residuals4(d.error))
            .attr("y2", y_residuals4(0))
            .attr("stroke-width", 3);
    });

    fig4.select("#y_axis4").transition()
        .duration(duration)
        .call(y_residuals_axis4);

    setTimeout(cooks4, delay + duration);
};

function cooks4() {
    label = fig4.select("#label").transition()
        .duration(duration)
        .text("cook's distance");

    line = fig4.select("#line4").transition()
        .duration(duration)
        .attr("y1", y_cooks4(1))
        .attr("y2", y_cooks4(1));

    var c = fig4.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cy", y_cooks4(d.cooks));
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("y1", y_cooks4(d.cooks))
            .attr("y2", y_cooks4(0));
    });

    fig4.select("#y_axis4").transition()
        .duration(duration)
        .call(y_cooks_axis4);

    setTimeout(qq4, delay + duration);
}

function qq4() {
    label = fig4.select("#label").transition()
        .duration(duration)
        .text("q-q normal vs. observed");

    line = fig4.select("#line4").transition()
        .duration(duration)
        .attr("y1", y_q4(-max_q))
        .attr("y2", y_q4(max_q));

    // sort a copy so data4 itself keeps its original order
    var sorted = data4.slice().sort(function(a, b) { return a.q - b.q; });
    var c = fig4.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select(".data-point").transition()
            .duration(duration)
            .attr("cx", x_qnorm4(qnorm[i]))
            .attr("cy", y_q4(sorted[i].q));
        o.select(".residual-bar").transition()
            .duration(duration)
            .attr("stroke-width", 0);
    });

    fig4.select("#x_axis4").transition()
        .duration(duration)
        .call(x_qnorm_axis4);
    fig4.select("#y_axis4").transition()
        .duration(duration)
        .call(y_q_axis4);

    setTimeout(fit4, delay + duration);
}
&lt;/script&gt;

&lt;p&gt;That rounds out the sketch.&lt;/p&gt;
&lt;p&gt;Now: to clean this up enough to be able to use it to render multiple
regressions side-by-side. Stay tuned...&lt;/p&gt;</content><category term="20141018-animating-regression-part-2"/></entry><entry><title>year's worth of dots</title><link href="https://data.onebiglibrary.net/2014/10/01/years-worth-of-dots/" rel="alternate"/><published>2014-10-01T00:00:00-04:00</published><updated>2014-10-01T00:00:00-04:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-10-01:/2014/10/01/years-worth-of-dots/</id><summary type="html">&lt;p&gt;For a project at work we've collected a year's worth of samples
from a major non-US social media site. The samples are taken every
30 seconds, a snapshot of the most recent 200 public posts from all
users. This created a lot of files, and along the way we missed …&lt;/p&gt;</summary><content type="html">&lt;p&gt;For a project at work we've collected a year's worth of samples
from a major non-US social media site. The samples are taken every
30 seconds, a snapshot of the most recent 200 public posts from all
users. This created a lot of files, and along the way we missed
some in chunks for various reasons (network outage, service error,
reboot, etc.). The researcher we're supporting has happily taken a
copy of the 100+ GB (compressed) of data to start poring through,
but asked that we help prepare a simple visualization of the data
that's present - or more importantly, what's missing.&lt;/p&gt;
&lt;p&gt;Because it's natural to miss a few files here and there over a
year's time, it's not a problem unless there are big chunks missing
or patterns of errors that make sampling from this data problematic.
An image of what's there and what's not there needs to hit a few key
points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;cover the entire collection period (actually ~13 months)&lt;/li&gt;
&lt;li&gt;show missing files&lt;/li&gt;
&lt;li&gt;show empty files&lt;/li&gt;
&lt;li&gt;easily spot large gaps &lt;/li&gt;
&lt;li&gt;easily spot significant patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to the immediate use (the researcher's own knowledge
of what they have) this visualization needs to work for their
advisors and others interested in the work, so it should be readily
digested, by which I mean:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;should fit on one screen&lt;/li&gt;
&lt;li&gt;shouldn't require much explanation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To give a sense of volume, this set of files should be roughly &lt;code&gt;365
day/yr * 24 hr/day * 120 files/hr = 1,051,200&lt;/code&gt; files. It's a good
number. It's too many to read from disk in realtime, so this will
require preprocessing.&lt;/p&gt;
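&lt;p&gt;Spelled out as a quick check (the variable names are invented for this
example, and it assumes a plain 365-day year with no gaps):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;var secondsPerSample = 30;
var filesPerHour = 3600 / secondsPerSample;   // 120
var filesPerDay = 24 * filesPerHour;          // 2880
var filesPerYear = 365 * filesPerDay;         // 1051200
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
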
&lt;h4&gt;First sketch&lt;/h4&gt;
&lt;p&gt;Let's start with a rough picture of what it will take to fit the
dots onto one screen. One day's worth is &lt;code&gt;24 * 120 = 2880&lt;/code&gt; dots,
which is too many for one screen width of pixels, but if we can
divide it at least in half, we're getting closer. The 365-day year
is easier; we can multiply it by two or three and still fit a good
number of pixels in. So with this in mind, here's a &lt;code&gt;1440x730&lt;/code&gt; grid.&lt;/p&gt;
&lt;div id='sketch1'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 1440;
var height = 730;
var sketch1 = d3.select("#sketch1").append("svg")
    .attr("width", width)
    .attr("height", height);

var y = d3.scale.linear()
    .domain([0, 365])
    .range([0, height]);

d3.range(0, 365).forEach(function(ye, yi, ya) {
    sketch1.append("line")
        .attr("x1", 0)
        .attr("y1", y(ye))
        .attr("x2", width)
        .attr("y2", y(ye))
        .attr("stroke", "cadetblue")
        .attr("stroke-width", 1);
    }
);
&lt;/script&gt;

&lt;p&gt;Yah ok that's too wide.&lt;/p&gt;
&lt;p&gt;Let's try again at half the width, and just for fun (and because
vertical scrolling isn't so hard) let's make it taller.&lt;/p&gt;
&lt;div id='sketch2'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 720;
var height = 1095;
var sketch2 = d3.select("#sketch2").append("svg")
    .attr("width", width)
    .attr("height", height);

var y = d3.scale.linear()
    .domain([0, 365])
    .range([0, height]);

d3.range(0, 365).forEach(function(ye, yi, ya) {
    sketch2.append("line")
        .attr("x1", 0)
        .attr("y1", y(ye))
        .attr("x2", width)
        .attr("y2", y(ye))
        .attr("stroke", "cadetblue")
        .attr("stroke-width", 1);
    }
);
&lt;/script&gt;

&lt;p&gt;One more time, with a grid and some date scales to shape it all out better:&lt;/p&gt;
&lt;div id='sketch3'&gt;&lt;/div&gt;
&lt;style&gt;
.axis path,
.axis line {
    fill: none;
    stroke: black;
    shape-rendering: crispEdges;
}
.axis text {
    font-family: sans-serif;
    font-size: 11px;
}
&lt;/style&gt;
&lt;script&gt;
var padding = 40;
var width = 720 + padding;
var height = 1095 + padding;
var sketch3 = d3.select("#sketch3").append("svg")
    .attr("width", width)
    .attr("height", height);

var x = d3.scale.linear()
    .domain([0, 720])
    .range([padding, width]);
var y = d3.scale.linear()
    .domain([0, 365])
    .range([padding/2, height - padding/2]);

var x_hours = d3.scale.linear()
    .domain([0, 23])
    .range([padding, width]);
var y_months = d3.scale.linear()
    .domain([0, 12])
    .range([padding/2, height - padding/2]);

d3.range(0, 24).forEach(function(he, hi, ha) {
    sketch3.append("line")
        .attr("x1", x_hours(he))
        .attr("y1", y(0))
        .attr("x2", x_hours(he))
        .attr("y2", y(365))
        .attr("stroke", "#ccc")
        .attr("stroke-width", 2);
    }
);

d3.range(0, 13).forEach(function(me, mi, ma) {
    sketch3.append("line")
        .attr("x1", x(0))
        .attr("y1", y_months(me))
        .attr("x2", x(720))
        .attr("y2", y_months(me))
        .attr("stroke", "#ccc")
        .attr("stroke-width", 2);
    }
);

d3.range(0, 365).forEach(function(ye, yi, ya) {
    sketch3.append("line")
        .attr("x1", x(0))
        .attr("y1", y(ye))
        .attr("x2", x(720))
        .attr("y2", y(ye))
        .attr("stroke", "cadetblue")
        .attr("stroke-width", 1);
    }
);

var x_axis1 = d3.svg.axis()
    .scale(x_hours)
    .orient("bottom");
var x_axis2 = d3.svg.axis()
    .scale(x_hours)
    .orient("top");
sketch3.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(0, " + y(365 + 1) + ")")
    .call(x_axis1);
sketch3.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(0, " + y(0 - 1) + ")")
    .call(x_axis2);

var y_axis = d3.svg.axis()
    .scale(y_months)
    .orient("left");
sketch3.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(" + x(0 - 3) + ", 0)")
    .call(y_axis);

&lt;/script&gt;

&lt;h4&gt;Adding real data&lt;/h4&gt;
&lt;p&gt;Okay, now we're getting somewhere.  It's time to work with some
real data and place it onto the scales using dates and times. As a
first cut, I've extracted a file count for each hour in the dataset.
This resulted in a json file with content like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;...    
&amp;quot;2014-09-29 08:00:00Z&amp;quot;: 120, 
&amp;quot;2014-09-29 09:00:00Z&amp;quot;: 120, 
&amp;quot;2014-09-29 10:00:00Z&amp;quot;: 119, 
&amp;quot;2014-09-29 11:00:00Z&amp;quot;: 120, 
&amp;quot;2014-09-29 12:00:00Z&amp;quot;: 120, 
&amp;quot;2014-09-29 13:00:00Z&amp;quot;: 120, 
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Loading this into a sketch is easy with &lt;code&gt;d3.json()&lt;/code&gt;. The keys are
sorted, but just to be thorough I'll also use &lt;code&gt;d3.min()&lt;/code&gt; and
&lt;code&gt;d3.max()&lt;/code&gt; to get the first and last date/times from the set.&lt;/p&gt;
&lt;p&gt;The next piece of all this is to set the scales to use the dates.
I created that data file knowing that javascript should be able to
parse the dates cleanly; hopefully this will feed right into the
&lt;a href="https://github.com/mbostock/d3/wiki/Time-Scales"&gt;d3 time scaling
functions&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Finally, it'll all come together with line segments drawn in for
each hour. The number of files available (there should be 120
for each hour) will feed into a color scale. To see the contrast
of missing files well, the scale will have to follow a power curve rather
than a straight line (earlier discussion of which via Albers is written up
&lt;a href="http://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/"&gt;in this
post&lt;/a&gt;).
Once again, d3 helps us out, with the &lt;code&gt;d3.scale.pow()&lt;/code&gt; power
scaling function. To scale the input domain to the output range using
a power of two, we just set the exponent on the scale as well, and
use colors as the range:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;color_scale&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;#fff&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;#000&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This should make hours with missing files lighter: a missing file or two
will be barely noticeable, but more than a dozen or so should stand out.&lt;/p&gt;
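&lt;p&gt;To get a feel for what a power of two does to the counts, here is a rough
sketch of the arithmetic in plain JavaScript. The &lt;code&gt;powFraction&lt;/code&gt; helper is made
up for illustration and is not part of d3; it assumes a domain that starts at
zero, so the position along the color range is the count divided by the
maximum, raised to the exponent.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Hypothetical helper, not a d3 function: how far along the color range
// (0 is the light end, 1 is the dark end) a given hourly file count lands.
function powFraction(count, max, exponent) {
    return Math.pow(count / max, exponent);
}

var full = powFraction(120, 120, 2);        // 1, the darkest color
var minusOne = powFraction(119, 120, 2);    // about 0.98
var minusDozen = powFraction(108, 120, 2);  // about 0.81
var half = powFraction(60, 120, 2);         // 0.25
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;An hour missing a single file keeps about 98% of the full color, an hour
missing a dozen drops to about 81%, and a half-empty hour sits only a quarter
of the way toward the dark end, where a linear scale would put it at half.&lt;/p&gt;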
&lt;div id="sketch4"&gt;&lt;/div&gt;
&lt;style&gt;
.hour line {
    shape-rendering: crispEdges;
}
&lt;/style&gt;
&lt;script&gt;
var padding = 70;
var width = 1200 + padding;
var height = 1600 + padding;
var sketch4 = d3.select("#sketch4").append("svg")
    .attr("width", width)
    .attr("height", height);

// 120 is max number of files per hour
var color_scale = d3.scale.pow()
    .exponent(2)
    .domain([0, 120])
    .range(["lightsteelblue", "midnightblue"]);


d3.json("/data/20141001-filecounts.json", render);

var dataset;
var mindate;
var maxdate;

function render(e, json) {
    if (e) return console.warn(e);
    dataset = json;
    dataset.forEach(function(de, di, da) {
        // construct a correct Date object
        dataset[di].push(new Date(de[0]));
        // construct a UTC-midnight-anchored Date for y-positioning
        dataset[di].push(new Date(de[0].slice(0, 10)));
    });

    // adjust hours to anchor extremes at UTC-midnight
    mindate = new Date(d3.min(dataset, function(d) { return d[2]; }));
    maxdate = new Date(d3.max(dataset, function(d) { return d[2]; }));

    var x = d3.scale.linear()
        .domain([0, 24])
        .range([padding, width - padding/4]);

    var y = d3.time.scale()
        .domain([mindate, maxdate])
        .nice(d3.time.day)
        .rangeRound([padding/2, height - padding/2]);

    var hours = sketch4.selectAll(".hour")
        .data(dataset)
      .enter().append("line")
        .attr("class", "hour")
        .attr("x1", function(d, i) { return x(d[2].getHours()); })
        .attr("y1", function(d, i) { return y(d[3]); })
        .attr("x2", function(d, i) { return x(d[2].getHours() + 1); })
        .attr("y2", function(d, i) { return y(d[3]); })
        .attr("stroke", function(d) { return color_scale(d[1]); })
        .attr("title", function(d) { return d[0] + ": " + d[1] + " files";})
        .attr("stroke-width", 3.5);

    hours.append("svg:title")
        .attr("class", "hourtext")
        .text(function(d) { return d[0] + ": " + d[1] + " files"; });

    // vertical gridlines for hours
    d3.range(0, 24).forEach(function(he, hi, ha) {
        sketch4.append("line")
            .attr("x1", x(he))
            .attr("y1", y(mindate))
            .attr("x2", x(he))
            .attr("y2", y(maxdate))
            .attr("stroke", "#ccc")
            .attr("stroke-width", 2);
        }
    );

    var x_axis1 = d3.svg.axis()
        .scale(x)
        .orient("bottom");
    sketch4.append("g")
        .attr("class", "axis")
        .attr("transform", "translate(0, " + y(maxdate) + ")")
        .call(x_axis1);
    var x_axis2 = d3.svg.axis()
        .scale(x)
        .orient("top");
    sketch4.append("g")
        .attr("class", "axis")
        .attr("transform", "translate(0, " + y(mindate) + ")")
        .call(x_axis2);

    var y_axis = d3.svg.axis()
        .scale(y)
        .orient("left");
    sketch4.append("g")
        .attr("class", "axis")
        .attr("transform", "translate(" + x(0) + ", 0)")
        .call(y_axis);

};

&lt;/script&gt;

&lt;p&gt;That's the trick. This meets the purpose, but has two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The time zones are off. See the way the first day starts at 00:00
 but stops at 20:00? That's the four-hour adjustment for eastern (US)
 time, which is happening in a way I'm not controlling properly. You can
 see this for yourself by mousing over a 00:00 block; it will show 04:00
 as the hour. It doesn't make sense to do that because it introduces a
 discontinuity. More importantly, it's unclear what time we're looking at
 for any given block, and for this particular case the data was collected
 from a Chinese service, so it's doubly annoying for the researcher to
 have to correct for two offsets. Looks wrong, is wrong.&lt;/li&gt;
&lt;li&gt;Not working outside of Chrome. Need to debug.&lt;/li&gt;
&lt;/ul&gt;
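&lt;p&gt;One possible explanation for both problems, offered as a guess rather than
a confirmed diagnosis of the sketch above: &lt;code&gt;getHours()&lt;/code&gt; reports the hour in the
viewer's local time zone while &lt;code&gt;getUTCHours()&lt;/code&gt; reports it in UTC, and a
timestamp with a space between the date and the time is not the ISO 8601 form
that JavaScript engines are required to parse, so browsers may differ on it.
A minimal sketch, with a hypothetical &lt;code&gt;parseUtc&lt;/code&gt; helper:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Hypothetical helper: turn '2014-09-29 08:00:00Z' into the ISO 8601
// form '2014-09-29T08:00:00Z' before handing it to Date.
function parseUtc(stamp) {
    return new Date(stamp.replace(' ', 'T'));
}

var d = parseUtc('2014-09-29 08:00:00Z');
var utcHour = d.getUTCHours();   // 8, whatever the viewer's time zone
var localHour = d.getHours();    // varies with the viewer's time zone
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Positioning each block with &lt;code&gt;getUTCHours()&lt;/code&gt; would at least put every
viewer on the same clock.&lt;/p&gt;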
&lt;p&gt;For now I have to leave this aside to get back to other projects. It
will be good to circle back around to these to figure out how to get it
right. Can't leave it hanging.&lt;/p&gt;
&lt;p&gt;Fyi, I posted this up as a gist with similar text to be visible at
&lt;a href="http://bl.ocks.org/dchud/5b6f902d410e1e5253a1"&gt;bl.ocks.org/dchud&lt;/a&gt;.
If you want to poke at the code without futzing with the rest of
all this text, follow that link through or go right to &lt;a href="https://gist.github.com/dchud/5b6f902d410e1e5253a1"&gt;the original
gist&lt;/a&gt;.&lt;/p&gt;</content><category term="20141001-years-worth-of-dots"/></entry><entry><title>animating regression</title><link href="https://data.onebiglibrary.net/2014/09/18/animating-regression/" rel="alternate"/><published>2014-09-18T00:00:00-04:00</published><updated>2014-09-18T00:00:00-04:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-09-18:/2014/09/18/animating-regression/</id><summary type="html">&lt;p&gt;When performing a simple linear regression, it's important to review
all the diagnostic plots that come with it. If the residual errors
aren't normally distributed, you will have to rethink your model.
Like I referenced in an &lt;a href="http://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/"&gt;earlier
post&lt;/a&gt;
you can't just stop at the fit plot, even if it …&lt;/p&gt;</summary><content type="html">&lt;p&gt;When performing a simple linear regression, it's important to review
all the diagnostic plots that come with it. If the residual errors
aren't normally distributed, you will have to rethink your model.
As I noted in an &lt;a href="http://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/"&gt;earlier
post&lt;/a&gt;,
you can't just stop at the fit plot, even if it is pretty (here
courtesy of SAS):&lt;/p&gt;
&lt;p&gt;&lt;img alt="regression plot" src="https://data.onebiglibrary.net/2014/09/18/animating-regression/b3-simple-regression-plot.png"&gt;&lt;/p&gt;
&lt;p&gt;You have to review its diagnostics:&lt;/p&gt;
&lt;p&gt;&lt;img alt="regression diagnostics" src="https://data.onebiglibrary.net/2014/09/18/animating-regression/b3-simple-regression-diag.png"&gt;&lt;/p&gt;
&lt;p&gt;Typically in a set of diagnostic plots like this, you look first
at the top left chart to see if the residuals balance around 0. The
Q-Q plot below that should be close to the 45&amp;deg; line, the histogram
below that should look normal the way most of us know, and the
Cook's distance plot at middle right should show no outliers near
or above 1. Any of these plots going wrong should be a sign that
there's something amiss with your model. And this is all in addition
to reviewing the numbers that come out of the model, like the p-value
on the F test of the model, the R-square, the p-value on the t test
of the independent variable's coefficient, and the p-value of a normality test on
the residual errors.&lt;/p&gt;
&lt;p&gt;The trick is, though, it can take time to develop an intuitive feel
for how to read all these numbers and plots, even for a model that's
as (relatively) simple as linear regression.&lt;/p&gt;
&lt;p&gt;&lt;a href="http://d3js.org/"&gt;D3.js&lt;/a&gt; offers something better: the chance to
animate the relationships between these plots, with &lt;a href="http://bost.ocks.org/mike/constancy/"&gt;object
constancy&lt;/a&gt;. Maybe it would
be useful to do something like the transitions in the
&lt;a href="http://bost.ocks.org/mike/constancy/"&gt;showreel&lt;/a&gt;, moving from the main fit
plot through each of the diagnostic charts while keeping every point
in the dataset visibly the same object, so you can see where each
one lands in each plot. Let's try that.&lt;/p&gt;
&lt;h4&gt;Simple transitions&lt;/h4&gt;
&lt;p&gt;The first step is to wrap our heads around the timed transitions
in that showreel; I haven't done those before. The key seems to be
the use of &lt;code&gt;setTimeout(callback_function, delay)&lt;/code&gt; calls at the end
of each function in the &lt;a href="http://bl.ocks.org/mbostock/1256572"&gt;showreel
source&lt;/a&gt;.  Note that &lt;code&gt;setTimeout()&lt;/code&gt;
is a &lt;a href="http://ejohn.org/blog/how-javascript-timers-work/"&gt;JavaScript timer&lt;/a&gt;,
not a D3 function. &lt;/p&gt;
&lt;p&gt;This should be easy to replicate. To try it out, let's just draw a
box, then move it around.&lt;/p&gt;
&lt;div id='test1'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 200;
var height = 200;
var duration = 1000;
var delay = 1000;

var test1 = d3.select("#test1").append("svg")
    .attr("width", width)
    .attr("height", height);

var box = test1.append("rect")
    .attr("id", "box")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", 100)
    .attr("height", 100)
    .attr("fill", "darkolivegreen");

setTimeout(move_right, duration);

function move_right() {
    test1.select("#box").transition()
        .duration(duration)
        .attr("x", 100);
    setTimeout(move_down, delay + duration);
}

function move_down() {
    test1.select("#box").transition()
        .duration(duration)
        .attr("y", 100);
    setTimeout(move_left, delay + duration);
}

function move_left() {
    test1.select("#box").transition()
        .duration(duration)
        .attr("x", 0);
    setTimeout(move_up, delay + duration);
}

function move_up() {
    test1.select("#box").transition()
        .duration(duration)
        .attr("y", 0);
    setTimeout(move_right, delay + duration);
}

&lt;/script&gt;

&lt;p&gt;This is pretty straightforward: we draw a &lt;code&gt;rect&lt;/code&gt;, then we set the
first timeout to call one of four similar functions that does what
you'd expect:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;setTimeout(move_right, duration);

function move_right() {
    test1.select(&amp;quot;#box&amp;quot;).transition()
        .duration(duration)
        .attr(&amp;quot;x&amp;quot;, 100);
    setTimeout(move_down, delay + duration);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;move_right()&lt;/code&gt; uses d3's &lt;code&gt;transition()&lt;/code&gt; to shift &lt;code&gt;x&lt;/code&gt; over to
100, then sets a timeout for &lt;code&gt;move_down()&lt;/code&gt;, which shifts &lt;code&gt;y&lt;/code&gt; to 100
and sets a similar timeout for &lt;code&gt;move_left()&lt;/code&gt;. That hands off to
&lt;code&gt;move_up()&lt;/code&gt;, which goes back to &lt;code&gt;move_right()&lt;/code&gt;, and we
have an endless loop of timed transitions. This might not be a
model for building UI event-driven animations, of course, but we
can settle on this kind of showreel-style series of repeating
transitions to show a cycle of plots.&lt;/p&gt;
&lt;p&gt;Note that the &lt;code&gt;setTimeout&lt;/code&gt; delay on each callback isn't just &lt;code&gt;delay&lt;/code&gt;
but is rather &lt;code&gt;delay + duration&lt;/code&gt;. The timer starts counting at the
same moment the transition starts, so if we don't add &lt;code&gt;duration&lt;/code&gt;, the
timer will fire at nearly the same time the transition finishes (both are
one second here)! The duration
means the transition will take &lt;code&gt;duration&lt;/code&gt; milliseconds, but JavaScript
still executes the following call to &lt;code&gt;setTimeout&lt;/code&gt; immediately, so
we have to set the timeout to something longer or the box will
never appear to "pause" between transitions.&lt;/p&gt;
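&lt;p&gt;The nominal timeline can be worked out with a little arithmetic. This
sketch uses a made-up &lt;code&gt;schedule&lt;/code&gt; helper (not part of d3 or the showreel) and
assumes timers fire exactly on time, which real browsers only approximate:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Hypothetical helper: nominal start and end time (ms) of each transition
// when the first timeout is 'first' and each later one is delay + duration.
function schedule(steps, first, duration, delay) {
    var t = first;
    return steps.map(function(name) {
        var entry = {step: name, starts: t, ends: t + duration};
        t = t + delay + duration;
        return entry;
    });
}

var plan = schedule(['right', 'down', 'left', 'up'], 1000, 1000, 1000);
// gap between the end of one move and the start of the next
var pause = plan[1].starts - plan[0].ends;   // 1000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With a timeout of only &lt;code&gt;delay&lt;/code&gt;, that gap would shrink to
&lt;code&gt;delay - duration&lt;/code&gt;, which is zero when the two are equal.&lt;/p&gt;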
&lt;h4&gt;Adding constancy&lt;/h4&gt;
&lt;p&gt;The next trick is to do the same thing but with multiple moving
points based on data. To do this, we'll expand our model above to
include a simple four-value dataset. We'll still move it right,
down, left, up, then right again, but in each of these quadrants
we'll use a different set of scales to position each element. It's
important to use d3's &lt;a href="http://alignedleft.com/tutorials/d3/binding-data"&gt;data
binding&lt;/a&gt; for this
rather than, say, a few &lt;code&gt;circle&lt;/code&gt; and &lt;code&gt;rect&lt;/code&gt; elements we could draw
by hand because ultimately we will want to bind real data from a
regression.&lt;/p&gt;
&lt;p&gt;We'll use the same structure - four functions with obvious names.
The first time through, we'll place circles using the straight
values as their x and y positions in the upper left quadrant, then
for each of the other functions we'll use different scales to slide
them around inside each following quadrant. I've added lines to 
help distinguish the quadrants.&lt;/p&gt;
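&lt;p&gt;Each quadrant's scale is just a linear remapping of the same four values.
As a rough, d3-free sketch (the &lt;code&gt;linear&lt;/code&gt; helper is a simplified stand-in
written for this example, not d3's implementation), a reversed range flips the
order of the points and a wider range spreads them out. The two ranges are the
ones used for the right and bottom quadrants in the script for this demo.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;// Hypothetical stand-in for d3.scale.linear(): map a value from
// domain [d0, d1] onto range [r0, r1] by linear interpolation.
function linear(domain, range) {
    return function(v) {
        var t = (v - domain[0]) / (domain[1] - domain[0]);
        return range[0] + t * (range[1] - range[0]);
    };
}

var data = [15, 38, 67, 85];
// reversed range: larger values land at smaller positions
var flipped = data.map(linear([0, 100], [70, 30]));
// wider range: the same values spread across most of the quadrant
var spread = data.map(linear([0, 100], [10, 90]));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The first value, 15, lands at about 64 with the reversed range and about
22 with the wider one, while the last value, 85, lands at about 36 and 78.&lt;/p&gt;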
&lt;div id='test2'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 200;
var height = 200;
var duration = 1000;
var delay = 1000;

var data = [15, 38, 67, 85];

var test2 = d3.select("#test2").append("svg")
    .attr("width", width)
    .attr("height", height);

var color_scale = d3.scale.ordinal()
    .domain([0, 1, 2, 3])
    .range(["darkgoldenrod", "firebrick", "navajowhite", "slategrey"]);

test2.append("line")
    .attr("x1", 100)
    .attr("y1", 0)
    .attr("x2", 100)
    .attr("y2", 200)
    .attr("stroke", "#bbb");

test2.append("line")
    .attr("x1", 0)
    .attr("y1", 100)
    .attr("x2", 200)
    .attr("y2", 100)
    .attr("stroke", "#bbb");

var g = test2.selectAll("g")
    .data(data)
    .enter().append("g")
        .attr("class", "object");

g.each(function(d, i) {
    var o = d3.select(this);
    o.append("circle")
        .attr("r", 15)
        .attr("cx", d)
        .attr("cy", d)
        .attr("fill-opacity", ".80")
        .attr("fill", color_scale(i));
});

setTimeout(move_right2, duration);

function move_right2() {
    var x = d3.scale.linear()
        .domain([0, 100])
        .range([70, 30]);

    var c = test2.selectAll(".object");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select("circle").transition()
            .duration(duration)
            .attr("cx", x(d))
            .attr("cy", x(d))
            .attr("transform", "translate(100, 0)");
    });

    setTimeout(move_down2, delay + duration);
}

function move_down2() {
    var x = d3.scale.linear()
        .domain([0, 100])
        .range([10, 90]);

    var c = test2.selectAll(".object");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select("circle").transition()
            .duration(duration)
            .attr("cx", x(d))
            .attr("cy", x(d))
            .attr("transform", "translate(100, 100)");
    });
    setTimeout(move_left2, delay + duration);
}

function move_left2() {
    var x = d3.scale.linear()
        .domain([0, 100])
        .range([60, 40]);

    var c = test2.selectAll(".object");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select("circle").transition()
            .duration(duration)
            .attr("cx", x(d))
            .attr("cy", x(d))
            .attr("transform", "translate(0, 100)");
    });
    setTimeout(move_up2, delay + duration);
}

function move_up2() {
    var x = d3.scale.linear()
        .domain([0, 100])
        .range([0, 100]);

    var c = test2.selectAll(".object");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.select("circle").transition()
            .duration(duration)
            .attr("cx", x(d))
            .attr("cy", x(d))
            .attr("transform", "translate(0, 0)");
    });
    setTimeout(move_right2, delay + duration);
}

&lt;/script&gt;

&lt;p&gt;This works well once you are clear about the scope of the
object you want to operate on. First we create a set of SVG &lt;code&gt;g&lt;/code&gt;
&lt;a href="https://developer.mozilla.org/en-US/docs/Web/SVG/Element/g"&gt;group
elements&lt;/a&gt;, and
place a &lt;code&gt;circle&lt;/code&gt; inside each:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;test2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selectAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;g&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;g&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;class&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;object&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;circle&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;r&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cy&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fill-opacity&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.80&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fill&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;color_scale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This sets us up with the basic set of "data points" we'll move around.
Then we just start firing up transitions like before, using 
&lt;code&gt;setTimeout()&lt;/code&gt;, but with each move function doing a little more:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;move_left2&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;range&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;test2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selectAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.object&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;circle&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;transition&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cy&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;transform&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;translate(0, 100)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;move_up2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;delay&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;First we define a new scale for each quadrant, changing the output
range. In this case the range of &lt;code&gt;[60, 40]&lt;/code&gt; reverses the ordering and places the circles
in a narrow band just 20 pixels wide. Next, we select the &lt;code&gt;.object&lt;/code&gt;s
we created, which pulls up those &lt;code&gt;g&lt;/code&gt;s we started with, and loop
through the set of them, firing off a transition that moves the &lt;code&gt;cx&lt;/code&gt;
and &lt;code&gt;cy&lt;/code&gt; of each according to the new scale and also resets the
coordinate space to each quadrant in turn.&lt;/p&gt;
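&lt;p&gt;To make the effect of that reversed range concrete, here is a minimal
sketch of the same mapping without D3. The &lt;code&gt;linearScale&lt;/code&gt; helper is a
hypothetical stand-in for &lt;code&gt;d3.scale.linear()&lt;/code&gt;, not D3's implementation,
and the sample values in &lt;code&gt;data&lt;/code&gt; are assumed for illustration:&lt;/p&gt;

```javascript
// linearScale is a hypothetical stand-in for d3.scale.linear(),
// written out so the mapping is visible; it is not D3's implementation.
function linearScale(domain, range) {
    return function(value) {
        // Fraction of the way through the input domain.
        var t = (value - domain[0]) / (domain[1] - domain[0]);
        // Same fraction of the way through the output range.
        return range[0] + t * (range[1] - range[0]);
    };
}

var data = [0, 25, 50, 75, 100]; // assumed sample values
var x = linearScale([0, 100], [60, 40]);
var positions = data.map(function(d) { return x(d); });
// positions is [60, 55, 50, 45, 40]: order reversed, 20 pixels wide
```

&lt;p&gt;The largest input lands at the smallest output, which is why the circles
swap ends as they bunch together.&lt;/p&gt;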
&lt;h4&gt;Simulating a regression&lt;/h4&gt;
&lt;p&gt;We'll use a more substantial dataset when we put it all together,
but for now let's assemble a small dataset and sketch a fit plot
and residual plot transitioning back and forth. I've made up some
values and used R to generate a regression (&lt;code&gt;d&lt;/code&gt; is just the same
data as in the JavaScript below):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Call&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
&lt;span class="n"&gt;lm&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;formula&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;~&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;

&lt;span class="n"&gt;Coefficients&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Intercept&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mf"&gt;18.000&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="mf"&gt;9.036&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
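&lt;p&gt;As a cross-check on the R output above, the same coefficients can be
recovered with ordinary least squares in a few lines of plain JavaScript.
This is only a sketch: &lt;code&gt;leastSquares&lt;/code&gt; is a hypothetical helper that is
not part of D3, and it assumes the predictor is simply the index 0 through
10, as in the R call:&lt;/p&gt;

```javascript
// Ordinary least squares of each value against its index 0..n-1.
// leastSquares is a hypothetical helper, not part of D3.
function leastSquares(values) {
    var n = values.length;
    var meanX = (n - 1) / 2; // mean of the indices 0..n-1
    var meanY = values.reduce(function(sum, v) { return sum + v; }, 0) / n;
    var sxy = 0;
    var sxx = 0;
    values.forEach(function(v, i) {
        sxy += (i - meanX) * (v - meanY);
        sxx += (i - meanX) * (i - meanX);
    });
    var slope = sxy / sxx;
    return { slope: slope, intercept: meanY - slope * meanX };
}

var data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
var fit = leastSquares(data);
console.log(fit.slope.toFixed(3), fit.intercept.toFixed(3));
// prints: 9.036 18.000
```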

&lt;p&gt;We can use this regression line to get a feel for transitioning
more elements together, and for some of the extra elements we'll
want to add to make things pop a bit.&lt;/p&gt;
&lt;div id='sim'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 400;
var height = 400;
var duration = 1000;
var delay = 1000;

var data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];

var slope = 9.036;
var intercept = 18;
function expected(index) {
    return (slope * index) + intercept;
};

var sim = d3.select("#sim").append("svg")
    .attr("width", width)
    .attr("height", height);

var padding = 20;

var x = d3.scale.linear()
    .domain([0, data.length])
    .range([padding, width - padding]);
var y = d3.scale.linear()
    .domain([d3.min(data), d3.max(data)])
    .range([height - padding, padding]);

var g = sim.selectAll("g")
    .data(data)
    .enter().append("g")
        .attr("class", "object");

sim.append("line")
    .attr("id", "line")
    .attr("x1", x(0))
    .attr("y1", y(intercept))
    .attr("x2", x(11))
    .attr("y2", y(expected(11)))
    .attr("stroke-width", 2)
    .attr("stroke", "steelblue");

g.each(function(d, i) {
    var o = d3.select(this);
    o.attr("class", "observation");
    o.append("line")
        .attr("x1", x(i))
        .attr("y1", y(d))
        .attr("x2", x(i))
        .attr("y2", y(expected(i)))
        .attr("stroke-width", 2)
        .attr("stroke", "gray");
    o.append("circle")
        .attr("r", 5)
        .attr("cx", x(i))
        .attr("cy", y(d))
        .attr("stroke", "black")
        .attr("fill", "darkslategrey");
});


setTimeout(residual, delay);

function fit() {
    var line = sim.select("#line").transition()
        .duration(duration)
        .attr("y1", y(intercept))
        .attr("y2", y(expected(11)));

    var c = sim.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.transition()
            .duration(duration)
            .attr("transform", "translate(0, 0)");
    });

    setTimeout(residual, delay + duration);
};

function residual() {
    var line = sim.select("#line").transition()
        .duration(duration)
        .attr("y1", height/2)
        .attr("y2", height/2);

    var c = sim.selectAll(".observation");
    c.each(function(d, i) {
        var o = d3.select(this);
        o.transition()
            .duration(duration)
            .attr("transform", "translate(0, " + (200 - y(expected(i))) + ")");
    });

    setTimeout(fit, delay + duration);
};
&lt;/script&gt;

&lt;p&gt;For the regression, we apply the results R gave us to define the
slope, intercept, and a function that returns expected values from
the model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;slope&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.036&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;intercept&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slope&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;intercept&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Given the index number of a data value, this function returns
what the model expects the data value to be. We can then
use it whenever we need to plot a residual, starting with the original
rendering of the data points and residual lines against the model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;d3&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;class&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;observation&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;line&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;y1&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;x2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;y2&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;stroke-width&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;stroke&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gray&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;circle&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;r&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;cy&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;stroke&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;black&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fill&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;darkslategrey&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each residual line is vertical, so both of its endpoints share the
x position &lt;code&gt;x(i)&lt;/code&gt;, taken from the index number. The segment representing the residual
error starts at the actual value &lt;code&gt;d&lt;/code&gt; and ends at the expected value
&lt;code&gt;expected(i)&lt;/code&gt;, with both adjusted to the y-scale using &lt;code&gt;y()&lt;/code&gt;. Then,
in the residual view/function, we just have to rotate the model
line to "level" (&lt;code&gt;height/2&lt;/code&gt;, which is 200 here) and translate each &lt;code&gt;g&lt;/code&gt;-wrapped residual
line and data point vertically by level minus the y-scale-adjusted expected
value from the model:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="na"&gt;.attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;transform&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;translate(0, &amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="no"&gt;y&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;)&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="c1"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And when we switch back to the "fit" view, we just translate them
back again to &lt;code&gt;(0, 0)&lt;/code&gt;, and rotate the model line back to the
original regression slope.&lt;/p&gt;
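&lt;p&gt;To see why that translation flattens the plot, it can help to work the
arithmetic through without any rendering. The sketch below recomputes the
vertical offset for each observation; the plain function &lt;code&gt;y&lt;/code&gt; is a
hypothetical stand-in for the D3 y-scale, built from the same domain and
range, and &lt;code&gt;shifted&lt;/code&gt; is a name introduced only for this example:&lt;/p&gt;

```javascript
var height = 400;
var padding = 20;
var data = [15, 22, 34, 53, 48, 60, 95, 79, 88, 109, 92];
var slope = 9.036;
var intercept = 18;

function expected(index) {
    return (slope * index) + intercept;
}

// Plain-function equivalent of the y-scale (hypothetical helper):
// domain [min, max] of the data, range [height - padding, padding].
var lo = Math.min.apply(null, data);
var hi = Math.max.apply(null, data);
function y(value) {
    var t = (value - lo) / (hi - lo);
    return (height - padding) + t * (padding - (height - padding));
}

// For each observation: the vertical shift applied to its g, and where
// the two ends of its residual line sit after the shift.
var shifted = data.map(function(d, i) {
    var offset = (height / 2) - y(expected(i));
    return {
        offset: offset,
        modelEnd: y(expected(i)) + offset, // height / 2, up to rounding
        circle: y(d) + offset              // the data point itself
    };
});
```

&lt;p&gt;Every &lt;code&gt;modelEnd&lt;/code&gt; comes out at the level line, so only the residuals
remain as vertical distance. Because SVG's y axis grows downward, a point
above the model ends up with a &lt;code&gt;circle&lt;/code&gt; value smaller than
&lt;code&gt;height/2&lt;/code&gt;.&lt;/p&gt;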
&lt;p&gt;This feels like a good stopping point for today. Next time, we'll
pick up from here, add the additional diagnostic plots, and fill
out each stage with axes and other niceties as appropriate.&lt;/p&gt;</content><category term="20140918-animating-regression"/></entry><entry><title>Albers color studies in D3.js, part 2</title><link href="https://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/" rel="alternate"/><published>2014-09-04T00:00:00-04:00</published><updated>2014-09-04T00:00:00-04:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-09-04:/2014/09/04/albers-color-studies-part-2/</id><summary type="html">&lt;p&gt;(See also part one, &lt;a href="http://data.onebiglibrary.net/2014/08/08/simple-color-relationships/"&gt;simple color relationships
w/d3&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Picking up where we left off, in the middle of Josef Albers'
&lt;a href="http://yupnet.org/interactionofcolor/"&gt;Interaction of Color&lt;/a&gt; (Yale
Press's iPad edition), his study of the "middle mixture" affords a
chance to bring in &lt;a href="http://d3js.org/"&gt;D3.js&lt;/a&gt; support for animations
and transitions.&lt;/p&gt;
&lt;p&gt;In this study …&lt;/p&gt;</summary><content type="html">&lt;p&gt;(See also part one, &lt;a href="http://data.onebiglibrary.net/2014/08/08/simple-color-relationships/"&gt;simple color relationships
w/d3&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;Picking up where we left off, in the middle of Josef Albers'
&lt;a href="http://yupnet.org/interactionofcolor/"&gt;Interaction of Color&lt;/a&gt; (Yale
Press's iPad edition), his study of the "middle mixture" affords a
chance to bring in &lt;a href="http://d3js.org/"&gt;D3.js&lt;/a&gt; support for animations
and transitions.&lt;/p&gt;
&lt;p&gt;In this study, Albers chooses a trio of colors in which the middle
color is a mixture of the other two.  He recommends sliding
the lowest part up slowly, so we can observe how the increased ratio
of the darker color draws out that color's contribution to
the mix; then, as you slide it back away again, you can see the
top (lighter) color come through in the middle mixture. Concentrate
on the middle block as the lower one moves up and down, and you can
also see an illusory gradient effect near the top and bottom.&lt;/p&gt;
&lt;div id='middle'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 450, height = 720;
var svg = d3.select("#middle").append("svg")
    .attr("width", width)
    .attr("height", height);

var color_light = '#F5F57F';
var color_middle = '#C4BF7E';
var color_dark = '#918763';

// top block, light
var block_top = svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", width)
    .attr("height", height / 3)
    .attr("fill", color_light);

// middle block
var block_middle = svg.append("rect")
    .attr("x", 0)
    .attr("y", 240)
    .attr("width", width)
    .attr("height", height / 3)
    .attr("fill", color_middle);

// bottom block, dark
var block_bottom = svg.append("rect")
    .attr("x", 0)
    .attr("y", 480)
    .attr("width", width)
    .attr("height", height / 3)
    .attr("fill", color_dark);

var animate = function() {
    block_bottom
        .transition()
            .delay(1000)
            .duration(5000)
            .attr("y", 280)
            .ease("quad-in-out")
        .transition()
            .duration(5000)
            .attr("y", 480)
            .ease("quad-in-out")
            .each("end", animate);
};
animate();

&lt;/script&gt;

&lt;hr /&gt;

&lt;p&gt;This study demonstrates how varying the quantity of each color present 
affects the relationships between colors and the overall feeling of a
design even when the structure isn't altered in any other way.  All of
these use the same four colors and the same overall shape.&lt;/p&gt;
&lt;div id='juxtaposition'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 630, height = 600;
var svg = d3.select("#juxtaposition").append("svg")
    .attr("width", width)
    .attr("height", height);

var xpad = 40;
var ypad = 30;

var xscale = d3.scale.linear()
    .domain([0, 4])
    .range([0, width - (xpad * 3)]);
var yscale = d3.scale.linear()
    .domain([0, 6])
    .range([0, height - (ypad * 5)]);

// colors
var pink = '#DEBAD0';
var grey = '#C4C2D1';
var red = '#C95B44';
var green = '#526B5C';
var colors = [pink, grey, red, green];

var interiors = [
    // pink column
    [[red, grey, green],
    [red, green, grey],
    [grey, green, red],
    [grey, red, green],
    [green, grey, red],
    [green, red, grey]],
    // grey column
    [[pink, green, red],
    [pink, red, green],
    [red, pink, green],
    [red, green, pink],
    [green, red, pink],
    [green, pink, red]],
    // red column
    [[pink, grey, green],
    [pink, green, grey],
    [grey, pink, green],
    [grey, green, pink],
    [green, grey, pink],
    [green, pink, grey]],
    // green column
    [[pink, grey, red],
    [pink, red, grey],
    [grey, red, pink],
    [grey, pink, red],
    [red, pink, grey],
    [red, grey, pink]]
    ];



colors.forEach(function(ce, ci, ca) {
    d3.range(0, 6).forEach(function(ye, yi, ya) {
        // outer box
        svg.append("rect")
            .attr("x", xscale(ci))
            .attr("y", yscale(yi))
            .attr("width", 100)
            .attr("height", 63)
            .attr("fill", ce);
        svg.append("rect")
            .attr("x", xscale(ci) + 10)
            .attr("y", yscale(yi) + 12)
            .attr("width", 80)
            .attr("height", 48)
            .attr("fill", interiors[ci][yi][0]);
        svg.append("rect")
            .attr("x", xscale(ci) + 17)
            .attr("y", yscale(yi) + 18)
            .attr("width", 66)
            .attr("height", 24)
            .attr("fill", interiors[ci][yi][1]);
        svg.append("rect")
            .attr("x", xscale(ci) + 23)
            .attr("y", yscale(yi) + 22)
            .attr("width", 54)
            .attr("height", 18)
            .attr("fill", interiors[ci][yi][2]);
    });
});

&lt;/script&gt;

&lt;p&gt;Each has its own distinct feel, right? Taken together they seem to
dance chaotically, and it's not particularly pleasant, but its goal
is instructive, of course, not aesthetic. Albers suggests using
sheets of paper or your hands to block out smaller sets to look at
in turn: a row, a column, etc., and considering which combinations
are your favorite and why.&lt;/p&gt;
&lt;p&gt;In writing this one up I waffled between writing a routine to
generate the color permutations and laying them out explicitly like
the example in the book, and I ended up matching the book explicitly.
The rest of these exercises have tried to match the book closely,
so it seemed okay to just iterate over an array of arrays that had
been lined up by hand. I also did a lot of pixel-nudging to get the
boxes to line up just so (hence punting on fixing the extra white
space at the bottom).&lt;/p&gt;
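&lt;p&gt;For comparison, the generated route might have looked roughly like
the sketch below. The &lt;code&gt;permutations&lt;/code&gt; helper and the
&lt;code&gt;generated&lt;/code&gt; array are hypothetical names that don't appear in
the code above, and the orderings come out in a different sequence than
the hand-arranged &lt;code&gt;interiors&lt;/code&gt; array, so this would not match
the book's layout without further rearranging.&lt;/p&gt;

```javascript
// Hypothetical helper, not part of the study above:
// return every ordering of a list of items.
var permutations = function(items) {
    if (items.length === 0) {
        return [[]];
    }
    var result = [];
    items.forEach(function(item, i) {
        // everything except the item at position i
        var rest = items.slice(0, i).concat(items.slice(i + 1));
        permutations(rest).forEach(function(p) {
            result.push([item].concat(p));
        });
    });
    return result;
};

// color names stand in for the hex values used in the study
var names = ["pink", "grey", "red", "green"];

// one column per outer color, each holding the six orderings
// of the other three colors
var generated = names.map(function(outer) {
    var others = names.filter(function(c) { return c !== outer; });
    return permutations(others);
});
```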
&lt;hr /&gt;

&lt;p&gt;This next study is a similar look at color mixture. The individual
lines are easy enough to lay out with D3 scales, but making
them look uneven/wobbly is a bit of a challenge.&lt;/p&gt;
&lt;div id='wobbly'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 420, height = 720;
var svg = d3.select("#wobbly").append("svg")
    .attr("width", width)
    .attr("height", height);

var padding = 10;

var orange = '#D46D42';
var violet = '#A29BD1';
var grey = '#DCE3E1';
var green = '#476E51';
var colors = [green, orange, violet, grey];

var xscale = d3.scale.linear()
    .domain([0, 18])
    .range([padding, width-(padding * 2)]);
var yscale = d3.scale.linear()
    .domain([0, 4])
    .range([padding, height-(padding * 2)]);

var skewscale = d3.scale.linear()
    .domain([0, 1])
    .range([-1, 1]);
var skewer = function() {
    return skewscale(Math.random());
};

var sizescale = d3.scale.linear()
    .domain([0, 1])
    .range([.96, 1.04]);
var scaler = function() {
    return sizescale(Math.random());
};

var rotatescale = d3.scale.linear()
    .domain([0, 1])
    .range([-2, 2]);
var rotater = function() {
    return rotatescale(Math.random());
};

// backgrounds
colors.forEach(function(ce, ci, ca) {
    svg.append("rect")
        .attr("x", padding)
        .attr("y", yscale(ci))
        .attr("width", width - (padding * 2))
        .attr("height", (height - (padding * 2)) / 4)
        .attr("fill", ce);
});

var transformer = function() {
    return "scale(" + scaler() + ") rotate(" + rotater() + ")"; // skewX(" + skewer() + ") skewY(" + skewer() + ")";
};

[violet, grey, green, orange].forEach(function(ce, ci, ra) {
    svg.append("rect")
        .attr("x", 0)
        .attr("y", yscale(ci))
        .attr("width", width / 17 - 8)
        .attr("height", height / 4)
        .attr("fill", ce);
});

[grey, green, orange, violet].forEach(function(ce, ci, ra) {
    svg.append("rect")
        .attr("x", xscale(18))
        .attr("y", yscale(ci))
        .attr("width", width / 17 - 8)
        .attr("height", height / 4)
        .attr("fill", ce);
});

[green, grey, violet, orange].forEach(function(ce, ci, ra) {
    d3.range(0, 18).forEach(function(re, ri, ra) {
        svg.append("rect")
            .attr("x", 0)
            .attr("y", 0)
            .attr("transform", "translate(" + (xscale(re) + 2) + ", " + yscale(3 - ci) + ") scale(" + scaler() + ") skewX(" + skewer() + ") skewY(" + skewer() + ") rotate(" + rotater() + ", " + width/36 + ", " + height/8 + ")")
            .attr("width", width / 17 - 8)
            .attr("height", height / 4)
            .attr("fill", ce);
    })
});

&lt;/script&gt;

&lt;p&gt;This mostly recreates the effect of the study in the book but is
unsatisfying on a few counts. The "wobble" of the individual patches
is decent, but they should scale and skew a little more. The ordering
of the stacking throws off the effect, and the y-skew is a little
too great. Perhaps the biggest issue is that using &lt;code&gt;translate()&lt;/code&gt;
to locate each strip in its place pins the top-left &lt;code&gt;(x,y)&lt;/code&gt; to too
fixed a point; it needs to vary more. There is a lot going on
in the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/SVG/Attribute/transform"&gt;SVG transform
attributes&lt;/a&gt;
that I don't fully understand, largely around the shift in coordinate
systems, and that's holding me back from developing the right
approach to skewing and placing each strip correctly. I'll have to
revisit this to wrap my head around it more fully.&lt;/p&gt;
&lt;p&gt;This is definitely the most disappointing recreation of studies
from the book I've done so far.&lt;/p&gt;
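&lt;p&gt;One way to reason about the coordinate shift is to multiply the
transforms out by hand. The sketch below is not part of the study; it
assumes the usual SVG rule that a transform list behaves like nested
groups, so the rightmost transform is applied to the shape's points
first. The helper names (&lt;code&gt;multiply&lt;/code&gt;, &lt;code&gt;compose&lt;/code&gt;,
&lt;code&gt;applyTo&lt;/code&gt;) are made up for illustration.&lt;/p&gt;

```javascript
// Each transform is an SVG-style matrix [a, b, c, d, e, f], meaning
// x' = a*x + c*y + e and y' = b*x + d*y + f.
var translate = function(tx, ty) {
    return [1, 0, 0, 1, tx, ty];
};
var rotate = function(degrees) {
    var r = degrees * Math.PI / 180;
    return [Math.cos(r), Math.sin(r), -Math.sin(r), Math.cos(r), 0, 0];
};

// matrix product m * n: n acts on the point first, then m
var multiply = function(m, n) {
    return [
        m[0] * n[0] + m[2] * n[1],
        m[1] * n[0] + m[3] * n[1],
        m[0] * n[2] + m[2] * n[3],
        m[1] * n[2] + m[3] * n[3],
        m[0] * n[4] + m[2] * n[5] + m[4],
        m[1] * n[4] + m[3] * n[5] + m[5]
    ];
};

// combine a list written in the same order as the transform attribute
var compose = function(list) {
    return list.reduce(function(m, n) { return multiply(m, n); });
};

var applyTo = function(m, point) {
    return [
        m[0] * point[0] + m[2] * point[1] + m[4],
        m[1] * point[0] + m[3] * point[1] + m[5]
    ];
};

var corner = [10, 0];
// "translate(100, 0) rotate(90)": rotate about the origin, then move
var movedLast = applyTo(compose([translate(100, 0), rotate(90)]), corner);
// "rotate(90) translate(100, 0)": move, then rotate about the origin
var rotatedLast = applyTo(compose([rotate(90), translate(100, 0)]), corner);
```

&lt;p&gt;After rounding away floating point noise, &lt;code&gt;movedLast&lt;/code&gt; comes
out at (100, 10) and &lt;code&gt;rotatedLast&lt;/code&gt; at (0, 110), which suggests
that where &lt;code&gt;translate()&lt;/code&gt; sits in the list matters as much as
the values passed to it.&lt;/p&gt;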
&lt;hr /&gt;

&lt;h4&gt;The Weber-Fechner Law&lt;/h4&gt;
&lt;p&gt;Wikipedia points out &lt;a href="http://en.wikipedia.org/wiki/Weber%E2%80%93Fechner_law"&gt;discrepancies in the term "Weber-Fechner
law"&lt;/a&gt;, but
that's how Albers referred to the difference between the
quantitative and perceptual effect of layering color, so I'll use it
here: linear additions
seem to lead to logarithmic effects, and exponential additions seem
to lead to linear effects. In the book this study uses translucence,
so I'll stick with SVG's opacity support to recreate it.&lt;/p&gt;
&lt;p&gt;Arithmetic increases in application of color here, by way of stacking,
lead to only slight shifts in the perceived color.&lt;/p&gt;
&lt;div id='yellow-stack'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 600, height = 600;
var svg = d3.select("#yellow-stack").append("svg")
    .attr("width", width)
    .attr("height", height);

var yellow = '#D7E650';
var opacity = '0.75';

// two horizontals
svg.append("rect")
    .attr("x", 0)
    .attr("y", 300)
    .attr("width", 500)
    .attr("height", 175)
    .attr("fill-opacity", opacity)
    .attr("fill", yellow);
svg.append("rect")
    .attr("x", 50)
    .attr("y", 220)
    .attr("width", 500)
    .attr("height", 175)
    .attr("fill-opacity", opacity)
    .attr("fill", yellow);

// two verticals
svg.append("rect")
    .attr("x", 100)
    .attr("y", 50)
    .attr("width", 175)
    .attr("height", 500)
    .attr("fill-opacity", opacity)
    .attr("fill", yellow);
svg.append("rect")
    .attr("x", 180)
    .attr("y", 130)
    .attr("width", 175)
    .attr("height", 500)
    .attr("fill-opacity", opacity)
    .attr("fill", yellow);

&lt;/script&gt;

&lt;p&gt;Ah, this recreates the effect of the study in the book much more
effectively than the previous one (a relief). With 75% &lt;code&gt;fill-opacity&lt;/code&gt;
we can trace the distinct shades of color as 2, 3, and 4 patches
are overlaid in different spots. The difference from one to two is
much greater than the difference from three to four.&lt;/p&gt;
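&lt;p&gt;The diminishing steps can be worked out directly. Assuming ordinary
alpha compositing of identical patches over a plain background, each
new layer covers the same fraction of whatever background still shows
through, so n layers at opacity a cover 1 - (1 - a)^n of it. The
&lt;code&gt;coverage&lt;/code&gt; function below is a made-up helper for
illustration, not something D3 or SVG provides.&lt;/p&gt;

```javascript
// fraction of the background hidden by a stack of identical layers,
// assuming each layer hides `alpha` of whatever still shows through
var coverage = function(alpha, layers) {
    return 1 - Math.pow(1 - alpha, layers);
};

var stacked = [1, 2, 3, 4].map(function(n) {
    return coverage(0.75, n);
});
// stacked is [0.75, 0.9375, 0.984375, 0.99609375]
```

&lt;p&gt;Under that assumption the jump from one layer to two is about 0.19,
while the jump from three to four is just over 0.01.&lt;/p&gt;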
&lt;hr /&gt;

&lt;p&gt;This next study repeats a similar process, showing off the difference
between linear and exponential layer addition. At left, each succeeding
strip from top to bottom has one additional layer beyond the one above
it; at right, each strip has twice as many layers as the one above it.
So at left, it is
{1, 2, 3, 4, 5} layers, and at right, {1, 2, 4, 8, 16}.&lt;/p&gt;
&lt;div id='red-stacks'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 600, height = 600;
var svg = d3.select("#red-stacks").append("svg")
    .attr("width", width)
    .attr("height", height);

var red = '#871315';
var black = '#000';
var opacity = '0.12';

// left
svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill", red);
// right
svg.append("rect")
    .attr("x", 340)
    .attr("y", 0)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill", red);

// layer two, just like the first, but smaller
// left
svg.append("rect")
    .attr("x", 0)
    .attr("y", 120)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill-opacity", opacity)
    .attr("fill", black);
// right
svg.append("rect")
    .attr("x", 340)
    .attr("y", 120)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill-opacity", opacity)
    .attr("fill", black);

// layer three, repeating on right
// left
svg.append("rect")
    .attr("x", 0)
    .attr("y", 240)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill-opacity", opacity)
    .attr("fill", black);
// right
d3.range(0, 2).forEach(function(e, i, a) {
    svg.append("rect")
        .attr("x", 340)
        .attr("y", 240)
        .attr("width", 260)
        .attr("height", height)
        .attr("fill-opacity", opacity)
        .attr("fill", black);
});

// layer four, repeating on right
// left
svg.append("rect")
    .attr("x", 0)
    .attr("y", 360)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill-opacity", opacity)
    .attr("fill", black);
// right
d3.range(0, 4).forEach(function(e, i, a) {
    svg.append("rect")
        .attr("x", 340)
        .attr("y", 360)
        .attr("width", 260)
        .attr("height", height)
        .attr("fill-opacity", opacity)
        .attr("fill", black);
});

// layer five, repeating on right
// left
svg.append("rect")
    .attr("x", 0)
    .attr("y", 480)
    .attr("width", 260)
    .attr("height", height)
    .attr("fill-opacity", opacity)
    .attr("fill", black);
// right
d3.range(0, 8).forEach(function(e, i, a) {
    svg.append("rect")
        .attr("x", 340)
        .attr("y", 480)
        .attr("width", 260)
        .attr("height", height)
        .attr("fill-opacity", opacity)
        .attr("fill", black);
});
&lt;/script&gt;

&lt;p&gt;I misread this one, getting it completely wrong at first. The ground
is a red, and the added layers are blacks, using SVG &lt;code&gt;fill-opacity&lt;/code&gt;.
I had thought at first that the layers were all red, but the part
at right never converged to black until I re-read that they are
indeed black layers added on top, on both sides.&lt;/p&gt;
&lt;p&gt;I haven't been able to recreate the subtlety of the nearly
imperceptible shift on the left, but this is fairly close.&lt;/p&gt;
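&lt;p&gt;A quick calculation hints at why the two sides behave differently.
This sketch assumes plain alpha compositing, where each black layer at
opacity 0.12 lets 88% of what is beneath it show through.
&lt;code&gt;darkness&lt;/code&gt; and &lt;code&gt;steps&lt;/code&gt; are hypothetical helpers,
and the layer counts are the black layers over each strip (the red
ground itself is not counted).&lt;/p&gt;

```javascript
var opacity = 0.12;

// black layers stacked over each strip, top to bottom
var leftLayers = [0, 1, 2, 3, 4];
var rightLayers = [0, 1, 3, 7, 15];

// fraction of the red ground hidden by a number of black layers
var darkness = function(layers) {
    return 1 - Math.pow(1 - opacity, layers);
};

// change in darkness from each strip to the one below it
var steps = function(layerCounts) {
    var values = layerCounts.map(darkness);
    return values.slice(1).map(function(v, i) {
        return v - values[i];
    });
};

var leftSteps = steps(leftLayers);
var rightSteps = steps(rightLayers);
```

&lt;p&gt;With these numbers the left-hand steps shrink steadily, from 0.12
down to roughly 0.08, while the right-hand steps never drop below 0.12,
so the right side keeps visibly darkening.&lt;/p&gt;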
&lt;hr /&gt;

&lt;p&gt;This final study is a look at near-equality of light intensity;
Albers warns us carefully about how difficult it is to choose examples
for it. If chosen correctly, when the two saw-tooths come together, two
colors with similar light intensity should start to blend into each
other, even though they are dissimilar otherwise.&lt;/p&gt;
&lt;div id='sawtooth'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 300, height = 600;
var svg = d3.select("#sawtooth").append("svg")
    .attr("width", width)
    .attr("height", height);

var c1 = '#D9C1A0';
var c2 = '#E3BABA';

var y = d3.scale.linear()
    .domain([0, 6])
    .range([0, height]);

d3.range(1, 6).forEach(function(e, i, a) {
    svg.append("rect")
        .attr("class", "left")
        .attr("x", 70)
        .attr("y", 0)
        .attr("width", 80)
        .attr("height", 100)
        .attr("transform", "translate(0, " + y(e) + ") skewX(-10)")
        .attr("fill", c1);
});

d3.range(0, 5).forEach(function(e, i, a) {
    svg.append("rect")
        .attr("class", "right")
        .attr("x", 168)
        .attr("y", 0)
        .attr("width", 80)
        .attr("height", 100)
        .attr("transform", "translate(0, " + y(e) + ") skewX(-10)")
        .attr("fill", c2);
});

var animate = function() {
    var left = svg.selectAll('.left'); 
    left.transition()
        .delay(1000)
        .duration(3000)
        .attr("x", 79)
        .ease("quad-in-out")
        .transition()
        .duration(3000)
        .attr("x", 70)
        .ease("quad-in-out");
    var right = svg.selectAll('.right');
    right.transition()
        .delay(1000)
        .duration(3000)
        .attr("x", 157)
        .ease("quad-in-out")
        .transition()
        .duration(3000)
        .attr("x", 168)
        .ease("quad-in-out")
        .each("end", animate);
};
animate();

&lt;/script&gt;

&lt;p&gt;This works nicely - for that brief instant when the two sides touch
it seems like the sawtooth pattern at their mutual boundary disappears
and the colors start to merge. &lt;/p&gt;
&lt;p&gt;This has been a great exercise, both in learning about color
relativity and digging deeper into the basics of D3. Just makes me
want to do more. There is a lot of code in the studies I replicated
here and in part one that could be much cleaner, but I gave up on
writing clean code in service of getting it done and keeping things
simple.  In future posts I'll be working with real datasets more
often than not, and cleaner code will always help there. I also aimed
to stay true to the exact studies in the book, to have a
target to aim towards, rather than taking the opportunity to do the
exercises for myself and find colors that would be a good match.
I wanted to learn about D3 at the same time, and reproduction
is easier than original work. The app version of the book allows
for creating your own studies, and I've played around with that
some, so I don't feel like I'm missing out too much.&lt;/p&gt;
&lt;p&gt;I hope you'll stay tuned, it feels like it's just getting started.&lt;/p&gt;</content><category term="20140904-albers-color-studies-part-2"/></entry><entry><title>7±2 things to know about data science</title><link href="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/" rel="alternate"/><published>2014-08-12T00:00:00-04:00</published><updated>2014-08-12T00:00:00-04:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-08-12:/2014/08/12/things-to-know-about-data-science/</id><summary type="html">&lt;p&gt;&lt;em&gt;For a talk given at code4lib DC 2014.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;I am a professional librarian and software developer with 17 years
in the job post master's. I studied at a strong school, worked at
some great institutions and worked with many great people and between
all this I've learned a lot …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;em&gt;For a talk given at code4lib DC 2014.&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;Background&lt;/h2&gt;
&lt;p&gt;I am a professional librarian and software developer with 17 years
in the job post master's. I studied at a strong school, worked at
some great institutions and worked with many great people and between
all this I've learned a lot about being a hacker / librarian, enough
that the good people at GW Libraries saw fit to hire me to manage
a team exactly three years ago.&lt;/p&gt;
&lt;p&gt;I am a student of data science, halfway through a two-year program
at &lt;a href="http://gwanalytics.org/"&gt;GW School of Business&lt;/a&gt;.  So far I have
learned enough to understand a fair amount about what it is I need
to be able to do to apply data science, but I am not yet very good
at doing that.&lt;/p&gt;
&lt;p&gt;As a manager in tech in a research library, my job is to work to
ensure that our team and our library do meaningful work well,
reliably.  I intend to develop my professional skill at working
with data to meet this same goal:  do meaningful work reliably well.
With that in mind, I have a rough sense of what librarian and
archivist colleagues might need to know about what data science means,
but I still have an awful lot to learn.&lt;/p&gt;
&lt;h2&gt;Defining "data science" and "business analytics"&lt;/h2&gt;
&lt;p&gt;Like many aspects of data science, this is best communicated visually.&lt;/p&gt;
&lt;p&gt;Here is a canonical industry view of required skills many of us like:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Data Science Venn Diagram" src="http://static.squarespace.com/static/5150aec6e4b0e340ec52710a/t/51525c33e4b0b3e0d10f77ab/1364352052403/Data_Science_VD.png?format=1500w"&gt; &lt;/p&gt;
&lt;p&gt;&lt;cite&gt;by Drew Conway, see &lt;a href="http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram"&gt;http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;A layer-cake view of analytics tasks that is also helpful:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Categories of Analytics" src="http://www.theorsociety.com/Media/Images/Users/CaraQuinton01011978/ActualSize/17_05_2012-16_06_56.jpg"&gt; &lt;/p&gt;
&lt;p&gt;&lt;cite&gt;by Gavin Blackett, see &lt;a href="http://www.theorsociety.com/Media/Images/Users/CaraQuinton01011978/ActualSize/17_05_2012-16_06_56.jpg"&gt;http://www.theorsociety.com/Media/Images/Users/CaraQuinton01011978/ActualSize/17_05_2012-16_06_56.jpg&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;The questions we ask at these levels, phrased simply:&lt;/p&gt;
&lt;p&gt;&lt;img alt="Types of Business Analytics Capabilities" src="http://www.morganfranklin.com/website/assets/uploads/weblog/_658_370/4TypesofBusinessAnalyticsCapabilities_658x370px.jpg"&gt;&lt;/p&gt;
&lt;p&gt;&lt;cite&gt;by MorganFranklin Consulting, see &lt;a href="http://www.morganfranklin.com/insights/article/4-types-of-business-analytics-capabilities"&gt;http://www.morganfranklin.com/insights/article/4-types-of-business-analytics-capabilities&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;Lisa Kurt defined a helpful paradigm for this view at code4lib 2012
in Seattle:&lt;/p&gt;
&lt;p&gt;&lt;img alt="DIPP Framework" src="http://www.mu-sigma.com/analytics/images/dipp.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;cite&gt;by Mu Sigma, see &lt;a href="http://www.mu-sigma.com/analytics/ecosystem/dipp.html"&gt;http://www.mu-sigma.com/analytics/ecosystem/dipp.html&lt;/a&gt;&lt;/cite&gt;&lt;/p&gt;
&lt;p&gt;Most simply, we can speak of data science as the application of
statistics to support decisions, to understand patterns in data,
and to reduce or at least clarify uncertainty in a wide range of
domains.&lt;/p&gt;
&lt;h2&gt;Applying data science&lt;/h2&gt;
&lt;p&gt;From where I sit (halfway through a degree program) the ability to
apply data science techniques meaningfully comes down to something
more like this:&lt;/p&gt;
&lt;div id='skill-venn'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 800, height = 600;
var svg = d3.select("#skill-venn").append("svg")
    .attr("width", width)
    .attr("height", height);
svg.append("svg:circle")
    .attr("cx", 300)
    .attr("cy", 200)
    .attr("r", 200)
    .style("fill", "#1b9e77")
    .style("fill-opacity", ".5");
svg.append("svg:circle")
    .attr("cx", 500)
    .attr("cy", 200)
    .attr("r", 200)
    .style("fill", "#d95f02")
    .style("fill-opacity", ".5");
svg.append("svg:circle")
    .attr("cx", 400)
    .attr("cy", 400)
    .attr("r", 200)
    .style("fill", "#7570b3")
    .style("fill-opacity", ".5");
svg.append("svg:text")
    .attr("x", 160)
    .attr("y", 160)
    .style("font-size", "36px")
    .style("fill", "black")
    .text("Science");
svg.append("svg:text")
    .attr("x", 550)
    .attr("y", 160)
    .style("font-size", "36px")
    .style("fill", "black")
    .text("Skill");
svg.append("svg:text")
    .attr("x", 300)
    .attr("y", 480)
    .style("font-size", "36px")
    .style("fill", "black")
    .text("Good sense");
&lt;/script&gt;

&lt;p&gt;Can you identify the Danger Zone?&lt;/p&gt;
&lt;p&gt;I am weak on Science, but improving. I am confident in the hacking
side of my Skill, but not yet in applying statistical models. I
would like to believe I have good sense, but there is an art to
applying it here.&lt;/p&gt;
&lt;h2&gt;Asking the right questions&lt;/h2&gt;
&lt;p&gt;Much of this work comes down to knowing which questions to ask and
being steadfast in attempting to answer them honestly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;What goal do we wish to achieve?&lt;/li&gt;
&lt;li&gt;What data do we have to work with?&lt;/li&gt;
&lt;li&gt;What gaps in data do we need to fill, and how can we fill them?&lt;/li&gt;
&lt;li&gt;What assumptions are we working under, and are they acceptable?&lt;/li&gt;
&lt;li&gt;Which of the many available models fits our data and goals well?&lt;/li&gt;
&lt;li&gt;What bias is inherent in our data, and what bias are we introducing?&lt;/li&gt;
&lt;li&gt;With what level of certainty can we make a claim?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is particularly risky to learn one model (e.g. linear regression)
and one tool (e.g. R) and take whatever data you have and only ever
attempt linear regressions with R without asking and answering these
other questions.  It's not about R (or SAS or Python or SPSS or
Julia or Stata or Excel or ...) being a magic tool, and linear
regression might be a poor fit for your data.&lt;/p&gt;
&lt;h2&gt;Data context switching, aka munging&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;There is no such thing as a clean data set.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is important.&lt;/p&gt;
&lt;p&gt;Any data you start with will have been collected or prepared with
a particular purpose. That purpose might or might not have anything
to do with your goals. You will most likely need to reframe the
data you start with to fit your needs. This might involve ETL
pipeline processing, recontextualizing, extracting, summarizing,
merging, splitting, and otherwise reshaping data.&lt;/p&gt;
&lt;p&gt;Any decent data person will need to become proficient at some or
all of these tasks.&lt;/p&gt;
&lt;p&gt;There are even style guides for data, such as Hadley Wickham's
&lt;a href="http://vita.had.co.nz/papers/tidy-data.pdf"&gt;Tidy Data&lt;/a&gt;, which 
proposed the following principles of tidiness:&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Each variable forms a column.&lt;/li&gt;
&lt;li&gt;Each observation forms a row.&lt;/li&gt;
&lt;li&gt;Each type of observational unit forms a table.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Some tools have their own preferences; in SAS, some procedures like
"wide form" (each variable a column) and others "long form" (variable
names parameterized). All the more reason to develop munging skills.&lt;/p&gt;
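&lt;p&gt;As a rough illustration of the difference, here is a sketch that
reshapes a couple of made-up records from wide form to long form. The
data, the &lt;code&gt;toLong&lt;/code&gt; helper, and the &lt;code&gt;variable&lt;/code&gt; and
&lt;code&gt;value&lt;/code&gt; field names are all assumptions for the example; real
tools have their own conventions.&lt;/p&gt;

```javascript
// Invented "wide" records: one row per person, one column per variable.
var wide = [
    {id: 1, age: 34, weight: 150},
    {id: 2, age: 51, weight: 172}
];

// Hypothetical helper: emit one row per (id, variable) pair.
var toLong = function(rows, idKey) {
    var out = [];
    rows.forEach(function(row) {
        Object.keys(row).forEach(function(key) {
            if (key !== idKey) {
                out.push({id: row[idKey], variable: key, value: row[key]});
            }
        });
    });
    return out;
};

// four rows: (1, age), (1, weight), (2, age), (2, weight)
var longForm = toLong(wide, "id");
```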
&lt;p&gt;Data munging is often the most time-consuming part of statistics
work.&lt;/p&gt;
&lt;p&gt;Sound familiar?&lt;/p&gt;
&lt;h2&gt;Applying models&lt;/h2&gt;
&lt;p&gt;There are many, many types of models. Different models can be used
for different tasks as shown in the diagram above.  Some, like
simple regression, are widely applicable and easy to understand.
Many are narrowly applicable and hard to understand, but prove to
be far more effective for certain use cases.&lt;/p&gt;
&lt;p&gt;Working with most models requires similar steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prep sample data to apply the model&lt;/li&gt;
&lt;li&gt;Use 2-3 visualizations to explore the data&lt;/li&gt;
&lt;li&gt;Re-munge data to apply the model&lt;/li&gt;
&lt;li&gt;Run the model, evaluating results&lt;/li&gt;
&lt;li&gt;Review residuals/errors&lt;/li&gt;
&lt;li&gt;Check model assumptions, bias&lt;/li&gt;
&lt;li&gt;Lather, rinse, repeat&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After all that, you'll probably want to try all of the above again
with another model. Or two or three.&lt;/p&gt;
&lt;p&gt;Half of understanding a model is understanding what to look for in
results, and how to evaluate assumptions and results. It is easy
to think you might have a great model, but if you don't know how
to evaluate residuals and check basic model assumptions, your work
might not be meaningful.&lt;/p&gt;
&lt;p&gt;Here is an example of what this might look like, using SAS.&lt;/p&gt;
&lt;p&gt;First, we import data in an attempt to find a relationship between
age and weight. The data looks like this (thanks,
&lt;a href="https://csvkit.readthedocs.org/en/0.8.0/"&gt;csvkit&lt;/a&gt;!):&lt;/p&gt;
&lt;p&gt;&lt;img alt="a few lines of data" src="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/csvlook.png"&gt;&lt;/p&gt;
&lt;p&gt;A simple regression offers these results:&lt;/p&gt;
&lt;p&gt;&lt;img alt="regression results" src="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/b3-simple-regression-table.png"&gt;&lt;/p&gt;
&lt;p&gt;Every stats app has a report format like this; SAS likes HTML tables.
Important details in here are the p-value of the F test result, the
R-Square, and the p-value of the t test on the coefficient for the
independent variable, age.&lt;/p&gt;
&lt;p&gt;The plot is pretty:&lt;/p&gt;
&lt;p&gt;&lt;img alt="regression plot" src="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/b3-simple-regression-plot.png"&gt;&lt;/p&gt;
&lt;p&gt;But we have to review its diagnostics:&lt;/p&gt;
&lt;p&gt;&lt;img alt="regression diagnostics" src="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/b3-simple-regression-diag.png"&gt;&lt;/p&gt;
&lt;p&gt;And check residuals precisely:&lt;/p&gt;
&lt;p&gt;&lt;img alt="regression residuals" src="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/b3-simple-regression-residuals-b.png"&gt;&lt;/p&gt;
&lt;p&gt;To be a good data scientist, you have to work any model through all
of these steps, knowing which tests to run on results. Every model
has its own characteristics.&lt;/p&gt;
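&lt;p&gt;The arithmetic behind the simplest case is small enough to sketch
by hand. The following fits an ordinary least squares line and returns
the residuals. The ages and weights are invented for illustration (they
are not the data shown above), and &lt;code&gt;fitLine&lt;/code&gt; is a
hypothetical helper rather than anything from SAS.&lt;/p&gt;

```javascript
var mean = function(values) {
    return values.reduce(function(sum, v) { return sum + v; }, 0) / values.length;
};

// ordinary least squares for y = intercept + slope * x
var fitLine = function(xs, ys) {
    var mx = mean(xs), my = mean(ys);
    var sxy = 0, sxx = 0;
    xs.forEach(function(x, i) {
        sxy += (x - mx) * (ys[i] - my);
        sxx += (x - mx) * (x - mx);
    });
    var slope = sxy / sxx;
    var intercept = my - slope * mx;
    // residual = observed value minus fitted value
    var residuals = xs.map(function(x, i) {
        return ys[i] - (intercept + slope * x);
    });
    return {slope: slope, intercept: intercept, residuals: residuals};
};

// invented sample: ages in years, weights in pounds
var ages = [20, 30, 40, 50, 60];
var weights = [140, 155, 160, 172, 178];
var fit = fitLine(ages, weights);
```

&lt;p&gt;The residuals from a least squares fit with an intercept always sum
to roughly zero, so the sum says little; it is their pattern that
matters. Residuals that curve, fan out, or cluster as the fitted value
grows are a hint that a straight line may be the wrong model.&lt;/p&gt;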
&lt;h2&gt;Applying tools&lt;/h2&gt;
&lt;p&gt;In some cases, there are straightforward models that can be applied
with straightforward tools. For example, this is a time series of 
the amount of recent airline travel in the US. In just a few lines 
of R, you can produce this decomposition of seasonal and trend lines:&lt;/p&gt;
&lt;p&gt;&lt;img alt="time series decomposition" src="https://data.onebiglibrary.net/2014/08/12/things-to-know-about-data-science/ts-decomp.png"&gt;&lt;/p&gt;
&lt;p&gt;This is wonderful, but keep in mind that there is always more to the
story. The simplest-seeming models and tools often require a lot of
subtlety to wield reliably well.&lt;/p&gt;
&lt;p&gt;For more on this particular example, see &lt;a href="http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html"&gt;Using R for Time Series
Analysis&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Learning the craft&lt;/h2&gt;
&lt;p&gt;On this part I'm shaky, but my hunch is that like learning to be a
competent programmer, applying meaningful data science reliably
well is a craft that takes time to learn. It took me a good five
years to learn many basic lessons about programming that no CS prof
ever taught me (granted, I don't have a CS degree, but I've taken
many of the basic courses in formal settings). After five years, I
was a good enough programmer to get a job on a great team, where I
really started to learn the craft - all the details you need to
attend to if you want to build systems that scale up, and if you
want to sustain projects over years, through staff turnover and
changing technologies.&lt;/p&gt;
&lt;p&gt;Like with CS, the science is critical, but it seems like it will
take me several years and a lot of repetition to develop the kind
of intuitive feel for choosing models, checking assumptions, and
explaining results. It's that "good sense" I know I need to strive
for, but I don't have it yet. Before I start applying any of this
on data in my workplace, I'll be sure to find someone more experienced
than me to run ideas by, someone who knows the craft well already.&lt;/p&gt;
&lt;h2&gt;What can we do to help?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Fill in gaps on campus&lt;/li&gt;
&lt;li&gt;Support critical thinking in data selection, munging, and application&lt;/li&gt;
&lt;li&gt;Encourage a well-rounded view, especially with Ethics&lt;/li&gt;
&lt;li&gt;Apply our experience with workflows and conventions&lt;/li&gt;
&lt;li&gt;Learn and apply for ourselves&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;What next?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.khanacademy.org/math/probability"&gt;Probability and statistics&lt;/a&gt; on Khan Academy&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.coursera.org/specialization/jhudatascience/1"&gt;Johns Hopkins Data Science Specialization&lt;/a&gt; on Coursera&lt;/li&gt;
&lt;li&gt;Leek et al., &lt;a href="https://github.com/jtleek/datasharing"&gt;How to share data with a statistician&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Provost and Fawcett, &lt;a href="http://www.data-science-for-biz.com/"&gt;Data Science for Business&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</content><category term="20140812-things-to-know-about-data-science"/></entry><entry><title>simple color relationships w/d3</title><link href="https://data.onebiglibrary.net/2014/08/08/simple-color-relationships/" rel="alternate"/><published>2014-08-08T00:00:00-04:00</published><updated>2014-08-08T00:00:00-04:00</updated><author><name>dchud</name></author><id>tag:data.onebiglibrary.net,2014-08-08:/2014/08/08/simple-color-relationships/</id><summary type="html">&lt;p&gt;I've been reading Josef Albers' &lt;a href="http://yupnet.org/interactionofcolor/"&gt;Interaction of
Color&lt;/a&gt; (Yale Press's iPad
edition) and am learning quite a lot from it. I particularly enjoy
his details about what to expect in student reactions to particular
exercises; you know he must have anticipated and savored these
reactions each time, with every class …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I've been reading Josef Albers' &lt;a href="http://yupnet.org/interactionofcolor/"&gt;Interaction of
Color&lt;/a&gt; (Yale Press's iPad
edition) and am learning quite a lot from it. I particularly enjoy
his details about what to expect in student reactions to particular
exercises; you know he must have anticipated and savored these
reactions each time, with every class.&lt;/p&gt;
&lt;p&gt;The basic principles of the first few chapters should be easy to
demonstrate using &lt;a href="http://d3js.org/"&gt;d3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In "Chapter IV: A color has many faces" we see the first of several
color plates and we are quickly drawn into what he has to teach us
about the relativity of color, that "color is the most relative
medium in art." Let's mimic the first experiment, making one color
look different from itself, using different background colors. I'm
guessing (poorly!) at colors somewhat close to those in the prepared
studies in the text itself, using &lt;a href="http://www.colorpicker.com/"&gt;this color
picker&lt;/a&gt;.&lt;/p&gt;
&lt;div id='basic'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 700, height = 800;
var svg = d3.select("#basic").append("svg")
    .attr("width", width)
    .attr("height", height);
var outer1 = svg.append("rect")
    .attr("x", 50)
    .attr("y", 50)
    .attr("width", 600)
    .attr("height", 300)
    .attr("fill", "#4C0A73");
var inner1 = svg.append("rect")
    .attr("x", 100)
    .attr("y", 100)
    .attr("width", 500)
    .attr("height", 200)
    .attr("fill", "#5A6E5E");
var outer2 = svg.append("rect")
    .attr("x", 50)
    .attr("y", 450)
    .attr("width", 600)
    .attr("height", 300)
    .attr("fill", "#9DD1CE");
var inner2 = svg.append("rect")
    .attr("x", 100)
    .attr("y", 500)
    .attr("width", 500)
    .attr("height", 200)
    .attr("fill", "#5A6E5E");
&lt;/script&gt;

&lt;p&gt;This use of d3 demonstrates several features which make it an
appealing toolkit, even for a beginner:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;it's just javascript&lt;/li&gt;
&lt;li&gt;it's just &lt;a href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics"&gt;SVG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;you can do simple things very simply&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I learned SVG many years ago, back in 2004, when it was still a
fairly new web standard and had very few useable implementations.
The good news is that it is much more widely implemented now, and
that it hasn't changed much since back then (there's only been one
new revision, a "second edition" of the first version), so if you
know a few SVG basics then it's easy to see that d3 just uses an
API defined in javascript to generate SVG.  This is a &lt;em&gt;lot&lt;/em&gt; easier
than generating SVG by hand yourself; I know from first-hand
experience a decade ago.&lt;/p&gt;
&lt;p&gt;Another way to think of d3 is as a "domain specific language" for
dynamic documents on the web.  It's just javascript, but it's a
flavor of types and techniques specific to generating SVG using
javascript that lends itself well to visualizing data.&lt;/p&gt;
&lt;p&gt;In any case, this copied "plate" demonstrates the basic principle
well: the inner color is exactly the same in both rectangles, and
it is the interaction between this color and the differing surrounding
/ background colors that makes it look different from itself from
one to the next.&lt;/p&gt;
&lt;h4&gt;Changing colors&lt;/h4&gt;
&lt;p&gt;To make this a little more dynamic (it is the web, after all) let's
add the ability to change colors by clicking on the inner boxes.
The code will be the same, but with a "click" handler attached to
each inner box.&lt;/p&gt;
&lt;p&gt;Click on the top inner box to make both inner boxes lighter.  Click
on the bottom inner box to make both inner boxes darker.&lt;/p&gt;
&lt;div id='changing-colors'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 700, height = 800;
var innercolor = "#5A6E5E";
var svg = d3.select("#changing-colors").append("svg")
    .attr("width", width)
    .attr("height", height);
var outer1 = svg.append("rect")
    .attr("x", 50)
    .attr("y", 50)
    .attr("width", 600)
    .attr("height", 300)
    .attr("fill", "#4C0A73");
var inner1 = svg.append("rect")
    .attr("x", 100)
    .attr("y", 100)
    .attr("width", 500)
    .attr("height", 200)
    .attr("fill", innercolor)
    .on("click", function(){
        brighten();
    });
var outer2 = svg.append("rect")
    .attr("x", 50)
    .attr("y", 450)
    .attr("width", 600)
    .attr("height", 300)
    .attr("fill", "#9DD1CE");
var inner2 = svg.append("rect")
    .attr("x", 100)
    .attr("y", 500)
    .attr("width", 500)
    .attr("height", 200)
    .attr("fill", innercolor)
    .on("click", function(){
        darken();
    });

function brighten () {
    [inner1, inner2].forEach(function(item) {
        item.style("fill", d3.hsl(item.style("fill")).brighter(.1));
    });
}

function darken () {
    [inner1, inner2].forEach(function(item) {
        item.style("fill", d3.hsl(item.style("fill")).darker(.1));
    });
}
&lt;/script&gt;

&lt;p&gt;This further reinforces the effect; at some points as you click to
ratchet the intensity up or down the two inner boxes look like
wholly different colors, and at other points (especially the extremes)
it is clear that they are the same.&lt;/p&gt;
&lt;p&gt;Of course this isn't quite what Albers had in mind with the lovely
physical interactions designed into his text (which the folks at Yale
Press very creatively transposed to the iPad app), but perhaps the
dynamic aspect of the web, made so easy by d3, can usefully embody some
of the same lessons he taught.&lt;/p&gt;
&lt;h4&gt;Lighter and/or darker&lt;/h4&gt;
&lt;p&gt;To focus us in on light intensity, Albers presents several exercises
in subtle and not-so-subtle gradations of light. SVG's gradient support
should help to recreate them.&lt;/p&gt;
&lt;div id='light-stripes'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 450, height = 700;
var svg = d3.select("#light-stripes").append("svg")
    .attr("width", width)
    .attr("height", height);
// basic gradient
var gradient_up = svg.append("svg:defs")
    .append("svg:linearGradient")
        .attr("id", "gradient_up")
        .attr("x1", "0%")
        .attr("y1", "0%")
        .attr("x2", "0%")
        .attr("y2", "100%");
gradient_up.append("svg:stop")
    .attr("offset", "0%")
    .attr("stop-color", "#222")
    .attr("stop-opacity", 1);
gradient_up.append("svg:stop")
    .attr("offset", "100%")
    .attr("stop-color", "#ddd")
    .attr("stop-opacity", 1);
// now the opposite; perhaps a transform instead?
var gradient_down = svg.append("svg:defs")
    .append("svg:linearGradient")
        .attr("id", "gradient_down")
        .attr("x1", "0%")
        .attr("y1", "100%")
        .attr("x2", "0%")
        .attr("y2", "0%");
gradient_down.append("svg:stop")
    .attr("offset", "0%")
    .attr("stop-color", "#222")
    .attr("stop-opacity", 1);
gradient_down.append("svg:stop")
    .attr("offset", "100%")
    .attr("stop-color", "#ddd")
    .attr("stop-opacity", 1);

// the frame
var outer = svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", width)
    .attr("height", height)
    .attr("fill", "#888");
// the inner "background"
var inner = svg.append("rect")
    .attr("x", 10)
    .attr("y", 10)
    .attr("width", width - 20)
    .attr("height", height - 20)
    .style("fill", "url(#gradient_up)");

var bar_width = (width-20) / 19;

var x_scale = d3.scale.linear()
    .domain([0, 18])
    .range([10, width - 10 - bar_width]);

// the "foreground"
for(var i=0; i&lt;19; i++) {
    if(i % 2) {
        var barup = svg.append("rect")
            .attr("x", x_scale(i))
            .attr("y", 10)
            .attr("width", bar_width)
            .attr("height", height - 20);
        barup.style("fill", "url(#gradient_down)");
    }
}
&lt;/script&gt;

&lt;p&gt;This really comes alive as the intensities of the two gradients pass
each other on the way up/down. It all seems to merge!  And little
shadows seem to appear around the frame at the top and bottom just
past the strips' ends.&lt;/p&gt;
&lt;p&gt;The gradients above are explicit. In this next example from Albers,
the gradients are illusions.&lt;/p&gt;
&lt;div id='gradations'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 300, height = 700;
var svg = d3.select("#gradations").append("svg")
    .attr("width", width)
    .attr("height", height);

// the frame
var outer = svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", width)
    .attr("height", height)
    .attr("fill", "#888");

var bar_width = (width - 60) / 2;
var bar_height = (height - 20) / 17;

var y_scale = d3.scale.linear()
    .domain([0, 16])
    .range([height - 20 - bar_height, 20]);
var color_scale = d3.scale.linear()
    .domain([0, 16])
    .range(['#222', '#ddd']);

// the panels
for(var i=0; i&lt;17; i++) {
    var panel = svg.append("rect")
        .attr("x", 20)
        .attr("y", y_scale(i))
        .attr("width", bar_width)
        .attr("height", bar_height)
        .style("fill", color_scale(i));
    var panel = svg.append("rect")
        .attr("x", (width / 2) + 10)
        .attr("y", y_scale(i))
        .attr("width", bar_width)
        .attr("height", bar_height)
        .style("fill", color_scale(i));
}
&lt;/script&gt;

&lt;p&gt;Every one of the individual rectangles above is a solid color, even
though it looks like each has its own gradient. It's the effect of the 
proximity to slightly lighter and darker colors above and below that
makes the contrasts between them appear to form two ends of a gradient
in each rectangle. It seems to be most pronounced in the corners.&lt;/p&gt;
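&lt;p&gt;For the curious, here is a rough sketch in plain javascript of what
that color scale is doing for these greys. It is not d3's own code, just
ordinary linear interpolation between &lt;code&gt;#222&lt;/code&gt; and &lt;code&gt;#ddd&lt;/code&gt;,
and the names &lt;code&gt;greyLevel&lt;/code&gt; and &lt;code&gt;toHexGrey&lt;/code&gt; are made up
for illustration.&lt;/p&gt;

```javascript
// Sketch only: evenly spaced grey levels from #222 (34) to #ddd (221).
var darkest = 0x22, lightest = 0xdd, panels = 17;

// Grey level (0-255) for panel i, where i runs from 0 to panels - 1.
function greyLevel(i) {
    return Math.round(darkest + (lightest - darkest) * i / (panels - 1));
}

// Format a grey level as a six-digit hex color.
function toHexGrey(level) {
    var hex = level.toString(16);
    return "#" + hex + hex + hex;   // every level here has two hex digits
}

var greys = [];
for (var i = 0; i !== panels; i++) {
    greys.push(toHexGrey(greyLevel(i)));
}
console.log(greys.join(" "));
```

&lt;p&gt;Each panel gets one flat grey about 12 levels (out of 255) away from
its neighbors. There is no gradient anywhere in the markup; the
gradients are all in our eyes.&lt;/p&gt;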
&lt;h4&gt;Transparence and Optical Mixture&lt;/h4&gt;
&lt;p&gt;Albers teaches that we can simulate transparency and the apparent
ordering/stacking of layers with color mixtures; SVG allows for
specific opacity settings. Let's try it both ways, first with
explicit color changes:&lt;/p&gt;
&lt;div id='transparency'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 450, height = 700;
var svg = d3.select("#transparency").append("svg")
    .attr("width", width)
    .attr("height", height);

// the frame
var outer = svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", width)
    .attr("height", height)
    .attr("fill", "#ADA0BA");

// black "foreground"
var foreground = svg.append("rect")
    .attr("x", 210)
    .attr("y", 50)
    .attr("width", 200)
    .attr("height", 600)
    .attr("fill", "#111");

// white strips, left side
var strip1 = svg.append("rect")
    .attr("x", 60)
    .attr("y", 110)
    .attr("width", 150)
    .attr("height", 120)
    .attr("fill", "#eee");

var strip2 = svg.append("rect")
    .attr("x", 60)
    .attr("y", 290)
    .attr("width", 150)
    .attr("height", 120)
    .attr("fill", "#eee");

var strip3 = svg.append("rect")
    .attr("x", 60)
    .attr("y", 470)
    .attr("width", 150)
    .attr("height", 120)
    .attr("fill", "#eee");

// "white" strips, right side
var strip4 = svg.append("rect")
    .attr("x", 210)
    .attr("y", 110)
    .attr("width", 120)
    .attr("height", 120)
    .attr("fill", "#333");

var strip5 = svg.append("rect")
    .attr("x", 210)
    .attr("y", 290)
    .attr("width", 120)
    .attr("height", 120)
    .attr("fill", "#888");

var strip6 = svg.append("rect")
    .attr("x", 210)
    .attr("y", 470)
    .attr("width", 120)
    .attr("height", 120)
    .attr("fill", "#ccc");
&lt;/script&gt;

&lt;p&gt;Note that none of the transparent-seeming sections are actually
transparent; it is only simulated by shifting the color mix. Even
so, it appears that the one at top is "behind" the black, and the
one at bottom is "in front of" the black.&lt;/p&gt;
&lt;p&gt;Let's try doing it again, but this time with SVG opacity variations.&lt;/p&gt;
&lt;div id='transparency2'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 450, height = 700;
var svg = d3.select("#transparency2").append("svg")
    .attr("width", width)
    .attr("height", height);

// the frame
var outer = svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", width)
    .attr("height", height)
    .attr("fill", "#ADA0BA");

// black "foreground"
var foreground = svg.append("rect")
    .attr("x", 210)
    .attr("y", 50)
    .attr("width", 200)
    .attr("height", 600)
    .attr("fill", "#111");

// white strips, left side
var strip1 = svg.append("rect")
    .attr("x", 60)
    .attr("y", 110)
    .attr("width", 150)
    .attr("height", 120)
    .attr("fill", "#eee");

var strip2 = svg.append("rect")
    .attr("x", 60)
    .attr("y", 290)
    .attr("width", 150)
    .attr("height", 120)
    .attr("fill", "#eee");

var strip3 = svg.append("rect")
    .attr("x", 60)
    .attr("y", 470)
    .attr("width", 150)
    .attr("height", 120)
    .attr("fill", "#eee");

// "white" strips, right side
var strip4 = svg.append("rect")
    .attr("x", 210)
    .attr("y", 110)
    .attr("width", 120)
    .attr("height", 120)
    .attr("fill-opacity", 0.15)
    .attr("fill", "#eee");

var strip5 = svg.append("rect")
    .attr("x", 210)
    .attr("y", 290)
    .attr("width", 120)
    .attr("height", 120)
    .attr("fill-opacity", 0.5)
    .attr("fill", "#eee");

var strip6 = svg.append("rect")
    .attr("x", 210)
    .attr("y", 470)
    .attr("width", 120)
    .attr("height", 120)
    .attr("fill-opacity", 0.85)
    .attr("fill", "#eee");
&lt;/script&gt;

&lt;p&gt;Looks very similar, right?&lt;/p&gt;
&lt;p&gt;If you look at the source, you'll see that the structure of this
second version is exactly the same. Only two things change: first,
the right halves of the strips are set to the same initial color
as the left halves, &lt;code&gt;#eee&lt;/code&gt;, whereas in the first version each is
set to an explicitly different color on the grey scale; second, the
&lt;code&gt;fill-opacity&lt;/code&gt; is varied for each of these three from &lt;code&gt;0.15&lt;/code&gt; at the
top (so more of the background black comes through) to &lt;code&gt;0.85&lt;/code&gt; at the
bottom (so more of the white stays "on top"). Just like the first
version, each of the "strips" is actually rendered as two separate
&lt;code&gt;rect&lt;/code&gt; elements.&lt;/p&gt;
&lt;p&gt;So there it is: you can truly simulate transparency and ordering /
stacking just by varying colors, and achieve results almost exactly
like using actual transparency, as demonstrated by the second
diagram.&lt;/p&gt;
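&lt;p&gt;To see why the two diagrams come out so close, here is a small sketch
in plain javascript (no d3) of the usual "source over" blend applied to a
single grey channel. The helper names &lt;code&gt;blendChannel&lt;/code&gt; and
&lt;code&gt;toHexGrey&lt;/code&gt; are made up for illustration, and it assumes the
browser blends plain 0-255 color values, which may not be exactly what
every renderer does.&lt;/p&gt;

```javascript
// Sketch only: blend one grey channel of a foreground over a background
// using the "source over" formula: result = fg * alpha + bg * (1 - alpha).
// blendChannel and toHexGrey are illustrative names, not part of d3 or SVG.
function blendChannel(fg, bg, alpha) {
    return Math.round(fg * alpha + bg * (1 - alpha));
}

// Format a 0-255 grey level as a six-digit hex color.
function toHexGrey(level) {
    var hex = level.toString(16);
    if (hex.length === 1) {
        hex = "0" + hex;
    }
    return "#" + hex + hex + hex;
}

var strip = 0xee;   // the #eee strips
var black = 0x11;   // the #111 rect they overlap

// The three fill-opacity values used in the second diagram.
var blended = [0.15, 0.5, 0.85].map(function (alpha) {
    return toHexGrey(blendChannel(strip, black, alpha));
});
console.log(blended.join(" "));
```

&lt;p&gt;Under those assumptions the blends work out to &lt;code&gt;#323232&lt;/code&gt;,
&lt;code&gt;#808080&lt;/code&gt;, and &lt;code&gt;#cdcdcd&lt;/code&gt;, which sit close to the
&lt;code&gt;#333&lt;/code&gt;, &lt;code&gt;#888&lt;/code&gt;, and &lt;code&gt;#ccc&lt;/code&gt; I picked by eye
for the first version.&lt;/p&gt;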
&lt;p&gt;One more example from the book exhibiting the effects of "optical
mixture". There are four colors in this example: white, blue, olive,
and mint (for lack of better terms). The individual circles and 
their "donut holes" are all the same size, but the color mixing
makes it appear otherwise. Also, changing the contrast of the background
colors relative to the foreground creates its own effects, shifting
the sense of what's foreground and background.&lt;/p&gt;
&lt;div id='circles'&gt;&lt;/div&gt;
&lt;script&gt;
var width = 380, height = 800;
var svg = d3.select("#circles").append("svg")
    .attr("width", width)
    .attr("height", height);

// colors
var white = "#eee",
    olive = "#8A8049",
    blue = "#248591",
    mint = "#9BC9B2";

// the frame
var outer = svg.append("rect")
    .attr("x", 0)
    .attr("y", 0)
    .attr("width", width)
    .attr("height", height)
    .attr("fill", olive);

// padding elements
var padding = 40;

// scales for placing the circles
var dia = 30;
var x = d3.scale.linear()
    .domain([0, 9])
    .range([padding + dia/2, width - (padding + dia/2)]);

var ydia = (height - (padding * 2)) / 24;
var y = d3.scale.linear()
    .domain([0, 23])
    .range([padding + dia/2, height - (padding + dia/2)]);

// ranges for counting the circles
var xrange = d3.range(0, 10);
var yrange = d3.range(0, 8);

// draw outer circles, want to repeat per color
var outer_circles = function(range_factor, color) {
    xrange.forEach(function (xe, xi, xa) {
        yrange.forEach(function (ye, yi, ya) {
            svg.append("circle")
                .attr("cx", x(xe))
                .attr("cy", y(ye + range_factor))
                .attr("r", dia/2)
                .attr("fill", color);
        });
    });
};

outer_circles(0, white);
outer_circles(8, mint);
outer_circles(16, blue);

// draw inner circles, arbitrary sets of y-lines and color
var inner_circles = function(ystart, ystop, color) {
    xrange.forEach(function (xe, xi, xa) {
        d3.range(ystart, ystop).forEach(function (ye, yi, ya) {
            svg.append("circle")
                .attr("cx", x(xe))
                .attr("cy", y(ye))
                .attr("r", dia/5)
                .attr("fill", color);
        });
    });
};

inner_circles(2, 4, mint);
inner_circles(4, 6, olive);
inner_circles(6, 10, blue);
inner_circles(10, 12, olive);
inner_circles(14, 18, white);
inner_circles(18, 20, mint);
inner_circles(20, 22, olive);
&lt;/script&gt;

&lt;p&gt;Wow, that turned out better than I thought, but it took a while.
This was a good exercise in framing scaled elements with padding
in d3. I had tried to eyeball the inner frame shape and circle
diameters using calculations based on padding, width, and height,
but it didn't line up right until I realized it's just an exact 10
x 24 grid.&lt;/p&gt;
&lt;p&gt;Once I reset the scaling to use that grid (worked right away), I
rewrote the outer/inner circle rendering bits using one function
for each; it could be taken a step further with one function for
both that would allow the diameter as a parameter too, and the rows
and colors could just be one simple data structure to loop over,
but it's good enough as is.&lt;/p&gt;
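&lt;p&gt;For what it's worth, that "one simple data structure" might look
something like the sketch below. It is untested against the page itself:
the &lt;code&gt;bands&lt;/code&gt; layout and the &lt;code&gt;circleSpecs&lt;/code&gt; name are
hypothetical, and it only builds a list of circle descriptions, leaving
the actual drawing to d3.&lt;/p&gt;

```javascript
// Sketch only: every band of circles as [first row, row after last, color, radius].
// The bands array and circleSpecs are hypothetical, not code from the page above.
var dia = 30;
var white = "#eee", olive = "#8A8049", blue = "#248591", mint = "#9BC9B2";

var bands = [
    // outer circles, drawn first so the inner ones land on top
    [0, 8, white, dia / 2], [8, 16, mint, dia / 2], [16, 24, blue, dia / 2],
    // inner "donut holes"
    [2, 4, mint, dia / 5], [4, 6, olive, dia / 5], [6, 10, blue, dia / 5],
    [10, 12, olive, dia / 5], [14, 18, white, dia / 5],
    [18, 20, mint, dia / 5], [20, 22, olive, dia / 5]
];

// Expand the bands into one flat list of circles on a grid with the
// given number of columns, keeping the drawing order of the bands.
function circleSpecs(bands, columns) {
    var specs = [];
    bands.forEach(function (band) {
        for (var row = band[0]; row !== band[1]; row++) {
            for (var col = 0; col !== columns; col++) {
                specs.push({col: col, row: row, color: band[2], r: band[3]});
            }
        }
    });
    return specs;
}

var specs = circleSpecs(bands, 10);
console.log(specs.length + " circles");

// On the page, each spec could then be drawn with the existing scales:
// svg.append("circle").attr("cx", x(spec.col)).attr("cy", y(spec.row))
//     .attr("r", spec.r).attr("fill", spec.color);
```

&lt;p&gt;The order of the entries matters, since SVG paints later elements on
top of earlier ones; the outer bands have to come before the inner
ones.&lt;/p&gt;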
&lt;p&gt;Finally, the colors were a bear to get right. I eyeballed a match
to the colors in the iPad app but the contrast just didn't pop the
way it does in the Yale-produced ebook. After playing with the
colors a lot I remembered: I use the &lt;a href="https://justgetflux.com/"&gt;flux app&lt;/a&gt;
on my desktop, and was working on this at night, so everything was
completely wrong! After turning flux off I was able to get a lot
closer, though the ebook version is still much better.&lt;/p&gt;
&lt;h4&gt;Summary&lt;/h4&gt;
&lt;p&gt;This has been a great exercise in working with the lessons in color 
Albers lays out so elegantly in his book. If this interests you at
all I recommend you get a copy for yourself (the iPad ebook is worth
every penny). A colleague at our library told me we have an early
print edition with all the fold-outs and flaps, so I will have to
take a look at that as well.&lt;/p&gt;
&lt;p&gt;It's also been a good lesson in using d3 to render simple shapes
and colors, and remembering to look in the d3 docs for a cleanly
defined function I'd have otherwise more awkwardly wired up myself
in javascript. Even something as simple to do by hand as what
&lt;code&gt;d3.range()&lt;/code&gt; &lt;a href="https://github.com/mbostock/d3/wiki/Arrays#d3_range"&gt;offers&lt;/a&gt;
has a familiar feel and semantic specificity that makes d3 make
all the more sense.&lt;/p&gt;
&lt;p&gt;I am about halfway through the text and could use a lot more d3
practice, so before I move on to rendering data more explicitly I
might take a stab at a "part two" post along these same lines.&lt;/p&gt;
&lt;p&gt;If any of the specifics interest you I'd suggest you look at the 
source directly in your browser or using the github links to view
or edit the full markdown+javascript file I'm writing here and
feeding into &lt;a href="http://blog.getpelican.com/"&gt;pelican&lt;/a&gt;. Pull requests
welcome, especially if you spot mistakes or just plain bad ideas;
I know I still have a lot to learn.&lt;/p&gt;
&lt;p&gt;(See also part two, &lt;a href="http://data.onebiglibrary.net/2014/09/04/albers-color-studies-part-2/"&gt;Albers color studies in D3.js, part
2&lt;/a&gt;)&lt;/p&gt;</content><category term="20140808-simple-color-relationships"/></entry></feed>