The ideologies of Canadian economists, according to Twitter –

Interesting analysis by Stephen Tapp, using Twitter follower data to indicate the ideological leanings of Canadian think tanks:

Four additional results are worth highlighting. First, there are indeed many Canadian think tanks: these results include 44. Having such a crowded playing field may explain much of the general public’s confusion about which think tank fits in where ideologically.

Second, according to my ideology measure, Canadian think tanks seem to be about evenly split on the left-right continuum: there are 21 think tanks to the left of centre and 23 to the right.

Third, the smile isn’t exactly symmetric. In this sample, and with this measure, the average “right-wing” think tank appears to be a bit more “ideological” than the average “left-wing” think tank. That said, the difference is not that large and may simply reflect what Halberstam and Knight found in the US: that conservatives are actually more tightly connected on social media than liberals.

Fourth, my preliminary analysis did not suggest any systematic relationship between ideology and Twitter followers. In other words, it does not appear that more extreme ideologies on their own are associated with a larger Twitter following.

… That said, we should always be careful when reducing a complex issue to a single number along a single dimension. The concept of ideology is inevitably problematic. Moreover, think tank ideologies are not uniform within a given organization and they change over time. Finally, of course, readers should not use these results to prejudge, discredit or approve of research by any of these organizations without a thorough reading of that research. I emphasize that these simple results are preliminary and just a first step; much more work is needed to better understand these complex issues.

The ideologies of Canadian economists, according to Twitter –

The Best Infographics of the Year: Nate Silver on the 3 Keys to Great Information Design and the Line Between Editing and Censorship

Some neat examples, and the principles are well articulated. The use of graphics in the Globe, National Post, NY Times etc. buttresses his points:

Great works of information design are also great works of journalism.

[…] At the core of journalism is the mission of making sense of our complex world to a broad audience. Newsrooms … place emphasis on gathering information. But they’re also in the business of organizing that information into forms like stories. Visual approaches to organizing information also tell stories, but have a number of potential advantages over purely verbal ones:

Approachability. Human beings have strong visual acuity. Furthermore, our visual language is often more universal than our words. Data presented in the form of an infographic can transcend barriers of class and culture. This is just as important for experts as for laypersons: a 2012 study of academic economists found that they made much more accurate statistical inferences from a graphic presentation of data than when the same information was in tabular form.

Transparency. The community of information designers has an ethos toward sharing their data and their code — both with one another and with readers. Well-executed examples of information design show the viewer something rather than telling her something. They can peel away the onion, build trust, and let the reader see how the conclusions are drawn.

Efficiency. I will not attempt to tell you how many words a picture is worth. But surely visualization is the superior medium in some cases. In trying to figure out how to get from King’s Cross to Heathrow Airport on the London Tube, would you rather listen to a fifteen-minute soliloquy from the bloke at the pub — or take a fifteen-second glance at Beck’s map?

But alongside the tremendous power of information design in making sense of the world is also a dark side of potentially equal magnitude, which Silver captures elegantly:

That information design is part and parcel of journalism also means that it inherits journalism’s burdens. If it’s sometimes easier to reveal information by means of data visualization, that can make it easier to deceive… What one journalist thinks of as organizing information, the next one might call censorship.

But it’s long past time to give information designers their place at the journalistic table. The ones you’ll see in this book are pointing the way forward and helping the rest of us see the world a little more clearly.

The Best Infographics of the Year: Nate Silver on the 3 Keys to Great Information Design and the Line Between Editing and Censorship | Brain Pickings.

How StatsCan lost 42,000 jobs with the stroke of a key –

Ouch. More a management than a technical issue, in terms of the lack of communication and risk analysis. And possibly partially a result of reduced capacity on the management and quality control side as a result of reduced funding:

Fast forward to July. StatsCan technicians were updating the Labour Force Survey computer systems. They were changing a field in the survey’s vast collection of databases called the “dwelling identification number.” The report doesn’t explain what this is, but it’s likely a unique code assigned to each of the 56,000 households in the survey so that analysts can easily track their answers over time. The technicians assumed they only needed to make this change to some of the computer programs that crunch the employment data, but not all of them.

The changes themselves were happening piecemeal, rather than all at once, because the system that collects and analyzes the labour force survey is big, complicated and old: it was first developed in 1997. Despite being a pretty major overhaul of the computer system, the report makes it clear that the agency considered the changes to be nothing but minor routine maintenance. After updating the system, no one bothered to test the changes to see if they had worked properly before the agency decided to release the data to the public, in large part because they considered it too minor to need testing.

One of the programs that was supposed to be updated — but wasn’t — was the program that fills in the blanks when people don’t answer all the survey questions. But since technicians had changed the identification code for households in some parts of the system, but not others, the program couldn’t match all the people in the July survey to all the people in the June survey. The result was that instead of using the June survey results to update the July answers, all those households who didn’t answer the questions about being employed in July were essentially labelled as not in the labour force. With the push of a button, nearly 42,000 jobs disappeared.
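The failure mode described here, a record-matching key updated in one part of a pipeline but not another, can be sketched in a few lines of Python. Everything below is illustrative: the IDs, field names and the “new format” are invented for the example, not StatsCan’s actual system.

```python
# Sketch of the failure mode: a matching key changed upstream, but the
# imputation step still expects the old format. All IDs are hypothetical.

june = {  # dwelling ID -> employment status from the June survey
    "A-001": "employed",
    "A-002": "employed",
}

# July's collection system now emits IDs in a new (hypothetical) format...
july_nonrespondents = ["HH-A-001", "HH-A-002"]  # skipped the employment question

# ...but the fill-in-the-blanks program was never updated to match.
imputed = {}
for dwelling_id in july_nonrespondents:
    # The lookup fails silently: no June match, so no carry-forward.
    previous = june.get(dwelling_id)
    imputed[dwelling_id] = previous if previous else "not in labour force"

print(imputed)
# Every non-respondent silently drops out of the labour force instead of
# carrying forward their June status.
```

The bug produces no error message at all, which is why only a comprehensive test comparing July output against June would have caught it.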

… There is a particularly illuminating passage in the report that speaks to problems of miscommunication and misunderstanding at the agency:

“Based on the facts that we have gathered, we conclude that several factors contributed to the error in the July 2014 LFS results. There was an incomplete understanding of the LFS processing system on the part of the team implementing and testing the change to the TABS file. This change was perceived as systems maintenance and the oversight and governance were not commensurate with the potential risk. The systems documentation was out of date, inaccurate and erroneously supported the team’s assumptions about the system. The testing conducted was not sufficiently comprehensive and operations diagnostics to catch this type of error were not present. As well, roles and responsibilities within the team were not as clearly defined as they should have been. Communications among the team, labour analysts and senior management around this particular issue were inadequate.”

How StatsCan lost 42,000 jobs with the stroke of a key –

Charts, Colour Palettes, and Design


NHS 2011

As some of you may know, I have been working fairly intensely on analyzing and charting Canadian multiculturalism as seen through the National Household Survey data from 2011 (not as reliable as the Census, but what we have).

In looking at how to make charts as simple and clear as possible, I came across some good design and related sites.

The above sample is illustrative of the work I am doing.

Starting with Perceptual Edge on data visualization, and the advantages of simplicity. A short clear article outlining good design principles, with some suggested colour palettes:

Practical Rules for Using Color in Charts – Perceptual Edge

For a wider choice of colour palettes, see Every ColorBrewer Scale.

And for users of iWork, this nifty and easy to follow tutorial on how to use the “Colour Picker” effectively and create customized palettes:

Using Apple’s “Color Picker” in Pages 5, Numbers 3, & Keynote 6 (iWork 2013)
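As a small worked example of putting one of these palettes to use, here is a sketch that cycles ColorBrewer’s qualitative “Set2” scale over a list of chart series. The hex values are ColorBrewer’s published 8-class Set2 colours; the series names are made up for illustration.

```python
# Assign ColorBrewer "Set2" colours to chart series, wrapping the
# palette if there are more series than colours.
from itertools import cycle

SET2 = ["#66c2a5", "#fc8d62", "#8da0cb", "#e78ac3",
        "#a6d854", "#ffd92f", "#e5c494", "#b3b3b3"]

def assign_colors(series_names, palette=SET2):
    """Pair each series with the next palette colour, in order."""
    colours = cycle(palette)
    return {name: next(colours) for name in series_names}

print(assign_colors(["British", "French", "Chinese"]))
# {'British': '#66c2a5', 'French': '#fc8d62', 'Chinese': '#8da0cb'}
```

Qualitative scales like Set2 suit categorical data (e.g. ethnic origins); for ordered data, ColorBrewer’s sequential or diverging scales are the better choice.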

Any feedback or suggestions always welcome.

Don’t beat up Statscan for one data error – Cross

More on StatsCan from Philip Cross, former chief economic analyst. Worth reading for some of the history and how the agency reacted to previous cuts:

People should get agitated about Statscan over substantive issues. Wring your hands that the CPI over-states price changes over long periods. Write your MP complaining that the Labour Force Survey doesn’t follow the U.S. practice and exclude 15 year olds. Take to the barricades that for an energy superpower like Canada, measuring energy exports has become a monthly adventure, routinely revised by $1-billion a month. But don’t use the July employment incident to evaluate how the statistical system is functioning overall. They messed up one data point in one series. Big deal. Anyone who lets one data point affect their view of the economy should not be doing analysis. Move along folks, nothing to see here.

Don’t beat up Statscan for one data error – The Globe and Mail.

The case of the disappearing Statistics Canada data

Good piece on Statistics Canada and the impact of some of the changes made to reduce long-standing data series:

Last year, Stephen Gordon railed against StatsCan’s attention deficit disorder, and its habit of arbitrarily terminating long-standing series and replacing them with new data that are not easily comparable.

For what appears to be no reason whatsoever, StatsCan has taken a data table that went back to 1991 and split it up into two tables that span 1991-2001 and 2001-present. Even worse, the older data have been tossed into the vast and rapidly expanding swamp of terminated data tables that threatens to swallow the entire CANSIM site. A few months ago, someone looking for SEPH wage data would get the whole series. Now, you’ll get data going back to 2001 and have to already know (StatsCan won’t tell you) that there are older data hidden behind the “Beware of the Leopard” sign.…

Statistics Canada must be the only statistical agency in the world where the average length of a data series gets shorter with the passage of time. Its habit of killing off time series, replacing them with new, “improved” definitions and not revising the old numbers is a continual source of frustration to Canadian macroeconomists.

Others are keeping tabs on the vanishing data. The Canadian Social Research Newsletter for March 2 referred to the cuts as the CANSIM Crash Diet and tallied some of the terminations:

  • For the category “Aboriginal peoples” : 4 tables terminated out of a total of 7
  • For the category “Children and youth” : 89 tables terminated out of a total of 130
  • For the category “Families, households and housing” : 67 tables terminated out of a total of 112
  • For the category “Government” : 62 tables terminated out of a total of 141
  • For the category “Income, pensions, spending and wealth” : 41 tables terminated out of a total of 167
  • For the category “Seniors” : 13 tables terminated out of a total of 30

As far as Statistics Canada’s troubles go, this will never get the same level of attention as the mystery of the 200 jobs. But, as it relates to the long-term reliability of Canadian data, it’s just as serious.

Given my work using NHS data, particularly ethnic origin, visible minority and religion, linked to social and economic outcomes, I am still in the exploration stage of what data and linkages are available – or not.

The case of the disappearing Statistics Canada data

Neat Data Visualization: Net Neutrality

A good visualization that helps one understand the relationships and relative weight of comments.

A Fascinating Look Inside Those 1.1 Million Open-Internet Comments 

Israel, Gaza, War & Data — i ❤ data — Medium

For data visualization geeks, as well as those more broadly interested in social networks and how they reinforce our existing views, this article by Gilad Lotan is a must-read (Haaretz, the left-wing Israeli newspaper, draws the most from both sides):

Facebook’s trending pages aggregate content that is heavily shared (“trending”) across the platform. If you’re already logged into Facebook, you’ll see a personalized view of the trend, highlighting your friends and their views on the trend. Give it a try.

Now open a separate browser window in incognito mode (in Chrome: File -> New Incognito Window) and navigate to the same page. Since the browser has no idea who you are on Facebook, you’ll get the raw, unpersonalized feed.

How are the two different?

Personalizing Propaganda

If you’re rooting for Israel, you might have seen videos of rocket launches by Hamas adjacent to Shifa Hospital. Alternatively, if you’re pro-Palestinian, you might have seen the following report on an alleged IDF sniper who admitted on Instagram to murdering 13 Gazan children. Israelis and their proponents are likely to see IDF videos such as this one detailing arms and tunnels found within mosques passed around in their social media feeds, while Palestinian groups are likely to pass around images displaying the sheer destruction caused by IDF forces to Gazan mosques. One side sees videos of rockets intercepted in the Tel-Aviv skies, and other sees the lethal aftermath of a missile attack on a Gazan neighborhood.

The better we get at modeling user preferences, the more accurately we construct recommendation engines that fully capture user attention. In a way, we are building personalized propaganda engines that feed users content which makes them feel good and throws away the uncomfortable bits.
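The “personalized propaganda engine” Lotan describes can be caricatured in a few lines: rank items by how close they sit to a user’s prior lean, so congenial content rises and the uncomfortable bits sink. The scoring scheme and items below are my own invention for illustration, not how any real platform ranks content.

```python
# Toy feed ranker: items closest to the user's existing lean come first.
# "lean" is a made-up score from -1 (one side) to +1 (the other).

def rank_feed(items, user_lean):
    """Sort items by distance from the user's lean (closest = top of feed)."""
    return sorted(items, key=lambda item: abs(item["lean"] - user_lean))

feed = [
    {"title": "Rockets intercepted over Tel Aviv", "lean": 0.8},
    {"title": "Aftermath of strike on Gazan neighborhood", "lean": -0.8},
    {"title": "UN statement on the conflict", "lean": 0.0},
]

print([item["title"] for item in rank_feed(feed, user_lean=0.7)])
# The congenial item leads; the uncomfortable one drops to the bottom.
```

Even this crude rule reproduces the dynamic in the excerpt: two users with opposite leans, fed from the same pool, end up seeing effectively different wars.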

Worth reflecting upon. I try to have a range of news and twitter feeds to reduce the risk.

Israel, Gaza, War & Data — i ❤ data — Medium.

Professor goes to big data to figure out if Apple slows down old iPhones when new ones come out


A good illustration of the limits of big data and the risks of confusing correlation with causation. But big data and correlation can help us ask more informed questions:

The important distinction is of intent. In the benign explanation, a slowdown of old phones is not a specific goal, but merely a side effect of optimizing the operating system for newer hardware. Data on search frequency would not allow us to infer intent. No matter how suggestive, this data alone doesn’t allow you to determine conclusively whether my phone is actually slower and, if so, why.

In this way, the whole exercise perfectly encapsulates the advantages and limitations of “big data.” First, 20 years ago, determining whether many people experienced a slowdown would have required an expensive survey to sample just a few hundred consumers. Now, data from Google Trends, if used correctly, allows us to see what hundreds of millions of users are searching for, and, in theory, what they are feeling or thinking. Twitter, Instagram and Facebook all create what is evocatively called the “digital exhaust,” allowing us to uncover macro patterns like this one.

Second, these new kinds of data create an intimacy between the individual and the collective. Even for our most idiosyncratic feelings, such data can help us see that we aren’t alone. In minutes, I could see that many shared my frustration. Even if you’ve never gathered the data yourself, you’ve probably sensed something similar when Google’s autocomplete feature automatically suggests the next few words you are going to type: “Oh, lots of people want to know that, too?”

Finally, we see a big limitation: This data reveals only correlations, not conclusions. We are left with at least two different interpretations of the sudden spike in “iPhone slow” queries, one conspiratorial and one benign. It is tempting to say, “See, this is why big data is useless.” But that is too trite. Correlations are what motivate us to look further. If all that big data does – and it surely does more – is to point out interesting correlations whose fundamental reasons we unpack in other ways, that already has immense value.

And if those correlations allow conspiracy theorists to become that much more smug, that’s a small price to pay.

Professor goes to big data to figure out if Apple slows down old iPhones when new ones come out

Pie Charts Are Terrible | Graph Graph



I am doing more and more charting to illustrate citizenship and multiculturalism issues, and in consulting with those who have a better graphic sense than I, I came across this convincing article and illustration against the use of pie charts.

This means I have to redo a number of the charts I have been working on, but it is always good to learn something new that helps tell the story (I haven’t read his recommended book yet):

Let’s start this off with some honesty.  I used to love pie charts.  I thought they were great, just like the way I used to think Comic Sans was the best font ever.

But then I had some #RealTalk, and I’ve been enlightened in the error of my ways, and I want to pass on what I’ve learned to show people why pie charts aren’t the best choice for visualization.  For my day job, part of my work involves creating visualizations out of business data for our customers.  I picked up a copy of “Information Dashboard Design” a book by Stephen Few of Perceptual Edge.  If you’re at all interested in data visualization, I highly recommend his books, and on this site we attempt to use a lot of the principles in creating the visualizations we present to you.

Pie Charts Are Terrible | Graph Graph.