2014/08/30
Ouch. This is more a management issue than a technical one, given the lack of communication and risk analysis. It may also be partly the result of reduced management and quality-control capacity following funding cuts:
Fast forward to July. StatsCan technicians were updating the Labour Force Survey computer systems. They were changing a field in the survey’s vast collection of databases called the “dwelling identification number.” The report doesn’t explain what this is, but it’s likely a unique code assigned to each of the 56,000 households in the survey so that analysts can easily track their answers over time. They assumed they only needed to make this change to some of the computer programs that crunch the employment data, but not all of them.
The changes themselves were happening piecemeal, rather than all at once, because the system that collects and analyzes the labour force survey is big, complicated and old (it was first developed in 1997). Despite being a pretty major overhaul of the computer system, the report makes it clear that the agency considered the changes to be nothing but minor routine maintenance. After updating the system, no one bothered to test the changes to see if they had worked properly before the agency decided to release the data to the public, in large part because they considered it too minor to need testing.
One of the programs that was supposed to be updated, but wasn't, was the program that fills in the blanks when people don't answer all the survey questions. But since technicians had changed the identification code for households in some parts of the system, but not others, the program couldn't match all the people in the July survey to all the people in the June survey. The result was that instead of using the June survey results to update the July answers, all those households who didn't answer the questions about being employed in July were essentially labelled as not in the labour force. With the push of a button, nearly 42,000 jobs disappeared.
… There is a particularly illuminating passage in the report that speaks to problems of miscommunication and misunderstanding at the agency:
“Based on the facts that we have gathered, we conclude that several factors contributed to the error in the July 2014 LFS results. There was an incomplete understanding of the LFS processing system on the part of the team implementing and testing the change to the TABS file. This change was perceived as systems maintenance and the oversight and governance were not commensurate with the potential risk. The systems documentation was out of date, inaccurate and erroneously supported the team’s assumptions about the system. The testing conducted was not sufficiently comprehensive and operations diagnostics to catch this type of error were not present. As well, roles and responsibilities within the team were not as clearly defined as they should have been. Communications among the team, labour analysts and senior management around this particular issue were inadequate.”
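To make the mechanism in the quoted article concrete, here is a toy Python sketch of carry-forward imputation keyed on a household ID. It is not StatsCan's code: the dict layout, the ID formats, the status labels, the fallback rule and the match-rate threshold are all hypothetical, chosen only to mirror the article's description. It shows how changing the key format on one side of the lookup quietly turns matched non-respondents into "not in the labour force", and how a simple match-rate check, the kind of operational diagnostic the report says was absent, would have flagged it.

```python
from typing import Optional

# Hypothetical status labels; the real LFS classifications are more detailed.
EMPLOYED = "employed"
UNEMPLOYED = "unemployed"
NOT_IN_LF = "not in labour force"


def impute_july(june: dict[str, str],
                july: dict[str, Optional[str]]) -> tuple[dict[str, str], float]:
    """Fill July non-responses (None) by carrying forward the June status.

    Returns the imputed July statuses and the share of non-respondents
    that could be matched to a June record by dwelling ID.
    """
    imputed: dict[str, str] = {}
    missing = matched = 0
    for dwelling_id, status in july.items():
        if status is not None:
            imputed[dwelling_id] = status  # respondent: keep the answer
            continue
        missing += 1
        if dwelling_id in june:
            matched += 1
            imputed[dwelling_id] = june[dwelling_id]
        else:
            # Assumed fallback, mirroring the article's description:
            # an unmatched non-respondent ends up "not in the labour force".
            imputed[dwelling_id] = NOT_IN_LF
    match_rate = matched / missing if missing else 1.0
    return imputed, match_rate


def count_employed(statuses: dict[str, str]) -> int:
    return sum(1 for s in statuses.values() if s == EMPLOYED)


june = {"D001": EMPLOYED, "D002": EMPLOYED, "D003": UNEMPLOYED}

# Same dwellings and same July answers; only the (hypothetical) ID format differs.
july_old_ids = {"D001": EMPLOYED, "D002": None, "D003": None}
july_new_ids = {"2014-D001": EMPLOYED, "2014-D002": None, "2014-D003": None}

ok, ok_rate = impute_july(june, july_old_ids)
bad, bad_rate = impute_july(june, july_new_ids)

print(count_employed(ok), ok_rate)    # 2 1.0
print(count_employed(bad), bad_rate)  # 1 0.0

# Minimal diagnostic: hold the release if too few non-respondents matched.
MIN_MATCH_RATE = 0.95  # hypothetical threshold
for label, rate in (("old IDs", ok_rate), ("new IDs", bad_rate)):
    print(label, "ok" if rate >= MIN_MATCH_RATE else "HOLD RELEASE")
```

The point of the sketch is that the failed lookup raises no error; the program simply takes its fallback branch, so the output looks like ordinary data. Counting how many records hit that branch is a cheap check that turns a silent mismatch into a visible one.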