
Dismantling Federal Assistance to US Elections — The Freeze/Thaw Cycle

Last time I wrote in this series on the EAC being dismantled, I used the metaphor of freezing and thawing to describe not only how the EAC’s effectiveness has been limited, but also the consequence:

We now have voting systems that have been vetted with standards and processes that are almost as Jurassic as the pre-Internet era.

This time I need to support my previous claims by explaining the freeze/thaw cycle in more detail, and connecting it to the outcome of voting systems that are not up to today’s job, as we now understand it, post-2016.

The First Try

EAC’s first try at voting system quality started after the year 2000 hanging-chad debacle, and after the Help America Vote Act (HAVA) that was designed to fix it. During the period of 2004 to 2006, the EAC was pretty busy defining standards and requirements (technically “guidelines,” because states are not obligated to adopt them) for the then-next generation of voting systems, and setting up processes for testing, review, and certification.

That first try was “good enough” for getting started on a way out of the hanging-chad morass, but in hindsight it was woefully inadequate. The beginning of a second try resulted in the 2007 recommendations to significantly revise the standards, because by then hindsight showed that the first try rested on some assumptions that didn’t hold up in practice. My summary of those assumptions:

  • Electronic Voting Machines (EVMs) were inherently better than paper-based voting, not just for accessibility (which is a true and important point) but also for reliability, accuracy, and many other factors.
  • It’s OK if EVMs are completely paperless, because we can assume that the hardware and software will always make an accurate and permanent digital record of every voter’s choice.
  • The then-current PC technology was good enough for both EVMs and back-office systems, because that PC tech was good enough for desktop computing.
  • Security and quality are important, and can be “legislated” into existence by written standards and requirements, and a test process for evaluating whether a voting system meets those requirements.

Even in 2007, and certainly even more since then, we’ve seen that what these assumptions actually got us was not what we really wanted. My summary of what we got:

  • Voting machines lacking any means for people to cross-check the work of the black-box hardware and software, to detect malfunctions or tampering.
  • Voting machines and back-office systems that election officials can only assume are unmodified, un-tampered copies of the certified systems, but can’t actually validate.
  • Voting machines and back-office systems based on decades-old PC technology, with all the security and reliability limitations thereof, including the ready ability of any software to modify the system.
  • Voting system software that passed testing, but when opened up for independent review in California and in Ohio, was found to be rife with security and quality problems.

Taken together, that meant that election tech broadly was physically unreliable, and very vulnerable, both to technological mischance and to intentional meddling. A decade ago, we had much less experience than today with the mischances that early PC tech is prone to. At the time, we also had much less sensitivity to the threats and risks of intentional meddling.

Freeze and Thaw

And that’s where the freeze set in. The 2007 recommendations have been gathering dust since then. A few years later, the freeze set in on the EAC as well, which spent several years operating without a quorum of congressionally approved commissioners, unable to change much – including certification standards and requirements.

That changed a couple of years ago. One of the most important things the new commissioners have done is to re-vitalize the process for modernizing the standards, requirements, and processes for new voting systems. And that re-vitalization is not a moment too soon, just as most of the nation’s states and localities have been replacing decaying voting machines with “new” voting systems that are not substantially different from what I’ve described above.

That’s where the huge irony lies – after over a decade of inactivity, the EAC has finally gotten its act together to try to become an effective voting system certification body for the future — and it is getting dismantled.

It is not just the EAC that’s making progress. The EAC works with NIST, a Technical Guidelines Working Group (TGWC), and many volunteers from many organizations (including ours) working in several groups focused on helping the TGWC. We’ve dusted off the 2007 recommendations, which address how to fix at least some of the consequences I listed above. We’re writing detailed standards for interoperability, so that election officials have more choice about how to acquire and operate voting tech. I could go on about the range of activity and potential benefits, but the point is, there is a lot currently a-building that is poised to be frozen again.

A Way Forward?

I believe that it is vitally important, indeed a matter of national security, that our election tech makes a quantum leap forward to address the substantial issues of our current threat environment, and the economic and administrative environment that our hardworking election officials face today.

If that’s to happen, then we need a way to not get frozen again, even if the EAC is dismantled. A look at various possible ways forward will be the coda for this series.

— EJS

The Freeze Factor – Dismantling Federal Assistance to U.S. Elections

“Frozen” is my key word for what happens to the voting system certification process after EAC is dismantled. And in this case, frozen can be really harmful. Indeed, as I will explain, we’ve already seen how harmful.

Last time I wrote in this series on the EAC being dismantled (see the first and second posts), I said that EAC’s certification function is more important than ever. To re-cap:

  • Certification is the standards, requirements, testing, and seal-of-approval process by which local election officials gain access to new election tech.
  • The testing is more important than ever, because of the lessons learned in 2016:

1. The next gen of election technology needs to be not only safe and effective, but also …

2. … must be robust against whole new categories of national security threats, which the voting public only became broadly aware of in late 2016.

Today it’s time to explain just how ugly it could get if the EAC’s certification function gets derailed. Frozen is that starting point, because frozen is exactly where EAC certification has been for over a decade, and as a result, voting system certification is simply not working. That sounds harsh, so let me first explain the critical distinction between standards and process, and then give credit where credit is due for the hardworking EAC folks doing the certification process.

  • Standards comprise the critical part of the voting system certification program. Standards define what a voting system is required to do. They define a test lab’s job for determining whether a voting system meets these requirements.
  • Process is the other part of the voting system certification program, composed of the set of activities that the players – mainly a voting system vendor, a test lab, and the EAC – must collectively step through to get to the Federal “seal of approval” that is the starting point for state election officials to make their decisions about which voting systems to allow in their state.

Years’ worth of EAC efforts have improved the process a great deal. By contrast, the standards and requirements have been frozen for over a decade. During that time, here is what we got in the voting systems that passed the then-current and still-current certification program:

Black-box systems that election officials can’t validate, for voting that voters can’t verify, with software that despite passing testing, later turned out to have major security and reliability problems.

That’s what I mean by a certification program that didn’t work, based solely on today’s outcome – election tech that isn’t up to today’s job, as we now understand the job to be, post-2016. We are still stuck with the standards and requirements of the process that did not and does not work. While today’s voting systems vary a bit in terms of verifiability and insecurity, what’s described above is the least common denominator that the current certification program has allowed to get to market.

Wow! Maybe that actually is a good reason to dismantle the EAC – it was supposed to foster voting technology quality, and it didn’t work. Strange as it may sound, that assessment is actually backwards. The root problem is that as a Federal agency, the EAC had been frozen itself. It got thawed relatively recently, and has been taking steps to modernize the voting systems standards and certification. In other words, just when the EAC has thawed out and is starting to re-vitalize voting system standards and certification, it is getting dismantled – that at a time when we just recently understood how vulnerable our election systems are.

To understand the significance of what I am claiming here, I will have to be much more specific in my next segment about the characteristics of the certification that didn’t work, how the fix started over a decade ago, got frozen, and has been thawing. When we understand the transformational value of the thaw, we can better understand what we need in terms of a quality program for voting systems, and how we might get to such a quality program if the EAC is dismantled.

— EJS

A Response to POLITICO: Election Infrastructure as Critical Infrastructure

Below is a letter prepared by co-founders Gregory Miller and John Sebes sent to Tim Starks and Cory Bennett of POLITICO, who cover cyber-security issues.  A formatted version is here.  The signal-to-noise ratio on this subject is rapidly decreasing.  There seem to be some fundamental misunderstandings of the challenges local election officials (LEOs) face; the process by which the equipment is qualified for deployment (albeit decrepit, archaic technology by today’s standards); what the vulnerabilities are (and are not); and why a designation of “critical infrastructure” is an important consideration.  We attempt to address some of those points in this response to Tim’s otherwise really good coverage.

Tim Starks
tstarks@politico.com
Morning Cybersecurity Column
POLITICO
1000 Wilson Blvd, 8th Floor,
Arlington, VA, 22209

RE:      11.August Article on Whether to Designate Election Infrastructure as Critical Infrastructure

Greetings Tim

I am a co-founder of the OSET Foundation, a 501(c)(3) nonprofit election technology research institute in Silicon Valley.  I’m writing in response to your article this week in Morning Cybersecurity:

ANOTHER VIEW ON ELECTIONS AS “CRITICAL INFRASTRUCTURE” –
Maybe classifying the election system as part of the nation’s “critical infrastructure” isn’t so wise.

We’ve been on a public benefit mission to innovate electoral technology since 2006.  We’re a group of tech-sector social entrepreneurs bringing years of experience from our former employers like Apple, Facebook, Mozilla, Netscape, and elsewhere to bear on innovating America’s “critical democracy infrastructure” —a term we coined nearly a decade ago.

We’re working with elections officials across the country to develop a publicly owned democracy operating system called ElectOS™ in order to update and upgrade America’s voting systems with innovations that will increase integrity and improve participation for 1/3rd the cost of today’s aging systems.  ElectOS will innovate voting machinery the way Android® has innovated smart phones and mobile devices.  Both are freely available (or “open source”), and like Android, we believe ElectOS will one day enjoy a flourishing commercial market to sustain its continued innovation, deployment, and support.

We’ve been studying the challenges of election administration infrastructure for a decade.  So, we read with great interest your article regarding another viewpoint about making a critical infrastructure designation for our nation’s deteriorating, obsolete, and vulnerable voting infrastructure.  There are elements of your article we agree with (and more specifically comments of Cris Thomas), and there are points that we disagree with because they reveal some misunderstanding of the realities of election administration and the processes of managing the machinery today.  Thus, we were compelled to write you and share these clarifications.

We hope our comments are helpful going forward as you continue to cover this important topic, especially in light of the current election season and the delicate issues being raised by at least one candidate and other media.  Good on you for covering this. Below please find our (hopefully helpful) contributions to your effort.  Relevant portions of your article appear indented in blue.

In recent days, a growing chorus of experts and policy makers have backed a proposal to give elections the same level of federal security protections that the government already grants other so-called critical infrastructure, such as the power grid or financial industry.

First, we believe it’s important to be very clear on which election infrastructure we are talking about.  We should be discussing voting technology operated by Local Election Officials (“LEOs”), and not web sites and eMail servers run by political NGOs.

Sure, the recent attacks on NGOs are a wake-up call for a variety of potential attacks on real Election Infrastructure (“EI”) and peripheral targets.  But the Critical Infrastructure (“CI”) designation should be for core EI; that is, voting machines and the election administration software and systems that manage voting machinery.

But an old school hacker who was part of the L0pht collective says such a change might do more harm than good.  “Classifying voting computers as critical infrastructure is going to cause a lot of headaches at the local level,” Cris Thomas, aka “Space Rogue,” tells MC [MC = “POLITICO Morning Cybersecurity”].

Critical Election Infrastructure (“CEI”) is not very different from other locally managed CI.  Not all CI is big corporate IT like financial transaction processing systems, or government-operated systems like the ATC, or quasi-public technology like the power grid, operated by a variety of organizations but subject to many government regulations.  By contrast, we already have CI that is local, including local-government operated.  For example, there are small local water utilities and municipal water treatment organizations.  Local first responders’ infrastructure is CI as well.  So, there is plenty of precedent for giving a CI designation to locally managed assets.

Because elections, even national elections, have been historically treated as a local event; having a federal designation as critical infrastructure will fundamentally change how we have handled our elections for the last 240 years.

CEI designation will not cause a fundamental change in the current situation where U.S. elections are a local matter.  Mr. Thomas is mistaken on this one point.  Local election organizations will have the same responsibilities, plus some new ones for managing CI.  But a county election administrator will still manage elections the day after or even the year after a critical infrastructure designation.  That cannot, should not, and will not change.

Thomas, now a strategist at Tenable Network Security, says the idea misses the point: We need to remain focused on the security concerns of the current system, which fall into two areas. First, many manufacturers are not testing the systems well enough before selling them to municipalities, often using off-the-shelf hardware and software with minimal security and using things like default, hard-coded passwords.

Of course, the existing voting machines have technical security issues—and at the risk of reading like we’re overly defending vendors, what computing system has none?  And of course, it’s also true that a CI designation won’t change these products’ default security posture.

at the same time, the local government certification agencies seldom have the time, resources and knowledge to properly test these computers for vulnerabilities, …

The same is true regarding the certification process, although Mr. Thomas is mistaken about that process itself.  There are not “local certification agencies,” but rather Federal and State organizations that certify the systems local (county) election jurisdictions are authorized to use.  Nevertheless, a CI designation will not increase the rigor of the certification process, and it won’t increase the capability of LEOs to do technical scrutiny of their own.

and often just accept a manufacturer’s claims of security.

We must also take exception to Mr. Thomas’s last comment.  The idea that certification sometimes amounts to “just accepting vendor security claims” cannot be, and is not, the case.  Although the current certification process isn’t as strong as we’d like, and though nearly all stakeholders want improvement, there are already clear requirements for vendors to demonstrate compliance with security-related requirements.  On the other hand, misleading vendor claims about security can sway LEOs when selecting a certified system (and the choices are down to three vendors).

[T]he result is a system that our entire democracy depends on, which is run with minimal, easily bypassed security.

Sure, but it’s a mistake to focus solely on technical security problems of voting machines, particularly since these systems are not going to be replaced with better technology immediately upon a CI designation.  In the near term, the impact of CEI will be more on people and process, and less on technology itself.  LEOs will need help to build organizational capacity and expertise to manage physical assets as critical infrastructure, with physical security, personnel security, increased operational security processes, and the ability to demonstrate that a variety of kinds of people and process controls are actually being followed rather than merely mandated.

So, improvements in the human aspects and processes are the immediate value of a Critical Election Infrastructure designation.  Such a designation would need to clearly state that our local election officials (LEOs) are custodians of not just critical infrastructure, but infrastructure that is critical to our national security.

That’s never been a responsibility for LEOs, and many LEOs will be dismayed that they will be called upon to operate in ways that they never imagined would be important.  It will require long-term capacity building.  In the short term, there are many improvements in people and process that are possible, although unlikely unless there is a high sense of urgency and importance.  The designation of election infrastructure as critical infrastructure, however, can help create and maintain that urgency.

A better approach, Thomas says, is to increase funding for the National Voluntary Laboratory Accreditation Program run by NIST and the U.S. Election Assistance Commission.

We agree in principle, but this is not mutually exclusive with Critical Infrastructure.  Clearly, there is room for improvement, and NIST and EAC have important roles.  With Critical Election Infrastructure, their roles would need to enlarge, but reasonably so.

We also agree that more funding for these organizations’ election integrity efforts is necessary, but doing so is not an either/or decision in consideration of other aspects of CEI.  If Election Infrastructure is truly “critical” then several things must occur, including, but not limited to, the additional support for NIST and EAC that Mr. Thomas is encouraging.

Here are three examples of improvement that a Critical Election Infrastructure designation would enable —though additional funding and expertise would be required.

  1. Do not connect anything relating to ballots, counting, voter check-in, etc. to the Internet, ever—and in many cases no local wireless networking should be allowed.  With CEI, using an Internet connection is no longer a convenience or shortcut in the grey area of safety—it’s a possible vulnerability with national security implications.
  2. Physically secure the election back-office systems.  The typical election management system (EMS) is a nearly decade-old Microsoft Windows-based application running on personal computers that are no longer manufactured and that are as easy to break into (“black hack”) as any ordinary PC.  Yet they are the brains of the voting system, and they “program” the voting machines for each election.  So put them in locked rooms, with physical access controls to ensure that only authorized people ever touch them, and never one person alone.
  3. Perform physical chain of custody really well (i.e., for machines, paper ballots, poll books, precinct operations logs, —everything), with measurable compliance, and transparency on those measurements.  It’s just not reasonable to expect LEO Operations to do excellent physical chain of custody routinely everywhere, if these physical assets are not classed as CI.  They’re not funded or trained to operate physical security at a CI level.  So, there is plenty of room for improvement here, including new responsibility, resources, training, and accountability.  All of this may be low hanging fruit for improvement (not perfection) in the near term, but only if the mandate of CEI is made.

We hope this is helpful.  We’re glad to discuss issues of election integrity, security, and innovation whenever you want.  The co-founders have been in the technology sector for three decades.  Both have worked on critical infrastructure initiatives for the government.  The OSET CTO, John Sebes, has been in digital security for over 30 years and is deeply experienced with the policy, protocols, and tools of systems and facilities security.  Our Advisory Board includes former U.S. CTO Aneesh Chopra; digital security expert and Salesforce.com CSO Dr. Taher Elgamal; global expert on election systems integrity Dr. Joe Kiniry; Dr. Douglas Maughan of the DHS Cyber-Security Directorate; and several former state election officials.

Respectfully,

Gregory A. Miller
Co-Founder & Chief Development Officer

State Certification of Future Voting Systems — 3 Points of Departure

In advance of this week’s EVN Conference, we’ve been talking frequently with colleagues at several election oriented groups about the way forward from the current voting system certification regime. One of the topics for the EVN conference is a shared goal for many of us: how to move toward a near future certification regime that can much better serve state election officials in states that want to have more control, customization, and tailoring of the certification process, to better serve the needs of their local election officials.


The “VoteStream Files”: A Summary

The TrustTheVote Project Core Team has been hard at work on the Alpha version of VoteStream, our election results reporting technology. They recently wrapped up a prototype phase funded by the Knight Foundation, and then forged ahead a bit, to incorporate data from additional counties, provided by participating state or local election officials after the official wrap-up.

Along the way, there have been a series of postings here that together tell a story about the VoteStream prototype project. They start with a basic description of the project in Towards Standardized Election Results Data Reporting and Election Results Reload: the Time is Right. Then there was a series of posts about the project’s assumptions about data, about software (part one and part two), and about standards and converters (part one and part two).

Of course, the information wouldn’t be complete without a description of the open-source software prototype itself, provided in Not Just Election Night: VoteStream.

Actually, the project was as much about data, standards, and tools as about software. On the data front, there is a general introduction to a major part of the project’s work in “data wrangling” in VoteStream: Data-Wrangling of Election Results Data. After that were more posts on data wrangling, quite deep in the data-head shed — but still important, because each one is about the work required to take real election data and real election result data from disparate counties across the country, and fit them into a common data format and common online user experience. The deep data-heads can find quite a bit of detail in three postings about data wrangling, in Ramsey County MN, in Travis County TX, and in Los Angeles County CA.

Today, there is a VoteStream project web site with VoteStream itself and the latest set of multi-county election results, but also with some additional explanatory material, including the election results data for each of these counties.  Of course, you can get that from the VoteStream API or data feed, but there may be some interest in the actual source data.  For more on those developments, stay tuned!

Election Results: Data-Wrangling Los Angeles County

LA County CA is the mother of all election complexities, and the data wrangling was intense, even compared to the hardly simple efforts that I reported on previously. There are over 32,000 distinct voting regions, which I think is more than the number of seats, ridings, chairs, and so on, for every federal or state house of government in all the parliamentary democracies in the EU.

The LA elections team was marvelously helpful, and upfront about the limits of what they can produce with the aging voting system that they are working hard on replacing. This is what we started with.

  • A nicely structured CSV file listing all the districts in LA county: over 20 different types of district, and over 900 individual districts.
  • Some legacy GIS data, part of which defined each precinct in terms of which districts it is in.
  • The existing legacy GIS data converted into a standard XML format (KML), again kindly created by LA CC-RR IT chief Kenneth Bennett.
  • A flat text file of all the election results for the 2012 election for every precinct in LA County, and various roll-ups.
  • A sort of Rosetta Stone that is just the Presidential election results, but in a well-structured CSV file, also very kindly generated for us by Kenneth.

You’ll notice that what’s not included is a definition of the 2012 election itself – the contests, which district each contest is for, other info on the contests, info on candidates, referenda, and so on. So, the first problem: we needed to reverse engineer that as best we could from the election results. But before we could do that, we had to figure out how to parse the flat text file of results. The “Rosetta Stone” was helpful, but we then realized that we needed information about each precinct that reported results in the flat text file. To get the precinct information, we had to parse the legacy GIS data, and map it to the districts definition.
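To make that reverse-engineering step concrete, here is a minimal sketch — not our actual converter, and with a hypothetical column layout rather than LA County’s real export format — of how the outline of an election definition can be derived from a precinct-level results file:

```python
# A minimal sketch (hypothetical file layout) of reverse-engineering an
# election definition from a flat results file: collect the distinct
# contests, candidates, and reporting precincts implied by the rows.
import csv
from collections import defaultdict

def derive_election_definition(results_path):
    contests = defaultdict(set)   # contest name -> set of candidate names
    precincts = set()
    with open(results_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            precincts.add(row["precinct_id"])
            contests[row["contest_name"]].add(row["candidate_name"])
    return contests, precincts

if __name__ == "__main__":
    contests, precincts = derive_election_definition("la_2012_results.csv")
    print(f"{len(contests)} contests reported across {len(precincts)} precincts")
```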

The second problem was GIS data that wasn’t obvious, but fortunately we had excellent help from Elio Salazar, a member of Ken’s team who specializes in the GIS data. He helped us sort out various intricacies and corner cases. One of the hardest turned out to be the ways in which one district (say, a school district) is a real district used for referenda, but is also sub-divided into smaller districts, each being for a council seat. Some cities were subdivided this way into council seats, some not; the same was true for water districts and several other kinds of districts.

Then, as soon as we thought we had clear sailing, it turned out that the districts file had a couple of minor format errors that we had to fix by hand. Plus there were 4 special-case districts that weren’t actually used in the precinct definitions, but were required for the election results. Whew! At that point we thought we had a complete election definition including the geo-data of each precinct in KML. But wait! We had over 32,000 precincts defined, but only just shy of 5,000 that reported election results. I won’t go into the details of sub-precincts and precinct consolidation, and how some data was from the 32,000 viewpoint and other data from the 4,993 viewpoint. Or why 4,782 was not our favorite number for several days.

Then came the final lap: actually parsing all 100,000-plus contest results in the flat text file, normalizing and storing all the data, and then emitting it in VIP XML. We thought we had a pretty good specification (only 800 words long) of the structure implicit in the file. We came up with three major special cases, and I don’t know how many little weird cases that turned out not to be relevant to the actual vote counts. I didn’t have the heart to update the specification, but it was pretty complex, and honestly the data is so huge that we could spend many days writing consistency checks of various kinds, and doing manual review of the input to track down inconsistencies.
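For the emitting step, here is a minimal sketch of writing normalized tallies out as VIP-style XML. The element names are illustrative only — the extended VIP schema we actually target is discussed elsewhere in this series and is still being finalized:

```python
# A minimal sketch of the "emit" step: serialize normalized tallies as
# VIP-style XML. Element names here are illustrative, not the real schema.
import xml.etree.ElementTree as ET

def emit_vote_counts(tallies, out_path):
    """tallies: list of dicts like
    {"precinct_id": "1234", "contest_id": "c42", "candidate_id": "k7", "votes": 318}"""
    root = ET.Element("vip_object")  # illustrative container element
    for t in tallies:
        vc = ET.SubElement(root, "vote_count")
        ET.SubElement(vc, "precinct_id").text = t["precinct_id"]
        ET.SubElement(vc, "contest_id").text = t["contest_id"]
        ET.SubElement(vc, "candidate_id").text = t["candidate_id"]
        ET.SubElement(vc, "count").text = str(t["votes"])
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
```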

In the end, I think we got to a pretty close but probably not perfect rendition of election results. A truly re-usable and reliable data converter would need some follow-on work in close collaboration with several folks in Ken’s team — something that I hope we have the opportunity to do in a later phase of work on VoteStream.

But 100% completeness aside, we still had excellent proof of concept that even this most complex use case did in fact match the standard data model and data format we were using. With some further work using the VIP common data format with other counties, the extended VIP format should be nearly fully baked and ready to take to the IEEE standards body on election data.

— EJS

Election Results: Data-Wrangling Travis County

Congratulations if you are reading this post, after having even glanced at the predecessor about Ramsey County data wrangling — one of the longer and geekier posts in recent times at TrustTheVote. There is a similar but shorter story about our work with Travis County Texas. As with Ramsey, we started with a bunch of stuff that Travis Elections folks gave us, but rather than do the chapter and verse, I can summarize a bit.

In fact, I’ll cut to the end, and then go back. We were able to fairly quickly develop data converters from the Travis Nov 2012 data to the same standards-based data format we developed for Ramsey. The exception is the GIS data, which we will circle back to later. This was a really good validation of our data conversion approach. If it extends to other counties as well, we’ll be super pleased.

The full story is that Travis elections folks have been working on election result reporting for some time, as have we at TrustTheVote Project, and we’ve learned a lot from their efforts. Because of those efforts, Travis has worked extensively on how to use the data export capabilities of their voting system product’s election management system. They have enough experience with their Hart Intercivic EMS that they know exactly the right set of export routines to use to dump exactly the right set of files. We then developed data converters to chew up the files and spit out VIP XML for the election definitions, and also a form of VIP XML for the vote tallies.

The structure of the export data roughly corresponds to the VIP schema: one flat TXT file that presents a list of each of the 7 kinds of basic items (precinct, contest, etc.) that we represent as VIP objects, and 4 files that express relations between types of objects, e.g. precincts and districts, or contests and districts. As with Ramsey, the district definitions were a bit sticky. The Travis folks provided a spreadsheet of districts that was a sort of extension of the export file about districts. We had to extend the extensions a bit, for reasons similar to those outlined in the previous account of Ramsey data-wrangling. The rest of the files were a bit crufty, with nothing to suggest the meaning of the column entries other than the name of the file. But with the raw data and some collegial help from Travis elections folks, it mapped pretty simply to the standard data format.
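As a rough illustration of the join step — with hypothetical file names and column headings, not Travis’s actual export layout — a converter can link the flat item file to a relation file like this:

```python
# A minimal sketch (hypothetical files and columns) of joining a flat file of
# basic items with a relation file, to link each precinct to its districts
# the way the VIP model expects.
import csv
from collections import defaultdict

def load_rows(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def link_precincts_to_districts(items_path, relations_path):
    items = load_rows(items_path)          # one row per precinct/contest/district/etc.
    relations = load_rows(relations_path)  # (precinct_id, district_id) pairs
    districts_by_precinct = defaultdict(list)
    for rel in relations:
        districts_by_precinct[rel["precinct_id"]].append(rel["district_id"])
    precinct_ids = {row["id"] for row in items if row["type"] == "precinct"}
    return {pid: districts_by_precinct.get(pid, []) for pid in precinct_ids}
```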

There was one area, though, where we learned a lot more from Travis. In Travis, with their Hart system, they are able to separately track vote tallies for each candidate (of course, that’s the minimum) as well as write-ins, non-votes that result from a ballot with no choice on it (under-votes), and non-votes that result from a ballot with too many choices (over-votes). That really helped extend the data format for election results, beyond what we had from Ramsey. And again, this larger set of results data fit well into our use of the VIP format.

That sort of information helps total up the tallies from each individual precinct, to double-check that every ballot was counted. But there is also supplementary data that helps even more, noting whether an under- or over-vote was from early voting, absentee voting, in-person voting, etc. With further information about rejected ballots (e.g. unsigned provisional ballot affidavits, late absentee ballots), one can account for every ballot cast (whether counted or rejected), every ballot counted, every ballot in every precinct, every vote or non-vote from individual ballots — and so on — to get a complete picture down to the ground in cases where there are razor-thin margins in an election.
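Here is a minimal sketch of that ballot-accounting arithmetic, with hypothetical field names: for each contest in a precinct, candidate tallies plus write-ins, under-votes, and over-votes should sum to the ballots cast for that contest.

```python
# A minimal sketch of per-contest ballot accounting (hypothetical field names).
def check_contest_accounting(contest):
    """contest: dict with 'candidate_votes' (list of ints), 'write_ins',
    'under_votes', 'over_votes', and 'ballots_cast'."""
    accounted = (sum(contest["candidate_votes"])
                 + contest["write_ins"]
                 + contest["under_votes"]
                 + contest["over_votes"])
    return accounted == contest["ballots_cast"]

# Example: 312 + 295 candidate votes, 3 write-ins, 7 blanks, 1 over-vote = 618 ballots
example = {"candidate_votes": [312, 295], "write_ins": 3,
           "under_votes": 7, "over_votes": 1, "ballots_cast": 618}
assert check_contest_accounting(example)
```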

We’re still digesting all of that, and will likely continue for some time as we continue our election-result work beyond the VoteStream prototype effort. But even at this point, we think that we have the vote-tallies part of the data standard worked out fairly well, with some additional areas for on-going work.

— EJS

Election Results: Data-Wrangling Ramsey County

Next up are several overdue reports on data wrangling of county-level election data, that is, working with election officials to get the legacy data needed for election results, and then putting the data into practical use. It’s where we write software to chew up whatever data we get, put it in a backend system, re-arrange it, and spit it out all tidy and clean, in a standard election data format. From there, we use the standard-format data to drive our prototype system, VoteStream.

I’ll report on each of the 3 and leave it at that, even though since then we’ve forged ahead on pulling in data from other counties as well. These reports from the trenches of VoteStream will be heavy on data-head geekery, so no worries if you want to skip them if that’s not your cup of tea. For better or for worse, however, this is the method of brewing up data standards.

I’ll start with Ramsey County, MN, which was our first go-round. The following is not a short or simple list, but here is what we started with:

  • Some good advice from: Joe Mansky, head of elections in Ramsey County, Minnesota; and Mark Ritchie, Secretary of State and head of elections for Minnesota.
  • A spreadsheet from Joe, listing Ramsey County’s precincts and some of the districts they are in; plus verbal info about other districts that the whole county is in.
  • Geo-data from the Minnesota State Legislative GIS office, with a “shapefile” for each precinct.
  • More data from the GIS office, from which we learned that they use a different precinct-naming scheme than Ramsey County.
  • Some 2012 election result datasets, also from the GIS office.
  • Some 2012 election result datasets from the MN SoS web site.
  • Some more good advice from Joe Mansky on how to use the election result data.
  • The VIP data format for expressing info about precincts and districts, contests and candidates, and an idea for extending that to include vote counts.
  • Some good intentions for doing the minimal modifications to the source data, and creating a VIP standard dataset that defines the election (a JEDI in our parlance, see a previous post for explanation).
  • Some more intentions and hopes for being able to do minimal modifications to create the election results data.

Along the way, we got plenty of help and encouragement from all the organizations I listed above.

Next, let me explain some problems we found, what we learned, and what we produced.

  • The first problem was that the county data and GIS data didn’t match, but we connected the dots, and used the GIS version of precinct IDs, which use the national standard, FIPS.
  • County data didn’t include statewide districts, but the election results did. So we again fell back on FIPS, and added standards-based district IDs. (We’ll be submitting that scheme to the standards bodies, when we have a chance to catch our breath.)
  • Election results depend on an intermediate object called “office” that links a contest (say, for state senate district 4) to a district (say, the 4th state senate district), via an office (say, the state senate seat for district 4), rather than a direct linkage. Sounds unimportant, but …
  • The non-local election results used the “office” to identify the contest, and this worked mostly OK. One issue was that the U.S. Congress offices were all numbered, but without mentioning MN. This is a problem if multiple states report results for “Representative, 1st Congressional District,” because all states have a first congressional district. Again, more hacking of the district ID scheme to use FIPS (see the sketch just after this list).
  • The local election results did not work so well. A literal reading of the data seemed to indicate that each town in Ramsey County in the Nov. 2012 election had a contest for mayor — the same mayor’s office. Ooops! We needed to augment the source data to make plain *which* mayor’s office the contest was for.
  • Finally, still not done, we had a handful of similarly ambiguous data for offices other than mayor, that couldn’t be tied to a single town.
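Here is the sketch mentioned above: a simplified illustration (not our exact scheme, which we’ll submit to the standards bodies) of how FIPS-prefixed district IDs disambiguate identically named districts across states.

```python
# A minimal sketch of a FIPS-prefixed district ID scheme. Prefixing with the
# state FIPS code disambiguates "1st Congressional District" across states.
STATE_FIPS = {"MN": "27", "TX": "48", "CA": "06"}  # small excerpt for illustration

def district_id(state, district_type, number):
    """e.g. district_id('MN', 'cd', 1) -> '27-cd-1' for MN's 1st Congressional District."""
    return f"{STATE_FIPS[state]}-{district_type}-{number}"

assert district_id("MN", "cd", 1) != district_id("TX", "cd", 1)
```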

One last problem, for the ultra data-heads. It turns out that some precincts are not a single contiguous geographical region, but a combination of 2 regions that touch only at a point, or (weirder) aren’t directly connected at all. So our first cut at encoding the geo-data into XML (for inclusion in VIP datasets) wasn’t quite right, and the Google Maps view of the data had holes in it.
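The fix is to represent such a precinct as a KML MultiGeometry holding one Polygon per disconnected piece, instead of a single Polygon. A minimal sketch, with made-up coordinates and a hypothetical precinct ID:

```python
# A minimal sketch: encode a non-contiguous precinct as a KML Placemark whose
# geometry is a MultiGeometry with one Polygon per disconnected piece.
import xml.etree.ElementTree as ET

def precinct_placemark(precinct_id, polygons):
    """polygons: list of rings, each a list of (lon, lat) tuples."""
    pm = ET.Element("Placemark")
    ET.SubElement(pm, "name").text = precinct_id
    multi = ET.SubElement(pm, "MultiGeometry")
    for ring in polygons:
        poly = ET.SubElement(multi, "Polygon")
        boundary = ET.SubElement(poly, "outerBoundaryIs")
        linear = ET.SubElement(boundary, "LinearRing")
        ET.SubElement(linear, "coordinates").text = " ".join(
            f"{lon},{lat},0" for lon, lat in ring)
    return pm
```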

So, here is what we learned.

  • We had to semi-invent some naming conventions for districts, contests, and candidates, to keep separate  everything that was actually separate, and to disambiguate things that sounded the same but were actually different. It’s actually not important if you are only reporting results at the level of one town, but if you want to aggregate across towns, counties, states, etc., then you need more. What we have is sufficient for our needs with VoteStream, but there is real room for more standards like FIPS to make a scheme that works nationwide.
  • Using VIP was simple at first, but when we added the GIS data, and used the XML standard for it (KML), there was a lot of fine-tuning to get the datasets to be 100% compliant with the existing standards. We actually spent a surprising amount of time testing the data model extensions and validations. It was worth it, though, because we have a draft standard that works, even with those wacky precincts shaped like east and west Prussia.
  • Despite that, we were able to finish the data-wrangling fairly quickly and use a similar approach for other counties — once we figured it all out. We did spend quite a bit of time mashing this around and asking other election officials how *their* jurisdictions worked, before we got it all straight.

Lastly, here is what we produced. We now have a set of data conversion software that we can use to start with the source data listed above and produce election definition datasets in a repeatable way, making the most effective use of existing standards. We also had a less settled method of data conversion for the actual results — e.g., for precinct 123, for contest X, for candidate Y, there were Z votes — and similarly for all precincts and all contests. That was sufficient for the data available in MN, but not yet sufficient for additional info available in other states but not in MN.

The next steps are: tackle other counties with other source data, and wrangle the data into the same standards-based format for election definitions; extend the data format for more complex results data.

Data wrangling Nov 2012 Ramsey County election was very instructive — and we couldn’t have done it without plenty of help, for which we are very grateful!

— EJS

VoteStream: Data-Wrangling of Election Results Data


If you’ve read some of the ongoing thread about our VoteStream effort, it’s been a lot about data and standards. Today is more of the same, but first with a nod that the software development is going fine, as well. We’ve come up with a preliminary data model, gotten real results data from Ramsey County, Minnesota, and developed most of the key features in the VoteStream prototype, using the TrustTheVote Project’s Election Results Reporting Platform.

I’ll have plenty to say about the data-wrangling as we move through several different counties’ data. But today I want to focus on a key structuring principle that works both for data and for the work that real local election officials (LEOs) do, before an election, during election night, and thereafter.

Put simply, the basic structuring principle is that the election definition comes first, and the election results come later and refer to the election definition. This principle matches the work that LEOs do, using their election management system to define each contest in an upcoming election, define each candidate, and so on. The result of that work is a data set that both serves as an election definition, and also provides the context for the election by defining the jurisdiction in which the election will be held. The jurisdiction is typically a set of electoral districts (e.g. a congressional district, or a city council seat), and a county divided into precincts, each of which votes on a specific set of contests in the election.
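A minimal sketch of that principle, with hypothetical types rather than VoteStream’s actual schema: the election definition is built first, and each result record refers to definition objects only by ID.

```python
# A minimal sketch (hypothetical types) of "definition first, results later":
# result records only reference definition objects by ID, so they can be
# validated against the definition they depend on.
from dataclasses import dataclass, field

@dataclass
class ElectionDefinition:
    contests: dict = field(default_factory=dict)    # contest_id -> description
    precincts: dict = field(default_factory=dict)   # precinct_id -> description

@dataclass
class VoteCount:
    contest_id: str
    precinct_id: str
    candidate_id: str
    count: int

def validate_results(definition, results):
    """Every result row must refer to a contest and precinct already defined."""
    return all(r.contest_id in definition.contests and
               r.precinct_id in definition.precincts for r in results)
```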

Our shorthand term for this dataset is JEDI (jurisdiction election data interchange), which is all the data about an election that an independent system would need to know. Most current voting system products have an Election Management System (EMS) product that can produce a JEDI in a proprietary format, for use in reporting, or ballot counting devices. Several states and localities have already adopted the VIP standard for publishing a similar set of information.

We’ve adopted the VIP format as the standard that we’ll be using on the TrustTheVote Project. And we’re developing a few modest extensions to it that are needed to represent a full JEDI that meets the needs of VoteStream, or really any system that consumes and displays election results. All extensions are optional and backwards compatible, and we’ll be submitting them as suggestions when we think we have a full set. So far, it’s pretty basic: the inclusion of geographic data that describes a precinct’s boundaries, and a use of existing meta-data to note whether a district is a federal, state, or local district.

So far, this is working well, and we expect to be able to construct a VIP-standard JEDI for each county in our VoteStream project, based on the extant source data that we have. The next step, which may be a bit more hairy, is a similar standard for election results with the detailed information that we want to present via VoteStream.

— EJS