Election Standards, Day 2
John Sebes
The second day of the annual meeting of the Voting System Standards Committee was Friday, Feb. 6. I’m concluding my reporting on the meeting with a roundup of existing and proposed standards activity that we discussed on day 2. Each item below is about an existing working group, a group in formation, or a proposed group.
Election Process Modeling
This working group isn’t making a standard, but rather a guideline: a semi-formal model for the typical or common processes for U.S. election administration and election operations. The intent here is to document, in a structured manner, the various use cases where there needs to be data transfer from one system, component, or process to another. That will make it much easier to identify and prioritize such cases where data interoperability standards may be needed; from there, folks may choose to form a working group to address some of these needs.
This group is well along under the leadership of LA’s Kenneth Bennett, but still it’s a work in progress. I’ll be reporting more as we go along.
Digital Poll Books
This newly formed working group, led by Ohio’s John Dziurlaj, will develop a standard data format for digital poll books. The starting point is to define a format that can accommodate data interchange between a voter registration system and a digital pollbook: for example, a list of voters, each one with a name, address, and voter status (e.g. in person vs. absentee voter). The reverse flow (all that plus a note of whether each voter checked in to vote, when, etc.) is included as well of course, but there are some other subtler issues. For example, it’s not enough to simply provide the data in that reverse flow; you need to also include data that ensures that the check-in records are from a legitimate source, and not modified. Without that, systems would be vulnerable to tampering that causes some ballots to be counted that shouldn’t, and vice versa. Also, not every pollbook does its job based on purely local pollbook records. Some rely on a callback to a central system that coordinates information flow among lots of digital pollbooks, and there are several hybrid models as well.
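To make the "legitimate source, and not modified" point concrete, here is a minimal sketch in Python. It is not anything the working group has specified: the record fields, the function names, and the use of a shared-secret HMAC tag are all my assumptions for illustration. A real format would more likely use public-key digital signatures, so that the receiving system does not need to hold the pollbook's secret.

```python
import hashlib
import hmac
import json


def _canonical_bytes(record: dict) -> bytes:
    # Serialize deterministically so sender and receiver hash the same bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_checkin(record: dict, key: bytes) -> dict:
    """Wrap a check-in record with an HMAC-SHA256 tag (hypothetical layout)."""
    tag = hmac.new(key, _canonical_bytes(record), hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}


def verify_checkin(signed: dict, key: bytes) -> bool:
    """Return True only if the record is unmodified and tagged with this key."""
    expected = hmac.new(key, _canonical_bytes(signed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])


# Usage: the key here is a stand-in for a secret provisioned to one pollbook.
pollbook_key = b"example-secret-for-pollbook-17"
signed = sign_checkin(
    {"voter_id": "V-001", "checked_in_at": "2015-02-06T09:30:00"}, pollbook_key
)
```

With this shape, a voter records system that receives check-in data can reject any record whose tag does not verify, whether it was altered in transit or produced by a device that never held the key.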
Also, there are privacy issues. In the paper world, every pollbook record was legally a public document, without including what we would now call “personal identifying information” (PII). More recently, with strong voter ID requirements, a voter check-in needs to include a comparison of a presented ID number (such as a driver’s license number) with the ID number that’s part of a voter’s registration record. Today, such ID numbers are often included in e-pollbook data, but that’s not ideal because each e-pollbook becomes a trove of PII at risk. In the upcoming data standards work, we may be able to include some optional privacy guards, like a way to store PII in cryptographically hashed form, to protect privacy but still enable a valid equivalence check, just the same way that stored-password systems do.
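As a rough illustration of the stored-password analogy, here is a sketch in Python of how a pollbook record could hold a salted hash of an ID number instead of the number itself. The record layout, the normalization rule, and the iteration count are assumptions of mine, not part of any proposed standard. One caveat worth stating: ID numbers have far fewer possible values than good passwords, so hashing alone slows down guessing but does not make it impossible.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this sketch


def _hash_id(id_number: str, salt: bytes) -> bytes:
    # Normalize so "d1234567 " and "D1234567" compare equal (assumed rule).
    normalized = id_number.strip().upper().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", normalized, salt, ITERATIONS)


def make_id_record(id_number: str) -> dict:
    """Build what the pollbook stores: a random salt and the salted hash."""
    salt = os.urandom(16)
    return {"salt": salt, "id_hash": _hash_id(id_number, salt)}


def id_matches(presented_id: str, record: dict) -> bool:
    """Check a presented ID against the stored hash without storing the ID."""
    candidate = _hash_id(presented_id, record["salt"])
    return hmac.compare_digest(candidate, record["id_hash"])


stored = make_id_record("D1234567")
```

At check-in, the poll worker enters the presented ID and the pollbook calls `id_matches`; the pollbook data itself never contains the ID number in readable form.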
Voting Methods Models
This newer group, led by Laura Massa-Lochridge, is also creating not a standard but a guideline to be used as a “standard” reference for other work. In this group, the focus is on the various individual approaches to voting on a ballot item and counting the votes. A familiar one is “vote for one” where the candidate with the most votes is the winner. Also familiar and well understood is “vote for N out of M”. Further, each of these has different semantics; for example, some vote-for-one contests have no winner when no candidate reaches a threshold, thus triggering a run-off. Familiar to some, but not so well understood, is “instant run-off”. In fact there are different flavors of IRV, and in some cases it is not actually obvious which one is wanted or used in a particular jurisdiction. From there we get into heavy-duty election geek-dom with ranked choice voting and single transferable vote.
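To show why the semantics need pinning down, here is a small Python sketch of two of the methods mentioned above. Every rule marked as an assumption in the comments is a choice I made for the sketch, and a different jurisdiction could reasonably choose otherwise; that is exactly the kind of variation the working group's model is meant to capture. This is one flavor, not the definition.

```python
from collections import Counter


def vote_for_one(ballots, min_share=0.0):
    """ballots: list of candidate names. Returns the winner, or None for a run-off.

    Assumptions: the winner needs strictly more than min_share of all ballots,
    and a tie for first place also yields no winner.
    """
    tally = Counter(ballots)
    if not tally:
        return None
    ranked = tally.most_common()
    top, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if tied or top_votes / len(ballots) <= min_share:
        return None
    return top


def instant_runoff(ballots):
    """ballots: list of rankings, most-preferred candidate first.

    Assumptions: one candidate is eliminated per round; exhausted ballots are
    dropped from the total; ties for last place are broken alphabetically.
    """
    remaining = {c for ballot in ballots for c in ballot}
    while remaining:
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for c in ballot:
                if c in remaining:  # count the highest-ranked continuing choice
                    tally[c] += 1
                    break
        total = sum(tally.values())
        leader = max(tally, key=tally.get)
        if tally[leader] * 2 > total:  # majority of continuing ballots
            return leader
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
    return None
```

Changing any one of the commented assumptions (the tie-break, the treatment of exhausted ballots, batch elimination instead of one at a time) gives a different method that could still be called "instant run-off" in a statute.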
The goal of this working group is to develop a formal mathematical model specifying precisely what’s meant for each variation of each voting method, with consensus from all who choose to participate, with process, oversight, and validation from an official international standards body. The result should be a great reference for elections officials and legislators to refer to, instead of (as is common now) simply referring to voting method by name, or by writing a counting algorithm into law.
Voting Machine Event Logs
This standard is nearly done, thanks to the leadership of NIST’s John Wack. It deserves more than a bit of explanation because it is a great example of both how the standards work, and the value of standard open data.
Every voting system has components for casting or counting ballots, and U.S. requirements for them include the requirement to do some logging of events that would provide researchers with data to analyze in order to assess how well the components operate, how effectively voters are able to use them, and so forth. Every product does some kind of logging, but each one’s log data is in a different, proprietary format. So, VSSC has a data standard for log data, to enable vendors to provide logs in a common format that enables log analysis tools to combine and collate data from various systems.
So far, not the most thrilling part of standards work, but necessary to ensure that techies can understand what’s going wrong (or right) with real systems in operation. As many are aware, the current crop of voting system products do seem to misbehave during elections, and it’s important for tech assessment to learn whether there really were any faults (as opposed to operator error) and if so what. However, the curious part of this standard is not that it provides a standard format for data common to pretty much every system (that’s why we call them common data formats!) like date/time, event code, event description, etc. Rather, the curious part is that it doesn’t try to provide a complete enumeration of all common events. Sure, most systems have an event that means “Voter cast the ballot” or “Completed scanning a ballot” but one vendor may call this “event 37” and another “event 29”.
Why not enumerate these in the standard? Well, for one thing, it is hard to get a complete list, and as systems add more logging capabilities over time, the list grows. We want to issue the standard now, and don’t want to bake into it an incomplete list. (Once a standard is issued, it is some work to update it, and typically a standards group would prefer to use their efforts to standardize new stuff rather than revise old standards.) So the approach taken is different. It’s typical of many of the standards we’re working on, which is why I want to explain it for this standard. The approach is to have a particular part of the data format that’s expected to be filled by an event identifier that could be one of a canonical list defined elsewhere. It’s like the standard is saying “this ID field is just a string, but systems can choose to fill it with a string that’s from some canonical list that’s beyond the scope of this standard.” Also, the data format allows for a sort of glossary to be part of a dataset, to enable a dataset to essentially say “you’re going to see a bunch of event 37’s and in my lingo that means voter cast a ballot.”
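Here is a toy version of that idea in Python. The class and field names are mine and are not taken from the actual standard; the sketch only shows the shape of the approach: a free-form string event ID, plus an optional per-dataset glossary that a log analysis tool can use to line up events from different vendors.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    timestamp: str          # date/time as an ISO 8601 string
    event_id: str           # just a string; may come from an external canonical list
    description: str = ""


@dataclass
class EventLog:
    device: str
    glossary: Dict[str, str] = field(default_factory=dict)  # event_id -> meaning
    events: List[Event] = field(default_factory=list)

    def meaning(self, event: Event) -> str:
        # Fall back to the raw ID when the dataset does not explain it.
        return self.glossary.get(event.event_id, event.event_id)


def combined_counts(logs: List[EventLog]) -> Counter:
    """Count events across logs by meaning rather than by vendor-specific ID."""
    counts: Counter = Counter()
    for log in logs:
        for event in log.events:
            counts[log.meaning(event)] += 1
    return counts


# Two hypothetical vendors using different IDs for the same event.
vendor_a = EventLog("scanner-a", {"37": "ballot-cast"},
                    [Event("2015-02-06T09:00:00", "37"), Event("2015-02-06T09:05:00", "37")])
vendor_b = EventLog("scanner-b", {"29": "ballot-cast"},
                    [Event("2015-02-06T09:01:00", "29")])
```

Calling `combined_counts([vendor_a, vendor_b])` counts all three events under the same meaning, even though neither vendor's ID appears in the other's log. If both vendors later adopt the same canonical list, the glossary becomes unnecessary and the format itself does not have to change.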
The intent of course is that systems that conform to the standard will also choose to use this canonical list, which can grow over time, without requiring modifications to the standard. That’s nice but it raises the question: who maintains this list, and how does the maintainer allow people to submit additions to it? Good question. No answer yet, but that’s not a barrier to using the standard, and the early adopters will in essence start the list and figure out who is going to manage it.
Event Logs for Voter Records
This is a topic for a to-be-formed working group, focused on issues very similar to those of the Event Log group described above, but for events of a voter records system. The types of events we’re talking about here are things like: voter registration request rejected (including why); voter’s address change accepted; voter’s absentee ballot accepted for counting; voter’s provisional ballot rejected (and why); voter checked in to vote in person; and so on. The format will likely be pretty similar to the other event log format, and much of the discussion will be similar to that of the groups above: whether there is a complete enumeration of actions or objects; whether to rely on external canonical lists; how to not expose PII, but still allow a record to uniquely identify the voter in question (so that we can recognize when multiple events were about the same voter).
What types of interoperability would this support? Automated reporting, and data mining in general (again, a larger issue), but one example is that it would support automated reporting that compares military voters to other voters in terms of voting outcomes: numbers and percentages of voters who voted absentee vs. in person, absentee voters whose ballots were counted vs. rejected and if so why …
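As a sketch of what such automated reporting could look like once voter record events are in a common format, here is a small Python example. The event fields (`group`, `method`, `outcome`) and their values are hypothetical, invented for this illustration; the real format is yet to be defined.

```python
from collections import Counter


def outcome_report(events):
    """Summarize events as counts and percentages within each voter group.

    events: iterable of dicts with hypothetical keys 'group', 'method', 'outcome'.
    Returns {(group, method, outcome): (count, percent_of_group)}.
    """
    counts = Counter((e["group"], e["method"], e["outcome"]) for e in events)
    group_totals = Counter()
    for (group, _method, _outcome), n in counts.items():
        group_totals[group] += n
    return {
        key: (n, round(100.0 * n / group_totals[key[0]], 1))
        for key, n in sorted(counts.items())
    }


# Made-up sample data, only to show the shape of the input.
sample_events = (
    [{"group": "military", "method": "absentee", "outcome": "counted"}] * 3
    + [{"group": "military", "method": "absentee", "outcome": "rejected"}]
    + [{"group": "other", "method": "in_person", "outcome": "counted"}] * 4
)
report = outcome_report(sample_events)
```

The point is less the arithmetic than the input: if every locality's system can export events in one format, the same few lines produce the report everywhere, instead of each office assembling it by hand.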
This type of reporting is already required of localities and states to the Federal government, and it is currently very burdensome for many election officials to create. As a result, one of the enthusiastic supporters of this nascent effort is a recently appointed EAC Commissioner who until recently was a state election official who was grumpy over the burden of this type of reporting, but is now on the Federal commission requiring the reporting. So you can see that although not the most thrilling human endeavor, standards work can have its elements of irony. 🙂
Cast Vote Records
Another topic for a to-be-formed working group, this is about how to extend the existing .2 standard (election result reporting) to describe just the votes recorded from a single ballot, together with other data (for example an image of a paper ballot from which the votes were recorded) that would be needed to support ballot audits. The whole larger issue of ballot audits is … larger; but you can read more about it in past posts here (search for “audit”) or elsewhere by a web search on “risk limiting audits elections.”
Ballot Specifications
Another topic for a to-be-formed working group, this is about how to extend the existing .2 standard’s description of ballot items (contests, candidates, referenda, questions, etc.) that is currently limited to what’s needed for results reporting. The extensions could be limited to extensions needed to display online sample ballots, but could extend much further. Some of us have a particular interest in supporting “interactive sample ballots” which is, again, a larger issue, but more on that as the work unfolds.
Common Identifiers and OCD
Lastly, we also discussed more of the common identifier issue that I reported on earlier in day one. It turns out that this is another instance, though slightly more complicated, of the issue facing a number of standards that I described above: semantic interoperability. In the .2 standard, we don’t want to bake into it an incomplete list of every possible district, office, precinct, etc., even though we need common identifiers for these if two datasets are to be interpreted as referring to the same things.
So, again, we have the issue of a separate canonical list. However, in this case, the space is huge, and the names (unlike event types) wouldn’t be self-identifying; and the things named could have multiple valid names. So there will no doubt be large directories of information about these political units, using common naming schemes. But to avoid these becoming a large muddle, we do have a smaller problem of smaller canonical lists, for example, a list of the names of all the types of district used in each state. With that, we could use existing naming schemes in a canonical way.
The most promising (by consensus of those working on standards anyway) naming scheme is that of the Open Civic Data project, including IDs of exactly this sort. The scope for OCD-IDs is broad: defining a handle for pretty much any government entity in any country, so that various organizations that have data on those entities can publish that data using a common identifier, enabling others to aggregate the data about those entities. It’s much broader than U.S. electoral districts, but it’s already in use, including for U.S. electoral districts. However, as I described above, the fly in the ointment is that plethora of types of electoral district; for a common unique name, you need to include the type of district, for example, the fire control district in CA’s San Mateo County that’s known as Fire District #3.
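For a sense of what these identifiers look like, OCD division IDs are built as a path of type:value pairs under a country, for example `ocd-division/country:us/state:ca/county:san_mateo`. The Python sketch below builds IDs of that shape. The normalization here is a simplification of mine (the real rules allow more characters), and the `fire_district` type name is hypothetical: it stands in for exactly the missing canonical list of district types.

```python
import re


def ocd_division_id(country, parts):
    """Build an OCD-style division ID from (type, value) pairs.

    Simplified normalization (an assumption for this sketch): lowercase, and
    collapse every run of non-alphanumeric characters into one underscore.
    """
    def norm(text):
        return re.sub(r"[^a-z0-9]+", "_", text.strip().lower()).strip("_")

    segments = ["country:" + norm(country)]
    for type_name, value in parts:
        segments.append(norm(type_name) + ":" + norm(value))
    return "ocd-division/" + "/".join(segments)


# 'fire_district' is a made-up type name; no canonical list defines it yet.
fire_district_3 = ocd_division_id(
    "us", [("state", "CA"), ("county", "San Mateo"), ("fire_district", "#3")]
)
```

Two organizations only get the same ID for Fire District #3 if they agree on that type name, which is why the list of district types per state matters more than the ID syntax.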
OK, so what, who, how will this registry, or directory, or curated list — whatever you might call it — get created and managed? Still a good question, but at least we have some clarity on what needs to be done, and maybe a bit of the how, as well. Stay tuned.
If we had this missing link (a canonical scheme for names of U.S. electoral districts) then we could use OCD-IDs (or extensions of FIPS geo codes for that matter) as an optional but commonly used and standards-based approach for constructing unique identifiers for electoral districts. Organizations that choose to use the naming scheme could issue VSSC.2 datasets that could be aggregated with others that also use the scheme. And then, people could have a much easier time aggregating those election result datasets to get large scale election results. At the risk of foreshadowing, that’s actually a big deal to data-heads, public interest groups, and news organizations alike, as eloquently explained by a speaker at the next annual conference, which was this week in DC.
Coming Soon
That conference, the NIST/EAC Future of Voting Systems Symposium II, will be the topic of my next few reports!
— EJS