The annual meeting of the U.S. elections standards board is this week. In addition to standards board members, several observers are here and will be reporting. The next few blog posts are solely my views (John Sebes), but I’ll do my best to report what I think is the consensus.
However, today I’ll start with a closely related topic, election data standards, because I think it will be helpful to refresh readers’ memory about where standards fit in and how important they are. I’ll do that by explaining four benefits that are under discussion today.
One type of standards-enabled interoperability is data exchange. One system needs data to do its job, and the source data is produced by another system; but the two systems don’t speak the same language to express the data. In election technology, a common example is election results. Commercial election management system (EMS) products produce election definitions and election results data in their own format, because until recently there wasn’t a standard. Election reporting systems need to consume that data, but it’s hard to do because different counties (and other electoral jurisdictions) use different formats. For example, in California, a complete collection of results from all counties would involve 5 different proprietary or legacy formats, perhaps more in cases where two counties use the same EMS product but very different versions.
Large news organizations, as well as academics and other research organizations including the TrustTheVote Project, can put a lot of effort into “data-wrangling” and come up with something that’s nearly uniform. It’s time-consuming and error-prone, and it needs to be done several times as election results get updated from election night to final results. But more to the point, election officials don’t have a ready, re-usable technical capability to “just get the data out.”
Well, now we have a standard for U.S. election definitions and election results (more on that in reporting from the annual conference this week). What does that mean? In the medium to long term, the vendors of all the EMS products could support the new standard, and consumers of the data (elections organizations themselves, election reporting products, in-house tools of big news organizations, and of course open source systems like VoteStream) can re-tool to use standards-compliant data. But in the short to medium term, elections organizations, and their existing technology base, need the ability to translate from existing formats to the standard. (A big part of our just-restarted work on VoteStream is to create a translator/aggregator toolset for election officials, but more on that as VoteStream reporting proceeds.)
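To make the translation idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two "vendor" export layouts, the function names, and the common record shape are hypothetical, and the common record is not the schema of the actual standard. It only shows the general pattern of one small adapter per source format feeding a single shared representation.

```python
import csv
import io
import json

# The common record shape below (contest/choice/votes) is a made-up stand-in
# for a standard format, used only to illustrate the translation pattern.

def from_vendor_a_csv(text):
    """Hypothetical vendor A export: CSV with columns Contest,Choice,Votes."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"contest": r["Contest"], "choice": r["Choice"], "votes": int(r["Votes"])}
        for r in rows
    ]

def from_vendor_b_json(text):
    """Hypothetical vendor B export: JSON nested by race, then candidate."""
    data = json.loads(text)
    return [
        {"contest": race["name"], "choice": cand["name"], "votes": int(cand["total"])}
        for race in data["races"]
        for cand in race["candidates"]
    ]

# Sample inputs in the two invented formats.
vendor_a = "Contest,Choice,Votes\nMayor,Alice,120\nMayor,Bob,95\n"
vendor_b = (
    '{"races": [{"name": "Mayor", "candidates": '
    '[{"name": "Alice", "total": "80"}, {"name": "Bob", "total": "110"}]}]}'
)

# After translation, records from both sources have the same shape.
common = from_vendor_a_csv(vendor_a) + from_vendor_b_json(vendor_b)
for record in common:
    print(record)
```

The point of the design is that a consumer of the data only has to understand the common shape; supporting another source means adding one more adapter rather than touching every consumer.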
Interoperability by itself is great in some cases, if the issue is mainly getting two systems to talk to one another. For example, at the level of an individual county, election reporting is mostly a matter of data transfer from the EMS that the county uses to an election result publishing system. Some counties have created a basic web publishing system that consumes results from their EMS. However, it’s not so easy for any other county to re-use such a solution unless they use an EMS that speaks exactly the same lingo.
For another example at the local level, a standards-compliant election definition data set can be a bridge between an EMS that defines the information on each ballot, and a separate system that consumes an election definition and offers election officials the ability to design the layout of paper ballots. (In the TrustTheVote Project, we call that our Ballot Design Studio.) The point here is that data standards can enable innovations in election tech, because different jobs can be delegated to systems that specialize in each job, and these specialized systems can inter-operate with one another.
Component interoperability by itself is not so great if you’re trying to aggregate multiple datasets of the same kind, but from different sources. Taking election result reporting as the example again, here is a problem faced by consumers of election results. Part of one county votes in one Federal congressional district, and part of another county votes in the same district. Each county’s EMS assigns some internal identifier to each district, but it’s derived from whatever the county folks use; this is true even if an election result is represented in the new VSSC Standard. In one county, the district (and by extension the contest for the representative for the district) might be called the 4th Congressional District, while in the other it might be CD-4. If you’re trying to get results for that one contest, you need to be able to tell that those are the same district, and that the results for the contest need to include numbers from both counties.
Currently, consumers of this data have processes for overcoming these challenges, but that ability is limited to each consumer org, in some cases private to that org. But what election officials need from standards is the ability to automatically aggregate disparate data sets. Ahh, more standards!
This exact issue is one of the things we’re discussing this morning at the standards meeting: a need for a standard way to name election items that span jurisdictions or even elections in a single jurisdiction.
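Here is a small Python sketch of what aggregation looks like once every local district label can be mapped to one shared identifier. The crosswalk table, the identifier string, the county names, and the vote counts are all hypothetical; a real naming standard would define the identifiers, and this is not a description of how any existing consumer does it.

```python
from collections import defaultdict

# Hypothetical crosswalk: (county, local district label) -> shared identifier.
# The identifier format is invented for this example.
CROSSWALK = {
    ("County A", "4th Congressional District"): "congressional-district-4",
    ("County B", "CD-4"): "congressional-district-4",
}

def aggregate(results):
    """Sum votes per choice across counties, keyed by the shared identifier."""
    totals = defaultdict(lambda: defaultdict(int))
    for county, local_name, choice, votes in results:
        # A KeyError here flags a local label with no shared identifier yet.
        shared_id = CROSSWALK[(county, local_name)]
        totals[shared_id][choice] += votes
    return {district: dict(choices) for district, choices in totals.items()}

# Made-up per-county results for the same contest under two local names.
results = [
    ("County A", "4th Congressional District", "Candidate X", 1200),
    ("County A", "4th Congressional District", "Candidate Y", 900),
    ("County B", "CD-4", "Candidate X", 300),
    ("County B", "CD-4", "Candidate Y", 450),
]

totals = aggregate(results)
print(totals)
```

Today each consumer maintains something like this crosswalk by hand; with standard names, the lookup table would be unnecessary because both counties would already publish the same identifier.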
Combination is closely related to aggregation, except that aggregation combines data sets of the same kind, while combination occurs when we have multiple data sets, each containing different but complementary information about some of the same things. That was one of the challenges we had in VoteStream Alpha: election results referred to precincts (vote counts per precinct), as did GIS data (the geo-codes representing a precinct) and voter-registration statistics (number of registered voters per precinct, along with several related stats). But many precincts had a different name in each data source! That made it challenging, for example, to report election results in the context of registration and turnout numbers, and to use mapping to visualize variations in registration levels and turnout numbers.
We’ll be showing how to automate the response to such challenges, as part of VoteStream Beta, using the data standards, identifiers, and enumerations under discussion right now.
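As a rough illustration of the combination problem (not the VoteStream implementation), the Python sketch below joins three data sets that name the same precinct differently. The alias table, the precinct identifier, the precinct names, and all the numbers are hypothetical; the geometry value is just a placeholder string.

```python
# Hypothetical alias table: every spelling seen in any source -> one precinct ID.
ALIASES = {
    "Precinct 12": "P-012",
    "PCT 012": "P-012",
    "12 - Oak Hill": "P-012",
}

# Three complementary sources, each keyed by its own name for the precinct.
results = {"Precinct 12": 410}              # ballots cast, from the EMS
registration = {"PCT 012": 820}             # registered voters, from the voter rolls
gis = {"12 - Oak Hill": "polygon-ref-12"}   # placeholder for a geometry reference

def rekey(source):
    """Replace each source-specific precinct name with the shared ID."""
    return {ALIASES[name]: value for name, value in source.items()}

def combine(results, registration, gis):
    """Join the three sources on the shared ID and compute turnout."""
    r, v, g = rekey(results), rekey(registration), rekey(gis)
    combined = {}
    # Only precincts present in all three sources can be fully reported.
    for pid in r.keys() & v.keys() & g.keys():
        combined[pid] = {
            "ballots_cast": r[pid],
            "registered": v[pid],
            "turnout": r[pid] / v[pid],
            "geometry": g[pid],
        }
    return combined

combined = combine(results, registration, gis)
print(combined)
```

With standard identifiers and enumerations, the alias table shrinks or disappears, and the join becomes a mechanical step instead of a manual matching exercise.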
That’s the report from the morning session. More later …
— John Sebes