We sat down yesterday afternoon with a group of users of our FTP files to talk about changes we're planning as we move to a different database platform. First, thanks to everyone who participated - it was very helpful to us and hopefully will improve what we are able to provide going forward.
The main focus of the meeting was that the process for updating and maintaining the most current set of detailed transactions in the new database differs from the process we have used until now. The set of records coming from the new tables will mostly be the most recent version of each contribution or disbursement transaction, taken from the most recently filed amendment to each financial report we receive. (In the past, most of the records came from the original filing rather than from amendments, unless specific records were changed.) We talked about several possible approaches to tracking the history of each transaction as amendments are filed, and to always providing unique identifying information for each transaction, even though that identifier will be different from the one used in the current versions of the files. For example, each filer provides a "transaction id" for each record, and that value is supposed to remain the same in any subsequent amendments to the same filing. We can include this information along with the unique identifier for each record. There may be other information that will help track changes within each of the files going forward - we're going to spend some time working on these approaches, and we'll report back and ask for comments again when we have a format we think will work best. We also agreed that we would provide the data in both formats for a period of time so that people can make whatever adjustments are needed to move to the new format.
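To make the idea of "most recent version" and "history" concrete, here is a minimal Python sketch. It is not our actual process or file layout: the field names (record_id, report_id, amendment_number, transaction_id, amount) are hypothetical, and it assumes each amendment restates every transaction on the report and that filers really do keep the transaction id stable across amendments.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: int         # hypothetical unique identifier for each row
    report_id: str         # hypothetical identifier of the report being amended
    amendment_number: int  # assumption: 0 = original filing, 1 = first amendment, ...
    transaction_id: str    # filer-supplied id, expected to stay the same across amendments
    amount: float


def current_records(records):
    """Keep only the rows that come from the newest amendment of each report."""
    newest = {}
    for rec in records:
        newest[rec.report_id] = max(newest.get(rec.report_id, 0), rec.amendment_number)
    return [rec for rec in records if rec.amendment_number == newest[rec.report_id]]


def history(records):
    """Group every version of a transaction, ordered from original to latest amendment."""
    versions = defaultdict(list)
    for rec in records:
        versions[(rec.report_id, rec.transaction_id)].append(rec)
    for rows in versions.values():
        rows.sort(key=lambda rec: rec.amendment_number)
    return dict(versions)


# Made-up sample data: report R1 was amended once, report R2 never was.
records = [
    Record(1, "R1", 0, "T1", 250.0),
    Record(2, "R1", 0, "T2", 100.0),
    Record(3, "R1", 1, "T1", 300.0),  # amended amount for the same transaction
    Record(4, "R1", 1, "T2", 100.0),  # restated unchanged in the amendment
    Record(5, "R2", 0, "T1", 50.0),
]

current = current_records(records)
t1_history = history(records)[("R1", "T1")]
```

In this sketch the transaction id is only meaningful within a report, so the history is keyed on the pair (report_id, transaction_id), while record_id stays unique for every row.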
One of the difficult processes for us now is the creation of distinct "add," "change," and "delete" files, which we started because we thought people wouldn't want to download the large files of individual contributions from scratch each time they wanted updated information. No one who participated in the meeting uses those files, though, so we're considering eliminating them in the new formats. Let us know if this would cause anyone serious concern.
We also spent some time talking about changes to the data itself. For example, the new system allows us to provide all of the contributions from individual people that were itemized on reports, rather than only those where the specific contribution amount is $200 or more. This means that there will be a lot of transactions in the new file with amounts of $10 or $50 or $100, etc., included because the total amount given by that specific person has gone beyond the $200 threshold for itemization. It will be important to remember, though, that this doesn't represent all of the "small" contributions committees receive - there will still be no itemized information for people whose giving stays at or below $200 in the aggregate.
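The difference between the two selections can be sketched in a few lines of Python. This is an illustration only: the donor labels and amounts are made up, the function names are hypothetical, and the sketch simplifies by assuming that every contribution from a donor is itemized once that donor's total passes $200 (it ignores the aggregation period and what committees actually chose to itemize).

```python
from collections import defaultdict

ITEMIZATION_THRESHOLD = 200  # dollars, the aggregate threshold described above


def old_style_rows(contributions):
    """Old selection: only single contributions of $200 or more."""
    return [(donor, amount) for donor, amount in contributions if amount >= 200]


def new_style_rows(contributions):
    """New selection (simplified): every contribution from a donor whose total exceeds $200."""
    totals = defaultdict(int)
    for donor, amount in contributions:
        totals[donor] += amount
    return [
        (donor, amount)
        for donor, amount in contributions
        if totals[donor] > ITEMIZATION_THRESHOLD
    ]


# Made-up sample: donor A gives $250 in small pieces, B gives $250 at once, C gives $60 total.
sample = [("A", 50), ("A", 100), ("A", 100), ("B", 250), ("C", 10), ("C", 50)]

old_rows = old_style_rows(sample)
new_rows = new_style_rows(sample)
```

Here donor A's three small contributions show up only in the new selection, and donor C never appears in either one, which is the reason the new file still won't account for all small contributions.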
We also talked about some straightforward changes to the format like including two different fields for employer and occupation of donors, rather than combining this information as the old format did, and we discussed including more address information in the files - we'll consider those as we move forward.
There were several other ideas and concerns we discussed. There is a strong desire to update the files on a more flexible schedule, particularly around filing dates. We'll look at the possibility of including new versions shortly after filing dates when those come early in a week. Similarly, we'll look for ways to create new files (or maybe additional information in summary files) that identify the candidates and committees that have most recently registered with us.
We'll begin the change by creating new file formats for the 2012 election cycle, and as we move forward we'll also create new versions of the files going back in time. We'll likely stop with the 2008 election cycle, though, because we don't have complete data from paper filings in earlier cycles.
There was support for more file formats - e.g. JSON - in addition to the CSV and XML files we've been providing through the data catalog. This wasn't news to us, and we're looking for ways to build these as we search for ways to provide very large files (detailed transactions) with more flexibility (searching, sorting, etc.) than we have now. We also talked about the need for a single unique identifier for each person who has run for federal office, one that would stay with that person even if they ran for different offices over time. While we aren't a perfect source for this information, there are times when people make it clear to us that they are changing from House to Senate or President, for example - so we'll think about what we might do in this area too, but an identifier based on the primary source information we receive wouldn't necessarily be perfectly reliable.
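Until additional formats are available, anyone who wants JSON can derive it from the existing CSV files. The Python sketch below shows one way to do that a row at a time, so that a very large file never has to be held in memory; the column names in the sample are hypothetical and do not describe the real file layout.

```python
import csv
import io
import json


def csv_to_json_lines(source):
    """Yield one JSON object per CSV row, using the header row for the keys."""
    for row in csv.DictReader(source):
        yield json.dumps(row)


# Made-up sample with hypothetical column names; a real file would be opened with open().
sample = io.StringIO(
    "committee_id,transaction_id,amount\n"
    "C001,T1,250\n"
    "C001,T2,50\n"
)

lines = list(csv_to_json_lines(sample))
```

Every value comes through as a string because CSV carries no type information, so amounts and dates would still need to be converted by whoever consumes the output.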
Finally (and not something we talked about yesterday) - anyone who worked through the 2008 Presidential campaign will remember the pain caused by the fact that the presidential reporting form didn't include a summary line for "unitemized total from individuals." That information WILL be included in 2011 and 2012 filings from Presidential campaign committees.