There was an interesting post on The Scoop (Derek Willis' weblog on investigative and computer-assisted reporting) yesterday that focused on a couple of characteristics of FEC data that deserve more discussion.
Derek points out that the list of Leadership PACs in our data catalog has some limitations. First, he notes that there are a couple of places where you might find the name of the office holder or federal candidate who is the "leader" of the PAC. Sometimes it's in a field called "sponsor name" and sometimes it's in a field called "affiliated committee name."
I think this is a symptom of a larger problem - building an electronic format for receiving information that is still tied to a paper version of the same thing. In this case, if you look at the paper version of a Statement of Organization - the source for information about Leadership PAC sponsors - it turns out that the place to put the sponsor name is the same real estate that would be used to name any affiliated committees. (Are you sorry you started reading this yet?) This seemed like a good idea to the people designing a paper form - a more efficient use of space on a form that's already pretty long - but it leads to ambiguity and confusion when the material gets translated from paper to data. (The software we make available to committees that file electronically doesn't have this confusion, and I suspect that the commercial programs that some committees use don't either, but we still have some paper filers - Senate campaigns, for example...)
This is something we need to work on - revisiting how we ask for and receive raw material - and your ideas are welcome.
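In the meantime, anyone working with the list can treat the two fields as alternatives. Here is a minimal sketch in Python, assuming hypothetical column names `sponsor_name` and `affiliated_committee_name` (the actual headers in the data catalog file may differ) and made-up sample rows:

```python
def leadership_pac_sponsor(record):
    """Return the sponsor from whichever field holds it, or None.

    The keys checked here are assumed names for illustration,
    not the actual column headers in the data catalog.
    """
    for field in ("sponsor_name", "affiliated_committee_name"):
        value = (record.get(field) or "").strip()
        if value:
            return value
    return None


# Hypothetical rows: the sponsor shows up in a different field in each.
rows = [
    {"committee": "Example Leadership PAC A",
     "sponsor_name": "Doe, Jane",
     "affiliated_committee_name": ""},
    {"committee": "Example Leadership PAC B",
     "sponsor_name": "",
     "affiliated_committee_name": "Roe, Richard"},
]

sponsors = [leadership_pac_sponsor(row) for row in rows]
```

A fallback like this can't tell a sponsor from a genuine affiliated committee, since both can land in the same field, so the results would still need a manual check.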
Derek's second point relates to unique identifiers for people and organizations that are the building blocks of our data. When the scheme for candidate ID numbers was designed in the mid-1970s, it was thought to be important to include as much information in the visual presentation of the ID as possible, so things like the office sought, state and district number were included to save some space and help people understand the IDs. This leads to problems, though, when circumstances change over time. For us, it meant that when a person decided to run for a second federal office (House members running for the Senate, for example) that person got a new ID. This is fine for some purposes - tracking financial activity related to a specific election - but a problem for other purposes - the campaign history of a single individual.
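To make the problem concrete, here is a rough sketch of pulling the embedded fields out of an ID. The layout assumed here (office code in the first position, state in positions 3-4, district in positions 5-6) and the sample IDs are illustrative assumptions, not a specification:

```python
from typing import NamedTuple, Optional

# Assumed office codes, for illustration only.
OFFICES = {"H": "House", "S": "Senate", "P": "President"}


class CandidateIdParts(NamedTuple):
    office: str
    state: Optional[str]
    district: Optional[str]


def parse_candidate_id(cand_id: str) -> CandidateIdParts:
    """Split an ID into office, state and district under the assumed layout."""
    office = OFFICES[cand_id[0]]
    if office == "President":
        return CandidateIdParts(office, None, None)
    state = cand_id[2:4]
    # Only House IDs carry a meaningful district in this sketch.
    district = cand_id[4:6] if office == "House" else None
    return CandidateIdParts(office, state, district)


# Made-up IDs for one person who ran for the House, then the Senate.
house = parse_candidate_id("H0XX01001")
senate = parse_candidate_id("S2XX00002")
```

Each ID describes a campaign well enough, but nothing in either one says that the two belong to the same person.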
I think we can fix this by adding a second, unique identifier for each person who has run for federal office while also maintaining the system we've used throughout the years. Derek points to one that already exists - developed by Congress itself - and we need to find out if we can use the same one and extend it for individuals who didn't win their races.
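One way to picture this is a lookup table from the existing IDs to a person-level identifier. This is only a sketch: the `PERSON-0001` format, the table, and the sample IDs are all hypothetical, and this is not the identifier developed by Congress:

```python
from collections import defaultdict

# Hypothetical crosswalk: existing candidate ID -> person-level ID.
PERSON_ID_BY_CANDIDATE_ID = {
    "H0XX01001": "PERSON-0001",  # House campaign
    "S2XX00002": "PERSON-0001",  # same person, later Senate campaign
    "H4YY03003": "PERSON-0002",
}


def campaigns_by_person(crosswalk):
    """Group the existing candidate IDs under each person-level ID."""
    grouped = defaultdict(list)
    for candidate_id, person_id in crosswalk.items():
        grouped[person_id].append(candidate_id)
    return {person: sorted(ids) for person, ids in grouped.items()}


history = campaigns_by_person(PERSON_ID_BY_CANDIDATE_ID)
```

The existing IDs stay untouched for election-specific work, while the person-level ID ties together one individual's campaigns.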