Wednesday, January 09, 2013

Database Cleanup :: It's bad, but not as bad as expected

Before I began working on "cleaning up" my database in December, I took an "inventory" of the data, partly so I'd have some idea of what the project would entail, partly because I'd never looked at the database in this way before, and to some extent so I'd have some tangible evidence that something had actually been accomplished.

I've got two databases - one for Mom's ancestral lines and the other for Dad's families. For this project I'm working with the 'Phend-Brubaker' database, which is Mom's lines.
  • 7179 individuals are in the database
  • 1123 individuals were added in 2012 (between January 1st and December 10th) [15.6%]
  • 162 do not have a surname [2.3%] - I was actually surprised that this number wasn't higher. These are primarily women whose first names are from census records and for whom marriage records have not yet been obtained. And, of course, some are the very early ancestors whose maiden names seem to have never been recorded anywhere.
  • 77 do not have a birth date [1%] - Most of these do have a christening or baptism date
  • 1498 have an estimated birth date [20.9%] - Most are people for whom I have only census records, so their dates are entered as "about xxxx"
  • 1316 do not have a birth place [18.3%] - I could "guess" where they were born based on where the family lived but should I? Some of these were simply missed when I entered the census records.
  • 1434 are Living [20%] - Except in certain cases, if I receive information on people born after 1940, their information is entered into notes rather than creating a profile for them. The profiles of some of these people will likely "disappear" as they are moved into notes for their parents. They are "left over" from when I made an adjustment regarding who would have a profile added to the database.
  • 5744 are Dead [80%] - anyone over 105 years of age is automatically marked as deceased
Of the 5744 who are dead (Whoa! Lots of work to do in this area!):
  • 3738 have a death date [about 65.1%]
  • 2006 do not have a death date [about 34.9%]
  • 2582 have a death place [45%]
  • 3162 do not have a death place [55%] 
Of the 3738 who have a death date:
  • 2577 have a death place [68.9%]
  • 1161 do not have a death place [31.1%] - Many of these death dates come from the SSDI and cemetery records, which are clues but not definitive as far as place of death.
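For anyone who wants to double-check percentages like the ones in the death-date lists, here is a minimal sketch in Python. It is not how Legacy produced the counts; the numbers are simply copied from the lists above, and the helper name `share` is made up for this illustration.

```python
# Counts copied from the post; `share` is a hypothetical helper name.
def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

dead = 5744             # individuals marked as dead
with_death_date = 3738  # of those, the ones with a death date

# (label, count, group the count is measured against)
rows = [
    ("have a death date", 3738, dead),
    ("do not have a death date", 2006, dead),
    ("have a death place", 2582, dead),
    ("do not have a death place", 3162, dead),
    ("have a death date and a death place", 2577, with_death_date),
    ("have a death date but no death place", 1161, with_death_date),
]

for label, count, base in rows:
    print(f"{count:>5} {label} [{share(count, base)}%]")

# Each pair of counts should add back up to its group total.
assert 3738 + 2006 == dead
assert 2582 + 3162 == dead
assert 2577 + 1161 == with_death_date
```

The sum checks at the end are the useful part: if a pair of counts doesn't add up to its group, something was miscounted or mistyped.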
Missing Sources - This does not mean that I don't have a source, just that it has not been entered as such. A lot of source information was entered in notes "way back" when software didn't have sourcing capabilities. And I just never got around to doing anything about it. The numbers aren't as bad as I thought they would be, though.
  • 1621 - Births with no source
  •  579 - Marriages with no source
  •  230 - Deaths with no source
  •  317 - Burials with no source
  •  314 - Individuals with events missing sources (the individuals may have multiple events that don't have sources.)
General Notes
  • 4649 Individuals have "general" notes. This could be a brief sentence or "everything" I've found on someone but most likely something in between. It could also be information on living people born since 1940 that has been entered into notes for their parents. Regardless, I hope to see this number fall substantially as the information is moved into events and put into sources.
The Plan, in order of priority, is to review the individuals with:
  • missing sources - which should also eliminate some of the General Notes
  • no birth date - if necessary, determine a best estimate
  • no surname - depending on where they lived, marriage records may be online
  • general notes - in some cases the general notes won't be eliminated so I'll come up with some way of identifying those that have been reviewed
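The plan amounts to a checklist worked in priority order: each person lands in the first category that applies. A rough sketch of that idea in Python follows. The record layout and field names (`unsourced_events`, `birth_date`, `surname`, `general_notes`) and the sample people are all made up for illustration; Legacy's own tagging and search features do this work in practice, and nothing here reflects its actual data model.

```python
# Hypothetical record layout; these field names are assumptions, not Legacy's.
PLAN = [
    ("missing sources", lambda p: p["unsourced_events"] > 0),
    ("no birth date", lambda p: p["birth_date"] is None),
    ("no surname", lambda p: not p["surname"]),
    ("general notes", lambda p: bool(p["general_notes"])),
]

def first_task(person):
    """Return the highest-priority review category that applies, or None."""
    for label, needs_review in PLAN:
        if needs_review(person):
            return label
    return None

# Made-up sample records, not real people from the database.
people = [
    {"name": "Person A", "unsourced_events": 2, "birth_date": None,
     "surname": "", "general_notes": "census notes"},
    {"name": "Person B", "unsourced_events": 0, "birth_date": None,
     "surname": "Example", "general_notes": ""},
    {"name": "Person C", "unsourced_events": 0, "birth_date": "about 1850",
     "surname": "Example", "general_notes": ""},
]

for person in people:
    print(person["name"], "->", first_task(person) or "nothing to review")
```

Person A has several problems but is listed only under missing sources, which mirrors working one tagged group at a time before moving on to the next.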
Some headway has been made since the first week of December but there is still a lot to be done. Overwhelming almost. The tagging feature in Legacy (the software that I use) has been indispensable in this process and allows me to work on little bits at a time. It's a nice feeling when I get the "There are no Tagged Individuals" message. It means one small part is completed and I can move on to the next task.

Published under a Creative Commons License.
Becky Wiseman, "Database Cleanup :: It's bad, but not as bad as expected," Kinexxions, posted January 9, 2013 ( : accessed [access date])


Barbara Poole said...

Becky, very interesting stats. Now, I would like to do the same thing, but how? Could you please give me easy instructions? Thank you.

Linda Edwards said...

As always I'm impressed with your doggedness on such a daunting project.

Unknown said...

In the midst of bringing my database up to GPS as far as sourcing goes. My two databases combined are not nearly as large as your one database, only about 2500 individuals.

I wish you luck in this endeavor. I can tell you, one fringe benefit is that new leads and ideas come up as you go along.

Becky Wiseman said...

Barbara, do you use Legacy?

Becky Wiseman said...

Thank you Linda. I've been putting this off for sooooo long and finally decided to 'just do it' and am hoping it won't take all year to finish!

Becky Wiseman said...

Thanks Rorey, I can use all the luck I can get! As far as bringing sources up to any standard - well, that will be a future project. I'm trying to do it better as I source the data that didn't get entered but the sources that are already there will wait for another time. Something to look forward to!

I'm enjoying getting 'reacquainted' with some of the people in my database as I go along in this process. And I'm making notes on future 'To Do' items as well.