DEVELOPMENT OF A DIGITAL GIS DATABASE FOR AUGUSTA CO., VIRGINIA, 1860 -1870:
OVERVIEW, OUTLINE, AND DETAILED DISCUSSION OF PLANS AND PROCEDURES FOR DATA AUTOMATION

Compiled by:
Steve Thompson and Ariel Lambert, June 1999-August 1999


TABLE OF CONTENTS

OVERVIEW

OUTLINE OF PROCEDURES

  1. Photographing of the Hotchkiss Map
  2. Constructing a Single Digital Image of the Hotchkiss Map
  3. Geo-referencing and Rectification of the Digital Image
  4. Creating Digital Vector Coverages from the Geo-Referenced Image
    1. Line Coverages
    2. Polygon Coverages
    3. Point Coverages
  5. Checking and Cleaning Census Match Excel Files ("aumap.xls" and derivatives)
  6. Importing Data Files and Joining with Arc/Info Point Attribute Tables (.pat)
  7. Creating "Locale" Polygon Coverages and Compiling Aggregate Statistical Data for These Areas
  8. Recreating the Routes of the Census Takers
  9. Merging District-Wide Coverages Into County-Wide Coverages
  10. Adding or Relating Data Bases to the Attribute Tables of Digital Geographic Coverages
  11. Staunton
  12. Preparing the Final Data File


OVERVIEW

Development of a digital GIS data base for Augusta County, Va. is based upon the existence of a detailed map of the county drawn by Jedediah Hotchkiss in 1870 and derived in large measure from existing Confederate Army maps produced during the Civil War. In addition to showing major and minor roads as well as rivers, streams, and smaller water courses, the Hotchkiss map is significant in that it shows the locations of over 2000 named structures. Although mills (flouring, saw, and paper), churches, schools, mines, and a variety of manufacturing establishments (blacksmithies, potteries, forges) are shown on the map, the vast majority of named structures are private residences with the name corresponding either to the property's owner or inhabitant.

Viewed alone, the Hotchkiss map is capable of providing many insights into the physical and cultural geography of Augusta Co. during the Civil War and the immediately subsequent period. The major goal of this project, however, is to use the Hotchkiss map as a basis for projecting detailed Census records (population, agricultural, manufacturing, and slave holding) of the county for 1860 and 1870 into space. The abundant family names provided by the map provide the key which enables us to link Census records to inhabited space.


OUTLINE OF PROCEDURES

  1. Photographing of the Hotchkiss map

    The original Hotchkiss map was photographed by Special Collections in Alderman Library. In its published form, the map consists of twenty-four paper sections (referred to as "quads" by the photographer) arranged in six rows of four sections each, all of which are affixed to a single canvas backing. So that the map could be easily folded, spaces of ca. 1/2 inch were left between individual map sections. The Special Collections photographer shot the map in twenty-one sections, with each photograph corresponding to a 1:1 reproduction of a section/quad of the original map. The map sections were numbered quad01 through quad24 by the photographer, beginning in the upper left-hand corner of the map and proceeding from left to right and from top to bottom. Thus, the quads were numbered as follows:

    01  02  03  04
    05  06  07  08
    09  10  11  12
    13  14  15  16
    17  18  19  20
    21  22  23  24

    The upper left (NW), upper right (NE), and lower right (SE) quads (numbers 01, 04, and 24) were not photographed as the margins of the county map did not extend into these sections. A map of the city of Staunton drawn at a smaller scale occupies the lower left-hand corner of the map (quads 17, 18, 21, and 22). The twenty-one photographed sections were delivered to the VCDH as full color TIF images, each one about 17 MB in size.

  2. Constructing a single digital image of the Hotchkiss map

    A single image file comprising the whole of the county was "stitched together" in Photoshop. Individual quads were first aligned and their margins cropped as closely to the borders of individual map sections as possible. Quads were then edge-matched one to another, first in "blocks" defined by four contiguous map sections. These first-order recombinations were labeled `block01' through `block06.' Block01 was comprised of quads02, 05, and 06; block02 of quads03, 07, and 08; block03 of quads09, 10, 13, and 14; block04 of quads11, 12, 15, and 16; block05 of quads17, 18, 21, and 22; and block06 of quads19, 20, and 23. The six blocks were thus arrayed as follows:

    01  02
    03  04
    05  06

    The six blocks were saved both as full color, compressed TIF images (block01-06.tif) and as black and white, uncompressed TIFs (block01b-06b.tif). The final stage of combining the six blocks into a single image required, for reasons of file size, that the blocks be converted to black and white while the geo-registration and rectification of the resultant image (see below) required an uncompressed TIF image.

    The final stage entailed edge-matching and joining the six blocks into a single black and white image file. This image file, named "augmap," was saved in both compressed (augmap.tif) and uncompressed (augmap2.tif) formats. The insert of Staunton wholly contained in block05 was clipped and saved as the uncompressed "stntn.tif."

    Edge-matching and joining of the quads and blocks was a tedious and not wholly perfectible process, as both the paper and the canvas backing are very elastic and individual sections of the map appear to have been variably stretched and distorted over the past 130 years. In addition, the edges of the paper sections were often frayed and worn, so that perfect matches often could not be achieved.

  3. Geo-referencing and rectification of the digital image

    Assigning real-world coordinate values to the individual pixels of the augmap2.tif image, known as "geo-referencing," was carried out in Arc/Info using the Arc commands REGISTER and RECTIFY. Within REGISTER, links were initially made to county boundary and hydrology vector line coverages from the U.S. Census (Tiger/Line data) and reasonable results were obtained. Better results were achieved by establishing links between the Augusta Co. image and georeferenced TIF files of 1:24000 scale USGS quadrangle maps (Digital Raster Graphics, or DRGs), as numerous stable points such as churches, road intersections, etc. could be located on both target (Hotchkiss) and source (USGS) maps.

    Perfect geo-referencing of the Hotchkiss image was not possible due to various factors. First, as mentioned, the original paper map appears to have been stretched and distorted significantly. Second, distortions were undoubtedly compounded both during photography and subsequent editing, edge-matching, and joining of map sections and blocks. Third, the cartographic precision of the original Hotchkiss map appears to be less than that of modern maps of the county. This is particularly notable along the northwestern and southeastern borders of the county, both of which lie in mountainous terrain. The most significant departures in the actual contours of the county's boundary between the Hotchkiss map and modern maps occur at the southwestern and southeastern corners of the county.

    Approximately twenty links were established between the DRG source images and the Hotchkiss image and included a number of points along the county's boundary and throughout the internal area of the county. Links were added and deleted until the RMS error of all links was less than 500 meters. Lower average RMS errors could not be achieved despite much experimentation. In the main, then, points on the geo-referenced Hotchkiss image (as indicated by their x,y coordinates) lie no more than 500 meters from their "actual" locations and are often significantly closer.
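
    As an illustration of the RMS figure quoted above, the following is a minimal Python sketch, assuming that the error is the root-mean-square of the per-link offsets between where a link lands on the rectified image and its DRG position. The link residuals shown are hypothetical values in meters, not measurements from this project, and the exact formula used by REGISTER is not confirmed here.

```python
import math


def rms_error(residuals):
    """Root-mean-square of (dx, dy) link residuals, in coordinate units (meters for UTM)."""
    if not residuals:
        raise ValueError("at least one link is required")
    squared = [dx * dx + dy * dy for dx, dy in residuals]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical residuals (meters) for three links; illustration only.
links = [(300.0, 400.0), (-120.0, 50.0), (0.0, -260.0)]
print(round(rms_error(links), 1))
```

    Because the RMS value summarizes all links together, an individual link can lie farther from its source position than the overall RMS figure.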

    The RECTIFIED Hotchkiss image file is labeled "augmap2r.tif" (with corresponding `world file' "augmap2r.tfw"), and it is this rectified image that was used as a background for all subsequent digitizing of vector data.

  4. Creating Digital Vector Coverages from the Geo-Referenced Image

    A series of digital vector coverages have been produced to date using the rectified raster image of the 1870 Hotchkiss map and this process of digitizing continues. All digitizing is being carried out within the ArcEdit module of Arc/Info using "heads up" or "on screen" methods. Essentially, features are traced from the rectified image, with the resultant digital "coverages" being in the same real-world coordinate system as the source image (the UTM coordinate system is being used throughout this project). Digitized coverages of each of the three basic types - line, polygon, and point - have been created.

    1. Line Coverages

      To date, two county-wide line coverages have been digitized, one detailing hydrology (stream1870) and the other roadways (roads1870). Line features representing water courses in the hydrology coverage have all been coded (within a field named "RANK" added to the arc attribute table (aat) of the coverage) so that all streams are classified into one of three types (major, lesser, and minor). Stream length was the criterion upon which this classification was based (>12000 m = Rank 1/major, 6000 - 12000 m = Rank 2/lesser, and <6000 m = Rank 3/minor). A second field named "NAME" was also added to the aat of the hydrology coverage to contain stream names as they appear on the Hotchkiss map.
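
      The length thresholds above can be expressed as a short Python sketch. The function name and the sample stream lengths are hypothetical, and the sketch assumes that lengths of exactly 6000 m and 12000 m fall into Rank 2.

```python
def stream_rank(length_m: float) -> int:
    """Return the RANK code for a stream length in meters.

    1 = major (> 12000 m), 2 = lesser (6000 to 12000 m), 3 = minor (< 6000 m).
    """
    if length_m > 12000:
        return 1
    if length_m >= 6000:
        return 2
    return 3


# Hypothetical stream lengths (meters), for illustration only.
sample_lengths = {"stream_a": 15250.0, "stream_b": 8400.0, "stream_c": 950.0}
ranks = {name: stream_rank(length) for name, length in sample_lengths.items()}
print(ranks)
```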

      Digitized roadways have also been classified according to a tripartite scheme. Within the aat of the roads coverage a field named "RD_TYPE" was added to contain this information. Roads classed as type 1 are considered "major roads" and are represented by double solid lines on the Hotchkiss map. Type 2 roads are "minor" and are represented by single solid lines on the original map. Finally, type 3 roads or "paths" are those routes shown by single dashed lines by Hotchkiss. As with the hydrology coverage a RD_NAME field was also added to the aat to contain road names, though most of these features are not named on the original map.

      At least one further county-wide line coverage detailing railroads is envisioned.

    2. Polygon Coverages

      The most basic polygon coverage digitized represents the boundaries of Augusta County. This coverage is named "bord1870".

      The only locational information that accompanies the available census data places individual households within one of three intra-county divisions. These are named "1st subdivision", "north subdivision" and "Staunton." Only the boundaries of the Staunton division are currently known, though basically this internal division of the county appears to correspond roughly to "north, south, and center." It provides a somewhat finer spatial grain with which to view the census data, but remains crude even if the division boundaries can be determined.

      The boundaries of six electoral districts plus Staunton are portrayed on the Hotchkiss map and these have been digitized into a single county-wide coverage named "dist1870." A single field named "DISTRICT" was added to this coverage's polygon attribute table (pat) to contain the name of each district polygon. The Staunton district has not yet been added to this coverage as this awaits georeferencing and rectification of the insert image (stntn.tif). Eventually, the electoral district polygons will be used to contain statistical data aggregated from the household level as provided by the various censuses.

      Polygon coverages of each of the individual electoral district boundaries have also been created. These were extracted from the above county-wide coverage. Thus far these coverages have mainly been used in various ways to assist the digitizing of other data.

      Experimentation has begun with the creation of lower order polygons, currently termed "locales." Locales are areas centered on named places such as towns or mill seats but also may be defined by the locations of churches, schools, or other "centralizing" institutions. Polygon coverages of locales are now being generated by transforming point coverages using the Arc command THIESSEN (which creates a matrix of Thiessen polygons around an array of points). Statistical data from the various censuses will be aggregated at the locale level either for the entire county or within individual electoral districts.
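
      A point lies inside the Thiessen polygon of whichever center it is nearest to, so aggregation by locale can be sketched without building the polygons at all. The Python below is a minimal illustration of that idea and is not the THIESSEN command itself; the locale names, coordinates, and the "persons" field are hypothetical.

```python
import math
from collections import defaultdict


def nearest_locale(point, centers):
    """Return the name of the locale center closest to point (x, y)."""
    return min(centers, key=lambda name: math.dist(point, centers[name]))


def aggregate_by_locale(households, centers, field):
    """Sum a numeric household field within each locale (nearest-center rule)."""
    totals = defaultdict(int)
    for hh in households:
        totals[nearest_locale((hh["x"], hh["y"]), centers)] += hh[field]
    return dict(totals)


# Hypothetical locale centers and household points (coordinates in meters).
centers = {"mill_seat": (0.0, 0.0), "church": (1000.0, 0.0)}
households = [
    {"x": 100.0, "y": 50.0, "persons": 5},
    {"x": 900.0, "y": -20.0, "persons": 3},
    {"x": 400.0, "y": 0.0, "persons": 4},
]
print(aggregate_by_locale(households, centers, "persons"))
```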

    3. Point Coverages

      The heart of this project entails the digitization of point coverages from the Hotchkiss image that record the locations of all named (and unnamed) structures and establishments that appear on the map. It is through the establishment of links between map names and census names that a fully spatially referenced statistical database will be generated.

      The process now being followed entails digitizing all point features on the map and assigning each a unique identifier that can be used to join/relate the gis point coverages to a series of data files containing information regarding matches between points on the Hotchkiss map and records contained in the 1860 population, agricultural, and slave holding censuses. The initial task of matching map points to census records was carried out by VCDH staff prior to the initiation of this gis data base. Results of this earlier work are contained in an Excel spread sheet named "aumap.xls." The compilers of this spread sheet worked systematically by election district, proceeding typically from point to point along roadways, recording named points along with general locational information (toponym and reference within a grid that was superimposed over the map) and indicating whether the point could be matched to a record in any of the three censuses (pop, agric., and slave). The compilers of "aumap.xls" clearly maintained the record order in which points were entered/matched within this spread sheet. Consequently, it has been possible to break "aumap.xls" up into a series of sheets based upon electoral district. These district-wide spread sheets are named "beverle.xls," "middle.xls," "north.xls," "pastures.xls," "rvheads.xls," and "south.xls." Records for points within the Staunton electoral district presumably exist, but these are not contained within "aumap.xls." Digitizing of point features from the Hotchkiss map, then, is proceeding by electoral district, and once all districts have been completed these can be joined to form a single, seamless, county-wide coverage.

      Digitizing point features began in the northwestern corner of the map with the North River District and is proceeding from west to east and north to south. To date, complete point coverages containing all named and unnamed points have been digitized for the North River and Middle River Districts; digitizing of the Pastures district is in progress. To join the district point coverages to the district data files containing information on matches to census records, unique identifiers must be created in both the gis and Excel files that link each point with its corresponding Excel record. IDs were first created for the North River district Excel records simply by creating a field (MAP-ID) and filling it with a numerical series beginning with "1." NB: the Excel files have NOT BEEN sorted and should not be sorted prior to the addition of MAP-IDs for each record. Once MAP-IDs have been added to a district Excel file, the entire file is printed to be used to monitor the digitizing process.
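
      The numbering step was done in Excel, but its logic can be sketched in Python as follows. This is a minimal illustration, assuming the records are read in their original, unsorted order; the sample names are hypothetical. The start parameter allows a later district to continue the sequence from the next unused number.

```python
import csv
import io


def add_map_ids(rows, start=1):
    """Prepend a sequential MAP-ID to each record, preserving the existing record order."""
    return [[map_id] + list(row) for map_id, row in enumerate(rows, start=start)]


# Hypothetical records in the order the compilers entered them (not sorted).
raw = "Crawford,A.\nSmith,W.\nBell,J.\n"
rows = list(csv.reader(io.StringIO(raw)))
numbered = add_map_ids(rows)
for record in numbered:
    print(record)
```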

      Within ArcEdit, adding point features to a coverage entails the software automatically adding a unique "user-id" to each feature. The user-id field is the fourth field in the coverage's point attribute table and is generated automatically. The name of this field is always <cover>-id (the coverage name followed by "-id") and should not be confused with the third pat field named <cover>#. By default, ArcEdit calculates user-ids sequentially, beginning with "1" each time a new coverage is created. The user-ids of added features that are later deleted are NOT reused, again by default. The assignment of user-ids, however, can be controlled by the digitizer; the start number as well as interval of a sequence, for example, can be specified. User-ids can also be changed for individual points or a series of points using the CALCULATE command. In this project, digitizing and thus the assignment of a series of user-ids to point features follows exactly the record sequence of the Excel files (and therefore the MAP-IDs contained there). Essentially, digitizing moves from point to point in the same sequence followed by the compilers of the Excel files. In addition to the automatic assignment of a user-id to each point digitized, the digitizer also fills a value in the added field named "PNT_TYPE." [The numerical codes of all POINT TYPES (residences, schools, churches, etc.) are contained within "pntcodes.txt" in the docs directory on ptolemy.]

      Because of the extremely repetitive nature of point digitizing (there are approximately 1000 points to be digitized within each of the 6 electoral districts) this process has been automated and is now being carried out with the use of an AML script named "points.aml." This aml, of course, cannot automatically correct entry errors, which must be corrected manually outside of the script. Once all named points contained within the Excel file for a given district have been digitized, points.aml can be used to add and assign point types to any additional locations (usually unnamed) within the district. This entails adding, typically, several hundred additional points to those contained in the Excel file. Assignment of user-ids to these points can take place irrespective of sequence. Once the last point within a district has been digitized, the next number in the id sequence can be used to define the first user/map-id to be used in the next district to be digitized.

  5. Checking and Cleaning Census Match Excel Files ("aumap.xls" and derivatives)

    Once all points within a district have been successfully digitized, the next step is to join the data records contained in the corresponding Excel file to the coverage's point attribute table. Before this is carried out, however, the Excel file MUST BE checked and cleaned of any erroneous or ambiguous entries.

    Occasionally, transcription or spelling errors are encountered in the Excel file during the process of digitizing and these should be corrected. Very occasionally, Excel file records may be encountered for which no clear point on the map can be found (and thus digitized). In this case, the Excel file record can be deleted (not deleting the file record will have no consequence upon the gis data base since, lacking a match to the ids in the point attribute table, the Excel record will not be imported - as long as the digitizer did not assign this user-id to another point). A more serious problem arises in the case in which a single point feature on the map may have been assigned (erroneously) two or more records by the compilers of the Excel file containing references to Census record matches (and thus multiple map-ids will have been assigned to such single points). In this case, one of the records (and its map-id) must be deleted from the Excel file before it is joined to the coverage's point attribute table.

    Much of the labor and time required in cleaning the Excel files results from the fact that the compilers of the file frequently matched multiple point features with a single census record. That is, multiple point features on the Hotchkiss map share references to a single, unique Census Page#/Family# (Pop. Cen.) or Page#/Line# (Ag. Cen.). This is a case of a "many to one" match and may have happened for various reasons.

    One common cause is that multiple features on the map often actually are labeled with identical names and, thus, appear to be owned by a single individual. For instance, many points exist on the Hotchkiss map that are labeled with the possessive form of an individual's name (e.g. "A. Crawford's"). Invariably, however, there will be one point in this spatial cluster that is not labeled in the possessive.

    Although the interpretation cannot be verified, our working assumption is that points labeled with a possessive represent properties of the named individual while points not labeled possessively indicate the place of residence of the named individual. The compilers of the Census Match files, however, disregarding the possessive labels, typically assigned all points with the same name to the same individual (unique record) in the Census records. While some, perhaps even all, of the features labeled with a possessive may be dwellings owned by the individual indicated, they need not be. That such points represent barns, outbuildings, or other agricultural or manufacturing installations cannot be ruled out without more information. Even if these features are residences, however, it is important that they be associated with the Census data related to their OCCUPANTS RATHER THAN THEIR OWNERS.

    If the task of reconstructing the routes of Census takers and of infilling more point to record matches is profitable, it may be possible to associate some of these points with Census households as families that rent their residences can be detected in the Population Census as they have zero Real Estate wealth.

    Many-to-one matches also appear to have other causes. It may also be the case, quite understandably, that multiple individuals within the county shared the same first and last names, and thus the possibility exists that the compilers of the files of matches to the Census records will have matched inadvertently more than one person/dwelling to the same Census data record. Such cases can only be resolved, if at all, by examination of the Page/Family Numbers of nearby matches.

    Cases have also been encountered (with the 1860 Census Records) in which points on the map indicated as belonging to individuals sharing a common family name but having different first (and middle) names/initials (e.g. A. Crawford and T. Crawford) have been matched to the same unique Page#/Family# or Line#. The only explanation for this situation seems to be the time lag between the recording of the 1860 Census and the drawing of the 1870 Hotchkiss map. That is, in 1860 A. Crawford and T. Crawford were listed as belonging to the same family because they were sons of the same father and lived with him under the same roof, but by 1870 the father had died and his estate had been divided among his heirs, whose names were recorded by Hotchkiss. The difficulty here becomes that of deciding which point (if any) of those matched to the Census record should be retained as the most probable residence of the family's head of household in 1860.

    All cases of "many to one" matches are problematic and MUST BE rectified at this point. If not, the process of joining data tables (.pat and Excel file) will simply join the first occurrence of an ID in the .pat with its first occurrence in the Census Match data file - and this is dependent simply upon the (arbitrary) order of the records in these two data files. EACH DIGITIZED POINT ON THE MAP (EACH USER-ID) CAN BE MATCHED TO ONLY ONE UNIQUE RECORD IN EACH OF THE CENSUSES. LIKEWISE, A UNIQUE CENSUS RECORD CAN BE MATCHED TO ONLY ONE POINT IN SPACE. To match a single Census record entry to more than one point on the map will result in the replication of statistical census data in any aggregation of this data above the level of the household. In other words, not resolving cases of many-to-one matches will result in individuals being counted more than once whenever statistical data is aggregated/summarized at higher order spatial scales.

    As most replications of unique census records within the Excel matching files probably are based upon the replication of names (either intentionally or due to a multiplicity of W. Smiths, for example) on the map, a means of checking for them is to sort the Excel files (but not before Map-ids have been established) alphabetically on last names. The file can then be studied for duplicate last names and duplicate matches to single census records. The information on matches to census records should be removed from all records except the one deemed most likely to represent the primary residence of the individual in question. It is also possible, however, that census records are duplicated in the Match files because of transcription/typographical errors (either in the original documents or in subsequent versions). A more complete check, then, entails sorting the files by page# and fam#/line# for each of the three censuses and checking for additional duplicated matches. This sorting and checking procedure should, however, follow a sort based on Last Name, First Name.
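
    The duplicate check described above was carried out by sorting and inspecting the Excel files. As a complementary illustration, the following minimal Python sketch groups map points by census key and reports keys shared by more than one MAP-ID. The function name and the sample records are hypothetical; the key fields follow the PopCen60Page and PopCen60Fam field names used in this document.

```python
from collections import defaultdict


def find_many_to_one(records, key_fields):
    """Group MAP-IDs by census key; return only keys matched to more than one point."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(field) for field in key_fields)
        if any(value in (None, "") for value in key):
            continue  # no census match recorded for this point
        groups[key].append(rec["MAP-ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical match records, for illustration only.
records = [
    {"MAP-ID": 12, "PopCen60Page": "104", "PopCen60Fam": "731"},
    {"MAP-ID": 13, "PopCen60Page": "104", "PopCen60Fam": "731"},
    {"MAP-ID": 14, "PopCen60Page": "105", "PopCen60Fam": "740"},
    {"MAP-ID": 15, "PopCen60Page": "", "PopCen60Fam": ""},
]
dupes = find_many_to_one(records, ("PopCen60Page", "PopCen60Fam"))
print(dupes)
```

    Each reported group still requires a judgment about which point to retain; the sketch only locates the candidates.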

    Because of the need to clean the Excel data files of many-to-one matches, three fields have been added to this original data file. These fields are named "P60NOTES," "A60NOTES," and "S60NOTES." Contained in these fields are brief descriptions of why a given point was or was not chosen from a series of points initially matched to a single record. In these fields, "prop. of " indicates that the point in question has been interpreted as a property of but not primary residence of the named individual. The term "desc. of " in the NOTES fields indicates that the named individual is listed as a dependent (not household head) within the family in question, and therefore no specific household data is available for this person. In both of these cases, if another point has been matched to the Census family in question, the NOTES field contains the additional annotation "cf. map-id xxx" to indicate the point for which the match was established.

    As a final stage of file cleaning, it is imperative that all potentially confusing characters be removed from or replaced in the Census Match Excel files prior to importation into Arc/Info. Commas, since they will be used as field delimiters (see below), must be replaced with colons (:). Forward slashes (/) should be replaced with underscores (_), and single right quotations (') with single left quotations (`). The fields containing page, line, and family number information that match records to census records must contain only numerical data, although characters can be contained within the original composite fields in which this information was initially recorded. Thus, remove all "M"s, "N"s, "/" and any other character information from these fields (PopCen60Page, PopCen60Fam, AgCen60Page, AgCen60Row, SlvCen60Row). Occasionally these fields may contain references to more than one census record. When this occurs, all but one of these references must be removed (though this information should be retained in another field, such as the Orig or Name fields).
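
    The character substitutions above were made in Excel. A minimal Python sketch of the same rules follows; the function names and sample values are hypothetical. Note that digits_only is only appropriate for a field that refers to a single census record, since stripping the separator from a field holding two references would run the numbers together.

```python
import re


def clean_text(value: str) -> str:
    """Replace commas with colons, slashes with underscores, and ' with `."""
    return value.replace(",", ":").replace("/", "_").replace("'", "`")


def digits_only(value: str) -> str:
    """Strip every non-digit character (letters, slashes, spaces) from a number field."""
    return re.sub(r"\D", "", value)


# Hypothetical field values, for illustration only.
print(clean_text("A. Crawford's, mill/forge"))
print(digits_only("12N"))
```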

    Are there other problems with these files? Probably, but only time will tell.

  6. Importing Data Files and Joining with Arc/Info Point Attribute Tables (.pat)

    Once an Excel file has been checked and cleaned of any erroneous data, the file can be prepared for importing into Arc/Info. Arc/Info can read ascii text files with, by default, comma delimiters. Excel allows its files to be saved in comma delimited text format (.csv) and a copy of the corrected file should be so saved and ftp'ed to the vdhc/augusta/data directory on ptolemy (sending the file as ascii rather than binary data will prevent record delimiter characters (^M) from appearing at the end of each record of the text file).

    Once on ptolemy, the .csv file should be inspected (use the UNIX command "more <filename>"). Before importation, the first line of the comma delimited text file (containing field names) must be deleted. In xedit, with the cursor positioned at the beginning of a line will delete the line in its entirety. If extraneous characters exist in the text file (such as the record delimiters ^M mentioned above), these also must be removed. Such characters are best removed using the vi text editor.

    To open a vi session, from the UNIX prompt type "vi <filename>". The global search and replace command in vi for the ^M character, for example, is:
    :%s/^V^M//g
    (where ^V^M is entered by typing Ctrl-V followed by Ctrl-M)
    [the general search and replace command structure is :%s/target value/replace value/g]

    To exit vi and save changes, type:
    ZZ or :x
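
    The same two preparation steps (removing ^M carriage returns and deleting the first line of field names) can be sketched in Python as an alternative to editing by hand. This is a minimal illustration and not part of the procedure actually followed; the function name and sample text are hypothetical.

```python
def prepare_for_info(text: str) -> str:
    """Remove carriage returns (^M) and drop the first line (the field names)."""
    unix_text = text.replace("\r", "")
    lines = unix_text.split("\n")
    return "\n".join(lines[1:])


# Hypothetical .csv content as it might arrive after a binary-mode transfer.
sample = "MAP-ID,L-NAME\r\n1,Crawford\r\n2,Smith\r\n"
print(prepare_for_info(sample), end="")
```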

    Before the text file can be imported into Arc/Info, an Info file must be created to receive the data records. INFO is the data base module of Arc/Info and can be entered by typing INFO from the Arc prompt. INFO IS CASE SENSITIVE AND YOU MUST SET CAPS LOCK SO ONLY UPPER CASE IS USED to issue commands within this module.

    Arc: INFO
    INFO EXCHANGE CALL
    03/06/1999 15:02:37
    INFO 9.42 11/11/86 52.74.63*
    Copyright (C) 1994 Doric Computer Systems International Ltd.
    All rights reserved.
    Proprietary to Doric Computer Systems International Ltd.
    US Govt Agencies see usage restrictions in Help files (Help Restrictions)

    ENTER USER NAME> ARC
    (enter ARC as user name at this prompt)

    ENTER COMMAND > (this is the INFO PROMPT)

    The structure of the INFO data file to be created should be the following, in which one field is created for each field to be imported with field types and widths corresponding to the data to be imported:

    ITEM NAME        WIDTH  OUTPUT  TYPE  N.DEC
    PNTSNORTH-ID         4       5  B     -
    L-NAME              40      40  C     -
    F-NAME              40      40  C     -
    DISTRICT            25      25  C     -
    LOC-NOTES           30      30  C     -
    COUNTY              12      12  C     -
    MAP#                 2       2  I     -
    GRID#                2       2  C     -
    ORIG1860POP         50      50  C     -
    POPCEN60             9       9  C     -
    POP60_NOTES         50      50  C     -
    POPCEN60PAGE         4       4  I     -
    POPCEN60FAM          4       4  I     -
    POPCEN60DWELL        4       4  C     -
    POPCEN60NAME        50      50  C     -
    ORIG1860AG          50      50  C     -
    AGCEN60              9       9  C     -
    AG60_NOTES          50      50  C     -
    AGCEN60PAGE          4       4  I     -
    AGCEN60LINE          4       4  I     -
    AGCEN60NAME         50      50  C     -
    ORIG1860SLV         50      50  C     -
    SLV60_NOTES         50      50  C     -
    SLVCEN60LINE         4       4  I     -
    SLVCEN60NAME        50      50  C     -

    The INFO command to create such a data file is DEFINE and can be used as follows: (the suffix .DAT should be used for all such data files defined in INFO).

    ENTER COMMAND >DEFINE PNTSMIDDLE.DAT
    ITEM NAME,WIDTH [,OUTPUT WIDTH] ,TYPE [,DECIMAL PLACES] [,PROT.LEVEL]
    1
    ITEM NAME>PNTSMIDDLE-ID,4,5,B
    5
    ITEM NAME>L_NAME,40,40,C
    45
    ITEM NAME>F_NAME,40,40,C
    85
    ITEM NAME>DISTRICT,25,25,C
    110
    ITEM NAME>LOC_NOTES,30,30,C
    140
    ITEM NAME>COUNTY,12,12,C
    152
    ITEM NAME>MAP#,2,2,I
    154
    ITEM NAME>GRID#,2,2,C
    156
    ITEM NAME>ORIG1860POP,50,50,C
    206
    ITEM NAME>POPCEN60,9,9,C
    215
    ITEM NAME>POP60_NOTES,50,50,C
    265
    ITEM NAME>POPCEN60PAGE,4,4,I
    269
    ITEM NAME>POPCEN60FAM,4,4,I
    273
    ITEM NAME>POPCEN60DWELL,4,4,I
    277
    ITEM NAME>POPCEN60NAME,50,50,C
    327
    ITEM NAME>ORIG1860AG,50,50,C
    377
    ITEM NAME>AGCEN60,9,9,C
    386
    ITEM NAME>AG60_NOTES,50,50,C
    436
    ITEM NAME>AGCEN60PAGE,4,4,I
    440
    ITEM NAME>AGCEN60LINE,4,4,I
    444
    ITEM NAME>AGCEN60NAME,50,50,C
    448
    ITEM NAME>ORIG1860SLV,50,50,C
    498
    ITEM NAME>SLV60_NOTES,50,50,C
    548
    ITEM NAME>SLVCEN60LINE,4,4,I
    552
    ITEM NAME>SLVCEN60NAME,50,50,C
    602
    ITEM NAME> return
    ODD RECORD LENGTH ROUNDED UP TO EVEN

    ENTER COMMAND > Q STOP (ends INFO session and returns to Arc: prompt)

    This process of DEFINING info data files to receive records from the Excel 'census match' files could be automated but such a script has not been written.
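
    As a rough indication of what such automation might involve, the following Python sketch builds the item definition strings typed at the ITEM NAME> prompt and the starting column of each item, which advances by the item's width. It is a hypothetical sketch only, not an existing project script; it does not drive INFO itself, and only the first four items are shown.

```python
def define_lines(fields):
    """Return a list of (start_column, item_definition) pairs.

    Each field is a (name, width, output_width, type) tuple; the start
    column of an item is 1 plus the total width of the items before it.
    """
    lines = []
    column = 1
    for name, width, output, item_type in fields:
        lines.append((column, f"{name},{width},{output},{item_type}"))
        column += width
    return lines


# First four items of the PNTSMIDDLE.DAT definition shown above.
fields = [
    ("PNTSMIDDLE-ID", 4, 5, "B"),
    ("L_NAME", 40, 40, "C"),
    ("F_NAME", 40, 40, "C"),
    ("DISTRICT", 25, 25, "C"),
]
for column, definition in define_lines(fields):
    print(column, definition)
```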

    Info files (including pat's, aat's, and .dat files) reside within an info subdirectory within the workspace in which they were created and are not visible with a simple "list" command from either the Arc or Unix prompts. Listing the contents of the info subdirectory, further, will only return a list of system-defined names of all files and not the file names assigned by the user. To view user names of INFO files within a given workspace, enter the Arc command "dir info" or, from within INFO, the command "DIR".

    Before proceeding, it is wise to create a copy of your empty INFO file: if the adding of data is unsuccessful, you will have to delete the entire file and start over with the DEFINE command, and an empty backup saves that time. To copy an INFO file, use the Arc command COPYINFO:

    Arc: copyinfo pntsmiddle.dat pntsmiddbak.dat

    Comma delimited records can be imported only from within INFO using the ADD command with the FROM option (specifying the entire path of the file to be imported). You must first have selected the INFO file into which data is to be imported/added:

    ENTER COMMAND >SELECT PNTSMIDDLE.DAT
    0 RECORD(S) SELECTED

    ENTER COMMAND >ADD FROM ~ HOME/GICLAB/SMT6Z/VDHC/AUGUSTA/DATA/PNTSMIDDLE.CSV

    If all goes well, after a pause while INFO reads the data, a response will appear stating that INFO has added n records to the selected file. If the value of "n" corresponds to the number of records in the text file, all is well. If it is less, certain records were skipped because of difficulties with specific field entries within those records: fields may have been incorrectly defined (numeric rather than character, or not wide enough to receive the data), or there may be any of a host of other problems. To find the ids of the offending records, use the INFO command LIST followed by the -id field to see which records in the sequence were omitted:

    ENTER COMMAND >SEL NORTH.DAT
    350 RECORD(S) SELECTED

    ENTER COMMAND >LIST PNTSNORTH-ID
    $RECNO PNTSNORTH-ID
    1      1
    2      2
    3      3
    4      4
    5      5
    6      6
    7      7
    8      8
    9      9
    10     10
    11     11
    12     12
    13     13
    14     14
    15     15
    16     16
    17     17
    18     18
    19     19
    20     20
    21     21
    22     22
    MORE?

    If need be, the entire list can be copied to an xedit file and printed (lpr). Once all offending records have been identified, the .csv source file can be examined to see what the difficulties might be. Good luck. In some cases, removing blank spaces within character fields has resulted in the records being accepted on a subsequent attempt, even though the INFO field was already wide enough to receive all data.
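    Some of this checking could be done before ADD is ever run. The Python sketch below is a minimal illustration, not a procedure the project used: it flags rows of a comma-delimited file that have the wrong number of fields, a character value wider than its item, or a non-numeric value in a numeric item, and it reports which ids are missing after a load. The three checks are assumptions drawn from the problems named above, not INFO's documented rules (blank numeric fields, for instance, are flagged here), and the item list is a shortened, hypothetical subset.

```python
import csv
import io
from typing import List, Tuple

# (item name, width, type) in file order; a short hypothetical subset of the
# items defined for PNTSMIDDLE.DAT.
ITEMS = [
    ("PNTSMIDDLE-ID", 4, "B"),
    ("L_NAME", 40, "C"),
    ("POPCEN60PAGE", 4, "I"),
]


def find_problem_rows(csv_text: str, items=ITEMS) -> List[Tuple[int, str]]:
    """Return (line number, reason) for rows likely to be rejected.

    The checks are assumptions based on the problems described in the text,
    not INFO's documented validation rules.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=1):
        if len(row) != len(items):
            problems.append(
                (line_number, f"expected {len(items)} fields, found {len(row)}")
            )
            continue
        for value, (name, width, item_type) in zip(row, items):
            if item_type == "C" and len(value) > width:
                problems.append((line_number, f"{name} is wider than {width}"))
            elif item_type in ("I", "B") and not value.strip().lstrip("-").isdigit():
                problems.append((line_number, f"{name} is not a whole number"))
    return problems


def missing_ids(expected_ids, loaded_ids):
    """Return the ids present in the source file but absent after ADD."""
    loaded = set(loaded_ids)
    return [record_id for record_id in expected_ids if record_id not in loaded]
```

    The second function does by machine what the LIST inspection above does by eye: given the ids in the .csv file and the ids INFO actually holds, it returns the ones that were skipped.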

    If the ADD process was less than perfect, you must delete the file and try again. To delete an INFO file use the INFO command ERASE (the file must first be selected):

    ENTER COMMAND > SEL PNTSMIDDLE.DAT
    ENTER COMMAND > ERASE PNTSMIDDLE.DAT
    THIS COMMAND WILL ERASE THE SPECIFIED DF DO YOU WISH TO CONTINUE ( Y OR N ) > Y

    Now you can return to Arc and make a new copy from your backup (copyinfo pntsmiddbak.dat pntsmiddle.dat) in preparation for your new try.

    Once all records are successfully added, the .dat file can be joined to the corresponding .pat of the point coverage. This ensures that the fields containing the information on matches to census records are physically joined in the GIS data base. To join files, use the Arc command JOINITEM:

    Arc: joinitem
    Usage: JOINITEM <in_info_file> <join_info_file> <out_info_file> <relate_item>
    {start_item} {LINEAR | ORDERED | LINK}
    Arc: joinitem pntsmiddle.pat pntsmiddle.dat pntsmiddle.pat pntsmiddle-id

    Always name the out_info_file exactly as the in_info_file, otherwise you will create a data file for which there is no corresponding coverage.

    As yet, no exact procedure has been established for joining/relating the resultant coverages/pats with the actual census data files. For the population census, the two fields POPCEN60PAGE and POPCEN60FAM together form a unique identifier for an individual family. Again, we are more interested in dwellings than in families, so we will also want to add the dwelling number (POPCEN60DWELL) to our .pat for each matched record.

  7. Creating "Locale" Polygon Coverages and Compiling Aggregate Statistical Data For These Areas

    Various spatial scales above that of the household can be created for which statistical data derived from the Censuses can be aggregated. One such supra-household level is the "locale" or "neighborhood," within which points/households are grouped with one another on the basis of their shared proximity to a place/point that has been deemed to be the center of a locale. While locales can be defined in a wide variety of ways, any given system must contain within it a rationale for determining the locations of locale center points. One way of systematically imposing a matrix of locales upon geographical space, for example, would be to establish center points at standardized intervals, say every 5 miles. No point on the map, then, would lie more than 4.9999 miles from a center, and each point would be associated with the center closest to it. Given that we already know much about the distribution of residences and other types of functional points, it makes more sense to use existing geographical data to define locales. Named places are abundant on the Hotchkiss map, as are churches, schools, post offices, etc. Any of these features can be used to define the center points for locales.

    A point coverage of features that share a common PNT_TYPE code can be quickly extracted from a digitized coverage. To define locales on the basis of named places or other features that have not been previously digitized or uniquely coded, a new point coverage will have to be digitized.

    To extract a point coverage of features of the same PNT_TYPE, go into Arcplot and use the RESELECT and WRITESELECT commands:

    Arcplot: resel pntsmiddle point pnt_type = 2
    (Usage: RESELECT {logical_expression})
    Arcplot: writeselect midchurch.sel pntsmiddle point
    (Usage: WRITESELECT { })
    Arcplot: q

    Quit from Arcplot and use the Arc command RESELECT:

    Arc: reselect pntsmiddle midchurch point midchurch.sel point
    (Usage: RESELECT <in_cover> <out_cover> {in_feature_class} {selection_file} {out_feature_class})

    This creates a new point coverage named in this example "midchurch" that contains all of the points in the Middle River Electoral District that have a PNT_TYPE code of 2 (meaning church).

    To convert this point coverage to a polygon coverage of locales centered upon these churches use the THIESSEN command in Arc. However, before issuing the THIESSEN command you must add the arcs representing the border of the electoral district to the point coverage of churches. This is because you want your locale polygon coverage to extend over the entirety of the district. The map extent of the point coverage, however, is determined by the distribution of points and will be less than that of the entire district. Add boundary arcs in Arcedit:


    Arcedit: edit midchurch
    Arcedit: de point arc
    Arcedit: draw
    Arcedit: editfeature arc
    Arcedit: get distmidd
    (Usage: GET <cover>)
    [copies in the district boundary arcs]
    Arcedit: save
    [saving the coverage automatically extends its map extent]
    Arcedit: quit

    Arc: thiessen midchurch midchurch2
    (Usage: THIESSEN <in_cover> <out_cover> {proximal_tolerance})

    The coverage named here "midchurch2" is a polygon coverage comprised of Thiessen polygons, each one centered upon a church. Thiessen polygons can be constructed by hand by first drawing lines between a center point and all neighboring center points and marking the midpoint of each of these lines. Polygon boundaries are then made by drawing perpendiculars that pass through the midpoints of each of the lines between center points. It is a classic means of defining the territories of central places.
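    By construction, a point lies inside the Thiessen polygon of the center to which it is closest. The Python sketch below illustrates that equivalence outside Arc/Info by assigning each point to its nearest center using straight-line distance. It is a sketch under stated assumptions, not the project's method: the function names are hypothetical, coordinates are taken to be in a planar system, and the district boundary that clips the polygons in the THIESSEN output is ignored.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def nearest_center(point: Point, centers: Dict[str, Point]) -> str:
    """Return the name of the center closest to `point` (straight-line distance)."""
    return min(centers, key=lambda name: math.dist(point, centers[name]))


def assign_locales(points: Dict[int, Point], centers: Dict[str, Point]) -> Dict[int, str]:
    """Map each point id to the locale whose center is nearest."""
    return {point_id: nearest_center(xy, centers) for point_id, xy in points.items()}


if __name__ == "__main__":
    # Hypothetical centers and points, for illustration only.
    churches = {"Church A": (0.0, 0.0), "Church B": (10.0, 0.0)}
    residences = {1: (2.0, 1.0), 2: (9.0, -3.0), 3: (4.0, 8.0)}
    print(assign_locales(residences, churches))
```

    A point exactly equidistant from two centers falls on a polygon boundary; here it is given to whichever center is listed first.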

    Once a polygon coverage of locales has been created these should be (if not already) assigned unique names or some sort of identifier. The next step is to create a field within the original point coverage into which the locale name/id can be added. To add locale name/id to the point attribute table, open an Arcedit session:


    Arcedit: edit pntsmiddle
    Arcedit: de points
    Arcedit: backcover midchurch2 3
    Arcedit: backenvironment arc
    Arcedit: arcplot textjustification center center
    Arcedit: arcplot textquality tightkern
    Arcedit: arcplot polygontext midchurch2 name
    [writes locale names on screen]
    Arcedit: draw
    Arcedit: ef point
    Arcedit: select polygon
    (Usage: SELECT {WITHIN | PASSTHRU})

    SELECT with the polygon option allows you to outline a shape with the mouse and select all points contained within it. As closely as possible, outline a shape corresponding to the boundaries of a locale (shown as a backcoverage). Once points are selected in this fashion:

    Arcedit: CALCULATE <locale_item> = <locale name/id>

    Continue using the SELECT command with polygon option until all locale boundaries have been duplicated and a locale name/id value added for all points. Check work by selecting points for which locale name/id is void to make sure all points were successfully captured during the SELECT process.

    With a locale field filled for all points, any data associated with points can be summarized using the STATISTICS command in Arcplot:


    Arcplot: reselect pntsmiddle point locale = 'XXX'
    Arcplot: statistics pntsmiddle point
    Statistics: sum total_pop
    Statistics: sum total_slaves
    Statistics: end
    Arcplot:
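    The same summary can be sketched for all locales at once once the attribute records have been exported. The Python below is a minimal illustration, assuming each record is available as a dictionary carrying the locale, total_pop, and total_slaves items used above; the function name and the export step itself are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def sum_by_locale(records: Iterable[dict], fields: List[str]) -> Dict[str, Dict[str, int]]:
    """Total each field in `fields` for every distinct value of `locale`."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: dict.fromkeys(fields, 0))
    for record in records:
        locale_totals = totals[record["locale"]]
        for field in fields:
            locale_totals[field] += record[field]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = [
        {"locale": "A", "total_pop": 5, "total_slaves": 2},
        {"locale": "B", "total_pop": 7, "total_slaves": 0},
        {"locale": "A", "total_pop": 3, "total_slaves": 1},
    ]
    print(sum_by_locale(sample, ["total_pop", "total_slaves"]))
```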

  8. Recreating the Routes of Census Takers

    Given that Census Takers presumably followed linear routes, at least on a daily basis, and that the order of the original census records is based upon the order in which information was collected, it is theoretically possible to reconstruct the route of a Census Taker by drawing a line that connects points according to sequential page#/family#s. Being able to reconstruct the route of the Census Taker in geographical space holds the potential of allowing us to make additional matches between mapped points and census records, to the extent that we can confidently recreate the routes on the basis of existing matches.

    There are at least two ways of drawing a recreated Census Taker route. Both methods require that the information matching named points to specific Census records be physically joined to the relevant point attribute table.

    The first method is completely manual and entails displaying points (in Arcedit or Arcplot) along with corresponding Page# and Family# for all matches. The user then must visually inspect the map and attempt to draw a line connecting all matched points in sequence. While all matched points on a single page or within a range of pages can be reselected to minimize the "clutter" at any time, this is still a cumbersome and time consuming method.

    The second method of recreating Census Taker routes relies upon Arc/Info commands and functions. The first step is to sort the point attribute table of a point coverage by Census Page# and Family#. This is done in INFO. NOTE THAT SORTING FEATURE ATTRIBUTE TABLES WILL COMPLETELY DESTROY THE RELATIONSHIP BETWEEN X,Y COORDINATE PAIRS AND ATTACHED ATTRIBUTE INFORMATION UNLESS THE TABLE IS RESORTED TO ITS ORIGINAL ORDER. FEATURE ATTRIBUTE TABLES ARE ORGANIZED ACCORDING TO THE COVER# FIELD (HERE, PNTSMIDDLE#). IF THE FEATURE ATTRIBUTE TABLE IS SORTED, IT MUST BE RESORTED BY THAT FIELD.

    Arc: info
    INFO EXCHANGE CALL
    07/06/1999 17:20:05
    INFO 9.42 11/11/86 52.74.63*
    Copyright (C) 1994 Doric Computer Systems International Ltd.
    All rights reserved.
    Proprietary to Doric Computer Systems International Ltd.
    US Govt Agencies see usage restrictions in Help files (Help Restrictions)
    ENTER USER NAME>arc

    ENTER COMMAND >SEL PNTSMIDDLE.PAT
    562 RECORD(S) SELECTED

    ENTER COMMAND >RESEL POPCEN60PAGE NE 0
    146 RECORD(S) SELECTED

    ENTER COMMAND >SORT ON POPCEN60PAGE,POPCEN60FAM

    ENTER COMMAND >Q STOP

    Now use the UNGENERATE command in Arc to create an output text file containing an x,y coordinate pair for each record in the resorted point attribute table for which a match to Population Census exists.

    Arc: ungenerate point pntsmiddle midpnts.gen v
    (Usage: UNGENERATE {NODES | NONODES})

    This file contains three columns of data. The first is a record number, the second the x-coordinate of the point, the third the point's y-coordinate. Open the file in Excel and delete the first column (record number) and save the file as comma delimited text. Open the file in XEDIT and add a beginning record number (any number will do) to the first line of the file. The final two lines of the file should both contain "END". Thus,

    999 [record number]
    x,y [records of x,y coordinates of all points]
    x,y
    x,y
    x,y
    ....
    END
    END [be sure to enter a carriage return at the end of this line]
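    The hand editing described above could also be scripted. The Python sketch below is one possible approach, not the procedure the project followed: it assumes the page number, family number, and x,y coordinates of each point have already been exported together (a hypothetical step), drops unmatched points, orders the rest by page and family, and writes text in the layout shown above. Because the ordering is done on the exported copy, this variant would not require sorting the point attribute table itself.

```python
from typing import Iterable, List, Tuple

# (page, family, x, y) for each point; values are supplied by the caller.
Match = Tuple[int, int, float, float]


def route_generate_file(matches: Iterable[Match], line_id: int = 999) -> str:
    """Return GENERATE input text for one line through the matched points.

    Points with page 0 are treated as unmatched and skipped, mirroring
    RESEL POPCEN60PAGE NE 0 in the text.
    """
    ordered = sorted(m for m in matches if m[0] != 0)  # by page, then family
    lines: List[str] = [str(line_id)]
    lines += [f"{x},{y}" for _page, _family, x, y in ordered]
    lines += ["END", "END"]
    return "\n".join(lines) + "\n"  # carriage return after the final END


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    sample = [(2, 1, 30.0, 40.0), (0, 0, 99.0, 99.0), (1, 5, 10.0, 20.0), (1, 2, 5.0, 6.0)]
    print(route_generate_file(sample), end="")
```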

    Now use the GENERATE command in Arc to create a line coverage that connects the points in the text file, reading each point as a subsequent node in the line. Since the .pat was sorted on page#/fam#, the resultant line will connect all matched points according to the sequence of collection. The path of the line will, of course, be affected by the number of unmatched families between any two matched points.

    Arc: generate midline
    (Usage: GENERATE <cover>)

    generate: input midpnts.csv
    generate: line

    generate: quit

    Arc: build midline line [To create line topology]

    You now have a line coverage named "midline" connecting sequential points (according to Page Number/Family Number) in the Middle River Electoral District. Of course, the Census Takers probably did not collect data by Census District, and thus a county-wide points coverage should be used in this procedure. BEFORE GOING FURTHER, RESORT THE POINT ATTRIBUTE TABLE OF THE COVERAGE BY THE PNTSMIDDLE# FIELD:

    Arc: INFO
    INFO EXCHANGE CALL
    07/06/1999 17:39:35
    INFO 9.42 11/11/86 52.74.63*
    Copyright (C) 1994 Doric Computer Systems International Ltd.
    All rights reserved.
    Proprietary to Doric Computer Systems International Ltd.
    US Govt Agencies see usage restrictions in Help files (Help Restrictions)
    ENTER USER NAME>ARC

    ENTER COMMAND >SEL PNTSMIDDLE.PAT

    ENTER COMMAND >SORT ON PNTSMIDDLE#

    ENTER COMMAND >Q STOP

    Now the line coverage created can be displayed in Arcplot or Arcedit and compared to Census records to see if additional matches are possible.

  9. Merging District-Wide Coverages Into County-Wide Coverages

    Point coverages have been created as a series of coverages based on Electoral Districts. To join these into a single county-wide coverage, use the APPEND command in Arc (MAPJOIN will not work for point coverages).

    Arc: append
    Usage: APPEND <out_cover> {NOTEST | template_cover | feature_class...feature_class} {NONE | FEATURES | TICS | ALL}

    Before using APPEND it is important to check that all items, their definitions (type and width), and the item order in the files to be appended are identical to one another. The command will not work otherwise. Therefore, before issuing APPEND, use the LIST command to verify that the .pats of all coverages to be appended are defined the same.

    The NONE option must be used to ensure that user-ids are not modified during APPEND. (NONE is the default option). Note, however, that if the point attribute tables of the appended coverages contain any duplicate user-ids the output coverage will maintain the duplicate id values.

    Arc: append newcov point none
    Enter Coverages to be APPENDed (Type END or a blank line when done):
    =====================================================================


    Enter the 1st coverage: pntspstr
    Enter the 2nd coverage: pntsmiddle
    Enter the 3rd coverage: end

    Appending coverages...
    Arc:
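    The two checks recommended before APPEND (identical item definitions and no duplicate user-ids) can be expressed compactly. The Python sketch below is illustrative only and assumes the item definitions and id values of each .pat have been transcribed or exported by hand; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One item definition: (name, width, type), listed in table order.
ItemDef = Tuple[str, int, str]


def mismatched_tables(tables: Dict[str, List[ItemDef]]) -> List[str]:
    """Return names of tables whose item definitions differ from the first table's."""
    names = list(tables)
    if not names:
        return []
    reference = tables[names[0]]
    return [name for name in names[1:] if tables[name] != reference]


def duplicate_ids(ids_by_cover: Dict[str, List[int]]) -> List[int]:
    """Return user-ids that occur more than once across all coverages, sorted."""
    counts = Counter(i for ids in ids_by_cover.values() for i in ids)
    return sorted(i for i, n in counts.items() if n > 1)
```

    Because the comparison is made on ordered lists, two tables holding the same items in a different order are reported as mismatched, which matches the requirement that item order be identical.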

  10. Adding or Relating Data Bases to the Attribute Tables of Digital Geographic Coverages

    Associating point features with census data records is the last major step to be accomplished by the project. Because of time constraints, this has not yet been done. Several necessary steps are envisioned, however, and these are outlined here.

    1. Population Census data files must be aggregated on individual dwelling numbers, thus summarizing demographic data for each discrete dwelling. It is important to collapse data around Dwelling# rather than Family# as individual dwellings listed in the Census may have housed multiple families. Nevertheless, this data aggregation procedure must retain the page and family numbers recorded in the Census.

    2. Slave Holding Census must be aggregated by owner's name. Individuals may be listed multiple times (but always in successive records) in this Census if they own slaves that have been "rented out" and are working at other locations. Aggregation of this census may have to be done by hand, but it may be possible to script the aggregation process.

    3. Using a unique identifier concatenated from the fields for Census Page Number and Census Family Number, the point coverage attribute table can be related to the aggregated Population Census data base. The dwelling number associated with each unique Page#/Family# must then be added to the point coverage attribute table. (Study documentation of INFO command RELATE to understand how to perform this operation.)

    4. At this point, it also makes sense to add the unique MAP-ID associated with each point in the GIS coverage to its corresponding Page#/Dwelling# record in the aggregated Census Data base. Adding MAP-IDs directly into the Census data bases allows these files to be sorted on this field, thus providing an additional necessary check to make sure that individual records in the Census data bases have been matched to ONE AND ONLY ONE point on the map. (see above regarding many-to-one matches).

    5. Other fields and values (such as District Name and Locale) may also be added to the Census data bases once these have been related to GIS attribute tables.

    6. A concatenated field comprised of Page#/Dwelling# can now be used to relate point coverage attribute tables to aggregated Population Census Records.

    7. Rather than physically joining the aggregated Population Census data base to the GIS attribute table, it is better to build a RELATE file (using either Page#/Dwelling# or Map-ID as the relate item) that can temporarily join the geographical and statistical data bases. (Study documentation of Arc command RELATE to understand how to do this).
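    Since none of these steps has yet been carried out, the following Python sketch is offered only as an illustration of the logic in steps 1 and 3, not as an established procedure. It assumes the Population Census is available as one record per person with page, family, and dwelling numbers; those field names, the summary fields, and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_by_dwelling(people: List[dict]) -> Dict[Tuple[int, int], dict]:
    """Summarize person records per (page, dwelling), keeping family numbers."""
    dwellings: Dict[Tuple[int, int], dict] = defaultdict(
        lambda: {"persons": 0, "families": set()}
    )
    for person in people:
        summary = dwellings[(person["page"], person["dwelling"])]
        summary["persons"] += 1
        summary["families"].add(person["family"])
    return dict(dwellings)


def family_to_dwelling(people: List[dict]) -> Dict[Tuple[int, int], int]:
    """Look up the dwelling number for each (page, family) pair."""
    return {(p["page"], p["family"]): p["dwelling"] for p in people}
```

    The second lookup corresponds to step 3: given the Page#/Family# already stored with a point, it supplies the dwelling number to be added to the point attribute table.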

  11. Staunton

    Compiling data for Staunton and then digitizing the city's points involved a process similar to the one explained in Checking and Cleaning Census Match Excel Files, which was used to complete the digitization of each of the county's electoral districts. However, the VCDH staff was forced to modify this process in order to accommodate Staunton's unique circumstances. Before reading the following explanation of these changes, be sure to study the procedures that were used to compile data for and then digitize the rest of the county.

    Staunton, located in the Beverley District, exists on the Hotchkiss map in two forms: on the "augmap2r.tif" image and in more detail as an insert which was clipped and saved as "stntn.tif." To include points within the insert, "stntn.tif" was georeferenced and rectified, resulting in "stntnbr.tif." Inevitably, the two images did not fit perfectly. That is, when "stntnbr.tif" was drawn over the larger map, the various line coverages (streams, roads, and railroads) did not follow perfectly the features on the insert. To remedy this, the digitizer edited the coverages.

    Compiling an Excel file for Staunton was difficult and time-consuming. Prior to the initiation of the data base, VCDH staff produced an Excel spread sheet for Staunton by cross-referencing census and tax record information. The file, called "Stcynew.xls," included the following information for all of the city's tax payers: last name, first name, other, population census information, agricultural census information, slaveowner census information, acres, rods, poles, residence, estate, lot number, building value, lot and building value, tax amount, city tax amount, and notes. While the "aumap.xls" file maintained the record order in which points were entered, the "Stcynew.xls" file did not. It was therefore impossible to locate names on the map simply by following the list of names in the file. The "Stcynew" file also included a number of names that did not exist on the map and could therefore not be included in the final Excel file.

    A second source of information for Staunton was the Staunton Fire Insurance Depositions. Compiled between 1850 and 1860 by "The Mutual Assurance Society Against Fire on Buildings of the State of Virginia," the depositions include the following information: policy number, policy holder's name, location of building, bordering homes or businesses, occupant's name, building value, total value of the policy, and a description of the one or more buildings included in the policy. Company agents also drew sketches of individual insured buildings. Thus, each of the policies is linked on the Valley site to a preliminary drawing of the buildings on that block. These sketches and their associated policy information allowed the project's staff to associate names (and various information) with points that were not labeled on the Hotchkiss map.

    The first step in compiling Staunton's data was to scan the "stntnbr.tif" image for labels and produce a list of these names. These names were then cross-referenced with the "Stcynew.xls" file in order to determine whether or not tax record information was available. The tax records use a coding system for the locations of buildings which includes: N for "New Town," O for "Old Town," B for "Beverley Addition," S for "Staunton," and OL for "Outlying." This coding system also includes numbers that refer to tax grid blocks. An image of Staunton's tax grid can be accessed through the Valley's insurance deposition index. The image is called "taxgrid.tif." The coding system used in the "Stcynew.xls" file made it possible to determine which tax record was associated with which building. Unfortunately, the tax grid does not include areas classified as "Staunton" or "Outlying." Thus, it was impossible to associate a tax record with a name that appears more than once on the map in either of these areas.

    Cross-referencing labeled points with the tax record information produced a list of 226 points. Some of these names were successfully associated with tax record data, while others were not. Points were often clustered near a label. In these cases, it was assumed that all of the points within a lot (which is contained within a polygon on the Hotchkiss map) belonged to the person whose label appeared in that lot. Each of these points was then linked to its appropriate census record information (population, agricultural, and/or slaveowner), if possible. Because only one point can be associated with each census record (see Checking and Cleaning Census Match Excel Files), one of the points within a lot was matched to its occupant's census record while the rest of the points were classified as the property of that person.

    The city of Staunton raised a number of issues for the VCDH staff concerning occupancy versus ownership. The staff used the following reasoning in determining how to be consistent and accurate with regard to data compilation:

    1. Labels outside of the city most likely refer to the property's owner. But, because we have no information regarding the occupancy of each of these properties, it was assumed that the owner was also the occupant. Thus, census information is associated with this occupant/owner and his or her name appears under "Last Name" and "First Name." Note: Although there is a category for "Owner's Last Name" and "Owner's First Name," the name is not repeated under these headings.

    2. Within the city of Staunton, tax records and insurance depositions have allowed the staff to determine in certain instances if a building is occupied by one individual, but owned by another. In these cases, two names appear. The "Last Name" and "First Name" refer to the occupant, while the "Owner's Last Name" and "Owner's First Name" refer to the owner. Again, census information is associated with the occupant. These points are unique in that tax record and insurance deposition data indicate not how much the occupant pays, but how much the owner pays. The VCDH staff does not view this as an inconsistency within the data base because, most importantly, the information associated with these records gives the audience a better understanding of the building, its value, its physical qualities, etc.

    After cross-referencing and inputting tax record and census data, the spread sheet compiler accessed the Staunton Fire Insurance Depositions on the web and added appropriate information to the list of 226 labeled points. Information relating to the insurance depositions includes: policy number, building, location, bordering properties, building value, total policy value, building type, year, and description.

    At this point, an Excel file existed which included 226 points as well as their census, tax record, and/or insurance deposition information. In order to digitize these points, however, it was necessary to create a field for the "Map ID" and fill this field with a series of unique numbers. The first "Map ID" for the "pntsstan" coverage was 2037, because the last point in the "pntsbev" coverage was 2036. Although the Excel file for Staunton was not complete at this point, the VCDH staff went ahead and began the digitizing process in Arcedit. The "pntsstan" coverage was created and the "pntsstan-id" field was added to the "pntsstan.pat." The digitizer then added the 226 points to the coverage, making sure that each point was given the appropriate "pntsstan-id."

    The next phase of the Staunton project was to associate unlabelled points on the map with information from the census records, tax records and insurance depositions. The tax grid and the preliminary drawings of blocks available on the insurance deposition page allowed the VCDH staff to relate names and other information with points that Hotchkiss failed to label. The process for this phase of the project was reversed. Instead of compiling the data and then digitizing the points, the VCDH staff began by adding the points to the "pntsstan" coverage in Arcedit using the next number in the sequence of "pntsstan-id" values. At the same time that the digitizer was adding the points to the "pntsstan" coverage, she recorded each "pntsstan-id" on the hard copy of the GIF images next to the appropriate building. After over 50 unlabeled points were digitized, these points were added to the Excel file with their respective data. The "Map ID" for each point corresponded with the "pntsstan-id" that was recorded on the GIF images during the digitization process.

    After inputting this data into the Excel file, it was necessary to return to Arcedit and "CALC" point types for each of the digitized points. To determine the point type for points that had been associated with insurance depositions, the digitizer referred to the "MASbuildtype" in the Excel file. Many of these points were classified as both a dwelling and a business. Thus, a "pnt_type" for "Residence and Business" (or 46) was added to the "points.aml" list. When necessary, other point types were added to this same list. Throughout the rest of the county, points labeled with a first and/or last name are classified as "residences." The same rule has been used in Staunton.

    Next, the digitizer returned to Arcedit. At this point, two types of points remained undigitized. First, there were numerous unlabeled points that had not been associated with data and therefore did not require a place in the Excel file. These points were digitized and their point types were classified as "unknown" (or 99). Throughout the rest of the county, points like these (unlabeled and unassociated with data) were given the classification of "residence" (or 1) due to the high likelihood that these points were indeed residences. In the Staunton area, where businesses existed in greater numbers, this assumption could not be made.

    Finally, the last points that required digitization were the churches, mills, factories, cemeteries, etc. That is, points that were labeled, but not by a first and/or last name. These points were coded according to their "l_name" (or label) and "pnt_type," using the "CALC" command. After all of these points were digitized, the "pntsstan" coverage included 1014 records.

    The final phase of the Staunton project is to join the information from the "staunton.xls" file to the Staunton coverage (see Importing Data Files and Joining with Arc/Info Point Attribute Tables (.pat)) and merge that coverage with the county-wide coverage (see Merging District-Wide Coverages Into County-Wide Coverages).

  12. Preparing the Final Data File

    As of late August, all of the county's electoral districts (and the city of Staunton) have been digitized. With the exception of the city of Staunton, each of the coverages has been joined to its respective data file. Staunton is unique in that its data file includes census, tax record, and fire insurance policy information. The VCDH staff is currently determining whether or not to include this information within the coverage itself or link it to the coverage externally.

    Currently, the VCDH staff is returning to each of the Excel files and taking the necessary steps to ensure that all of the information is accurate. One step in preparing the final data file is to make sure that more than one point has not been associated with the same page, family, and dwelling number. This has already been taken care of within each of the districts, but not between them. Oftentimes Augusta County residents own buildings in two or more districts. In an effort to find individuals who appear in more than one district, the data compiler created a "final.xls" file. This file includes: last name; first name; district; page, family, and dwelling number in the population census; page and line in the agricultural census; and line in the slave owners census. This file was then sorted first by page number in the population census and then by family number. Scrolling down the file, the data compiler was able to find names that had been associated with the same identifiers. In order to determine which of the points was the individual's main residence (or the point to which the population census refers), the VCDH staff looked at neighboring points and their associated page, family, and dwelling number.
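    The search for shared identifiers, done above by sorting and scrolling, can also be sketched in code. The Python below is a minimal illustration assuming the rows of "final.xls" have been exported as dictionaries; the field names, the function name, and the use of page 0 to mark an unmatched point are assumptions. Deciding which point is the main residence would still require the inspection of neighboring points described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_census_keys(rows: List[dict]) -> Dict[Tuple[int, int, int], List[dict]]:
    """Group rows by (page, family, dwelling) and keep keys matched more than once."""
    groups: Dict[Tuple[int, int, int], List[dict]] = defaultdict(list)
    for row in rows:
        if row["page"] == 0:  # assumed marker for "no population census match"
            continue
        groups[(row["page"], row["family"], row["dwelling"])].append(row)
    return {key: found for key, found in sorted(groups.items()) if len(found) > 1}


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    sample = [
        {"last_name": "Smith", "district": "District A", "page": 12, "family": 88, "dwelling": 85},
        {"last_name": "Smith", "district": "District B", "page": 12, "family": 88, "dwelling": 85},
        {"last_name": "Jones", "district": "District A", "page": 14, "family": 90, "dwelling": 87},
    ]
    for key, found in shared_census_keys(sample).items():
        print(key, [row["district"] for row in found])
```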

    In a number of cases, labels on the map that included only the last name returned multiple entries when searched in the population census. In these cases, M_M was used. But, when this same name returned only one entry in the agricultural or slave owners census, the compilers of the original Excel file made the mistake of including that individual's data. There is no way of knowing whether or not these matches are accurate. Thus, the VCDH staff is using the "final.xls" file to examine each of the files, cross-referencing the searches on the web, and determining which matches are valid.