Compiled by:
Steve Thompson and Ariel Lambert, June 1999-August 1999
TABLE OF CONTENTS
OVERVIEW
OUTLINE OF PROCEDURES
OVERVIEW
Development of a digital GIS data base for Augusta County, Va. is
based upon the existence of a detailed map of the county drawn by Jedediah
Hotchkiss in 1870 and derived in large measure from existing Confederate
Army maps produced during the Civil War. In addition to showing major and
minor roads as well as rivers, streams, and smaller water courses, the
Hotchkiss map is significant in that it shows the locations of
over 2000 named structures. Although mills (flouring, saw, and paper),
churches, schools, mines, and a variety of manufacturing establishments
(black smithies, potteries, forges) are shown on the map, the vast
majority of named structures are private residences with the name
corresponding either to the property's owner or inhabitant.
Viewed alone, the Hotchkiss map is capable of providing many insights into
the physical and cultural geography of Augusta Co. during the Civil War
and the immediately subsequent period. The major goal of this project,
however, is to use the Hotchkiss map as a basis for projecting detailed
Census records (population, agricultural, manufacturing, and slave
holding) of the county for 1860 and 1870 into space. The abundant family
names provided by the map provide the key which enables us to link Census
records to inhabited space.
OUTLINE OF PROCEDURES
The original Hotchkiss map was photographed by Special Collections
in Alderman library. In its published form, the map consists of
twenty-four paper sections (referred to as "quads" by the photographer)
arranged in six rows each of which is comprised of four sections and all
of which are affixed to a single canvas backing. So that the map could be
easily folded, ca. 1/2 inch spaces were left between individual map sections.
The Special Collections photographer shot the map in twenty-one sections,
with each photograph corresponding to a 1:1 reproduction of a section/quad
of the original map. The map sections were numbered quad01 through quad24
by the photographer, beginning in the upper left-hand corner of the map and
proceeding from left to right and from top to bottom. Thus, the quads were
numbered as follows:

    01  02  03  04
    05  06  07  08
    09  10  11  12
    13  14  15  16
    17  18  19  20
    21  22  23  24

The upper left (NW), upper right (NE), and lower right (SE) quads (numbers
01, 04, and 24) were not photographed as the margins of the county map
did not extend into these sections. A map of the city of Staunton drawn
at a smaller scale occupies the lower left hand corner of the map (quads
17, 18, 21, and 22). The twenty-one photographed sections were delivered
to the VCDH as full color TIF images, each one about 17 MB in size.
The six blocks were saved both as full color, compressed TIF images
(block01-06.tif) and as black and white, uncompressed TIFs
(block01b-06b.tif). The final stage of combining the six blocks into a
single image required, for reasons of file size, that the blocks be
converted to black and white while the geo-registration and rectification
of the resultant image (see below) required an uncompressed TIF image.
The final stage entailed edge-matching and joining the six blocks into a
single black and white image file. This image file, named "augmap," was
saved in both compressed (augmap.tif) and uncompressed (augmap2.tif)
formats. The insert of Staunton wholly contained in block05 was clipped
and saved as the uncompressed "stntn.tif."
Edge-matching and joining of the quads and blocks was a tedious
and not wholly perfectible process as both the paper and the canvas
backing are very elastic and individual sections of the map appear to
have been variably stretched and distorted over the past 130 years. In
addition, the edges of the paper sections were often frayed and worn
so that perfect matches often could not be achieved.
Assigning real-world coordinate values to the individual pixels of the
augmap2.tif image, known as "geo-referencing," was carried out in Arc/Info
using the Arc commands REGISTER and RECTIFY. Within REGISTER, links were
initially made to county boundary and hydrology vector line coverages from
the U.S. Census (Tiger/Line data) and reasonable results were obtained.
Better results were achieved by establishing links between the Augusta Co.
image and georeferenced TIF files of 1:24000 scale USGS quadrangle maps
(Digital Raster Graphics, or DRGs), as numerous stable points such as
churches, road intersections, etc. could be located on both target
(Hotchkiss) and source (USGS) maps.
Perfect geo-referencing of the Hotchkiss image was not possible due to
various factors. First, as mentioned, the original paper map appears to
have been stretched and distorted significantly. Second, distortions were
undoubtedly compounded both during photography and subsequent editing,
edge-matching, and joining of map sections and blocks. Third, the
cartographic precision of the original Hotchkiss map appears to be less
than that of modern maps of the county. This is particularly notable along
the northwestern and southeastern borders of the county, both of which lie
in mountainous terrain. The most significant departures in the actual
contours of the county's boundary between the Hotchkiss map and modern
maps occur at the southwestern and southeastern corners of the county.
Approximately twenty links were established between the DRG source images
and the Hotchkiss image and included a number of points along the county's
boundary and throughout the internal area of the county. Links were
added and deleted until the RMS error of all links was less than 500
meters. Lower average RMS errors could not be achieved despite much
experimentation. In the main, then, points on the geo-referenced
Hotchkiss image (as indicated by their x,y coordinates) lie no more than
500 meters from their "actual" locations and oftentimes are significantly
closer.
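
The RMS figure quoted above can be illustrated with a short calculation.
The following Python sketch is not what REGISTER does internally; it only
shows how a root-mean-square error is obtained from a set of link
residuals (the distance, in meters, between where a link falls on the
transformed image and where it falls on the DRG). The function name and
the sample residuals are assumptions made for illustration and are not
measured project values.

```python
import math

def rms_error(residuals_m):
    """Return the root-mean-square of a list of link residuals (meters)."""
    if not residuals_m:
        raise ValueError("at least one link residual is required")
    return math.sqrt(sum(r * r for r in residuals_m) / len(residuals_m))

# Hypothetical residuals for four links (invented values).
sample = [0.0, 0.0, 0.0, 900.0]
print(rms_error(sample))  # below 500 even though one link is 900 m off
```

As the invented sample suggests, an overall RMS below 500 meters does not
by itself guarantee that every individual link lies within 500 meters,
which is why the statement above is qualified with "in the main."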
The RECTIFIED Hotchkiss image file is labeled "augmap2r.tif" (with
corresponding "world file" "augmap2r.tfw"), and it is this rectified image
that was used as a background for all subsequent digitizing of vector
data.
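
For readers unfamiliar with world files, the sketch below shows how the
six numbers in a .tfw file relate pixel positions to map coordinates. It
assumes the usual six-line world file layout (coefficients in the order
A, D, B, E, C, F); the function names and the sample values are invented
for illustration and are not taken from augmap2r.tfw.

```python
def parse_world_file(text):
    """Parse the six coefficients of a world file (order: A, D, B, E, C, F)."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return tuple(values)

def pixel_to_map(coeffs, col, row):
    """Convert a pixel (column, row) to map x,y with the affine coefficients."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y

# Invented example: 10 m pixels, no rotation, arbitrary UTM-like origin.
example = "10.0\n0.0\n0.0\n-10.0\n650000.0\n4250000.0\n"
coeffs = parse_world_file(example)
print(pixel_to_map(coeffs, 100, 200))
```

The negative fourth value reflects the fact that image rows increase
downward while map y-coordinates increase northward.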
A series of digital vector coverages have been produced to date using the
rectified raster image of the 1870 Hotchkiss map and this process of
digitizing continues. All digitizing is being carried out within the
ArcEdit module of Arc/Info using "heads up" or "on screen" methods.
Essentially, features are traced from the rectified image, with the
resultant digital "coverages" being in the same real-world coordinate
system as the source image (the UTM coordinate system is being used
throughout this project). Digitized coverages of each of the three basic
types - line, polygon, and point - have been created.
To date, two county-wide line coverages have been digitized, one detailing
hydrology (stream1870) and the other roadways (roads1870). Line features
representing water courses in the hydrology coverage have all been coded
within a field named "RANK" added to the arc attribute table (aat) of the
coverage so that all streams are classified into one of three types
(major, lesser, and minor). Stream length was the criterion upon which
this classification was based (>12000 m = Rank 1/major, 6000 - 12000 m =
Rank 2/lesser, and <6000m = Rank 3/minor). A second field named "NAME"
was also added to the aat of the hydrology coverage to contain stream
names as they appear on the Hotchkiss map.
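
The length thresholds given above can be written out as a small function.
This is only a sketch of the classification rule as stated in the text;
the function name is hypothetical, and the treatment of lengths of exactly
6000 m or 12000 m (placed in Rank 2) is an assumption drawn from the
"6000 - 12000 m" wording.

```python
def stream_rank(length_m):
    """Return the RANK code for a stream of the given length in meters."""
    if length_m > 12000:
        return 1  # major
    if length_m >= 6000:
        return 2  # lesser
    return 3      # minor

# Hypothetical stream lengths, in meters.
for length in (15250.0, 8400.0, 950.0):
    print(length, stream_rank(length))
```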
Digitized roadways have also been classified according to a
tripartite scheme. Within the aat of the roads coverage a field named
"RD_TYPE" was added to contain this information. Roads classed as type 1
are considered "major roads" and are represented by double solid
lines on the Hotchkiss map. Type 2 roads are "minor" and are represented
by single solid lines on the original map. Finally, type 3 roads or
"paths" are those routes shown by single dashed lines by Hotchkiss. As
with the hydrology coverage a RD_NAME field was also added to the aat to
contain road names, though most of these features are not named on the
original map.
At least one further county-wide line coverage detailing railroads
is envisioned.
The most basic polygon coverage digitized represents the
boundaries of Augusta County. This coverage is named "bord1870".
The only locational information that accompanies the available census data
places individual households within one of three intra-county divisions.
These are named "1st subdivision", "north subdivision" and "Staunton."
Only the boundaries of the Staunton division are currently known, though
basically this internal division of the county appears to correspond
roughly to "north, south, and center." It provides a somewhat finer
spatial grain with which to view the census data, but remains crude even
if the division boundaries can be determined.
The boundaries of six electoral districts plus Staunton are portrayed on
the Hotchkiss map and these have been digitized into a single county-wide
coverage named "dist1870." A single field named "DISTRICT" was added to
this coverage's polygon attribute table (pat) to contain the name of each
district polygon. The Staunton district has not yet been added to this
coverage as this awaits georeferencing and rectification of the insert
image (stntn.tif). Eventually, the electoral district polygons will be
used to contain statistical data aggregated from the household level as
provided by the various censuses.
Polygon coverages of each of the individual electoral district boundaries
have also been created. These were extracted from the above county-wide
coverage. Thus far these coverages have mainly been used in various ways
to assist the digitizing of other data.
Experimentation has begun with the creation of lower order polygons,
currently termed "locales." Locales are areas centered on named places
such as towns or mill seats but also may be defined by the locations of
churches, schools, or other "centralizing" institutions. Polygon
coverages of locales are now being generated by transforming point
coverages using the Arc command THIESSEN (which creates a matrix of
Thiessen polygons around an array of points). Statistical data from the
various censuses will be aggregated at the locale level either for the
entire county or within individual electoral districts.
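
The idea behind Thiessen polygons can be stated simply: the polygon
around a center contains every location that is closer to that center
than to any other. Assigning a point to its nearest center therefore
gives the same grouping as asking which Thiessen polygon the point falls
in (ties and clipping to a district boundary aside). The Python sketch
below illustrates that equivalence only; it is not the THIESSEN command,
and the function name, center ids, and coordinates are hypothetical.

```python
import math

def nearest_center(point, centers):
    """Return the id of the locale center closest to `point`.

    `centers` maps a locale id to an (x, y) pair in the same coordinate
    system as `point` (UTM meters in this project).
    """
    if not centers:
        raise ValueError("at least one center is required")
    px, py = point
    return min(
        centers,
        key=lambda cid: math.hypot(centers[cid][0] - px, centers[cid][1] - py),
    )

# Hypothetical church locations used as locale centers.
churches = {
    "church_a": (670000.0, 4230000.0),
    "church_b": (680000.0, 4235000.0),
}
print(nearest_center((672500.0, 4231000.0), churches))
```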
The heart of this project entails the digitization of point
coverages from the Hotchkiss image that record the locations of all named
(and unnamed) structures and establishments that appear on the map. It
is through the establishment of links between map names and census names
that a fully spatially referenced statistical database will be
generated.
The process now being followed entails digitizing all point features on
the map and assigning each a unique identifier that can be used to
join/relate the gis point coverages to a series of data files containing
information regarding matches between points on the Hotchkiss map and
records contained in the 1860 population, agricultural, and slave holding
censuses. The initial task of matching map points to census records was
carried out by VCDH staff prior to the initiation of this gis data base.
Results of this earlier work are contained in an Excel spread sheet named
"aumap.xls." The compilers of this spread sheet worked systematically by
election district, proceeding typically from point to point along roadways,
recording named points along with general locational information (toponym
and reference within a grid that was superimposed over the map) and
indicating whether the point could be matched to a record in any of the
three censuses (pop, agric., and slave). The compilers of "aumap.xls"
clearly maintained the record order in which points were entered/matched
within this spread sheet. Consequently, it has been possible to break
"aumap.xls" up into a series of sheets based upon electoral district.
These district-wide spread sheets are named "beverle.xls," "middle.xls,"
"north.xls," "pastures.xls," "rvheads.xls," and "south.xls." Records for
points within the Staunton electoral district presumably exist, but these
are not contained within "aumap.xls." Digitizing of point features from
the Hotchkiss map, then, is proceeding by electoral district, and once all
districts have been completed these can be joined to form a single,
seamless, county-wide coverage.
Digitizing point features began in the northwestern corner of the map with
the North River District and is proceeding from west to east and north to
south. To date, complete point coverages containing all named and unnamed
points have been digitized for the North River and Middle River Districts;
digitizing of the Pastures district is in progress. To join the district
point coverages to the district data files containing information on
matches to census records, unique identifiers must be created in both the
gis and Excel files that link points with a corresponding Excel record.
IDs were first created for the North River district Excel records simply
by creating a field (MAP-ID) and filling it with a numerical series
beginning with "1." NB: the Excel files have NOT BEEN sorted and should
not be sorted prior to the addition of MAP-IDs for each
record. Once MAP-IDs have been added to a district Excel file, the entire
file is printed to be used to monitor the digitizing process.
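
The numbering step described above was done by hand in Excel. As a
sketch of the same idea outside Excel, the Python below adds a MAP-ID
column to comma delimited text while leaving the record order untouched.
It assumes the file has a header row; the function name, the "start"
parameter, and the sample column names and records are hypothetical.

```python
import csv
import io

def add_map_ids(csv_text, start=1):
    """Prepend a MAP-ID column, numbering records in their existing order.

    Returns the new CSV text and the next unused id.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)  # assumes the first line is a header row
    writer.writerow(["MAP-ID"] + header)
    next_id = start
    for row in reader:
        writer.writerow([next_id] + row)
        next_id += 1
    return out.getvalue(), next_id

# Invented sample records, in the order the compilers entered them.
sample = "NAME,TOPONYM\nA. Crawford,Mt. Solon\nT. Crawford,Mt. Solon\n"
numbered, next_free = add_map_ids(sample)
print(numbered)
print(next_free)
```

Returning the next unused id makes it straightforward to continue the
same numerical series in another file rather than restarting at 1.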
Within ArcEdit, adding point features to a coverage entails the software
automatically adding a unique "user-id" to each feature. The user-id
field is the fourth field in the coverage's point attribute table and is
generated automatically. The name of this field is always
Because of the extremely repetitive nature of point digitizing (there are
approximately 1000 points to be digitized within each of the 6 electoral
districts) this process has been automated and is now being carried out
with the use of an AML script named "points.aml." This aml, of course,
cannot automatically correct entry errors which must be corrected manually
outside of the script. Once all named points contained within the Excel
file for a given district have been digitized, points.aml can be used to
add and assign point types to any additional locations (usually unnamed)
within the district. This entails adding, typically, several hundred
additional points to those contained in the Excel file. Assignment of
user-ids to these points can take place irrespective of sequence. Once
the last point within a district has been digitized, the next number in
the id sequence can be used to define the first user/map-id to be used in
the next district to be digitized.
Once all points within a district have been successfully digitized, the
next step is to join the data records contained in the corresponding Excel
file to the coverage's point attribute table. Before this is carried out,
however, the Excel file MUST BE checked and cleaned of any erroneous or
ambiguous entries.
Occasionally, transcription or spelling errors are encountered in the
Excel file during the process of digitizing and these should be
corrected. Very occasionally, Excel file records may be encountered for
which no clear point on the map can be found (and thus digitized). In
this case, the Excel file record can be deleted (not deleting the file
record will have no consequence upon the gis data base as lacking a match
to the ids in the point attribute table, the Excel record will not be
imported - as long as the digitizer did not assign this user-id to another
point). A more serious problem arises in the case in which a single point
feature on the map may have been assigned (erroneously) two or more
records by the compilers of the Excel file containing references to Census
record matches (and thus multiple map-ids will have been assigned to such
single points). In this case one of the records (and its map-id) must be deleted
from the Excel file before it is joined to the coverage's point attribute
table.
Much of the labor and time required in cleaning the Excel files results
from the fact that the compilers of the file frequently matched multiple
point features with a single census record. That is, multiple point
features on the Hotchkiss share references to a single, unique Census
Page#/Family# (Pop. Cen.) or Page#/Line# (Ag. Cen.). This is a case
of a "many to one" match and may have happened for various reasons.
One common cause is that multiple features on the map often actually are
labeled wth identical names and, thus, appear to be owned by a single
individual. For instance, many points exist on the Hotchkiss map that are
labeled with the possessive form of an individual's name (i.e.
"A. Crawford's). Invariably, however, there will be one point in this
spatial cluster that is not labeled in the possessive.
Although the interpretation cannot be verified, our working assumption is
that points labeled with a possessive represent properties of the named
while points not labeled possessively indicate place of residence of the
named. The compilers of the Census Match files, however, ignoring those
cases labeled in the possessive, typically assigned all points with the
same name to the same individual (unique record) in the Census
records. While some, perhaps even all of the features labeled with a
possessive may be dwellings owned by the individual indicated, they need
not be. That such points represent barns, outbuildings, or other
agricultural or manufacturing installations cannot be ruled out without
more information. Even if these features are residences, however, it is
important that they be associated with the Census data related to
their OCCUPANTS RATHER THAN THEIR OWNERS.
If the task of reconstructing the routes of Census takers and of infilling
more point to record matches is profitable, it may be possible to
associate some of these points with Census households as families that
rent their residences can be detected in the Population Census as they
have zero Real Estate wealth.
Many-to-one matches also appear to have other causes. It may also be the
case, quite understandably, that multiple individuals within the county
shared the same first and last names, and thus the possibility exists that
the compilers of the files of matches to the Census records will have
matched inadvertently more than one person/dwelling to the same Census
data record. Such cases can only be resolved, if at all, by examination
of the Page/Family Numbers of nearby matches.
Cases have also been encountered (with the 1860 Census Records) in
which points on the map indicated as belonging to individuals sharing a
common family name but having different first(and middle) names/initials
(e.g. A. Crawford and T. Crawford) have been matched to the same unique
Page#/Family# or Line#. The only explanation for this situation seems to
be the time lag between the recording of the 1860 Census and the drawing
of the 1870 Hotchkiss map. That is, in 1860 A.Crawford and T.Crawford
were listed as belonging to the same family because they were sons of the
same father and lived with him under the same roof, but by 1870 the father
had died and his estate had been divided among his heirs whose names were
recorded by Hotchkiss. The difficulty here becomes that of deciding which
point (if any) of those matched to the Census record should be retained as
the most probable residence of the family's head of household in 1860.
All cases of "many to one" matches are problematic and MUST BE
rectified at this point. If not, the process of joining data tables (.pat
and Excel file) will simply join the first occurrence of an ID in the .pat
with its first occurrence in the Census Match data file - and this is
dependent simply upon the (arbitrary) order of the records in these two
data files. EACH DIGITIZED POINT ON THE MAP (EACH USER-ID) CAN BE MATCHED
TO ONLY ONE UNIQUE RECORD IN EACH OF THE CENSUSES. LIKEWISE, A UNIQUE
CENSUS RECORD CAN BE MATCHED TO ONLY ONE POINT IN SPACE. To match a
single Census record entry to more than one point on the map will result
in the replication of statistical census data in any aggregation of this
data above the level of the household. In other words, not resolving
cases of many-to-one matches will result in individuals being counted more
than once whenever statistical data is aggregated/summarized at higher
order spatial scales.
As most replications of unique census records within the Excel
matching files probably are based upon the replication of names (either
intentionally or due to a multiplicity of W. Smiths, for example) in the
map, a means of checking for them is to sort the Excel files (but
not before Map-ids have been established) alphabetically on last names.
The file can then be studied for duplicate last names and duplicate
matches to single census records. The information on matches to census
records should be removed from all records except that one deemed most
likely to represent the primary residence of the individual in question.
It is also possible, however, that census records are duplicated in the
Match files because of transcription/typographical errors (either in the
original documents or in subsequent versions). A more complete check,
then, entails sorting the files by page# and fam#/line# for
each of the three censuses and checking for additional duplicated matches.
This sorting and checking procedure should follow a sort based on Last
Name, First Name however.
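
The check for duplicated matches described above is done by sorting and
reading the spread sheet. As a supplementary sketch, the Python below
groups records by census page and family number and reports any census
record matched to more than one MAP-ID. It assumes each record has been
read into a dictionary keyed by field name; the function name and sample
records are hypothetical, and the field names follow those used
elsewhere in this document.

```python
from collections import defaultdict

def find_many_to_one(records, page_key="PopCen60Page", fam_key="PopCen60Fam"):
    """Return {(page, fam): [MAP-IDs]} for census records matched more than once."""
    groups = defaultdict(list)
    for rec in records:
        page, fam = rec.get(page_key), rec.get(fam_key)
        if page in (None, "") or fam in (None, ""):
            continue  # point has no match in this census
        groups[(page, fam)].append(rec["MAP-ID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Invented records: two points share the same Page#/Family#.
records = [
    {"MAP-ID": 1, "PopCen60Page": "12", "PopCen60Fam": "85"},
    {"MAP-ID": 2, "PopCen60Page": "12", "PopCen60Fam": "85"},
    {"MAP-ID": 3, "PopCen60Page": "13", "PopCen60Fam": "90"},
    {"MAP-ID": 4, "PopCen60Page": "", "PopCen60Fam": ""},
]
print(find_many_to_one(records))
```

Such a listing only locates the duplicates; deciding which point to
retain still requires the judgment described above.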
Because of the need to clean the Excel data files of many-to-one matches,
three fields have been added to this original data file. These fields are
named "P60NOTES," "A60NOTES," and "S60NOTES." Contained in these fields
are brief descriptions of why a given point was or was not chosen from a
series of points initially matched to a single record. In these fields,
"prop. of
As a final stage of file cleaning prior to importation of the Census Match
Excel files into Arc/Info, it is imperative that all potentially confusing
characters be removed from or replaced in the Excel files. Commas, since
they will be used as field
delimiters (see below) must be replaced with colons (:). Forward slashes
(/) should be replaced with underscores (_), and single right quotations
(') with single left quotations (`). The fields containing page, line,
and family number information that match records to census records must
contain only numerical data although characters can be contained within
the original composite fields in which this information was initially
recorded. Thus, remove all "M"s, "N"s, "/" and any other character
information from these fields (PopCen60Page, PopCen60Fam, AgCen60Page,
AgCen60Row, SlvCen60Row). Occasionally these fields may contain
references to more than one census record. When this occurs, all but one
of these references must be removed (though this information should be
retained in another field, such as the Orig or Name fields).
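
The character replacements listed above can be expressed compactly. The
Python below is a sketch of those rules only and is not part of the
established procedure; the function names are hypothetical. Note that
digits_only() assumes a field already holds a single census reference:
applied to a field with two references (e.g. "12/13") it would run the
numbers together, so multiple references must be resolved by hand first,
as described above.

```python
import re

def clean_text(value):
    """Replace characters that would confuse the comma delimited import."""
    return value.replace(",", ":").replace("/", "_").replace("'", "`")

def digits_only(value):
    """Strip every non-digit character from a page/family/line field."""
    return re.sub(r"\D", "", value)

# Invented field values.
print(clean_text("A. Crawford's, mill/forge"))
print(digits_only("M12"))
```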
Are there other problems with these files? Probably, but only
time will tell.
Once an Excel file has been checked and cleaned for any erroneous
data, the file can be prepared for importing into Arc/Info. Arc/Info can
read ascii text files with, by default, comma delimiters. Excel allows
its files to be saved in comma delimited text format (.csv) and a copy of
the corrected file should be so saved and ftp'ed to the vdhc/augusta/data
directory on ptolemy (sending the file as ascii rather than binary data
will prevent record delimiter characters (^M) from appearing at the end of
each record of the text file).
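
The ^M characters mentioned above are carriage returns left over from
the DOS/Windows line-ending convention. If a file has already been
transferred in binary mode, they can be removed after the fact. The
following Python sketch shows one way of doing so; the function name is
hypothetical and this step is not part of the procedure as practiced.

```python
def strip_carriage_returns(data):
    """Convert CRLF (and any lone CR) line endings in bytes to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Invented two-record file as it would look after a binary transfer.
raw = b"1,A. Crawford\r\n2,T. Crawford\r\n"
print(strip_carriage_returns(raw))
```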
Once on ptolemy, the .csv file should be inspected (use the UNIX command
"more
To open a vi session, from the UNIX prompt type "vi
To exit vi and save changes, type:
Before the text file can be imported into Arc/Info, an Info file must be
created to receive the data records. INFO is the data base module of
Arc/Info and can be entered by typing INFO from the Arc prompt, as shown
below. INFO IS CASE SENSITIVE AND YOU MUST SET CAPS LOCK SO THAT ONLY UPPER
CASE IS USED to issue commands within this module.
Arc: INFO
ENTER USER NAME> ARC
ENTER COMMAND >
(this is the INFO PROMPT)
The structure of the INFO data file to be created should be the following,
in which one field is created for each field to be imported with field
types and widths corresponding to the data to be imported:
The INFO command to create such a data file is DEFINE:
ENTER COMMAND >DEFINE PNTSMIDDLE.DAT
ENTER COMMAND > Q STOP
(ends INFO session and returns to Arc: prompt)
This process of DEFINING info data files to receive records for the Excel
'census match' files could be automated but such a script has not been
written.
Info files (including pat's, aat's, and .dat files) reside within an info
subdirectory within the workspace in which they were created and are not
visible with a simple "list" command from either the Arc or Unix prompts.
Listing the contents of the info subdirectory, further, will only return a
list of system-defined names of all files and not file names assigned by the
user. To view user names of INFO files within a given workspace, enter
the Arc command "dir info" or, from within INFO, the command "DIR."
Before proceeding, it is wise to create a copy of your empty INFO
file because, if the adding of data is unsuccessful, you will have to delete the
entire file and start over with the DEFINE command. Having an empty
backup is one way of saving time. To copy an INFO file, use the Arc
command COPYINFO:
Arc: copyinfo pntsmiddle.dat pntsmiddbak.dat
Comma delimited records can be imported only from within INFO using the
ADD command with the FROM option (specifying the entire path of the file
to be imported). You must first have selected the INFO file into which
data is to be imported/added:
ENTER COMMAND >SELECT PNTSMIDDLE.DAT
ENTER COMMAND >ADD FROM ~
HOME/GICLAB/SMT6Z/VDHC/AUGUSTA/DATA/PNTSMIDDLE.CSV
If all goes well, after a pause while INFO reads the data a response will
appear that INFO has added n records to the selected file. If
the value of "n" corresponds to the number of records in the text file all
is well. If it is less, then that means that certain records were skipped
over because of difficulties with specific field entries within those
records - fields may have been incorrectly defined (numeric rather than
character, not wide enough to receive records, or a host of other
problems). To find out the ids of the offending records, use the INFO
command LIST followed by the -id field to see which records in the
sequence were omitted:
ENTER COMMAND >SEL NORTH.DAT
ENTER COMMAND >LIST PNTSNORTH-ID
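
Spotting gaps in a long listed sequence by eye is error-prone. As a
sketch of the same check, the Python below reports which ids in an
expected range are absent from the ids actually loaded. It assumes the
listed ids have been copied into a Python list; the function name and
sample values are hypothetical.

```python
def missing_ids(loaded_ids, first_id, last_id):
    """Return the ids in first_id..last_id (inclusive) that were not loaded."""
    present = set(loaded_ids)
    return [i for i in range(first_id, last_id + 1) if i not in present]

# Invented example: records 3 and 6 were skipped during the ADD.
print(missing_ids([1, 2, 4, 5, 7], 1, 7))
```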
If need be, the entire list can be copied to an xedit file and printed
(lpr
If the ADD process was less than perfect, you must delete the file and try
again. To delete an INFO file use the INFO command ERASE (the file must
first be selected):
ENTER COMMAND > SEL PNTSMIDDLE.DAT
Now you can return to Arc and make a new copy from your backup (copyinfo
pntsmiddbak.dat pntsmiddle.dat) in preparation for your new try.
Once all records are successfully added, the .dat file can be
joined to the corresponding .pat of the point coverage. This ensures that
the fields containing the information on matches to census records are
physically joined in the gis data base. To join files use the Arc command
JOINITEM:
Arc: joinitem
Always name the out_info_file exactly as the in_info_file, otherwise you
will create a data file for which there is no corresponding coverage.
As yet, no exact procedure has been established for joining/relating the
resultant coverages/pats with the actual census data files. For the
population census, the two fields POPCEN60PAGE and POPCEN60FAM together
form a unique identifier for an individual family. Again, we are more
interested in dwellings than in families, so we also will want to add this
number to our .pat for each matched record.
Various spatial scales above that of the household can be created for
which statistical data derived from the Censuses can be aggregated. One
such supra-household level is the "locale" or "neighborhood," within which
groups of points/households are grouped with one another on the basis of their
shared proximity to a place/point that has been deemed to be the
center of a locale. While locales can be defined in a wide variety of
ways, any given system must contain within it a rationale for determining the
locations of locale center points. One way of systematically imposing a
matrix of locales upon geographical space, for example, would be to
establish center points at standardized intervals, say every 5 miles.
No point on the map, then, would lie more than 4.9999 miles from a center
and it is to the closest center that a given point would be associated.
Given that we already know much about the distribution of residences and
other types of functional points, it makes more sense to use existing
geographical data to define locales. Named places
are abundant on the Hotchkiss map as are churches, schools, post offices,
etc. Any of these features can be used to define the center points for
locales.
Point coverages of features that share a common PNT_TYPE code can be
quickly extracted from a digitized coverage. To define locales on the
basis of named places or other features that have not been previously
digitized or uniquely coded, a new point coverage will have to be
digitized.
To extract a point coverage of features of the same PNT_TYPE go into
Arcplot and use the RESELECT and WRITESELECT commands:
Arcplot: resel pntsmiddle point pnt_type = 2
Quit from Arcplot and use the Arc command RESELECT:
Arc: reselect pntsmiddle midchurch point midchurch.sel point
This creates a new point coverage named in this example "midchurch" that
contains all of the points in the Middle River Electoral District that
have a PNT_TYPE code of 2 (meaning church).
To convert this point coverage to a polygon coverage of locales
centered upon these churches use the THIESSEN command in Arc. However,
before issuing the THIESSEN command you must add the arcs representing the
border of the electoral district to the point coverage of churches. This
is because you want your locale polygon coverage to extend
over the entirety of the district. The map extent of the point coverage,
however, is determined by the distribution of points and will be less than
that of the entire district. Add boundary arcs in Arcedit:
Arc: thiessen midchurch midchurch2
The coverage named here
Once a polygon coverage of locales has been created these should be (if
not already) assigned unique names or some sort of identifier. The next
step is to create a field within the original point coverage into which
the locale name/id can be added. To add locale name/id to the point
attribute table, open an Arcedit session:
SELECT with the polygon option allows you to outline a shape with the mouse
and select all points contained. As closely as possible, outline a shape
corresponding to the boundaries of a locale (shown as a backcoverage).
Once points are selected in this fashion:
Arcedit: CALCULATE
Continue using the SELECT command with polygon option until all locale
boundaries have been duplicated and a locale name/id value added for all
points. Check work by selecting points for which locale name/id is void to
make sure all points were successfully captured during
the SELECT process.
With a locale field filled for all points, any data associated with points
can be summarized using the STATISTICS command in Arcplot;
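
Outside Arc/Info, the same kind of per-locale summary can be sketched in
a few lines of Python. This is an illustration of the aggregation idea
only, not a substitute for the Arcplot command; the function name, the
field names "LOCALE" and "PERSONS," and the sample values are all
hypothetical.

```python
from collections import defaultdict

def sum_by_locale(points, locale_key, value_key):
    """Total a numeric field over all points sharing a locale name/id."""
    totals = defaultdict(float)
    for point in points:
        totals[point[locale_key]] += point[value_key]
    return dict(totals)

# Invented point records with a locale id and a household size.
points = [
    {"LOCALE": "church_a", "PERSONS": 6},
    {"LOCALE": "church_a", "PERSONS": 5},
    {"LOCALE": "church_b", "PERSONS": 4},
]
print(sum_by_locale(points, "LOCALE", "PERSONS"))
```

Because each household is counted once per point record, any unresolved
many-to-one matches would inflate these totals.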
Given that Census Takers presumably followed linear routes, at
least on a daily basis, and that the order of the original census records
is based upon the order in which information was collected, it is
theoretically possible to reconstruct the routes of the Census Taker by
drawing a line that connects points according to sequential
page#/family#s. Being able to reconstruct the route of the Census Taker in
geographical space holds the potential of allowing us to make additional
matches between mapped points and census records to the extent that we
can confidently recreate the routes on the basis of existing matches.
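
The ordering idea can be sketched briefly in Python: keep only points
that have a Population Census match, sort them by page and then family
number, and read off their coordinates in that order. This is an
illustration under stated assumptions, not the procedure used in the
project: it assumes each point is a dictionary carrying POPCEN60PAGE,
POPCEN60FAM and x,y values, that unmatched points have a page value of
0, and that the function name and sample coordinates are hypothetical.

```python
def census_route(points):
    """Return (x, y) pairs of matched points in Page#/Family# order."""
    matched = [p for p in points if p["POPCEN60PAGE"] != 0]
    matched.sort(key=lambda p: (p["POPCEN60PAGE"], p["POPCEN60FAM"]))
    return [(p["x"], p["y"]) for p in matched]

# Invented points; the third has no census match.
sample_points = [
    {"POPCEN60PAGE": 12, "POPCEN60FAM": 86, "x": 671200.0, "y": 4231500.0},
    {"POPCEN60PAGE": 12, "POPCEN60FAM": 85, "x": 670900.0, "y": 4231100.0},
    {"POPCEN60PAGE": 0, "POPCEN60FAM": 0, "x": 675000.0, "y": 4229000.0},
    {"POPCEN60PAGE": 11, "POPCEN60FAM": 80, "x": 670100.0, "y": 4230400.0},
]
print(census_route(sample_points))
```

A line drawn through the resulting coordinates in order is one candidate
reconstruction of the route; large jumps between consecutive vertices may
flag doubtful matches.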
There are at least two ways of drawing a recreated Census Taker route.
Both methods require that the information matching named points to
specific Census records be physically joined to the relevant point
attribute table.
The first method is completely manual and entails displaying points (in
Arcedit or Arcplot) along with corresponding Page# and Family# for all
matches. The user then must visually inspect the map and attempt to draw
a line connecting all matched points in sequence. While all matched
points on a single page or within a range of pages can be reselected to
minimize the "clutter" at any time, this is still a cumbersome and
time-consuming method.
The second method of recreating Census Taker routes relies upon Arc/Info
commands and functions. The first step is to sort the point attribute
table of a point coverage by Census Page# and Family#. This is done in
INFO. NOTE THAT SORTING FEATURE ATTRIBUTE TABLES WILL COMPLETELY DESTROY
THE RELATIONSHIP BETWEEN X,Y COORDINATE PAIRS AND ATTACHED ATTRIBUTE
INFORMATION UNLESS THE TABLE IS RESORTED TO ITS ORIGINAL ORDER. FEATURE
ATTRIBUTE TABLES ARE ORGANIZED ACCORDING TO THE FIELD
Arc: info
ENTER COMMAND >SEL PNTSMIDDLE.PAT
ENTER COMMAND >RESEL POPCEN60PAGE NE 0
ENTER COMMAND >SORT ON POPCEN60PAGE,POPCEN60FAM
ENTER COMMAND >Q STOP
Now use the UNGENERATE command in Arc to create an output text file
containing an x,y coordinate pair for each record in the resorted point
attribute table for which a match to Population Census exists.
Arc: ungenerate point pntsmiddle midpnts.gen
This file contains three columns of data. The first is a record number,
the second the x-coordinate of the point, the third the point's
y-coordinate. Open the file in Excel and delete the first column (record
number) and save the file as comma delimited text. Open the file in XEDIT
and add a beginning record number (any number will do) to the first
line of the file. The final two lines of the file should both contain
"END". Thus,
999 [record number]
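The Excel and XEDIT edits described above could also be scripted. The sketch below is one possible way to do it, assuming the UNGENERATE output holds one "record-number x y" row per point (separated by spaces or commas) and closes with END; the function name points_to_line_input and the default id of 999 are illustrative only.

```python
def points_to_line_input(ungenerate_text, line_id=999):
    """Turn point rows (record number, x, y) into GENERATE line input.

    The record-number column is dropped, a single line id is written
    first, and the file is closed with two END lines.
    """
    lines = [str(line_id)]
    for raw in ungenerate_text.splitlines():
        parts = raw.replace(",", " ").split()
        # Skip blank lines, the closing END, and anything that is not a point row.
        if len(parts) < 3 or parts[0].upper() == "END":
            continue
        _, x, y = parts[:3]
        lines.append(f"{x},{y}")
    lines += ["END", "END"]
    # The trailing newline supplies the carriage return after the last END.
    return "\n".join(lines) + "\n"
```

The returned text can be written to the file that GENERATE reads as input.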
Now use the GENERATE command in Arc to create a line coverage that
connects the points in the text file, reading each point as a subsequent
node in the line. Since the .pat was sorted on page#/fam#, the resultant
line will connect all matched points according to the sequence of
collection. Of course, the path of the line will be affected by the
number of unmatched families between any two matched points.
Arc: generate midline
generate: input midpnts.csv
generate: quit
Arc: build midline line [To create line topology]
You now have a line coverage named
Arc: INFO
ENTER COMMAND >SEL PNTSMIDDLE.PAT
ENTER COMMAND >SORT ON PNTSMIDDLE#
ENTER COMMAND >Q STOP
Now, the line coverage created can be displayed in Arcplot or Arcedit and
compared to Census records to see if additional matches are possible.
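The ordering logic behind the second method can be illustrated without touching the attribute table at all. The sketch below assumes the matched points have been exported as dictionaries carrying the POPCEN60PAGE and POPCEN60FAM items plus x and y keys (the x/y key names and the function name census_route are assumptions). It works on a sorted copy, so the original record order, and with it the link between coordinates and attributes, is left alone.

```python
def census_route(records):
    """Return route vertices ordered by census page and family number.

    Records with POPCEN60PAGE equal to 0 are treated as unmatched and
    dropped, mirroring RESEL POPCEN60PAGE NE 0.
    """
    matched = [r for r in records if r["POPCEN60PAGE"] != 0]
    # sorted() builds a new list; the input list keeps its original order.
    ordered = sorted(matched, key=lambda r: (r["POPCEN60PAGE"], r["POPCEN60FAM"]))
    return [(r["x"], r["y"]) for r in ordered]
```

The vertex list corresponds to the sequence of x,y pairs that GENERATE would read as a single line.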
Point coverages have been created in a series of coverages based on
Electoral Districts. To join these into a single county-wide coverage
use the APPEND command in Arc (MAPJOIN will not work for point coverages).
Arc: append
Before using APPEND it is important to check that all items, their
definitions (type and width), and the item order in the files to be
appended are identical to one another. The command will not work
otherwise. Therefore, before issuing APPEND, use the LIST command to
verify that the .pats of all coverages to be appended are defined the
same.
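The comparison that LIST supports by eye can be sketched as a small check. This assumes each table's item definitions have been transcribed as an ordered list of (name, width, type) tuples; the function name schema_differences and the sample definitions in the usage note are illustrative, not taken from a real LIST session.

```python
def schema_differences(schema_a, schema_b):
    """Compare two ordered item-definition lists position by position.

    Each schema is a list of (item_name, width, type) tuples. Returns a
    list of (position, item_in_a, item_in_b) for every mismatch; None
    stands for an item missing from the shorter list. An empty list
    means the definitions and their order are identical.
    """
    diffs = []
    longest = max(len(schema_a), len(schema_b))
    for pos in range(longest):
        item_a = schema_a[pos] if pos < len(schema_a) else None
        item_b = schema_b[pos] if pos < len(schema_b) else None
        if item_a != item_b:
            diffs.append((pos + 1, item_a, item_b))
    return diffs
```

For example, comparing [("DISTRICT", 25, "C"), ("POPCEN60DWELL", 4, "C")] with [("DISTRICT", 25, "C"), ("POPCEN60DWELL", 4, "I")] reports a mismatch at position 2, the kind of type difference that would need to be resolved before appending.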
The NONE option must be used to ensure that user-ids are not modified
during APPEND. (NONE is the default option). Note, however, that if the
point attribute tables of the appended coverages contain any duplicate
user-ids the output coverage will maintain the duplicate id values.
Arc: append newcov point none
Appending coverages...
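Because duplicate user-ids survive APPEND, it may be worth listing them beforehand. The sketch below assumes the user-id column of each coverage has been exported as a plain list of numbers; the function name duplicate_ids is hypothetical.

```python
from collections import Counter


def duplicate_ids(*id_lists):
    """Return the sorted user-ids that occur more than once across all lists."""
    counts = Counter(i for ids in id_lists for i in ids)
    return sorted(i for i, n in counts.items() if n > 1)
```

An empty result indicates that the appended coverage will carry unique user-ids.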
Associating point features with census data records is the last
major step to be accomplished by the project. Because of time
constraints, this has not yet been done. Several necessary steps are
envisioned, however, and these are outlined here.
Compiling data for Staunton and then digitizing the city's points
involved a process similar to the one explained in Checking and Cleaning Census Match Excel Files, which
was used to complete the digitization of each of the county's electoral
districts. However, the VCDH staff was forced to manipulate this
process in order to accommodate Staunton's unique circumstances. Before
reading the following explanation of these changes, be sure to study
the procedures that were used to compile data for and then digitize the
rest of the county.
Staunton, located in the Beverley District, exists on the Hotchkiss map
in two forms: on the "augmap2r.tif" image and in more detail as an
insert which was clipped and saved as "stntn.tif." To include points
within the insert, "stntn.tif" was georeferenced and rectified,
resulting in "stntnbr.tif." Inevitably, the two images did not fit
perfectly. That is, when "stntnbr.tif" was drawn over the larger map,
the various line coverages (streams, roads, and railroads) did not
follow perfectly the features on the insert. To remedy this, the
digitizer edited the coverages.
Compiling an Excel file for Staunton was difficult and time-consuming.
Prior to the initiation of the data base, VCDH staff produced an Excel
spread sheet for Staunton by cross-referencing census and tax record
information. The file, called "Stcynew.xls," included the following
information for all of the city's tax payers: last name, first name,
other, population census information, agricultural census information,
slaveowner census information, acres, rods, poles, residence, estate,
lot number, building value, lot and building value, tax amount, city
tax amount, and notes. While the "aumap.xls" file maintained the
record order in which points were entered, the "Stcynew.xls" file did
not. It was therefore impossible to locate names on the map simply by
following the list of names in the file. The "Stcynew" file also
included a number of names that did not exist on the map and could
therefore not be included in the final Excel file.
A second source of information for Staunton was the Staunton
Fire Insurance Depositions. Compiled between 1850 and 1860 by "The
Mutual Assurance Society Against Fire on Buildings of the State of
Virginia," the depositions include the following information: policy
number, policy holder's name, location of building, bordering homes or
businesses, occupant's name, building value, total value of the policy,
and a description of the one or more buildings included in the policy.
Company agents also drew sketches of individual insured buildings.
Thus, each of the policies is linked on the Valley site to a
preliminary drawing of the buildings on that block. These sketches and
their associated policy information allowed the project's staff to
associate names (and various information) with points that were not
labeled on the Hotchkiss map.
The first step in compiling Staunton's data was to scan the
"stntnbr.tif" image for labels and produce a list of these names. These
names were then cross-referenced with the "Stcynew.xls" file in order to
determine whether or not tax record information was available. The tax
records use a coding system for the locations of buildings which
includes: N for "New Town," O for "Old Town," B for "Beverley
Addition," S for "Staunton," and OL for "Outlying." This coding system
also includes numbers that refer to tax grid blocks. An image of
Staunton's tax grid can be accessed through the Valley's insurance
deposition index. The image is called "taxgrid.tif." The coding system
used in the "Stcynew.xls" file made it possible to determine which tax
record was associated with which building. Unfortunately, the tax grid
does not include areas classified as "Staunton" or "Outlying." Thus, it
was impossible to associate a tax record with a name that appears more
than once on the map in either of these areas.
Cross-referencing labeled points with the tax record information
produced a list of 226 points. Some of these names were successfully
associated with tax record data, while others were not. Points were
often clustered near a label. In these cases, it was assumed that all
of the points within a lot (which is contained within a
polygon on the Hotchkiss map) belonged to the person whose label appeared
in that lot. Each of these points was then linked to its appropriate census record
information (population, agricultural, and/or slaveowner), if possible.
Because only one point can be associated with each census record (see
Checking and Cleaning Census Match Excel Files), one
of the points within a lot was matched to its occupant's census record
while the rest of the points were classified as the property of that
person.
The city of Staunton raised a number of issues for the VCDH staff
concerning occupancy versus ownership. The staff used the following
reasoning in determining how to be consistent and accurate with regard
to data compilation:
After cross-referencing and inputting tax record and census data, the
spread sheet compiler accessed the Staunton
Fire Insurance Depositions on the web and added appropriate
information to the list of 226 labeled points. Information relating to
the insurance depositions includes: policy number, building, location,
bordering properties, building value, total policy value, building
type, year, and description.
At this point, an Excel file existed which included 226 points as well
as their census, tax record, and/or insurance deposition information.
In order to digitize these points, however, it was necessary to create
a field for the "Map ID" and fill this field with a series of unique
numbers. The first "Map ID" for the "pntsstan" coverage was 2037,
because the last point in the "pntsbev" coverage was 2036. Although
the Excel file for Staunton was not complete at this point, the VCDH
staff went ahead and began the digitizing process in Arcedit. The
"pntsstan" coverage was created and the "pntsstan-id" field was added
to the "pntsstan.pat." The digitizer then added the 226 points to the
coverage, making sure that each point was given the appropriate
"pntsstan-id."
The next phase of the Staunton project was to associate unlabeled
points on the map with information from the census records, tax records
and insurance depositions. The tax grid and the preliminary drawings of
blocks available on the insurance deposition page allowed the VCDH staff
to relate names and other information with points that Hotchkiss failed
to label. The process for this phase of the project was reversed.
Instead of compiling the data and then digitizing the points, the VCDH
staff began by adding the points to the "pntsstan" coverage in Arcedit
using the next number in the sequence of "pntsstan-id" values. At the
same time that the digitizer was adding the points to the "pntsstan"
coverage, she recorded each "pntsstan-id" on the hard copy of the GIF
images next to the appropriate building. After more than 50 unlabeled
points were digitized, these points were added to the Excel file with
their respective data. The "Map ID" for each point corresponded with the
"pntsstan-id" that was recorded on the GIF images during the digitization
process.
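The numbering convention used for "Map ID" and "pntsstan-id" (continue from the last id already used elsewhere) can be sketched as follows. The function name assign_map_ids and the map_id key are hypothetical; the starting value of 2036 comes from the "pntsbev" coverage as described above.

```python
def assign_map_ids(rows, last_used_id):
    """Give each row the next unused id, continuing after last_used_id.

    Returns new dictionaries; the input rows are not modified.
    """
    return [dict(row, map_id=last_used_id + n) for n, row in enumerate(rows, start=1)]
```

Applied to the 226 labeled points with last_used_id set to 2036, the first point receives 2037, and any points digitized later continue from the highest id assigned so far.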
After inputting this data into the Excel file, it was necessary to return
to Arcedit and "CALC" point types for each of the digitized points. To
determine the point type for points that had been associated with
insurance depositions, the digitizer referred to the "MASbuildtype" in the
Excel file. Many of these points were classified as both a dwelling and
a business. Thus, a "pnt_type" for "Residence and Business" (or 46) was
added to the "points.aml" list. When necessary, other point types were
added to this same list. Throughout the rest of the county, points
labeled with a first and/or last name are classified as "residences." The
same rule has been used in Staunton.
Next, the digitizer returned to Arcedit. At this point, two types of
points remained undigitized. First, there were numerous unlabeled points
that had not been associated with data and therefore did not require a
place in the Excel file. These points were digitized and their point
types were classified as "unknown" (or 99). Throughout the rest of the
county, points like these (unlabeled and unassociated with data) were
given the classification of "residence" (or 1) due to the high likelihood
that these points were indeed residences. In the Staunton area, where
businesses existed in greater numbers, this assumption could not be made.
Finally, the last points that required digitization were the churches,
mills, factories, cemeteries, etc. That is, points that were labeled,
but not by a first and/or last name. These points were coded according to
their "l_name" (or label) and "pnt_type," using the "CALC" command.
After all of these points were digitized, the "pntsstan" coverage
included 1014 records.
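The default classification rules described in the preceding paragraphs can be summarized in a short sketch. Only the three codes given in the text (1, 46 and 99) are used; the function name default_point_type and its arguments are hypothetical, and cases the text leaves to manual judgment return None rather than a guessed code.

```python
RESIDENCE = 1
RESIDENCE_AND_BUSINESS = 46  # assigned by hand from "MASbuildtype", never by default
UNKNOWN = 99


def default_point_type(label_kind, has_data, in_staunton):
    """Suggest a pnt_type code, or None when manual coding is needed.

    label_kind: "person" for a first and/or last name, "feature" for a
    church, mill, factory, cemetery, etc., or None for an unlabeled point.
    has_data: True if census, tax or insurance data is associated.
    in_staunton: True for points in the city of Staunton.
    """
    if label_kind == "person":
        return RESIDENCE
    if label_kind is None and not has_data:
        # Unlabeled points with no data: "unknown" in Staunton, residence elsewhere.
        return UNKNOWN if in_staunton else RESIDENCE
    # Labeled features and unlabeled points with data are coded by hand.
    return None
```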
The final phase of the Staunton project is to join the information from
the "staunton.xls" file to the Staunton coverage (see Importing Data
Files and Joining with Arc/Info Point Attribute Tables (.pat)) and merge
that coverage with the county-wide coverage (see Merging District-Wide
Coverages Into County-Wide Coverages).
As of late August, all of the county's electoral districts (and the city
of Staunton) have been digitized. With the exception of the city of
Staunton, each of the coverages has been joined to its respective data
file. Staunton is unique in that its data file includes census, tax
record, and fire insurance policy information. The VCDH staff is
currently determining whether or not to include this information within
the coverage itself or link it to the coverage externally.
Currently, the VCDH staff is returning to each of the Excel files and
taking the necessary steps to ensure that all of the information is
accurate. One step in preparing the final data file is to make sure
that more than one point has not been associated with the same page,
family, and dwelling number. This has already been taken care of within
each of the districts, but not between them. Oftentimes Augusta County
residents own buildings in two or more districts. In an effort to find
individuals who appear in more than one district, the data compiler
created a "final.xls" file. This file includes: last name; first
name; district; page, family, and dwelling number in the population
census; page and line in the agricultural census; and line in the slave
owners census. This file was then sorted first by page number in the
population census and then by family number. Scrolling down the file, the
data compiler was able to find names that had been associated with the
same identifiers. In order to determine which of the points was the
individual's main residence (or the point to which the population census
refers), the VCDH staff looked at neighboring points and their associated
page, family, and dwelling number.
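The scan for repeated identifiers can also be done by grouping rather than by scrolling through the sorted file. The sketch below assumes the rows of "final.xls" have been read into dictionaries with page, family and dwelling keys (hypothetical key names), and that a page value of 0 or None means no population census match.

```python
from collections import defaultdict


def shared_census_ids(rows):
    """Group rows by (page, family, dwelling) and keep groups with 2+ rows.

    Rows without a population census match (page 0 or None) are ignored.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["page"] in (0, None):
            continue
        groups[(row["page"], row["family"], row["dwelling"])].append(row)
    return {key: grp for key, grp in sorted(groups.items()) if len(grp) > 1}
```

Each group returned is a set of points, possibly in different districts, that still has to be narrowed to a single main residence by looking at neighboring points.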
In a number of cases, labels on the map that included only the
last name matched multiple entries in the population census. In these
cases, M_M was used. But when this same name returned only one match in
the agricultural or slave owners census, the compilers of the original
Excel file made the mistake of including that individual's data. There is
no way of knowing whether or not these matches are accurate. Thus, the
VCDH staff is using the "final.xls" file to examine each of the files,
cross-referencing the searches on the web, and determining which matches
are valid.
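Rows of this kind can be flagged for review automatically. The sketch below makes several assumptions that the text does not confirm: that the M_M code is stored in the POPCEN60 item, and that a value of 0 in AGCEN60PAGE or SLVCEN60LINE means no match in that census. The function name needs_review is hypothetical.

```python
def needs_review(row):
    """True if an ambiguous population match (M_M) carries ag or slave census data."""
    ambiguous = row["POPCEN60"] == "M_M"
    has_other_match = row["AGCEN60PAGE"] != 0 or row["SLVCEN60LINE"] != 0
    return ambiguous and has_other_match
```

Flagged rows are only candidates; deciding whether a match is valid still requires the cross-referencing described above.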
:%s/
[the general search and replace command structure is :%s/target value/replace value/g]
INFO EXCHANGE CALL
03/06/1999 15:02:37
INFO 9.42 11/11/86 52.74.63*
Copyright (C) 1994 Doric Computer Systems International Ltd.
All rights reserved.
Proprietary to Doric Computer Systems International Ltd.
US Govt Agencies see usage restrictions in Help files (Help
Restrictions)
(enter ARC as user name at this prompt)
ITEM NAME        WIDTH  OUTPUT  TYPE  N.DEC
PNTSNORTH-ID         4       5  B     -
L-NAME              40      40  C     -
F-NAME              40      40  C     -
DISTRICT            25      25  C     -
LOC-NOTES           30      30  C     -
COUNTY              12      12  C     -
MAP#                 2       2  I     -
GRID#                2       2  C     -
ORIG1860POP         50      50  C     -
POPCEN60             9       9  C     -
POP60_NOTES         50      50  C     -
POPCEN60PAGE         4       4  I     -
POPCEN60FAM          4       4  I     -
POPCEN60DWELL        4       4  C     -
POPCEN60NAME        50      50  C     -
ORIG1860AG          50      50  C     -
AGCEN60              9       9  C     -
AG60_NOTES          50      50  C     -
AGCEN60PAGE          4       4  I     -
AGCEN60LINE          4       4  I     -
AGCEN60NAME         50      50  C     -
ORIG1860SLV         50      50  C     -
SLV60_NOTES         50      50  C     -
SLVCEN60LINE         4       4  I     -
SLVCEN60NAME        50      50  C     -
ITEM NAME,WIDTH [,OUTPUT WIDTH] ,TYPE [,DECIMAL PLACES] [,PROT.LEVEL]
1
ITEM NAME>PNTSMIDDLE-ID,4,5,B
5
ITEM NAME>L_NAME,40,40,C
45
ITEM NAME>F_NAME,40,40,C
85
ITEM NAME>DISTRICT,25,25,C
110
ITEM NAME>LOC_NOTES,30,30,C
140
ITEM NAME>COUNTY,12,12,C
152
ITEM NAME>MAP#,2,2,I
154
ITEM NAME>GRID#,2,2,C
156
ITEM NAME>ORIG1860POP,50,50,C
206
ITEM NAME>POPCEN60,9,9,C
215
ITEM NAME>POP60_NOTES,50,50,C
265
ITEM NAME>POPCEN60PAGE,4,4,I
269
ITEM NAME>POPCEN60FAM,4,4,I
273
ITEM NAME>POPCEN60DWELL,4,4,I
277
ITEM NAME>POPCEN60NAME,50,50,C
327
ITEM NAME>ORIG1860AG,50,50,C
377
ITEM NAME>AGCEN60,9,9,C
386
ITEM NAME>AG60_NOTES,50,50,C
436
ITEM NAME>AGCEN60PAGE,4,4,I
440
ITEM NAME>AGCEN60LINE,4,4,I
444
ITEM NAME>AGCEN60NAME,50,50,C
448
ITEM NAME>ORIG1860SLV,50,50,C
498
ITEM NAME>SLV60_NOTES,50,50,C
548
ITEM NAME>SLVCEN60LINE,4,4,I
552
ITEM NAME>SLVCEN60NAME,50,50,C
602
ITEM NAME> return
ODD RECORD LENGTH ROUNDED UP TO EVEN
0 RECORD(S) SELECTED
350 RECORD(S) SELECTED
$RECNO PNTSNORTH-ID
MORE?
ENTER COMMAND > ERASE PNTSMIDDLE.DAT
THIS COMMAND WILL ERASE THE SPECIFIED DF
DO YOU WISH TO CONTINUE ( Y OR N ) > Y
Usage: JOINITEM
{start_item} {LINEAR | ORDERED | LINK}
Arc: joinitem pntsmiddle.pat pntsmiddle.dat pntsmiddle.pat
pntsmiddle-id
(Usage: RESELECT
Arcplot: writeselect midchurch.sel pntsmiddle point
(Usage: WRITESELECT
Arcplot: q
(Usage: RESELECT
Arcedit: edit midchurch
Arcedit: de point arc
Arcedit: draw
Arcedit: editfeature arc
Arcedit: get distmidd
(Usage: GET
This command copies in the district boundary arcs.
Arcedit: save
Saving the coverage automatically extends its map extent
Arcedit: quit
(Usage: THIESSEN
Arcedit: edit pntsmiddle
Arcedit: de points
Arcedit: backcover midchurch2 3
Arcedit: backenvironment arc
Arcedit: arcplot textjustification center center
Arcedit: arcplot textquality tightkern
Arcedit: arcplot polygontext midchurch2 name
[writes locale names on screen]
Arcedit: draw
Arcedit: ef point
Arcedit: select polygon
(Usage: SELECT
Arcplot: reselect pntsmiddle point locale = 'XXX'
Arcplot: statistics pntsmiddle point
Statistics: sum total_pop
Statistics: sum total_slaves
Statistics: end
Arcplot:
INFO EXCHANGE CALL
07/06/1999 17:20:05
INFO 9.42 11/11/86 52.74.63*
Copyright (C) 1994 Doric Computer Systems International Ltd.
All rights reserved.
Proprietary to Doric Computer Systems International Ltd.
US Govt Agencies see usage restrictions in Help files (Help
Restrictions)
ENTER USER NAME>arc
562 RECORD(S) SELECTED
146 RECORD(S) SELECTED
(Usage: UNGENERATE
x,y [records of x,y coordinates of all points]
x,y
x,y
x,y
....
END
END [be sure to enter a carriage return at the end
of this line]
(Usage: GENERATE
generate: line
INFO EXCHANGE CALL
07/06/1999 17:39:35
INFO 9.42 11/11/86 52.74.63*
Copyright (C) 1994 Doric Computer Systems International Ltd.
All rights reserved.
Proprietary to Doric Computer Systems International Ltd.
US Govt Agencies see usage restrictions in Help files (Help
Restrictions)
ENTER USER NAME>ARC
Usage: APPEND
Enter Coverages to be APPENDed (Type END or a blank line when done):
=====================================================================
Enter the 1st coverage: pntspstr
Enter the 2nd coverage: pntsmiddle
Enter the 3rd coverage: end
Arc: