scale | length | points |
---|---|---|
tiger 2010 | 1.877432 | 2152 |
tiger 2020 | 1.877094 | 2097 |
500k 2010 | 1.852025 | 201 |
500k 2020 | 1.854092 | 236 |
5m 2010 | 1.767873 | 50 |
20m 2010 | 1.623553 | 27 |
Geographic gazetteer
Describe, retrieve and prepare dataframes with geographic boundaries of various geographic units of the USA.
Source files
Source data files are downloaded from the web and cached locally.
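The download-and-cache pattern can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the module's actual implementation; the function name `cached_download` and its `fetch` parameter are hypothetical.

```python
import urllib.request
from pathlib import Path


def cached_download(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Return a local path for `url`, downloading only if it is not cached yet.

    `fetch` is any callable taking (url, destination); it defaults to
    urllib's urlretrieve and can be swapped out, e.g. for testing.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Use the last URL component as the local file name.
    local = cache_dir / url.rstrip("/").split("/")[-1]
    if not local.exists():
        fetch(url, local)
    return local
```

A second call with the same URL and cache folder returns the existing file without touching the network.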
Census Bureau:
- TIGER Data Products Guide: Which Product Should I Use?
- Cartographic Boundary Files. 2018 and before, 2019 and after. Simplified representations of selected geographic areas from the Census Bureau’s MAF/TIGER geographic database. Small scale (limited detail) spatial files clipped to shoreline.
- TIGER/Line shapefiles. Most comprehensive geographic dataset in full detail.
- Relationship files. These text files describe geographic relationships. There are two types of relationship files: those that show the relationship between the same type of geography over time (comparability) and those that show the relationship between two types of geography for the same time period.
- LSAD codes. Legal/Statistical Area Description Codes and Definitions.
- FIPS codes
- Gazetteer reference files
- Character encoding. Files from 2014 and earlier use “ISO-8859-1”, 2015 and after use “UTF-8”.
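The character encoding rule lends itself to a small helper. The sketch below only restates the rule given above; the function name `shapefile_encoding` is hypothetical and not part of the module.

```python
def shapefile_encoding(year):
    """Pick the text encoding for a Census Bureau file of the given year.

    Files from 2014 and earlier use ISO-8859-1, later files use UTF-8.
    """
    return "ISO-8859-1" if year <= 2014 else "UTF-8"


for year in (2010, 2014, 2015, 2020):
    print(year, shapefile_encoding(year))
```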
Shapefile format
Census Bureau shapefiles come as zipped folders and can be read directly with geopandas.
XML metadata
Most zipped shapefile folders contain XML documents with metadata. Helper functions here parse these files for inspection. In later years, files ending with .iso.xml adhere to ISO standards and can be parsed more easily for feature descriptions.
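As an illustration of inspecting the metadata, the sketch below lists the XML members of a zipped folder and reports the root element of each. It is a standard-library stand-in, not the module's own helper functions; the name `list_xml_metadata` and the tiny in-memory archive are made up for the example, and real metadata documents are far richer.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_xml_metadata(zip_source):
    """Map each XML member of a zip archive to the tag of its root element."""
    roots = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".xml"):
                with zf.open(name) as f:
                    roots[name] = ET.parse(f).getroot().tag
    return roots


# Build a toy archive in memory (hypothetical contents, for illustration only).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("demo.shp.iso.xml", "<metadata><title>demo</title></metadata>")
    zf.writestr("demo.shp", b"not xml")
buf.seek(0)
print(list_xml_metadata(buf))
```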
Scale
Shapefiles are available at different scales. TIGER is the most precise, followed by 1:500,000 and 1:5,000,000; 1:20,000,000 is the lowest resolution.
Shapefiles are revised from year to year. Between-year differences are clearly visible at all scales except TIGER.
The table compares boundaries of Tolland County, Connecticut, taken from shapefiles of different years and scales. The “length” column is the boundary length in shape units (degrees), and “points” is the total number of points in the polygon.
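To make the two measures concrete, here is a minimal sketch that computes them for a single closed ring of coordinates. It is plain Python rather than the geometry library that presumably produced the table, and the function name `ring_length_and_points` is hypothetical. Length is the planar distance in coordinate units, which is degrees for unprojected shapefiles.

```python
import math


def ring_length_and_points(ring):
    """Return (boundary length, number of points) for a closed ring of (x, y) pairs.

    The ring is expected to repeat its first point at the end.
    """
    length = sum(math.dist(p, q) for p, q in zip(ring, ring[1:]))
    return length, len(ring)


# A unit square as a toy polygon.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(ring_length_and_points(square))  # (4.0, 5)
```

A coarser scale drops vertices, so both numbers tend to fall as the scale gets smaller, which matches the pattern in the table.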
Map below visualizes boundary differences.
States
CODE | ABBR | NAME | ALAND | AWATER |
---|---|---|---|---|
01 | AL | Alabama | 131174048583 | 4593327154 |
02 | AK | Alaska | 1478839695958 | 245481577452 |
04 | AZ | Arizona | 294198551143 | 1027337603 |
05 | AR | Arkansas | 134768872727 | 2962859592 |
06 | CA | California | 403503931312 | 20463871877 |
08 | CO | Colorado | 268422891711 | 1181621593 |
09 | CT | Connecticut | 12542497068 | 1815617571 |
10 | DE | Delaware | 5045925646 | 1399985648 |
11 | DC | District of Columbia | 158340391 | 18687198 |
12 | FL | Florida | 138949136250 | 31361101223 |
13 | GA | Georgia | 149482048342 | 4422936154 |
15 | HI | Hawaii | 16633990195 | 11777809026 |
16 | ID | Idaho | 214049787659 | 2391722557 |
17 | IL | Illinois | 143780567633 | 6214824948 |
18 | IN | Indiana | 92789302676 | 1538002829 |
19 | IA | Iowa | 144661267977 | 1084180812 |
20 | KS | Kansas | 211755344060 | 1344141205 |
21 | KY | Kentucky | 102279490672 | 2375337755 |
22 | LA | Louisiana | 111897594374 | 23753621895 |
23 | ME | Maine | 79887426037 | 11746549764 |
24 | MD | Maryland | 25151100280 | 6979966958 |
25 | MA | Massachusetts | 20205125364 | 7129925486 |
26 | MI | Michigan | 146600952990 | 103885855702 |
27 | MN | Minnesota | 206228939448 | 18945217189 |
28 | MS | Mississippi | 121533519481 | 3926919758 |
29 | MO | Missouri | 178050802184 | 2489425460 |
30 | MT | Montana | 376962738765 | 3869208832 |
31 | NE | Nebraska | 198956658395 | 1371829134 |
32 | NV | Nevada | 284329506470 | 2047206072 |
33 | NH | New Hampshire | 23189413166 | 1026675248 |
34 | NJ | New Jersey | 19047825980 | 3544860246 |
35 | NM | New Mexico | 314196306401 | 728776523 |
36 | NY | New York | 122049149763 | 19246994695 |
37 | NC | North Carolina | 125923656064 | 13466071395 |
38 | ND | North Dakota | 178707534813 | 4403267548 |
39 | OH | Ohio | 105828882568 | 10268850702 |
40 | OK | Oklahoma | 177662925723 | 3374587997 |
41 | OR | Oregon | 248606993270 | 6192386935 |
42 | PA | Pennsylvania | 115884442321 | 3394589990 |
72 | PR | Puerto Rico | 8868896030 | 4922382562 |
44 | RI | Rhode Island | 2677779902 | 1323670487 |
45 | SC | South Carolina | 77864918488 | 5075218778 |
46 | SD | South Dakota | 196346981786 | 3382720225 |
47 | TN | Tennessee | 106802728188 | 2350123465 |
48 | TX | Texas | 676653171537 | 19006305260 |
49 | UT | Utah | 212886221680 | 6998824394 |
50 | VT | Vermont | 23874175944 | 1030416650 |
51 | VA | Virginia | 102257717110 | 8528531774 |
53 | WA | Washington | 172112588220 | 12559278850 |
54 | WV | West Virginia | 62266474513 | 489028543 |
55 | WI | Wisconsin | 140290039723 | 29344951758 |
56 | WY | Wyoming | 251458544898 | 1867670745 |
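ALAND and AWATER are land and water areas, which the Census Bureau reports in square meters. The sketch below converts them to square kilometers and computes the share of water in the total area, using the Michigan row from the table. The function names are hypothetical.

```python
SQ_M_PER_SQ_KM = 1_000_000


def to_sq_km(area_sq_m):
    """Convert an area from square meters to square kilometers."""
    return area_sq_m / SQ_M_PER_SQ_KM


def water_share(aland, awater):
    """Fraction of the total area (land + water) that is water."""
    return awater / (aland + awater)


# Michigan (CODE 26), values copied from the table above.
aland, awater = 146600952990, 103885855702
print(round(to_sq_km(aland)), "sq km of land")
print(round(water_share(aland, awater), 3), "share of water")
```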
Counties
Source data
Cartographic Boundary Files are available for 1990, 2000, 2010, 2013 and every year after that.
TIGER/Line Shapefiles in shapefile format are available for 2000, 2007 and every year after that. 1992 and 2006 are available in legacy format.
Changes
County changes happen whenever local authorities decide on them. Annually released boundary files reflect boundaries effective January 1 of the reference year. List of changes here.
Substantial county boundary changes are those affecting an estimated population of 200 or more; changes of at least one square mile where an estimated population number was not available, but research indicated that 200 or more people may have been affected; and annexations of unpopulated territory of at least 10 square miles.
The CRS of the 1990 and 2000 files is unknown, so the created dataframes have “naive geometries”.
Census Tracts
Code is 11 digits: 2 state, 3 county, 6 tract (4+2).
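A tract code can be split into its parts by position, assuming the standard layout of 2 state digits, 3 county digits and 6 tract digits. The sketch below is illustrative only; the function name `split_tract_geoid` and the sample code are hypothetical.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit tract code into state, county and tract parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {
        "state": geoid[:2],    # 2-digit state code
        "county": geoid[2:5],  # 3-digit county code within the state
        "tract": geoid[5:],    # 6-digit tract code: 4 base digits + 2 suffix digits
    }


# Hypothetical tract in state 09 (Connecticut in the States table).
print(split_tract_geoid("09013100001"))
# {'state': '09', 'county': '013', 'tract': '100001'}
```

Codes should be kept as strings, since leading zeros are significant.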
Changes over time
Major changes to tract codes and shapes happen after decennial censuses, with smaller changes in the years between.
The first four digits of the tract code are “permanent.” When tracts get large (more than 8,000 residents), they are split and a two-digit suffix is added (the same applies to splits of splits):
1990 | 2000 | 2010 |
---|---|---|
1000 | 1000.01 | 1000.03 |
1000 | 1000.01 | 1000.04 |
1000 | 1000.02 | 1000.05 |
1000 | 1000.02 | 1000.06 |
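The relation between the displayed tract name and the six-digit code can be sketched as follows: the part before the dot is the four-digit base, and the suffix fills the last two digits (00 when there is no suffix). The helper names are hypothetical, and grouping by base is just one simple way to trace splits back to a common ancestor.

```python
from collections import defaultdict


def tract_name_to_code(name):
    """Convert a tract name such as '1000.03' to its six-digit code '100003'."""
    base, _, suffix = name.partition(".")
    return base.zfill(4) + suffix.ljust(2, "0")


def group_by_base(names):
    """Group tract names by their 'permanent' base (the part before the dot)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.partition(".")[0]].append(name)
    return dict(groups)


print(tract_name_to_code("1000"))     # 100000
print(tract_name_to_code("1000.03"))  # 100003
print(group_by_base(["1000.03", "1000.04", "1000.05", "1000.06"]))
```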
The naming conventions for merges (population falls below 1,200) and boundary revisions are less clear-cut.
When changes (splits, merges, redefinitions) occur, the relationship of new tracts to old tracts is crosswalked.
There is a master file, as well as two files that provide the identifiers of tracts that were “substantially changed” between decennial censuses. These two files list only the census tracts that exhibited a change of 2.5 percent or greater. Tract relationships may be one-to-one, many-to-one, one-to-many, or many-to-many.
Relationship files are currently available for 2010 (relative to 2000) and 2000 (relative to 1990).
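The four relationship types can be derived from a list of (old tract, new tract) pairs. The sketch below works on plain tuples rather than on the actual relationship file layout, which is not described here; the function name and sample pairs are hypothetical.

```python
from collections import defaultdict


def relationship_types(pairs):
    """Label each (old, new) pair as one-to-one, one-to-many, many-to-one or many-to-many."""
    new_of = defaultdict(set)  # old tract -> new tracts it contributes to
    old_of = defaultdict(set)  # new tract -> old tracts it is built from
    for old, new in pairs:
        new_of[old].add(new)
        old_of[new].add(old)
    labels = {}
    for old, new in pairs:
        left = "one" if len(old_of[new]) == 1 else "many"
        right = "one" if len(new_of[old]) == 1 else "many"
        labels[(old, new)] = f"{left}-to-{right}"
    return labels


pairs = [
    ("2000", "2000"),        # unchanged tract
    ("1000.01", "1000.03"),  # split into two tracts
    ("1000.01", "1000.04"),
    ("3000.01", "3000"),     # two tracts merged into one
    ("3000.02", "3000"),
]
for pair, label in relationship_types(pairs).items():
    print(pair, label)
```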
ZIP Code Tabulation Area (ZCTA)
ZIP Code Tabulation Areas (ZCTAs) are generalized areal representations of United States Postal Service (USPS) ZIP Code service areas. The USPS ZIP Codes identify the individual post office or metropolitan area delivery station associated with mailing addresses. USPS ZIP Codes are not areal features but a collection of mail delivery routes.
ZCTAs are built from census blocks, so blocks can be used as a crosswalk to other geographies that partition into blocks. Relationship files are available for blocks, counties, county subdivisions, places, tracts, and for ZCTA changes over time.
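As a sketch of using blocks as a crosswalk, the example below derives a ZCTA-to-county mapping from a block-to-ZCTA mapping. It relies on the standard 15-digit block code, whose first five digits are the state and county codes. The function name and all sample codes are hypothetical.

```python
from collections import defaultdict


def zcta_to_counties(block_to_zcta):
    """Derive {ZCTA: set of 5-digit state+county codes} from {block code: ZCTA}."""
    result = defaultdict(set)
    for block, zcta in block_to_zcta.items():
        # A block code starts with 2 state digits and 3 county digits.
        result[zcta].add(block[:5])
    return dict(result)


# Made-up block codes and ZCTAs, for illustration only.
blocks = {
    "090131000011000": "06001",
    "090131000011001": "06001",
    "090031000021000": "06001",  # same ZCTA, different county
    "090031000021001": "06002",
}
print(zcta_to_counties(blocks))
```

A ZCTA that maps to more than one county shows that ZCTAs do not nest within counties.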
Cartographic boundary and TIGER shapefiles. The 2000 files contain only 3-digit codes. For some reason, the 2010 CB file is almost 10 times bigger than the files of other years: 527 MB.
Column | 2000 | 2010 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
---|---|---|---|---|---|---|---|---|---|---|
AREA | X | | | | | | | | | |
PERIMETER | X | | | | | | | | | |
Z399_D00_ | X | | | | | | | | | |
Z399_D00_I | X | | | | | | | | | |
ZCTA3 | X | | | | | | | | | |
NAME | X | X | | | | | | | | |
LSAD | X | X | | | | | | | | |
LSAD_TRANS | X | | | | | | | | | |
geometry | X | X | X | X | X | X | X | X | X | X |
GEO_ID | | X | | | | | | | | |
ZCTA5 | | X | | | | | | | | |
CENSUSAREA | | X | | | | | | | | |
ZCTA5CE10 | | | X | X | X | X | X | X | X | |
AFFGEOID10 | | | X | X | X | X | X | X | X | |
GEOID10 | | | X | X | X | X | X | X | X | |
ALAND10 | | | X | X | X | X | X | X | X | |
AWATER10 | | | X | X | X | X | X | X | X | |
ZCTA5CE20 | | | | | | | | | | X |
AFFGEOID20 | | | | | | | | | | X |
GEOID20 | | | | | | | | | | X |
NAME20 | | | | | | | | | | X |
LSAD20 | | | | | | | | | | X |
ALAND20 | | | | | | | | | | X |
AWATER20 | | | | | | | | | | X |
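Because the column holding the ZCTA code is named differently across years, code that works with several years needs to find it first. The sketch below assumes that ZCTA3, ZCTA5, ZCTA5CE10 and ZCTA5CE20 are the code columns in their respective files; the helper name is hypothetical.

```python
# Assumed names of the ZCTA code column in different file years.
ZCTA_CODE_COLUMNS = ("ZCTA3", "ZCTA5", "ZCTA5CE10", "ZCTA5CE20")


def zcta_code_column(columns):
    """Return the single ZCTA code column present among `columns`."""
    found = [c for c in ZCTA_CODE_COLUMNS if c in columns]
    if len(found) != 1:
        raise ValueError(f"expected exactly one ZCTA code column, found {found}")
    return found[0]


cols_2010 = ["GEO_ID", "ZCTA5", "NAME", "LSAD", "CENSUSAREA", "geometry"]
cols_2020 = ["ZCTA5CE20", "AFFGEOID20", "GEOID20", "NAME20", "LSAD20",
             "ALAND20", "AWATER20", "geometry"]
print(zcta_code_column(cols_2010))  # ZCTA5
print(zcta_code_column(cols_2020))  # ZCTA5CE20
```

The returned name can then be used to rename the column to a common label before combining years.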
Congressional Districts
A geographical and political division in which voters elect representatives to the U.S. House of Representatives. Each state establishes its congressional districts based on population counts, with the goal of having districts as equal in population as possible. (ESRI dictionary)
About Congressional Districts (Census)
- All congressional districts within a state are supposed to have equal populations, so that each carries equal weight in electing its representative.
- They don’t cross state lines, but may cross all other classifications such as Census tracts.
- They DO cross county boundaries.
- Map of CT for reference
- Closer breakdown of District 1 in CT
- States are required to redraw the district lines every 10 years after the Census is released (except single-district states).
In 33 states, state legislatures play the dominant role in congressional redistricting. In eight states, commissions draw congressional district lines. In two states, hybrid systems are used, in which the legislatures share redistricting authority with commissions. The remaining states comprise one congressional district each, rendering redistricting unnecessary (AK, DE, DC, MT, ND, SD, VT, WY). Link
Gerrymandering can and often does occur when congressional district lines are drawn, helping whichever party is in power to stay in power. Examples
School Districts
The U.S. has more than 13,000 geographically defined public school districts. These include districts that are administratively and fiscally independent of any other government, as well as public school systems that lack sufficient autonomy to be counted as separate governments and are classified as a dependent agency of some other government—a county, municipal, township, or state. Most public school systems are Unified districts that operate regular, special, and/or vocational programs for children in Prekindergarten through 12th grade.
- School districts are complex and have almost no consistency from state to state, because most public school districts are defined by the local town government.
- Boundary files
- Since they vary by local government, changes happen every year in many places throughout the US.
Native American Reservations
About | Definitions | Data
Different breakdowns are available, going as small as tracts and block groups (link).
Build this module
Converted notebook "nbs/geography.ipynb" to module "rurec/geography.py".