I'll walk through the steps I took to join three datasets - Building footprint, PLUTO, and PAD to generate the above map.
- BIN: The Building Identification Number assigned by the Dept of Buildings.
- PLUTO: a CSV file (one for each borough) that contains all tax blocks/lots.
- PAD: An address database. Notably, this can contain multiple rows for one unique combination of (BIN, Boro, Tax Block, Tax Lot)
Data quality notes:
- There is a significant amount of missing data in the PAD dataset. Notably, all the null-valued buildings for each borough have a BIN of 1000000, 2000000, 3000000, etc.
- PLUTO uses boro like 'MN' 'BX' BK', while PAD uses 1,2,3.
- You should reduce tax lots and blocks to numeric valued columns, since what is 405 in PLUTO is 00405 in PAD.
From my last post - you can consume the Shapefiles directly with ogr2ogr into a PostGIS table.
Then add a date column like follows:
The CREATE TABLE statement is quite large, it's in the GitHub repo here. After you've created the table, import all 5 boroughs like this: (the Queens dataset had a corrupted row which I corrected in vi).
The CREATE TABLE statement is in this gist. I reduced the PAD dataset to only unique bins, because two rows that have the same BIN just specified an alias address.
Then I normalized the data as follows: Delete any rows from PAD where the BIN is not known.
Create a secondary table which will be unique over BINs.
Make PAD joinable over the `borough` column on nyc_pluto_13v1.
Now to do the join to populate the building dataset, you will need to do a large join:
This will be extremely slow - you should add some composite indices onto nyc_pluto_13v1 on (block, lot).
It's also recommended to make sure your local PostGIS is tuned well. on my 8GB macbook air I gave 2GB of shared_buffers to postgresql. These two blog posts are the ones I usually refer to when tuning PostgreSQL: link
The final result is a building dataset that you can color-code by age, as shown above.
Some interesting regions on the building age map:
Brooklyn Heights - the oldest buildings in the city.
Borough Park has an interesting pattern of building ages -
Lefferts Manor historic district.