I'll walk through the steps I took to join three datasets - Building Footprints, PLUTO, and PAD - to generate the above map.
BIN: The Building Identification Number assigned by the Dept of Buildings.
PLUTO: a CSV file (one for each borough) that contains all tax blocks/lots.
PAD: An address database. Notably, this can contain multiple rows for one unique combination of (BIN, Boro, Tax Block, Tax Lot).
Data quality notes:
1) There is a significant amount of missing data in the PAD dataset. Notably, buildings with an unknown BIN get a per-borough placeholder value of 1000000, 2000000, 3000000, etc.
2) PLUTO uses borough codes like 'MN', 'BX', 'BK', while PAD uses 1, 2, 3.
3) You should reduce tax blocks and lots to numeric-valued columns, since what is 405 in PLUTO is 00405 in PAD (see the sketch below).
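Here's a minimal sketch of that normalization, assuming PLUTO's block and lot arrive as text; the block_numeric and lot_numeric names match the join used later in this post:

alter table nyc_pluto_13v1 add column block_numeric numeric;
alter table nyc_pluto_13v1 add column lot_numeric numeric;
-- '00405' and '405' both cast to 405, so the padded and unpadded forms now compare equal
update nyc_pluto_13v1 set block_numeric = block::numeric, lot_numeric = lot::numeric;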
As covered in my last post, you can load the Shapefiles directly into a PostGIS table with ogr2ogr.
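If you're starting fresh, the invocation looks something like this (the database name and shapefile path are illustrative):

ogr2ogr -f PostgreSQL PG:"dbname=nyc" building_1012.shp -nln building_1012 -nlt MULTIPOLYGON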
Then add a year column as follows:
alter table building_1012 add column year numeric(4,0);
The CREATE TABLE statement for PLUTO is quite large; it's in [the GitHub repo](https://github.com/bdon/mappluto_postgresql). After you've created the table, import all five boroughs like this (the Queens dataset had a corrupted row, which I corrected in vi):
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/mn13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/bk13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/qn13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/bx13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/si13v1.csv' delimiter ',' header csv;
I reduced the PAD dataset to unique BINs, since rows that share a BIN just specify alias addresses for the same building. First, import the raw PAD address file:
copy bobaadr from '/Users/Bdon/workspace/pad13a/bobaadr.txt' delimiter ',' header csv;
Then I normalized the data as follows:
Delete any rows from PAD where the BIN is not known.
delete from bobaadr where bin = '1000000' or bin = '2000000' or bin = '3000000' or bin = '4000000' or bin = '5000000';
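Equivalently, assuming bin is stored as text, a single pattern-based delete covers all five placeholder values:

delete from bobaadr where bin ~ '^[1-5]000000$';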
Create a secondary table which will be unique over BINs.
create table bobaadr_onlyvalid_uniquebin as select * from bobaadr where 1=2; -- where 1=2 copies the schema with no rows
insert into bobaadr_onlyvalid_uniquebin select distinct on (bin) * from bobaadr;
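Note that distinct on (bin) without an order by keeps an arbitrary row per BIN. That's fine here, since duplicates only differ by alias address, but you can make the choice reproducible with a tiebreaker (lhnd, PAD's low house number column, is an assumed example):

insert into bobaadr_onlyvalid_uniquebin select distinct on (bin) * from bobaadr order by bin, lhnd; -- order by must start with bin; lhnd breaks ties deterministically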
Make PAD joinable over the borough column on nyc_pluto_13v1.
alter table bobaadr_onlyvalid_uniquebin add column borocode char(2);
update bobaadr_onlyvalid_uniquebin set borocode = 'MN' where boro = '1';
update bobaadr_onlyvalid_uniquebin set borocode = 'BX' where boro = '2';
update bobaadr_onlyvalid_uniquebin set borocode = 'BK' where boro = '3';
update bobaadr_onlyvalid_uniquebin set borocode = 'QN' where boro = '4';
update bobaadr_onlyvalid_uniquebin set borocode = 'SI' where boro = '5';
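The five updates can also be collapsed into one statement with a CASE expression:

update bobaadr_onlyvalid_uniquebin
set borocode = case boro
  when '1' then 'MN'
  when '2' then 'BX'
  when '3' then 'BK'
  when '4' then 'QN'
  when '5' then 'SI'
end;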
Now populate the building table's year column with a three-way join:
update building_1012 b
set year = n.year_built
from bobaadr_onlyvalid_uniquebin a, nyc_pluto_13v1 n
where a.bin = b.bin
and a.borocode = n.borough
and a.block = n.block_numeric
and a.lot = n.lot_numeric;
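A quick sanity check afterwards - buildings that didn't match any PLUTO lot will still have a null year:

select count(*) from building_1012 where year is null;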
The update itself will be extremely slow unless you index the join columns: bin on the PAD table, and a composite index on nyc_pluto_13v1 over (borough, block_numeric, lot_numeric) - note these are the columns the join actually uses, not the raw block and lot.
create index bobaadr_onlyvalid_uniquebin_bin_idx on bobaadr_onlyvalid_uniquebin(bin);
create index nyc_pluto_13v1_boro_block_lot_idx on nyc_pluto_13v1(borough, block_numeric, lot_numeric);
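After building the indexes, refresh the statistics so the planner makes good choices:

analyze bobaadr_onlyvalid_uniquebin;
analyze nyc_pluto_13v1;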
It's also worth making sure your local PostgreSQL is tuned well; on my 8GB MacBook Air I gave it 2GB of shared_buffers. These two blog posts are the ones I usually refer to when tuning PostgreSQL: link
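The shared_buffers setting lives in postgresql.conf, and changing it requires a server restart:

shared_buffers = 2GB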
The final result is a building dataset that you can color-code by age, as shown above.
Some interesting regions on the building age map:
Brooklyn Heights - some of the oldest buildings in the city.
Borough Park has an interesting pattern of building ages.
Lefferts Manor historic district.