
NYC PLUTO Dataset

date: 29/06/2013
buildingage screenshot

(NYC buildings by age. Thanks to @Caged for the color scheme from his Portland map)

The PLUTO dataset - the most extensive land use dataset published by the New York City government - was made free last week (h/t @SR_Spatial).

I'll walk through the steps I took to join three datasets - building footprints, PLUTO, and PAD - to generate the above map.

Some terminology:

- PLUTO: Primary Land Use Tax Lot Output - one record per tax lot, including the year the lot's building was built.
- PAD: the Property Address Directory, which relates addresses and BINs to tax lots (borough, block, lot).
- BIN: Building Identification Number, a unique ID for each building; the footprint dataset is keyed on it.

Data quality notes:

- The Queens PLUTO CSV had one corrupted row, which I fixed by hand before importing.
- PAD uses placeholder BINs (1000000 through 5000000) when no specific building is known; those rows are deleted below.
- PAD can have several rows per BIN - the extras are just alias addresses - so it needs to be deduplicated before joining.

Importing building_1012

As covered in my last post, you can load the Shapefiles directly into a PostGIS table with ogr2ogr.
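
A rough sketch of that import - the shapefile name building_1012.shp and the database name nyc are placeholders, not from the post, so adjust them to your download and setup:

ogr2ogr -f PostgreSQL PG:"dbname=nyc" building_1012.shp -nln building_1012 -nlt MULTIPOLYGON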

Then add a year column as follows:

alter table building_1012 add column year numeric(4,0);

Importing PLUTO

The CREATE TABLE statement is quite large; it's in the GitHub repo here. After you've created the table, import all 5 boroughs like this (the Queens dataset had a corrupted row, which I corrected in vi):

copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/mn13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/bk13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/qn13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/bx13v1.csv' delimiter ',' header csv;
copy nyc_pluto_13v1 from '/Users/Bdon/workspace/nyc_pluto_13v1/si13v1.csv' delimiter ',' header csv;
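
A quick way to confirm the import worked is to count rows per borough (a sketch, using the borough column that the join further down relies on):

select borough, count(*) from nyc_pluto_13v1 group by borough;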

Importing PAD

The CREATE TABLE statement is in this gist. I reduced the PAD dataset to only unique BINs, because two rows with the same BIN just specify alias addresses for the same building.

copy bobaadr from '/Users/Bdon/workspace/pad13a/bobaadr.txt' delimiter ',' header csv;
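
To see the duplication for yourself, you can list BINs that appear more than once - each extra row is just another address for the same building:

select bin, count(*) from bobaadr group by bin having count(*) > 1 order by count(*) desc limit 10;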

Then I normalized the data as follows. First, delete any rows from PAD where the BIN is not known (the placeholder values):

delete from bobaadr where bin = '1000000' or bin = '2000000' or bin = '3000000' or bin = '4000000' or bin = '5000000';

Create a secondary table which will be unique over BINs.

create table bobaadr_onlyvalid_uniquebin as select * from bobaadr where 1=2;
insert into bobaadr_onlyvalid_uniquebin select distinct on (bin) * from bobaadr;
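
One caveat: DISTINCT ON without an ORDER BY keeps an arbitrary row per BIN. Any of the alias addresses works for this join, but if you want the choice to be deterministic you could replace the insert above with something ordered - lhnd (the low house number) is just one plausible PAD column to order by, not something the post specifies:

insert into bobaadr_onlyvalid_uniquebin select distinct on (bin) * from bobaadr order by bin, lhnd;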

Make PAD joinable over the `borough` column on nyc_pluto_13v1.

alter table bobaadr_onlyvalid_uniquebin add column borocode char(2);
update bobaadr_onlyvalid_uniquebin set borocode = 'MN' where boro = '1';
update bobaadr_onlyvalid_uniquebin set borocode = 'BX' where boro = '2';
update bobaadr_onlyvalid_uniquebin set borocode = 'BK' where boro = '3';
update bobaadr_onlyvalid_uniquebin set borocode = 'QN' where boro = '4';
update bobaadr_onlyvalid_uniquebin set borocode = 'SI' where boro = '5';
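
The same mapping can also be written as a single update with a CASE expression - an equivalent formulation that makes one pass over the table instead of five:

update bobaadr_onlyvalid_uniquebin set borocode = case boro when '1' then 'MN' when '2' then 'BX' when '3' then 'BK' when '4' then 'QN' when '5' then 'SI' end;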

Now, to populate the year column on the building dataset, you will need a large three-way join:

update building_1012 b set year = n.year_built
from bobaadr_onlyvalid_uniquebin a, nyc_pluto_13v1 n
where a.bin = b.bin
and a.borocode = n.borough
and a.block = n.block_numeric
and a.lot = n.lot_numeric;
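
It's worth checking how many footprints actually picked up a year - count(year) only counts non-null values (a sketch against the column names used above):

select count(*) as footprints, count(year) as with_year from building_1012;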

This will be extremely slow - you should add an index on bin for the PAD table and a composite index on nyc_pluto_13v1 covering its block and lot columns (make sure the indexed columns are the same ones the join predicate actually uses).

create index bobaadr_onlyvalid_uniquebin_bin_idx on bobaadr_onlyvalid_uniquebin(bin);
create index nyc_pluto_13v1_block_lot_idx on nyc_pluto_13v1(block,lot);

It's also recommended to make sure your local PostGIS is tuned well. On my 8GB MacBook Air I gave 2GB of shared_buffers to PostgreSQL. These two blog posts are the ones I usually refer to when tuning PostgreSQL: link
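
For reference, the corresponding postgresql.conf lines would look roughly like this - the shared_buffers value is the one mentioned above, while the work_mem figure is only an illustrative guess, not from the post:

shared_buffers = 2GB
# work_mem helps large sorts and joins like the update above; 64MB is an illustrative guess
work_mem = 64MB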

The final result is a building dataset that you can color-code by age, as shown above.

Some interesting regions on the building age map:

Brooklyn Heights - the oldest buildings in the city.

bkheights screenshot

Borough Park has an interesting pattern of building ages:

boropark screenshot

Lefferts Manor historic district.

plg screenshot