The NYC OpenData site has a very appealing-sounding dataset here (as of 5/25/2013): Building Footprints - NYC OpenData
Upon downloading it you'll notice it's a directory, building_1012.gdb. What?
It's an ESRI File Geodatabase (in GDAL parlance, "FileGDB") and the commenters on the NYC OpenData site are not pleased:
So you're welcome to follow the yellow brick road and perform the pagan ritual of compiling GDAL with FileGDB support, which is beautifully outlined here. But **DON'T BOTHER**.
The proprietary binaries ESRI distributes to access FileGDBs only work with files created by ArcCatalog 10.0 or later. The published ones were made using ArcCatalog 9.3.
For the old format you __need__ ArcCatalog apparently. I've saved you the pain and made a Shapefile available here:
Once you have this as a Shapefile, there are still a few caveats.
In the end I chose to put the data into PostGIS, since a) that's where the rest of my data is and b) with a large data source like this (~300MB) Mapnik seemed to render faster reading from Postgres than directly from the Shapefile. More to investigate there.
This is what I used to convert the Shapefile to PostGIS:
You'll need the -skipfailures flag because some of the geometries exported from ArcCatalog are MultiPolygons. That means you'll be dropping a few buildings on the floor, but for my use case this doesn't really matter. I'm sure there's a way to get ogr2ogr to play nicely with those.
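For reference, here is a rough Python sketch of how an ogr2ogr invocation along these lines could be assembled. It is not necessarily the exact command I ran: the Shapefile name, the database name `nyc`, and the helper `build_ogr2ogr_cmd` are all assumptions for illustration. The optional `-nlt PROMOTE_TO_MULTI` switch is one possible way to keep the MultiPolygons on GDAL releases that support it, and I haven't verified it against this dataset.

```python
import shlex


def build_ogr2ogr_cmd(shapefile, dbname, table, promote_to_multi=False):
    """Assemble an ogr2ogr argv list that loads a Shapefile into Postgres.

    All argument values are supplied by the caller; nothing here is
    specific to the NYC dataset.
    """
    cmd = [
        "ogr2ogr",
        "-f", "PostgreSQL",        # output driver
        f"PG:dbname={dbname}",     # destination comes before the source
        shapefile,                 # source datasource
        "-nln", table,             # name of the table to create
        "-skipfailures",           # keep going when a feature fails to load
    ]
    if promote_to_multi:
        # Ask OGR to promote Polygons to MultiPolygons so a mixed layer
        # fits a single geometry type (assumes a GDAL build that supports it).
        cmd += ["-nlt", "PROMOTE_TO_MULTI"]
    return cmd


# Hypothetical file and database names.
cmd = build_ogr2ogr_cmd("building_1012.shp", "nyc", "building_1012")
print(shlex.join(cmd))
```

Pass the resulting list to `subprocess.run(cmd, check=True)` to actually execute it, or just copy the printed line into a shell.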
The building_1012 table has a 'bin' column. This is the Building Identification Number used by the Department of Buildings. You can do a BIN lookup here to find all the metadata: [NYC Department of Buildings](http://a810-bisweb.nyc.gov/bisweb/bispi00.jsp)
Next, load the PostGIS functions and spatial_ref_sys table into our database:
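As a hedged sketch, assuming an older PostGIS install that ships `postgis.sql` and `spatial_ref_sys.sql` as plain SQL scripts, the two psql calls could be built like this. The script directory and database name below are hypothetical; check where your package actually put the files.

```python
import shlex
from pathlib import PurePosixPath


def build_postgis_load_cmds(dbname, script_dir):
    """Return one psql argv list per PostGIS bootstrap script."""
    # Functions first, then the spatial reference system table.
    scripts = ["postgis.sql", "spatial_ref_sys.sql"]
    return [
        ["psql", "-d", dbname, "-f", str(PurePosixPath(script_dir) / name)]
        for name in scripts
    ]


# Hypothetical install location and database name.
for load_cmd in build_postgis_load_cmds(
    "nyc", "/usr/share/postgresql/contrib/postgis-1.5"
):
    print(shlex.join(load_cmd))
```

On PostGIS 2.0 or later, running `CREATE EXTENSION postgis;` inside the database replaces both scripts.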
Add a geometry column to our building_1012 table:
Now convert your WKB geometries:
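The last two steps (add the geometry column, then fill it from the WKB) could look roughly like the SQL generated below. This is a sketch under assumptions: the WKB column name `wkb_geometry`, the new column name `the_geom`, and the POLYGON type are my guesses for illustration, so adjust them to whatever your import actually produced. The SRID is the 900914 mentioned in this post.

```python
def geometry_migration_sql(table, srid, wkb_column="wkb_geometry",
                           geom_column="the_geom", geom_type="POLYGON"):
    """Build the two statements: add a geometry column, then populate it.

    Column names and geometry type are assumptions; pass your own.
    """
    srid = int(srid)  # guard against anything but a numeric SRID
    return [
        # Register a 2D geometry column with the given SRID and type.
        f"SELECT AddGeometryColumn('{table}', '{geom_column}', "
        f"{srid}, '{geom_type}', 2);",
        # Parse the stored WKB bytes into real PostGIS geometries.
        f"UPDATE {table} SET {geom_column} = "
        f"ST_GeomFromWKB({wkb_column}, {srid});",
    ]


for statement in geometry_migration_sql("building_1012", 900914):
    print(statement)
```

Each printed statement can be pasted into psql as-is.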
You'll notice that the exported projection is SRID=900914:
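One way to see what that SRID maps to is to look it up in spatial_ref_sys. Here is a minimal sketch that builds the query; the helper name `srs_lookup_sql` is hypothetical, and it assumes the row for this SRID is actually present in your spatial_ref_sys table.

```python
def srs_lookup_sql(srid):
    """Build a query that fetches the proj4 definition for one SRID."""
    return (
        "SELECT srid, proj4text FROM spatial_ref_sys "
        f"WHERE srid = {int(srid)};"
    )


print(srs_lookup_sql(900914))
```

The proj4text value it returns is the string to paste into a custom SRS field.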
To visualize this in TileMill you can use the 'Custom' SRS with the aforementioned proj4text.