2025-04-13 Weekly Notes

13 April 2025

So, I finally (!!!) finished refactoring the code and added the Gini coefficient to my analysis. The code now runs smoothly and computes the metrics at different spatial levels, plus some metrics at the building level, including a distinction between Euclidean and Manhattan distances to a public park. I calculated green inequality using the total number of trees, the trees available within 10, 25, 50, 75, and 100 m of each building, and the accessibility of parks, all at the same spatial resolution at which deprivation is measured in England (the image on the left shows the Gini coefficient for the total number of trees in London; higher is more unequal).
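
As a rough illustration of the inequality metric, here is a minimal sketch of a Gini coefficient over per-area tree counts. It assumes non-negative values and uses the standard sorted-rank formula; the function name `gini` and the sample counts are made up for illustration and are not taken from the analysis code.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means perfectly equal; with n values the maximum is (n - 1) / n,
    reached when a single unit holds everything.
    """
    xs = sorted(float(v) for v in values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    total = sum(xs)
    if total == 0:
        # Nobody has anything: treat as perfectly equal.
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    # running over the values sorted in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical tree counts for four areas.
equal_areas = [5, 5, 5, 5]
unequal_areas = [0, 0, 0, 10]
print(gini(equal_areas), gini(unequal_areas))
```

The same function could be applied to any of the per-building or per-area metrics (tree counts within a buffer, distance to a park), as long as the values are non-negative.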

In addition to this, I’ve been preparing for the ESL technical interview happening this week, which is mostly about projects where I’ve used ML or DL. In preparation for this and for Google’s Geo for Good Summit in late August, which was one of my goals for this year (they accept applications until late April), I resumed my work with their Foundation Model (FM)¹. When I started my PhD, I came across local climate zones (LCZs), a classification of urban areas based on their physical and morphological characteristics, namely buildings and vegetation. There is only one global dataset, published a couple of years ago, and it uses some of the same remote sensing products used in the FM. Many LCZ approaches are built for specific cities by manually labelling regions and training small classifiers (some of them can be found in WUDAPT). My hypothesis is that urban features are not uniform from one city to another; for instance, trees in London are not the same as trees in Rio de Janeiro, so an open midrise area differs between those two places, because one may have more deciduous trees and the other more palm trees. My guess is that this FM (and other foundation models, for that matter) can pick up those differences given the huge number of variables, and so produce better land-use classes in the urban context. I managed to get a working example that goes from the download step (which I had struggled with in the past; don’t use XEE just yet, at least for the FM) to a PCA (see the second image for a representation of LCZs based on the FM; each dot is a 100 × 100 m pixel in London) and a small neural network. My intention is to submit this to the Geo for Good Summit, so I will keep working on it over the coming months and then try to combine it with my tree infrastructure analysis of England.
I found XBatcher to be a good library for generating training batches for PyTorch from Xarray objects, but I am still looking for better ways to sample data for the train/test split, particularly in the geospatial context: XBatcher only generates chips of the right size from any n-dimensional array, so the split has to happen beforehand.
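
One common option for the geospatial train/test split is a spatial block split: group pixels into coarse square blocks and assign whole blocks to either train or test, so that neighbouring pixels do not end up on both sides. The sketch below is only an illustration under assumptions; the function name `spatial_block_split`, the block size, and the toy grid are hypothetical and not part of XBatcher or of my pipeline.

```python
import random


def spatial_block_split(points, block_size, test_fraction=0.2, seed=0):
    """Split point indices into train/test by square spatial blocks.

    points: sequence of (x, y) coordinates in a projected CRS (metres).
    block_size: side length of each block, in the same units.
    Returns (train_indices, test_indices), both sorted.
    """
    # Group point indices by the block their coordinates fall into.
    blocks = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(idx)
    if len(blocks) < 2:
        raise ValueError("need at least two blocks to split")

    # Shuffle block keys reproducibly and hold out a fraction of blocks,
    # always keeping at least one block on each side.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    n_test = min(len(keys) - 1, max(1, round(len(keys) * test_fraction)))
    test_keys = set(keys[:n_test])

    train, test = [], []
    for key, members in blocks.items():
        (test if key in test_keys else train).extend(members)
    return sorted(train), sorted(test)


# Toy example: a 10 x 10 grid of 100 m pixels, split using 500 m blocks.
pixels = [(x * 100.0, y * 100.0) for x in range(10) for y in range(10)]
train_idx, test_idx = spatial_block_split(pixels, block_size=500.0, test_fraction=0.25)
print(len(train_idx), len(test_idx))
```

The resulting index sets could then be used to subset the array before chips are generated, which matches the constraint that the split has to happen before batching.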

In the side project with the Estates division, I managed to sync the internal Esri Python environment with my code (which uses other libraries like pyautocad) and generate the dreaded geodatabase 😂 using arcpy. It’s very difficult to integrate external tools into the Esri ecosystem, so I have to stick to this vector data format, even though there are better ones, as pointed out by Michael and Anil.

Finally, I recently got some good news as well: I passed my French course and received the Kettle’s Yard award from the Department of Architecture (a small travel grant).

  1. This is not the official name of the model.